Error during analysis: "The end tag </1> does not have a matching start tag."

The same error also occurs in the task "Create project tm".

Source file type: XML (Schema ST4).

I used the Sooper_Dooper_TMX_X_Attribute_Fixer as recommended in this Knowledge Base article. I then created a new TM based on the old one and imported the fixed TMX. That did not solve the issue, though.

I thought the problem might lie with the project TM that had already been created, so I removed it from the project manually and ran the task again: no luck.

I ran a quality check on the target SDLXLIFF files: no problems found.

I then wrote my own TMX tag checker to find the culprit:

 

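using System.IO;
using System.Xml;

// tmx: the TMX export loaded into an XmlDocument; logFile: path of the report file.
// ".//" also finds tags nested inside <hi> or <sub>, not only direct children of <seg>.
var allSegsWithTags = tmx.SelectNodes("//seg[.//ept]");
using (var sw = new StreamWriter(logFile, true)) // open the log once, not once per hit
{
    foreach (XmlNode tagSeg in allSegsWithTags)
    {
        foreach (XmlNode ept in tagSeg.SelectNodes(".//ept"))
        {
            // bpt and ept are paired by their "i" attribute within one segment
            var tagNum = ept.Attributes["i"]?.Value;
            var bpt = tagNum == null
                ? null
                : tagSeg.SelectSingleNode($".//bpt[@i='{tagNum}']");
            if (bpt == null) // end tag without a matching start tag
            {
                sw.WriteLine(tagSeg.InnerXml);
            }
        }
    }
}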

 

The code ran without errors, and without finding a single faulty segment!
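The checker above pairs bpt and ept only through their i attribute, so it cannot see anything odd in the x attribute, which TMX uses to match inline codes across the tuv elements of a tu. A minimal sketch of a complementary check, assuming the TMX is loaded into an XmlDocument: it lists every seg in which the same x value is used by more than one tag. That Studio actually trips over this is an assumption, not something support confirmed; it is only suggested by the fact that each tuv posted at the end of this thread contains a ph and a bpt that both carry x="1". The function name FindSegsWithDuplicateX is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical helper (not from the thread): lists every <seg> in which two or
// more inline tags (bpt, ph, it, hi) carry the same x value. A reused x is only
// a suspect, not a confirmed cause of the Studio error.
static List<string> FindSegsWithDuplicateX(XmlDocument tmx)
{
    var hits = new List<string>();
    foreach (XmlNode seg in tmx.SelectNodes("//seg"))
    {
        // Group all elements inside the segment that have an x attribute by its value
        var reused = seg.SelectNodes(".//*[@x]")
            .Cast<XmlNode>()
            .GroupBy(tag => tag.Attributes["x"].Value)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key)
            .ToList();
        if (reused.Count > 0)
        {
            var values = string.Join(",", reused);
            hits.Add($"x={values}: {seg.InnerXml}");
        }
    }
    return hits;
}
```

Each hit is the reused x value(s) followed by the segment's inner XML, so the results can go into the same log file as the i-based check.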

tl;dr: I cannot get rid of this error. Any ideas what causes it?

 

Best,

Andreas

  • Did you try searching the XML for the text "</1>"?
    It may contain an (unescaped) placeholder of that kind, which the parser then treats as a tag...
    Also, shouldn't the error message report the location (line, column) where the error occurred?
  • I searched the TMX for </1>, but not the XML. The XML comes from a well-known CMS that provides its own Studio settings.

    The error gives no line details, only a stack trace showing that it occurs when querying the TM. Everything is exactly as described in the linked KB article...
  • Well, then I'm out of ideas... IMO only an SDL person with detailed knowledge of the actual Studio source code can give a satisfactory answer to the "what causes it?" question.
    But those people do not come to the forum...
  • Hi ,

    Did you try creating a new project to test this with your files?

    Could you share a copy of the problem project and the relevant resources so that I can reproduce this? If so, you can send them to pfilkin@sdl.com

    Regards

    Paul Filkin | RWS Group


  • Hi Paul (pfilkin),

    I tried re-creating the project from scratch (no template): same outcome.

    There is no error when converting to translatable format. The error consistently appears in the Analyze/Perfect Match/Populate Project TM task sequence and is of type Sdl.LanguagePlatform.TranslationMemoryTools.InvalidSegmentContentException. The error details provide no further information whatsoever.

    So it is clearly caused by the TM. To verify this, I swapped the original TM for a new, blank one and ran the task: no error.

    Since the TM is rather large, I zipped everything, uploaded it to our file transfer service and sent you a link. Really great of you to look into this. If you have a TM checking tool that would detect this kind of error, simply running it over the TM might be enough. Unfortunately, the X_Attribute_Fixer tool from the Knowledge Base article does not seem to find the problematic segment.

    Thanks a lot!

    Andreas

  • Thank you ,

    I have now created a support ticket for this. Really curious what this is about.

    Best,
    Andreas
  • Update: Problem solved.

    Support could not fully identify the cause. They came across an issue involving line breaks between tag pairs, but in the end that turned out not to be the root cause here. There does not seem to be anything truly wrong with the TM, which makes the root cause very hard to find.

    To track down the error, I had to use divide and conquer: split the TMX in two, import each half into an empty TM, run the analysis with it, check which half produces the error, split that half again...

    Rinse and repeat for a couple of hours...

    It turned out to be caused by two segments. If I remove them, the error is gone. To double-check, I created an empty TM, imported a TMX containing only one "normal" segment, one with a line break and a tag, and the two problematic ones, and ran the analysis with it: sure enough, the error popped up.

    Behold the (anonymized) beauty of these segments:

    <tuv xml:lang="de-DE">
        <seg><ph x="1" type="1" />bla <bpt i="2" type="2" x="1" />yadda<ept i="2" /></seg>
    </tuv>
    <tuv xml:lang="en-US">
        <seg><bpt i="1" type="2" x="1" />          <ph x="1" type="4" />foo<ept i="1" /> bar</seg>
    </tuv>

    and

    <tuv xml:lang="de-DE">
        <seg><ph x="1" type="1" />yaddayadda<bpt i="2" type="2" x="1" />foobar<ept i="2" /></seg>
    </tuv>
    <tuv xml:lang="en-US">
        <seg><bpt i="1" type="2" x="1" />wombat<ept i="1" /> <ph x="1" type="1" />wombat</seg>
    </tuv>

     

    Yes, there are several superfluous spaces in the first en-US segment, and yes, there is a placeholder tag between bpt and ept. But neither of these occurs in the second segment, and it still triggers the error. There are no invisible control characters either.

    P.S.: the bla, yadda, foobar and wombat stand in for text with a combined word count (source + target) of 13. So these are not big segments either.

    Colour me baffled.
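The divide-and-conquer search described in the update can be partly automated. A minimal sketch, assuming the TMX follows the usual tmx/header/body/tu layout: it writes two copies of the file next to the original, each keeping the header and one half of the tu elements. SplitTmxInHalf, SaveTmxPart and the _a/_b file suffixes are hypothetical; importing each half into an empty TM and running the analysis still has to be done in Studio.

```csharp
using System.IO;
using System.Linq;
using System.Xml;

// Hypothetical helper: one round of the divide-and-conquer search.
// Writes <name>_a.tmx and <name>_b.tmx next to the input file; both keep
// the original <header>, and each keeps one half of the <tu> elements.
static (string First, string Second) SplitTmxInHalf(string tmxPath)
{
    // Without PreserveWhitespace, XmlDocument drops whitespace-only text
    // between inline tags, which would alter the segments being tested.
    var doc = new XmlDocument { PreserveWhitespace = true };
    doc.Load(tmxPath);
    int total = doc.SelectNodes("/tmx/body/tu").Count;
    int half = total / 2;

    string stem = Path.Combine(
        Path.GetDirectoryName(Path.GetFullPath(tmxPath)),
        Path.GetFileNameWithoutExtension(tmxPath));
    string first = stem + "_a.tmx";
    string second = stem + "_b.tmx";

    SaveTmxPart(doc, first, 0, half);
    SaveTmxPart(doc, second, half, total);
    return (first, second);
}

// Saves a copy of doc that keeps only the <tu> elements with index in [from, to).
static void SaveTmxPart(XmlDocument doc, string target, int from, int to)
{
    var part = (XmlDocument)doc.CloneNode(true);
    part.PreserveWhitespace = true;
    var tus = part.SelectNodes("/tmx/body/tu").Cast<XmlNode>().ToList();
    for (int i = 0; i < tus.Count; i++)
    {
        if (i < from || i >= to)
        {
            tus[i].ParentNode.RemoveChild(tus[i]);
        }
    }
    part.Save(target);
}
```

Run it again on whichever half fails; isolating a single culprit in a TM with n translation units takes about log2(n) rounds. If both halves fail, each contains a culprit, as with the two segments found here. PreserveWhitespace matters because segments such as the first en-US one above contain whitespace-only runs between tags that would otherwise be lost on the round trip.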