Bug importing xliff 1.2

Hi, I downloaded a trial version of SDL Trados Studio 2019 and found a bug that prevents me from importing an XLIFF file generated with another tool.

 

I think the issue is caused by a tag that opens in one segment and closes in a different segment of the same translation unit: when I generate the file without segmenting the source, the problem does not occur.

 

Here is the error dump generated by Trados:

 

<SDLErrorDetails time="01/04/2019 12:25:43">
  <ErrorMessage>Index out of range (3). It must be a number from 0 up to the number of items in the collection (2).
Parameter name: index</ErrorMessage>
  <Exception>
    <Type>System.ArgumentOutOfRangeException, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</Type>
    <ParamName>index</ParamName>
    <HelpLink />
    <Source>Sdl.FileTypeSupport.Framework.Implementation</Source>
    <HResult>-2146233086</HResult>
    <StackTrace><![CDATA[   en Sdl.FileTypeSupport.Framework.Bilingual.MarkupDataContainer.CheckIndexValue(Int32 index)
   en Sdl.FileTypeSupport.Framework.Bilingual.MarkupDataContainer.get_Item(Int32 index)
   en Sdl.FileTypeSupport.Framework.Bilingual.AbstractMarkerWithContent.get_Item(Int32 index)
   en Sdl.FileTypeSupport.Filters.Xliff.Infrastructure.ParagraphUnitBuilder.GetPlaceholderAt(Int32 position)
   en Sdl.FileTypeSupport.Filters.Xliff.Infrastructure.ParagraphUnitBuilder.ConvertPlaceholderToPairTagFrom(Int32 position)
   en Sdl.FileTypeSupport.Filters.Xliff.Infrastructure.Consumers.Parser.PairedConsumers.EndPairedPlaceholderElementConsumer.ConvertPlaceholdersToPairTagIfIdsMatch(String messageName)
   en Sdl.FileTypeSupport.Filters.Xliff.Infrastructure.Consumers.Parser.PairedConsumers.EndPairedPlaceholderElementConsumer.Consume(XmlNodeParsed message)
   en lambda_method(Closure , IMessage )
   en Sdl.FileTypeSupport.Filters.Xliff.Infrastructure.InMemoryBus.Publish(IMessage message)
   en Sdl.FileTypeSupport.Filters.Xliff.ParserImpl.Publish(XmlNodeParsed message)
   en Sdl.FileTypeSupport.Filters.Xliff.Infrastructure.XmlParser.Parse(XmlTextReader reader)
   en Sdl.FileTypeSupport.Filters.Xliff.ParserImpl.Parse(String xliffPath)
   en Sdl.FileTypeSupport.Filters.Xliff.Parser.ParseNext()
   en Sdl.FileTypeSupport.Framework.Integration.FileExtractor.ParseNext()
   en Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.ParseNext()
   en Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.Parse()
   en Sdl.ProjectApi.AutomaticTasks.Conversion.ConversionTask.ProcessFile(IExecutingTaskFile executingTaskFile)
   en Sdl.ProjectApi.AutomaticTasks.AbstractFileLevelAutomaticTaskImplementation.Execute()]]></StackTrace>
  </Exception>
  <Environment>
    <ProductName>SDL Trados Studio</ProductName>
    <ProductVersion>15.0.0.0</ProductVersion>
    <EntryAssemblyFileVersion>15.0.0.29074</EntryAssemblyFileVersion>
    <OperatingSystem>Microsoft Windows 10 Pro</OperatingSystem>
    <ServicePack>NULL</ServicePack>
    <OperatingSystemLanguage>3082</OperatingSystemLanguage>
    <CodePage>1252</CodePage>
    <LoggedOnUser>DESKTOP-DIBTBC1\Miguel Alejandrez</LoggedOnUser>
    <DotNetFrameWork>4.0.30319.42000</DotNetFrameWork>
    <ComputerName>DESKTOP-DIBTBC1</ComputerName>
    <ConnectedToNetwork>True</ConnectedToNetwork>
    <PhysicalMemory>16659932 MB</PhysicalMemory>
  </Environment>
</SDLErrorDetails>

Thank you very much.

  • Hello,

    The issue here is caused by the segmentation markers.

    To be more specific, the <mrk> segmentation markers break the nesting of the <bx> / <ex> pairs.

    While the file is valid from an XLIFF perspective, it makes no sense to have these tag pairs in different segments. Also, Studio's segmentation engine cannot "clone" a closing (ex) and then an opening (bx) element for each segment (to obtain a valid XML structure), because the elements and the segmentation are already determined by the mrk markers.
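To make the situation concrete, here is a minimal Python sketch that flags <bx/>/<ex/> pairs whose halves sit in different <mrk mtype="seg"> segments. It is not how Studio parses the file. The sample XLIFF is hypothetical (the original file was not posted), and matching on `rid` with a fallback to `id` is an assumption about how the pairs are linked.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical sample: the pair opens in segment 1 and closes in segment 2.
SAMPLE = f"""<xliff version="1.2" xmlns="{NS}">
  <file original="demo.html" source-language="en" target-language="es" datatype="html">
    <body>
      <trans-unit id="1">
        <source><bx id="1"/>First sentence. Second sentence.<ex id="1"/></source>
        <seg-source><mrk mtype="seg" mid="1"><bx id="1"/>First sentence.</mrk> <mrk mtype="seg" mid="2">Second sentence.<ex id="1"/></mrk></seg-source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def split_pairs(xliff_text):
    """Return (trans-unit id, pair id, opening mid, closing mid) for each
    <bx/>/<ex/> pair whose halves are in different <mrk mtype="seg"> elements."""
    root = ET.fromstring(xliff_text)
    found = []
    for unit in root.iter(f"{{{NS}}}trans-unit"):
        seg_source = unit.find(f"{{{NS}}}seg-source")
        if seg_source is None:
            continue  # unsegmented unit: nothing to check
        opened = {}  # pair id -> mid of the segment that holds the <bx/>
        for mrk in seg_source.iter(f"{{{NS}}}mrk"):
            if mrk.get("mtype") != "seg":
                continue
            for el in mrk.iter():
                # Assumption: pairs are linked by rid, falling back to id.
                pair_id = el.get("rid") or el.get("id")
                if el.tag == f"{{{NS}}}bx":
                    opened[pair_id] = mrk.get("mid")
                elif el.tag == f"{{{NS}}}ex" and pair_id in opened:
                    start_mid = opened.pop(pair_id)
                    if start_mid != mrk.get("mid"):
                        found.append((unit.get("id"), pair_id, start_mid, mrk.get("mid")))
    return found


print(split_pairs(SAMPLE))  # prints [('1', '1', '1', '2')]
```

Any tuple in the result marks a pair that crosses a segment boundary, which is the pattern that triggers the import error described in this thread.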

    I can raise this with engineering, but realistically speaking I do not think they will treat this as a defect.

    Vlad Bondor | Senior Technical Support Manager | RWS

  • Hi Vlad,

    Thank you for your response.

    As XLIFF specification states in section 2.4 "Inline elements" subsection C:

    "Use <bpt> or <bx/> for opening each code that has a corresponding closing code in the content. Use <bpt> to mask the code and <bx/> to replace the code. The <bpt> and <bx/> elements should be followed by a matching <ept> or <ex/> element, respectively, within the same translation unit." (http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html)

    Segments do not constitute a boundary for <bx> / <ex> pairs, translation units do.
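As a rough illustration of that rule, the following Python sketch checks pairing at translation-unit level only: every <bx/> in a <source> needs a later <ex/> in the same <source>, regardless of any segment markers in between. The sample data is hypothetical, and linking pairs by `rid` with a fallback to `id` is an assumption.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical sample: unit 1 is correctly paired, unit 2 has a dangling <bx/>.
SAMPLE = f"""<xliff version="1.2" xmlns="{NS}">
  <file original="demo.html" source-language="en" datatype="html">
    <body>
      <trans-unit id="1">
        <source><bx id="1"/>One. Two.<ex id="1"/></source>
      </trans-unit>
      <trans-unit id="2">
        <source><bx id="7"/>Never closed in this unit.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def unmatched_codes(xliff_text):
    """Return (trans-unit id, pair id) for every <bx/> or <ex/> in a <source>
    that has no partner inside the same translation unit."""
    root = ET.fromstring(xliff_text)
    problems = []
    for unit in root.iter(f"{{{NS}}}trans-unit"):
        source = unit.find(f"{{{NS}}}source")
        if source is None:
            continue
        open_ids = []
        # iter() walks in document order, so nested markers are transparent.
        for el in source.iter():
            pair_id = el.get("rid") or el.get("id")  # assumption: rid, else id
            if el.tag == f"{{{NS}}}bx":
                open_ids.append(pair_id)
            elif el.tag == f"{{{NS}}}ex":
                if pair_id in open_ids:
                    open_ids.remove(pair_id)
                else:
                    problems.append((unit.get("id"), pair_id))
        problems.extend((unit.get("id"), pid) for pid in open_ids)
    return problems


print(unmatched_codes(SAMPLE))  # prints [('2', '7')]
```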

    From an xml perspective <bx> / <ex> tags are self-closing tags, so xml integrity is not an issue here.

    Cloning the closing (ex) and then the opening (bx) tags would not be a foolproof solution either, because those tags could be moved away from the end/beginning of the segment, defeating their purpose (I considered this solution when building the xliff, but discarded it for this reason).

     

    The issue seems more complicated than it looked at first glance. If Studio internally manages tag consistency at segment level instead of translation-unit level, a proper solution is difficult to find (maybe treat them internally as standalone tags and perform a validation afterwards?).

    Anyway, I would be grateful if you could raise the issue with engineering. The other tools I've checked compatibility with don't seem to have this problem, but most of our partners use SDL Trados, and we would like to provide them with xliffs that they can use as painlessly as possible.

     

    Thank you again,

     

    Miguel

  • Hi Miguel,

    Yes, we're currently handling <bpt>/<bx> & <ept>/<ex> as tag pairs; hence the segmentation problem. That's why I was mentioning an XML structure here: if you oversimplify things, the first <mrk> element is closing before you have the chance to close the two <bx>'s.

    I believe that having this addressed might involve some serious dev work (as we'll have to re-think the segmentation engine) but I logged it with engineering anyway (I see where you’re coming from): CRQ-14138.

    Thank you.

    Vlad Bondor | Senior Technical Support Manager | RWS

  • Hi Vlad,

    Thank you for your help.
    It seems this is an issue that can't be expected to be solved anytime soon.

    The only temporary workaround I can come up with is to generate the xliff without source segmentation. But as you can see, this is a serious drawback (uncomfortable to work with, interferes with translation memories, etc.).

    To mitigate it, are there any options in SDL Trados Studio that allow the following workflow?
    - Import the unsegmented xliff (a single segment for each translation unit, to avoid cross-segment tagging)
    - Re-segment the text inside Studio (in order to work with user-friendly segments)
    - Export an unsegmented translated xliff (This would allow automatic mapping of the target translation units to the source ones)
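Outside Studio, the "unsegmented" form used in the first and last steps could be approximated with a small script. This is only a sketch under assumptions: it assumes segmentation is stored as <seg-source> plus <mrk mtype="seg"> wrappers in <target>, the sample is hypothetical, and it has not been tested against files produced by Studio.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)  # keep XLIFF as the default namespace on output

# Hypothetical segmented sample with a pair that crosses the segment boundary.
SAMPLE = f"""<xliff version="1.2" xmlns="{NS}">
  <file original="demo.html" source-language="en" target-language="es" datatype="html">
    <body>
      <trans-unit id="1">
        <source><bx id="1"/>One. Two.<ex id="1"/></source>
        <seg-source><mrk mtype="seg" mid="1"><bx id="1"/>One.</mrk> <mrk mtype="seg" mid="2">Two.<ex id="1"/></mrk></seg-source>
        <target><mrk mtype="seg" mid="1"><bx id="1"/>Uno.</mrk> <mrk mtype="seg" mid="2">Dos.<ex id="1"/></mrk></target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def _append_before(parent, idx, text):
    """Append text at the position just before the child at index idx."""
    if not text:
        return
    if idx == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + text


def unwrap_seg_markers(parent):
    """Replace each direct <mrk mtype="seg"> child with its own content."""
    for mrk in list(parent):
        if mrk.tag != f"{{{NS}}}mrk" or mrk.get("mtype") != "seg":
            continue
        idx = list(parent).index(mrk)
        children = list(mrk)
        _append_before(parent, idx, mrk.text)
        parent.remove(mrk)
        for offset, child in enumerate(children):
            parent.insert(idx + offset, child)
        # Text that followed the marker now follows its former content.
        _append_before(parent, idx + len(children), mrk.tail)


def unsegment(xliff_text):
    """Drop <seg-source> and segment markers in <target>; return the new XML."""
    root = ET.fromstring(xliff_text)
    for unit in list(root.iter(f"{{{NS}}}trans-unit")):
        seg_source = unit.find(f"{{{NS}}}seg-source")
        if seg_source is not None:
            unit.remove(seg_source)
        target = unit.find(f"{{{NS}}}target")
        if target is not None:
            unwrap_seg_markers(target)
    return ET.tostring(root, encoding="unicode")


print(unsegment(SAMPLE))
```

In the result, the <target> of the sample contains the <bx/>, the text "Uno. Dos." and the <ex/> with no <mrk> wrappers, so the tag pair is again contained in a single unit of text.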

    Many thanks,

    Miguel
  • Hi Miguel,

     

    1. Segmentation

    There are two other segmentation “levels” that Studio applies, so even if you have a “big block” of text inside a <trans-unit> element, not all of it will be thrown in one segment:

    • TM Segmentation – The translation memory you use will also enforce segmentation. By default, segmentation occurs at sentence level (a new segment is created after a full stop, question mark, etc.) and at colons. However, this can be customized in the TM settings if needed.
    • Embedded Content – You can define non-translatable text strings (such as HTML content or other data) in the Embedded Content sub-menu of the XLIFF filter. These strings will be transformed into tags, which can also be set to segment the text when it is parsed.

     

    2. <mrk>

    Go to “File \ Options \ File Types \ XLIFF \ Settings” and enable the “Do not store segmentation information in the translated file” option. Pretty self-explanatory: enabling this option means that no <mrk> elements will be added by Studio in the translated file.

     

    Thanks

    Vlad Bondor | Senior Technical Support Manager | RWS
