How to automatically split segments after a full stop and BR with a custom XML file type

Hi!

I created a custom XML file type and have three issues with segmentation.

1. After a full stop: see segment 6. After the word офис there is a full stop and two breaks, and then a new sentence starts. This should be a new segment.

2. After a semicolon: see segment 6 again. After the word адрес there is a semicolon and two breaks. This should be a new segment.

3. Although I added the abbreviations to the translation memory settings, the text is still split after them: see segments 6 and 7. The words бул. and гр. are abbreviations.

Example:

Screenshot of Trados Studio showing incorrect segmentation in segment 6 where text does not split after a dot following the word 'ofis'.



Generated Image Alt-Text
[edited by: Trados AI at 8:55 PM (GMT 0) on 28 Feb 2024]
  • You could set the segmentation hint for br tags to "Exclude", which may work somewhat better (apologies for the Bulgarian... a sample file would have been useful ;-)):

    Screenshot of Trados Studio showing two columns with segmented text. The left column has text with multiple line breaks, while the right column shows the same text with fewer line breaks.

    I used a file like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <rootelement>
    <to>офис.<br class="xlf-newline" /><br class="xlf-newline" />отправить оплату за товар в офис по адрес:<br class="xlf-newline" /><br class="xlf-newline" />бул.</to>
    <to>6 септември, № 152, партер, офис - 8<br class="xlf-newline" /><br class="xlf-newline" />ап.</to>
    <to>Пловдивски 4000, България</to>
    </rootelement>

    Even this isn't really what you'd want in a perfect world, but then I don't know what your XML looks like. I can get to this:

    Screenshot of an XML file opened in Trados Studio with segmented text. The text includes an address with line breaks between segments.

    If the XML is like this I can use an exception to the full stop rule and achieve this:

    <?xml version="1.0" encoding="UTF-8"?>
    <rootelement>
    <to>офис.<br class="xlf-newline" /><br class="xlf-newline" />отправить оплату за товар в офис по адрес:<br class="xlf-newline" /><br class="xlf-newline" />бул.</to>
    <to>6 септември, № 152, партер, офис - 8<br class="xlf-newline" /><br class="xlf-newline" />ап. Пловдивски 4000, България</to>
    </rootelement>
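
To make the idea of an exception to the full stop rule concrete, here is a minimal Python sketch. It is only an illustration of the logic, not how Trados Studio implements segmentation: the abbreviation list, the function name and the choice to break after '.', ':' or ';' followed by whitespace are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation list for illustration only. In Trados Studio the
# abbreviations are kept in the translation memory's language resources.
ABBREVIATIONS = {"бул.", "гр.", "ап."}

def split_sentences(text):
    """Break after '.', ':' or ';' followed by whitespace, except when the
    word ending at that point is a known abbreviation."""
    segments = []
    start = 0
    for match in re.finditer(r"[.:;]\s+", text):
        candidate = text[start:match.end()].strip()
        last_token = candidate.split()[-1]
        if last_token.lower() in ABBREVIATIONS:
            continue  # exception: no break after an abbreviation
        segments.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

if __name__ == "__main__":
    for segment in split_sentences("офис - 8 ап. Пловдивски 4000, България. офис."):
        print(segment)
```

With the sample string above, the text stays together across "ап." and only breaks after "България.".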

    If none of this helps perhaps you can share a small sample like this so we can see what you're working with?

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub



  • Here is my XML file, which is an XLIFF file from WPML.

    job-43.zip

    I created a custom XML file type filter following the instructions in your YouTube video. It has no special settings; only note is set to not translatable:

    Screenshot of Trados Studio custom XML filetype filter settings showing various rules such as 'target', 'source', 'trans-unit' set to 'Translatable' and 'note' set to 'Not translatable'.

    So how do I set the segmentation hint for br tags to "Exclude"? Where should I do this: in the translation memory settings?

  • Another issue is that the title and href attributes of the a element are not translatable:

    Screenshot of Trados Studio showing non-translatable title and href attributes within a hyperlink element in the source code view.

    But I need to translate them. Do you have any suggestions for adjusting the filter so that I can translate the title and href of the link?

  • OK... I think you need to wind back a little first because I'm not sure you have grasped the task here. Your file type translates everything in the file apart from the note, which is clearly incorrect. An XLIFF is a bilingual file, so you should only be translating the target element, maybe something like this:

    Trados Studio screenshot showing a list of rules for translating file types with 'target' set as 'Always translatable' and 'note' as 'Not translatable'.

    Note that I disabled everything, added a rule to make sure nothing is translated, and set the target as always translatable.
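
As a rough illustration of what "only the target is translatable" means for a bilingual file, the Python sketch below pulls out just the target text and ignores source and note. The sample XLIFF is hypothetical (it is not the attached WPML file), the function names are made up for the example, and this is not how the Trados Studio file type works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XLIFF sample for illustration only.
SAMPLE_XLIFF = """<xliff version="1.2">
  <file>
    <body>
      <trans-unit id="1">
        <source><![CDATA[офис.<br class="xlf-newline" />бул.]]></source>
        <target><![CDATA[офис.<br class="xlf-newline" />бул.]]></target>
        <note>internal comment</note>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def translatable_texts(xliff_text):
    """Return the text of every target element; everything else is skipped."""
    root = ET.fromstring(xliff_text)
    return [el.text or "" for el in root.iter() if local_name(el.tag) == "target"]

if __name__ == "__main__":
    for text in translatable_texts(SAMPLE_XLIFF):
        print(text)
```

The CDATA content comes back as plain text with the br tags still embedded, which is why a second, HTML-aware step is needed to deal with them.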

    Next I set the embedded content processor to handle the CDATA sections using the HTML processor like this:

    Screenshot of Trados Studio's 'Embedded content' settings with 'Html Embedded Content 5.2.0.0' processor selected for 'CDATA sections'.

    This alone gives me this preview:

    Preview window in Trados Studio displaying a bilingual file with source and target text, showing neat alignment and break tags inline.

    This looks fairly neat and the only inline tags are the break tags.  If these tags should be excluded I can do this to get this:

    Trados Studio preview window with source and target text, break tags excluded resulting in a cleaner view without inline tags.

    Maybe better... you'll be a better judge of this than me.  But to do this you need to remember that these tags are being handled by the embedded content processor, in this case the HTML 5.2.0.0 processor.  So you need to go to this processor:

    Trados Studio options menu with a red arrow pointing to 'HTML 5.2.0.0' processor under 'File Types' settings.

    Select the br parser rule and edit it to get the behaviour you want... in this case, set the tag type to Structure so that the text breaks at every br tag:

    Edit Rule dialog in Trados Studio with 'Tag type' set to 'Structure' for the 'br' parser rule to handle break tags.

    So in this case I didn't bother setting the segmentation hint; I just set the rule to Structure, which achieves the same thing.
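
The effect of treating br as a structure tag can be pictured with a small Python sketch: the text is cut at every br tag and the tag itself no longer appears inside a segment. This is only an approximation for illustration; the regular expression and function name are assumptions, and the real work is done by the HTML embedded content processor in Trados Studio.

```python
import re

# Matches <br>, <br/> and <br class="xlf-newline" /> style tags.
BR_TAG = re.compile(r"<br\b[^>]*>", re.IGNORECASE)

def split_at_breaks(html_fragment):
    """Cut the fragment at every br tag and drop empty pieces."""
    parts = BR_TAG.split(html_fragment)
    return [part.strip() for part in parts if part.strip()]

if __name__ == "__main__":
    sample = ('6 септември, № 152, партер, офис - 8'
              '<br class="xlf-newline" /><br class="xlf-newline" />ап.')
    for piece in split_at_breaks(sample):
        print(piece)
```

Two consecutive br tags produce an empty piece between them, which the sketch discards so that each remaining piece corresponds to one block of text.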

    See if this helps you?

    Paul Filkin | RWS Group


  • Thank you, Paul. This works as expected!
    I also managed to edit the parser so that I am able to translate the URLs of the links :)

    I marked href for translation and prepared the file again.

    Screenshot of Trados Studio Edit Rule window showing empty Name and Condition fields with Attributes section displaying Name as 'href', Translate as 'True', and Target as 'False'.
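
For anyone wondering what marking an attribute as translatable amounts to, here is a small Python sketch that collects the href and title values of a elements from an HTML fragment. It is an illustration only: the attribute table, class name and sample link are assumptions for the example, not the parser rule shown in the screenshot.

```python
from html.parser import HTMLParser

# Hypothetical table of attributes to expose for translation, per element.
TRANSLATABLE_ATTRS = {"a": {"href", "title"}}

class AttributeCollector(HTMLParser):
    """Collects (tag, attribute, value) for attributes marked translatable."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        wanted = TRANSLATABLE_ATTRS.get(tag, set())
        for name, value in attrs:
            if name in wanted and value:
                self.found.append((tag, name, value))

def translatable_attributes(html_fragment):
    collector = AttributeCollector()
    collector.feed(html_fragment)
    collector.close()
    return collector.found

if __name__ == "__main__":
    link = '<a href="https://example.com/bg/" title="офис">офис</a>'
    for item in translatable_attributes(link):
        print(item)
```

Attributes that are not in the table, such as class, are left alone, which mirrors the idea of switching translation on per attribute.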
