Improve segmentation with embedded content tags in bilingual xlsx

Hi everybody,

I have created a project with a bilingual Excel file.

Despite setting all tags as "exclude" in the advanced view in the Filetype settings, I get this:

Screenshot of Trados Studio showing code with tags such as 'endif', 'assign gift card line items', and 'for line item in gift card line items'. Text in French is visible at the bottom.

Is there a way to set the file so that I get a new segment at each soft line break?

Thanks



  •  

    Is there a way to set the file so that I get a new segment at each soft line break?

    Yes.  Use a regex rule in your embedded content settings like this:

    \n

    Make sure it's non-translatable and set to exclude.  But note you will have a problem related to this:

    Despite setting all tags as "exclude" in the advanced view in the Filetype settings, I get this:

    The Bilingual Excel filetype will not segment within a cell at all when there is content in both the source and the target. Unfortunately this is a deliberate restriction: we can segment the source, but not the target, because the target content could be different and we have no idea how it should be segmented.

    So whilst I can do this:

    A screenshot of the Bilingual Excel Filetype preview in Trados Studio. The window shows a spreadsheet with five rows, each containing one line from a poem. The lines are correctly segmented, indicating that the text has been properly processed for translation or editing.

    I cannot do this:

    A screenshot from Trados Studio's preview of a bilingual Excel file, titled 'segment - Copy.xlsx', showing an issue with segmentation. The preview displays a single cell containing an English poem and its French translation, but the text is not properly segmented. The lines of the poem and their corresponding translations are placed continuously in one cell, with visible segmentation markers (purple squares) inserted at incorrect positions.
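To illustrate the restriction described above outside of Studio, here is a minimal Python sketch. It is not Trados code: the cell contents and the helper name `split_on_soft_breaks` are made up for the example, and `\n` is assumed to stand for the soft line break inside a single cell. It only shows that splitting source and target independently on `\n` can give a different number of pieces, so the pieces cannot be paired up reliably.

```python
import re

# Hypothetical cell contents; "\n" stands for a soft line break inside one cell.
source_cell = "Roses are red\nViolets are blue\nSugar is sweet"
# The translator merged the first two lines, so the target has fewer breaks.
target_cell = "Les roses sont rouges, les violettes sont bleues\nLe sucre est doux"

SOFT_BREAK = re.compile(r"\n")


def split_on_soft_breaks(cell_text):
    """Split a cell on soft line breaks and drop empty pieces."""
    return [part for part in SOFT_BREAK.split(cell_text) if part]


source_segments = split_on_soft_breaks(source_cell)
target_segments = split_on_soft_breaks(target_cell)

# If the counts differ, there is no safe way to pair source and target pieces.
segments_align = len(source_segments) == len(target_segments)
print(len(source_segments), len(target_segments), segments_align)
```

With a source-only cell the split is unambiguous, which matches the first screenshot; with both columns filled, the mismatch is the situation the filetype avoids by not segmenting at all.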

    Your best bet is to use the Multilingual Excel filetype from the appstore and use that to segment on your tags.  That should work... but segmenting on the soft return won't work with the Multilingual Excel.

    It would be good to have the same ability to segment on a soft return in the Multilingual Excel... maybe a future enhancement.


    Paul Filkin | RWS Group


  • Hi Paul,

    I tried using the Multilingual filetype (I wonder whether this should be reposted as a new request), and I'm afraid I have spotted a bug.

    I would really love to use the HTML embedded content processor, since the file is made up mainly of HTML strings, but I need to apply it to all paragraphs and not to CDATA sections only. If I try to do so, Studio fails to convert the file to translatable format.

    Here are the settings:

    Trados Studio project settings showing Multilingual XML file type with Embedded Content processor set to 'HTML Embedded Content 5.20.0.0' and processing option selected for 'All paragraphs'.

    Here is the error (at "Convert to translatable format" stage):

    Error message in Trados Studio stating 'Unable to open the input file for translation. Invalid syntax found at line: 1, column: 11527.' with options for Knowledge Base and Community.

    /cfs-file/__key/communityserver-discussions-components-files/90/sdlerror_2D00_202412_2D00_12h51m10s.sdlerror.xml

    If I apply embedded content processing to CDATA sections only, it works, yet the result is not good:

    HTML code snippet with placeholders like 'email_title' and 'shop.email_accent_color' indicating areas where translation might be applied.

    The placeholders are correctly located, but the HTML tags aren't.

    I assume I could manually insert all the HTML tags as regex rules in the Plain text embedded content processor (apparently it works if applied to all paragraphs), but I would really love not to!
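As a rough idea of what the manual regex approach mentioned above could look like, here is a small Python sketch that tries a single generic pattern for HTML-style tags instead of one rule per tag. This is only a way to test a pattern outside Studio: the pattern, the sample string and the helper `split_tags_and_text` are assumptions for illustration, and Studio's own regex engine and rule settings may behave differently.

```python
import re

# Assumed generic pattern: "<" or "</", a letter, then anything up to the next ">".
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")


def split_tags_and_text(text):
    """Return (kind, value) pairs, where kind is 'tag' or 'text'."""
    parts = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        if match.start() > pos:
            parts.append(("text", text[pos:match.start()]))
        parts.append(("tag", match.group()))
        pos = match.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts


# Hypothetical HTML string, similar in spirit to an e-mail template cell.
sample = '<p style="color: red">Thank you for your <b>order</b>!</p>'
parts = split_tags_and_text(sample)
for kind, value in parts:
    print(kind, repr(value))
```

If a pattern like this behaves as expected on real strings from the file, it might reduce the number of rules that need to be entered by hand; it does not handle comments or `>` characters inside attribute values.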

  • Hello,

    Based on your query, it seems you want to improve segmentation in Trados Studio when dealing with embedded content tags in a bilingual Excel file. Here's how you can achieve this:

    Step 1: Open your translation memory in Trados Studio.

    Step 2: In the Translation Memories View, select your translation memory and navigate to File > Settings.

    Step 3: In the Translation Memory Settings window, select Language Resources.

    Step 4: In the right pane, select Segmentation Rules and click on the Edit button.

    Step 5: In the Segmentation Rules window, select the option Sentence based segmentation and click Add.

    Step 6: In the Add Segmentation Rule window, click on the Advanced View button.

    Step 7: Give your rule a name, for example, "Soft Line Break" and add the following:

      - Before break: .[\n]+

      - After break: .

    Step 8: Click OK to save your rule and close the Translation Memory Settings window.

    After following these steps, when you re-open your document with the modified translation memory, the cell should be distributed over multiple lines, taking the soft line breaks as a segment delimiter. This should improve your segmentation and make your translation process smoother.
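For anyone who wants to check what the "Before break" / "After break" pair in Step 7 would match, here is a minimal Python sketch that simulates it on a plain string. It is an approximation only: the function name `segment` and the sample text are hypothetical, Python's `re` module is used in place of Studio's regex engine, and it says nothing about whether Studio applies such a rule inside a Bilingual Excel cell (see the earlier reply in this thread about that restriction).

```python
import re

# "Before break" is .[\n]+ and "After break" is . (from Step 7).
# A break is placed after the run of line breaks, provided some character follows.
BREAK_RULE = re.compile(r".[\n]+(?=.)")


def segment(text):
    """Split text at the positions where the simulated rule would break."""
    segments = []
    start = 0
    for match in BREAK_RULE.finditer(text):
        segments.append(text[start:match.end()].rstrip("\n"))
        start = match.end()
    segments.append(text[start:].rstrip("\n"))
    return [s for s in segments if s]


# Hypothetical cell content with single and double soft line breaks.
cell = "Line one\nLine two\n\nLine three"
result = segment(cell)
print(result)
```

In this simulation each run of line breaks ends one segment and starts the next, which is the behaviour the steps above are aiming for.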

    I hope this helps! If you have any other questions, feel free to ask.

    Best regards,

    TradosAI
