Segmentation rules

Hi Community!

I'm trying to prepare a file for translation and I don't know how to segment it. It contains product descriptions that, if segmented properly, are quite repetitive. I'm using Studio 2021 Pro and the file type is .xlsx (Microsoft Excel 2007-2019, SpreadsheetML v. 1).

The file is big, but I selected these three cells as an example:

Screenshot of an Excel spreadsheet cell containing product description with HTML tags for material, design, and features in Swedish.

I created a new empty TM using default settings (I didn't change segmentation rules) and in Project Settings I ticked "Enable embedded content processing" under File types - Microsoft Excel 2007-2019 - Embedded content. I didn't make any other changes.

And here are the same cells in Studio:

Screenshot of Trados Studio interface showing the same product description from Excel with segments separated by tags, highlighted in purple.

Is it possible to split these segments between the tags, so that each segment has its own opening and closing tag? For example, could I have this as one segment:

Close-up screenshot of a segment in Trados Studio with HTML tags highlighted in purple, indicating the need for segmentation.

Alternatively, is it possible to segment the file so that the segments contain only the text and not the tags? Like this:

Screenshot of a segment in Trados Studio without HTML tags, showing only the text 'Nätventilation på sidorna och under armarna' in Swedish.

The target file needs to contain the same tags/code that are in the source file, so I can't just remove them from the Excel file.

Another option that I could think of is to untick "Embedded content processing" and try to write a regex to separate the code from the text. I don't know if that is feasible nor how to write that regex, though.
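
For what it's worth, the regex idea can be tried outside Studio first. The sketch below is plain Python, not a Studio setting. It assumes the cells hold simple HTML-like tags with no `>` inside attribute values, and the sample cell is made up for illustration (the real tags are only visible in the screenshots).

```python
import re

# Hypothetical cell content; the real tags in the file may differ.
cell = (
    "<ul><li>Material: 94% Nylon, 6% Spandex.</li>"
    "<li>Nätventilation på sidorna och under armarna</li></ul>"
)

# The capturing group makes re.split keep the tags in its result.
TAG = re.compile(r"(<[^>]+>)")


def split_tags_and_text(value):
    """Return (kind, chunk) pairs in source order, skipping empty chunks."""
    parts = []
    for chunk in TAG.split(value):
        if chunk == "":
            continue
        kind = "tag" if TAG.fullmatch(chunk) else "text"
        parts.append((kind, chunk))
    return parts


parts = split_tags_and_text(cell)

# Only the text chunks would need translating; the tags stay untouched.
text_segments = [chunk for kind, chunk in parts if kind == "text" and chunk.strip()]

for kind, chunk in parts:
    print(f"{kind:4} {chunk}")
```

Joining all chunks back together reproduces the original cell, which is the property the target file needs: the tags are set aside, not removed.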

I'm not sure whether I'm thinking along the right lines, so any suggestion about how to prepare this file is more than welcome. :)

Thanks!

Milena



[edited by: Trados AI at 12:10 AM (GMT 0) on 29 Feb 2024]
  • Another option that I could think of is to untick "Embedded content processing" and try to write a regex to separate the code from the text. I don't know if that is feasible nor how to write that regex, though.

    I think you're on the right lines... although you don't want to untick it.  You probably need to remove the default rule and then create your own as you thought.  But try this first and see if you get lucky:

    Screenshot of Trados Studio showing the 'Tag Verifier' window with 'Edit Rule' dialog open. Arrows point to 'Advanced Settings' and 'Segmentation Hint' set to 'Exclude'. A 'Save' button is highlighted.

    1. edit your default rule
    2. click on advanced
    3. set the segmentation hint to exclude
    4. finally save it and then create your project again with this updated rule

    You may get lucky, all depending on your actual tags.  But if not, then you'll just have to remove the default rule and create your own, excluding in this way as needed.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi,

    Thank you very much for your help! It worked perfectly!

    This is what the segments looked like after following the four steps you mentioned:

    Screenshot of Trados Studio showing a list of segments with materials and percentages, such as 'Material: 70% Polyester, 30% Spandex.'

    Next, I tried to split the segments that contain materials, such as 1, 9 or 15 in the screenshot above, with the intention of having each material name in a separate segment. For example, based on segment 15, I would like to see this:

    ______________

    Seg. 1 > Material:

    Seg. 2 > 94%

    Seg. 3 > Nylon

    Seg. 4 > 6%

    Seg. 5 > Spandex

    _____________

    My idea is to have such material names, like Nylon, Spandex, etc. as repetitions in the whole file, but since they are followed by different characters (full stop, comma, space) I don't know how to define the segmentation rule.

    I managed to get to this point. It's not possible to see this from the screenshot, but in the big file I now have 3x each material i.e. 3 unique occurrences for each material, e.g. "Spandex" "Spandex," "Spandex.".

    Screenshot of Trados Studio with segments split to show material names and percentages on separate lines, like '94% Nylon, 6% Spandex.'

    These are the segmentation rules I added:

    Screenshot of Trados Studio's segmentation rules window with a rule named 'including numbers' and break characters defined.

    Screenshot of Trados Studio's segmentation rules window with a rule named 'Colon' and break characters including colon and space.

    Screenshot of Trados Studio's segmentation rules window with a rule named 'percentage_space_colon' and break characters including percentage and space.

    Do you have any suggestions for how this can be improved?
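
As a basis for discussion, the intended breaks can be pictured outside Studio. The sketch below is a Python simulation, not Studio's segmentation engine: it assumes a rule is a pair of "before break" and "after break" patterns, and that a break is placed wherever the first matches the text ending at a position and the second matches the text starting there. The three rules are hypothetical and untested in Studio; they only show one way to keep "Nylon" and "Spandex" free of the comma or full stop that follows them.

```python
import re


def segment(text, rules):
    """Split text at every position where some rule's 'before' pattern
    matches the text ending there and its 'after' pattern matches the
    text starting there."""
    breaks = []
    for pos in range(1, len(text)):
        for before, after in rules:
            ends_here = re.search(f"(?:{before})\\Z", text[:pos])
            starts_here = re.match(after, text[pos:])
            if ends_here and starts_here:
                breaks.append(pos)
                break
    bounds = [0] + breaks + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical rules, written as (before break, after break) pairs.
RULES = [
    (r"[:%]\s", r"\S"),      # break after "Material: " and after "94% "
    (r"[^\W\d_]", r"[.,]"),  # break between a material name and its punctuation
    (r",\s", r"\S"),         # break after ", " so the next item starts fresh
]

text = "Material: 94% Nylon, 6% Spandex."
segments = segment(text, RULES)

for number, seg in enumerate(segments, start=1):
    print(number, repr(seg))
```

With these rules the punctuation lands in segments of its own (", " and "."), so the material names come out identical wherever they appear. Whether Studio shows, hides or merges such punctuation-only segments is not something this sketch can answer.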

    Thank you!

    Milena

    [edited by: Trados AI at 12:11 AM (GMT 0) on 29 Feb 2024]
  • It's not possible to see this from the screenshot, but in the big file I now have 3x each material i.e. 3 unique occurrences for each material, e.g. "Spandex" "Spandex," "Spandex.".

    I don't understand what you mean here.  Can you show the screenshot with this in it and explain what you get compared to what you actually need?

    Paul Filkin | RWS Group


  • Sorry if I wasn't clear enough. The example cells in my first message come from a big file with a lot of words, which takes Studio a long time to process. That is why I took just three cells, in which some of the materials appear, as a sample. After following the steps you mentioned and adding the segmentation rules above, I got to the point where these materials are in separate segments (seg. 11-15 and 21-25). This is the same screenshot as above, from the small sample:

    Screenshot of Trados Studio showing text segments with material names, including 'Spandex' in segments 11-15 and 21-25.

    Then I took the big file with all the content, processed it with the same rules, and got the expected results. In the Advanced Display Filter I filtered for Unique Occurrences under Filter Attributes and for spandex in Source under Content, because I wanted to see the unique occurrences that contain this material name. (There are lots of different materials in the file; spandex is just an example.) This is what I have now:

    Screenshot of Trados Studio's Advanced Display Filter showing unique occurrences of the word 'Spandex' in segments 12, 137, 249, and 400.

    What I want is a single unique occurrence, like segment 249 above: no full stop or comma attached, and with segments such as seg. 400 broken up further.
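
To make the goal concrete, here is a small Python check on made-up data (the segment list is hypothetical, not taken from the file). It counts distinct segments before and after trailing full stops, commas and spaces are split off, which is the effect the segmentation rules would need to have.

```python
import re
from collections import Counter

# Hypothetical segments as they might appear across the large file.
segments = ["Spandex", "Spandex,", "Spandex.", "Nylon,", "Nylon", "Spandex,"]


def strip_trailing_punct(segment):
    """Drop any run of full stops, commas or whitespace at the end."""
    return re.sub(r"[.,\s]+$", "", segment)


unique_before = Counter(segments)
unique_after = Counter(strip_trailing_punct(s) for s in segments)

print(len(unique_before), "distinct segments before:", sorted(unique_before))
print(len(unique_after), "distinct segments after:", sorted(unique_after))
```

In this sample the three variants of "Spandex" collapse into one entry, so every later occurrence would count as a repetition.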

    Hope it makes more sense now. :)
