Segmentation rules

Milena Martinez de Vargas over 5 years ago

Hi Community!

I'm trying to prepare a file for translation and I don't know how to segment it. It contains product descriptions that, if segmented properly, are quite repetitive. I'm using Studio 2021 Pro, and the file type is .xlsx (Microsoft Excel 2007-2019, SpreadsheetML v. 1).

The file is big, but I selected these three cells as an example:

Screenshot of an Excel spreadsheet cell containing product description with HTML tags for material, design, and features in Swedish.

I created a new empty TM using default settings (I didn't change segmentation rules) and in Project Settings I ticked "Enable embedded content processing" under File types - Microsoft Excel 2007-2019 - Embedded content. I didn't make any other changes.

And here are the same cells in Studio:

Screenshot of Trados Studio interface showing the same product description from Excel with segments separated by tags, highlighted in purple.

Is it possible to split these segments between the tags, so that each segment has its own opening and closing tag? For example, if I can have this as one segment:

Close-up screenshot of a segment in Trados Studio with HTML tags highlighted in purple, indicating the need for segmentation.

Or is it also possible to segment the file in a way that the segments only contain text and not the tags? Like this:

Screenshot of a segment in Trados Studio without HTML tags, showing only the text 'Natventilation pa sidorna och under armarna' in Swedish.

The target file needs to contain the same tags/code that are in the source file, so I can't just remove them from the Excel file.

Another option that I could think of is to untick "Embedded content processing" and try to write a regex to separate the code from the text. I don't know if that is feasible nor how to write that regex, though.

I'm not sure whether I'm thinking in the right direction(s), so any suggestion about how to prepare this file is more than welcome. :)

Thanks!

Milena

Generated Image Alt-Text
[edited by: Trados AI at 12:10 AM (GMT 0) on 29 Feb 2024]


Paul Filkin over 5 years ago

Milena Martinez de Vargas said:
Another option that I could think of is to untick "Embedded content processing" and try to write a regex to separate the code from the text. I don't know if that is feasible nor how to write that regex, though.

I think you're on the right lines... although you don't want to untick it. You probably need to remove the default rule and then create your own as you thought. But try this first and see if you get lucky:

1. Edit your default rule.

2. Click on Advanced.

3. Set the segmentation hint to Exclude.

4. Finally, save it and then create your project again with this updated rule.

You may get lucky, all depending on your actual tags. But if not, then you'll just have to remove the default rule and create your own, excluding in this way as needed.
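To make the intended outcome of excluding the tags concrete, here is a minimal Python sketch. It is not how Studio processes embedded content internally; it only illustrates the desired result, where each run of text between tags becomes its own segment while the tags stay in place for the target file. The sample cell, the `TAG` pattern and the function names are hypothetical, since the real markup in the Excel file is only visible in the screenshots.

```python
import re

# Hypothetical pattern for simple HTML-like inline tags; the real file may need more.
TAG = re.compile(r"<[^>]+>")

def split_on_tags(cell):
    """Split a cell into ("tag", ...) and ("text", ...) tokens, in document order."""
    tokens = []
    pos = 0
    for match in TAG.finditer(cell):
        text = cell[pos:match.start()].strip()
        if text:
            tokens.append(("text", text))
        tokens.append(("tag", match.group()))
        pos = match.end()
    tail = cell[pos:].strip()
    if tail:
        tokens.append(("text", tail))
    return tokens

def segments(cell):
    """Return only the translatable text runs, one per segment."""
    return [value for kind, value in split_on_tags(cell) if kind == "text"]

# Hypothetical sample cell, loosely modelled on the screenshots described above.
sample = (
    "<ul><li>Material: 94% Nylon 6% Spandex</li>"
    "<li>Natventilation pa sidorna och under armarna</li></ul>"
)
print(segments(sample))
```

Because the tag tokens are kept alongside the text tokens, the original cell can be rebuilt after translation, which matches the requirement that the target file keeps the same tags as the source.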

Paul Filkin | RWS

Design your own training!
You've done the courses and still need to go a little further, or still not clear?
Tell us what you need in our Community Solutions Hub
Milena Martinez de Vargas over 5 years ago in reply to Paul Filkin

Hi Paul,

Thank you very much for your help! It worked perfectly!

This is how the segments looked after following the four steps you mentioned:

Next, I tried to split the segments that list materials, such as 1, 9 or 15 in the screenshot above, so that each material name sits in its own segment. For example, based on segment 15, I would like to see this:

______________

Seg. 1 > Material:

Seg. 2 > 94%

Seg. 3 > Nylon

Seg. 4 > 6%

Seg. 5 > Spandex

_____________

My idea is to have such material names, like Nylon, Spandex, etc. as repetitions in the whole file, but since they are followed by different characters (full stop, comma, space) I don't know how to define the segmentation rule.

I managed to get to this point. It's not possible to see this from the screenshot, but in the big file I now have 3x each material i.e. 3 unique occurrences for each material, e.g. "Spandex" "Spandex," "Spandex.".
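The effect described here can be reproduced with a small Python sketch. The segment list is made-up illustrative data, not taken from the real file: it shows how trailing punctuation turns one material name into several unique occurrences, and how the count collapses once the full stop and comma are kept out of the segment.

```python
from collections import Counter

# Hypothetical segments as they might come out of the current rules.
raw_segments = ["Spandex", "Spandex,", "Spandex.", "Nylon", "Nylon,", "Spandex"]

# Each distinct string counts as its own unique occurrence.
unique_raw = Counter(raw_segments)

# Dropping a trailing full stop or comma leaves one form per material.
unique_clean = Counter(seg.rstrip(".,") for seg in raw_segments)

print(sorted(unique_raw))    # the punctuated variants are all distinct
print(sorted(unique_clean))  # only the bare material names remain
```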

These are the segmentation rules I added:

Do you have any suggestion how this can be improved?

Thank you!

Milena

Paul Filkin over 5 years ago in reply to Milena Martinez de Vargas

Milena Martinez de Vargas said:
It's not possible to see this from the screenshot, but in the big file I now have 3x each material i.e. 3 unique occurrences for each material, e.g. "Spandex" "Spandex," "Spandex.".

I don't understand what you mean here. Can you show the screenshot with this in it and explain what you get compared to what you actually need?

Paul Filkin | RWS


Milena Martinez de Vargas over 5 years ago in reply to Paul Filkin

Sorry if I wasn't clear enough. The example cells in my first message come from a big file with a lot of text that takes Studio a while to process, which is why I took only three cells, in which some of the materials appear, as a sample. After following the steps you mentioned and adding the segmentation rules above, I got to the point where these materials are in separate segments (seg. 11-15 and 21-25). This is the same screenshot as above, from the small sample I selected:

Then I took the big file with all the content, processed it with the same rules and got the same expected results. In the Advanced Display Filter I filtered for Unique Occurrences under Filter Attributes and for spandex in Source under Content, because I wanted to see the unique occurrences that contain this material name. (There are lots of different materials in the file; I just used spandex as an example.) So, now I have this:

And I want to have only one unique occurrence, like segment 249 above: without the full stop or comma, and with segments such as seg. 400 broken up.
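One way to reason about rules that would achieve this is to model each rule as a pair of patterns: one that must match the text just before a break and one that must match the text just after it. The Python sketch below emulates that model. It is an assumption about how such rules behave, not Studio's actual engine, and the `RULES` list, function names and sample strings are hypothetical. Patterns written this way would still need to be adapted and tested in Studio's own rule editor.

```python
import re

# Hypothetical (before_break, after_break) pattern pairs.
RULES = [
    (r"[:%]\s+", r"\S"),           # break after "Material: " or "94% "
    (r"\w", r"[.,](?:\s|\Z)"),     # break before a trailing comma or full stop
    (r"[.,]\s+", r"\S"),           # break after ", " or ". "
    (r"\w\s+", r"\d"),             # break between a material and the next number
]

def break_positions(text, rules=RULES):
    """Collect every index where some rule matches on both sides."""
    positions = set()
    for i in range(1, len(text)):
        for before, after in rules:
            if re.search(f"(?:{before})\\Z", text[:i]) and re.match(after, text[i:]):
                positions.add(i)
                break
    return sorted(positions)

def segment(text, rules=RULES):
    """Cut at the break positions, then drop whitespace and punctuation-only pieces."""
    cuts = [0] + break_positions(text, rules) + [len(text)]
    pieces = (text[a:b].strip() for a, b in zip(cuts, cuts[1:]))
    return [p for p in pieces if p and p not in {".", ","}]

print(segment("Material: 94% Nylon, 6% Spandex."))
print(segment("94% Nylon 6% Spandex"))
```

With these rules, "Spandex", "Spandex," and "Spandex." all end up as the same bare segment, so each material name would count as a single unique occurrence. A decimal figure such as "94.5%" is left intact, because the punctuation rule only fires when the full stop or comma is followed by whitespace or the end of the text.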

Hope it makes more sense now.
