Fixing Segmentation in TM

Hello team,

I have a TM full of very large segments, and I would like to split them (fix the segmentation) so that I can get fuzzy matches against the new version of the files.

The files are XLIFF files exported from WPML, which I received from the client.

We created the project in SDL Studio, which produced very large segments even though the text contained punctuation. We have since fixed the problem and the segmentation is now correct, but the TM still contains the translations of the large segments.

It would be great if you could help us as soon as possible.

Best Regards

Fouad

Reply
  • There's no way to perform this kind of operation directly in Studio, so I'm afraid it's not as simple as applying a file type to your TM (though that would be a neat feature for future versions of Studio, because I have a similar cleanup job coming up myself ;)).

    Assuming you export the TM to TMX and use Olifant (the Okapi TMX editor) to split the segments, and you just want to apply basic segmentation rules, it could be as simple as splitting at periods or semicolons.

    So you'd open the exported TMX in Olifant and use the following search/replace patterns:

    Find: ". "

    Replace with: ". [$SPLIT$]"

    When you have done that, select all segments and use the split command. Then import the clean TMX into a new SDLTM.

    If you need to, say, split after all tags in a TM, you'd have to use regex:

    Find: \<[^>]+\>

    Replace with: $& [$SPLIT$]

    Make sure that "use regular expressions" is checked when you do this.

    There are some helpful examples of how to use regex in Olifant here: http://okapi.sourceforge.net/Release/Shared/Help/regex.htm
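
If you want to see what those two find/replace passes do before touching the real TM, here is a rough Python sketch of the same idea. It is only an illustration on plain strings, not how Olifant works internally: the function names are made up for this example, and Python writes the whole-match reference as \g<0> where the Olifant replace pattern uses $&. The tag pattern uses [^>]+ so that each tag is matched on its own rather than everything from the first "<" to the last ">".

```python
import re

SPLIT = "[$SPLIT$]"  # marker text copied from the replace patterns in the reply


def mark_sentences(text):
    # Plain find/replace: ". " becomes ". [$SPLIT$]"
    return text.replace(". ", ". " + SPLIT)


def mark_tags(text):
    # Regex find/replace: append " [$SPLIT$]" after every tag.
    # \g<0> is Python's way of writing "the whole match".
    return re.sub(r"<[^>]+>", r"\g<0> " + SPLIT, text)


def split_at_markers(text):
    # Stand-in for the split command: cut at each marker, drop empty pieces.
    return [part.strip() for part in text.split(SPLIT) if part.strip()]


def split_pair(source, target, mark):
    # Split one source/target pair with the same marking rule.
    # Returns None when the two sides give a different number of pieces,
    # since such a pair cannot be re-aligned automatically.
    src = split_at_markers(mark(source))
    tgt = split_at_markers(mark(target))
    if len(src) != len(tgt):
        return None
    return list(zip(src, tgt))


if __name__ == "__main__":
    print(split_pair("Open the file. Save it.",
                     "Ouvrez le fichier. Enregistrez-le.",
                     mark_sentences))
    print(split_at_markers(mark_tags("One<br/>Two<p>Three")))
```

The count check in split_pair is the part worth keeping in mind for the real job: a split only gives usable units when source and target break into the same number of pieces, so any pairs where they differ are worth reviewing by hand.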
