Fixing Segmentation in TM

Hello team,

 

I have a TM that contains very large segments, and I would like to split them (fix the segmentation) so that I can get fuzzy matches against the new version of the files.

The files are XLIFF files exported from WPML, which I received from the client.

We created the project in SDL Studio, which produced very large segments (although the text was punctuated). We have since fixed the issues and the segmentation is now perfect, but the TM still contains the translations of the large segments.

 

It would be great if you can help us ASAP.

 

Best Regards

Fouad

  • Oh dear... One problem might have been that the regex I gave you didn't work. I'm sorry. :(

    I just sent you a new copy of the file. This is what I did:

    I first searched for <.*?> and replaced it with $&[$SPLIT$] to add split markers after all the tags. For some reason, though, Olifant won't split next to tags. However, the tags don't look like they belong in translatable segments at all, so I stripped the tags from the file completely and performed the split afterwards.

    Then I simply split after periods as mentioned above, and removed any stray split markers that were left because of a mismatch of periods in the source and target segments.

    I didn't touch the colons though, as different punctuation seemed to be used in EN and IT (e.g. dashes in EN were replaced with colons in IT), and I was worried I might break more than I fix.

    I did a couple of spot checks, and overall the new file looks OK. Still, what I did was purely mechanical, so it might be advisable to apply a penalty to the TM in case some segments are mismatched.

    Let me know if you get better results with the new TM.

    Ta

    Stephan
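
For anyone who wants to try the same idea outside Olifant, the steps described in the reply above (strip the tags, split after periods, be careful where source and target don't line up) can be roughly sketched in Python. This is only a sketch under assumptions, not what was actually run in Olifant: the function names `strip_tags` and `split_pair` are made up for illustration, the TM is assumed to be available as plain source/target string pairs, and a unit whose source and target have a different number of sentences is left unsplit rather than cleaned up by hand.

```python
import re

TAG = re.compile(r"<.*?>")                # same non-greedy tag pattern as in the reply
SENTENCE_END = re.compile(r"(?<=\.)\s+")  # whitespace that directly follows a period


def strip_tags(text):
    """Replace each tag with a space, then collapse runs of whitespace."""
    return re.sub(r"\s+", " ", TAG.sub(" ", text)).strip()


def split_pair(source, target):
    """Split one translation unit into sentence-level units.

    If source and target contain a different number of sentences, the
    tag-stripped pair is returned unsplit, because the pieces cannot be
    aligned mechanically.
    """
    src = strip_tags(source)
    tgt = strip_tags(target)
    src_parts = SENTENCE_END.split(src)
    tgt_parts = SENTENCE_END.split(tgt)
    if len(src_parts) != len(tgt_parts):
        return [(src, tgt)]
    return list(zip(src_parts, tgt_parts))


# Hypothetical EN/IT unit, for illustration only.
pairs = split_pair("<p>Hello world. Goodbye.</p>",
                   "<p>Ciao mondo. Arrivederci.</p>")
```

Splitting after every period will also break after abbreviations such as "e.g.", so the result is just as mechanical as the Olifant approach, and a TM penalty is still a sensible precaution.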
