Customizing segmentation of a PO file

Hello Community!

I am working on a PO file where some msgid fields contain very long runs of sentences and paragraphs. So far, Studio recognizes separate sentences within certain tags and segments the file accordingly, but I'd like to refine this further to force a segment break when certain codes are present (for example, </p> or \n).

Here's a simplified example of a segment in the PO file (bearing in mind that some msgid content runs to several pages of text):

<p class="MsoNormal"><strong><span style="font-size: 11.0pt; font-family: 'Lucida Grande'; mso-bidi-font-family: 'Lucida Grande'; color: #123456;">TITLE TO BE TRANSLATED</span></strong></p>
<p class="MsoNormal"><span style="font-size: 11.0pt; font-family: 'Lucida Grande'; mso-bidi-font-family: 'Lucida Grande'; color: #123456;">SENTENCE 1 TO BE TRANSLATED. SENTENCE 2 TO BE TRANSLATED.</span></p>

Studio's sentence-based segmentation splits this into 2 segments (creating placeholder tags for each run of code within <>), with the split occurring between sentence 1 and sentence 2 (because of the period?).

I'd like to add a rule of some sort that recognizes </p> and/or \n as segmentation markers. Is this possible? I've checked out this post (https://community.sdl.com/product-groups/translationproductivity/f/studio/25104/po-portable-object-file-segmentation-on-a-sentence-basis?ReplySortBy=CreatedDate&ReplySortOrder=Ascending) and didn't really find an answer.
I feel like this one might be helpful, but I don't know how to adapt it to my PO file:
https://community.sdl.com/product-groups/translationproductivity/f/studio/12708/how-to-fix-very-large-segments-in-studio-in-xliff-projects-from-wordpress-wpml?ReplyFilter=Answers&ReplySortBy=Answers&ReplySortOrder=Descending
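For illustration only: Studio's segmentation rules pair a "Before break" pattern with an "After break" pattern, much like SRX rules do. The minimal Python sketch below mimics that logic (it is not Studio's engine): a break is allowed at a position when the text up to it ends with the before-break pattern and the text from it starts with the after-break pattern. The two rules (break after </p> when another <p> follows, and break after a literal backslash-n escape) and the helper names `find_breaks` and `segment` are assumptions made up for this example.

```python
import re

# Each rule is a (before_break, after_break) pair of regex patterns,
# modeled loosely on SRX-style rules. These patterns are illustrative
# assumptions, not Trados Studio defaults.
RULES = [
    (r"</p>", r"\s*<p\b"),  # break after </p> when another <p ...> follows
    (r"\\n", r"."),         # break after a literal backslash-n escape
]


def find_breaks(text, rules=RULES):
    """Return the sorted offsets where any rule allows a segment break."""
    breaks = set()
    for before, after in rules:
        # \Z anchors the before-break pattern to the candidate position.
        before_re = re.compile("(?:" + before + r")\Z")
        after_re = re.compile(after)
        for pos in range(1, len(text)):
            # Text up to pos must END with before_break; text from pos
            # must START with after_break.
            if before_re.search(text, 0, pos) and after_re.match(text, pos):
                breaks.add(pos)
    return sorted(breaks)


def segment(text, rules=RULES):
    """Split text at every break offset, dropping whitespace-only pieces."""
    cuts = [0] + find_breaks(text, rules) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:]) if text[a:b].strip()]


# A stripped-down version of the msgid from the question (styles removed).
sample = (
    '<p class="MsoNormal"><strong>TITLE TO BE TRANSLATED</strong></p>'
    '<p class="MsoNormal">SENTENCE 1 TO BE TRANSLATED. '
    "SENTENCE 2 TO BE TRANSLATED.</p>"
)

for seg in segment(sample):
    print(seg)
```

This prints the title paragraph and the body paragraph as two separate segments. In Studio, rules like these would presumably sit alongside the existing sentence rules, so the period would still split sentence 1 from sentence 2; the sketch only models the extra </p> and \n breaks.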

Would appreciate any help with this! 

Kindly, 

Marie

  • Hi Paul! Thanks a lot for all this!

    I've tried paragraph-based segmentation, and it actually made matters worse. Here's a screenshot of just one of the problematic segments. (What happens is that Studio treats each msgstr as a single segment.)

    [Screenshot: Trados Studio with paragraph-based segmentation; each entry appears as one long, hard-to-manage segment]

    Sentence-based segmentation works better, but I still end up with segments like this:

    [Screenshot: Trados Studio with sentence-based segmentation; segments are still long and hard to manage]

    Ideally, I'd like to keep using sentence-based segmentation, which already breaks the strings from the PO file into more manageable segments, and extend it so that \n, and even </p>, are treated as break characters in the segmentation rules, much as a period or a hard return is handled in regular text documents. Is this possible?

    I tried the following from one of your earlier posts: 

    Before break:

    .[\n]+

    After break:

    .

    But it didn't work.
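One possible explanation, offered as an assumption rather than a confirmed diagnosis: if Studio's PO filter passes the escape \n to the segmenter as two literal characters (a backslash followed by n), the pattern [\n] will never match it, because in regex syntax \n inside a character class means a real newline character. Studio's rules are .NET regular expressions, where \n has the same meaning as in the Python check below. If the filter instead turns \n into a placeholder tag, neither text pattern may apply. The pattern `.\\n` is a hypothetical alternative for comparison, not a tested Studio rule.

```python
import re

TRIED_BEFORE_BREAK = r".[\n]+"  # "Before break" pattern from the earlier post
LITERAL_BEFORE_BREAK = r".\\n"  # hypothetical: any char, then backslash + n

# Text as it might look if the PO escape is kept as two characters.
literal_escape = "SENTENCE 1 TO BE TRANSLATED.\\nSENTENCE 2 TO BE TRANSLATED."
# Text as it would look if the escape became a real line break.
real_newline = "SENTENCE 1 TO BE TRANSLATED.\nSENTENCE 2 TO BE TRANSLATED."

for label, text in [("literal \\n", literal_escape), ("real newline", real_newline)]:
    tried = re.search(TRIED_BEFORE_BREAK, text) is not None
    literal = re.search(LITERAL_BEFORE_BREAK, text) is not None
    print(f"{label:13} tried pattern: {tried!s:5}  literal pattern: {literal}")
```

With the literal escape, only the second pattern matches; with a real newline, only the original pattern does. Checking which form the segmenter actually sees would tell you which kind of rule to write.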

    There's gotta be a way to make this work. Am I wrong in thinking the answer lies in a clever regex segmentation rule?

    Marie

    Generated Image Alt-Text
    [edited by: Trados AI at 2:55 AM (GMT 0) on 29 Feb 2024]