Preserving line break/paragraph break information in sdlxliff files with merged/split segments

Hi,
I'm currently trialling Trados 2022. I have a very specific use case that will influence my purchase decision (including support): creating sdlxliff files with specific properties for editing in other CAT tools.
I created two sdlxliff files from the same docx file. The first version is untouched after import. The second had 3 segments merged into one in Studio 2022.
The two sdlxliffs look near-identical in Trados's editor. But a colleague works with OmegaT, and there we see a difference:
This is the first (pixellated):
Screenshot of Trados Studio editor showing a pixellated view of an sdlxliff file with visible line breaks and paragraph breaks in the segmentation.
And this is the second (so, the file where I merged segments in Trados):
Screenshot of Trados Studio editor displaying a pixellated view of a modified sdlxliff file where segments have been merged, showing all breaks as paragraph breaks.
As you can see, in the first image, both line breaks and paragraph breaks are visible in the segmentation. In the second, all are shown as paragraph breaks.
For sharing files with my colleague, it's important that line breaks and paragraph breaks are still distinguishable in their CAT tool - that's a vital part of their approach to translation. In Trados, is there a way of merging/splitting segments, while not losing whatever information in the sdlxliff files allows my colleague to distinguish line breaks from paragraph breaks?

I'm happy to share the sdlxliff files with Trados staff but cannot post them here publicly.



Edited for clarity
[edited by: Matthew Scown at 8:14 AM (GMT 1) on 22 Apr 2024]
  •  

    By default Trados will always group segments within a paragraph boundary. Line breaks are not grouped that way. So for example:

    That's two paragraphs. Each has three sentences, but the first separates the sentences within the paragraph with line breaks. In Studio you will get this:

    You can tell where the paragraphs are from the 'P' in the document structure column, which Studio can show because an SDLXLIFF contains this information. So any CAT tool that properly interprets an SDLXLIFF and has a way to display it will be able to see this too. The line breaks are less important structurally because Studio will put them back when the document is translated.

    However, these are also available in the SDLXLIFF if another tool correctly interprets and displays them.  If I show "All Content" to look between the segments I see this:

    Note I can see the line breaks between, so if it's important for me to know this and I don't want to use the preview, I can see them this way too.  However, in another CAT tool... very doubtful!!

    If I merge the segments in the first paragraph I would see this:

    The line breaks are retained in the source. So if OmegaT cannot see that, and if that's a deal breaker for you because your colleagues will only use OmegaT, then you'd better use it too!! We have no influence over their tool.
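
For anyone who wants to look at the content between segments outside Studio, here is a minimal Python sketch. It relies only on the standard XLIFF 1.2 layout, where segments are `<mrk mtype="seg">` markers inside `<seg-source>`. The `SAMPLE` string and the `between_segments` helper are hypothetical, hand-written illustrations, not real Trados output. A real SDLXLIFF may encode breaks as inline placeholder elements rather than literal characters, which this sketch would not report.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 core namespace; the "x" prefix is only used for these lookups.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical, simplified fragment: one paragraph, two segments,
# with a literal line break sitting between the two segment markers.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
<file original="sample.docx" source-language="en-GB" datatype="x-sample"><body>
<trans-unit id="p1"><source>One.
Two.</source><seg-source><mrk mtype="seg" mid="1">One.</mrk>
<mrk mtype="seg" mid="2">Two.</mrk></seg-source></trans-unit>
</body></file></xliff>"""


def between_segments(root):
    """Yield (trans-unit id, segment mid, text following that segment marker)."""
    for tu in root.iterfind(".//x:trans-unit", NS):
        seg_source = tu.find("x:seg-source", NS)
        if seg_source is None:
            continue  # unit without segmentation info
        for mrk in seg_source.iterfind(".//x:mrk[@mtype='seg']", NS):
            # .tail is the character data between this marker and the next node
            yield tu.get("id"), mrk.get("mid"), mrk.tail or ""


if __name__ == "__main__":
    for unit_id, mid, tail in between_segments(ET.fromstring(SAMPLE)):
        print(unit_id, mid, repr(tail))
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(SAMPLE)`. Running it on both the unmerged and the merged file would show whether the text between segment markers differs.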

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Thanks for the answer - the issue is that the paragraph and line breaks were distinct in OmegaT for the first sdlxliff.

    But the second sdlxliff, in which I'd merged some segments, was also much larger (200 KB vs 130 KB), and the merge did something to the file that hid that distinction from OmegaT. The puzzle is to work out what Trados did to the file - to hopefully prevent that happening.
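
One way to start on that puzzle is to compare which XML elements each file contains and how often. Below is a minimal Python sketch using only the standard library. The `UNMERGED` and `MERGED` strings are hypothetical, hand-written stand-ins for the two files, and `tag_counts` and `count_diff` are illustrative helper names, not part of any Trados or OmegaT tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(root):
    """Count how often each element tag occurs below (and including) root."""
    return Counter(el.tag for el in root.iter())


def count_diff(before_root, after_root):
    """Return {tag: (count_before, count_after)} for tags whose counts differ."""
    a, b = tag_counts(before_root), tag_counts(after_root)
    return {t: (a[t], b[t]) for t in sorted(set(a) | set(b)) if a[t] != b[t]}


# Hypothetical stand-ins: the same paragraph before and after a merge.
UNMERGED = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
<file original="sample.docx" source-language="en-GB" datatype="x-sample"><body>
<trans-unit id="1"><source>A. B.</source>
<seg-source><mrk mtype="seg" mid="1">A.</mrk> <mrk mtype="seg" mid="2">B.</mrk></seg-source>
</trans-unit></body></file></xliff>"""

MERGED = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
<file original="sample.docx" source-language="en-GB" datatype="x-sample"><body>
<trans-unit id="1"><source>A. B.</source>
<seg-source><mrk mtype="seg" mid="1">A. B.</mrk></seg-source>
</trans-unit></body></file></xliff>"""

if __name__ == "__main__":
    diff = count_diff(ET.fromstring(UNMERGED), ET.fromstring(MERGED))
    for tag, (n_before, n_after) in diff.items():
        print(f"{tag}: {n_before} -> {n_after}")
```

With the real files, `ET.parse("first.sdlxliff").getroot()` (file names assumed) would replace the `ET.fromstring(...)` calls. A tag that appears or multiplies only in the merged file would be a candidate for explaining both the size difference and what OmegaT displays.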

  •   

    Merging segments in Studio ultimately changes the structure of the XLIFF. The segmentation changes and, depending on whether the merge was over paragraph breaks or not, it may cause a huge difference in the target file. The segments may appear empty in the other tool (for example in memoQ), and you may have no chance to edit them there. I do not know OmegaT, but in case of such problems I would NOT merge anything in Trados Studio.

  •  

    If you were to provide a small example file(s)... which surely isn't beyond your capabilities to create, then it would be much easier to see what you mean.

    The biggest problem will be if you merged over paragraph breaks; otherwise there should be no problem. And if you had to do this then I reckon your source files would benefit from editing before you put them through any CAT tool.

    Paul Filkin | RWS Group

  • Surely isn't beyond my capabilities??

    The source files were run through the document cleaner in the TransTools plugin for MS Word, then loaded into LibreOffice and saved from there. I find this two-step method a very reliable way of removing junk tags from a source file. But editing would not help with my problem - I translate many academic texts from the humanities, and CAT tools are messy when importing citations, which of course include many colons and full stops:

    Screenshot of a text document showing a list of academic citations with titles, authors, and publication details.

    So, I want to be able to merge each footnote (or each whole citation, depending on the context) into a single segment. And I'm assuming that this must be done manually, and that there's no special tool for specifying segmentation rules just for academic citations.

    I can share the file that has been causing me problems. But via email or some other direct method - out of respect for my client, I cannot post it here on the open internet. Let me know how I can get the file to you.



    Generated Image Alt-Text
    [edited by: Trados AI at 2:40 PM (GMT 1) on 22 Apr 2024]
  •  

    I understand your way of working and also the need to merge such texts. From what I see these are NOT separated by any breaks, but simply segmented on punctuation marks. Does this also cause problems in OmegaT? Maybe a way to work could be generating an XLF file in OmegaT and translating this in Studio? I do not know if that will work, as I do not use OmegaT. However, working with Transit files (via memoQ) and memoQ files in Trados Studio usually causes no problems. So maybe this way round will be easier?

  •  

    Surely isn't beyond my capabilities??

    Maybe worth explaining after reading this next:

    I can share the file that has been causing me problems. But via email or some other direct method - out of respect for my client, I cannot post it here on the open internet.

    From the sounds of it you need to create a Word file containing:

    • a few segments with any old text... what it says is not important at all
    • a few footnotes or academic citations
    • maybe some separate sentences with line breaks

    So one page in Word with just a little bit of text to represent the problem you are trying to solve.  That is something you could share here and it would help everyone who would like to help you.

    Paul Filkin | RWS Group
