How to prevent merging segments in SDL Trados Studio

Hi, is it possible to prevent merging segments within the same paragraph when creating a project in SDL Trados Studio and sending a package for translation? A further question: if we cannot prevent merging segments, is there a way to check whether segments have been merged? I know that you can open an sdlxliff file in Studio and visually check whether the segment numbering is continuous, but is there a way to automate the check?
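The visual check described above, which looks for gaps in the segment numbering, could in principle be scripted. The following is a minimal Python sketch, not a tested tool. It assumes that segment numbers are the `mid` values of `<mrk mtype="seg">` elements inside `<seg-source>`, that a merge removes the later number and so leaves a gap, and that non-numeric IDs (for example from split segments) can simply be ignored. `find_numbering_gaps` is a hypothetical helper name, and the sample XML is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace used by SDLXLIFF files
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def find_numbering_gaps(source):
    """Return segment numbers missing from the sequence in one SDLXLIFF file.

    Assumption: a merged segment shows up as a missing number. Non-numeric
    segment IDs are skipped. A merge involving the file's last segment would
    not show up, because only numbers between the lowest and highest ID are
    checked.
    """
    root = ET.parse(source).getroot()
    ids = set()
    for seg_source in root.iter(f"{{{XLIFF_NS}}}seg-source"):
        for mrk in seg_source.iter(f"{{{XLIFF_NS}}}mrk"):
            mid = mrk.get("mid", "")
            # Only segment markers with purely numeric IDs are counted
            if mrk.get("mtype") == "seg" and mid.isdigit():
                ids.add(int(mid))
    if not ids:
        return []
    return [n for n in range(min(ids), max(ids) + 1) if n not in ids]


# Made-up sample: segment 3 is missing, as if it had been merged into 2
sample = b"""<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file><body>
    <trans-unit id="a"><seg-source>
      <mrk mtype="seg" mid="1">One.</mrk> <mrk mtype="seg" mid="2">Two. Three.</mrk>
    </seg-source></trans-unit>
    <trans-unit id="b"><seg-source>
      <mrk mtype="seg" mid="4">Four.</mrk>
    </seg-source></trans-unit>
  </body></file>
</xliff>"""

print(find_numbering_gaps(io.BytesIO(sample)))  # prints [3]
```

For a real file, `find_numbering_gaps("path/to/file.sdlxliff")` should also work, because `ET.parse` accepts a file path as well as a file object.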

Many thanks

Jouni

  •  

    I did something like this in AutoHotkey, adding split segments and a bunch of other checks directly in the XLIFF files selected in Windows Explorer. :-)

    Very cool... I definitely think people's ability to make today's AI tools work for them is underutilised. I'm enjoying Python more than AHK for this stuff, mainly because it's so well understood by most AI tools, whereas I still find AHK can be a bit of a slog unless you already know what you're doing.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Thanks, much appreciated, I owe you a beer or two! I need to look into this: so far I haven't managed to get a result with Studio 2019 sdlxliff files, so I wonder whether this requires newer sdlxliff files.

    Jouni

  •  

    The script identifies segments that have been merged in SDLXLIFF files by looking for the MergeStatus attribute inside <sdl:seg-defs>. I don't have Studio 2019 installed, but if you check your merged files and don't see that attribute, that is most likely the reason.
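For newer files, a check along these lines might look like the sketch below. It is an illustration, not the script referred to above. It assumes that `MergeStatus` is an attribute on the `<sdl:seg>` children of `<sdl:seg-defs>` (the post only says the attribute is "inside" that element) and it does not interpret the attribute's values. `find_merge_status` is a hypothetical name, the `sdl` namespace URI is the one used in the Studio 2019 script later in this thread, and the `PLACEHOLDER` value in the sample is made up rather than a real Studio value.

```python
import io
import xml.etree.ElementTree as ET

# SDL namespace as used in the SDLXLIFF script elsewhere in this thread
SDL_NS = "http://sdl.com/FileTypes/SdlXliff/1.0"


def find_merge_status(source):
    """Return (segment id, MergeStatus value) pairs found in <sdl:seg-defs>.

    Assumption: MergeStatus sits on the <sdl:seg> elements; any value is
    reported as-is without interpretation.
    """
    root = ET.parse(source).getroot()
    hits = []
    for seg_defs in root.iter(f"{{{SDL_NS}}}seg-defs"):
        for seg in seg_defs.iter(f"{{{SDL_NS}}}seg"):
            status = seg.get("MergeStatus")
            if status is not None:
                hits.append((seg.get("id"), status))
    return hits


# Made-up sample; "PLACEHOLDER" is not a real MergeStatus value
sample = b"""<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0" version="1.2">
  <file><body><trans-unit id="a">
    <sdl:seg-defs>
      <sdl:seg id="1"/>
      <sdl:seg id="2" MergeStatus="PLACEHOLDER"/>
    </sdl:seg-defs>
  </trans-unit></body></file>
</xliff>"""

print(find_merge_status(io.BytesIO(sample)))  # prints [('2', 'PLACEHOLDER')]
```

If a Studio 2019 file returns an empty list here, that would be consistent with the attribute simply not existing in that format.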

    Happy to check if you send me a couple of files?

    Paul Filkin | RWS Group


  • Thanks, Paul, for taking the time. It seems that the MergeStatus feature was introduced to Studio after Studio 2019; I could not find the attribute. You can find two Studio 2019 files with merged segments in fi-FI_merged_segments.zip.

    Jouni

  •  

    Indeed, they are much trickier to find reliably. I note the advanced display filter cannot find these either... probably why the approach changed in later versions! However, you could try this:

    import os
    import xml.etree.ElementTree as ET
    from pathlib import Path
    
    def parse_sdlxliff_file(file_path):
        try:
            tree = ET.parse(file_path)
            root = tree.getroot()
            
            namespaces = {
                'sdl': 'http://sdl.com/FileTypes/SdlXliff/1.0',
                '': 'urn:oasis:names:tc:xliff:document:1.2'
            }
            
            results = []
            for trans_unit in root.findall('.//trans-unit', namespaces):
                seg_source = trans_unit.find('.//seg-source', namespaces)
                if seg_source is None:
                    continue
                    
                seg_markers = seg_source.findall('.//mrk[@mtype="seg"]', namespaces)
                location_markers = seg_source.findall('.//mrk[@mtype="x-sdl-location"]', namespaces)
                
                # Only process if multiple segments and any location markers exist
                if len(seg_markers) > 1 and len(location_markers) > 0:
                    for i, marker in enumerate(seg_markers):
                        segment_id = marker.get('mid')
                        segment_text = ''.join(marker.itertext()).strip()
                        
                        # Consider it merged if:
                        # - It's the first segment (often merged in your examples)
                        # - OR it contains an x-sdl-location marker
                        has_location_within = len(marker.findall('.//mrk[@mtype="x-sdl-location"]', namespaces)) > 0
                        is_first_segment = (i == 0)
                        
                        if is_first_segment or has_location_within:
                            results.append({
                                'segment_id': segment_id,
                                'source_text': segment_text,
                                'merge_type': 'MergedSegment (old format)',
                                'filename': file_path.name
                            })
            
            return results
            
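        except (ET.ParseError, OSError) as err:
            # Report malformed or unreadable files instead of skipping them silently
            print(f"Skipping {file_path.name}: {err}")
            return []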
    
    def process_sdlxliff_folder():
        folder_path = input("Please enter the folder path containing sdlxliff files: ")
        
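        if not os.path.isdir(folder_path):
            # Tell the user why nothing happens instead of exiting silently
            print(f"Not a folder: {folder_path}")
            return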
        
        sdlxliff_files = list(Path(folder_path).glob('*.sdlxliff'))
        
        for file_path in sdlxliff_files:
            results = parse_sdlxliff_file(file_path)
            
            for result in results:
                print(f"File: {result['filename']}")
                print(f"Segment #{result['segment_id']}:")
                print(f"Source: {result['source_text']}")
                print(f"Merge Type: {result['merge_type']}")
                print("-" * 50)
    
    def main():
        try:
            process_sdlxliff_folder()
        except KeyboardInterrupt:
            pass
    
    if __name__ == "__main__":
        main()

    The approach here identifies merged segments in older SDLXLIFF files by:

    1. Targeting multi-segment <trans-unit> elements with <mrk mtype="x-sdl-location"> tags.
    2. Flagging the first segment and any with internal x-sdl-location markers as merged.
    3. Reporting these with filename, ID, text, and merge type.

    This balances specificity (it catches #1 and #5 in your samples) with generality (no hardcoded IDs), using the best structural clues available. So it's surely not perfect, but it might be helpful if you're having problems and need to find them!

    Paul Filkin | RWS Group


  • Great, Paul, I admire your patience and perseverance with this! I was able to confirm that this really works :-)

    Jouni
