Missing tabs in repeat match segments

In edit mode, when SDL encounters a repeat match of a previous segment with a tab at the end of the segment, it does not insert the tab at the end of the repeat match segment. That's not good. The Verify function picks this up (it flags the segments as inconsistent translations), but if you run Verify again after manually correcting the bad segments, it happily deletes the terminal tabs in the repeat match segments and flags them as inconsistent translations again. That's even worse.

  • Hello Kenneth,

    It probably isn't as straightforward as you suggest, because as a rule Studio always moves tabs at the end of a sentence out of the segment and into the external structure, like this:

    I'm displaying the external structures between the segments and you can see the tabs are not in the part that goes into the TM.  So to deal with your question properly we'd need an example.  Can you provide one?
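    To make the idea of "external structure" concrete, here is a hypothetical model. It is not Studio's actual code or data model. A trailing tab is split off from the translatable text, only the text would go into the TM, and the tab is re-attached when the target is rebuilt. The names `Segment`, `split_trailing_tabs` and `rebuild_target` are made up for this sketch.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical split of a source fragment (not Studio's data model)."""
    text: str      # translatable text, the part that would be stored in the TM
    trailing: str  # trailing tabs, kept outside the segment as "external structure"


def split_trailing_tabs(raw: str) -> Segment:
    """Move any tabs at the end of `raw` out of the translatable text."""
    text = raw.rstrip("\t")
    return Segment(text=text, trailing=raw[len(text):])


def rebuild_target(translation: str, source: Segment) -> str:
    """Re-attach the source's trailing tabs to the translated text."""
    return translation + source.trailing


seg = split_trailing_tabs("Product name\t")
print(repr(seg.text))                            # 'Product name'
print(repr(rebuild_target("Produktname", seg)))  # 'Produktname\t'
```

    Under this model, the missing tabs reported above would show up as a rebuilt target without its `trailing` part.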

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi Paul,

    Thanks for the fast response. I agree with you about this not being straightforward, and actually I think the basic issue is how whitespace characters are handled, because I see the same problem with segments ending with a soft return.

    First a bit of background on how I got into this situation: I received three files from a client in CSV format, with the elements separated by semicolons (sample file available). SDL could not import these files. I tried changing the separation character in the standard CSV file type definition to a semicolon, but that didn't work (maybe because that file type is intended for importing bilingual files). I also created a new file type definition for .csv files with a semicolon separator, but that didn't work either (maybe due to my extremely limited knowledge of regular expression syntax). So then I converted the document to .docx with the semicolons replaced by tabs (sample file available) and added a setting to the TM for this project to segment on tabs (before and after: Anything). With that SDL imported and segmented the files properly, but when I started editing them I ran into the problem with missing tab characters in repeat segments. That was so irritating that I converted the content of the other two files to tables, which eliminates the problem of missing tabs.
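    A plain search-and-replace of semicolons with tabs will also split any quoted field that itself contains a semicolon. Below is a minimal sketch of a safer conversion using Python's standard `csv` module. It assumes double-quote text enclosure and text already read into a string. The function name `semicolon_csv_to_tsv` is hypothetical.

```python
import csv
import io


def semicolon_csv_to_tsv(src: str) -> str:
    """Convert semicolon-delimited, double-quote-enclosed CSV text to tab-delimited text."""
    # csv.reader honours the quotes, so "Hello; world" stays a single field
    reader = csv.reader(io.StringIO(src, newline=""), delimiter=";", quotechar='"')
    out = io.StringIO()
    # default QUOTE_MINIMAL: only fields containing tabs, quotes or newlines get quoted
    writer = csv.writer(out, delimiter="\t", quotechar='"', lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


sample = 'id;"source";"target"\n1;"Hello; world";""\n'
print(semicolon_csv_to_tsv(sample), end="")
```

    For a real file you would read it with `open(path, encoding="utf-8", newline="")`, assuming the client's file is UTF-8.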

    Now to the nitty-gritty: I understand what you say about the tabs being treated as external structure (and not saved in the TM), which is perfectly reasonable. However, I find it inconsistent that the same is not done with the tabs in the source segments. This causes the "inconsistent translation" problem with repeat segments, because the easiest way to transfer tags in source segments is to copy the segment to the target and edit the text, and this copies the tab from the source segment. Deleting the tab in the first target segment would cure the "inconsistent translation" problem, but it's not a solution. The tabs (supposedly there in the external structure) are not inserted in the target document unless they are explicitly present in the target segments, even though they are present in the corresponding source segments, which is of course disastrous.
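    The interaction described above can be illustrated with a simplified, hypothetical "inconsistent translation" check. This is not how Studio's Verify is implemented. If trailing tabs count as part of the text, a target with a tab and a repeat without one look like two different translations of the same source.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def inconsistent_sources(pairs: Iterable[Tuple[str, str]],
                         ignore_trailing_tabs: bool = False) -> List[str]:
    """Return source texts that occur with more than one distinct target."""
    def norm(s: str) -> str:
        # optionally treat trailing tabs as structure rather than text
        return s.rstrip("\t") if ignore_trailing_tabs else s

    targets = defaultdict(set)
    for source, target in pairs:
        targets[norm(source)].add(norm(target))
    return sorted(src for src, tgts in targets.items() if len(tgts) > 1)


pairs = [("Colour\t", "Farbe\t"), ("Colour\t", "Farbe")]  # repeat lost its tab
print(inconsistent_sources(pairs))                             # ['Colour\t']
print(inconsistent_sources(pairs, ignore_trailing_tabs=True))  # []
```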

    The .sdlxliff file from the example source file is also available (with all the tabs manually added as necessary; try changing one of the segments with repeat matches to see what happens with the tabs in the repeat segments).

    As mentioned initially, I also see the same problem with repeat matches of segments having a soft return at the end of the segment. That situation can arise when I split segments after soft returns, which is often a good idea to improve TM leverage or simply to make translation reasonably possible (I get a lot of documents from people who prefer soft returns to hard returns, and translating half a page of text as a single segment is not a pleasant task).

    If you tell me how to upload or otherwise provide sample files, I will do so.

    Best regards,
    Ken
  • Hi,

    I guess I'll start with this... did you try it?

    You could have changed the filetype delimiter to a semicolon.  If you have the source file I'll happily take a look... you can email it to me, pfilkin@sdl.com.

    On your other comments... no real comment until you send me a test file with the tabs you're talking about so I can reproduce the effect.

    Regards

    Paul

  • Unknown said:
    So then I converted the document to .docx with the semicolons replaced by tabs (sample file available) and added a setting to the TM for this project to segment on tabs (before and after: Anything).

    ...As mentioned initially, I also see the same problem with repeat matches of segments having a soft return at the end of the segment. 

    If I could chime in, the above makes me think that there might be an issue with the way your segmentation rule is set up, but without seeing the actual rule, it's hard to say.  However, the very first thing that comes to mind is that you can segment on semicolons if you want, no need to replace the semicolons with tabs.

    Having said that, it may be even easier to handle the file in its native CSV format as Paul has suggested.

    Best,

    Nora

  • Hi Nora,

    I generated the segmentation rule as instructed on your blog (noradiaz.blogspot.nl/.../changing-segmentation-in-sdl-trados.html). The rule works fine; what doesn't work is how Studio Editor handles tabs at the end of segments.

    Best regards,
    Ken
  • Unknown said:
    what doesn't work is how Studio Editor handles tabs at the end of segments.

    Can you send me an sdlxliff file already segmented to show this so I can test it?  I'd like to use exactly what you are having problems with please.

    Thank you

    Paul

  • Hi Paul,

    I'll send you the test files and some screenshots. Basically, the CSV file definition only imports two columns (apparently that is what it is designed to do), and I can't see any obvious way to make it handle more columns. The "natural" solution would be to convert the CSV file to XLSX, but Excel apparently does not like CSV files with semicolon delimiters and quote-mark text enclosure, and puts each line in a single cell. There may be a clever (i.e. non-obvious) way to get around that, but I'm not aware of any.
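    Excel putting each line in a single cell is usually a delimiter mismatch: when it opens a .csv file directly it expects the list separator from the regional settings (often a comma), not a semicolon. As a quick check before choosing the delimiter in an import dialog, Python's `csv.Sniffer` can guess it from a sample. The sample rows and the helper names below are made up.

```python
import csv
import io
from typing import List


def guess_delimiter(text: str) -> str:
    """Let csv.Sniffer pick the most likely delimiter among ; , and tab."""
    return csv.Sniffer().sniff(text, delimiters=";,\t").delimiter


def sniff_and_parse(text: str) -> List[List[str]]:
    """Parse the text with the guessed delimiter and double-quote enclosure."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=guess_delimiter(text), quotechar='"')
    return list(reader)


sample = 'id;"source text";"target text"\n1;"Hello, world";""\n'
print(guess_delimiter(sample))  # ;
print(sniff_and_parse(sample))
```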

    I leave on holiday tomorrow, so you may not hear from me for a while.

    Best regards,
    Ken
  • I'm intrigued. I did a few tests and I can't get Studio to include the tab at the end of the segment. In my case, just as Paul said, all the end-of-segment tabs are simply removed from the segment during segmentation, as I think they're supposed to be. The only thing I can think of is that there may be something after the tab.
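    One way to test the hunch that something follows the tab is to list the code points at the end of a segment, so that invisible characters such as a zero-width or no-break space show up by name. Here is a minimal sketch; the sample segment and the helper name `describe_tail` are invented.

```python
from __future__ import annotations

import unicodedata


def describe_tail(segment: str, width: int = 3) -> list[str]:
    """Describe the last `width` characters by code point, category and name."""
    return [
        f"U+{ord(ch):04X} ({unicodedata.category(ch)}) {unicodedata.name(ch, 'no name')}"
        for ch in segment[-width:]
    ]


for line in describe_tail("Price\t\u200b", width=2):
    print(line)
# U+0009 (Cc) no name
# U+200B (Cf) ZERO WIDTH SPACE
```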
  • Indeed... in fact, looking at the text file Kenneth just sent me, the best option is to bring it into Excel (it has too many columns for the CSV filetype). If you use the Text Import Wizard, you can select the delimiter, and the file then imports perfectly.

    Regards

    Paul

  • Unknown said:
    In edit mode, when SDL encounters a repeat match of a previous segment with a tab at the end of the segment, it does not insert the tab at the end of the repeat match segment.

    OK, I also have the test file from Kenneth. I think it's quite a feat to have been able to create a file like this in the first place!! But indeed, Studio will not save the tab at the end of the segment. I started with this and confirmed it:

    Then, when I redid it, Studio did this from the TM:

    So this does look like a bug. It's clearly not something you see every day, and it's far easier to avoid this situation in the first place than to deal with it afterwards, but it does seem to be a bug. I'll report this to development along with the test file.

    Thanks for finding it, and I hope you'll have a better way to handle these CSV files in the future using the Excel text import wizard!

    Regards

    Paul