How to fix very large segments in Studio for XLIFF projects from WordPress/WPML

Hi Team, 

I was wondering whether someone could help with a recent issue we're having. 

Essentially, we receive website translation projects as XLIFF files generated from WordPress with WPML. Studio consistently fails to segment these files properly: the whole file becomes just a few segments, with lots of blank lines between whole blocks of text.

The consequences of this are not nice: you can't use MT, you can't use your TMs, you can't even translate anything manually either, for fear of damaging the file so it won't fit back into the website. 

I don't want to make any unfair comparisons, but in the interest of research the following might be useful to pre-empt: I initially thought the segmentation error could be a WPML issue. But then I put the same XLIFF into one of the popular alternative CATs and hey, what was 3 segments in Studio 2017 (one big blob plus 2 one-liners at the bottom) turned out to be 94 proper segments in the other CAT! I suppose this makes WordPress / WPML innocent enough.

I admit I noticed some relevant information on this Forum regarding legacy Studio 2014, but that piece of advice is thankfully very old now, because it seemed to be a multi-stage workaround of incredible technical complexity. To be honest, I'd rather have a real-life fix which simply works, just like the other CAT does it. Surely there must be some setting deep inside the newest Studio that I'm not noticing?

Please would anyone be able to advise the quickest way to correct that segmentation issue?

Many thanks indeed, 

Adam

  • WordPress is quite a bad format. What I do, if I need to work with it, is adapt the file type, including the tag types, so that they govern the segmentation. From what I remember, a WPML file is an XLIFF full of CDATA... Last time I had one, I developed a customized file type to handle it as a "normal" XML file. That might be helpful here.
    Or you just let the segmentation be done in the other popular CAT tool, then take the file from there and translate in Studio.
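
To illustrate why the CDATA matters, here is a minimal Python sketch. It is not how Studio parses files, and the XLIFF fragment is a hypothetical, heavily simplified sample rather than a real WPML export. It only shows that markup wrapped in CDATA reaches an XML parser as one plain text string, so a tool that does not look inside it sees a single large block.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XLIFF 1.2 fragment (assumption: real WPML exports
# carry more attributes and units than this).
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="post-1" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="body">
        <source><![CDATA[<p>First paragraph.</p>

<p>Second paragraph.</p>]]></source>
        <target><![CDATA[<p>First paragraph.</p>

<p>Second paragraph.</p>]]></target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def target_texts(xliff_text):
    """Return the text content of every <target> element."""
    root = ET.fromstring(xliff_text)
    return [t.text or "" for t in root.iterfind(".//x:target", NS)]


if __name__ == "__main__":
    for text in target_texts(SAMPLE):
        # The <p> tags arrive as literal characters inside one string,
        # not as child elements the parser could split on.
        print(repr(text))
```

The single `<target>` comes back as one string with both paragraphs and the blank line between them, which matches the "one big blob" symptom described above.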

    _________________________________________________________

    When asking for help here, please be as accurate as possible. Please always remember to give the exact version of product used and all possible error messages received. The better you describe your problem, the better help you will get.

    Want to learn more about Trados Studio? Visit the Community Hub. Have a good idea to make Trados Studio better? Publish it here.

  • Hi Jerzy,

    Thank you for your very quick reply.

    1. What do you mean by saying that WPML makes bad xliffs? Yes, there are CDATA tags in there, but they are just marked out like any other tag and they don't seem to interfere with anything. Is there something WPML should work on?

    2. What exactly do you mean by "handle this as a normal XML file"? I apologise, but that is not obvious enough at our level of expertise for me to solve the problem. Is that a functionality in Studio?

    3. I tried the final option you suggest and failed: open in the "other CAT" (nice segments), export, and open in Studio (no segments).

    So essentially, how do you think I could easily fix this problem in Studio?

    Many thanks indeed,

    Adam
  • 1. and 2. From what I remember, WPML delivers tags in <> and in []. Studio recognizes only tags written in <>. For all content in [], you need to define embedded content rules.
    So in the case of XLF, I told Studio to open it like XML and translate only the content of the "target" tags. Then I defined the corresponding embedded content rules.
    However, starting with the latest CU, Studio 2017 will allow you to process embedded content in XLF files, so this will replace my process.

    3. If you preprocess the file in memoQ, you must use the option "Export bilingual" - "Plain XLF for other CAT tools". You must also make sure your Studio has the file type for memoQ (not all versions had it out of the box).
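
As a rough illustration of what embedded content rules do, here is a small Python sketch. It is not Studio's implementation, and the two regular expressions are assumptions: one for HTML-style tags in <> and one for WordPress-style shortcodes in []. The rules you define in Studio would need to match whatever actually appears in your files.

```python
import re

# Assumed patterns (not taken from Studio): an HTML-style tag in <>,
# or a shortcode in [] such as [gallery ids="1,2"] or [/b].
EMBEDDED = re.compile(r"<[^<>]+>|\[/?[A-Za-z][\w-]*[^\[\]]*\]")


def split_embedded(text):
    """Split text into ('tag', value) and ('text', value) pieces, in order."""
    parts = []
    pos = 0
    for match in EMBEDDED.finditer(text):
        if match.start() > pos:
            parts.append(("text", text[pos:match.start()]))
        parts.append(("tag", match.group()))
        pos = match.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts


def translatable_chunks(text):
    """Return only the non-empty text pieces, with surrounding spaces removed."""
    return [v.strip() for kind, v in split_embedded(text)
            if kind == "text" and v.strip()]


if __name__ == "__main__":
    sample = '<p>Hello world.</p>[gallery ids="1,2"]<p>Read [b]more[/b] here.</p>'
    for kind, value in split_embedded(sample):
        print(kind, repr(value))
```

Once both bracket styles are recognised as tags, the text between them is what remains to translate. With only the <> pattern, a shortcode like the hypothetical `[gallery ids="1,2"]` would be left in the translatable text.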

