Please weigh in on the XLIFF standard and how it affects the Translation Memory workflow

I would like to ask the community to weigh in on the XLIFF standard. It occurs to me that the bilingual XLIFF files we often see in WordPress exports are not at all optimized for Translation Memory workflows. A couple of issues (let's assume all examples define English as the source language):

  1. XLIFF files are, by their bilingual nature, pre-segmented by each instance of <source> and <target> content. I understand that the bilingual nature of the file puts a practical limit on what Studio's full stop segmentation rules can do. I've read elsewhere on SDL's community page that users report very long paragraphs being extracted. This is our experience as well. It makes working with segmented Translation Memories impossible. The only resolution seems to be to split segments manually, which would be a cumbersome process.
  2. Although the XLIFF standard doesn't seem to have provisions for embedded content or CDATA support, the reality is that WordPress exports do contain HTML content and WordPress shortcodes. Studio's XLIFF filter does not allow processing embedded content. The typical workaround is simply to use a custom XML filter. This works great, but it does require you to process the <target> segments because an XML filter replaces text. It does not map the source and target in Studio the way the XLIFF filter does. Using the XML filter also takes care of segmentation.
    • I understand that there are some standards around tag element conversion in the native XLIFF file that may help (have not tested it yet) but it doesn't resolve the segmentation issue. 
  3. The XLIFF standard defines that <target> should always contain the latest translation. When the source content gets updated, we receive an updated English <source> and a <target> pre-populated with the old translation. This wreaks total havoc with our XML filter because we can no longer process <target> to get the latest source. Processing the source is an option, but it replaces <source>, so a manual workaround would be needed. I can foresee several creative ways. One option is to strip the <target> elements from the source XLIFF file, process the <source> segments using our XML filter, translate with the help of our TM and then export out to XLIFF. What we have now is an XLIFF file with the translations in <source>. The manual workaround would be to replace <source> with <target> and </source> with </target>. The only thing left to do is to somehow restore the English <source> for each item. Maybe a compare of the files could work, where you reject the deletion of the original source and accept the addition of the target. However, how cumbersome is that when you deal with many files?
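The strip-and-restore workaround in item 3 could probably be scripted rather than done by hand. Below is a minimal sketch in Python, not a tested production tool. It assumes the exports are XLIFF 1.2 (namespace urn:oasis:names:tc:xliff:document:1.2) and that every trans-unit has a stable id attribute; the function names strip_targets and restore_sources are made up for illustration. Note that ElementTree reads CDATA as plain text and writes it back escaped, so the output would need checking against what WordPress expects on import.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2 namespace; adjust if the export declares another one.
NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)


def q(tag):
    """Return the namespace-qualified name for an XLIFF 1.2 tag."""
    return f"{{{NS}}}{tag}"


def strip_targets(xliff_text):
    """Return a copy of the XLIFF with every <target> removed."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(q("trans-unit")):
        for target in unit.findall(q("target")):
            unit.remove(target)
    return ET.tostring(root, encoding="unicode")


def restore_sources(original_text, translated_text):
    """Rebuild a bilingual file from the original and the XML-filter output.

    translated_text is the stripped file after translation, so its
    <source> elements hold the translation. Units are matched by id.
    """
    root = ET.fromstring(original_text)
    translated = {
        unit.get("id"): unit.find(q("source"))
        for unit in ET.fromstring(translated_text).iter(q("trans-unit"))
    }
    for unit in root.iter(q("trans-unit")):
        new = translated.get(unit.get("id"))
        source = unit.find(q("source"))
        if new is None or source is None:
            continue  # nothing to merge for this unit; leave it untouched
        for old in unit.findall(q("target")):
            unit.remove(old)  # drop the outdated translation
        target = copy.deepcopy(new)
        target.tag = q("target")  # the translated <source> becomes <target>
        unit.insert(list(unit).index(source) + 1, target)
    return ET.tostring(root, encoding="unicode")
```

The idea would be to run strip_targets on the file received from the client, translate the result through the XML filter, and then pass both the original file and the translated file to restore_sources. Matching by id avoids the file compare step, which should help when there are many files.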

In conclusion: Right now the only productive process I can figure out is the XML route for original content as long as <target> is populated with English source content. However, updated files become a cumbersome process.
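To decide which route a given file needs, one could check per trans-unit whether <target> is still just a copy of the English source (or empty), or whether it already carries an older translation. The following is a rough sketch under the same assumption of XLIFF 1.2; classify_units and the labels "fresh" and "updated" are hypothetical names, not anything Studio or WordPress provides.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2 namespace, as in typical WordPress exports.
NS = "urn:oasis:names:tc:xliff:document:1.2"


def _text(element):
    """Flatten an element's text content; a missing element counts as empty."""
    return "" if element is None else "".join(element.itertext()).strip()


def classify_units(xliff_text):
    """Map each trans-unit id to 'fresh' or 'updated'.

    'fresh'   : <target> is missing, empty, or identical to <source>
    'updated' : <target> holds text that differs from <source>
    """
    root = ET.fromstring(xliff_text)
    status = {}
    for unit in root.iter(f"{{{NS}}}trans-unit"):
        source = _text(unit.find(f"{{{NS}}}source"))
        target = _text(unit.find(f"{{{NS}}}target"))
        status[unit.get("id")] = "fresh" if target in ("", source) else "updated"
    return status
```

A file in which every unit comes back as "fresh" should be safe for the plain XML route. Any "updated" unit means the file holds old translations and needs the more cumbersome handling.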

I've tried to research the history behind XLIFF a little bit but I can't necessarily determine whether Translation Memory workflow was ever considered in developing this standard. The bilingual nature of XLIFF may be very helpful to developers and certainly extends translation to a wider audience, but I feel that this standard could have a negative impact on translation quality by LSPs. XLIFF seems to take us away from having the Translation Memory (MT, TM, etc) as the central database for all translation management. Nowhere else in translation management do we typically deal with bilingual files because we manage translation and translation updates from the English source at all times.

I hope SDL can weigh in with best practices for working with XLIFF, and that translation professionals and LSPs can weigh in on the impact it has on their workflows. I'm looking particularly for ideas that remedy the problems I described with XLIFF, or for best practices on workarounds, including perhaps custom filters. I found one post that mentioned a filter that handles embedded content in XLIFF, but the link is down. Even if that is resolved, I don't foresee that the segmentation issue can realistically be resolved in bilingual files.

Many thanks! 

Jeroen Tetteroo

Language Solutions

St. Louis, MO

Using Studio 2017, 2014 Professional 

  • As I said, I have read all the discussions on this topic in this forum, so I am well aware of your position, Evzen. It seems to me that you are the one barking up the wrong tree by venting your frustration on this forum whenever someone posts on this topic. I do not think this is helpful or productive and might keep folks from continuing to engage in the conversation. Paul has mentioned working with WPML and I am raising the issue with them as well. That doesn't mean I will sit back and wait for them to fix it if there is also a possibility of working around it on the Studio side. As Paul mentioned, they have worked on a filter to address some of it, so why not continue that work at the same time. I will also try the XML route and may need some guidance on it from other forum users. Clearly, several have dealt with this and could work together to help each other out.

    Thanks,

  • Just a follow-up. I heard back very quickly from my contact at WPML. They are working on a solution and I will test it in the next few days. I will post more info at that time. Thanks to everyone before me who has pushed for a better solution, and who advised and tested with them.
  • My point was that without enough "barking up the WPML" tree, and with everyone just continuing to do guerrilla workarounds forever (not to mention establishing some of these workarounds as standard in Studio!), there wouldn't be much (if any) movement on WPML's side at all.
    I dare to say that actually the "barking" here has played a role in the latest developments at WPML...
  • Maybe. A little bark can go a long way, but too much bark can be counter-productive. Behind these usernames are real humans who might be doing their very best job. Chances are, the one rep of theirs who had commented earlier was scared away from our forum, which in turn may have kept them from notifying us proactively of any beta tests, which means I (and maybe others) wasted a bunch of time looking for other solutions. I was quite apprehensive asking here myself having seen previous interactions.

    Either way... As it turns out, they currently have an easy-to-use web-based tool in beta with some of their LSPs, which converts their XLIFFs to beautifully-segmented monolingual XLIFFs and then back to WPML-suitable XLIFFs. Once it's through beta, it will go into production as part of the process of downloading XLIFFs from Translation Hub. I just tested it successfully. It went so well that I'm wondering if I missed something ;)