XLF has no segments in Studio 2017, but parses fine in 2015

XLF files from a web proxy translation tool we use are parsed fine in Studio 2015. However, when I open the same file in 2017, there are no segments at all.

I tried 'highly encouraging' it by adding translate="yes" to each trans-unit and group tag, but nothing changed. Would anyone have any other suggestions?

Thanks! - Clark

Here's the first trans-unit, in case it helps with the diagnosis. (You can test by pasting this into a text file and giving it the extension .xlf.)

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
<file source-language="en-US" target-language="es-ES" original="zejs1qho" datatype="xml">
<header>
<tool tool-id="easyling" tool-name="easyling"/>
</header>
<body>
<group translate="yes" id="zejs1qho_tm:S6V8yv0KmwDaSu8kYzyWNaP4Rn5xfPIFy+hz+piaqeY=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=" xlink:href="www.URL.com/asset.php" extradata="{&quot;trimmed&quot;:false}" datatype="plaintext">
<trans-unit id="zejs1qho_tm:S6V8yv0KmwDaSu8kYzyWNaP4Rn5xfPIFy+hz+piaqeY=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=#0" datatype="plaintext">
<source>source content for translation</source>
<note annotates="general" from="easyling">Dom-Path: html:-1/head:-1/title:18/#text:0</note>
<note annotates="source" from="easyling">www.URL.com/asset.php</note>
<note annotates="target" from="easyling">es-es-zejs1qho-p.app.easyling.com/asset.php</note>
<note annotates="source" from="easyling">created-at: 2018-03-09 15:04:14</note>
</trans-unit>
</group>
</body>
</file>
</xliff>

  • Hi,

    The cause seems to be the note elements marked annotates="target": they refer to a target element that doesn't exist, and 2017 doesn't like that. Studio 2015 didn't map comments to the target, so it simply ignored them. The issue is logged, as we clearly need to fix this; for your reference it is CRQ-8888.
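
If that diagnosis holds, the affected units can be listed before deciding on a workaround. The following is only a rough sketch using Python's standard library; the function name `units_with_orphan_target_notes` is made up for illustration, and it assumes an XLIFF 1.2 file that is well-formed XML.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace, as declared in the sample file above.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def units_with_orphan_target_notes(xliff_text):
    """Return ids of trans-units that have a note with annotates="target"
    but no <target> child element."""
    root = ET.fromstring(xliff_text)
    affected = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        has_target = unit.find("x:target", NS) is not None
        has_target_note = unit.find("x:note[@annotates='target']", NS) is not None
        if has_target_note and not has_target:
            affected.append(unit.get("id"))
    return affected
```

An empty list would suggest the file is not hit by this particular problem.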

    So you could remove the comments (the note elements) and then use 2017. Adding an empty target element causes an error too, so if you want to keep the notes and use 2017 you'd need to add a target with content. You could do that with a little regex that just copies the source wherever no target is present.
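
One way the "copy the source when there is no target" idea might look, as a minimal Python sketch. The function name `add_missing_targets` is hypothetical, and the regexes assume simple, non-nested trans-unit elements like the one in the sample; it has not been tested against Studio itself.

```python
import re

# Match each trans-unit block and the first <source> element inside it.
TRANS_UNIT = re.compile(r"<trans-unit\b.*?</trans-unit>", re.DOTALL)
SOURCE = re.compile(r"<source\b[^>]*>(.*?)</source>", re.DOTALL)


def add_missing_targets(xliff_text):
    """Insert <target> with a copy of the source text into every
    trans-unit that has no <target> element yet."""

    def fix_unit(match):
        unit = match.group(0)
        if "<target" in unit:
            return unit  # already has a target, leave it untouched
        # Append a target right after the first source element.
        return SOURCE.sub(
            lambda m: m.group(0) + "\n<target>" + m.group(1) + "</target>",
            unit,
            count=1,
        )

    return TRANS_UNIT.sub(fix_unit, xliff_text)
```

Running it on a copy of the file and keeping the original is the safer route, since a regex is not a real XML parser.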

    If you want to use 2015 then your best bet is to handle the file as XML. But this would mean adding a target element anyway so that you have something to translate, so you're better off using 2017 with the workaround I mentioned above.

    Paul Filkin | RWS Group


  • Hi Paul. Thanks a ton! That worked! A couple of (hopefully easy) follow-ups:

    1. I've got lots of segments of this type of format: [phrase 1] - [phrase 2] | [phrase 3]
    Phrases 2 and 3 are really repetitive. Is there perhaps a 'tricky' thing I could add in front of those phrases to force Studio to break the segment there? (It should also be easy to remove in the target XLF.)

    2. Studio is including all tags at the beginning and end of segments, which will cut into some of my TM matching. "Include leading/trailing tags in segments" is not checked in my XLIFF file type settings. Any suggestion to get rid of those tags?
  • Unknown said:
    Is there perhaps a 'tricky' thing I could add in front of those phrases to force Studio to break the segment there?

    You can create a separate TM and experiment with its segmentation rules: set the " - " or " | " sequence as the "before break" pattern and see what the segmented file looks like.
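
To get a feel for what such a rule would do before touching the TM, the break can be roughly simulated outside Studio. This is only an approximation in Python (3.7 or later), not Studio's segmentation engine; the function name `preview_breaks` and the sample phrases are invented for illustration.

```python
import re

# Break right after " - " or " | ", i.e. treat them as the text before the break.
# Both alternatives are three characters long, which a lookbehind requires.
BREAK_AFTER = re.compile(r"(?<= - | \| )")


def preview_breaks(text):
    """Split text at the positions where the rule would break the segment."""
    return [part for part in BREAK_AFTER.split(text) if part]


# Hypothetical segment in the "[phrase 1] - [phrase 2] | [phrase 3]" shape.
print(preview_breaks("Blue widget - Acme Store | Free shipping"))
```

In this simulation the separator stays attached to the end of the preceding piece; how Studio handles the whitespace around the break may differ.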

    Unknown said:
    Any suggestion to get rid of those tags?

    Probably only one: let the author create SENSIBLY PARSED XLIFFs.
    The point is that XLIFFs are expected to be processed as-is, without any further "fiddling". All the "fiddling" is expected to be done during parsing of the SOURCE FORMAT, BEFORE/WHILE CREATING THE XLIFF.

    If the author of the XLIFF is incapable of making the XLIFF content sensible, then (s)he should NOT be doing that at all... and had better send the original format, so that capable people can process it in a sensible way.

    Seeing that "easyling", my wild guess is that it's some WordPress-based webpage translation... isn't it? :-\