Under Community Review

More flexible XLIFF parser behavior - option to ignore target entries

Problem and use case: We have clients hosting websites on WordPress, with our SDL connector solution providing WPML-to-TMS integration. We love this integration with WordPress overall, but we struggle with the localization-unfriendly XLIFF files that come out of WordPress. I think this is one of those unfortunate uses of the XLIFF format for translation purposes. A <trans-unit> should contain properly segmented content, but in this case paragraph-level content of over a thousand words appears as a single segment in the prepped SDLXLIFF.

Current solutions: Here is what we tell our clients; neither option is perfect. I have never used WordPress myself, and the problems described for each scenario arise on the connector side, which makes any workaround there error-prone.
We can only handle these XLIFF files in one of two ways:

  1. Treat XLIFF as bilingual XLIFF, by extracting <source> content.
    1. Problem solved: we can treat XLIFF as XLIFF. We will extract <source> content as source.
    2. Problem remaining: If <target> entries are prepopulated (not empty), Studio/TMS cannot segment the content any further. In the source XLIFF, there was one entry with over 1K words, which becomes a single segment in the prepped SDLXLIFF.
    3. For localization friendliness, <target> entries should be either removed or emptied.
  2. Or, we treat XLIFF as monolingual XML, by extracting <target> content.
    1. Problem solved: we can segment <target> content properly as regular XML content.
    2. Problem remaining: If <target> entries contain localized text, that text becomes the source segment content in the prepped SDLXLIFF.
    3. For localization friendliness, <target> entries should always contain source text.
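As a stopgap for option 1, the <target> entries could be stripped before the file reaches Studio/TMS. The following is only a minimal sketch of that pre-processing step, not part of any SDL or WPML tooling. It assumes the files use the XLIFF 1.2 namespace; the function name strip_targets and the sample content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the export uses the XLIFF 1.2 namespace. Adjust if yours differs.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def strip_targets(xliff_text: str) -> str:
    """Return the XLIFF with every <target> removed from its <trans-unit>."""
    # Keep the default namespace unprefixed when serializing.
    ET.register_namespace("", XLIFF_NS)
    root = ET.fromstring(xliff_text)
    # Materialize the list first so removal does not disturb the traversal.
    for unit in list(root.iter(f"{{{XLIFF_NS}}}trans-unit")):
        # Only direct children, so <target> inside <alt-trans> is left alone.
        for target in unit.findall(f"{{{XLIFF_NS}}}target"):
            unit.remove(target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample: one unit whose <target> is prepopulated with source text.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file original="post-1" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>First sentence. Second sentence.</source>
        <target>First sentence. Second sentence.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

cleaned = strip_targets(SAMPLE)
print(cleaned)
```

With the <target> entries gone, only the <source> content remains for Studio/TMS to extract. This has to run outside the connector, though, which is exactly the extra manual step a built-in option would avoid.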

Proposed solution: Could we have an option to ignore <target> entries, so that Studio/TMS is free to segment the <source> content properly?

Thanks for considering!

Naoko