Custom XML filetype adds unexpected spacing in target files

Hi,

I created a custom XML filetype to handle a file format. The filetype works perfectly well on input:

  1. it correctly recognizes the file structure
  2. it correctly extracts translatable content
  3. it correctly converts entities into placeholders
  4. it generates the target file (here comes the issue)

I imported the target file into the app the source file came from, only to find that some of the text had gone missing. Comparing the source and target XML files, I found that Trados Studio adds whitespace inside the closing tags, which breaks the import.

Here you can see a comparison:

Side-by-side comparison of XML code in Trados Studio. Left side shows original code with green boxes highlighting closing tags and pink boxes around source text. Right side shows target text in pink boxes and additional spaces in red boxes.

On the left you can see:

the original closing tags, marked in green boxes

the source text, in pink boxes

On the right:

the correct target text, in pink boxes

some unexpected additional spaces, in red boxes

NOTE: the affected characters are not extracted as translatable content! Even with the "show all content" filter, they are not visible to the user.

I manually removed them, and it worked! I checked the filetype settings, and apparently the only settings related to spacing are the "Whitespaces" options, which I didn't edit:

Screenshot of Trados Studio Project Settings dialog showing 'Whitespaces' options under 'File Types'. Options for 'Whitespace in content' and 'Whitespace in tags' are visible.

I also checked the "Writer" options, since this is definitely something "written" in the target files:

Screenshot of Trados Studio Project Settings dialog showing 'Writer' options under 'File Types'. Options for 'Unicode UTF-8 byte order mark (BOM)' and 'xml:lang attribute values' are visible.

For the time being, I'm batch finding and replacing these spaces using a tool, but it would definitely be better to have this fixed.
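A minimal sketch of that kind of batch clean-up, assuming the extra whitespace sits between the element name and the final `>` of an end tag (for example `</item >`). The function name and the sample string are hypothetical, and a plain regular expression like this would also touch matching text inside CDATA sections or comments, so it is a workaround rather than a proper XML rewrite:

```python
import re

# Matches an end tag such as "</name   >" and captures everything before the
# stray whitespace. Assumption: the unwanted whitespace is directly before ">".
END_TAG_WITH_SPACE = re.compile(r"(</[^\s<>]+)\s+>")


def strip_end_tag_whitespace(xml_text: str) -> str:
    """Remove whitespace between the element name and '>' in end tags."""
    return END_TAG_WITH_SPACE.sub(r"\1>", xml_text)


# Hypothetical sample: one end tag with a space, one with a line break.
sample = '<item id="1">Hello</item >\n<note>World</note\n>'
cleaned = strip_end_tag_whitespace(sample)
print(cleaned)
```

To apply it to real files, the same function could be run over the text of each exported target file before re-importing it.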



[edited by: RWS Community AI at 8:18 AM (GMT 0) on 15 Nov 2024]
  •   

    Indeed, I also cannot prevent these spaces from being added.  Even adding xml:space="preserve" to the XML seems to be ignored.  I created a support ticket so support can take a look, and if the issue is verified they will log a bug for resolution.

    Case Details - 00830442 for reference.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi

    Do you have any update for us? This is quite problematic, as we need to fix a lot of exported files in order to be able to re-import them into the customer's system.

    Looking forward to your feedback!

    KR,
    Julian

  •  

    Yes... apologies for not coming back.  I had an answer and forgot to tell you!  This is the outcome:

    We fully embrace the semantic functionality of XML. This means, for example, that these sorts of things will not be considered bugs:

    • reordering attribute values,
    • automatic normalization of the whitespaces between element attributes (does not apply to attribute values),
    • whitespace changes outside of the content,
    • changing empty tags to self-closing (<a></a> to <a/>)

    The problem you have described fits into the "whitespace changes outside of the content" category, so our development team will not do anything about this.

    In general, I am of the opinion that there is "being technically correct" and "being customer correct", and sometimes I think we can be a little too rigid when it comes to handling customer-specific data where the solution a particular customer uses is not capable of properly handling valid XML.  However, in this specific scenario I do think that the problem is one that your customer should address.  The XML specification, and plenty of discussions of this sort of issue on technical forums, conclude that this whitespace is insignificant and that every app should parse the file whether or not the extra whitespace is there.
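As an illustration of that point, the sketch below uses Python's standard `xml.etree.ElementTree` (chosen here only for demonstration; nothing in this thread says which parser the customer's system uses) to show that a conforming parser treats the changes listed above as the same document. Both sample strings are made up, and `canonical` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Made-up samples: the second one reorders attributes, adds whitespace inside
# the tags, and writes the empty element as self-closing.
original = '<root><a></a><b x="1" y="2">text</b></root>'
rewritten = '<root ><a/><b y="2"   x="1" >text</b ></root >'


def canonical(xml_text: str) -> str:
    """Return the canonical (C14N 2.0) form, which drops syntactic differences."""
    return ET.canonicalize(xml_text)


# True when the two strings describe the same XML document.
same = canonical(original) == canonical(rewritten)
print(same)
```

Whitespace inside element content or attribute values is a different matter: that is part of the data, and it is not covered by this comparison of tag syntax.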

    I think, if this is a problem that your customer won't address, then I would run a script and just add the whitespace back into the files.  It is probably trivial enough and, given they seem to be able to handle the whitespace, you could add it as a matter of course to every self-closing tag without having to do this manually for each file.
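A minimal sketch of such a script, assuming the importing system expects a space before `/>` in self-closing tags (`<br />` rather than `<br/>`). The function name and sample string are hypothetical, and because this is a plain text substitution it would also change any `/>` that happens to appear in text content, comments, or CDATA:

```python
import re

# Matches "/>" that is not already preceded by whitespace.
SELF_CLOSING_NO_SPACE = re.compile(r"(?<!\s)/>")


def add_self_closing_space(xml_text: str) -> str:
    """Insert a single space before '/>' where none is present."""
    return SELF_CLOSING_NO_SPACE.sub(" />", xml_text)


# Hypothetical sample: two tags need the space, the third already has it.
sample = '<br/><img src="a.png"/><hr />'
fixed = add_self_closing_space(sample)
print(fixed)
```

Running the function a second time leaves the text unchanged, so it can safely be applied to every file as a matter of course.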

    Paul Filkin | RWS Group
