XML files containing HTML encodings

Hi,

We encounter a problem when analysing a French XML file in SDL Studio 2017.
We cannot get the French accented characters to display correctly. The source file encodes these characters as HTML entities, and in Studio these entities appear as tags.
Do you know a solution for this problem?

Kind regards and thanks in advance,
Margo

  • Check the entity processing settings of your HTML parser; they can be the reason for the wrong characters. Otherwise, please post an extract of your XML here that includes text with those French letters.

    _________________________________________________________

    When asking for help here, please be as accurate as possible. Please always remember to give the exact version of product used and all possible error messages received. The better you describe your problem, the better help you will get.

    Want to learn more about Trados Studio? Visit the Community Hub. Have a good idea to make Trados Studio better? Publish it here.

  • Hi Jerzy,

    Thanks for your reaction.

    I already tried several HTML entity settings, but all in vain up to now.

    Please find an extract below.

    Kind regards,

    Margo

    <sitecore>
     
      <phrase path="/sitecore/content/Sites/France/Home/Self storage in France/Avignon/Avignon" key="Avignon" itemid="{B48AB076-9DF9-4DA3-A130-50E3DD8B3E2D}" fieldid="General information" updated="20170626T083947Z">
        <fr-FR>&lt;h2&gt;&amp;Agrave; propos de Shurgard Avignon&lt;/h2&gt;
    &lt;p&gt;Notre centre de stockage &amp;agrave; Avignon a r&amp;eacute;cemment &amp;eacute;t&amp;eacute; renouvel&amp;eacute;. Le site offre toutes les fonctionnalit&amp;eacute;s n&amp;eacute;cessaires pour r&amp;eacute;pondre &amp;agrave; vos besoins.&amp;nbsp;&lt;/p&gt;
    &lt;ul&gt;
        &lt;li&gt;672 espaces de stockage au rez-de-chauss&amp;eacute; ou &amp;agrave; l'&amp;eacute;tage.&lt;/li&gt;
        &lt;li&gt;Box de stockage avec acc&amp;egrave;s direct permettant de vous garer devant la porte.&lt;/li&gt;
        &lt;li&gt;Ascenseur et chariots pour d&amp;eacute;placer facilement vos affaires.&lt;/li&gt;
        &lt;li&gt;Un acc&amp;egrave;s direct int&amp;eacute;rieur qui peut accueillir tout type de v&amp;eacute;hicule.&lt;/li&gt;
        &lt;li&gt;Des places de parking disponibles juste devant l'accueil.&lt;/li&gt;
    &lt;/ul&gt;
    &lt;p&gt;Informations pour acc&amp;eacute;der au centre de stockage&amp;nbsp;:&lt;/p&gt;
    &lt;ul&gt;
        &lt;li&gt;Shurgard est situ&amp;eacute; proche du centre commercial Cap Sud.&lt;/li&gt;
        &lt;li&gt;Par la route de Marseille, prenez le rond-point du Lac de Saint Chamand.&lt;/li&gt;
        &lt;li&gt;2 arr&amp;ecirc;ts de bus en direction du centre-ville d'Avignon ou de Montfavet.&lt;/li&gt;
    &lt;/ul&gt;</fr-FR>
      </phrase>
      
     
    </sitecore>

  • Is this the source of your XML?
    In that case you would need to process the entities before you open the file for translation. I see it is Sitecore; I dealt with that format quite a long time ago. If you like, please send me a complete file at jerzy at czopik dot com so that I can create a file type for it.
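
A minimal sketch of what such a pre-processing step could look like, written in Python rather than with anything Studio provides. It assumes the language element is called fr-FR as in the extract above; the function names decode_char_entities and preprocess are made up for illustration. It turns HTML character entities such as &eacute; into the real characters and keeps &lt;, &gt; and &amp; escaped so that the embedded markup stays intact.

```python
import html
import re
import xml.etree.ElementTree as ET

# Entities that stay escaped so the embedded HTML markup is not altered.
KEEP = {"lt", "gt", "amp", "quot", "apos"}

# Named entity references such as &eacute; or &Agrave;
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_char_entities(text):
    """Replace named character entities by the characters they stand for."""
    def repl(match):
        if match.group(1) in KEEP:
            return match.group(0)
        # html.unescape leaves unknown entity names untouched.
        return html.unescape(match.group(0))
    return ENTITY.sub(repl, text)


def preprocess(xml_text, lang_tag="fr-FR"):
    """Decode character entities inside every <lang_tag> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(lang_tag):
        if elem.text:
            elem.text = decode_char_entities(elem.text)
    # ElementTree re-escapes <, > and & when serialising.
    return ET.tostring(root, encoding="unicode")


sample = (
    '<sitecore><phrase key="Avignon"><fr-FR>'
    "&lt;h2&gt;&amp;Agrave; propos de Shurgard Avignon&lt;/h2&gt;"
    "</fr-FR></phrase></sitecore>"
)
print(preprocess(sample))
```

After this step the accented letters are ordinary characters in the XML, while the HTML tags are still escaped exactly as before.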


  • Hi,

    If all your files look like this, all you need to do is process the fr-FR element with an embedded content rule, and the HTML filter will handle these entities nicely. Try the attached:

    Margo Van Thienen.zip
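
To illustrate why an embedded content rule helps, here is a conceptual sketch in Python. It is not how Studio implements its file types; the class SegmentCollector and the function extract_embedded are invented for this example. The XML parser first unescapes the content of fr-FR, which yields plain HTML, and a second HTML pass then resolves the tags and the character entities.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class SegmentCollector(HTMLParser):
    """Collects HTML tag names and translatable text from embedded content."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode &eacute; etc. in text.
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())


def extract_embedded(xml_text, lang_tag="fr-FR"):
    """First pass: XML. Second pass: HTML on the text of each <lang_tag> element."""
    root = ET.fromstring(xml_text)
    collector = SegmentCollector()
    for elem in root.iter(lang_tag):
        collector.feed(elem.text or "")
    collector.close()
    return collector.tags, collector.texts


sample = (
    '<sitecore><phrase key="Avignon"><fr-FR>'
    "&lt;h2&gt;&amp;Agrave; propos&lt;/h2&gt;"
    "&lt;p&gt;r&amp;eacute;cemment renouvel&amp;eacute;&lt;/p&gt;"
    "</fr-FR></phrase></sitecore>"
)
tags, texts = extract_embedded(sample)
print(tags)
print(texts)
```

With only the XML pass, the text still contains strings like &Agrave;, which is why they show up as tags in the editor; the second pass is what turns them into readable characters.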

    Paul Filkin | RWS Group


  • BTW, I really wonder what the person with the "bright" idea of naming the element after the language abbreviation was actually smoking :(
    I hope that "smart guy" will burn in hell forever...

    EDIT:
    Just in case you don't get what I'm talking about...
    Imagine you get source files in English that are supposed to be localized into a dozen languages... so after translation you get a bunch of files in each target language, but all of them still contain the translations in <en-US> elements...
    So one needs to include an extra post-processing step in the process... just because someone back at Sitecore didn't bother to use their brain :-(
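
For what it's worth, that extra post-processing step could be as small as the following Python sketch. It assumes the translated file keeps the source element name, as described above; the function name rename_language_elements and the target code de-DE are only examples.

```python
import xml.etree.ElementTree as ET


def rename_language_elements(xml_text, source_tag, target_tag):
    """Rename every <source_tag> element to <target_tag> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(source_tag):
        elem.tag = target_tag
    return ET.tostring(root, encoding="unicode")


translated = (
    '<sitecore><phrase key="Avignon">'
    "<en-US>Hallo Welt</en-US>"
    "</phrase></sitecore>"
)
print(rename_language_elements(translated, "en-US", "de-DE"))
```

Whether the importing system really expects the target language code as the element name is something to confirm with whoever maintains the Sitecore side.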
