HTML code embedded in Excel file

Hello,

I need help with a tricky Excel file. I have activated this in Studio already and it has helped a bit:

Screenshot of Trados Studio project settings showing 'Contenu incorporé' highlighted with regular expression rules for embedded content.

But as you can see, there is still HTML code embedded in the text (for non-breaking spaces, typographic apostrophes, em-dashes, etc.):

Screenshot of an Excel file in Trados Studio with HTML code visible, such as '&nbsp;' for non-breaking spaces and '&mdash;' for em-dash.

Is there a way to deal with that in Studio 2022? Here is a sample of the file in question:

Sample Excel.xlsx

I know you have answered a few questions like this one already, but I can't find anything on embedded HTML code with & and ;

Thank you!



Generated Image Alt-Text
[edited by: RWS Community AI at 4:02 AM (GMT 1) on 24 Oct 2024]

    The regular expressions provided there handle most tags as individual inline tags. This is not true HTML tagging, but it works in most cases. However, what you have are not tags but so-called entities, and the existing expressions do not cover them. You can add &[^;]*?; to capture the entities as well. This must be done BEFORE you add the Excel file.
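To see what the suggested expression picks up, here is a minimal sketch in Python. It only illustrates the pattern itself: Studio uses its own regex engine, so the result should still be checked in Studio. The sample strings and the stricter STRICT_ENTITY pattern are my own assumptions, not something taken from the file or from the reply above.

```python
import re

# Pattern quoted in the reply: "&", then as few non-";" characters
# as possible, then the closing ";".
ENTITY = re.compile(r"&[^;]*?;")

# Hypothetical sample segment (not taken from the actual Excel file).
sample = "L&rsquo;offre&nbsp;: 10&nbsp;% &mdash; maintenant"
matches = ENTITY.findall(sample)
print(matches)

# Caveat: a literal ampersand followed later by a semicolon is also
# captured, because [^;] allows spaces and any other character.
loose_hit = ENTITY.findall("Tom & Jerry; fin")
print(loose_hit)

# Assumed stricter alternative: named entities (&nbsp;), decimal
# references (&#160;) and hexadecimal references (&#xA0;) only.
STRICT_ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")
strict_hit = STRICT_ENTITY.findall("Tom & Jerry; fin")
strict_matches = STRICT_ENTITY.findall(sample + " &#160; &#xA0;")
print(strict_hit, strict_matches)
```

If the source text can contain plain ampersands, the stricter form may be the safer choice; otherwise the shorter expression from the reply should be enough.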

    Another, possibly better, solution would be to use Excel Multilingual, where you can add an HTML filter directly so that all tags and entities are processed properly. If that is not feasible, I would modify the Excel file type by adding tags as they are defined in HTML, with tag pairs and so on. That means more work, but gives much better results.
