Issue with Backslashes Doubling in Markdown Export from Trados

I encountered an issue while translating Markdown documents in Trados.

When I export the translated text, all the backslash (\) characters are being converted into double backslashes (\\). 

before:

Screenshot of Trados Studio showing a mathematical equation with single backslashes in the source text.

after:

Screenshot of Trados Studio displaying the same mathematical equation with double backslashes in the exported translated text.

I need to know what is causing this issue and how I should resolve it.



Generated Image Alt-Text
[edited by: Trados AI at 1:36 PM (GMT 0) on 29 Feb 2024]
  •  

    Actually I suspect this is as designed because the \ character needs to be escaped.
    How familiar are you with Markup and is this the first time you have noticed this happening?

    Lyds
     

    Lydia Simplicio | RWS Group

    _______
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Lyds:

    I found this problem today when I tried to translate an .md file directly.

    To be honest, I am not familiar with markup, but this should not happen in Markdown formulas.

  •  

    The first problem is that your Markdown file contains a LaTeX math equation, and Trados Studio doesn't support it anyway. So when you open it and preview it you see this:

    Screenshot of the MD file open in Trados Studio and the preview generated.

    If you open the same file with a Markdown editor that supports LaTeX you see this:

    Screenshot of the MD file open in Typora displaying a Maths Equation created from LaTeX markup.

    The second problem is that Studio seems to have a bug (I believe): it is not handling the escape characters correctly in the LaTeX block, and this causes the target file to get messed up.

    The only way I could find to handle this is to wrap the LaTeX math equations so that Studio sees them as a code block, and then use the embedded content filter to hide them altogether... which I presume is OK as they are probably not translated? If they are, you can still take this approach, but you'd have to write specific regex rules to handle the LaTeX and this could get complicated.

    So for example your code looks like this:

    But if I wrap it with backticks to tell the processor it's a code block like this:

    Now I can use the embedded content processor in the Markdown filetype to process the file and exclude all the content in the code block from being extracted for translation at all:

    Screenshot showing the Markdown filetype options and the embedded content processor is selected and set to use a custom processor called LaTeX

    1. Under the filetype settings I activated the "Translate code blocks" and selected a custom embedded content processor I created to handle the content of the code block
    2. Under the Embedded Content Processors I created a custom "Plain Text" filetype and called it LaTeX

    The LaTeX filetype I created just does one thing. It has a single placeholder rule:

     .*

    This will select everything in each line and make it non-translatable. Now when I open the file with the equation wrapped in backticks I see this and can translate it safely:

    Screenshot showing the edited MD file in the Studio editor and fully translated into Romanian.
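
    To see why a single catch-all rule is enough here, the sketch below applies the same pattern line by line. It uses Python's re module as a stand-in for Studio's regex engine, which is an assumption, and the function name placeholder_spans and the sample equation are made up for illustration. Because "." does not match a newline by default, ".*" captures each line in full and stops at the line end.

```python
import re

# The single placeholder rule from the post.
PLACEHOLDER_RULE = re.compile(r".*")


def placeholder_spans(block: str) -> list:
    """Return what the rule captures on each non-empty line of a code block.

    Hypothetical helper: it only imitates the idea of one placeholder per
    line, not how Studio implements embedded content processing.
    """
    return [PLACEHOLDER_RULE.match(line).group(0)
            for line in block.splitlines() if line]


if __name__ == "__main__":
    # Made-up equation, standing in for the one in the screenshots.
    sample = "$$\n\\frac{a}{b} = c\n$$"
    for span in placeholder_spans(sample):
        print(repr(span))
```

    Every line comes back whole, backslashes included, so nothing inside the block is left over as translatable text.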

    When I save the target, and remove the backticks, then preview it in my MD editor it looks like this:

    Screenshot showing the correctly translated file and the LaTeX Math Equation still intact.

    All good.  This was the only way I found to manage this for now... maybe someone has a better idea?

    If you have many files, or many of these equations in your file, then you could add the backticks with a script and also remove them with a script. The only real drawback of this method is that if you already have some code blocks in your file then you might have to get a bit more clever with the custom LaTeX filetype I used to handle the code block content, and create specific rules to handle them rather than a catch-all.
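
    A minimal sketch of such a script, in Python, could look like the following. It assumes each equation is a display block delimited by $$ (the notation mentioned later in this thread), that no equation already sits inside a code fence, and that a three-backtick fence on its own line is what Studio treats as a code block. The names wrap_math and unwrap_math are made up for illustration.

```python
import re

# Assumption: a fence of three backticks marks a code block.
FENCE = "`" * 3

# Assumption: equations are delimited by $$ ... $$ and may span lines.
MATH_BLOCK = re.compile(r"\$\$.*?\$\$", re.DOTALL)

# A $$ block with a fence line directly before and after it.
WRAPPED_BLOCK = re.compile(
    re.escape(FENCE) + r"\n(\$\$.*?\$\$)\n" + re.escape(FENCE),
    re.DOTALL,
)


def wrap_math(text: str) -> str:
    """Put a backtick fence around every $$...$$ block (run before import)."""
    # A function is used as the replacement so that backslashes in the
    # equation are copied literally instead of being read as regex escapes.
    return MATH_BLOCK.sub(lambda m: f"{FENCE}\n{m.group(0)}\n{FENCE}", text)


def unwrap_math(text: str) -> str:
    """Remove the fences added by wrap_math (run on the exported target)."""
    return WRAPPED_BLOCK.sub(lambda m: m.group(1), text)


if __name__ == "__main__":
    # Made-up document, standing in for the real Markdown file.
    source = "Intro\n\n$$\n\\frac{a}{b}\n$$\n\nOutro\n"
    wrapped = wrap_math(source)
    print(wrapped)
    print(unwrap_math(wrapped) == source)
```

    Note that unwrap_math would also strip a fence from any pre-existing code block that contains nothing but a $$ equation, which is the same drawback described above.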

    In the meantime I'll report the problem with the handling of the characters to support.

    Paul Filkin | RWS Group

  • Actually, I have found a way to work around this problem temporarily. I just changed the Markdown file into a .txt file, and the structure can be recognized.

    Screenshot of Trados Studio showing a text with mathematical equations and variables, with 'NMT' labels indicating machine translation on the right side in English and Chinese.

    But it is still a temporary solution, the problem needs to be fixed.

  •   

    That's essentially what I did in the solution above, but only for the specific code you shared.  If your way works for you that's good... but if you had a lot of proper markdown in the file in addition to these maths equations then you lose the recognition by using a text file for the whole file.

    "But it is still a temporary solution, the problem needs to be fixed."

    Of course... and I have already logged it.

    Paul Filkin | RWS Group

  •   

    In my earlier response, I replied to your comment:

    "But it is still a temporary solution, the problem needs to be fixed."

    with: "Of course... and I have already logged it."

    After some discussion with the team I now agree with them that this isn't a bug.  Markdown uses backslashes to escape many characters... for example:

    \ backslash
    ` backtick
    * asterisk
    _ underscore
    {} curly braces
    [] square brackets
    () parentheses
    # hash mark
    + plus sign
    - minus sign (hyphen)

    This means, as Lydia mentioned, that you would have to escape each backslash in your document in order to have it treated as a literal backslash. However, Trados Studio has no mechanism for doing this because it tries, in its wisdom, to remove the need by automatically tagging a backslash when it sees one, for your convenience. This is fine when you have just one or two scattered through your file, but with something as complex as your LaTeX equation it clearly causes inconsistent behaviour. We will not attempt to address that, because it really requires an enhancement that properly supports LaTeX equations by recognising the $$ at the start and end of your code.
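
    The escaping convention can be illustrated with a small Python sketch. This is not how Trados Studio is implemented; the function names escape_backslashes and render_escapes are invented here, and the character set is simply the list given above. The first function mimics the doubling seen in the exported file, and the second approximates how a Markdown renderer reads backslash escapes.

```python
import re

# A backslash followed by one of the characters from the list above.
ESCAPED_CHAR = re.compile(r"\\([\\`*_{}\[\]()#+\-])")


def escape_backslashes(text: str) -> str:
    """Double every backslash so Markdown reads each one as a literal."""
    return text.replace("\\", "\\\\")


def render_escapes(text: str) -> str:
    """Approximate a Markdown renderer: drop the escaping backslash."""
    return ESCAPED_CHAR.sub(lambda m: m.group(1), text)


if __name__ == "__main__":
    # Made-up equation fragment, standing in for the one in the thread.
    equation = r"\frac{a}{b}"
    exported = escape_backslashes(equation)
    print(exported)
    print(render_escapes(exported))
    print(render_escapes(r"\*not emphasis\*"))
```

    For ordinary Markdown text the round trip is harmless, since the renderer turns the doubled backslash back into a single one. Inside a $$ block the text goes to a LaTeX processor instead, where a doubled backslash is a line break rather than the start of a command, which is why the equation breaks.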

    So I recommend you raise an enhancement request for this here:

     Trados Studio Ideas 

    I think the solution could be as simple as adding the $$ notation to the concept of a code block, which should be trivial, or as complex as adding proper LaTeX support so the equation is also rendered in the preview. But the product manager can review your idea when you post it.

    In the meantime your workarounds are these:

    1. convert to txt and create all the rules you need manually, or

    2. add backticks around your LaTeX block so that the equations can be handled as needed.

    Paul Filkin | RWS Group
