.xlf from Typo3 contains HTML code

We received .xlf files from Typo3 that contain a lot of HTML code. Is there any way to handle this in Studio?

Here is an example:

0px 0px 15px; padding: 0px; border: 0px; font-style: normal; font-variant: normal; font-weight: normal; font-stretch: inherit; font-size: 12.012px; line-height: 15.6156px; font-family: Arial, sans-serif; vertical-align: baseline; color: rgb(102, 102, 102); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 1; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); ">M lieferte als Ersatz einen Kaltwassersatz mit einer Leistung von 290 kW sowie sämtliche Schläuche und Elektrokabel.</p>
Swiss Nutrition Hochdorf, Werk Sulgen
<div class="reference-header" style="margin: 0px 0px 10px; padding: 10px 0px; border-width: 1px 0px; border-top-style: solid; border-bottom-style: solid; border-top-color: rgb(235, 237, 238); border-bottom-color: rgb(235, 237, 238); font-style: normal; font-variant: normal; font-weight: normal; font-stretch: inherit; font-size:

  • Hi, just wanted to share the solution we looked at for this particular problem.

    1. First of all, the .xlf file provided had no target elements at all. We needed targets so that we could create an XML file type that handles embedded content and translate only the target.
    2. We added them with a decent text editor that supports regular expressions. Search for this:
      (<trans-unit id=".*?"><source>)(.*?)(</source>)(</trans-unit>)
      And replace it with this:
      $1$2$3<target>$2</target>$4
    3. This gave us an .xlf file in which every trans-unit now has a target element containing a copy of the source.
    4. We could then create a new XML file type that extracts text from the target element, and we added an embedded content processor that uses the HTML file type.

    Now the file could be opened and all the tagged text was protected.
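If you would rather script step 2 than run it in an editor, here is a minimal Python sketch of the same search and replace. It reuses the pattern from step 2 unchanged; the function name `add_targets` is hypothetical. As with a typical editor search where "." does not match line breaks, it assumes each `<trans-unit>` and its `<source>` sit on a single line with no whitespace between the elements, so units laid out differently will generally not match.

```python
import re

# Pattern from step 2, unchanged. Groups: (1) opening <trans-unit ...><source>,
# (2) the source content, (3) </source>, (4) </trans-unit>.
# Without re.DOTALL, "." does not match newlines, so each trans-unit is
# assumed to be on one line (an assumption, as in a default editor search).
TRANS_UNIT = re.compile(
    r'(<trans-unit id=".*?"><source>)(.*?)(</source>)(</trans-unit>)'
)

# Python writes group references as \g<n>; many editors use $n instead.
REPLACEMENT = r"\g<1>\g<2>\g<3><target>\g<2></target>\g<4>"


def add_targets(xlf_text: str) -> str:
    """Hypothetical helper: copy each <source> into a new <target> element."""
    return TRANS_UNIT.sub(REPLACEMENT, xlf_text)


if __name__ == "__main__":
    sample = '<trans-unit id="1"><source>&lt;p&gt;Hallo&lt;/p&gt;</source></trans-unit>'
    print(add_targets(sample))
```

To apply it to a whole file, read the file as UTF-8, pass the text through `add_targets`, and write the result to a new file so the original stays untouched. Trans-units that already have a target are not matched, because the pattern requires `</trans-unit>` directly after `</source>`.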

    Regards

    Paul

    Paul Filkin | RWS Group

  • Hi Paul,

    I have used this exact method on a different XLIFF file type and it saved my life. It also reminded me that Studio cannot use embedded content processors with every file type. If I am not wrong, embedded content processors can only be used with XML file types. That may be logical, but there is one feature in memoQ that I would like to see in Studio too: the regex tagger. Sometimes we receive an SDLXLIFF file containing many unprotected tags. It would be great if Studio also offered an option to protect certain words or characters as tags in every kind of file type. What do you think?

    Please let me know if you think this should be discussed in a separate topic.
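The regex tagger idea can be pictured with a short Python sketch: every match of a user-supplied regular expression becomes a protected tag, and only the text between matches stays editable. This is a conceptual illustration only; `Tag` and `regex_tag` are hypothetical names, and it is neither how memoQ implements the feature nor any Studio API.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Tag:
    """A protected run of text, shown as a tag and not editable as text."""
    text: str


Part = Union[str, Tag]


def regex_tag(segment: str, pattern: str) -> List[Part]:
    """Hypothetical regex tagger: turn every match of `pattern` into a Tag.

    Text between matches is kept as plain, translatable strings.
    Zero-length matches are ignored, and empty text runs are dropped.
    """
    parts: List[Part] = []
    pos = 0
    for match in re.finditer(pattern, segment):
        if match.start() == match.end():
            continue  # nothing to protect
        if match.start() > pos:
            parts.append(segment[pos:match.start()])
        parts.append(Tag(match.group(0)))
        pos = match.end()
    if pos < len(segment):
        parts.append(segment[pos:])
    return parts


if __name__ == "__main__":
    # Protect anything that looks like an HTML/XML tag left as plain text.
    print(regex_tag("Click <b>here</b> now", r"</?[A-Za-z][^>]*>"))
```

Because each `Tag` keeps the matched text, joining the parts back together reproduces the original segment, which is what a tool would need in order to restore the tags on export.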
