.Xlf from Typo3 contains html codes

We received .xlf files from Typo3 that contain a lot of HTML code. Is there any way to handle that in Studio?

Here is an example:

0px 0px 15px; padding:
0px; border:
0px; font-style: normal; font-variant: normal; font-weight: normal; font-stretch: inherit; font-size:
12.012px; line-height:
15.6156px; font-family:
Arial, sans-serif; vertical-align: baseline; color: rgb(102, 102, 102); letter-spacing: normal; orphans: auto; text-align: start; text-indent:
0px; text-transform: none; white-space: normal; widows:
1; word-spacing:
0px; -webkit-text-stroke-width:
0px; background-color: rgb(255, 255, 255); ">M lieferte als Ersatz einen Kaltwassersatz mit einer Leistung von 290 kW sowie sämtliche Schläuche und Elektrokabel.</p>
Swiss Nutrition Hochdorf, Werk Sulgen
<div class="reference-header" style="margin:
0px 0px 10px; padding:
10px 0px; border-width:
1px 0px; border-top-style: solid; border-bottom-style: solid; border-top-color: rgb(235, 237, 238); border-bottom-color: rgb(235, 237, 238); font-style: normal; font-variant: normal; font-weight: normal; font-stretch: inherit; font-size:

  • To give an impression of how little of the selection needs translating, I've highlighted in red the bits that won't need doing:

    0px 0px 15px; padding:
    0px; border:
    0px; font-style: normal; font-variant: normal; font-weight: normal; font-stretch: inherit; font-size:
    12.012px; line-height:
    15.6156px; font-family:
    Arial, sans-serif; vertical-align: baseline; color: rgb(102, 102, 102); letter-spacing: normal; orphans: auto; text-align: start; text-indent:
    0px; text-transform: none; white-space: normal; widows:
    1; word-spacing:
    0px; -webkit-text-stroke-width:
    0px; background-color: rgb(255, 255, 255); ">M lieferte als Ersatz einen Kaltwassersatz mit einer Leistung von 290 kW sowie sämtliche Schläuche und Elektrokabel.</p>
    Swiss Nutrition Hochdorf, Werk Sulgen
    <div class="reference-header" style="margin:
    0px 0px 10px; padding:
    10px 0px; border-width:
    1px 0px; border-top-style: solid; border-bottom-style: solid; border-top-color: rgb(235, 237, 238); border-bottom-color: rgb(235, 237, 238); font-style: normal; font-variant: normal; font-weight: normal; font-stretch: inherit; font-size:


    The code seems very formatting-heavy, but I would presume that what you want is the text between the <p> or <div> tags. Regular expressions (regexes) will be your friend. Otherwise, is there a possibility of saving the file as HTML? That would probably make it easier to detect the tags.

    Hopefully this link will help you: www.pagecolumn.com/.../all_about_html_tags.htm

  • Hi Michael,

    Interesting choice of regex there. I don't think any of these expressions will work in Studio as they are, but perhaps they will be helpful in terms of learning something. Studio uses .NET, and I believe these are some kind of POSIX or PCRE regex with unusual flags that we do not support.

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Thank you very much for your support, Michael
  • Hi Paul,

    Thanks a lot. Does that mean there is no chance to handle that file properly?
    And that I have to carefully search for the "black colored" text in order to translate the file the client sent?
    (I already asked him to provide another file format)

    Regards
    Daniela
  • Hi Daniela,

    Not at all... it just meant you might need to change the odd thing in these expressions to make them work with .NET. Just a different flavour of regex, albeit useful to see what might be required if you have to use regex.

    I think it's very hard to know what you have to do based on the detail you posted. A lot depends on what the overall XLF file looks like. So is the source equal to the target, is the target empty, or is it already partially translated? If you can handle this as a custom XML file it's going to be easier, but as Michael has mentioned already, this is a pretty messy file.
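The question of what the targets look like can be checked quickly before deciding on an approach. The following is only a minimal sketch in Python: the sample XLF, the function names `local_name` and `classify_units`, and the four category labels are all hypothetical, and a real Typo3 export may differ (namespaces, CDATA sections, extra attributes).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real Typo3 export may be structured differently.
SAMPLE = """<xliff version="1.0">
  <file source-language="de" datatype="plaintext">
    <body>
      <trans-unit id="a"><source>Hallo</source></trans-unit>
      <trans-unit id="b"><source>Welt</source><target>Welt</target></trans-unit>
      <trans-unit id="c"><source>Ja</source><target>Yes</target></trans-unit>
      <trans-unit id="d"><source>Nein</source><target></target></trans-unit>
    </body>
  </file>
</xliff>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the check works with or without one."""
    return tag.rsplit("}", 1)[-1]


def classify_units(xliff_text):
    """Count trans-units by the state of their target element."""
    counts = {"no_target": 0, "empty_target": 0, "same_as_source": 0, "translated": 0}
    root = ET.fromstring(xliff_text)
    for unit in root.iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        children = {local_name(child.tag): child for child in unit}
        source = children.get("source")
        target = children.get("target")
        if target is None:
            counts["no_target"] += 1
            continue
        source_text = "".join(source.itertext()) if source is not None else ""
        target_text = "".join(target.itertext())
        if target_text.strip() == "":
            counts["empty_target"] += 1
        elif target_text == source_text:
            counts["same_as_source"] += 1
        else:
            counts["translated"] += 1
    return counts


if __name__ == "__main__":
    print(classify_units(SAMPLE))
```

If every unit lands in `no_target`, the file has no target elements at all, which affects how it can be set up as a custom XML file type.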

    Can you share a sample file? email it to me or post it into the forum... pfilkin@sdl.com

    Regards

    Paul


  • Hi Daniela

    An alternative would be to use the "Localization Manager" of Typo3 and export as XML instead of XLIFF.

    This is the way I know most people deal with typo3 content in Studio.

    You can find more information here:
    www.loctimize.com/.../l10n-manager.html

    Walter
  • Hi, I just wanted to share the solution we looked at for this particular problem.

    1. First of all, the xlf file provided had no target elements at all in it.  We needed a target in there so we could create an xml filetype handling embedded content and just translate the target.
    2. So, we did this with a decent text editor that supports Regex.  Search for this:
      (<trans-unit id=".*?"><source>)(.*?)(</source>)(</trans-unit>)
      And replace with this:
      $1$2$3<target>$2</target>$4
    3. This gave us an xlf file that now contains target elements and they all contain a copy of the source
    4. We could then create a new xml filetype that extracted text from the target element, and we added an embedded content filter using the html filetype

    Now the file could be opened and all the tagged text was protected.
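For anyone who prefers a script to a text editor, step 2 above can be sketched in Python. This is only an illustration under stated assumptions: the function name `add_targets` and the sample unit are hypothetical, Python writes the backreferences as `\1` instead of `$1`, and the `re.DOTALL` flag is an addition (not part of the original expression) so that a source spanning several lines still matches. Like the original expression, it assumes the file contains no target elements yet and no whitespace between the tags.

```python
import re

# Same expression as in step 2. re.DOTALL is an assumption added here so that
# "." also matches line breaks inside the embedded HTML.
PATTERN = re.compile(
    r'(<trans-unit id=".*?"><source>)(.*?)(</source>)(</trans-unit>)',
    re.DOTALL,
)


def add_targets(xlf_text):
    """Copy each source into a new target element (hypothetical helper)."""
    # \1..\4 are Python's equivalents of $1..$4 in the editor replacement.
    return PATTERN.sub(r"\1\2\3<target>\2</target>\4", xlf_text)


if __name__ == "__main__":
    # Hypothetical unit with HTML spread over two lines.
    unit = '<trans-unit id="a"><source><p style="x">Hallo\nWelt</p></source></trans-unit>'
    print(add_targets(unit))
```

It is worth working on a copy of the file and checking that the number of target elements matches the number of trans-units afterwards.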

    Regards

    Paul


  • Hi Paul,

    I used this exact method on a different XLIFF file type and it saved my life. It reminded me that embedded content processors cannot be used for all kinds of file types in Studio. If I am not wrong, they can only be used for XML file types. This might be logical, but there is one feature in memoQ that I would like to see in Studio too. It is called the regex tagger. Sometimes we receive an sdlxliff file containing too many unprotected tags. I think it would be great if Studio also offered an option to protect certain words or characters as tags for every kind of file type. What do you think?

    Please let me know if you think this should be discussed in a separate topic.

  • Hi Sinan,

    I know it well... nice feature. We do have something similar that is available on the appstore: appstore.sdl.com/.../

    But in reality neither the regex tagger nor this app is going to be helpful with a heavily tagged file, so I think this approach, in the absence of an embedded content solution for XLIFF, is probably better.

    Regards

    Paul
