Help with custom XML filter which no longer works

A custom XML filter for a regular project does not work for a new export from the client. The error message in Trados says 'Unable to open input file for translation. Invalid syntax found at line: 1, column: 59'. The filter worked for the last export (November last year) but not the new export. The client said that nothing has changed from their side in terms of how they export the file. Please could you provide some advice for trying to find a workaround so that I can analyse the XML in Trados? I am using 2022.

Thanks in advance,

Jane

  •  

    What is at line 1, column 59? Maybe share line 1.

    Or can you share your XML file? If you can email it to pfilkin at sdl dotcom I'd be happy to take a look at it.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Thanks for coming back to me quickly! The client said that they may have found a workaround. If not, I'll come back to you.

  •   

    Thanks for sending me the file. I got around to investigating it this evening and the problem is indeed with the source file. This segment:

      <entry>
        <className>someclass</className>
        <recordId>xxxxxxxxx</recordId>
        <originalText><![CDATA[<HTML><HEAD></HEAD><BODY>Note: The clearance is too low <7.5m.<br></BODY></HTML>]]></originalText>
        <translation><![CDATA[<HTML><HEAD></HEAD><BODY>Note: The power supply is too low <7.5m.<br></BODY></HTML>]]></translation>
        <translationVersion>2</translationVersion>
      </entry>

    The less-than symbol is a reserved character in HTML, so it should be escaped like this:

    <HTML><HEAD></HEAD><BODY>Note: The power supply is too low &lt;7.5m.<br></BODY></HTML>

    This was the only such entry. Change that and the whole file processes without a problem. The reason you may not have found this is that it was within a CDATA section in the XML, so the XML parser would have treated it as plain text. But as soon as you run it through the embedded HTML processor the error is thrown. Interestingly this forum code parser also picks it up and flags these lines with a cross :-)

    I changed the content of the text a little to ensure it's a bland example, but if you look on lines 19681 and 19682 of the original source file you'll see the problem and can correct it.
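
For anyone who wants to check a large export before importing it, here is a rough Python sketch that scans the CDATA sections of an XML file for a "<" that cannot be the start of an HTML tag. It is only a heuristic and not how Trados itself parses the file: the function name `find_stray_lt` and the rule "a '<' not followed by a letter, '/', '!' or '?' is suspicious" are my own assumptions, so it would miss something like "<x" written as plain text.

```python
import re

# Matches the content of each CDATA section (non-greedy, across line breaks).
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)

# Assumption: a "<" that is NOT followed by a letter, "/", "!" or "?"
# cannot open an HTML tag, so it is probably literal text that should be "&lt;".
STRAY_LT_RE = re.compile(r"<(?![A-Za-z/!?])")


def find_stray_lt(xml_text):
    """Return (line_number, snippet) pairs for suspicious '<' inside CDATA."""
    hits = []
    for cdata in CDATA_RE.finditer(xml_text):
        for match in STRAY_LT_RE.finditer(cdata.group(1)):
            # Position of the stray "<" in the whole file.
            pos = cdata.start(1) + match.start()
            line_no = xml_text.count("\n", 0, pos) + 1
            hits.append((line_no, xml_text[pos:pos + 20]))
    return hits


if __name__ == "__main__":
    sample = (
        "<entry>\n"
        "<originalText><![CDATA[<HTML><BODY>too low <7.5m.<br></BODY></HTML>]]></originalText>\n"
        "</entry>"
    )
    for line_no, snippet in find_stray_lt(sample):
        print(f"line {line_no}: {snippet!r}")
```

To run it on a real export, read the file into a string first (for example with `open(path, encoding="utf-8").read()`) and pass that to `find_stray_lt`.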

    I also recreated your filetype and simplified it a bit... it's notably faster.  I'll send you the updated filetype and also my corrected filetype in case you have a problem so you can test them.

    Oh yes... I found the problem by using the "divide and conquer" technique.  Once the file failed I split it, tested both halves, then split the one that failed and did it again... and again etc.  It's surprisingly fast and I was able to spot the problem before I got down to the very end.
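
The splitting described above was done by hand, but the same idea can be sketched in Python as a simple bisection. This is only an illustration: `find_failing_entry` and the `fails` callback are hypothetical names, and `fails` stands in for "build a file from these entries and see whether it opens", which in practice means testing the file in Trados yourself. It assumes the full list fails and that a chunk fails whenever it contains a bad entry.

```python
def find_failing_entry(entries, fails):
    """Return the index of one entry that makes `fails` return True.

    `fails(chunk)` is a hypothetical check supplied by the caller.
    Assumes fails(entries) is True for the full list.
    """
    lo, hi = 0, len(entries)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(entries[lo:mid]):
            hi = mid      # the problem is in the first half
        else:
            lo = mid      # otherwise it must be in the second half
    return lo


if __name__ == "__main__":
    entries = ["fine"] * 10
    entries[6] = "too low <7.5m."
    # Stand-in check: treat any chunk containing the raw "<7.5m" as failing.
    bad_index = find_failing_entry(
        entries, lambda chunk: any("<7.5m" in e for e in chunk)
    )
    print(bad_index)
```

Each round halves the search range, so even a file with tens of thousands of entries needs only around fifteen tests.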

    Paul Filkin | RWS Group
