Extracting only translatable text from an HTM file

Hi everyone,

A bit of forewarning: I’ve never worked with HTML/HTM/XML and I can’t seem to make sense of parsing and attributes and whatnot, so I apologize if this is a stupid question!

We’ve received a document in HTM format. It looks like this:

 

1

<table style="border-width: 0px; width: 100%; font-family: Times New Roman, Times, serif; font-size: 11pt; border-collapse: collapse; border-spacing: 0px;">

2

     <tbody>

3

         <tr>

4

             <td style="width: 6.5in;">

5

             <h2 style="font-family: Times New Roman, Times, serif; font-size: 11pt; font-weight: bold;"><span>4.3</span> <span style="padding-left: 0.78in;">Acceptance of Amendment</span></h2>

6

             </td>

7

         </tr>

8

         <tr>

9

             <td>&#160;</td>

10

         </tr>

11

     </tbody>

12

</table>

13

 

14

<table style="border-width: 0px; width: 100%; border-collapse: collapse; border-spacing: 0px;">

15

     <tbody>

16

         <tr style="font-family: Times New Roman, Times, serif; font-size: 11pt;">

17

             <td style="width: 1in; font-weight: bold;">&#160;</td>

18

             <td style="width: 5.5in; text-align: justify;">

19

             <div th:if="${policy.groupInfo.situsState == 'QC'}"><span>REDACTED may from time to time unilaterally amend this Policy. The Policyholder will be provided with a copy of the Amendment and the Effective Date of the Amendment.</span><br />

 

In this example, only the following needs to be translated: "Acceptance of Amendment" and "REDACTED may from time to time unilaterally amend this Policy. The Policyholder will be provided with a copy of the Amendment and the Effective Date of the Amendment." But when I run it through Trados Studio, I see all the text copied above. How do I extract only what needs to be translated?

Thanks!

  • Hi  

    Just to follow up on the comment from Evzen... I created an HTML file with your text and see this when I open it in Studio:

    Screenshot of Trados Studio showing HTML file with text '4.3 Acceptance of Amendment' and a paragraph with redacted content.

    If you are seeing all of the markup as translatable text, then either something is wrong with your actual file or your file type settings have gone awry. Can you share the actual file? You can email it to pfilkin@sdl.com if that is OK. Or test my sample here:

    <!DOCTYPE html>
    <html>
    <body>
    <table style='border-width: 0px; width: 100%; font-family: Times New Roman, Times, serif; font-size: 11pt; border-collapse: collapse; border-spacing: 0px;'>
      <tbody>
       <tr>
          <td style='width: 6.5in;'>
              <h2 style='font-family: Times New Roman, Times, serif; font-size: 11pt; font-weight: bold;'><span>4.3</span> <span style='padding-left: 0.78in;'>Acceptance of Amendment</span></h2>
           </td>
        </tr>
        <tr>
            <td>&#160;</td>
         </tr>
        </tbody>
    </table>
    <table style='border-width: 0px; width: 100%; border-collapse: collapse; border-spacing: 0px;'>
       <tbody>
         <tr style='font-family: Times New Roman, Times, serif; font-size: 11pt;'>
            <td style='width: 1in; font-weight: bold;'>&#160;</td>
            <td style='width: 5.5in; text-align: justify;'>
            <div th:if="${policy.groupInfo.situsState == 'QC'}">
            <span>REDACTED may from time to time unilaterally amend this Policy. The Policyholder will be provided with a copy of the Amendment and the Effective Date of the Amendment.</span><br />
           </div>
           </td>
          </tr>
         </tbody>
        </table>
    </body>
    </html>
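
For a quick check outside Studio, a generic HTML parser can list the text nodes that a well-formed file like the sample above contains. This is only a minimal sketch using Python's standard `html.parser` module. The names `TextExtractor` and `extract_text` are made up for illustration, and this is not a description of how the Studio HTML file type works internally.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect non-empty text nodes, ignoring <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &#160; into characters
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Treat non-breaking spaces as ordinary whitespace, then drop empties
        text = data.replace("\xa0", " ").strip()
        if text and not self._skip_depth:
            self.segments.append(text)


def extract_text(html):
    """Return the list of text segments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


snippet = (
    '<h2><span>4.3</span> <span style="padding-left: 0.78in;">'
    "Acceptance of Amendment</span></h2><td>&#160;</td>"
)
print(extract_text(snippet))  # ['4.3', 'Acceptance of Amendment']
```

If a file that is supposed to be HTML yields tag names and style attributes as text segments here, the markup is probably not real markup (for example, it may have been escaped or pasted in as plain text).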
    
    

    Paul Filkin | RWS Group

  • Hi Paul,
    I’m starting to think the HTM file I was provided with isn’t actual HTML, but rather text that was copied and pasted into something else (like a TXT file) and then saved as HTM, because it doesn’t look like your sample at all. The numbers appear on their own lines above each line of markup rather than next to it.
    I will contact the client.
    Thank you!
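
If the file really does contain the line numbers as standalone lines between the lines of markup, one possible workaround while waiting for the client is to filter them out before importing. The following is a rough sketch under that assumption only. The function name `strip_interleaved_numbers` is hypothetical, and the filter would also drop any legitimate line that consists of nothing but digits, so the result should be checked by hand.

```python
import re

# A line that holds only digits, optionally surrounded by whitespace
NUMBER_ONLY = re.compile(r"^\s*\d+\s*$")


def strip_interleaved_numbers(text):
    """Drop blank lines and digit-only lines; keep every other line."""
    kept = []
    for line in text.splitlines():
        # Non-breaking spaces count as blank for this check
        probe = line.replace("\xa0", " ").strip()
        if not probe or NUMBER_ONLY.match(probe):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept)


pasted = "1\n\n<table>\n\n2\n\n     <tbody>\n\n3\n\n     </tbody>\n"
print(strip_interleaved_numbers(pasted))
```

The cleaned text can then be saved under a new name and opened in a browser to confirm that it renders as a table before it goes anywhere near Studio.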