Protecting "\n" in XML files

Hi there,

I have to import some XML files which have a structure like:

<resources>
<string name="label_transfer_ui_message_warning">Once the code is generated the selected feature(s) will be locked from use.</string>
<string name="label_transfer_contact_dealer_message">To complete the transfer process contact your dealer by phone, or in person. \n\n You will need to provide your transfer code as well as old and new device serial numbers.</string>
</resources>

I have already created a custom XML file type and enabled the HTML embedded content processor, as we have CDATA sections with HTML tags in them.

What I still need to handle are these "\n" sequences. I'd like to convert these line breaks into tags in order to protect them.

How can I do this? I don't see any way to do it with XPath, and I can't find anywhere to use regular expressions.

Thank you,
Enrico
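For reference, the "\n" in these strings is a two-character escape (a backslash followed by "n"), not a real newline, so any regular expression aimed at it has to escape the backslash. The minimal Python sketch below illustrates that outside of Trados Studio, assuming you were willing to pre-process the files: it swaps each literal "\n" for a placeholder tag and swaps it back afterwards. The tag name `<lb/>` and both function names are hypothetical; a real custom XML file type would need a matching rule so the tag is treated as a placeable.

```python
import re

# Matches the two-character escape sequence backslash + "n" as it appears
# in the <string> resources, NOT an actual newline character.
LITERAL_NEWLINE = re.compile(r"\\n")

# Hypothetical placeholder tag; the name "lb" is an assumption, and the
# custom XML file type would need a matching tag rule for it.
PLACEHOLDER = "<lb/>"


def protect_line_breaks(text: str, tag: str = PLACEHOLDER) -> str:
    """Replace every literal \\n escape with a placeholder tag."""
    # A function replacement stops re.sub from interpreting backslashes in `tag`.
    return LITERAL_NEWLINE.sub(lambda _m: tag, text)


def restore_line_breaks(text: str, tag: str = PLACEHOLDER) -> str:
    """Turn the placeholder tags back into literal \\n escapes after translation."""
    return text.replace(tag, "\\n")


if __name__ == "__main__":
    sample = r"contact your dealer by phone, or in person. \n\n You will need"
    protected = protect_line_breaks(sample)
    print(protected)
    # The round trip must give back the original string unchanged.
    assert restore_line_breaks(protected) == sample
```

Note that this simple pattern cannot tell an escaped backslash followed by "n" apart from a real "\n" escape, so it is only a sketch for files like the sample above.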

Parents
  • You do it like this.  I used this example file:

    <?xml version="1.0" encoding="UTF-8"?>
    <rootelement>
      <resources>
    <string name="label_transfer_ui_message_warning">Once the code is generated the selected feature(s) will be locked from use.</string>
    <string name="label_transfer_contact_dealer_message">To complete the transfer process contact your dealer by phone, or in person. \n\n You will need to provide your transfer code as well as old and new device serial numbers.</string>
    </resources>
      <MC>
    <![CDATA[
    <html>
      <head>
        <title>Div Align Attribute</title>
      </head>
      <body>
        <div align="left">
          Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut
          labore et dolore magna aliqua.
        </div>
        <div align="right">
          Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut
          labore et dolore magna aliqua.
        </div>
        <div align="center">
          Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut
          labore et dolore magna aliqua.
        </div>
        <div align="justify">
          Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut
          labore et dolore magna aliqua.
        </div>
      </body>
    </html>
    ]]>
      </MC>
     </rootelement>

    And using segmentation rules on my TM I can achieve this:

    [Screenshot: Trados Studio showing XML code with warning messages about feature lock and dealer contact instructions.]

    The problem is that the embedded content processing lets you use EITHER the HTML embedded content processor OR the text one (with regex), not both. So if you use the HTML embedded content processor for the CDATA, you need something else to handle these characters. I used segmentation rules like these:

    [Screenshot: Trados Studio's Segmentation Rules window with a red indicator next to 'Full stop rule'.]

    [Screenshot: Trados Studio's Edit Segmentation Rule window with a regular expression for line breaks.]

    [Screenshot: Trados Studio's Edit Segmentation Rule window with a modified regular expression for line breaks.]

    Then I just need to filter out the \n\n segments, which is straightforward enough.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub



    Generated Image Alt-Text
    [edited by: Trados AI at 4:38 AM (GMT 0) on 5 Mar 2024]
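The segmentation approach in the reply above (break at the "\n" escapes, then filter out the segments that contain only "\n\n") can be simulated in a few lines of Python to check a pattern before putting it into a TM's segmentation rules. This is a minimal sketch under stated assumptions: the exact expressions in the screenshots are not recoverable, so the pattern here simply breaks around any run of literal "\n" escapes, and `segment` and `translatable` are hypothetical helper names, not Studio functionality.

```python
import re

# Assumption: break around every run of literal \n escapes. This only mimics
# the idea of the segmentation rules; the screenshots' exact regexes are unknown.
ESCAPE_RUN = re.compile(r"((?:\\n)+)")  # capturing group keeps each run as its own piece
ONLY_ESCAPES = re.compile(r"(?:\\n)+")


def segment(text: str) -> list[str]:
    """Split text so each run of \\n escapes ends up in its own segment."""
    pieces = (piece.strip() for piece in ESCAPE_RUN.split(text))
    return [piece for piece in pieces if piece]  # drop empty or whitespace-only pieces


def translatable(segments: list[str]) -> list[str]:
    """Mimic filtering out the segments that contain nothing but \\n escapes."""
    return [s for s in segments if not ONLY_ESCAPES.fullmatch(s)]


if __name__ == "__main__":
    source = (r"To complete the transfer process contact your dealer by phone, "
              r"or in person. \n\n You will need to provide your transfer code.")
    for s in segment(source):
        print(repr(s))
    print(translatable(segment(source)))
```

Keeping the escapes in their own segments is what makes the later filtering step easy: those segments are identical in source and target, so they can be hidden rather than translated.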