Preserving ASCII line breaks in HTML files

The HTML files created by our client contain CRLF line breaks (ASCII #13 and #10). They are inserted to make the source and its translation easier to view and compare in a text editor.
Studio removes or ignores these characters.

In the example below, the ¶ is inserted only to show the structure; it will not be visible in a browser:

This is the first sentence.¶
This is the second sentence.

The translated file looks like this:
Dies ist der erste Satz. Dies ist der zweite Satz.

I would like to find a way to make Studio recognize CRLF and turn it into an internal tag so that it is preserved. Can anyone think of a way to accomplish this?

  • The solution Paul suggested is usually the way to go. Note, however, that it means whitespace will always be preserved. It will preserve the line break in your example, but it will also preserve whitespace where you might not want it, for example the "simple breaks" that an HTML editor inserts automatically, like this:

    <p>In the example below, the ¶
    is only inserted to show structure, ¶
    it will not be visible in a browser:</p>

    If you or the client want to work around this conflict and distinguish between "intentional" and "automatic" line breaks, you can use the xml:space="preserve" attribute to declare the intentional ones explicitly. This would be cleaner, and you can then use the »Normalize unless xml:space="preserve"« option.

    In this case the option would import this:

    <p>In the example below, the ¶
    is only inserted to show structure, ¶
    it will not be visible in a browser:</p>

    as this in Studio (as one segment):

    In the example below, the is only inserted to show structure, it will not be visible in a browser:

    but the same paragraph with xml:space="preserve" declared on it would be imported in Studio as this (in one segment, but with breaks):

    In the example below, the ¶
    is only inserted to show structure, ¶
    it will not be visible in a browser:

    Hope that helps.
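
For anyone who wants to see the "normalize unless xml:space="preserve"" idea spelled out, here is a minimal Python sketch. It is not how Studio implements the option; it only mimics the behaviour described above under the assumption that the paragraph is well-formed XML. The function name import_text and the sample strings are made up for illustration. The samples use a plain LF, because an XML parser reports CRLF as a single LF anyway.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes the predeclared xml: prefix under its namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def import_text(paragraph_markup: str) -> str:
    """Hypothetical helper: return the text a normalizing import might produce."""
    element = ET.fromstring(paragraph_markup)
    text = "".join(element.itertext())
    if element.get(XML_SPACE) == "preserve":
        # Intentional breaks: keep the whitespace exactly as parsed.
        return text
    # Automatic breaks: collapse every whitespace run to a single space.
    return re.sub(r"\s+", " ", text).strip()


plain = "<p>This is the first sentence.\nThis is the second sentence.</p>"
marked = (
    '<p xml:space="preserve">This is the first sentence.\n'
    "This is the second sentence.</p>"
)

print(repr(import_text(plain)))
print(repr(import_text(marked)))
```

The first paragraph comes out as one line with a single space between the sentences; the second keeps its line break, which is the distinction between "automatic" and "intentional" breaks made above.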
