Any advice on how to separate, in a Word file, source paragraphs that are each followed by their translation?

Hello,

I wonder if anyone may have a suggestion to solve this problem I’m facing:

I have received a large Word file in which each paragraph in the source language is followed by its translation, then by the next source paragraph and its translation, and so on. That’s the only document available.

Out of this file we need to create a translation memory with the translations provided within.

Using Alignment with the file as it is would be beyond messy, I think.

Besides brute force, is there by any chance some way of separating/extracting the source language from the translations?

Thank you in advance for any advice/suggestion you may have.

Gilberto

  • Hi 

    Create 2 copies of the Word file, named to indicate first English and second Spanish.

    In the English document use Find and Replace as follows:

    Ctrl+H - opens 'Find and Replace' to the Replace tab.

    Find what: > Format > Language > Spanish (selecting the version of Spanish your document has)

    and

    Replace with: > Format > Language > English (selecting the version of English your document has)

    Leave the 'Find what' line blank and in the 'Replace with' line, type ^p

    Then click 'Replace All'

    This runs a Find and Replace on the Spanish text and replaces it with a paragraph mark, which should leave each English entry beginning on a new line.

    Finally, highlight the whole document and double-click on the language title on the bottom bar, which opens the Language dialog where you can 'Mark selected text' as English. Then click OK.

    Repeat the process in the second file to delete the English text fully and make the whole document Spanish.

    Then you should be able to use Alignment to produce an SDLXLIFF.

    You can then check this in the Studio Editor with a new TM added so you can confirm each segment as you check it. Or simply import the SDLXLIFF to a new TM.

    You may have to use 'trial and error' to make the process work better depending on the textual content.

    See here for a description of Translation Alignment: https://www.trados.com/solutions/translation-alignment/

    Let us know if this works OK,

    All the best,

    Ali

  • Thank you very much, Ali, for your suggestion.


    I tried to isolate the English and Spanish text the way you described, but unfortunately the document is formatted very inconsistently, and the search missed a lot of text in both languages.

    Maybe I'll try a combination of whatever this method can identify and some brute force on the segments missed by the finder.

    Thanks again.

    Gilberto
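
Since the language formatting turned out to be unreliable, one fallback might be to rely on position instead: the original post says source and translation paragraphs strictly alternate. The following is only a rough sketch in Python, not a Trados feature. It assumes the text has been saved from Word as plain text with one paragraph per line, that the source is English and the target Spanish, and that a tiny hand-picked stopword list is good enough to flag pairs that look swapped. The function names (`pair_alternating`, `guess_language`, `suspicious_pairs`, `to_tsv`) are hypothetical.

```python
import csv
import io

# Tiny illustrative stopword lists (assumption: English source, Spanish target).
EN_WORDS = {"the", "and", "of", "to", "is", "in", "that", "for", "with", "this"}
ES_WORDS = {"el", "la", "los", "las", "de", "que", "y", "en", "es", "para", "con", "una", "un"}


def guess_language(text):
    """Return 'en', 'es', or 'unknown' based on a crude stopword count."""
    words = [w.strip(".,;:!?¿¡()\"'").lower() for w in text.split()]
    en = sum(w in EN_WORDS for w in words)
    es = sum(w in ES_WORDS for w in words)
    if en == es:
        return "unknown"
    return "en" if en > es else "es"


def pair_alternating(paragraphs):
    """Pair paragraph 1 with 2, 3 with 4, and so on, ignoring empty lines."""
    cleaned = [p.strip() for p in paragraphs if p.strip()]
    if len(cleaned) % 2:
        raise ValueError("odd number of paragraphs; the alternation is broken somewhere")
    return list(zip(cleaned[0::2], cleaned[1::2]))


def suspicious_pairs(pairs, source="en", target="es"):
    """Return indexes of pairs whose guessed languages contradict the expected order."""
    flagged = []
    for i, (src, tgt) in enumerate(pairs):
        if guess_language(src) == target or guess_language(tgt) == source:
            flagged.append(i)
    return flagged


def to_tsv(pairs):
    """Serialize the pairs as tab-separated text, one source/target pair per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buf.getvalue()
```

A single missing or extra paragraph shifts every pair after it, so the flagged indexes from `suspicious_pairs` are mainly a hint about where the alternation first breaks and where manual checking should start. The tab-separated output would still need converting to whatever format the translation memory tool accepts.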
