How to handle a project with 80.000 words/200 files with many repetitions?

Dear colleagues,

Does anyone have a good working procedure for a huge project of 80.000 words in Studio 2014? I don’t know how to tackle the project so that I can work quickly and efficiently.

The project consists of 200 files; only 27.000 words out of the 80.000 words are ‘new/no match’.

Most of the rest are internal repetitions, but the problem with these repetitions is that an individual word is often a title (capitalised) in one place and an image caption (lower case) in another, etc., so they have to be considered case by case...

Do you think it makes sense to export, for instance, all the text that is not an internal repetition to a separate file, and to translate this single Studio file first before working my way through all 80.000 words?

Or would you suggest another working method that would be fast? The main aim is not to have to read the 40.000 repeated words again and again.

Your opinion is highly appreciated.

Have a nice day,


  • Hi Phil,

    I'll work on the basis that you still want to see these segments as the context may be useful.  So in this case, if it was me I'd do this:

    • First I'd run the analyze batch task and "Export frequent segments"
    • This will create a folder in your Project Folder called "Exports" and in there you'll find an sdlxliff that contains the segments that are repeated throughout your project.
    • Add this file to your Project and translate it first.  Then run a pretranslate across your entire project... maybe lock 100% matches automatically too and then when you come to translate the files these segments will be skipped as you work so you don't have to worry about them.
    I don't know what type of content you have and this may change the approach a little.  If for example it was full of number only segments then I'd probably handle these first using the SDLXLIFF Toolkit.
    But perhaps this will help, and maybe someone else can suggest a better way based on their experience.
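To illustrate the idea behind exporting repeated segments, here is a minimal Python sketch. It is not Studio's own code or API: the `frequent_segments` function, the `min_occurrences` threshold and the sample `project` data are all made up for illustration. It simply counts how often each source segment occurs across the files, comparing them case-sensitively, so a capitalised title and a lower-case caption count as different segments.

```python
from collections import Counter

def frequent_segments(files, min_occurrences=2):
    """Return source segments that occur at least `min_occurrences` times
    across all files. Comparison is case-sensitive on purpose."""
    counts = Counter(seg for segs in files.values() for seg in segs)
    return {seg: n for seg, n in counts.items() if n >= min_occurrences}

# Hypothetical project: file name -> list of source segments
project = {
    "a.docx": ["Overview", "See the overview below.", "Installation"],
    "b.docx": ["Overview", "overview", "Installation"],
}

repeated = frequent_segments(project)
for segment, n in repeated.items():
    print(f"{n}x  {segment}")
```

In this toy data "Overview" and "overview" are counted separately, which is why each variant still needs its own translation.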

    Paul Filkin | RWS Group


  • Hey Paul,

    Thanks for your answer.

    In this case, the export file will only contain the "most frequently" repeated segments, so in the second step (processing all the other files) there can still be a great many words left to translate that occur only once and were therefore not included in the export file (so there would be two translation steps plus one revision step).

    Ideally, there would be a workflow in which all the 'no match' text can be exported to an external file for translation (creating a separate TM), and the 80.000 words are then pretranslated with this separate TM.

    In the final step (proofreading), once a frequently repeated segment has been proofread and is completely OK, it should be applied as is in all the remaining files, and possibly locked and highlighted in another colour. This last step in particular is crucial for efficiency in my eyes. Is this possible in Studio?
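The workflow I have in mind could be sketched roughly like this. It is plain Python for illustration only, not something Studio provides in this form: the `Segment` class, `unique_sources`, `pretranslate` and the sample data are all hypothetical, and the "TM" is just a dictionary of exact, case-sensitive matches.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    source: str
    target: str = ""
    locked: bool = False

def unique_sources(files):
    """Collect every distinct source segment once, in order of first appearance."""
    seen = {}
    for segs in files.values():
        for seg in segs:
            seen.setdefault(seg.source, None)
    return list(seen)

def pretranslate(files, tm, lock=True):
    """Apply exact TM matches to all unlocked segments and optionally lock them.
    Returns the number of segments filled."""
    filled = 0
    for segs in files.values():
        for seg in segs:
            if not seg.locked and seg.source in tm:
                seg.target = tm[seg.source]
                seg.locked = lock
                filled += 1
    return filled

# Hypothetical project with two files
files = {
    "a": [Segment("Title"), Segment("Body text")],
    "b": [Segment("Title"), Segment("title")],
}

to_translate = unique_sources(files)          # translate each of these once
tm = {"Title": "Titel", "title": "titel"}     # stand-in for the separate TM
filled = pretranslate(files, tm)
```

Each distinct segment is translated and proofread once, then applied and locked everywhere it occurs, so only the segments without a match remain open.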

    Forgive my English if I have made mistakes in expressing myself.

    Thanks for any suggestions,


