How to split .tmx/translation memory by size?

Hi!

I would like to split a large .tmx file into smaller files by size. How can I do that?

  •  

    A TMX is a flat file, so the easiest way is probably to simply use a text editor.  Work out how many files you need based on your size requirements, convert that into how many lines each split file should contain, and split it up like that.  It's manual but not difficult.

    Is that enough of an explanation for you?  I don't know how comfortable you are with XML files (a TMX is an XML file) or with working in a text editor.
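
For anyone who would rather script the manual approach above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `split_tmx_text` and the `max_bytes` parameter are made up for this example, the file is assumed to fit in memory, and translation units are assumed to be plain `<tu>...</tu>` elements. Each part repeats the original header and closing tags so it should remain a well-formed TMX, which also means the parts come out only roughly at the requested size.

```python
import math
import re

# Matches one translation unit. "<tu\b" does not match "<tuv", and the
# literal "</tu>" does not match "</tuv>".
TU_PATTERN = re.compile(r"<tu\b.*?</tu>", re.DOTALL)


def split_tmx_text(tmx_text, max_bytes):
    """Split TMX text into parts of roughly max_bytes each (UTF-8 size).

    Hypothetical helper for illustration only.
    """
    matches = list(TU_PATTERN.finditer(tmx_text))
    if not matches:
        raise ValueError("no <tu> elements found")

    # Everything before the first <tu> (XML declaration, <tmx>, <header>, <body>)
    header = tmx_text[:matches[0].start()]
    # Everything after the last </tu> (closing </body> and </tmx>)
    footer = tmx_text[matches[-1].end():]
    units = [m.group(0) for m in matches]

    # Size requirement -> number of files -> translation units per file
    total_bytes = len(tmx_text.encode("utf-8"))
    parts = max(1, math.ceil(total_bytes / max_bytes))
    per_part = math.ceil(len(units) / parts)

    return [
        header + "\n".join(units[i:i + per_part]) + footer
        for i in range(0, len(units), per_part)
    ]
```

To use it, read the TMX with the encoding it was saved in, call `split_tmx_text(text, 50 * 1024 * 1024)` for parts of about 50 MB, and write each returned string to its own file. For multi-gigabyte files a streaming approach would be needed instead.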

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi, there are ways to split a TMX file into several smaller files. However, you cannot set the exact size of each file; you only set how many files the main TMX file should be split into, which naturally results in smaller TMX files. For example, see this article:

    https://gateway.sdl.com/apex/communityknowledge?articleName=000001019

    Another method is to use Heartsome TMX.

    https://github.com/heartsome/tmxeditor8

    On that page, under "All installer", use the Microsoft OneDrive or Dropbox link (the other links don't work). The tool is released under the GPL, so it can be used and distributed for free.
    The installer you need is HSTMXEditor_8_0_1_Win_x86.zip.

    However, for huge TMX files you might get an error. The error message points to the solution, which is to increase the available memory by editing the Heartsome TMX Editor.ini file and changing

    -Xmx512m

    to

    -Xmx1024m

    Once that is done, open the tool. The File menu has a Split TMX option: choose your TMX file and the number of pieces you want it split into.
    You can also merge multiple TMX files from the File menu by selecting Merge TMX.
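
If you ever need to do the merge step without the tool, it can be sketched in a few lines of Python. This is not how Heartsome does it internally; it is just an illustration. The function name `merge_tmx_texts` is made up for this example, and it assumes all input files share compatible header settings (same source language and so on), since only the header of the first file is kept.

```python
import re

# Matches one translation unit; "<tu\b" does not match "<tuv".
TU_PATTERN = re.compile(r"<tu\b.*?</tu>", re.DOTALL)


def merge_tmx_texts(tmx_texts):
    """Merge several TMX texts into one, keeping the first file's header.

    Hypothetical helper for illustration only.
    """
    first = tmx_texts[0]
    matches = list(TU_PATTERN.finditer(first))
    if not matches:
        raise ValueError("first TMX contains no <tu> elements")

    header = first[:matches[0].start()]   # up to and including <body>
    footer = first[matches[-1].end():]    # closing </body> and </tmx>

    # Collect the translation units from every input, in order
    units = []
    for text in tmx_texts:
        units.extend(TU_PATTERN.findall(text))

    return header + "\n".join(units) + footer
```

Duplicate translation units are not removed here, so importing the merged file into a translation memory is still the safer way to deduplicate.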

    Note that all these methods are based on non-RWS tools that you can apply at your own risk. We do not provide support for these tools/scripts.

    I hope this helps you,

    Caterina

  • Hi, do I need to extract "split_tmx_v2.zip" to the folder where the .tmx is?

  •  

    If it helps... I just used ChatGPT to create a PowerShell script to do this.  The script is here, along with the sample TMX I used:

    https://github.com/paulfilkin/Powershell_scripts/tree/main/split_TMX

    Also a short video to explain how to use it:

    Paul Filkin | RWS Group

  • I received these errors:

    At C:\Users\shpctac0fffe\SplitTMX.ps1:41 char:60
    + ... script type="application/json" id="client-env">{"locale":"en","featur ...
    +                                                             ~~~~~
    Unexpected token ':"en"' in expression or statement.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:41 char:65
    + ... cript type="application/json" id="client-env">{"locale":"en","feature ...
    +                                                                 ~
    Missing argument in parameter list.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:41 char:716
    + ... tions","custom_inp","remove_child_patch","kb_source_repos"]}</script>
    +                                                                 ~
    The '<' operator is reserved for future use.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:240 char:84
    + ... tion/json" data-target="react-partial.embeddedData">{"props":{"docsUr ...
    +                                                                 ~
    Unexpected token ':' in expression or statement.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:240 char:95
    + ... ":{"docsUrl":"https://docs.github.com/get-started/accessibility/keybo ...
    +                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Unexpected token ':"https://docs.github.com/get-started/accessibility/keyboard-shortcuts"' in expression or statement.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:240 char:168
    + ... s.github.com/get-started/accessibility/keyboard-shortcuts"}}</script>
    +                                                                 ~
    The '<' operator is reserved for future use.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:279 char:205
    + ... ink Button--medium Button d-lg-none color-fg-inherit p-1">  <span cla ...
    +                                                                 ~
    The '<' operator is reserved for future use.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:302 char:45
    +             <ul class="list-style-none f5" >
    +                                             ~
    Missing file specification after redirection operator.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:500 char:13
    +       CI/CD & Automation
    +             ~
    The ampersand (&) character is not allowed. The & operator is reserved for future use; wrap an ampersand in double quotation marks ("&") to pass it as part of a string.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:575 char:45
    +             <ul class="list-style-none f5" >
    +                                             ~
    Missing file specification after redirection operator.
    Not all parse errors were reported.  Correct the reported errors and try again.
        + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
        + FullyQualifiedErrorId : UnexpectedToken
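
Every location in this output is inside SplitTMX.ps1 itself, and the quoted fragments are HTML and JSON markup rather than PowerShell. One thing worth ruling out is that the saved .ps1 holds a web page instead of the raw script text. A minimal sketch of such a check in Python, assuming a simple heuristic on the start of the file (the function name `looks_like_html` is made up for this example):

```python
def looks_like_html(script_text):
    """Return True if the text starts like an HTML page rather than a script.

    Heuristic only: checks the first non-blank characters for an HTML
    doctype or an opening <html> tag.
    """
    head = script_text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")
```

Reading the .ps1 as text and passing it to `looks_like_html` returning True would suggest downloading the script again as a raw file rather than saving the page that displays it.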

  •  

    Most likely this is due to content in your TMX.  I put the script together in a few minutes and only tested it on one 33k-TU TMX, which is probably fairly clean content since it came from the EU.  The errors might be caused by the way the TMX content is handled or by special characters that are not correctly escaped.  Additionally, if the TMX file contains HTML or JavaScript-like content, that might cause parsing issues.

    If you can't fix this yourself then I'd need to have your TMX to resolve it properly.

    Paul Filkin | RWS Group

  •  

    You can zip it and send it to pfilkin at sdl dotcom.  If the zipped file is still over 15 MB, please just send me a download link via Dropbox, Google Drive or whatever file-sharing application you use.

    Paul Filkin | RWS Group

  •   

    Thanks, I downloaded it this morning.  I don't know if you made a mistake, but you sent me a 2.6 GB SDLTM and not a TMX.  Anyway, if you tried to use the script on that, it would cause a problem!

    So I exported the SDLTM to a TMX and tested my script.  It didn't throw any errors, but it did fail to extract any segments because it found none.  This prompted me to make a few changes to the script, and now it works.  I updated it on GitHub, and here's how it works with the TMX created from the SDLTM you provided.

    Paul Filkin | RWS Group
