How to split .tmx/translation memory by size?

Hi!

I would like to split a large .tmx file by size into smaller files. How can I do that?

  •  

    A TMX is a flat file, so the easiest way is probably to simply use a text editor.  Work out how many files you need based on your size requirements, convert that into the number of lines each split file would contain, and split it up accordingly.  Manual but not difficult.
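
As a rough sketch of the manual approach described above, the following Python splits a TMX on whole translation units rather than raw line counts, so that no `<tu>` is cut in half and each part keeps the original header. It is not the PowerShell script mentioned later in this thread; the function names `split_tmx_text` and `split_tmx_file` are made up for illustration, and it assumes a UTF-8 file small enough to read into memory.

```python
import re
from pathlib import Path

# Matches one complete translation unit, including any attributes on <tu>.
# "<tu" must be followed by whitespace or ">", so <tuv> is not matched.
TU_PATTERN = re.compile(r"<tu[\s>].*?</tu>", re.DOTALL)


def split_tmx_text(tmx_text: str, max_bytes: int) -> list[str]:
    """Split TMX text into parts of roughly max_bytes without cutting a <tu>."""
    first = TU_PATTERN.search(tmx_text)
    if first is None:
        raise ValueError("no <tu> elements found")
    units = TU_PATTERN.findall(tmx_text)
    # Everything before the first <tu>: XML prolog, <tmx>, <header>, <body>.
    head = tmx_text[: first.start()]
    tail = "</body>\n</tmx>\n"

    parts, current, size = [], [], 0
    for unit in units:
        unit_size = len(unit.encode("utf-8"))
        # Start a new part when adding this unit would exceed the limit.
        if current and size + unit_size > max_bytes:
            parts.append(head + "\n".join(current) + "\n" + tail)
            current, size = [], 0
        current.append(unit)
        size += unit_size
    parts.append(head + "\n".join(current) + "\n" + tail)
    return parts


def split_tmx_file(path: str, max_bytes: int) -> list[Path]:
    """Write name_part1.tmx, name_part2.tmx, ... next to the source file."""
    source = Path(path)
    text = source.read_text(encoding="utf-8")  # assumption: UTF-8 input
    outputs = []
    for index, part in enumerate(split_tmx_text(text, max_bytes), start=1):
        target = source.with_name(f"{source.stem}_part{index}.tmx")
        target.write_text(part, encoding="utf-8")
        outputs.append(target)
    return outputs
```

For example, `split_tmx_file("memory.tmx", 50 * 1024 * 1024)` (with `memory.tmx` standing in for your own file) would write `memory_part1.tmx`, `memory_part2.tmx` and so on, each holding roughly 50 MB of translation units. The size limit counts only the translation units, so each file is slightly larger once the header is added.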

    Is that enough of an explanation for you?  I don't know how comfortable you are with XML files (a TMX is an XML file) or with working in a text editor.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  •  

    If it helps... I just used ChatGPT to create a PowerShell script to do this.  The script is here, along with the sample TMX I used:

    https://github.com/paulfilkin/Powershell_scripts/tree/main/split_TMX

    Also a short video to explain how to use it:

    Paul Filkin | RWS Group

  • I received these errors:

    At C:\Users\shpctac0fffe\SplitTMX.ps1:41 char:60
    + ... script type="application/json" id="client-env">{"locale":"en","featur ...
    +                                                             ~~~~~
    Unexpected token ':"en"' in expression or statement.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:41 char:65
    + ... cript type="application/json" id="client-env">{"locale":"en","feature ...
    +                                                                 ~
    Missing argument in parameter list.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:41 char:716
    + ... tions","custom_inp","remove_child_patch","kb_source_repos"]}</script>
    +                                                                 ~
    The '<' operator is reserved for future use.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:240 char:84
    + ... tion/json" data-target="react-partial.embeddedData">{"props":{"docsUr ...
    +                                                                 ~
    Unexpected token ':' in expression or statement.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:240 char:95
    + ... ":{"docsUrl":"https://docs.github.com/get-started/accessibility/keybo ...
    +                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Unexpected token ':"https://docs.github.com/get-started/accessibility/keyboard-shortcuts"' in expression or statement.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:240 char:168
    + ... s.github.com/get-started/accessibility/keyboard-shortcuts"}}</script>
    +                                                                 ~
    The '<' operator is reserved for future use.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:279 char:205
    + ... ink Button--medium Button d-lg-none color-fg-inherit p-1">  <span cla ...
    +                                                                 ~
    The '<' operator is reserved for future use.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:302 char:45
    +             <ul class="list-style-none f5" >
    +                                             ~
    Missing file specification after redirection operator.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:500 char:13
    +       CI/CD & Automation
    +             ~
    The ampersand (&) character is not allowed. The & operator is reserved for future use; wrap an ampersand in double quot
    ation marks ("&") to pass it as part of a string.
    At C:\Users\shpctac0fffe\SplitTMX.ps1:575 char:45
    +             <ul class="list-style-none f5" >
    +                                             ~
    Missing file specification after redirection operator.
    Not all parse errors were reported.  Correct the reported errors and try again.
        + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
        + FullyQualifiedErrorId : UnexpectedToken

  •  

    This is most likely due to content in your TMX.  I put the script together in a few minutes and only tested it on one TMX of 33k TUs, which probably has fairly clean content since it came from the EU.  The errors might be caused by the way the TMX content is handled or by the presence of special characters that are not correctly escaped.  Additionally, if the TMX file contains HTML or JavaScript-like content, it might cause parsing issues.

    If you can't fix this yourself then I'd need to have your TMX to resolve it properly.

    Paul Filkin | RWS Group

  •  

    You can zip it and send it to pfilkin at sdl dotcom.  If the zipped file is still over 15 MB, please just send me a download link for it via Dropbox, Google Drive or whatever file-sharing application you have.

    Paul Filkin | RWS Group

  •   

    Thanks, I downloaded it this morning.  I don't know if you made a mistake, but you sent me a 2.6 GB SDLTM and not a TMX.  If you tried to use the script on that, it would certainly cause a problem!

    So I exported the SDLTM to a TMX and tested my script.  It didn't error at all, but it failed to extract any segments because it found none.  This prompted me to make a few changes to the script, and now it works.  I updated it on GitHub, and here's how it works with the TMX created from the SDLTM you provided.
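
It is not stated here what was changed in the script. One possible reason for a text-based search to find no translation units, offered only as an assumption, is that `<tu>` opening tags in a Studio export carry attributes such as `creationdate`, so a search for the literal `<tu>` matches nothing. A minimal Python check (the function name `count_translation_units` is hypothetical) that counts units whether or not they have attributes:

```python
import re

# <tu> opening tags may carry attributes (for example creationdate), so match
# "<tu" followed by whitespace or ">" instead of the literal string "<tu>".
# The lookahead also keeps <tuv> elements from being counted.
TU_OPEN = re.compile(r"<tu(?=[\s>])")


def count_translation_units(tmx_text: str) -> int:
    """Return the number of <tu> opening tags in the given TMX text."""
    return len(TU_OPEN.findall(tmx_text))
```

Running a count like this before and after splitting is a quick way to confirm that no translation units were lost.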

    Paul Filkin | RWS Group

  •  

    In the meantime, since you seem to be starting with an SDLTM and not a TMX, this application is probably useful for you:

    https://appstore.rws.com/Plugin/111

    I think you will need a paid version of this for the size of the files you are handling, but it's quick and easy to use.  Here's how:

    Paul Filkin | RWS Group
