Is there a way to split an sdltm?

Hello people,

We have several different translation memories (.sdltm) that are huge, at least 180,000 segments each.

The problem is that when the people who work at our company use our sdltms, it takes too long to pre-translate the projects against our translation memories.

I wonder if there is a way to split an sdltm, or if there is another solution for handling a huge sdltm?

Thank you in advance.

  • Oh sorry,
    In general our clients' TMs are located on a network drive, but I did a test with the TM on a local drive and it took the same amount of time.

  • Hi Daniel,

    I also forgot to ask where the source files are stored. However, here is how I see the big picture:

    Try first with all the source files, TMs and TBs stored on the local disk. Make sure you re-index the TMs, reorganize the termbases and prepare your files before use. Also, in the TM settings, under Performance and Tuning, there is an Advanced Tuning section where you can choose Speed over Accuracy. That might help too. Try on several computers to make sure it's not a computer-related issue.

    You can also consider doing a general PC clean-up. Uninstall any unnecessary applications, perform a system check (from a command prompt, run sfc /scannow) and use a clean-up tool. I use CCleaner; it's very easy to use and works great. It cleans all unnecessary files and can also fix broken registry entries. A free version is available online.

    One good hardware upgrade you can make is the hard disk. Solid State Drives will help a lot, as their access times are much shorter than those of traditional hard disks, and SSDs are not that expensive nowadays. I have a Samsung one, and when I did the upgrade I could not believe how fast everything worked! Of course, this only helps if your resources are stored on that local SSD and not on the network.

    If you need to have a centralized way to share TMs, Projects, Termbases, then you might want to take a look at our GroupShare offerings. There are basically 3 options:

    1. GroupShare On-premise - this means you buy the license, install it on your own network, and are responsible for managing it and, of course, for the infrastructure.

    2. GroupShare Hosted - this means we, SDL, are responsible for the infrastructure and the actual software is installed in our data centers. You will have a dedicated VM just for your server. You will be given a link and credentials to access it.

    3. GroupShare Cloud - this is similar to option 2, but the VM and infrastructure are shared with other customers. You will not see any difference on your end, because we set it up in such a way that every customer only has access to their own data.

    The last two options would be helpful if you have people working from several offices or locations, but if everybody is in the same location, then option 1 is the best choice.

    Let us know how it goes and if we can help with anything else,

    Thank you,

    Adrian

  • Hi Adrian, thank you for the info. I have already changed Advanced Tuning from accuracy to speed;
    there was no difference in time, though.
  • Hi Daniel and all contributors

    Following this thread from the beginning and seeing no resolution so far, let me ask one question: "How much time do these project preparation processes really take?"

    What I want to point out is that no single timing figure has been mentioned so far. The only statement we have is Daniel's first one: "... it takes too long to pre-translate the projects against our translation memories ..."

    What is "too long time to use"? Could it be that the expectations are too high and that the time it actually takes to prepare projects is absolutely within the normal timings?

    It would be helpful if we could get some examples of project preparations that take too long, so we can judge whether we really have a problem here.

    Walter

  • Finally, I am done with my test.
    First, I reorganized the index of our huge SDLTM, including the fuzzy index.
    Then I exported the memory to a TMX file and used a piece of software called "Heartsome TMX Editor" to split the TMX into 16 pieces.
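
    (As a side note, the split itself could also be scripted instead of done in Heartsome. Below is only a rough Python sketch of that idea: the file names are just examples, and it assumes the exported TMX is a plain TMX 1.4 file small enough to parse in memory.)

        # Split one TMX export into N smaller TMX files that share the original header.
        import copy
        import xml.etree.ElementTree as ET

        def split_tmx(path, parts=16, prefix="part"):
            tree = ET.parse(path)
            root = tree.getroot()                  # <tmx version="1.4">
            header = root.find("header")           # TMX exports always carry a <header>
            tus = list(root.find("body"))          # every <tu> element
            chunk = -(-len(tus) // parts)          # ceiling division

            for i in range(parts):
                piece = ET.Element("tmx", root.attrib)
                piece.append(copy.deepcopy(header))        # reuse the original header
                body = ET.SubElement(piece, "body")
                body.extend(tus[i * chunk:(i + 1) * chunk])
                ET.ElementTree(piece).write(f"{prefix}_{i + 1:02d}.tmx",
                                            encoding="utf-8", xml_declaration=True)

        split_tmx("huge_memory.tmx", parts=16)     # "huge_memory.tmx" is just an example name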

    I upgraded each TMX piece to an sdltm in SDL Studio, then ran "reorganise index" (including the fuzzy index) on each one, and exported each upgraded sdltm to TMX again.

    The next step was to create a new SDL project and use each TMX piece as a translatable file, by using the plugin called "Translation Memory eXchange".

    I used the huge sdltm as the translation memory while I ran "Pre-translate without TM".
    Then I ran "Generate Target Translations" and finally got a very clean and smooth TMX.
    I then used the batch function in SDL Studio to upgrade the TMX files to sdltms.

    Now I can compare pre-translating a new project in SDL Studio with the single huge sdltm versus all 16 split sdltms as translation memories.

    With the huge memory located on the server, pre-translating the project from my computer takes 1.11 min.

    With the memory split into 16 pieces, also located on the server, pre-translating the same project from my computer takes 4.57 min.


    I need a solution that makes the projects we create faster to pre-translate and also faster when updating the translation memories.


    Does anyone have a solution?

    Thank you in advance

  • Hi Daniel,

    I don't know for sure, but I'm not sure it matters if all the TUs are in one TM or several smaller TMs. If they are all used in the pre-translation process, then Trados Studio still needs to "look through" the same number of TUs.

    Here are a couple of suggestions that may help:

    a) If the TM (I'll speak of one instead of several) contains a bunch of "old" units that aren't needed anymore, they can be exported permanently to an "archive" TM. Our "current" TM only contains TUs that have been used in the past 5 years (and contains a LOT more than 180,000 TUs!). The others are stored elsewhere until we think they are needed.

    b) If you have a lot of duplicate entries, you can use Heartsome to "clean" your TM. This is useful if there are several target translations for the same source unit, for example. In this case, Heartsome will delete all of the older units and only keep the "latest and greatest". Make sure you check what all of the cleaning settings in Heartsome actually do, and always back up your data (there is a rough scripted version of the same idea after my suggestions below).

    c) Maybe it is possible to break up your TMs by area (e.g. make one for IT, one for art, etc.)? Unless you already have fields for classification, this might mean a lot of work upfront but could pay off in the long run.

    I hope one or the other of these suggestions will help. Good luck!
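
    If you would rather script the clean-up in (b) than use Heartsome, here is roughly what it looks like on a TMX export. This is only a sketch: it assumes each <tu> carries a changedate attribute (as Studio exports do), that the first <tuv> in a unit is the source language, and the file names are just examples.

        # Keep only the newest translation per source segment in a TMX export.
        import xml.etree.ElementTree as ET

        def dedupe_tmx(path, out_path):
            tree = ET.parse(path)
            body = tree.getroot().find("body")
            newest = {}                                    # source text -> newest <tu>

            for tu in body:
                seg = tu.find("tuv/seg")                   # first <tuv> assumed to be the source
                if seg is None:
                    continue
                key = "".join(seg.itertext())
                stamp = tu.get("changedate", "")           # e.g. 20230115T093000Z, sorts as text
                if key not in newest or stamp > newest[key].get("changedate", ""):
                    newest[key] = tu

            keep = {id(tu) for tu in newest.values()}
            for tu in list(body):                          # drop everything except the winners
                if id(tu) not in keep:
                    body.remove(tu)

            tree.write(out_path, encoding="utf-8", xml_declaration=True)

        dedupe_tmx("huge_memory.tmx", "huge_memory_deduped.tmx")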
  • Hi Michael,

    I will test your suggestion (b).

    Also, we are using 7 different fields, but in my experience it takes much longer when you use fields.

    Do you agree?
  • Daniel

    I agree with Michael that splitting your main TM into so many sub-TMs will not be beneficial at all. In addition, it makes maintenance difficult (which TMs do you upgrade? All 16? How are you going to do housekeeping on them without merging them again, etc.).

    What I am missing here is the size of your main TM. You mentioned once that your TM is somewhere around 180,000 TUs, which in my opinion would be a small TM. Can you please tell us what the size of your so-called "huge TM" is?
    Another important point is the number and size of the files you have in the projects you prepare. If you have many large files and they reside on a server, then a prepare time of 1.11 minutes as you stated is absolutely normal.
    And one aspect which I think we have not covered so far is the speed of your LAN. This has a big influence if your files and TMs are stored on the server.
    So, in order to compare and find out where the bottleneck is, I suggest that you run a test with all files on your local drive and exactly the same test (same files, same TMs) with all files located on the server, to see what difference this makes.

    Walter
  • Hi again Walter,

    The size of our huge sdltm is about 1.25 GB.
    My test project contains about 14 different XML files.
    All the files together contain 1,490 segments and 5,570 words.
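
    For what it's worth, the TU count of an sdltm can also be checked directly, since the file appears to be an ordinary SQLite database. Treat the translation_units table name below as an assumption (it is what the files look like on my machine), and the file name is only an example.

        # Count the translation units stored in an .sdltm directly (query a copy to be safe).
        import sqlite3

        with sqlite3.connect("Huge_TM.sdltm") as conn:
            count = conn.execute("SELECT COUNT(*) FROM translation_units").fetchone()[0]
            print(f"{count} translation units")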

    Thank you in advance