How does SDL Language Cloud MT calculate the number of characters?

One of my colleagues used this MT last month but her account was quickly blocked as she reached the limit. However, she knows she hasn't actually translated that much using the MT.

So I did a simple test: I ran a file through the MT one segment at a time while keeping my account open to check the count. To my surprise, my account shows I translated 6416 characters, while a Word count of the same file gives me 1644 characters (no spaces) and 1926 characters (with spaces)...

So I do wonder: how does the algorithm calculate the number of characters? No wonder my colleague reached the limit so easily!

  • Hi Radu,

    For the purpose of the test I did, I had LC MT as the only TM in my settings. No other TM or MT was set. I did not use the batch task, but opened the file directly and went through it one segment at a time, waiting for the output from the LC MT... I did not confirm any segment, just went from one segment to the next, waiting for the MT to populate it.

    In the end, I still found a big difference:
    In Word: 2227 characters without spaces/2617 characters with spaces
    In LC MT: 4992 characters...

    I can't figure out any reasonable explanation. To me, there is still a bug that needs to be fixed. Could it be related to some pairs only? I am using it from English to French.

    Regards,

    Daniel
  • Hi Daniel,

    I've done some tests on English to French with a smaller file. The number of characters used by LC MT is lower than the number of characters with spaces reported by Word. LC MT counts the spaces between words as characters, since this is how it identifies the words, but it doesn't count the spaces at the end of a line or sentence.
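
As a rough illustration of the counting rule described above, here is a minimal Python sketch. It assumes the rule is simply "count every character except trailing whitespace"; the function names and the sample segments are made up for the example, and the real LC MT counter may well differ.

```python
def lc_mt_chars(segment: str) -> int:
    """Assumed rule: spaces between words count, trailing whitespace does not."""
    return len(segment.rstrip())


def word_chars_with_spaces(segment: str) -> int:
    """Every character, including the trailing space after the sentence."""
    return len(segment)


def word_chars_no_spaces(segment: str) -> int:
    """Every character that is not whitespace."""
    return sum(1 for ch in segment if not ch.isspace())


# Hypothetical sample segments, each ending with a trailing space.
segments = ["Hello world. ", "How are you? "]

no_spaces = sum(word_chars_no_spaces(s) for s in segments)
mt_count = sum(lc_mt_chars(s) for s in segments)
with_spaces = sum(word_chars_with_spaces(s) for s in segments)

print(no_spaces, mt_count, with_spaces)
```

Under that assumption the MT count lands between the two Word figures, so the counting rule alone cannot explain a total that is two or three times the Word count.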

    Can you try with a smaller file, maybe with 3-4 segments, and check after each segment how much the LC MT usage increased?
    Also, if you have Lookahead activated (File - Options - Editor - Automation), Trados Studio will send 2 more segments in advance for translation, but when you reach those segments it should not send them again.

    Actually, after doing some more tests I noticed that the issue is the Lookahead option combined with the speed of moving from segment to segment. If you move quickly, Lookahead sends the next segments in advance, but because the Lookahead process has not finished yet, Studio sends each segment again when you reach it, even though Lookahead already sent it.

    So try to:
    1. Disable Lookahead
    2. Go segment by segment slowly, allowing 1-2 seconds.

    Studio treats LC MT like a regular TM and searches it every time you go to a segment that is not confirmed.
    Also, for AdaptiveMT to work you need to go segment by segment, post-edit each one, and then confirm it, so that Studio sends the post-edited segment back to LC MT AdaptiveMT and it can learn your style of translation.

    We would recommend using LC MT in one of two ways:
    1. Pre-translate, then deactivate the LC MT provider in the project settings. The LC MT hits will be in the target segments for you to post-edit and confirm.
    2. Use LC MT segment by segment, post-editing and confirming each one. While you post-edit, Lookahead (if enabled) should finish retrieving the next 2 segments from LC MT, so they will not be sent again.

    Please try this and let me know if the usage is still doubled.

    Thanks
    Radu
  • Ha! That's exactly what it is! Disabling Lookahead gives me exactly the right count. With it enabled, I find that the count for each segment indeed includes the next 2 segments... So there we are, you found the explanation for the issue!
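
For anyone who wants to see why the usage balloons, the behaviour described in this thread can be sketched as a toy model in Python. This is not how Studio is actually implemented: the function `simulate_usage` and its parameters are hypothetical, and the model assumes only two extremes, either every Lookahead request finishes before you move on (slow navigation) or none does (fast navigation).

```python
def simulate_usage(segment_lengths, lookahead=2, wait_for_lookahead=True):
    """Total characters sent to the MT under a toy model of Lookahead.

    segment_lengths: character count of each segment, in document order.
    lookahead: how many following segments are requested in advance.
    wait_for_lookahead: True if the user pauses long enough for the
        advance requests to complete before moving to the next segment.
    """
    cached = set()  # segments whose MT result has already come back
    total = 0
    for i in range(len(segment_lengths)):
        requested = []
        if i not in cached:
            requested.append(i)  # active segment has no result yet
        last = min(i + 1 + lookahead, len(segment_lengths))
        for j in range(i + 1, last):
            if j not in cached:
                requested.append(j)  # advance request for a later segment
        total += sum(segment_lengths[k] for k in requested)
        cached.add(i)
        if wait_for_lookahead:
            cached.update(requested)  # advance results arrived in time
    return total


lengths = [100] * 5  # five hypothetical segments of 100 characters each

print(simulate_usage(lengths, lookahead=0))                            # 500
print(simulate_usage(lengths, lookahead=2, wait_for_lookahead=True))   # 500
print(simulate_usage(lengths, lookahead=2, wait_for_lookahead=False))  # 1200
```

In this model, moving too fast with Lookahead enabled charges most segments up to three times, while disabling Lookahead or pausing between segments charges each segment once. Real usage would fall somewhere between the two extremes, depending on timing.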