Caching of machine translation segments?

Hi there,

We've developed a machine translation plugin for Studio 2011 and 2014.

One behaviour we noticed is that when a user opens a segment, Studio sends it to our MT servers. However, if the user revisits that same segment later, Studio sends it to the MT server again rather than reusing the previously retrieved translation.

The main impact is on counting the number of words sent to the server. (A side effect is that the translation also isn't returned *instantly*.)

Is it possible to configure the plugin to make Studio cache the machine translations for already processed segments so that it doesn't send them again?

Thanks

John

  • Maybe Patrick Porter can help you with that one? I believe he has already done this in his plugin, MT Enhanced Plugin for Trados Studio.

    Paul Filkin | RWS Group


  • Thanks Paul.

    Patrick, if you could give us a pointer that would be great.

    I suspect it's something to do with the translation status and origin, e.g. if status is "translated" or the origin is "automated translation" then don't send it to MT, or something along those lines?

    Cheers

    John

  • Hi John,


    I'm sure that Patrick Porter is writing a response to this as we speak, but you are right on the mark: the translation Origin holds the current translation state (along with the previous origin states), and you can use those properties to decide whether or not to query your provider.

    I would suggest that you also add an option to your settings that lets the user ignore segments that are already translated, so that they can override this behaviour if required.

    In some cases I also manage a cache of the translations returned from the provider so that I can reuse those translations from cache as opposed to re-querying the provider for the same source segment + content structure.

    P.
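
A minimal sketch of the decision described above, under stated assumptions. It is standalone C# and does not use the Studio SDK: the `Confirmation` enum, the `SegmentState` class, the `"mt"` origin label, and the `skipTranslated` setting are all hypothetical stand-ins for whatever the plugin actually reads from the segment's confirmation level and translation origin.

```csharp
using System;

// Hypothetical stand-ins for values read from the segment; NOT Studio SDK types.
public enum Confirmation { Unspecified, Draft, Translated, Approved }

public sealed class SegmentState
{
    public Confirmation Confirmation { get; set; }
    public string OriginType { get; set; }   // assumed labels, e.g. "mt", "tm", "interactive"
    public bool HasTargetText { get; set; }
}

public static class QueryPolicy
{
    // Returns true when the segment should be sent to the MT server.
    // skipTranslated is the user-facing option; when false, always query.
    public static bool ShouldQuery(SegmentState segment, bool skipTranslated)
    {
        if (!skipTranslated) return true;

        // Confirmed segments are left alone.
        if (segment.Confirmation >= Confirmation.Translated) return false;

        // A draft that already holds machine-translated text is not re-sent.
        bool alreadyMachineTranslated = segment.HasTargetText
            && string.Equals(segment.OriginType, "mt", StringComparison.OrdinalIgnoreCase);
        return !alreadyMachineTranslated;
    }
}
```

With the option switched off, every segment is queried as before, which keeps the current behaviour available as an override.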

  • Hi Patrick,

    That's great, I think we're clear on how to proceed now, thanks!

    Managing a cache was our first thought, but we were concerned about how to constrain it and avoid situations where we ended up maintaining large caches for indeterminate lengths of time.

    Cheers
    John
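
One way to address the concern about unbounded caches is to cap both the number of entries and their age. The sketch below is a plain in-memory least-recently-used cache in standalone C#. The class name, the string key (assumed to be built by the plugin from the language pair plus the source segment and its tag structure), and the capacity and age parameters are all assumptions for illustration. It is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;

// Least-recently-used cache with a size cap and a maximum entry age.
public sealed class BoundedTranslationCache
{
    private sealed class Entry
    {
        public string Key;
        public string Translation;
        public DateTime StoredUtc;
    }

    private readonly int _capacity;
    private readonly TimeSpan _maxAge;
    private readonly Dictionary<string, LinkedListNode<Entry>> _map =
        new Dictionary<string, LinkedListNode<Entry>>();
    private readonly LinkedList<Entry> _order = new LinkedList<Entry>(); // front = most recently used

    public BoundedTranslationCache(int capacity, TimeSpan maxAge)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        _capacity = capacity;
        _maxAge = maxAge;
    }

    public int Count { get { return _map.Count; } }

    public bool TryGet(string key, out string translation)
    {
        translation = null;
        LinkedListNode<Entry> node;
        if (!_map.TryGetValue(key, out node)) return false;

        if (DateTime.UtcNow - node.Value.StoredUtc > _maxAge)
        {
            Remove(node);               // expired: drop it and report a miss
            return false;
        }

        _order.Remove(node);            // move to the front on every hit
        _order.AddFirst(node);
        translation = node.Value.Translation;
        return true;
    }

    public void Put(string key, string translation)
    {
        LinkedListNode<Entry> existing;
        if (_map.TryGetValue(key, out existing)) Remove(existing);

        var entry = new Entry { Key = key, Translation = translation, StoredUtc = DateTime.UtcNow };
        _map[key] = _order.AddFirst(entry);

        if (_map.Count > _capacity) Remove(_order.Last); // evict the least recently used entry
    }

    private void Remove(LinkedListNode<Entry> node)
    {
        _order.Remove(node);
        _map.Remove(node.Value.Key);
    }
}
```

A cache like this could live for the duration of a Studio session, so nothing has to be maintained for an indeterminate length of time.
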
  • Hi folks,

    We've made some progress on this, which now throws up a few more questions.

    We are modifying the SearchTranslationUnitsMasked method in our TranslationProviderLanguageDirection implementation, which has the "TranslationUnit[] translationUnits" parameter. The segment status is correctly retrieved using the ConfirmationLevel property for each TranslationUnit, but there are two problems:

    1 - We are not able to retrieve the Origin value because the "Origin" property is always set to "Unknown".
    2 - We cannot find a way to retrieve the translation result that has been previously returned (we'd like to show the previous translation in the translation results box instead of "No matches found").

    Any ideas!?
    Thanks in advance!
  • Hi John,

    You need to get a handle on the TranslationOrigin; include this code in one of the SearchTranslationUnit methods...

     // Shorthand for the segment pair properties and their translation origin.
     var properties = translationUnit.DocumentSegmentPair.Properties;
     var origin = properties.TranslationOrigin;

     // Show the current state of the segment for inspection.
     MessageBox.Show(
                    "ConfirmationLevel:\t\t" + properties.ConfirmationLevel.ToString()
                   + "\r\nIsLocked:\t\t\t" + properties.IsLocked.ToString()
                   + "\r\nOriginType:\t\t" + origin.OriginType.ToString()
                   + "\r\nOriginSystem:\t\t" + (origin.OriginSystem ?? string.Empty)
                   + "\r\nIsRepeated:\t\t" + origin.IsRepeated.ToString()
                   + "\r\nIsStructureContextMatch:\t" + origin.IsStructureContextMatch.ToString()
                   + "\r\nTextContextMatchLevel:\t" + origin.TextContextMatchLevel.ToString()
                   + "\r\nMatchPercent:\t\t" + origin.MatchPercent
                   );

    ciao,

    P.

  • Looks like Patrick H. covered most of the answer. As for getting the previously retrieved translation, one possibility could be the target text of the current segment, which I think would work, though I haven't tested it. That wouldn't give exactly the same string as the previous result if the user has edited it in the target edit box, however. The only other way that comes to mind would be to implement some kind of caching. One way would be to use the SDL TranslationMemory library and store the translations in TMs, or to keep it simple you could just store them in some data structure like a hashtable, possibly combined with file I/O for persistence between work sessions.

    I've thought about implementing this sort of thing in my plugin, but decided it would just be simpler to quit the search on certain confirmation levels and pass a message to indicate it was canceled. Not the most elegant solution, but definitely simpler.
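
A rough sketch of the "hashtable plus file I/O" idea, assuming a simple one-entry-per-line text file. The `CacheFile` class, the tab-separated format, and the escaping scheme are hypothetical choices made for illustration; the code uses only the .NET base class library and no Studio SDK types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CacheFile
{
    // Escape backslash, tab and line breaks so that each entry fits on one line.
    private static string Escape(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\t", "\\t").Replace("\n", "\\n").Replace("\r", "\\r");
    }

    private static string Unescape(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] != '\\' || i + 1 == s.Length) { sb.Append(s[i]); continue; }
            char next = s[++i];
            sb.Append(next == 't' ? '\t' : next == 'n' ? '\n' : next == 'r' ? '\r' : next);
        }
        return sb.ToString();
    }

    // Writes every source/translation pair as "source<TAB>translation".
    public static void Save(string path, IDictionary<string, string> cache)
    {
        using (var writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            foreach (var pair in cache)
                writer.WriteLine(Escape(pair.Key) + "\t" + Escape(pair.Value));
        }
    }

    // Reads the file back; a missing file yields an empty cache.
    public static Dictionary<string, string> Load(string path)
    {
        var cache = new Dictionary<string, string>();
        if (!File.Exists(path)) return cache;

        foreach (var line in File.ReadAllLines(path, Encoding.UTF8))
        {
            int tab = line.IndexOf('\t');
            if (tab < 0) continue;      // skip malformed lines
            cache[Unescape(line.Substring(0, tab))] = Unescape(line.Substring(tab + 1));
        }
        return cache;
    }
}
```

Loading when the provider is created and saving when the session ends would carry the translations between work sessions.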