Caching of machine translation segments?

Hi there,

We've developed a machine translation plugin for Studio 2011 and 2014.

One behaviour we've noticed is that when a user opens a segment, Studio sends it to our MT servers. However, if the user revisits the same segment later, Studio sends it to the MT server again instead of reusing the translation it already retrieved.

The main impact is on counting the number of words sent to the server. (A side effect is that the translation also isn't returned *instantly*.)
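To make the word count issue concrete, here is a hypothetical C# helper (`MtWordCounter`, not part of the Studio API) that tallies words each time a segment is actually sent to the server. It is only a sketch and assumes words are whitespace-separated tokens, which is cruder than Studio's own word count.

```csharp
using System;

// Hypothetical helper: tallies the words sent to the MT server.
// Assumption: a "word" is a whitespace-separated token.
public class MtWordCounter
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    public long WordsSent { get; private set; }
    public int RequestsSent { get; private set; }

    // Call this each time a segment is actually sent to the server.
    public void RecordRequest(string sourceText)
    {
        RequestsSent++;
        WordsSent += CountWords(sourceText);
    }

    public static int CountWords(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return 0;
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

Without caching, revisiting a segment calls `RecordRequest` again, so the same words are counted twice.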

Is it possible to configure the plugin to make Studio cache the machine translations for already processed segments so that it doesn't send them again?

Thanks

John

  • Thanks Paul.

    Patrick, if you could give us a pointer that would be great.

    I suspect it's something to do with the translation status and origin, e.g. if status is "translated" or the origin is "automated translation" then don't send it to MT, or something along those lines?
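    A minimal sketch of that rule, kept independent of the Studio SDK so it compiles on its own. `SegmentStatus` is a local stand-in for Studio's ConfirmationLevel (only a few values), and the origin string `"mt"` is an assumption; check which origin-type values your Studio version actually writes.

```csharp
using System;

// Local stand-in for Studio's ConfirmationLevel (assumption: subset only).
public enum SegmentStatus
{
    Unspecified,
    Draft,
    Translated,
    ApprovedTranslation
}

public static class MtQueryRule
{
    // Assumed origin-type string for machine-translated segments.
    public const string MachineTranslationOrigin = "mt";

    // Returns true when the segment should be sent to the MT server.
    public static bool ShouldQuery(SegmentStatus status, string originType)
    {
        // Already translated or approved: nothing to ask the server for.
        if (status == SegmentStatus.Translated || status == SegmentStatus.ApprovedTranslation)
            return false;

        // Target already filled by MT: reuse it instead of re-querying.
        if (string.Equals(originType, MachineTranslationOrigin, StringComparison.OrdinalIgnoreCase))
            return false;

        return true;
    }
}
```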

    Cheers

    John

  • Hi John,


    I'm sure that Patrick Porter is writing a response to this as we speak, but you are right on the mark: the segment's current confirmation state and its TranslationOrigin (which also keeps the previous origin states) hold this information, and you can use those properties to decide whether or not to query your provider.

    I would suggest that you also add an option to your settings that controls whether segments that are already translated are skipped, so that users can override this behaviour if required.
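    One way such an option might look, as a sketch: `MtProviderSettings` and `MtQueryGate` are hypothetical names, and `alreadyTranslated` stands for whatever the confirmation level and translation origin checks return.

```csharp
using System;

// Hypothetical provider settings; a real plugin would persist these
// together with the rest of the provider's options.
public class MtProviderSettings
{
    // true (default): skip segments that already have a translation.
    // false: the user forces a fresh query for every segment.
    public bool IgnoreTranslatedSegments { get; set; } = true;
}

public static class MtQueryGate
{
    // 'alreadyTranslated' would be derived from the segment's
    // confirmation level and translation origin.
    public static bool ShouldQuery(MtProviderSettings settings, bool alreadyTranslated)
    {
        if (settings == null) throw new ArgumentNullException(nameof(settings));
        return !alreadyTranslated || !settings.IgnoreTranslatedSegments;
    }
}
```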

    In some cases I also manage a cache of the translations returned from the provider so that I can reuse those translations from cache as opposed to re-querying the provider for the same source segment + content structure.
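    Keying on "source segment + content structure" matters because two segments can share the same text but carry different tags. This sketch uses a hypothetical, simplified `SourcePart` type (text runs and tag ids) rather than Studio's segment classes, and folds the language pair, text and tag positions into one string key.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified view of a source segment: a sequence of
// text runs and tags. Studio's real segment model is richer.
public sealed class SourcePart
{
    public string Text { get; }
    public string TagId { get; }   // null for plain text runs

    private SourcePart(string text, string tagId) { Text = text; TagId = tagId; }

    public static SourcePart Run(string text) => new SourcePart(text, null);
    public static SourcePart Tag(string id) => new SourcePart(null, id);
}

public static class TranslationCacheKey
{
    // Combines language pair, text and tag positions, so "Click <b>OK</b>"
    // and "Click OK" do not share a cache entry.
    public static string Build(string sourceLang, string targetLang, IEnumerable<SourcePart> parts)
    {
        var sb = new StringBuilder();
        sb.Append(sourceLang).Append('|').Append(targetLang).Append('|');
        foreach (var part in parts)
        {
            if (part.TagId != null)
                sb.Append('\u0001').Append(part.TagId).Append('\u0001'); // tag marker
            else
                sb.Append(part.Text);
        }
        return sb.ToString();
    }
}
```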

    P.

  • Hi Patrick,

    That's great, I think we're clear on how to proceed now, thanks!

    Managing a cache was our first thought, but we were concerned about how to constrain it and avoid situations where we ended up maintaining large caches for indeterminate lengths of time.
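    One way to constrain such a cache is to bound it in both size and age: a least-recently-used (LRU) cache with a time-to-live. The sketch below is a generic C# implementation, not something Studio provides; the capacity and TTL are your choice, and the injectable clock exists only to make it testable.

```csharp
using System;
using System.Collections.Generic;

// LRU cache with a time-to-live, bounded in size and age.
// Not thread-safe; add locking if the provider is called from several threads.
public class BoundedMtCache
{
    private sealed class Entry
    {
        public string Key;
        public string Translation;
        public DateTime StoredUtc;
    }

    private readonly int _capacity;
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _clock;
    private readonly Dictionary<string, LinkedListNode<Entry>> _map = new Dictionary<string, LinkedListNode<Entry>>();
    private readonly LinkedList<Entry> _lru = new LinkedList<Entry>(); // front = most recently used

    public BoundedMtCache(int capacity, TimeSpan ttl, Func<DateTime> clock = null)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
        _ttl = ttl;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    public int Count => _map.Count;

    public bool TryGet(string key, out string translation)
    {
        translation = null;
        LinkedListNode<Entry> node;
        if (!_map.TryGetValue(key, out node)) return false;

        if (_clock() - node.Value.StoredUtc > _ttl)
        {
            // Expired: drop it so the caller queries the server again.
            _lru.Remove(node);
            _map.Remove(key);
            return false;
        }

        // Mark as most recently used.
        _lru.Remove(node);
        _lru.AddFirst(node);
        translation = node.Value.Translation;
        return true;
    }

    public void Put(string key, string translation)
    {
        LinkedListNode<Entry> existing;
        if (_map.TryGetValue(key, out existing))
        {
            _lru.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            // Evict the least recently used entry.
            var oldest = _lru.Last;
            _lru.RemoveLast();
            _map.Remove(oldest.Value.Key);
        }

        var node = new LinkedListNode<Entry>(new Entry { Key = key, Translation = translation, StoredUtc = _clock() });
        _lru.AddFirst(node);
        _map[key] = node;
    }
}
```

For example, `new BoundedMtCache(5000, TimeSpan.FromDays(7))` keeps at most 5,000 entries, none older than a week; the numbers are arbitrary.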

    Cheers
    John
  • Hi folks,

    We've made some progress on this, which now raises a few more questions:

    We are modifying the SearchTranslationUnitsMasked method in our TranslationProviderLanguageDirection implementation, which takes a "TranslationUnit[] translationUnits" parameter. The segment status is correctly retrieved from the ConfirmationLevel property of each TranslationUnit, but there are two problems:

    1 - We are not able to retrieve the Origin value because the "Origin" property is always set to "Unknown".
    2 - We cannot find a way to retrieve the translation result that has been previously returned (we'd like to show the previous translation in the translation results box instead of "No matches found").

    Any ideas!?
    Thanks in advance!
  • Hi John,

    You need to get a handle on the TranslationOrigin. Include this code in one of the SearchTranslationUnit methods to see what it contains:

     // Requires "using System.Windows.Forms;" at the top of the file.
     var properties = translationUnit.DocumentSegmentPair.Properties;
     var origin = properties.TranslationOrigin;

     MessageBox.Show(
         "ConfirmationLevel:\t\t" + properties.ConfirmationLevel
         + "\r\nIsLocked:\t\t\t" + properties.IsLocked
         + "\r\nOriginType:\t\t" + origin.OriginType
         + "\r\nOriginSystem:\t\t" + (origin.OriginSystem ?? string.Empty)
         + "\r\nIsRepeated:\t\t" + origin.IsRepeated
         + "\r\nIsStructureContextMatch:\t" + origin.IsStructureContextMatch
         + "\r\nTextContextMatchLevel:\t" + origin.TextContextMatchLevel
         + "\r\nMatchPercent:\t\t" + origin.MatchPercent);


    ciao,

    P.

  • Looks like Patrick H. covered most of the answer. As for getting the previously retrieved translation, one option is the target text of the current segment; I think that would work, but I haven't tested it. However, it wouldn't give exactly the same string as the previous result if the user has edited it in the target box. The only other way that comes to mind is to implement some kind of caching: either use the SDL TranslationMemory library and store the results in TMs, or, to keep it simple, store them in a data structure such as a hashtable, possibly combined with file I/O for persistence between work sessions.
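    A sketch of the "hashtable plus file I/O" option: a `Dictionary` that can be saved to and reloaded from a UTF-8 text file between sessions. The class name and file format (one escaped `key<TAB>value` per line) are assumptions for illustration; keys could be built from the source text and language pair.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// In-memory MT cache persisted to a UTF-8 text file,
// one escaped "key<TAB>value" entry per line.
public class PersistentMtCache
{
    private readonly Dictionary<string, string> _entries = new Dictionary<string, string>();

    public int Count => _entries.Count;

    public bool TryGet(string key, out string translation) => _entries.TryGetValue(key, out translation);

    public void Put(string key, string translation) => _entries[key] = translation;

    public void Save(string path)
    {
        using (var writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            foreach (var pair in _entries)
                writer.WriteLine(Escape(pair.Key) + "\t" + Escape(pair.Value));
        }
    }

    public void Load(string path)
    {
        if (!File.Exists(path)) return;
        foreach (var line in File.ReadLines(path, Encoding.UTF8))
        {
            int tab = line.IndexOf('\t');
            if (tab < 0) continue; // skip malformed lines
            _entries[Unescape(line.Substring(0, tab))] = Unescape(line.Substring(tab + 1));
        }
    }

    // Escape backslashes, tabs and line breaks so each entry stays on one line.
    private static string Escape(string s) =>
        s.Replace("\\", "\\\\").Replace("\t", "\\t").Replace("\r", "\\r").Replace("\n", "\\n");

    private static string Unescape(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] != '\\' || i + 1 == s.Length) { sb.Append(s[i]); continue; }
            char next = s[++i];
            switch (next)
            {
                case 't': sb.Append('\t'); break;
                case 'r': sb.Append('\r'); break;
                case 'n': sb.Append('\n'); break;
                default: sb.Append(next); break; // "\\\\" becomes a single backslash
            }
        }
        return sb.ToString();
    }
}
```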

    I've thought about implementing this sort of thing in my plugin, but decided it would just be simpler to quit the search on certain confirmation levels and pass a message to indicate it was canceled. Not the most elegant solution, but definitely simpler.
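    The simpler approach could look roughly like this. `Confirmation`, `MtSearchOutcome` and `EarlyExitSearch` are hypothetical stand-ins; a real plugin would surface the skip message through its own result mechanism, and `queryServer` represents the call to the MT server.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for Studio's confirmation levels (assumption: subset only).
public enum Confirmation { Unspecified, Draft, Translated, ApprovedTranslation }

// Hypothetical result: either a translation or a "skipped" message.
public sealed class MtSearchOutcome
{
    public bool Skipped { get; private set; }
    public string Message { get; private set; }
    public string Translation { get; private set; }

    public static MtSearchOutcome Skip(string message) => new MtSearchOutcome { Skipped = true, Message = message };
    public static MtSearchOutcome Found(string translation) => new MtSearchOutcome { Translation = translation };
}

public static class EarlyExitSearch
{
    // Confirmation levels on which the search is abandoned.
    private static readonly HashSet<Confirmation> SkipLevels =
        new HashSet<Confirmation> { Confirmation.Translated, Confirmation.ApprovedTranslation };

    public static MtSearchOutcome Search(Confirmation level, string source, Func<string, string> queryServer)
    {
        if (SkipLevels.Contains(level))
            return MtSearchOutcome.Skip("MT lookup skipped: segment is already " + level + ".");

        return MtSearchOutcome.Found(queryServer(source));
    }
}
```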
  • Thanks Patricks P and H. I guess we can tell if the segment has been edited or not based on the state, but if it has, then maybe we need to just store the MT for these segments.
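    A sketch of that decision: reuse the target text while it is still unedited MT, otherwise fall back to a stored copy. How `targetIsUneditedMt` is derived from the segment state is left open here (an assumption), and `PreviousMtLookup` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class PreviousMtLookup
{
    // Returns the earlier MT result for a segment, or null if the
    // server must be queried again.
    public static string Resolve(
        string sourceKey,
        string currentTarget,
        bool targetIsUneditedMt,
        IDictionary<string, string> storedMt)
    {
        // Unedited MT in the target is exactly what the server returned.
        if (targetIsUneditedMt && !string.IsNullOrEmpty(currentTarget))
            return currentTarget;

        // Edited (or empty) target: fall back to a stored copy, if any.
        string stored;
        return storedMt != null && storedMt.TryGetValue(sourceKey, out stored) ? stored : null;
    }
}
```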

    One quick follow-up question on Patrick H's response, about the DocumentSegmentPair property: we're using it with the Studio 2014 API but can't find it in 2011. Is there another way to get the same information in 2011?
  • Hi John,

    Just reading this now...

    From memory, I also had some problems with this because the ISegment isn't exposed, correct?

    I will check/test later on, but you could try initializing the translation unit with an additional SearchResult class and taking some of the properties from the ScoringResults (I think that is what it is called); there should be a few properties in there that partly resemble what is maintained in the TranslationOrigin...

    I will try to follow up later if I get a chance to look at some code and see whether what I am saying here makes sense or works.

    P.

  • I checked this this morning, and I can safely say that my suggestion (recovering this type of information by initializing an additional SearchResults with an existing TranslationUnit) can be ignored. It was worth a try, but no cigar :-)

    I also took the opportunity to review the code of one of the plugins I released that supports Studio 2009/2011; in those releases, the only checks I had in place to confirm whether a segment had already been translated were against the parameters (ConfirmationLevel and the TargetSegment itself).
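    The coarser 2011-style check (ConfirmationLevel plus the target segment) might be sketched as follows. `LegacyConfirmation` is a local stand-in for the ConfirmationLevel values (an assumption, not the SDK type), and the target text would come from the translation unit's target segment. It cannot tell MT from TM or human translations, only that something is already there.

```csharp
using System;

// Local stand-in for the confirmation levels visible in the 2011 API
// (assumption: subset of Studio's ConfirmationLevel values).
public enum LegacyConfirmation { Unspecified, Draft, Translated, ApprovedTranslation }

public static class LegacyAlreadyTranslatedCheck
{
    // Without TranslationOrigin, the only signals are the confirmation
    // level and whether the target segment already has content.
    public static bool IsAlreadyTranslated(LegacyConfirmation level, string targetText)
    {
        if (level != LegacyConfirmation.Unspecified)
            return true;

        return !string.IsNullOrWhiteSpace(targetText);
    }
}
```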
  • Thanks so much Patrick. I guess for 2011 we'll just use the ConfirmationLevel, a coarser solution than the one we can use in 2014.

    This raises another question, though: how do we maintain multiple versions of the same plugin for different versions of Studio? I think that's what we'll need here, since we're making use of parameters in 2014 that don't exist in 2011. Perhaps Paul can answer?
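    One common C# approach is a single code base with conditional compilation: each build configuration references the matching Studio assemblies and defines its own symbol. `STUDIO2014` is a hypothetical symbol name; it would be set per configuration, e.g. with `<DefineConstants>TRACE;STUDIO2014</DefineConstants>` in the project file.

```csharp
using System;

// One code base, two builds: define STUDIO2014 (hypothetical symbol)
// only in the configuration that references the Studio 2014 assemblies.
public static class StudioFeatures
{
#if STUDIO2014
    public const string TargetStudio = "Studio 2014";
    // 2014 build: the richer origin information is available.
    public const bool HasTranslationOrigin = true;
#else
    public const string TargetStudio = "Studio 2011";
    // 2011 build: fall back to ConfirmationLevel and target text only.
    public const bool HasTranslationOrigin = false;
#endif

    public static string DescribeCheck() =>
        HasTranslationOrigin
            ? "Checking confirmation level and translation origin (" + TargetStudio + ")"
            : "Checking confirmation level and target text (" + TargetStudio + ")";
}
```

Code that touches DocumentSegmentPair would then sit inside `#if STUDIO2014` blocks, and each configuration produces its own plugin package.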

    (P.S. how do I tag someone in a post? like Paul in this case?)