Batch-read all segments from a file?

I have realized that my initial post was a bit impractical, mixing together two distinct issues. So here is issue #2 separately.

For processing efficiency, we would like our AT plugin, during Pre-translate batch processing, to read all segments of the current file in one go, send them over in bulk, and finally pass the results back to Studio in such a way that Studio can associate each result with the right target segment.

Could anyone please point me to the part of the API to use for this task? Many thanks in advance!

Best regards,

Sebastian

  • Hi Sebastian,

    I notice this is an old post and you may have already solved it, but you probably want to look at the implementation of the ITranslationProviderLanguageDirection interface in the sample app.

    I would put the logic that calls your server in the SearchTranslationUnits() and/or SearchTranslationUnitsMasked() methods. You could then temporarily store the results in some kind of data structure, such as a dictionary, and call SearchSegment() to add each one to the file.
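
    A minimal sketch of this batch-then-cache idea, written in Python rather than C# so it runs stand-alone. It is not Trados API code: `BatchTranslationCache`, `prefetch`, `lookup` and the `translate_batch` callback are hypothetical names, and the MT server is faked. In a plugin, `prefetch` would correspond to the single bulk server call made from SearchTranslationUnits()/SearchTranslationUnitsMasked(), and `lookup` to handing back the result for one segment.

```python
from typing import Callable, Dict, List


class BatchTranslationCache:
    """Hypothetical helper: translate many segments in one server round trip,
    then answer per-segment lookups from an in-memory dictionary."""

    def __init__(self, translate_batch: Callable[[List[str]], List[str]]):
        # translate_batch stands in for the (assumed) bulk call to the MT server.
        self._translate_batch = translate_batch
        self._cache: Dict[str, str] = {}

    def prefetch(self, sources: List[str]) -> None:
        # De-duplicate while keeping order, and skip segments already cached.
        pending = [s for s in dict.fromkeys(sources) if s not in self._cache]
        if not pending:
            return
        results = self._translate_batch(pending)
        if len(results) != len(pending):
            raise ValueError("server returned a different number of translations")
        self._cache.update(zip(pending, results))

    def lookup(self, source: str) -> str:
        # On a cache miss, fall back to a one-segment batch.
        if source not in self._cache:
            self.prefetch([source])
        return self._cache[source]


if __name__ == "__main__":
    calls: List[List[str]] = []

    def fake_server(batch: List[str]) -> List[str]:
        calls.append(list(batch))
        return [s.upper() for s in batch]

    cache = BatchTranslationCache(fake_server)
    cache.prefetch(["Hello", "World", "Hello"])
    print(cache.lookup("World"), len(calls))  # WORLD 1
```

    The dictionary is keyed by source text, so repeated segments cost only one server round trip.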

    Anyway, you may have already figured that out.

    Best regards,

    Patrick Porter

  • Thanks a bunch, Patrick! Your reply is very relevant, since I had not found a solution yet. Now I feel encouraged to keep looking and experimenting in that part of the code. Thanks and best regards,

    Sebastian

  • Hi Sebastian,

    I was wondering if you've made any progress on this. I plan to try something similar for a client but am unsure how to determine whether the current lookup is being done as part of a batch pre-translate operation or interactively in the editor. It seems it would also be necessary to know when the last TU has been reached, for example through some property or an event that fires when the operation is over.

    I plan to post a separate question here on this issue and maybe someone can help.

    Thanks.

    Patrick

  • Hi Patrick!

    I have been making slow but steady progress on both of my issues. Being fairly new to C#, I have had to read up on a few technologies (such as LINQ to XML), so I haven't quite reached the stage in the implementation where I can test and verify our approach to this issue (and report back here, which I was planning to do), but I think I now have a good idea of the path. Basically, I am aiming to export an XML file and run it through an existing asynchronous web API (with periodic status checks, followed by a final download and re-import).
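
    The submit / poll / download workflow could look roughly like this Python sketch. Everything here is an assumption for illustration: `submit`, `get_status` and `download` are hypothetical callbacks wrapping whatever endpoints the web API really exposes, the status values "running", "done" and "failed" are invented, and `FakeService` only exists so the example runs without a server.

```python
import time
from typing import Callable


def run_async_job(
    submit: Callable[[bytes], str],
    get_status: Callable[[str], str],
    download: Callable[[str], bytes],
    payload: bytes,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> bytes:
    """Submit the exported XML, poll the job periodically, then download the result."""
    job_id = submit(payload)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)  # assumed values: "running", "done", "failed"
        if status == "done":
            return download(job_id)
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout} s")
        time.sleep(poll_interval)


class FakeService:
    """In-memory stand-in that reports "done" on the third status check."""

    def __init__(self) -> None:
        self.polls = 0
        self.payload = b""

    def submit(self, payload: bytes) -> str:
        self.payload = payload
        return "job-1"

    def get_status(self, job_id: str) -> str:
        self.polls += 1
        return "done" if self.polls >= 3 else "running"

    def download(self, job_id: str) -> bytes:
        return self.payload.upper()  # pretend "translation"


if __name__ == "__main__":
    svc = FakeService()
    result = run_async_job(svc.submit, svc.get_status, svc.download,
                           b"<segments/>", poll_interval=0.01)
    print(result, svc.polls)  # b'<SEGMENTS/>' 3
```

    The timeout keeps a stuck job from blocking the plugin indefinitely; the downloaded bytes would then be re-imported.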

    As for interactive versus batch use, I was expecting to tell them apart by whether Studio calls a singular Search* method (e.g. SearchSegment) or a plural Search*s method (e.g. SearchTranslationUnits). Don't you think so?

    Also, as I picture it, it should be enough for the respective method to return once the last translation has come in. Your concept sounds as if you are planning for multiple threads or something similar in the plugin itself? (I can leave parallelization to a distributed-computing system on the server side, so this may be where our approaches diverge.)

    Best,

    Sebastian

  • Hi Sebastian,

    Actually, it is Studio that seems to be breaking the task up, not any requirement on my part. If the method were called only once, it would be easy to put the necessary logic just before the return.

    The problem is that instead of calling SearchTranslationUnitsMasked once with an array of all translation units, Studio calls it multiple times with around 10 or 11 translation units each time. For example, when pre-translating a test file of 58 segments, it called SearchTranslationUnitsMasked six times, passing 10, 11, 11, 11, 11, and 9 translation units, respectively (curiously, totaling 63, not 58).

    I was hoping there was some property to check, or some other signal, that identifies when the end of the file has been reached (i.e., when the last batch of TUs is passed to SearchTranslationUnitsMasked). So far I have been unable to find anything, and I can't see anything related in the API doc.

    Anyway, I'll keep trying and will post any updates.

    Thanks for your help.

    Patrick

  • Hi Patrick,

    I appreciate having this discussion with you.

    As you have probably noticed, I too find the lack of usage information in the API doc quite irritating...

    I have to admit my first reaction to your discovery that SearchTranslationUnitsMasked is called multiple times was one of shock... Then an idea dawned on me: might it be done that way so that the batch dialog can update the progress bar? Be that as it may, thank you for letting me know. (It won't be optimal for us, especially in documents where segment complexity varies widely, but I assume we can live with it, and at least I won't be surprised to see multiple web API calls.)

    I am still wondering a bit why you need to know when the last batch is being passed, but I guess you have to perform some sort of finalization, such as tearing down a database connection. Would you care to elaborate?

    Bye for now!

    Sebastian

  • Hi Sebastian,

    I've been rather busy lately, hence the delay, but to elaborate on the above:

    The reason for detecting the end of the batch process was to wait until the very end and then send all segments to an MT server at once. I was developing some plugins for clients who use my servers and wanted to do this to improve performance in batch pre-translate operations.

    For now I have resigned myself to doing it 10 segments at a time, which improves performance somewhat but can still be slow in some cases.

    You make a good point about the progress bar, which I hadn't considered.

    Another reason for knowing the difference between batch and interactive mode is error handling. In interactive mode I want to throw any errors up to Trados Studio so the user can see them. But in batch mode it would be nice to be able to skip a segment that causes an error and report the errors to the user at the end.
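
    The mode-dependent error handling could be sketched as follows, assuming the plugin could somehow tell which mode it is running in. As this thread shows, no such flag has been found in the API, so the `interactive` parameter is purely hypothetical, as are `translate_segments`, `BatchOutcome` and the `translate_one` callback. How Studio would treat a skipped segment is not addressed here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class BatchOutcome:
    translations: List[Optional[str]]  # None marks a skipped segment
    errors: List[Tuple[int, str]] = field(default_factory=list)  # (index, message)


def translate_segments(
    segments: List[str],
    translate_one: Callable[[str], str],
    interactive: bool,  # hypothetical: detecting the mode is the open question
) -> BatchOutcome:
    outcome = BatchOutcome(translations=[])
    for index, segment in enumerate(segments):
        try:
            outcome.translations.append(translate_one(segment))
        except Exception as exc:
            if interactive:
                raise  # surface the error to the user immediately
            # Batch mode: skip this segment and remember the error for a final report.
            outcome.translations.append(None)
            outcome.errors.append((index, str(exc)))
    return outcome


if __name__ == "__main__":
    def flaky(text: str) -> str:
        if "<bad>" in text:
            raise ValueError("server rejected segment")
        return text[::-1]

    result = translate_segments(["abc", "<bad>", "xyz"], flaky, interactive=False)
    print(result.translations)  # ['cba', None, 'zyx']
    print(result.errors)        # [(1, 'server rejected segment')]
```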

    In batch mode, Trados Studio cancels the entire batch process if an error occurs and rolls back any translations that have already been completed.

    Anyway, for now I haven't found a good solution and am just advising clients who use my servers to run any large batch translations outside of Studio, with an external utility that reads the XML of the underlying sdlxliff file directly.
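
    For the external-utility route, the read side might look like this minimal Python sketch. It assumes the usual SDLXLIFF layout: the XLIFF 1.2 namespace, with each segment stored as a `<mrk mtype="seg">` element (carrying a `mid` attribute) inside a trans-unit's `<seg-source>`. Real files carry many more SDL-specific elements, writing translations back into `<target>` is not shown, and `read_source_segments` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# XLIFF 1.2 namespace, which SDLXLIFF files build on (assumed layout, see above).
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_source_segments(xml_text: str) -> List[Tuple[str, str]]:
    """Return (segment id, plain source text) for every segment marker."""
    root = ET.fromstring(xml_text)
    segments = []
    path = ".//x:trans-unit/x:seg-source//x:mrk[@mtype='seg']"
    for mrk in root.iterfind(path, NS):
        # itertext() also collects text inside inline tags such as <g>.
        segments.append((mrk.get("mid"), "".join(mrk.itertext())))
    return segments


SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file original="demo.docx" source-language="en-US" target-language="de-DE">
    <body>
      <trans-unit id="tu1">
        <source>Hello world. Bye.</source>
        <seg-source><mrk mtype="seg" mid="1">Hello <g id="1">world</g>.</mrk> <mrk mtype="seg" mid="2">Bye.</mrk></seg-source>
        <target><mrk mtype="seg" mid="1"/> <mrk mtype="seg" mid="2"/></target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    print(read_source_segments(SAMPLE))  # [('1', 'Hello world.'), ('2', 'Bye.')]
```

    Because the path starts from `<seg-source>`, the empty target markers are ignored; the whole file's segments can then be sent to the MT server in a single request.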

  • Hi Patrick,

    I see! The thought of bundling segments like that has also occurred to me, but I agree that the way SearchTranslationUnitsMasked is called in the present API (with the complete results array being assembled outside the plugin) unfortunately doesn't allow for this approach.

    Some of my findings from testing: although the API has 7 Search* methods, Trados always calls SearchTranslationUnitsMasked, in both interactive and batch mode. Also, if the (first) queried segment is not #1, the previous segment is submitted as well, but masked. That explains the batches of 11 in your tests and why the total count exceeds the number of segments; the idea is probably to let TMs use the extra segment for Context Match.
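
    These observations account exactly for Patrick's numbers. The Python sketch below assumes, as inferred from his test rather than documented anywhere, that Studio passes 10 new segments per call and prepends the previous (masked) segment to every call that does not start at segment #1; `batch_sizes` is a hypothetical helper.

```python
from typing import List


def batch_sizes(total_segments: int, new_per_call: int = 10) -> List[int]:
    """Array length of each SearchTranslationUnitsMasked call under the
    assumptions above: new_per_call fresh segments, plus one context segment
    for every call after the first."""
    sizes = []
    for start in range(0, total_segments, new_per_call):
        fresh = min(new_per_call, total_segments - start)
        context = 1 if start > 0 else 0  # previous segment, sent masked
        sizes.append(fresh + context)
    return sizes


sizes = batch_sizes(58)
print(sizes, sum(sizes))  # [10, 11, 11, 11, 11, 9] 63
```

    The 63 units are the 58 real segments plus 5 masked context segments, one for each call after the first.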
