Two problems with term recognition (very slow to recognize, or window not refreshed)

I have two problems with term recognition in Trados Studio 2017. Please tell me how to solve them.

I changed some term recognition settings in the project. Since then, one of the following two problems occurs.

The first is that the message "Searching for terms" stays in the Term Recognition window for more than 5 minutes.

The second is that, after I wait a long time for the recognized terms to be displayed and then select the next segment, the Term Recognition window is not refreshed.

The TM seems to be recognized sooner than the termbase.

I guess some change in the project is adversely affecting the term search, but I do not know which change or changes cause these problems.

Alternatively, some other factor may be causing them.

 

  • Hi Maya,

    I'd be surprised if a change in the settings caused this. What happens if you create a new project? Can you reproduce the same issue and perhaps notice what you did?

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi Paul, thank you for your reply, and I apologize for leaving this topic pending.

    When I asked the question, I misunderstood the cause of this phenomenon.
    I now believe it was caused not by changing the project settings,
    but by a source segment that includes many names of chemical compounds.

    I found that when a source segment that includes many chemical compound names is selected, Studio Editor stops recognizing the termbase.
    (Chemical compound names are sometimes very long and include symbols such as brackets or hyphens. I believe the phenomenon occurs because Studio Editor fails to recognize these names.)

    After I shut down and restart SDL Studio, it starts recognizing the termbase again.
    However, it stops recognizing the termbase again as soon as the segment where Studio Editor previously stopped is selected again.
  • Hi,

    Can you give me a small document and sample termbase that I can use to reproduce this problem? You can email it to pfilkin@sdl.com if this is ok.

    Regards

    Paul


  • Hi Paul,
    Thank you for your quick response.
    I have already prepared a small document and a sample termbase.
    I confirmed that similar problems occur with these files.
    I will send an email to your address.
  • Hi  

    Thank you for the files (one termbase with 84k entries, and one with 6 entries).  I thought I'd share the results of some testing on this because it is quite interesting to look at how the different settings can influence the results.  I also compared the results using an Excel spreadsheet directly as a termbase without any conversion at all.  More info on how to do this here:

    https://multifarious.filkin.com/2016/03/02/committing-the-cardinal-sin/

    You have a couple of problems I think.  First, your larger termbase has around 84k entries, which is quite large for a file-based termbase, and more importantly you have an awful lot of repetition within the chemical names.  So the deeper the search depth the longer it's taking in MultiTerm as you can see from the tests below:

    These are all default settings and it's quite interesting to see how Excel handles the 84k entries pretty well and doesn't get hung up on the problems of search depth.

    If you reduce the search depth however (default is 200) then the excessive waiting times are drastically reduced.  So using a value of 25 for example finds segment #4 in a second, but only finds 3 terms compared to the 13 found with the default.  There are 8 terms available for recognition in the segment so that may be too low.  If I use 50 it takes a second longer but now I get 7 terms.  If I use 60 I get exactly the same result in three seconds that took a minute with the default.  Segment #6 however still takes around four and a half minutes which is a third of the time using defaults but still woefully inadequate.

    I also tested some new segments I made up that did not have terms with such repetition and in these cases for segments with more than ten recognised terms term recognition was almost instant, even in the 84k MultiTerm termbase.  So it is a problem specific to those terms where you have significant similarities.
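
    The trade-off described above (a lower search depth is faster but finds fewer terms) can be sketched with a toy recogniser. This is not MultiTerm's actual algorithm, which is not described in this thread. The function names, the token pre-filter, the scoring rule and the made-up chemical names are all assumptions chosen only to show why many near-identical terms make each lookup expensive.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    # Split on anything that is not a letter or digit, so hyphens and
    # brackets in chemical names act as separators.
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def recognise(segment, termbase, search_depth, min_score=0.9):
    """Toy recogniser. Returns (terms found, number of comparisons made)."""
    seg = segment.lower()
    seg_tokens = tokens(segment)
    # Cheap pre-filter: keep terms that share a token with the segment,
    # ranked by how many tokens they share.
    ranked = sorted(
        (term for term in termbase if seg_tokens & tokens(term)),
        key=lambda term: len(seg_tokens & tokens(term)),
        reverse=True,
    )
    candidates = ranked[:search_depth]  # hypothetical "search depth" cap
    found = []
    for term in candidates:
        t = term.lower()
        # Expensive step: longest run of the term that appears in the segment.
        matcher = SequenceMatcher(None, seg, t, autojunk=False)
        match = matcher.find_longest_match(0, len(seg), 0, len(t))
        if match.size / len(t) >= min_score:
            found.append(term)
    return found, len(candidates)


# Made-up termbase in which every entry shares most of its text with the others.
termbase = [f"{i}-methyl-{j}-ethylbenzene" for i in range(1, 10) for j in range(1, 10)]
segment = "Add 2-methyl-4-ethylbenzene to the flask."
for depth in (1, 25, 200):
    found, comparisons = recognise(segment, termbase, depth)
    print(depth, comparisons, len(found))
```

    In this sketch every entry shares a token with the segment, so the pre-filter removes nothing and the cap is the only thing limiting the number of expensive comparisons. A termbase with little overlap between entries would leave few candidates regardless of the cap.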

    If I were you, I think I'd use the Excel spreadsheet for this termbase, because the huge amount of similar words/phrases is very hard work for the MultiTerm search engine.  Just make sure you always keep a backup copy handy in case of corruption.

    I'd also like to share this termbase with the development team if it's ok with you?  It's an excellent test case for future development and enhancements to the existing feature.

    Regards

    Paul


  • Thank you for your thorough research, Paul.
    However, I am afraid that I still have two questions.

    Question 1:
    You mentioned that,
    >>… more importantly you have an awful lot of repetition within the chemical names. So the deeper the search depth the longer it's taking in MultiTerm as you can see from the tests below:…

    I understood it as follows:
    If the terms in a termbase contain many repetitions, searching for the exact intended term becomes a high-load task for the search engine, because many terms that are not actually the intended term remain as candidates for a long time.

    Is the idea correct?
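
    One rough way to see how much repetition a list of source terms contains is to count how many terms each token occurs in. The sketch below is only an illustration: the function name and the sample terms are made up, and it assumes the source terms are plain English strings split on hyphens, brackets and spaces.

```python
import re
from collections import Counter


def token_reuse(terms):
    """Count how many terms each token occurs in."""
    counts = Counter()
    for term in terms:
        # A set, so a token repeated inside one term is counted once.
        counts.update({t for t in re.split(r"[^0-9a-z]+", term.lower()) if t})
    return counts


sample_terms = ["2-methylbutane", "2-methylpentane", "3-ethylhexane"]  # made-up sample
for token, n in token_reuse(sample_terms).most_common(3):
    print(token, n)
```

    Tokens that occur in a large share of the entries are the ones likely to keep many unrelated terms in play as candidates.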


    Question 2:
    I tried to use an Excel-based file in place of the overly large termbase, but failed.
    Could you tell me how to solve this problem?
    Let me describe the details.

    1) Using Glossary Converter, I converted the larger termbase into an Excel file.
    The Excel file includes only two columns, source term (EN) and target term (JP).
    2) I installed TermExcelerator
    appstore.sdl.com/.../
    in my Trados Studio and found that a new menu item, “excel-based terminology provider”, appeared when I opened the project settings.
    3) I added the Excel file to the project so that Trados Studio recognized it.
    4) I opened the Editor and selected an arbitrary segment, expecting recognized terms to appear in the Term Recognition window just as they do with a termbase.
    5) In the Term Recognition window I found the message “There is no term recognized.”

    -I apologize that the wording is not exactly what you see on screen, because my Trados Studio is set to Japanese and the message quoted above is my own translation.
    -According to the default setting on my Trados Studio, default language pair seems English-German. When I forgot to change this setting into English-Japanese, recognized terms were displayed in a manner that both source and target terms are described in English although used excel file includes English and Japanese.


    Regarding your request to share the termbase with the development team:

    The termbase is very important for my business. I hope that the termbase, in whole or in part, and the way I extracted the terms will be kept off public websites, books, brochures for your customers, and so on.

    However, I guess you are mainly interested in the fact that the problem described above occurs when a termbase contains many terms with repeated elements, especially names of chemical compounds, and that you would like to use my termbase as an example of the problem.

    Thus, you can share the termbase as far as it is ensured that the termbase is used only for development of application programs of your company.


    Thank you for reading this long message.

    Regards
    Maya
  • Unknown said:
    Is the idea correct?

    I think so.  This does seem very inefficient on the part of MultiTerm, and in fact I haven't come across this phenomenon until you provided this example.  So thank you for allowing us to use this for testing.

    Unknown said:
    According to the default setting on my Trados Studio, default language pair seems English-German. When I forgot to change this setting into English-Japanese, recognized terms were displayed in a manner that both source and target terms are described in English although used excel file includes English and Japanese.

    I'm not sure I fully understand your problem here.  Are you saying that it worked when you left your default settings at English - German and not when you changed them to English - Japanese?  You can change your default, and probably should do, by going to File -> Options -> Editor -> Languages

    Or are you saying that when you had an EN -> DE project and set the languages in the plugin to EN -> JP that it didn't work?  The Excel terminology provider doesn't do any language checks in the same way MultiTerm does so you can be more flexible here.  But perhaps you can clarify again what your problem actually is?  Sorry for the lack of understanding on my part.

    Unknown said:
    Thus, you can share the termbase as far as it is ensured that the termbase is used only for development of application programs of your company.

    Thank you.  This is the only reason we will use this.

    Kind regards

    Paul


  • Thank you for the quick reply, Paul.

    Sorry for my poor description.

    I tried to reproduce the error but could not, which fortunately means I was able to use the Excel file in the same way as a termbase.

    But just for your information, I would like to describe what the error looked like.

    I attached two images to this reply, and I believe they will show you what I mean.

     

    Thank you for your great efforts!

     

    Regards

    Maya

  • Hi Maya,

    Glad to see this is working for you now. It looks as though the earlier problem may be related to using EN-DE on an EN-JP project, at least the problem of not seeing EN and JP terms. I'll have a play and see whether there is something we need to address here, but I think it may be that if we do anything it will be to try and check for matching languages, or at least provide a warning.

    I'm not sure why no term would be recognised but this may have been a refresh issue at the time which is why you can't reproduce it now.

    Regards

    Paul
