TerminologyProvider fuzzy search scores do not match the Term Recognition view

Hello,

I am using Trados Studio 2024 (not SR1) and I have another question.

After getting the search to work with the TerminologyProviderManager singleton (see the answers in this question), I noticed some strange behavior.
My goal is to get the scores for terms as they are displayed in the "Term Recognition" view in the Editor UI. Ideally, I want to pass in the whole segment string and get the scores.

When I search via ITerminologyProvider.Search(), the scores never line up with the ones in the Term Recognition view. Even when I pass in only a single term, they only match when it is a 100% match.

When I pass in a longer sentence, the score for the best-fitting term decreases as the sentence gets longer.

For example, take this code from the other question:

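using System.Globalization;

// terminologyProvider: the ITerminologyProvider obtained through the
// TerminologyProviderManager singleton, as described in the other question.
var sourceLanguage = new CultureInfo("en-US");
var targetLanguage = new CultureInfo("de-DE");
string segmentText = "...";   // text to look up: a single term or a whole segment
int maxResultsCount = 10;     // upper limit on the number of results returned
bool targetRequired = true;   // whether results must include a target-language term

var searchResults = terminologyProvider.Search(
    segmentText,
    sourceLanguage,
    targetLanguage,
    maxResultsCount,
    SearchMode.Fuzzy,   // fuzzy matching, which produces the scores discussed here
    targetRequired
);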

If I have a termbase containing the word "Gefahrstoff" and set segmentText to "Gefahrstoff", the top search result has a score of 100.

If I set segmentText to "Gefahrstoffen", the top result has a score of 90. In the Term Recognition view in the Editor UI, "Gefahrstoff" has a score of 87; the sentence in the Editor is "Die Nutzung von Gefahrstoffen ist gefährlich" ("The use of hazardous substances is dangerous").
When I set segmentText to the full sentence "Die Nutzung von Gefahrstoffen ist gefährlich", the term "Gefahrstoff" only gets a score of 63.

Do I really need to tokenize the text and search for each token? I would like to avoid that. And even if I did, the scores would not line up (90 here versus 87 in the Editor).
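A minimal sketch of the per-token approach, assuming you are willing to tokenize: split the segment into word tokens, run a fuzzy search for each token, and keep the highest score seen per term. To keep it self-contained, the search is passed in as a delegate; the `PerTokenTermSearch` class, the `BestScores` method and the `(Term, Score)` projection are hypothetical names, not part of the Trados Studio API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the Trados Studio API): searches each word
// token of a segment separately and keeps the best score seen for each term.
public static class PerTokenTermSearch
{
    // 'search' stands in for a fuzzy lookup of a single token, projected to
    // (term text, score) pairs. How it maps onto the real provider is an assumption.
    public static Dictionary<string, int> BestScores(
        string segmentText,
        Func<string, IEnumerable<(string Term, int Score)>> search)
    {
        var best = new Dictionary<string, int>(StringComparer.Ordinal);

        // \w+ matches runs of Unicode word characters (including umlauts),
        // so punctuation and whitespace are dropped.
        foreach (Match token in Regex.Matches(segmentText, @"\w+"))
        {
            foreach (var (term, score) in search(token.Value))
            {
                // Keep only the highest score reported for each term.
                if (!best.TryGetValue(term, out var current) || score > current)
                {
                    best[term] = score;
                }
            }
        }
        return best;
    }
}
```

With a real provider, the delegate would wrap the `terminologyProvider.Search(...)` call shown above, for example `token => terminologyProvider.Search(token, sourceLanguage, targetLanguage, maxResultsCount, SearchMode.Fuzzy, targetRequired).Select(r => (r.Text, r.Score))`; the `Text` and `Score` property names are an assumption about the search result type. Splitting on single words cannot find multi-word terms, and, as observed above, the per-token score (90) still differs from the Editor's (87).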

Is there any way to get the same scores as in the Term Recognition view?

Best,
Lukas

  • Hi, this is just a note to let you know that the team are actively working on this now.

    The scores returned via `TerminologyProviderManager.Instance` match the terms against the entire search text, as you identified, whereas the term match scores generated by the integrated feature in the Editor are calculated against the terms themselves.

    In the background, Trados Studio uses the same terminology provider to retrieve the terms, but then applies additional logic on top of that to calculate the term scores, which is also based on linguistic conditions. We first need to migrate that additional logic from legacy C++ and then expose it in the public API for third-party developers. This is the plan...
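    To see why matching against the entire search text makes the score fall as the input gets longer, here is a toy fuzzy score: Levenshtein edit distance normalized by the length of the longer string and scaled to 0-100. This metric is an assumption chosen purely for illustration, not the scoring Trados Studio uses, so it will not reproduce the 90, 87 or 63 reported above; it only shows the length effect.

```csharp
using System;

// Toy illustration only, NOT Trados Studio's scoring: Levenshtein distance
// normalized by the longer string's length and scaled to 0-100.
public static class ToyFuzzyScore
{
    // Classic two-row dynamic-programming edit distance
    // (insertions, deletions and substitutions each cost 1).
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // The finished row becomes the previous row for the next iteration.
            (prev, curr) = (curr, prev);
        }
        return prev[b.Length];
    }

    // 100 means identical; every extra character in 'text' lowers the score.
    public static int Score(string term, string text)
    {
        int maxLen = Math.Max(term.Length, text.Length);
        if (maxLen == 0) return 100;
        double similarity = 1.0 - (double)Levenshtein(term, text) / maxLen;
        return (int)Math.Round(100 * similarity);
    }
}
```

    `ToyFuzzyScore.Score("Gefahrstoff", "Gefahrstoffen")` returns 85, while scoring the same term against the whole sentence "Die Nutzung von Gefahrstoffen ist gefährlich" returns a far lower value, because every extra character of the sentence counts as an edit. Scoring per token, or against the term itself as the Editor does, avoids that penalty.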

    I'll report again when we get closer to releasing this.

    Patrick Andrew Hartnett | Developer Experience | Team Lead | RWS Group
