TM fuzzy matching gets no leverage at all for sources in Tibetan script because the Tibetan tsheg punctuation mark "་" is interpreted as a character rather than a word delimiter.

Hello, I first raised this issue in 2019 when I was trying out an older version of Trados Studio. (If you search the community forum for "Tibetan", you will see my topics from then under the user name "Celso Scott". Haha, it seems I'm one of the few people who have tried to use the software for Tibetan translation since then.) I received some helpful support back then, but the dealbreaker was how Trados interprets the Tibetan tsheg, a small dot that goes between each word/syllable. Tibetan is nearly a monosyllabic language, so for fuzzy matching purposes a syllable should be read as a word. There are spaces in Tibetan script, but they occur more or less between sentences, not between individual words. For example, take the following Tibetan sentence:

ངས་ཀློག་དེབ་གསར་པ་ཞིག་ཉོས་པ་ཡིན། - I bought a new reading book.

ངས་ "I" ཀློག་ "reading" དེབ་ "book" གསར་པ་ "new" ཞིག་ "a" ཉོས་པ་ཡིན། "Bought (+ past tense auxiliary)"

However, Trados reads those little dots as ordinary characters, so it treats the whole sentence like a single word. For fuzzy match retrieval, this makes the TM useless unless there is a 100% match.
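To make the difference concrete, here is a minimal Python sketch. It is only an illustration, not how Trados tokenizes internally. It contrasts a whitespace-only split, which returns the whole sentence as one token, with a split that treats the tsheg (U+0F0B) and the shad (U+0F0D) as delimiters. The function names and the delimiter set are my own assumptions; for example, the sketch ignores U+0F0C, the non-breaking tsheg.

```python
import re

# Sample sentence from the post: "I bought a new reading book."
SENT_1 = "ངས་ཀློག་དེབ་གསར་པ་ཞིག་ཉོས་པ་ཡིན།"

TSHEG = "\u0F0B"  # TIBETAN MARK INTERSYLLABIC TSHEG
SHAD = "\u0F0D"   # TIBETAN MARK SHAD (end of clause/sentence)


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only, as a Latin-script-oriented tokenizer would."""
    return text.split()


def syllable_tokens(text: str) -> list[str]:
    """Treat tsheg, shad and whitespace as delimiters; drop empty pieces."""
    return [t for t in re.split(f"[{TSHEG}{SHAD}\\s]+", text) if t]


print(len(whitespace_tokens(SENT_1)))  # 1: the whole sentence is one "word"
print(len(syllable_tokens(SENT_1)))    # 9: one token per syllable
```

With only whitespace as a delimiter, the sentence has nothing to split on, which matches the single-word behavior described above.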

There seems to be no setting that makes Trados treat the tsheg as a word delimiter (like a space in English). I reported this in 2019, and the release notes for "Trados Studio 2022 SR1" suggest that it has been addressed:

Fixed an issue where unicode "TIBETAN MARK INTERSYLLABIC TSHEG" character would not be recognized as a word delimiter. (CRQ-15202)

I was happy to see this in the notes, but when I started a new trial and tested TM recall, the problem seemed to persist.
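One thing that might be worth ruling out (an assumption on my part, not something the release note confirms): the fix names U+0F0B specifically, so if the test text contains a look-alike character, such as U+0F0C TIBETAN MARK DELIMITER TSHEG BSTAR, the fix may not apply to it. This small Python sketch uses only the standard `unicodedata` module to list every punctuation code point in a string; the helper name `punctuation_inventory` is hypothetical.

```python
import unicodedata
from collections import Counter

# Test sentence from the post; paste the exact text from the TM or source file here.
SENT_2 = "ངས་གློག་ཀླད་གསར་པ་ཞིག་ཉོས་པ་ཡིན།"


def punctuation_inventory(text: str) -> Counter:
    """Count each punctuation code point (Unicode general category P*) in the text."""
    return Counter(
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
        if unicodedata.category(ch).startswith("P")
    )


# Print "count U+XXXX NAME" for each punctuation character found.
for label, count in punctuation_inventory(SENT_2).items():
    print(count, label)
```

For the sentence as pasted here, it reports eight U+0F0B tshegs and one U+0F0D shad. Any other tsheg-like entry in the output would be worth mentioning to support.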

A simple test is to take the sentence above and replace the noun ཀློག་དེབ་ "reading book" with གློག་ཀླད་ "computer":

ངས་གློག་ཀླད་གསར་པ་ཞིག་ཉོས་པ་ཡིན། - I bought a new computer.

ངས་ "I" གློག་ཀླད་ "computer" གསར་པ་ "new" ཞིག་ "a" ཉོས་པ་ཡིན། "Bought (+ past tense auxiliary)"

If the first sentence at the top of this post is a TM segment, this altered sentence should be a 70%-99% match. It works that way on other platforms, including SmartCAT, Memsource, TransFX, and CafeTran. In Trados, however, it does not match at all. As a control, I converted the Tibetan into Roman transliteration (the common scheme we use, called "Wylie" transliteration), in which the tsheg dots become spaces. The two sentences above then become:

Sentence 1: ngas klog deb gsar pa zhig nyos pa yin/

Sentence 2: ngas glog klad gsar pa zhig nyos pa yin/

Here TM matching works beautifully and we get a ~80% match. However, I and most Tibetan translators don't want to work in Roman transliteration; we want to work with Tibetan Unicode.
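The ~80% figure is roughly what a plain token-level comparison predicts. Below is a minimal Python sketch that assumes a simple Levenshtein ratio over tokens (1 - edits / longer token count). Trados's actual fuzzy-match formula is not described in this post and almost certainly differs, so treat the numbers as illustrative only. With the tsheg as a delimiter, the Tibetan pair and the Wylie pair get the same score, because each syllable maps to one Wylie token; without it, each sentence is a single token and the score drops to zero.

```python
import re

# Assumed delimiter set: tsheg (U+0F0B), shad (U+0F0D), and whitespace.
TIBETAN_DELIMS = "[\u0F0B\u0F0D\\s]+"


def tokens(text: str, tsheg_is_delimiter: bool = True) -> list[str]:
    """Tokenize; with tsheg_is_delimiter=False only whitespace splits words."""
    if not tsheg_is_delimiter:
        return text.split()
    return [t for t in re.split(TIBETAN_DELIMS, text) if t]


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over tokens (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                      # delete tok_a
                            curr[j - 1] + 1,                  # insert tok_b
                            prev[j - 1] + (tok_a != tok_b)))  # substitute
        prev = curr
    return prev[-1]


def similarity(a: str, b: str, tsheg_is_delimiter: bool = True) -> float:
    """Hypothetical fuzzy score: 1 - token edits / length of the longer sequence."""
    ta, tb = tokens(a, tsheg_is_delimiter), tokens(b, tsheg_is_delimiter)
    longest = max(len(ta), len(tb)) or 1
    return 1 - edit_distance(ta, tb) / longest


S1 = "ངས་ཀློག་དེབ་གསར་པ་ཞིག་ཉོས་པ་ཡིན།"
S2 = "ངས་གློག་ཀླད་གསར་པ་ཞིག་ཉོས་པ་ཡིན།"
W1 = "ngas klog deb gsar pa zhig nyos pa yin/"
W2 = "ngas glog klad gsar pa zhig nyos pa yin/"

print(f"{similarity(S1, S2):.0%}")                            # 78%
print(f"{similarity(S1, S2, tsheg_is_delimiter=False):.0%}")  # 0%
print(f"{similarity(W1, W2):.0%}")                            # 78%
```

Two substituted syllables out of nine gives 7/9, about 78%, which is in the same range as the ~80% seen with the Wylie control.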

If you would like your software to be usable for Tibetan translators, I suggest fixing this by having the software treat the tsheg as equivalent to a space by default, or by providing a setting to do so. Otherwise, there really is no benefit to using Trados for Tibetan. Please let me know if there is anything I can clarify.

I am doing research for an organization that supports some two dozen Tibetan-English translators. I would like to recommend your software, since it seems best suited to the rigorous level of detail we need for philological and scholastic works. However, I can't really do so unless this is resolved. If the issue was in fact resolved in the 2022 SR1 release and the problem is my own technical misunderstanding, please let me know, as I would like to recommend Trados as an available tool.

Best wishes,

-Celso W.

Celso Wilkinson

Top Comments

Daniel Brockmann over 2 years ago

Hi again, Celso Wilkinson. I just tested it with a fresh TM in Studio 2022 SR1 and got a fuzzy match for the second segment.

What _might_ be at play here is that you need to re-index the TM if you are using an existing TM (rather than a freshly created one, as in my case). To do that, close the document, go to the TM view in Studio, and open the TM in question so that it appears in the list of TMs. (If it is also open in the side-by-side editor, close it there.) Then right-click the TM name and select "Settings", then "Performance and Tuning". There you can re-index the TM. This is needed for fuzzy matching to be refreshed, and hopefully it will then start working.

Looking forward to your feedback,
Daniel