TM fuzzy matching gets no leverage at all for Tibetan-script sources because the "་" (tsheg) Tibetan punctuation mark is interpreted as a character rather than a word delimiter.

Hello, I actually brought this issue up in 2019 when I was trying out an older version of Trados Studio (if you search the community forum for "Tibetan", you will see my topics from then under the user name "Celso Scott"; haha, it seems I'm one of the few people who have tried to use the software for Tibetan translation since then). I received some helpful support back then, but the deal breaker was how Trados interprets the Tibetan tsheg, the small dot that goes between each word or syllable. Tibetan is nearly a monosyllabic language, so for fuzzy matching purposes a syllable should be treated as a word. There are spaces in Tibetan script, but they occur more or less between sentences, not between individual words. For example, see the following Tibetan sentence:

ངས་ཀློག་དེབ་གསར་པ་ཞིག་ཉོས་པ་ཡིན། - I bought a new reading book.

ངས་ "I" ཀློག་ "reading" དེབ་ "book" གསར་པ་ "new" ཞིག་ "a" ཉོས་པ་ཡིན། "Bought (+ past tense auxiliary)"

However, Trados reads those little dots as ordinary characters, so it treats the whole sentence as a single word. As a result, the TM is useless for fuzzy match retrieval unless there is a 100% match.
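
The tsheg is a distinct Unicode punctuation character, not whitespace, so any word splitter that only breaks on spaces sees the whole sentence as one "word". A quick check with Python's standard `unicodedata` module (an illustration only; it says nothing about how Trados tokenizes internally):

```python
import unicodedata

TSHEG = "\u0f0b"  # the character named in the Trados release note
sentence = "ངས་ཀློག་དེབ་གསར་པ་ཞིག་ཉོས་པ་ཡིན།"

print(unicodedata.name(TSHEG))      # TIBETAN MARK INTERSYLLABIC TSHEG
print(unicodedata.category(TSHEG))  # Po (punctuation, other), not whitespace
print(TSHEG.isspace())              # False

# Splitting on spaces leaves one token; splitting on the tsheg yields the syllables
# (the last piece still carries the sentence-final shad "།").
print(len(sentence.split()))        # 1
print(len(sentence.split(TSHEG)))   # 9
```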

There seems to be no setting that makes Trados read the tsheg as a word delimiter (like a space in English). I raised this in 2019, and the release notes for "Trados Studio 2022 SR1" suggest that it has been addressed:

Fixed an issue where unicode "TIBETAN MARK INTERSYLLABIC TSHEG" character would not be recognized as a word delimiter. (CRQ-15202)

I was happy to see this in the notes, but when I started a new trial and tested TM recall, the problem seems to persist.

A simple test is to take the sentence above and replace the object ཀློག་དེབ་ "reading book" with གློག་ཀླད་ "computer":

ངས་གློག་ཀླད་གསར་པ་ཞིག་ཉོས་པ་ཡིན། - I bought a new computer.

ངས་ "I" གློག་ཀླད་ "computer" གསར་པ་ "new" ཞིག་ "a" ཉོས་པ་ཡིན། "Bought (+ past tense auxiliary)"

If the first sentence at the top of this post is stored as a TM segment, this altered sentence should come up as a 70%-99% fuzzy match. It works that way on other platforms, including SmartCAT, Memsource, TransFX, and CafeTran. In Trados, however, it does not match at all. As a control, I converted the Tibetan into Roman transliteration using the common "Wylie" transliteration scheme, in which the tsheg dots become spaces. The two sentences above then become:

Sentence 1: ngas klog deb gsar pa zhig nyos pa yin/

Sentence 2: ngas glog klad gsar pa zhig nyos pa yin/

Here the TM matching works beautifully and we get a ~80% match. However, most Tibetan translators, myself included, don't want to work in Roman transliteration; we want to work with Tibetan Unicode.
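
To show why tokenization decides the outcome, here is a minimal Python sketch that scores the two example sentences at the token level with `difflib.SequenceMatcher` (ratio = 2 × matching tokens / total tokens). This is a stand-in similarity measure, not Trados's own fuzzy-match formula, which is why it gives 78% rather than the ~80% Trados reported; the helper names and the delimiter set (tsheg U+0F0B, non-breaking tsheg U+0F0C, shad U+0F0D, whitespace) are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed delimiter set: tsheg (U+0F0B), non-breaking tsheg (U+0F0C),
# shad (U+0F0D), and ordinary whitespace.
DELIMITERS = re.compile("[\u0f0b\u0f0c\u0f0d\\s]+")


def syllables(text: str) -> list[str]:
    """Hypothetical tokenizer: split Tibetan text into syllables on tsheg/shad."""
    return [tok for tok in DELIMITERS.split(text) if tok]


def fuzzy_score(a: list[str], b: list[str]) -> float:
    """Token-level similarity: 2 * matched tokens / total tokens (difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()


tm_segment = "ངས་ཀློག་དེབ་གསར་པ་ཞིག་ཉོས་པ་ཡིན།"
new_segment = "ངས་གློག་ཀླད་གསར་པ་ཞིག་ཉོས་པ་ཡིན།"
wylie_tm = "ngas klog deb gsar pa zhig nyos pa yin/"
wylie_new = "ngas glog klad gsar pa zhig nyos pa yin/"

# Tsheg as an ordinary character: each sentence is a single token, nothing matches.
print(f"{fuzzy_score([tm_segment], [new_segment]):.0%}")  # 0%
# Tsheg as a delimiter: 7 of the 9 syllables match on each side.
print(f"{fuzzy_score(syllables(tm_segment), syllables(new_segment)):.0%}")  # 78%
# Wylie control: spaces already separate the syllables.
print(f"{fuzzy_score(wylie_tm.split(), wylie_new.split()):.0%}")  # 78%
```

The Unicode and Wylie scores coincide once the tsheg is treated as a delimiter, which mirrors the control described above.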

If you would like your software to be usable by Tibetan translators, I suggest fixing this by having the software treat the tsheg as equivalent to a space by default, or by providing a setting to do so. Otherwise, there really is no benefit to using Trados for Tibetan. Please let me know if there is anything I can clarify.

I am doing research for an organization that supports some two dozen Tibetan-English translators. I would like to recommend your software, since it seems best suited to the rigorous level of detail we need for philological and scholastic works. However, I can't really do so unless this is resolved. If the issue has in fact been resolved in the 2022 SR1 release and the problem is my own failure to get it working, please let me know, as I would like to recommend Trados as an available tool.

Best wishes,

-Celso W.

Celso Wilkinson


Celso Wilkinson over 2 years ago

Thank you, Daniel Brockmann, I much appreciate your swift response, and I very much enjoyed your webinar on Wednesday. Your screenshot is very encouraging, as it shows the results I would expect to see. However, I'm not able to reproduce them in my editor. I re-indexed the TM as you instructed, and I also created a new active TM to be updated. In both cases, (1) creating a fresh TM segment in the actively updated TM or (2) recalling the segment from a previous TM, I still do not get a fuzzy match. In my screenshot you can see it next to a control with Roman transliteration, where TM fuzzy matching works properly:

As you can see, the Roman transliteration works as expected. When the first sentence is repeated (row 6), there is a 100% TM match because the string is identical to row 1, but there is no 80-90% match for the altered sentence in rows 2 and 7.

I now suspect that I may not have the latest service release with the fix for ticket "CRQ-15202". Could that be the case? I am using the free 30-day trial, and when I downloaded the software I assumed the latest service updates would be included, but perhaps that is not the case. When I open Trados Studio, the version statement says "Trados Studio 2022 - 17.0.3.11695". Notably, I don't see "SR1" in the version. Do I need to update this separately?
Daniel Brockmann over 2 years ago in reply to Celso Wilkinson

Hi Celso Wilkinson, your suspicion is correct: you need version 17.1.6.16252, which is SR1 (also indicated by the 1 in the second position of the version number). We released it on 5 July. You should see the update under the notification bell in the top right corner of Studio. If you have any apps installed, it's important to update those as well after installing SR1 (again, the notification bell should tell you this). And if you have MultiTerm Desktop installed separately, it's key to update it to SR1 too. I hope this helps; it should resolve the problem. In the meantime, I am setting the idea to "Delivered", as we have shown that the SR1 version does address this long-standing problem. Let me know if this is OK. Thanks! Daniel
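
The rule of thumb above (SR1 shows as 17.1 in the version number) is easy to check mechanically. A minimal Python sketch, assuming the four-part "major.minor.patch.build" strings quoted in this thread; `is_sr1_or_later` is a hypothetical helper, not part of any Trados tooling:

```python
def is_sr1_or_later(version: str) -> bool:
    """Return True if a Trados Studio version string is 17.1 or newer.

    Assumes the "major.minor.patch.build" format, e.g. "17.1.6.16252".
    Any later major version (18.x and up) also counts as newer.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (17, 1)


print(is_sr1_or_later("17.0.3.11695"))  # False: Studio 2022 before SR1
print(is_sr1_or_later("17.1.6.16252"))  # True: Studio 2022 SR1
```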

Celso Wilkinson over 2 years ago in reply to Daniel Brockmann

Great, thank you, Daniel Brockmann. I installed the update and got this working, with the proper fuzzy matches as you documented. Apologies for not realizing from the beginning that the SR1 update was missing. That's very helpful, and I think we can proceed with Tibetan Unicode in Trados. In my reviews and recommendations to other Tibetan translators, I will note that this SR1 update (17.1+) is vital. Thank you very much for your clarifications!
Daniel Brockmann over 2 years ago in reply to Celso Wilkinson

Thanks Celso!
