A Bug that needs to be addressed for the Tibetan script

Question

I was recently in conversation with one of the moderators, Paul Filkin, about this, but I got an automated reply that he is currently not available, so I thought I would post this here. 
 I have been setting up SDL Trados Studio for about three weeks now, and I noticed that I was getting very poor leverage for Tibetan-English TMs and concordance search. After a lot of research, tests, and tutorials, I was able to identify the definite cause of the problem.
 
 Let me explain first that in the Tibetan script there are no spaces between words. Instead there are little dots called tseks, e.g., ཐམས་ཅད་མཁྱེན་ཅིང་ཀུན་གཟིགས། is a phrase with six words and a dot between each word. Trados Studio is recognizing each of these dots as a character rather than a word delimiter, thereby registering whole phrases as single words, and as a result it gets very poor leverage for both fuzzy matching and concordance search.
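
 To make the point concrete, here is a minimal Python sketch. It is only an illustration of whitespace-based versus tsek-based splitting, not a description of how Trados Studio tokenizes internally. The function names are made up for this example, and the phrase is rebuilt from its parts so that the delimiter code points are explicit.

```python
import unicodedata

TSEK = "\u0f0b"  # the dot between units (assumed delimiter for this sketch)
SHAD = "\u0f0d"  # the stroke that ends the phrase

# The example phrase from above, rebuilt so the delimiters are explicit.
units = ["ཐམས", "ཅད", "མཁྱེན", "ཅིང", "ཀུན", "གཟིགས"]
phrase = TSEK.join(units) + SHAD


def split_on_whitespace(text):
    """Split the way a space-delimited script would be split."""
    return text.split()


def split_on_tsek(text):
    """Hypothetical tokenizer: drop the final shad, then split on tseks."""
    return [unit for unit in text.rstrip(SHAD).split(TSEK) if unit]


# Show how Unicode classifies the two marks and whether Python
# treats them as whitespace.
for mark in (TSEK, SHAD):
    print(f"U+{ord(mark):04X}", unicodedata.name(mark),
          unicodedata.category(mark), "isspace:", mark.isspace())

print("whitespace split:", len(split_on_whitespace(phrase)), "token(s)")
print("tsek split:      ", len(split_on_tsek(phrase)), "token(s)")
```

 With whitespace splitting the whole phrase comes back as a single token, which is the behaviour I believe I am seeing in the TM lookups.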
 
 I was able to confirm this with a simple test: in the document called "Test C Tibetan Script" I created a TM for a simple 8 lines of verse (one segment per line of verse). Then I ran this against a new source text in which I altered a word or two on each line, so that we would expect a fuzzy match for each line, though I left one line unchanged as a 100% control.
 
 Then I ran the exact same test in another document called "Test C Tibetan Romanized Transliteration". It uses the exact same TM and source, except that all the Tibetan was converted into the Wylie Roman transliteration, where all of the punctuation dots are replaced by spaces. The example phrase from above would thus be rendered "thams cad mkhyen cing kun gzigs/".
 
 The results were dramatically different. In the first test, with Tibetan script, only the control line that was a 100% match hit; all the other lines failed to make even the 50% threshold for fuzzy matching. However, in the second test, with Tibetan transliteration, all the lines registered as matches. The 100% line registered as such, and all the other lines registered in the 70-95% range that we would expect from a CAT tool like SDL Trados. I got similar results from the concordance searches.
 
 I have attached my tests to this post below for reference. 
 
 So basically, SDL Trados just needs a very small update to recognize the Tibetan punctuation properly. It should recognize all the tseks, the little punctuation dots (Unicode = U+0F0B), as word delimiters equivalent to spaces (Unicode = U+0020). This minor adjustment would make the platform as functional for Tibetan as it is for many other languages. I noticed, for instance, that another question in the community forum (the only other question tagged "Tibetan") was from a Tibetan-Chinese translator who was having a problem similar to what I am experiencing.
 At this point I am 100% confident that this is the problem. Would it be possible to please inform someone in SDL Trados' development or IT to develop an update or patch to address this very simple fix? It would make a world of difference not just to me but anyone using your product for Tibetan translation. Otherwise the fuzzy match and concordance search function very poorly and one would get better results by just searching the TMs in a text editor. 
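
 As a rough sketch of why this one change matters for fuzzy matching, the Python below compares two segments at the word level, first with whitespace-only tokens and then with tseks treated as delimiters. I do not know the actual matching algorithm in Trados Studio, so difflib.SequenceMatcher from the Python standard library stands in for it here. The function names and the altered last word are assumptions made up for the illustration.

```python
from difflib import SequenceMatcher

TSEK = "\u0f0b"  # the dot proposed as a word delimiter
SHAD = "\u0f0d"  # the stroke that ends a line of verse


def tokens_whitespace_only(segment):
    """Tokenize on spaces only, so a Tibetan line stays one token."""
    return segment.split()


def tokens_tsek_aware(segment):
    """Hypothetical fix: treat tsek and shad like spaces before splitting."""
    return segment.replace(TSEK, " ").replace(SHAD, " ").split()


def token_similarity(a_tokens, b_tokens):
    """Word-level similarity in [0, 1]; a stand-in for a fuzzy match score."""
    return SequenceMatcher(None, a_tokens, b_tokens).ratio()


def make_segment(units):
    """Join units with tseks and close the line with a shad."""
    return TSEK.join(units) + SHAD


# TM segment: the example phrase. New segment: same phrase with the last
# word swapped for an arbitrary substitute (illustration only).
tm_segment = make_segment(["ཐམས", "ཅད", "མཁྱེན", "ཅིང", "ཀུན", "གཟིགས"])
new_segment = make_segment(["ཐམས", "ཅད", "མཁྱེན", "ཅིང", "ཀུན", "མཐོང"])

score_plain = token_similarity(tokens_whitespace_only(tm_segment),
                               tokens_whitespace_only(new_segment))
score_aware = token_similarity(tokens_tsek_aware(tm_segment),
                               tokens_tsek_aware(new_segment))

print(f"whitespace-only tokens: {score_plain:.0%}")
print(f"tsek-aware tokens:      {score_aware:.0%}")
```

 In this sketch the whitespace-only comparison sees two different one-word segments and finds nothing in common, while the tsek-aware comparison sees five shared words out of six. That is the same pattern as in my two tests, although the real scores in Trados will of course be computed differently.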
 
 After trying the platform for three weeks, I recently reviewed SDL Trados Studio for a forum of Tibetan translators, and I had to give it a very poor review because of this issue. However, if this one issue were fixed, I would replace my poor review with a very positive one.
 
 I note that it would be possible to convert all the Tibetan into Roman transliteration for use in SDL Trados, but this would be very impractical for me and for other translators: the Tibetan script is widely used, Roman transliteration is not very comfortable to read, and all of one's TMs in Tibetan script would need to be converted, which would be a complicated process. Also, any new users of SDL Trados would expect the Tibetan script to work and would encounter the same problem that I had.
 
 The tests are attached here, each accompanied by the .sdltm that I ran it against. I am happy to communicate with developers or other staff if there are any questions about the Tibetan script.
 
 Thank you very much for taking the time to look into this. 
 
 Best wishes, -Celso 
 
 1321.Trados Tibetan Script vs Transliteration Tests.zip

Daniel Hug · Accepted Answer

celso scott 
 Thank you for your in-depth testing. I hope SDL can make this work for you and the Tibetan translation community. Did you post your suggestion in the ideas section? 
 https://community.sdl.com/ideas/translation-productivity-ideas/i/trados-studio-ideas 
 You could outline very briefly what you suggest and link to this post. I found SDL to be quite responsive. 
 Daniel

Paul Filkin · Answer

Hi celso scott 
 Yes, thank you for persevering with these tests. I played around with this today and can obviously reproduce this, although without your help I would not have been able to identify the likely problems. I copied Kevin Flanagan as he may be interested to take a look at this problem too.
