Term Recognition is not working for Tibetan language

Question

I'm testing Trados for Tibetan translation and I'm trying to get a proof of concept for using a Tibetan termbase. It seems I cannot get any term recognition for a Tibetan source text. I create a simple bilingual termbase at the beginning of the project and then test by adding the simple term ཞི་མི་ = "cat". When I run a parallel test from English to French it works fine: 
 English to French: 
 
 But, no results with Tibetan to English: 
 
 I've tried a few different things here. Checking "Use word-based tokenization for Asian source text" does not seem to help. 
 I also thought it might be related to the editor not recognizing the Tibetan tsheg punctuation (the little dots between words) as word boundaries. This was previously a problem with TM matching, but it was fixed in the 2022 SR1 update. However, as in the example above, using Roman transliteration with spaces ("zhi mi" = "cat") does not work either. 
 I know the terms are being added to the termbase because I see them in the termbase viewer and they come up in the termbase search, but they are still not being recognized. 
 It seems to only occur when the source is Tibetan. Does anyone have some insight into why this isn't working?

Paul Filkin · Answer

Celso Wilkinson 
 Asian languages can be tricky because there are no explicit word boundaries, and I think this would almost certainly cause a problem here. It's also very difficult for us to test properly because, as I have learned this morning with only 30 mins or so of investigation, Tibetan is a very complex (and interesting) language with its own unique grammatical rules and context-dependent meanings. I had a go at testing this with two files: one with no spaces, only the "tsek" as you mentioned (I think!), and a second with some forced spaces to see whether this made a difference... it didn't seem to:

the term recognition does not work 
 termbase search seems to find everything... although I did set a very low fuzzy value to try and help, so it may have found far more than it should, given these characters are not in the Tibetan translation of "he was not there" as far as I can see. 
 Find & replace can find these chars so they are definitely there 
 
 Apologies for what are most likely completely nonsensical translations and terms, but I just wanted to have something to replicate this problem so technical support can review it. If you have a better sample termbase and source text with translation, please send me the termbase and the sdlxliff and I can use them to raise this problem.

Paul Filkin · Answer

Celso Wilkinson 
 Celso Wilkinson said: If there is anything I can send for report or investigation to your tech support please let me know. 
 What would be really helpful is the following: 
 
 a short translated SDLXLIFF... Tibetan to any European language 
 a small termbase that should, in theory, pick up terms in Tibetan 
 
 It's very hard to find any resource that can translate into Tibetan for us to test. You can send this to pfilkin at sdl dotcom 
 Also, you might be interested to have a play with this plugin: 
 https://appstore.rws.com/Plugin/59 
 This does a better job than MultiTerm for term recognition in Tibetan... although it is also not perfect and definitely needs work. But we might be able to do some work on this plugin to better support this language... it's also open source, in case you have access to developers who could do this too. 
 So for example:

Cat is recognised, but only when it's at the start of the sentence. But even this is an improvement over MultiTerm. 
 Cat being picked up in the term recognition window 
 The Term Excelerator Termbase Viewer (this is also editable so quite neat...) 
 
 It also gets me this when I add spaces to where I think the words end: 
 
 Here I get two terms recognised. So this makes me think that we might be able to do some work on this plugin to recognise the "tsek" and any other important markers and improve the ability to use Tibetan. This could then serve as a useful proof of concept that "might" transpose to MultiTerm, or at least provide the dev team with a solution.
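
As a rough illustration of the idea above, here is a minimal Python sketch of matching terms at the syllable level, treating the tsheg (U+0F0B), its non-breaking variant (U+0F0C), the shad (U+0F0D) and ordinary whitespace as delimiters. This is not how MultiTerm or the plugin works internally. The function names `syllables` and `find_terms` and the plain dictionary standing in for a termbase are hypothetical, purely for illustration.

```python
import re

# Delimiters assumed for this sketch (Unicode Tibetan block):
TSHEG = "\u0f0b"     # TIBETAN MARK INTERSYLLABIC TSHEG
TSHEG_NB = "\u0f0c"  # TIBETAN MARK DELIMITER TSHEG BSTAR (non-breaking)
SHAD = "\u0f0d"      # TIBETAN MARK SHAD (clause/sentence end)

# Split on any run of tsheg, shad, or whitespace.
_DELIMS = re.compile(f"[{TSHEG}{TSHEG_NB}{SHAD}\\s]+")


def syllables(text):
    """Return the non-empty chunks between delimiters."""
    return [chunk for chunk in _DELIMS.split(text) if chunk]


def find_terms(source, termbase):
    """Return (syllable_index, term, target) for every termbase entry
    whose syllable sequence occurs anywhere in the source segment.

    `termbase` is a hypothetical dict of source term -> target term.
    """
    src = syllables(source)
    hits = []
    for term, target in termbase.items():
        wanted = syllables(term)
        n = len(wanted)
        if n == 0:
            continue  # ignore entries made only of delimiters
        for i in range(len(src) - n + 1):
            if src[i:i + n] == wanted:
                hits.append((i, term, target))
    return sorted(hits)
```

Because the term and the segment are both reduced to syllable lists before comparison, a trailing tsheg on the term (as in ཞི་མི་) or forced spaces in the segment do not affect the match, and the term does not need to be at the start of the sentence. A real implementation would still need to handle Tibetan particles and other markers, which this sketch ignores.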
