Strange concordance search results in file-based TM (Studio 2015)

Hi all

A file-based TM contains two DE-EN segments with the German word "Allmend". When I run a concordance search for the exact character sequence ("Allmend"), I only find one of them. Strangely enough, when I search for a similar sequence (ALLMEND, almend), I find both. Searches in the target language behave normally (i.e. searching for a given target string returns the expected results).

None of the following helps: reorganising the TM; exporting the data and importing it into a new TM (including one with character-based concordance search enabled); editing the text in the TM (overwriting it, or copying and pasting it).

What did help was moving the segment to a different position in the TMX export file and importing it into a TM again.

Has anyone else experienced this strange issue?

Regards,

Bruno Ciola

  • Hi 

    I recall seeing this behaviour, but I never figured out why it was happening, except in cases where, for example, a punctuation mark prevented the software from making the match properly.

    Other users and I have also periodically seen identical duplicates being added to a TM on confirmation, or existing units not being overwritten when corrected during translation, with a new unit being created instead and the incorrect one left in the TM. Someone in the technical department said that it was because their background hex values were different. It hasn't happened yet with Studio 2017...

    Possibly the oddest TM behaviour I recall (I think it was with Studio 2014 or earlier) was this: no matter how many times I confirmed a certain frequently used single-word TU to the TM (I think it was 'Brenner <> Burner'), it 'disappeared' and did not come up again in the Translation Results window or populate segments at the pre-translate stage of project setup. I remember exporting the TM to TMX, opening it in Olifant and finding numerous seemingly identical duplicate TUs containing just 'Brenner <> Burner'. I never did find out why it was happening. I presumed at the time that it was a display issue.

    Huge, complex pieces of software that have been built, revised and rebuilt, with 'new programming on top of old' (?), will have the odd glitch now and then...

    Makes life more 'interesting' ;-))

    All the best

    Alison
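
The "background hex values" explanation mentioned above can be checked outside Studio. The following is a minimal Python sketch, not anything taken from Studio itself: it assumes the difference is an invisible Unicode character or a different normalisation form, which is only one possible cause. The sample string with a soft hyphen is made up for illustration, and the helper names `code_points` and `canonical` are hypothetical.

```python
import unicodedata

def code_points(text):
    """Return the hex code point of every character, e.g. 'U+0042'."""
    return [f"U+{ord(ch):04X}" for ch in text]

def canonical(text):
    """NFC-normalise the text and drop invisible format characters (category Cf)."""
    normalised = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in normalised if unicodedata.category(ch) != "Cf")

a = "Brenner"
b = "Bren\u00adner"  # made-up example: contains an invisible soft hyphen (U+00AD)

print(a == b)                        # the two strings look alike but compare unequal
print(code_points(b))                # the extra code point shows up in the dump
print(canonical(a) == canonical(b))  # equal once the invisible character is removed
```

Pasting the source text of two "identical" TUs from a TMX export into `code_points` is a quick way to see whether they really are the same character sequence.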

  • Hi Bruno

    I have seen similar reports about concordance many times.
    Concordance search uses a fuzzy matching algorithm. As the name implies, it doesn't do an exact search; it makes a "best guess" based on certain criteria. The advantage of this approach is that it's faster than an exact search, but at the same time it can miss segments that would be obvious to a human. I know SDL is aware of these issues; hopefully they are working on a solution.
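
To illustrate how a fuzzy search can miss a segment that an exact search finds, here is a small Python sketch. It is not SDL's algorithm, which this thread does not describe: the character trigrams, the Dice coefficient, the 0.3 threshold and the two sample segments are all assumptions chosen purely for illustration.

```python
def trigrams(text):
    """Return the set of lowercase character trigrams in text."""
    lowered = text.lower()
    return {lowered[i:i + 3] for i in range(len(lowered) - 2)}

def dice(a, b):
    """Dice coefficient between the trigram sets of two strings (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))

def exact_search(query, segments):
    """Case-sensitive substring search: finds every segment containing the query."""
    return [s for s in segments if query in s]

def fuzzy_search(query, segments, threshold=0.3):
    """Keep only segments whose overall similarity to the query reaches the threshold."""
    return [s for s in segments if dice(query, s) >= threshold]

# Made-up sample segments, both containing the word "Allmend".
SEGMENTS = [
    "Allmend Brunau",
    "Die Nutzung der Allmend ist in der Gemeindeordnung geregelt",
]

print(exact_search("Allmend", SEGMENTS))  # both segments
print(fuzzy_search("Allmend", SEGMENTS))  # the long segment scores below the threshold
```

In this toy model the long segment is dropped because the query is scored against the whole segment, so the extra text dilutes the score. Whether anything similar happens inside Studio is an open question.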