TM API - TM object - TokenizerFlags vs Recognizers

Hello developer team,

We are preparing the migration of our TMs to GroupShare 2020. We want to create TMs that are set up exactly as on the old server. To do this, we export the TM properties and import them on the new server.

In the list of properties of a TM, we have two similar objects: TokenizerFlags and Recognizers. In the user interface, we can see the recognizers (dates, numbers, alphanumerics...) but we can't set any tokenizer flags. Can you explain what the difference between TokenizerFlags and Recognizers is, and how the TokenizerFlags are set when a TM is created manually?

Thank you

Sébastien Desautel

  • Hello,

    I want to share some troubleshooting we did recently, which came down to correcting the TokenizerFlags. This might be important for you to know, and I've just read in the forum that other people may need this information.

    We recently faced two problems:

    1. For a particular client, we received WorldServer packages that we analyze again in SDL Studio, once with the local TMs from the packages and once with our own server TM. We ended up with 3 different analyses! After some investigation, we noticed that the TokenizerFlags of our server TM was set to "noFlags". We repeated the analysis with another TM set to "defaultFlags" and got the same result as the analysis with the local TMs.

    2. A couple of days later, before we had corrected the TokenizerFlags of our own server TM, we got an error in the editor that prevented the user from working on the file. The reason was that the number of confirmed words was higher than the number of words in the file. Again, this was the result of the two different counting methods.

    Since we corrected the TokenizerFlags in the client TM, we haven't had any more problems.
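
    For anyone who wants to check this before importing TM properties on a new server, here is a minimal sketch in Python. It assumes the export is a JSON list of TM objects and that the keys are called "name" and "tokenizerFlags"; these key names and the helper fix_tokenizer_flags are assumptions for illustration, not the documented GroupShare API, so adjust them to whatever your export actually contains.

```python
import json

# Assumed key names in the exported TM properties (hypothetical; adjust to your export).
FLAGS_KEY = "tokenizerFlags"
NAME_KEY = "name"


def fix_tokenizer_flags(tms, wanted="DefaultFlags"):
    """Return (patched_tms, changed_names) without mutating the input list.

    Any TM whose tokenizer flags read "noFlags" (case-insensitive) gets
    the `wanted` value instead. TMs without the key are left untouched.
    """
    patched, changed = [], []
    for tm in tms:
        tm = dict(tm)  # shallow copy so the caller's data stays as exported
        current = str(tm.get(FLAGS_KEY, ""))
        if current.lower() == "noflags":
            tm[FLAGS_KEY] = wanted
            changed.append(tm.get(NAME_KEY, "<unnamed>"))
        patched.append(tm)
    return patched, changed


if __name__ == "__main__":
    # Made-up sample data standing in for an exported property list.
    exported = json.loads(
        '[{"name": "Client_A", "tokenizerFlags": "noFlags"},'
        ' {"name": "Client_B", "tokenizerFlags": "DefaultFlags"}]'
    )
    patched, changed = fix_tokenizer_flags(exported)
    print("TMs that need correcting:", changed)
```

    The sketch only reports and patches the exported data; it does not talk to the server, so the corrected properties still have to be imported the same way as before.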

    What is still puzzling is that WorldServer still gives a different analysis, but we have no way to investigate this.

    Kind regards

    Sébastien
