define 'word' boundaries for better word count in Trados

Hi,

Today I got a small file from a client who uses memoQ. The word counts differed: Trados showed 140 words where memoQ showed 123.

It turned out that the placeholders were badly defined, so variables in square brackets were not converted; that accounted for 5 words. That still leaves a difference of 12 words (about 10%) between the two tools.

On closer inspection, the text referenced about 6 EU standards, and Trados counted EACH number and string between and surrounding the '/' as 1 word.

Is there a way to define these via regex as 1 word so they get counted as 1 word? The same would go for quite long Standard names (e.g. DIN 2137-1:2018-11 or even longer ones) or badly formatted URLs and some other strings.

Just to make sure: No, it’s not an option to convert them to tags or untranslatable text as they need to be adapted (e.g. adding word joiners and no-break spaces to avoid wrapping)

I know some word count behaviour can be adjusted via the TM settings (language resources, e.g. ’count as word if words contain’), but I can’t see any option to add regex or similar to define rules like the above. Or am I missing something?

Best regards,

Pascal

clarification of additional settings
[edited by: Pascal Zotto at 2:14 PM (GMT 0) on 20 Jan 2025]
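
To make the discrepancy described above concrete, here is a minimal Python sketch. It assumes a hypothetical tokenizer that breaks words at whitespace and at '/', '-' and ':'; this only approximates the behaviour reported in the post and is not the documented Trados algorithm. The function names are made up for illustration.

```python
import re

# ASSUMPTION: an approximation of a tokenizer that breaks words at
# whitespace and at '/', '-' and ':'. Not the documented Trados rule.
SPLIT_ON_PUNCT = re.compile(r"[\s/\-:]+")


def count_split(text: str) -> int:
    """Count tokens when punctuation inside a string also ends a word."""
    return len([token for token in SPLIT_ON_PUNCT.split(text) if token])


def count_whitespace(text: str) -> int:
    """Count tokens when only whitespace ends a word."""
    return len(text.split())


sample = "DIN 535/2137-1:2018-11"
print(count_split(sample))       # 6
print(count_whitespace(sample))  # 2
```

Under these assumptions a single standard reference contributes 6 words with one rule and 2 with the other, so a handful of such references is enough to explain a gap of around 12 words.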

0 Daniel Hug over 1 year ago

Pascal Zotto said:
Is there a way to define these via regex as 1 word so they get counted as 1 word? The same would go for quite long Standard names (e.g. DIN 2137-1:2018-11 or even longer ones) or badly formatted URLs and some other strings.

Just to make sure: No, it’s not an option to convert them to tags or untranslatable text as they need to be adapted (e.g. adding word joiners and no-break spaces to avoid wrapping)

Pascal Zotto

I understood that the numbers were the problematic part – do they also change in the translation, such as "2137-1:2018-11" in the above example?

0 Pascal Zotto over 1 year ago in reply to Daniel Hug

Hi Daniel Hug,

the numbers themselves should not change (although it happened that a client mixed up numbers of a standard and I had to correct it) but I need to replace and add other characters:

e.g. (the brackets only show which char was added/replaced)

DIN 535/2137-1:2018-11 > DIN(narrow nbsp)535/(word joiner)2137(non-breaking hyphen)1:(word joiner)2018(non-breaking hyphen)11
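
The replacement shown above could be sketched in Python roughly as follows. The regex for standard designations and the function names are assumptions for illustration only; the three characters used are U+202F (narrow no-break space), U+2060 (word joiner) and U+2011 (non-breaking hyphen).

```python
import re

NNBSP = "\u202f"        # NARROW NO-BREAK SPACE
WORD_JOINER = "\u2060"  # WORD JOINER
NB_HYPHEN = "\u2011"    # NON-BREAKING HYPHEN

# ASSUMPTION: a simple pattern for designations such as
# "DIN 535/2137-1:2018-11"; real-world standards need a broader pattern.
STANDARD = re.compile(r"\b(DIN|EN|ISO) (\d(?:[\d/:\-]*\d)?)")


def _harden(match: re.Match) -> str:
    """Rebuild one designation with non-breaking characters."""
    prefix, number = match.group(1), match.group(2)
    number = number.replace("-", NB_HYPHEN)
    number = number.replace("/", "/" + WORD_JOINER)
    number = number.replace(":", ":" + WORD_JOINER)
    return prefix + NNBSP + number


def protect_standards(text: str) -> str:
    """Apply the replacement to every designation found in the text."""
    return STANDARD.sub(_harden, text)


print(repr(protect_standards("DIN 535/2137-1:2018-11")))
```

Text that contains no matching designation is returned unchanged.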

0 Daniel Hug over 1 year ago in reply to Pascal Zotto

Pascal Zotto

It's not worth it for such a small document, but if you have this case more often, you might want to do your replacements in the source (e.g. with the Cleanup Tasks app) and then turn the results into tags (I hope that is possible; it should be). Once they are tags, their content can't be altered anymore, as I am sure you know.

0 Pascal Zotto over 1 year ago in reply to Daniel Hug

Daniel Hug

Yes, I get these quite often in larger texts too, but as I have said several times by now: tags are NOT an option, as I need to be able to change parts of the string AND I need them to be counted as 1 word, say DIN 535/2137-1:2018-11 = 1 word. So yes, I know they can’t be altered ^^ and it would also give a wrong word count, as tags are not counted during analysis.

I’m rather looking for a way to define word count rules via regex so that DIN standards are taken as 1 word. Even a rule saying that any string between spaces is 1 word would be closer to the correct word count (at least for these strings, though it would cause problems with other strings).

But I guess this is not possible at the moment.
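
As an illustration of the kind of rule being asked for, here is a minimal Python sketch of a counter outside Trados in which every match of a user-defined regex counts as exactly one word. The patterns, the fallback splitting rule and the function name are all assumptions, not an existing Trados feature.

```python
import re

# ASSUMPTION: example patterns for strings that should count as ONE word.
PROTECTED = (
    re.compile(r"\b(?:DIN|EN|ISO)(?: (?:EN|ISO))* \d(?:[\d/:\-]*\d)?"),  # standards
    re.compile(r"https?://\S+"),                                         # URLs
)

# ASSUMPTION: fallback rule that ends a word at whitespace, '/', '-' or ':'.
SPLIT = re.compile(r"[\s/\-:]+")


def count_words(text: str, protected=PROTECTED) -> int:
    """Count protected matches as one word each, then count the rest."""
    total = 0
    for pattern in protected:
        # Replace each match with a space so neighbouring words stay separate.
        text, matches = pattern.subn(" ", text)
        total += matches
    total += len([token for token in SPLIT.split(text) if token])
    return total


sample = "See DIN 535/2137-1:2018-11 and DIN EN ISO 9001:2015 for details"
print(count_words(sample))                # 6
print(count_words(sample, protected=()))  # 15
```

The whitespace-only rule mentioned above would need no patterns at all, but it would also glue together strings that should remain separate words, which is why the sketch protects only explicitly listed patterns.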

0 Paul over 1 year ago in reply to Pascal Zotto

Pascal Zotto

I tend to agree with you... I'd also like a feature like that in Studio. But it's not there. A workaround might be to use the Cleanup Tasks app - https://appstore.rws.com/Plugin/23 - or the Data Protection Suite - https://appstore.rws.com/Plugin/39 - as you could use regex with these and add the batch task in a custom project prep (if you have the Professional version of Studio) when you create your projects. With the Freelance version it's just an additional step: run the batch task after preparing your project and then run the analysis again.

Paul Filkin | RWS

0 Pascal Zotto over 1 year ago in reply to Paul

Paul

Yes, but the problem is that I don’t need these as tags at all (they would even be counterproductive), so I don’t see how the plugin (which I use regularly in my Professional version) would help me with that problem.

0 Daniel Hug over 1 year ago in reply to Pascal Zotto

Pascal Zotto

What I meant (see above) and how I understand Paul is that you modify the DIN standard notation as needed in the source, using CleanUp Task. Then you "protect" the numerical part of the standard (e.g. "535/2137-1:2018-11") as a tag. This would result in the standard being counted as one word, and in being written the way you need it. I agree with Paul, for special use cases like yours it might be desirable to be able to define word boundaries freely, but since that is not an option at this point, this looks like a viable workaround.

0 Pascal Zotto over 1 year ago in reply to Daniel Hug

Daniel Hug

I know, but you always miss the point where I explained several times already that I MUST be able to replace the hyphens and add special space characters. So tags are NOT an option at all.

0 Daniel Hug over 1 year ago in reply to Pascal Zotto

Pascal Zotto

You do this IN THE SOURCE. You MODIFY THE SOURCE to match your requirements. THEN you make it a tag. Target will have the formatting you did to the source via CleanUp Task.

0 Pascal Zotto over 1 year ago in reply to Daniel Hug

Daniel Hug

Okay, but altering the source text in pre-generated XLIFF exports or even SDLXLIFF files is again a problem: most clients do not allow it, because the files they import back would then differ from what is in their CAT tool.

I would then have to reset everything to its initial state after the translation is done, as otherwise the files to import would differ from their memoQ (or other) files. On the other hand, I’m not sure that this would not cause internal issues when they import their files back. In theory it should not, but I have had such issues in the past, where I then had to take the original files, re-run them against the TM and fill in all tags manually to get them right. Sometimes this costs me hours on a single project.

0 Paul over 1 year ago in reply to Pascal Zotto

Pascal Zotto

I wonder if you could use the Term injector plugin to manipulate the format in the source and have it automatically exactly as you'd like in the target?

https://appstore.rws.com/Plugin/75

Or even better this version, although you'll have to speak to Tommi to get it:

https://tomminieminen.github.io/TermInjectorPlus/

0 Pascal Zotto over 1 year ago in reply to Paul

Paul

I already use TI+, and with Tommi’s okay I have even extended it over the last year way beyond its basic functionality, but that does not help with this problem, as fiddling with the source is not allowed.

0 Paul over 1 year ago in reply to Pascal Zotto

Pascal Zotto

Why would you fiddle with the source? Wouldn't you adjust the target translation result using Terminjector and leave the source alone?

It is a bit fiddly, but I think we'll soon have Terminjector Plus which looks a lot easier to manage... something to look forward to!

https://tomminieminen.github.io/TermInjectorPlus/

0 Pascal Zotto over 1 year ago in reply to Paul

because it’s about counting source words, not about translation. I use my version of TI+ to adjust the target as needed. That’s not the problem at all.

I’m not sure about it coming out soon from Tommi as the thing I heard from him was that he does not work on it anymore as he’s busy with OPUS CAT. He even allowed me to use it under a license so that I could see the new product I created.

+1 Paul over 1 year ago in reply to Pascal Zotto

Pascal Zotto

Pascal Zotto said:
because it’s about counting source words, not about translation.

Fair enough. In that case I can confirm there is no solution other than to consider adopting a different tool for more of a standardised approach for wordcounts... Practicount for example. I don't know how customisable that is, but at least you will have the same wordcount for all the tools you work with. And if that doesn't help because you simply want to be able to mirror the wordcount from a tool that you were provided with then I think you'd need to build your own plugin for that, or you can try submitting an idea for this in the ideas site.

Pascal Zotto said:
I’m not sure about it coming out soon from Tommi as the thing I heard from him was that he does not work on it anymore as he’s busy with OPUS CAT.

I spoke with him yesterday over email and he confirmed this morning ;-)

0 Pascal Zotto over 1 year ago in reply to Paul

Paul

Thanks for the tip. I’ll have a look at Practicount but of course it would be best for me to have everything in one app (Trados). So maybe I’ll go for a new plugin.

Great to hear Tommi is getting back to TI+. My version became way too powerful to offer to the broad public, as it ended up as a grammar-based MT system with way better results than existing MT and AI combinations, at least for the language combinations for which I started programming the grammar and spelling rules.

Edit: Practicount does not help at all, as it only counts words/chars/lines/pages and does not even report repetitions.
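
For a standalone count that also reports repetitions, a rough Python sketch could look like the following. It assumes the source segments have already been extracted into a list of strings, counts words by whitespace only, and treats a repetition as an exact duplicate of an earlier segment; CAT tools define both more elaborately, so the numbers will not match an analysis report exactly. The function name and the result keys are hypothetical.

```python
from collections import Counter


def analyse(segments):
    """Return segment, word and repeated-word totals for a list of segments.

    ASSUMPTIONS: words are whitespace-delimited; a repetition is any
    occurrence of a segment after its first exact occurrence.
    """
    counts = Counter(s.strip() for s in segments if s.strip())
    total_words = 0
    repeated_words = 0
    for segment, occurrences in counts.items():
        words = len(segment.split())
        total_words += words * occurrences
        repeated_words += words * (occurrences - 1)
    return {
        "segments": sum(counts.values()),
        "words": total_words,
        "repeated_words": repeated_words,
    }


print(analyse([
    "Press the button.",
    "Press the button.",
    "See DIN 2137-1:2018-11.",
]))
```

The whitespace rule in this sketch could be swapped for a pattern-based word rule if standards and URLs should each count as one word.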