How to use regular expressions in XML file types to mark placeholders?

I created a custom XML file type for an XML file with embedded HTML content and would like to mark formatting strings enclosed in square brackets as placeholders.

Here's an excerpt:

<content:encoded><![CDATA[This is [B]an[/B] example.]]></content:encoded>

What kind of XPath query do I need to select all strings enclosed in square brackets?

0 Jerzy Czopik over 4 years ago

I'm afraid xpath won't work here. Even the typical regex rule for tag pairs will not work. So what remains is creating placeholders for such expressions, preferably \[[^]]*\]
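
As a quick check outside Studio, the suggested pattern can be tried against the sample string. This is only a sketch in Python, whose regex flavour may differ in details from the one Studio uses, and the variable names are made up for illustration.

```python
import re

# The suggested pattern: a literal "[", then any run of characters that are
# not "]", then a literal "]". Inside a character class, a "]" placed first
# (here right after the "^") is treated as a literal character.
BRACKET_TAG = re.compile(r"\[[^]]*\]")

sample = "This is [B]an[/B] example."
placeholders = BRACKET_TAG.findall(sample)
print(placeholders)
```

Because the pattern does not care what sits between the brackets, one rule covers opening and closing tags of any name.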

_________________________________________________________

When asking for help here, please be as accurate as possible. Please always remember to give the exact version of product used and all possible error messages received. The better you describe your problem, the better help you will get.

Want to learn more about Trados Studio? Visit the Community Hub. Have a good idea to make Trados Studio better? Publish it here.

0 Hans Kröger over 4 years ago in reply to Jerzy Czopik

AFAIK, it's not possible to create placeholders in XML files. Apparently, only XPATH expressions are allowed.

0 Paul over 4 years ago in reply to Hans Kröger

Hans Kröger

Your example is a CDATA section so you'll be using an embedded content processor to handle this. You can use regex to handle placeholders as Jerzy explained.
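
To illustrate why XPath alone cannot reach the bracketed strings: an XML parser delivers the whole CDATA section as a single run of text, so there are no child nodes for a query to select, and any further splitting has to happen on the string itself. A minimal Python sketch, assuming the `content` prefix is bound to the usual RSS content-module namespace (the original file's declaration is not shown) and using a made-up `item` wrapper element:

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URI; the real file's xmlns:content declaration is not shown.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

doc = (
    '<item xmlns:content="http://purl.org/rss/1.0/modules/content/">'
    "<content:encoded><![CDATA[This is [B]an[/B] example.]]></content:encoded>"
    "</item>"
)

encoded = ET.fromstring(doc).find("content:encoded", NS)

# The CDATA section arrives as plain text; the element has no child nodes.
print(len(encoded), repr(encoded.text))

# Splitting on a capturing group keeps the bracketed tokens in the result.
parts = [p for p in re.split(r"(\[[^]]*\])", encoded.text) if p]
print(parts)
```

The regex step at the end is the part that an embedded content processor would take over inside Studio.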

Paul Filkin | RWS Group

________________________
Design your own training!
You've done the courses and still need to go a little further, or still not clear?
Tell us what you need in our Community Solutions Hub

0 Hans Kröger over 4 years ago in reply to Paul

I looked at the HTML5 embedded content processor, but I don't see any option for adding regular expressions in the Parser dialog box.
Apparently, I can only include/exclude HTML tags by name.

+1 Jerzy Czopik over 4 years ago in reply to Hans Kröger

Unfortunately you cannot combine more than two parser settings, so you need to define both "normal" tags and []-tags in one parser, using regex.
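
One way to picture this is a single pattern that covers both tag styles. The sketch below is an assumption about what such a combined regex could look like, not Studio's own rule format; the HTML half is deliberately simplistic and will not cope with every legal tag (for example a `>` inside an attribute value).

```python
import re

# Alternation: an HTML-style tag, or a square-bracket tag.
ANY_TAG = re.compile(r"</?[A-Za-z][^>]*>|\[[^]]*\]")

html_text = '<p class="x">This is [B]an[/B] <em>example</em>.</p>'
tags = ANY_TAG.findall(html_text)
print(tags)
```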

0 Hans Kröger over 4 years ago in reply to Jerzy Czopik

Thanks for confirming my suspicions. IMHO, it's a major design flaw that users need to create a plaintext-based HTML parser when they want to add custom placeholders to embedded HTML content.

0 Paul over 4 years ago in reply to Hans Kröger

Hans Kröger

Hans Kröger said:
it's a major design flaw

I disagree. It's definitely something that would be a beneficial enhancement, but the use of what looks a bit like bb code inside an html embedded inside an xml file is hardly valid html, so it's not a flaw.

If you don't need the text to display as bold in the editor, for example, and only want to ensure the tags are handled, then adding the appropriate tags isn't that difficult using regex, as you don't need to create separate rules for each tag type.

Paul Filkin | RWS Group

+1 Daniel Hug over 4 years ago in reply to Hans Kröger

Hans Kröger

I think you are expecting too much from the Studio parser. There is an XML parser to handle the XML content and an HTML parser to handle the embedded content. If you have custom requirements, you might have to find a custom solution. One simple way might be to turn the [B]stuff[/B] into tags. If you pre-process your files to arrive at something like this:

<content><![CDATA[This is <custom_content value="[B]an[/B]"/> example.]]></content>

(I used the regex (\[B\].*?\[\/B\]) for matching and <custom_content value="$1"/> for replacing.)

How easy that is depends a bit on the content: Is it always [B] or can it be [All kind of things]? What is enclosed? If it's characters like <, ", ', & the content might need escaping.

If it is as simple as it is in your sample, then the above will show in the editor like this: [screenshot not preserved]

Daniel

EDIT: I should add that if you go this route, you will have to post-process the files accordingly. Just to state the obvious.
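
The pre- and post-processing round trip could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tested workflow: the function names and the inverse pattern are made up, and the escaping step is one possible answer to the concern about characters like <, ", ' and & inside the brackets.

```python
import html
import re

# The matching regex from the post; DOTALL is not set, so a [B]...[/B] pair
# has to sit on a single line to be matched.
PAIR = re.compile(r"(\[B\].*?\[/B\])")
# Hypothetical inverse pattern for the wrapper element introduced below.
WRAPPED = re.compile(r'<custom_content value="(.*?)"/>')


def preprocess(text: str) -> str:
    """Wrap each [B]...[/B] run in a custom_content element."""
    return PAIR.sub(
        lambda m: '<custom_content value="%s"/>'
        % html.escape(m.group(1), quote=True),
        text,
    )


def postprocess(text: str) -> str:
    """Restore the original bracketed run from each custom_content element."""
    return WRAPPED.sub(lambda m: html.unescape(m.group(1)), text)


source = "This is [B]an[/B] example."
wrapped = preprocess(source)
print(wrapped)
print(postprocess(wrapped) == source)
```

Escaping on the way in and unescaping on the way out keeps the attribute value well-formed even when the bracketed text contains quotes or ampersands.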

0 Hans Kröger over 4 years ago in reply to Daniel Hug

Daniel Hug
Thanks for your very helpful reply!!! I might actually have to go that route. I'm also considering post-processing the XLIFF file.

Paul

Paul said:
It's definitely something that would be a beneficial enhancement, but the use of what looks a bit like bb code inside an html embedded inside an xml file is hardly valid html, so it's not a flaw.

You have a point there, but, IMHO, it doesn't make sense that the embedded HTML content processor doesn't offer the same options as the regular HTML file type. You're basically forcing Studio users who want to create custom placeholders in embedded HTML content to create a makeshift HTML parser, when Studio already comes with one.

Also, it should be possible to select strings inside of tags using XPath commands such as text(), but Studio doesn't seem to support these commands.

0 Paul over 4 years ago in reply to Hans Kröger

Hans Kröger said:
IMHO, it doesn't make sense that the embedded HTML content processor doesn't offer the same options as the regular HTML file type.

Hans Kröger

As Daniel already pointed out, you are asking far too much of the tool. The HTML filetype can use an embedded processor to handle embedded content. But seeing as you are actually handling XML and not HTML you are asking it to use an embedded content processor inside an embedded content processor. Quite a big ask and nothing to do with making sense.

Hans Kröger said:
Also, it should be possible to select strings inside of tags using XPath commands such as text(), but Studio doesn't seem to support these commands.

text() would be used to select text between tags and not inside tags. And it certainly does work.

Paul Filkin | RWS Group

0 Hans Kröger over 4 years ago in reply to Paul

Paul said:
text() would be used to select text between tags and not inside tags. And it certainly does work.

Can you please provide an example of the text() syntax that SDL Studio supports?

0 Paul over 4 years ago in reply to Hans Kröger

Hans Kröger

Hans Kröger said:
Can you please provide an example of the text() syntax that SDL Studio supports?

It supports the correct way to use it in XPath:

//*[contains(text(), 'toast')]

This selects any segment at all that contains the word toast.

//VariableAssignment[answer/text()='42']/Value

This is a conditional XPath expression: it extracts the content of the <Value> element ONLY if the value of the <answer> element is 42.
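
For anyone who wants to try the logic of these two expressions outside Studio, here is a rough Python sketch. Note the assumptions: the sample document is made up, and Python's standard-library ElementTree implements only a subset of XPath (no text() or contains()), so the code uses the closest equivalents rather than the expressions verbatim.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up to exercise both expressions.
doc = """
<Variables>
  <VariableAssignment><answer>42</answer><Value>Keep this</Value></VariableAssignment>
  <VariableAssignment><answer>7</answer><Value>Skip this</Value></VariableAssignment>
  <note>Beans on toast</note>
</Variables>
"""
root = ET.fromstring(doc)

# Stand-in for //*[contains(text(), 'toast')]: elements whose leading text
# contains the word. ElementTree has no contains(), so filter in Python.
with_toast = [el.tag for el in root.iter() if el.text and "toast" in el.text]
print(with_toast)

# Stand-in for //VariableAssignment[answer/text()='42']/Value, written in the
# [child='text'] predicate form that ElementTree does support.
values = [
    v.text for v in root.findall(".//VariableAssignment[answer='42']/Value")
]
print(values)
```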

Paul Filkin | RWS Group
