Catching </ol> and </ul>

Hi all,

This has been driving me nuts for a while now. I'm using embedded content (legacy) in an XML file type to catch some common html tags, among other things. All in all this is pretty straightforward, but html lists are giving me quite a headache.

Right now I have this in there:

Start tag: <[o|u]l>

End tag: </[o|u]l>

Segmentation hint: Exclude

What happens is that everything after the first list in a file--either ordered or unordered--is not extracted. I tried all kinds of variations of the above expressions, and also used separate tag pairs for ordered and unordered lists, but the result is always the same.

I should maybe also mention that, unfortunately, the embedded content processors that were introduced recently are not an option, because they are not available in WorldServer.

I'd be grateful for any pointers to fix this.

Stephan

0 Jerzy Czopik over 10 years ago

Hi

What happens if you use this <(ol|ul)> and </(ol|ul)>? These should be more precise...
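To illustrate why the grouped form is more precise than the original character class, here is a minimal sketch using Python's `re` module. It assumes that Studio's .NET regex engine treats character classes and alternation the same way for this simple case; the test strings are made up for illustration.

```python
import re

# Inside square brackets, "|" is a literal character, so [o|u] matches
# exactly one of "o", "|" or "u".
char_class = re.compile(r"<[o|u]l>")
# Inside parentheses, "|" separates alternatives: "ol" or "ul".
alternation = re.compile(r"<(ol|ul)>")

samples = ["<ol>", "<ul>", "<|l>", "<li>"]

class_hits = [s for s in samples if char_class.fullmatch(s)]
alt_hits = [s for s in samples if alternation.fullmatch(s)]

print(class_hits)  # ['<ol>', '<ul>', '<|l>']
print(alt_hits)    # ['<ol>', '<ul>']
```

Both patterns match `<ol>` and `<ul>`; the character class additionally accepts the stray `<|l>`.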

_________________________________________________________

When asking for help here, please be as accurate as possible. Please always remember to give the exact version of product used and all possible error messages received. The better you describe your problem, the better help you will get.

Want to learn more about Trados Studio? Visit the Community Hub. Have a good idea to make Trados Studio better? Publish it here.

0 Stephan Gasteyer over 10 years ago in reply to Jerzy Czopik

Hi Jerzy,

thanks, but unfortunately this one also produces the same result. I also tested this in other applications and it works just fine. It's just Studio that won't cooperate, and I have no clue where I'm going wrong.

I also tried escaping pretty much every single character just in case it does something unexpected, but to no avail.

0 Jerzy Czopik over 10 years ago in reply to Stephan Gasteyer

This is indeed very strange. Maybe the element that contains the embedded content (such as CDATA) is not defined properly? In my experience, this way of using embedded content should work. You could also try changing the segmentation hint from "Exclude" to "May Exclude".

0 Paul over 10 years ago in reply to Jerzy Czopik

Hi Stephan,

It might help to see a sample of the xml file with the elements containing the html code you wish to handle? Perhaps also mention the other rules you have created? One of the biggest problems with using the legacy embedded content processor is when you start to add many rules as you can easily get some overlap which can cause unexpected behaviour when you parse the file.

Paul Filkin | RWS Group

________________________
Design your own training!
You've done the courses and still need to go a little further, or still not clear?
Tell us what you need in our Community Solutions Hub

0 Stephan Gasteyer over 10 years ago in reply to Paul

Thank you both for your help. The overlap was a good hint. So I figured I'd add the rules one by one to see where they go awry, and sure enough, nothing broke until the last one.

All embedded content is in a CDATA section. Basically, this section can hold any html formatting. For example:

<![CDATA[
<ol>
<li>Punkt 1</li>
<li>Punkt 2</li>
<li>Punkt 3</li>
</ol>
<ul>
<li>Punkt 4</li>
<li>Punkt 5</li>
<li>Punkt 6</li>
</ul>
]]>

These are the rules. Everything works fine until I add \n, at which point any content that comes after the first list disappears.

Start Tag	End Tag	Type	Translate	Segmentation
<(ol|ul)>	</(ol|ul)>	Tag Pair	Yes	Exclude
<li>	</li>	Tag Pair	Yes	Exclude
<a.*?>	</a>	Tag Pair	Yes	Include
<i>	</i>	Tag Pair	Yes	Include
<b>	</b>	Tag Pair	Yes	Include
<sub>	</sub>	Tag Pair	Yes	Include
<sup>	</sup>	Tag Pair	Yes	Include
\{\d\}		Placeholder		Include
<br>		Placeholder		Exclude
</br>		Placeholder		Exclude
<br />		Placeholder		Exclude
\n		Placeholder		Exclude

The trouble is that for the output it doesn't seem to make a difference if a new line is triggered by a manual line break or a break tag, so there's no consistency in the source files and I have to segment at both <br> and manual breaks.

I'm a bit lost now, because now that I was able to isolate \n as the cause of the problem, I have no idea how to prevent it.

Thanks!

Stephan

0 Jerzy Czopik over 10 years ago in reply to Stephan Gasteyer

\n is wrong, the "\" must be escaped, so it should read \\n

0 Stephan Gasteyer over 10 years ago in reply to Jerzy Czopik

Now you have me confused. When I escape the backslash, content after the list is parsed correctly, but this isn't:

<![CDATA[

Line 1 w/ manual break

Line 2 w/ manual break

Line 3 w/ manual break

]]>

With \\n, all three lines end up in the same segment. Note that there's a break tag after line 3 in the example above. With just \n, lines one to three are parsed correctly with each line in a new segment, as is the list (Punkt 1 to Punkt 3), but everything after the list is gone.
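One possible reading of this difference, sketched with Python's `re` module: in standard regex syntax, `\n` matches a line feed character, whereas `\\n` matches a literal backslash followed by the letter `n`. The sketch assumes Studio's .NET engine follows the same convention; the sample strings are illustrative only.

```python
import re

# Three lines separated by real line feeds, as in the CDATA sample.
text = "Line 1\nLine 2\nLine 3"

line_feed = re.compile(r"\n")   # regex \n: matches a line feed character
literal = re.compile(r"\\n")    # regex \\n: matches a backslash followed by "n"

print(len(line_feed.findall(text)))  # 2
print(len(literal.findall(text)))    # 0

# The escaped form only matches when the text really contains "\" + "n".
print(len(literal.findall(r"Line 1\nLine 2")))  # 1
```

If the escaped rule never matches, no placeholder is created at the manual breaks, which would be consistent with all three lines landing in one segment.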

0 Jerzy Czopik over 10 years ago in reply to Stephan Gasteyer

set \\n to external (structure)

0 Paul over 10 years ago in reply to Stephan Gasteyer

Why not take a different approach? I think you were correct the first time: here the \n does not need to be escaped unless you are trying to match the literal \ in \n rather than a line feed. So if you remove the \n rule altogether and create a segmentation rule in the TM instead, you may have more success.

I have not tried to recreate your situation, but \n is quite a catch-all and may be causing a problem elsewhere in the file tagging. If you add it as a segmentation rule, which is what you are trying to achieve, you may get the desired result.
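As a rough illustration of how broad the \n rule is, the following Python sketch counts its matches in the list sample posted earlier in the thread. It only shows where the pattern matches; how Studio resolves the overlap with the tag-pair rules is not something this sketch can demonstrate.

```python
import re

sample = (
    "<ol>\n"
    "<li>Punkt 1</li>\n"
    "<li>Punkt 2</li>\n"
    "<li>Punkt 3</li>\n"
    "</ol>\n"
    "<ul>\n"
    "<li>Punkt 4</li>\n"
    "<li>Punkt 5</li>\n"
    "<li>Punkt 6</li>\n"
    "</ul>\n"
)

# Every line feed in the sample is a match for the \n rule.
newline_hits = [m.start() for m in re.finditer(r"\n", sample)]

# Line feeds that sit directly after a list or item tag, i.e. right next
# to markup that the tag-pair rules also match.
after_tag = re.findall(r"</?(?:ol|ul|li)>\n", sample)

print(len(newline_hits))  # 10
print(len(after_tag))     # 10
```

In this sample, every single line feed sits directly after a tag that another rule also matches, so the \n placeholder touches the list markup at every step.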

Paul Filkin | RWS Group

0 Stephan Gasteyer over 10 years ago in reply to Paul

Technically, this sounds like a good solution, but alas, again WorldServer doesn't work that way. There simply are no TM-level segmentation rules. It's one of those cases where I wished the two were integrated better.

I might just have to take this up with the authors and see if they can achieve some consistency here and always use break tags, or indeed maybe even with the developers to see if they can automatically convert the line feeds to break tags when creating the xmls so I can do away with the \n rule.
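Such a conversion could look roughly like the Python sketch below. The helper `line_feeds_to_br` is hypothetical and not part of Studio or WorldServer. It assumes that line feeds directly next to list tags are layout only and can be dropped, and that every remaining line feed is a manual break.

```python
import re

# Hypothetical pattern for the list markup used in the sample files.
LIST_TAG = r"</?(?:ol|ul|li)>"


def line_feeds_to_br(text: str) -> str:
    """Replace manual line breaks with <br>, leaving list markup untouched."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop line feeds directly after or before a list tag (assumed layout only).
    text = re.sub(rf"({LIST_TAG})\n", r"\1", text)
    text = re.sub(rf"\n({LIST_TAG})", r"\1", text)
    # Every remaining line feed is treated as a manual break.
    return text.replace("\n", "<br>")


print(line_feeds_to_br("Line 1\nLine 2\nLine 3"))
# Line 1<br>Line 2<br>Line 3
print(line_feeds_to_br("Intro\n<ol>\n<li>Punkt 1</li>\n</ol>\nOutro"))
# Intro<ol><li>Punkt 1</li></ol>Outro
```

With the source normalised this way, the existing `<br>` placeholder rule would be the only one needed for line breaks.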

Thanks again, I appreciate all your help.

Stephan

0 Jerzy Czopik over 10 years ago in reply to Stephan Gasteyer

Hi Stephan

Do you see any possibility to contact me off forum (jerzy at czopik dot com) and let me try to work with you on your settings?

0 Stephan Gasteyer over 10 years ago in reply to Jerzy Czopik

Hi Jerzy

Absolutely. I'm just having a crazy day and I didn't get round to compiling some sample files and the FTD yet. Also, it's not that big an issue anymore. For now I'll just drop the list tags. All that will do is give us translatable segments with a single plain text html tag, so no biggie, especially since lists don't come up all that often.

It's a lot more important that all the other segmentation rules are adhered to.

I'm inclined to select Paul's suggestion with the TM-level segmentation rule as the answer, because in any other scenario this is probably what I would do.

Either way I'm still curious about what's causing the conflict and how to avoid it. I seem to be unable to figure this out myself. So if you still want to help me out here even though it's not really necessary I'll be happy to send you some files.