Remove tag from source

Daniel Hug over 5 years ago

Hi,

sorry, this sounds like a silly question, but I am trying to remove a tag (HTML entity) from my source, and replace pricing info with XXX:

[Screenshot: Trados Studio segment with a highlighted HTML entity (&pound;) followed by the price 3,010]

Source file is XML with embedded HTML.

I just updated my Cleanup Task. The old version had the "remove tag" option, recognized &pound; and would have removed it, but it threw an error, so I updated, and the new version no longer has this option.

Ideally, instead of [POUND]3,010 the source would say GBP X,XXX.

I tried SDLXLIFFToolkit, which can find text in tags but cannot replace it.

I should have dealt with this in the XML source, but alas ...

So my issue is that I can't find a way to handle (remove) a tag in the SDLXLIFF source.

Does anyone have an idea how to tackle this?

Daniel

0 Paul over 5 years ago

Maybe in the sdlxliff itself?

Paul Filkin | RWS Group

0 Daniel Hug over 5 years ago in reply to Paul

Paul

That is what I wanted to avoid: modifying the sdlxliff directly, e.g. in Npp. I was hoping for a Cleanup Task-style process that can be run when needed.

If this is not possible in the sdlxliff, I think I will make this part of my XML pre-processing rules. This time I replaced the source file with one from which I had removed the strings in question.

Is there a reason why the new version of Cleanup Task does not offer removal of placeholder tags anymore?

Daniel
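
The XML pre-processing route mentioned above could be sketched roughly as follows. This is a minimal Python illustration, not a Trados or Cleanup Task feature: the function name `mask_prices` and the pattern are assumptions, and it presumes the price appears in the raw XML text as `&pound;` (or the double-escaped `&amp;pound;`, or a literal £) followed by digits.

```python
import re

# Assumed pattern: a pound sign in one of three spellings, optional
# whitespace, then an amount such as 3,010 or 1,200.50.
PRICE = re.compile(r"(?:&(?:amp;)?pound;|£)\s*(\d[\d,]*(?:\.\d+)?)")


def mask_prices(text: str) -> str:
    """Replace e.g. '&pound;3,010' with 'GBP X,XXX' (hypothetical helper)."""

    def repl(match: re.Match) -> str:
        # Keep separators, replace every digit with 'X'.
        masked = re.sub(r"\d", "X", match.group(1))
        return "GBP " + masked

    return PRICE.sub(repl, text)


if __name__ == "__main__":
    print(mask_prices("from &pound;3,010 per person"))
```

Running this on the XML before it is loaded into Studio would mean the entity never reaches the SDLXLIFF as a tag in the first place.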

0 Paul over 5 years ago in reply to Daniel Hug

Daniel Hug said:
Is there a reason why the new version of Cleanup Task does not offer removal of placeholder tags anymore?

I'd need to look back at the behaviour of the old one to answer this question. Assuming you are correct, then no, there is probably no particular reason. We have had to change quite a bit of how this plugin works in order to fix some of the reported bugs, and in reality we should probably rewrite it altogether because maintenance is quite difficult. So maybe we broke something... I don't know, but I will need to check. I would also need to see your source file to make sure that this placeholder tag is one that would have been allowed, as even the old version didn't allow the removal of all types of tags.

Paul Filkin | RWS Group

+1 Evzen Polenka over 5 years ago (verified answer)

Ummm, this is an HTML entity, not an XML entity... which means your XML actually has HTML embedded inside... which means you are actually parsing the HTML with the Embedded Content Processor... right?

So simply turn on the entity conversion in the ECP and you should be done: the HTML entities will be converted to actual characters when the source is parsed, so there won't be any such entity-tag garbage.

Am I missing something?
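
To illustrate what entity conversion means conceptually, here is a small Python analogy using the standard library's `html.unescape`. It is not how the ECP is implemented, and `convert_entities` is a hypothetical helper name; the point is only that once the entity becomes a real character, the segment is plain text that ordinary find-and-replace can handle.

```python
import html


def convert_entities(segment: str) -> str:
    """Turn HTML entities into the characters they stand for."""
    return html.unescape(segment)


if __name__ == "__main__":
    # The entity from the original question becomes a plain character.
    print(convert_entities("&pound;3,010"))
    print(convert_entities("&Auml;rgernis"))
```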

0 Daniel Hug over 5 years ago in reply to Evzen Polenka

Evzen Polenka

Evzen Polenka said:
Ummm, this is an HTML entity, not an XML entity

Well, yes, that's what I said in my initial post:

Daniel Hug said:
I am trying to remove a tag (HTML entity) from my source

Evzen Polenka said:
you are actually parsing the HTML with the Embedded Content Processor... right?

Right.

Evzen Polenka said:
So simply turn on the entity conversion in the ECP and you should be done: the HTML entities will be converted to actual characters when the source is parsed, so there won't be any such entity-tag garbage.

Voilà! It worketh! Some entity conversion was on (default settings, I guess), some was off. Is there any reason NOT to have everything converted by default?

Daniel

0 Daniel Hug over 5 years ago in reply to Paul

Paul

This instance is solved: I modified the XML source, and Evzen Polenka rightly pointed out that these entities can be converted, after which they are displayed as normal text that can be cleaned up without problems.

I sent you my files yesterday already, just in case you want to look into this. Cleanup Task is an extremely useful app, well worth keeping alive IMHO. It could almost become part of the core functionality of Studio.

Daniel

0 Evzen Polenka over 5 years ago in reply to Daniel Hug

IMO, in normal cases, there is hardly a reason to not convert the entities to real characters.

In my career I have actually NEVER met a case where the characters were entitized deliberately in the source... the entities were always the result of some clumsy process, incorrectly configured tools, plain inexperience, etc. on the client's side.

Still, there MAY be cases where the system creating the sources (and consuming the targets) is some old-school one, unable to work with UTF-8 encoded files (remember, UTF-8 is just natural and expected by default in XML, but not mandatory) and requiring ASCII encoded ones...

And that's where NOT converting the entities used to come in handy... to be able to have entitized characters in the target as well (I know, you can convert them when loading into Studio and convert them back when saving the target... but I think that was not possible, or didn't work like that, or something, back then).

I may be wrong here, but I think(!) this is the story behind all this entity conversion...
It actually dates back to the Trados days, when Unicode and UTF-8 existed only in engineers' wet dreams and reality was dominated by individual charsets: ISO code pages competing with DOS ones, Windows ones, Mac ones, EBCDIC ones... huh...
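
The "convert on load, convert back on save" idea for ASCII-only systems can be sketched in Python as below. This only illustrates the general technique and is not Studio's actual mechanism; `to_entities` is a hypothetical helper that uses the standard `html.entities.codepoint2name` table and falls back to numeric references where no named entity exists.

```python
import html
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Encode every non-ASCII character as an HTML entity.

    ASCII characters, including markup such as & and <, are left untouched.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity
        else:
            out.append(f"&#{cp};")  # numeric fallback
    return "".join(out)


if __name__ == "__main__":
    loaded = html.unescape("&Auml;rgernis")  # on load: real characters
    saved = to_entities(loaded)              # on save: ASCII-only again
    print(loaded, saved)
```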

0 Daniel Hug over 5 years ago in reply to Evzen Polenka

Evzen Polenka

In our case, a CMS dutifully encodes non-ASCII characters in HTML fields, so I get plenty of that in my source files. But if I understand you correctly, I should never have to encode non-ASCII characters in my target files, which would solve a couple of issues for me, so I would like to do that.

But: How can I convert entities to real characters when parsing but NOT convert back to entities when writing the target file? As far as I can see, the "Entities" settings govern both. I vaguely remember reducing the number of entity conversions I permitted because the CMS would not read all of them correctly.

At the moment, I avoid encoding as entities for some XML elements by not passing them on to the embedded content processor. The CMS treats some fields as plain text, so entities would be displayed as they are, such as "&Auml;rgernis".

I usually parse by field name, but in some cases e.g. the "Title" field will expect (and contain) HTML in some templates but plain text in other templates. So I guess I will have to first parse all fields that I want to pass on to the embedded content processor, then use some catch-all to parse the content for signs of HTML, then parse for plain text fields...

Daniel
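
A catch-all check for "signs of HTML" in a field could look something like the following Python sketch. It is a rough heuristic under stated assumptions, not a Studio parser rule: the name `looks_like_html` and both patterns are made up for illustration, and a field counts as HTML if it contains something shaped like a tag or an entity.

```python
import re

# Assumed heuristics: an opening/closing/self-closing tag, or a named,
# decimal, or hexadecimal character reference.
TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>")
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#\d+|#[xX][0-9A-Fa-f]+);")


def looks_like_html(value: str) -> bool:
    """Return True if the field value shows signs of HTML markup."""
    return bool(TAG.search(value) or ENTITY.search(value))


if __name__ == "__main__":
    for field in ["<p>Hello</p>", "&Auml;rgernis", "Ärgernis & Co", "3 < 5"]:
        print(repr(field), looks_like_html(field))
```

Fields for which this returns True would be candidates for the embedded content processor; the rest would be treated as plain text. A heuristic like this can misjudge edge cases, so it is best checked against real CMS exports first.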