WordPress XML: cleaning it up with a regex, but how do I apply it?

Hi Community,

I have received a WordPress XML file and am creating a custom XML file type for it.

So far it looks good and the parsing works, but the file contains embedded HTML content that is hard to clean up.

It contains content like the following:

************************************
<strong> Mise en œuvre</strong>
L'ensemble des notions théoriques seront illustrées par des cas concrets sur le logiciel elec calcTm.
[/av_textblock]

[av_heading tag='h3' padding='10' heading='Dates' color='' style='blockquote modern-quote' custom_font='' size='' subheading_active='' subheading_size='15' custom_class='' admin_preview_bg='' av-desktop-hide='' av-medium-hide='' av-small-hide='' av-mini-hide='' av-medium-font-size-title='' av-small-font-size-title='' av-mini-font-size-title='' av-medium-font-size='' av-small-font-size='' av-mini-font-size='' margin=''][/av_heading]

[av_textblock size='16' font_color='' color='' av-medium-font-size='' av-small-font-size='' av-mini-font-size='' av_uid='av-50tak4l' admin_preview_bg='']
<span style="color: #000000;"><strong>Les 24, 25 et 26 novembre de 9h00 à 12h30 et de 14h00 à 17h30</strong></span>
[/av_textblock]

[av_heading tag='h3' padding='10' heading='Programme' color='' style='blockquote modern-quote' custom_font='' size='' subheading_active='' subheading_size='15' custom_class='' admin_preview_bg='' av-desktop-hide='' av-medium-hide='' av-small-hide='' av-mini-hide='' av-medium-font-size-title='' av-small-font-size-title='' av-mini-font-size-title='' av-medium-font-size='' av-small-font-size='' av-mini-font-size='' margin=''][/av_heading]

[av_textblock size='16' font_color='' color='' av-medium-font-size='' av-small-font-size='' av-mini-font-size='' av_uid='av-juva8v6m' admin_preview_bg='']

********************************

I am new to regex, but through testing I came up with the expression below to filter out the [av...] shortcodes (it works in RegExr):

\[[a-z\s\S]+\]
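One way to check what this expression actually matches is to run it over part of the sample. The minimal Python sketch below uses a shortened excerpt of the content above (attributes trimmed for brevity). Because the class `[a-z\s\S]` matches any character, including line breaks, and `+` is greedy, the expression produces a single match running from the first `[` to the last `]`, which also swallows the dates text in between. The narrower pattern `\[/?av_[^\]]*\]` is an assumed alternative, not something tested in Studio: it matches only opening or closing shortcodes that start with `av_` and stops at the first `]`.

```python
import re

# Shortened excerpt of the sample content (attributes trimmed for brevity).
sample = """<strong> Mise en œuvre</strong>
[/av_textblock]

[av_heading tag='h3' heading='Dates'][/av_heading]

[av_textblock size='16']
<span style="color: #000000;"><strong>Les 24, 25 et 26 novembre</strong></span>
[/av_textblock]
"""

# The expression from the post: [a-z\s\S] matches any character (newlines included)
# and + is greedy, so the match runs from the first "[" to the last "]".
original = re.compile(r"\[[a-z\s\S]+\]")

# Assumed narrower alternative: optional "/", the "av_" prefix, then any
# characters except "]" up to the first closing bracket.
shortcode = re.compile(r"\[/?av_[^\]]*\]")

greedy_matches = original.findall(sample)
print(len(greedy_matches))             # one large match
print("Les 24" in greedy_matches[0])   # the dates text is inside that match

print(shortcode.findall(sample))       # each shortcode matched separately
print(shortcode.sub("", sample))       # text with the shortcodes removed
```

The character-class syntax used in the narrower pattern is common to most regex flavours, so it may be a reasonable starting point to try in Studio as well.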

(Screenshot: Trados Studio error message about the HTML embedded content in the XML file.)

However, I cannot seem to add this expression in Studio. Do I have to add it to the HTML embedded content processor?
Each time I add it here, Studio reports a problem (something is missing, I have to add an attribute, etc.):
(Screenshot: Trados Studio Project File Type Settings showing the HTML 5 parser configuration.)

Can you please advise? I have attached the source file for reference.

Thank you!
Greta

export-tsi-page-octobre-2020.xml



[edited by: Trados AI at 4:32 AM (GMT 0) on 5 Mar 2024]