MultiTerm Convert 2021 Parsing Error

I am trying to convert a .tbx file in order to create a termbase.

The .tbx file is from the Irish terminological database, available here: https://www.tearma.ie/ioslodail/

I am getting this error:
SDL MultiTerm Convert dialog box showing file paths for input, output, termbase definition, and log files. An error message reads 'The conversion option could not be initialised properly. An error occurred while parsing EntityName. Line 3167, position 58.'


I can see other parsing errors on the forum, but nothing that helps with MultiTerm Convert 2021.

Can anybody advise?

TIA, Darán



[edited by: Trados AI at 2:00 PM (GMT 0) on 5 Mar 2024]
  • Former Member
    +1 Former Member

    I downloaded this .tbx file, changed the extension to .xml and ran it through SDL MultiTerm 2021 Convert (selecting the OLIF XML format). The error refers to an ampersand (that is, the & sign) that is mixed in with the actual content of the terms. In XML an ampersand cannot be escaped with a backslash (\&); the entity &amp; (or the numeric character reference &#38;) has to be used instead. There are a great many errors of the same type in the file, plus other ampersands mixed up with backslashes. The termbase has 184420 entries (i.e. "concepts") and several descriptive fields, spread over 4165212 lines. A manual clean-up would be very slow, so I guess a global search and replace with regular expressions will do the job. I haven't tried the Glossary Converter, but most likely the same errors will occur. I used Notepad++ to explore this huge XML file. (Files with the .tbx extension are essentially XML files, so changing the extension may not be necessary.)
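
The global search and replace mentioned above could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested fix for this particular file: the function names `escape_bare_ampersands` and `fix_file` are made up for illustration, the file is assumed to be UTF-8, and the script only rewrites ampersands that do not already start one of the five predefined XML entities or a numeric character reference. It does not touch the stray backslashes, since what should happen to those depends on the data.

```python
import re
import sys

# An "&" that is NOT followed by a predefined XML entity name or a numeric
# character reference plus ";" is treated as a bare ampersand.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace every bare '&' in text with '&amp;'."""
    return BARE_AMPERSAND.sub("&amp;", text)


def fix_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> int:
    """Copy src_path to dst_path line by line, escaping bare ampersands.

    Returns the number of lines that were changed. The UTF-8 default is an
    assumption; check the encoding declared in the file's XML prolog.
    """
    changed = 0
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            fixed = escape_bare_ampersands(line)
            if fixed != line:
                changed += 1
            dst.write(fixed)
    return changed


if __name__ == "__main__":
    # Hypothetical usage: python fix_amp.py input.tbx output.tbx
    if len(sys.argv) == 3:
        print(fix_file(sys.argv[1], sys.argv[2]), "lines changed")
```

Reading line by line keeps memory use low for a file of this size. The same pattern should also work in the Notepad++ Replace dialog in regular expression mode, with `&amp;` as the replacement, though I have not tried it on this file.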
