EntryID during conversion from TBX to XML + XDT

Hello,

Currently, I'm converting a TBX export from an "external" terminology database into the MultiTerm XML format so that I can import the entries into a MultiTerm database. I've tried both "MultiTerm 2022 SR1 Convert" (17.1.2185) and the Glossary Converter (6.2.8543.33526).

Unfortunately, both applications show the same behavior, and I don't understand why.
During conversion, the value of "<termEntry id="XYZ">" in the TBX file is written to "<descripGrp><descrip type="conceptId">XYZ</descrip></descripGrp>" in the XML. In addition, each term entry in the XML file starts with "<conceptGrp><concept>{ascending number}</concept>".

The TBX file starts with:
"<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE martif SYSTEM "TBXcoreStructV02.dtd">
<martif type="TBX" xml:lang="de-DE">
  <martifHeader>
    <fileDesc>
      <titleStmt>
        <title>Title</title>
      </titleStmt>
      <sourceDesc>
        <p></p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p type="XCSURI">www.ttt.org/.../p>
    </encodingDesc>
    <revisionDesc>
      <change>
        <p>2023-07-11 07:35:07 UTC</p>
      </change>
    </revisionDesc>
  </martifHeader>
  <text>
    <body>
      <termEntry id="10001">
      ..."

In my opinion, the "concept" tag should contain the value of "<termEntry id=" from the TBX instead of a self-assigned number. Otherwise, this might lead to duplicated entries in the MultiTerm database.

My first question: why is the value of "<termEntry id=" in the TBX written to a "descripGrp" group instead of into the "concept" tag of the XML?

And second: what do I have to do to convert the TBX file into a MultiTerm XML file in which the termEntry ID from the TBX is used as the entry ID in the MultiTerm XML?

Thank you very much in advance for any hint and your support. :)
Your help is very much appreciated.

Kind regards and have a nice and easy day.
Nils

  • Hello Nils,

    I understand your concern about the conversion of TBX to MultiTerm XML format and the issue with the termEntry ID. The conversion process you're experiencing is the standard behavior of the MultiTerm Convert and Glossary Converter tools. They assign a new, ascending number to the "<concept>" tag in the XML file, while the original termEntry ID from the TBX file is preserved in the "<descrip type="conceptId">" tag.

    To answer your first question: the value of "<termEntry id=" is written to a "descripGrp" group rather than to the "concept" tag simply because that is how these tools are designed. The original ID is kept as an entry-level "<descrip type="conceptId">" field, and the "<concept>" number is generated by the converter itself.
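    If you want to see this mapping for your own file, a small script can list each generated "<concept>" number next to the preserved "conceptId" value. The following is only a minimal sketch, assuming the converter output uses the un-namespaced "conceptGrp", "concept" and "descripGrp/descrip" elements quoted in the question; the function name concept_id_pairs is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def concept_id_pairs(root: ET.Element) -> List[Tuple[str, Optional[str]]]:
    """List (generated concept number, preserved conceptId) for every entry."""
    pairs = []
    for grp in root.iter("conceptGrp"):
        # Number assigned by the converter.
        number = (grp.findtext("concept") or "").strip()
        # Original TBX termEntry id, if stored as an entry-level descrip.
        original = grp.findtext("descripGrp/descrip[@type='conceptId']")
        pairs.append((number, original.strip() if original is not None else None))
    return pairs


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python list_ids.py converted.xml  (file name is a placeholder)
    for number, original in concept_id_pairs(ET.parse(sys.argv[1]).getroot()):
        print(number, original)
```

    An entry printed with None has no "conceptId" field at entry level.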

    As for your second question: unfortunately, these tools offer no direct way to convert the TBX file into a MultiTerm XML file that uses the TBX termEntry ID as the entry ID. However, you can edit the XML file afterwards and replace the ascending numbers in the "<concept>" tags with the corresponding termEntry IDs. Because each ID is already stored in the same entry's "<descrip type="conceptId">" field, you don't need to look it up in the TBX file, but doing this by hand can be time-consuming if you have a large number of entries.

    Here's a step-by-step guide on how you can do this:

    Step 1: Open the XML file in a text editor (e.g., Notepad++).

    Step 2: For each "<conceptGrp><concept>{ascending number}</concept>", replace "{ascending number}" with the ID from that entry's "<descrip type="conceptId">" element, i.e. the original "<termEntry id="XYZ">" value from the TBX file.

    Step 3: Save the changes to the XML file.
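    For larger files, the steps above could be automated with a script. This is a minimal sketch, not an official RWS tool, under several assumptions: the elements carry no XML namespace (as in the snippets quoted in the question); only purely numeric IDs such as the sample 10001 are copied, because it is not confirmed here whether MultiTerm accepts non-numeric entry numbers; and the result is written back as UTF-16, which you should adjust if your converter output uses another encoding. The function name copy_concept_ids is hypothetical. Please test the import on a copy of your termbase first.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Assumption: only digit-only IDs (like the sample 10001) are copied.
NUMERIC_ID = re.compile(r"[0-9]+")


def copy_concept_ids(root: ET.Element) -> int:
    """Overwrite each <concept> number with the entry's conceptId value.

    Returns how many entries were changed. Entries without a conceptId,
    or whose conceptId is not purely numeric, keep their generated number.
    """
    changed = 0
    for grp in root.iter("conceptGrp"):
        concept = grp.find("concept")
        # Entry-level descrip only: language/term-level descripGrps are nested deeper.
        original = grp.findtext("descripGrp/descrip[@type='conceptId']")
        if concept is None or not original:
            continue
        new_id = original.strip()
        if not NUMERIC_ID.fullmatch(new_id):
            continue
        if (concept.text or "").strip() != new_id:
            concept.text = new_id
            changed += 1
    return changed


def main(src: str, dst: str) -> None:
    tree = ET.parse(src)
    changed = copy_concept_ids(tree.getroot())
    # Assumption: output is written as UTF-16 with an XML declaration.
    tree.write(dst, encoding="utf-16", xml_declaration=True)
    print(f"{changed} entries renumbered")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python copy_ids.py converted.xml renumbered.xml  (placeholder names)
    main(sys.argv[1], sys.argv[2])
```

    Entries that are skipped keep their generated number, so they could collide with a copied ID; this is one more reason to check that the IDs are unique before importing.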

    Please note that you need to ensure that the termEntry IDs are unique to avoid any issues with duplicate entries in the MultiTerm database.
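    One way to verify uniqueness is to count the termEntry IDs directly in the TBX export before converting it. The sketch below assumes the "<termEntry id=...>" structure shown in the question; the function name duplicate_entry_ids is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict


def duplicate_entry_ids(root: ET.Element) -> Dict[str, int]:
    """Return each termEntry id that occurs more than once, with its count.

    Entries without a (non-empty) id attribute are ignored.
    """
    counts = Counter(
        entry.get("id") for entry in root.iter("termEntry") if entry.get("id")
    )
    return {entry_id: n for entry_id, n in counts.items() if n > 1}
```

    You would call it on the parsed TBX file, e.g. duplicate_entry_ids(ET.parse("export.tbx").getroot()) with your own file name; an empty result means every termEntry ID occurs only once.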

    I hope this helps! If you have any other questions, feel free to ask.

    Best regards,

    RWS Community AI
