EntryID during conversion from TBX to XML + XDT

Hello,

Currently, I'm working on converting a TBX export from an "external" terminology database into the MultiTerm XML format so that I can import the entries into a MultiTerm database. I've tried "MultiTerm 2022 SR1 Convert" (17.1.2185) as well as the Glossary Converter (6.2.8543.33526).

Unfortunately, I'm facing the following problem with both applications, and I don't understand why they behave this way.
During the conversion process, the values of the "<termEntry id="XYZ">" in the TBX file are written to "<descripGrp><descrip type="conceptId">XYZ</descrip></descripGrp>" in the XML. Additionally, in the XML file, at the beginning of each term entry, I get "<conceptGrp><concept>{ascending number}</concept>".

The TBX file starts with:
"<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE martif SYSTEM "TBXcoreStructV02.dtd">
<martif type="TBX" xml:lang="de-DE">
  <martifHeader>
    <fileDesc>
      <titleStmt>
        <title>Title</title>
      </titleStmt>
      <sourceDesc>
        <p></p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p type="XCSURI">www.ttt.org/.../</p>
    </encodingDesc>
    <revisionDesc>
      <change>
        <p>2023-07-11 07:35:07 UTC</p>
      </change>
    </revisionDesc>
  </martifHeader>
  <text>
    <body>
      <termEntry id="10001">
      ..."

In my opinion, the "concept" tag has to contain the value from "<termEntry id=" of the TBX instead of a self-assigned number. Otherwise, it might lead to duplicated entries in the MultiTerm database.

My first question is: why is the value of "<termEntry id=" in the TBX written to a "descripGrp" group instead of to the "concept" tag of the XML?

And second: what do I have to do to convert the TBX file into a MultiTerm XML file in which the term ID from the TBX is used as the term ID in the MultiTerm XML?
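
One possible stopgap, sketched here under assumptions, is to post-process the converted XML and copy each numeric "conceptId" value into the "concept" element. Only the element names conceptGrp, concept, descripGrp and descrip type="conceptId" are taken from this post; the "mtf" root element, the sample values and the function name promote_concept_ids are illustrative assumptions, not something either converter provides.

```python
import xml.etree.ElementTree as ET

# Hypothetical converted fragment. The <mtf> root is an assumption;
# a real MultiTerm XML file also contains language and term groups.
SAMPLE = """<mtf>
  <conceptGrp>
    <concept>1</concept>
    <descripGrp><descrip type="conceptId">10001</descrip></descripGrp>
  </conceptGrp>
  <conceptGrp>
    <concept>2</concept>
    <descripGrp><descrip type="conceptId">10002</descrip></descripGrp>
  </conceptGrp>
</mtf>"""


def promote_concept_ids(root):
    """Copy each numeric conceptId descrip into the entry's <concept> element.

    Returns the number of entries that were changed.
    """
    changed = 0
    for group in root.iter("conceptGrp"):
        concept = group.find("concept")
        descrip = group.find("descripGrp/descrip[@type='conceptId']")
        if concept is None or descrip is None:
            continue
        value = (descrip.text or "").strip()
        # Non-numeric ids are left untouched rather than guessed at.
        if value.isascii() and value.isdigit():
            concept.text = value
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
count = promote_concept_ids(root)
concept_numbers = [c.text for c in root.iter("concept")]
print(count, concept_numbers)
```

For a real file, the same function could be applied to the root returned by ET.parse(path).getroot() and the tree written back out; whether MultiTerm then keeps those numbers depends on the import definition used.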

Thank you very much in advance for any hint and your support. :)
Your help is very much appreciated.

Kind regards and have a nice and easy day.
Nils

  •  

    It's very hard for me to understand your problem without a better sample file (I don't know the product well enough).  So as an example I completed your file with a couple of terms:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE martif SYSTEM "TBXcoreStructV02.dtd">
    <martif type="TBX" xml:lang="de-DE">
      <martifHeader>
        <fileDesc>
          <titleStmt>
            <title>Title</title>
          </titleStmt>
          <sourceDesc>
            <p>Source Description</p>
          </sourceDesc>
        </fileDesc>
        <encodingDesc>
          <p type="XCSURI">www.ttt.org/.../</p>
        </encodingDesc>
        <revisionDesc>
          <change>
            <p>2023-07-11 07:35:07 UTC</p>
          </change>
        </revisionDesc>
      </martifHeader>
      <text>
        <body>
          <termEntry id="10001">
            <langSet xml:lang="de-DE">
              <tig>
                <term>Auto</term>
              </tig>
            </langSet>
            <langSet xml:lang="en-GB">
              <tig>
                <term>Car</term>
              </tig>
            </langSet>
          </termEntry>
          <termEntry id="10002">
            <langSet xml:lang="de-DE">
              <tig>
                <term>Fahrrad</term>
              </tig>
            </langSet>
            <langSet xml:lang="en-GB">
              <tig>
                <term>Bicycle</term>
              </tig>
            </langSet>
          </termEntry>
        </body>
      </text>
    </martif>
    

    I then tested with the Glossary Converter and also with MultiTerm Convert.  I don't think either of them can "force" MultiTerm to use the conceptid in MultiTerm, as I think it has its own mechanism for this.  But at least the Glossary Converter can read the conceptid and add it as a new field at the entry level:

    Screenshot showing how the TBX converts properly inside MultiTerm with the ConceptID of the TBX being used as a conceptid entry ID.

    MultiTerm Convert doesn't see this field at all.

    Is this what you are trying to achieve?  Have MultiTerm use the same conceptid as the TBX?

    I think in this case the TradosAI reply looks pretty good!

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hello,

    thank you very much for your quick response. :)

    Yes, I'm trying to get MultiTerm to use the conceptId as the term ID ("Entry Id"). So yes, the AI answer looks pretty good.

    Let me try to clarify my problem a little:
    My problem is that some terms are duplicated and therefore end up with different properties/values when the MultiTerm database is updated with the XML.

    One entry is:

    Trados Studio MultiTerm entry showing Entry Id 3640 with conceptId 13876 highlighted, indicating a potential duplicate issue.

    But after the import of an updated TBX/XML, the same term also exists with a different "Entry ID". In the originating TBX file, however, it's the same term, according to the conceptId:

    Trados Studio MultiTerm entry showing a different Entry Id 3657 with the same conceptId 13876 highlighted, suggesting a duplication error.

    I'd expected that a term entry from the TBX file would always be referenced as one and the same term entry in the MultiTerm database, not as possibly multiple entries in the XML/MultiTerm.
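
To check an exported MultiTerm XML for this symptom, the entries can be grouped by their "conceptId" value and any value that appears under more than one entry number reported. The following is only a sketch: the "mtf" root element, the third entry (3658 / 13877) and the function name find_duplicate_concept_ids are assumptions made for illustration, while 3640, 3657 and 13876 are the values from the two screenshots.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Reduced, hypothetical export: entry numbers 3640 and 3657 share
# conceptId 13876, as in the screenshots; entry 3658 is made up.
SAMPLE = """<mtf>
  <conceptGrp>
    <concept>3640</concept>
    <descripGrp><descrip type="conceptId">13876</descrip></descripGrp>
  </conceptGrp>
  <conceptGrp>
    <concept>3657</concept>
    <descripGrp><descrip type="conceptId">13876</descrip></descripGrp>
  </conceptGrp>
  <conceptGrp>
    <concept>3658</concept>
    <descripGrp><descrip type="conceptId">13877</descrip></descripGrp>
  </conceptGrp>
</mtf>"""


def find_duplicate_concept_ids(root):
    """Map each conceptId that occurs more than once to its entry numbers."""
    entries_by_concept_id = defaultdict(list)
    for group in root.iter("conceptGrp"):
        entry_id = group.findtext("concept")
        concept_id = group.findtext("descripGrp/descrip[@type='conceptId']")
        if entry_id and concept_id:
            entries_by_concept_id[concept_id.strip()].append(entry_id.strip())
    # Keep only the conceptIds that are attached to several entries.
    return {cid: ids for cid, ids in entries_by_concept_id.items() if len(ids) > 1}


duplicates = find_duplicate_concept_ids(ET.fromstring(SAMPLE))
print(duplicates)
```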

    Does this clarify my problem a little more?

    Thank you very much for your support.

    Kind regards
    Nils



    Generated Image Alt-Text
    [edited by: Trados AI at 2:21 PM (GMT 0) on 5 Mar 2024]
  •   

    Does this clarify my problem a little more?

    Yes... 

    I think the reason why MultiTerm doesn't use the same conceptID as a TBX file is due to the different ways these two systems handle term management.

    TBX (TermBase eXchange) is a standard format for exchanging terminology data, including concept IDs.  Concept IDs in TBX are used to link related terms across different languages.

    On the other hand, MultiTerm uses its own internal system for managing and linking terms.

    When you import a TBX file into MultiTerm using the Glossary Converter or MultiTerm Convert for example, the tool creates a new MultiTerm TermBase and assigns its own internal IDs to the terms.  While this might mean that the original TBX concept IDs are not preserved, the relationships between terms are maintained, which is the most important aspect for translation work.

    I agree it could be useful to have the same ID in MultiTerm as in the TBX, but this idea could unravel over time for several reasons: differences in data models between TBX and MultiTerm, potential integration conflicts with other systems, issues with scalability and uniqueness of IDs, increased complexity in maintenance, potential conflicts with future updates to MultiTerm, and potential redundancy if the relationships between terms are effectively maintained without preserving the exact TBX IDs.

    Let me try to clarify my problem a little:
    My problem is that some terms are duplicated and therefore end up with different properties/values when the MultiTerm database is updated with the XML.

    What options have you used for the import and can you share a small set of TBX files that would allow us to test and see if there's a way to manage it?

    Paul Filkin | RWS Group

  • Always entertaining to see the AI waffling away, but it's wrong. At least for the Glossary Converter, it's not "because this is how these tools are designed to work", it's "because there's a bug".

    It was on the to-do list for the next release, so I had a look today. If you want, you can try the latest beta at https://cerebus.de/glossaryconverter/beta/index.html . As long as the TBX id is numeric, it should behave as expected and use the TBX id directly as the concept id in the XML. The conceptId field is never seen by the user.

    Depending on where you get your tbx from, the id can be nearly any string, like "asl-e4t-99n". MultiTerm can't handle this, so in a strict "No term shall be left behind" policy, the converter creates a unique random numeric value to replace it.
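
The policy described above (keep numeric ids, replace anything else with a unique numeric value) can be sketched roughly as follows. This is not the Glossary Converter's actual code; the function name to_numeric_ids, the seed parameter and the value range are assumptions chosen purely for illustration.

```python
import random


def is_numeric(tbx_id):
    """True for ids made up only of ASCII digits, such as "10001"."""
    return tbx_id.isascii() and tbx_id.isdigit()


def to_numeric_ids(tbx_ids, seed=None):
    """Map TBX entry ids to numeric ids.

    Numeric ids are kept as they are. Any other id gets a random
    number that is not used by another entry. Note that ids differing
    only in leading zeros ("10" and "0010") would map to the same number.
    """
    rng = random.Random(seed)
    # Reserve all existing numeric ids first so replacements cannot collide.
    used = {int(i) for i in tbx_ids if is_numeric(i)}
    mapping = {}
    for tbx_id in tbx_ids:
        if is_numeric(tbx_id):
            mapping[tbx_id] = int(tbx_id)
            continue
        candidate = rng.randrange(1, 2**31)
        while candidate in used:
            candidate = rng.randrange(1, 2**31)
        used.add(candidate)
        mapping[tbx_id] = candidate
    return mapping


mapping = to_numeric_ids(["10001", "asl-e4t-99n", "10002"], seed=0)
print(mapping)
```

A random replacement is not stable between runs unless the mapping is stored somewhere, so only the numeric ids can be relied on for synchronising a later update.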

    Maintaining termbases by merging in changes is a highly skilled job, and keeping ids is just a small step, but at least that step can be done a bit more efficiently than by making changes manually in Notepad ;-)

  • Good morning,

    thank you for your update and especially for the new beta of the Glossary Converter. I downloaded and checked it today. As far as I can see, the "conceptId" from the TBX is now converted to the "concept" entry in the (MultiTerm) XML. Great!

    In my case, the "conceptId" field in the TBX contains only numeric values. So there shouldn't be any difficulties of the kind that could result from a string such as "asl-e4t-99n".
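
Whether every id in a TBX export really is numeric can be checked before conversion. A minimal sketch, assuming the ids sit in the "id" attribute of "termEntry" as in the sample earlier in this thread; the reduced sample data and the function name non_numeric_entry_ids are hypothetical.

```python
import xml.etree.ElementTree as ET

# Reduced, hypothetical TBX body: two numeric ids and one string id.
SAMPLE_TBX = """<martif type="TBX" xml:lang="de-DE">
  <text>
    <body>
      <termEntry id="10001"/>
      <termEntry id="10002"/>
      <termEntry id="asl-e4t-99n"/>
    </body>
  </text>
</martif>"""


def non_numeric_entry_ids(root):
    """Return all termEntry ids that are not made up purely of ASCII digits."""
    ids = [entry.get("id", "") for entry in root.iter("termEntry")]
    return [i for i in ids if not (i.isascii() and i.isdigit())]


problem_ids = non_numeric_entry_ids(ET.fromstring(SAMPLE_TBX))
print(problem_ids)
```

An empty result would mean that all entry ids are numeric and can, in principle, be carried over unchanged.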

    I'm going to run some further checks with my TBX export and the conversion into the XML format for MultiTerm this week.
    If I run into any further issues, I'll let you know.

    Thank you very much again and have a great day. :)

    Best regards
    Nils

  •  

    At least for the Glossary Converter, it's not "because this is how these tools are designed to work", it's "because there's a bug".

    I wish it was a bug in MultiTerm too then!  Does that mean previous versions of the GC did this already?  I had no idea, and it is a very cool feature that is just one of the reasons the GC is a must-have tool.

    Maintaining termbases by merging in changes is a highly skilled job

    Indeed... and often underestimated.

    Paul Filkin | RWS Group

  • It did not work properly before, hence the bug. I had put in the groundwork, but then another change broke it. That's why developers need external QA :-)

  • I ran some tests on my end as well, and in my case the "termEntry" element from the TBX was correctly converted to the "concept" element in the XML using MultiTerm Convert. When importing the XML into MultiTerm, however, new Entry IDs starting with 1 were indeed created. But this only happened when using the Default import definition. When using "Synchronize on Entry Number", the entry numbers from the TBX were used.
