Extract TM source files list

Hi everyone,

I'd like to update some (massive) TMs, but I don't know which source files have already been aligned and added, so that's the first item on my to-do list before adding any new alignment result.

So far the only way I've found is to simply open the TM in Trados, look at the third column and copy/paste the source file name somewhere.

Is there any automated way to get the list of all the source files in my TM? I've thought about SQL-ing this, but I don't know enough about it.

Any help would be greatly appreciated! Thanks!

  •   

    You cannot do this easily in Trados Studio, but with the help of ChatGPT you can do it quite easily in SQLite!

    The first thing to do is find where the information is held by inspecting the Translation Memory in a tool like DB Browser for SQLite, which is free.  When you do this you'll see something like this:

    Screenshot of DB Browser for SQLite showing the database structure with a list of tables and indices, with 'string_attributes' table highlighted.
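If you'd rather list the tables from a script than browse them in the GUI, here is a minimal Python sketch. It assumes the SDLTM opens as an ordinary SQLite database (as it does in DB Browser). The in-memory database and the `some_other_table` name are stand-ins so the snippet runs without a real TM.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Stand-in database (hypothetical contents) so the sketch runs as-is.
# For a real TM, open the file read-only instead, for example:
#   conn = sqlite3.connect("file:/path/to/your.sdltm?mode=ro", uri=True)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE string_attributes (value TEXT)")
demo.execute("CREATE TABLE some_other_table (id INTEGER)")

for table in list_tables(demo):
    print(table)
```

Opening the file with `mode=ro` is a precaution so the script cannot modify the TM by accident.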

    Look through the tables in the "Browse Data" tab and you'll find this:

    Screenshot of DB Browser for SQLite displaying the 'Browse Data' tab with the 'string_attributes' table selected, showing multiple entries with source and target filenames.

    This is the "string_attributes" table and you can see that the "value" column holds all the information related to custom fields and for alignment TMs this includes the source and target filenames.  Trados Studio won't hold the full file name, just the name without the extension.  It also puts source and target together into both the source and target fields (don't ask me why!).  So what I want is a list of the unique values from this column, less the .sdlalign part as it's not needed.  I can get that with a regex like this:

    .+(?=\.sdlalign)
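To see what that regex keeps, here is a small Python sketch. The sample value is an assumption: it is one of the file names from the test TM with the `.sdlalign` suffix put back on. `strip_sdlalign` is just an illustrative helper name.

```python
import re

# Same pattern as above: everything before ".sdlalign" (lookahead, so the
# suffix itself is not part of the match).
PATTERN = re.compile(r".+(?=\.sdlalign)")


def strip_sdlalign(value):
    """Return the text before '.sdlalign', or None if the suffix is absent."""
    match = PATTERN.match(value)
    return match.group(0) if match else None


print(strip_sdlalign("en-greatminds_es-greatminds.sdlalign"))
print(strip_sdlalign("a value with no alignment suffix"))
```

The first call prints the name without the suffix and the second prints `None`, because the lookahead never succeeds.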

    So, armed with all of this information I can ask ChatGPT something like this:

    Create a SQLite instruction for use in DB Browser for SQLite.
    The instruction should create a list of the contents of the value column in the string_attributes table.
    The content returned in the value column should be evaluated using a SQLite equivalent to this regular expression:
    .+(?=\.sdlalign)
    The list of contents returned should only be unique values.

    ChatGPT obliges with this excellent information:

    Screenshot of a text box with an SQLite instruction to select distinct values from the 'value' column in the 'string_attributes' table, excluding '.sdlalign' part.

    So I enter this:

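    -- List each aligned file name once, without the ".sdlalign" suffix.
    -- instr() finds where ".sdlalign" starts; with a start of 0,
    -- substr() returns the characters before that position.
    SELECT DISTINCT
    substr(value, 0, instr(value, '.sdlalign'))
    FROM
    string_attributes
    WHERE
    value LIKE '%.sdlalign';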

    Into the "Execute SQL" tab:

    Screenshot of the 'Execute SQL' tab in DB Browser for SQLite with the provided SQLite instruction ready to be run.

    Then "Run" the code.  This returns the following:

    en-greatminds_es-greatminds
    en-latinamerica_es-latinamerica
    en-UNICEF_es-UNICEF

    I actually ran three alignments into a TM to test this.  So now I have a list of the files I aligned.  I think this is another excellent example of just how smart ChatGPT is.  If I didn't know how to write the regular expression I could have probably skipped that part and just explained in my question what I wanted. 
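If you have several TMs to check, the same query can also be run from a script instead of DB Browser. This is only a sketch using Python's built-in `sqlite3` module. The in-memory table and its rows are stand-ins (only the `value` column is modelled), and the `ORDER BY 1` is an addition to make the output order predictable.

```python
import sqlite3

# The query from this post, plus ORDER BY 1 (an addition) for a stable order.
QUERY = """
SELECT DISTINCT substr(value, 0, instr(value, '.sdlalign'))
FROM string_attributes
WHERE value LIKE '%.sdlalign'
ORDER BY 1
"""


def aligned_file_names(conn):
    """Return the distinct alignment names stored in string_attributes."""
    return [name for (name,) in conn.execute(QUERY)]


# Stand-in data (hypothetical) so the sketch runs without a real TM.
# For a real TM, open it read-only instead, for example:
#   conn = sqlite3.connect("file:/path/to/your.sdltm?mode=ro", uri=True)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE string_attributes (value TEXT)")
demo.executemany(
    "INSERT INTO string_attributes (value) VALUES (?)",
    [
        ("en-greatminds_es-greatminds.sdlalign",),
        ("en-greatminds_es-greatminds.sdlalign",),  # duplicate, listed once
        ("en-UNICEF_es-UNICEF.sdlalign",),
        ("some other custom field value",),  # ignored by the WHERE clause
    ],
)

for name in aligned_file_names(demo):
    print(name)
```

Nothing leaves your machine here either: the script only reads the local file, just as DB Browser does.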



    Generated Image Alt-Text
    [edited by: Trados AI at 10:11 AM (GMT 0) on 4 Mar 2024]
  • Hi Paul,

    Thank you for your quick response!

    Unfortunately, I didn't run these queries in ChatGPT as I'm not comfortable with spreading information about my clients' contents.

    In the meantime, I did find a more manual way to do it:

    - export the TM as a .tmx file

    - open it with Word

    - copy the first .sdlalign file name in another document

    - hit "Search and Replace" to delete all occurrences of this file name in the exported TM

    - "Search" for the next .sdlalign occurrence

    - repeat until there's no more .sdlalign occurrence.
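The search-and-delete loop above could also be scripted. The following Python sketch collects every distinct name that ends in `.sdlalign` from the lines of a TMX export. It assumes each name sits between `>` and `<` on a single line; the `x-HypotheticalField` property name in the sample is made up for illustration, and the real export may label the field differently.

```python
import re

# Capture the text between ">" and ".sdlalign<" (assumed layout of the export).
SDLALIGN_NAME = re.compile(r">([^<>]+?)\.sdlalign<")


def alignment_names(lines):
    """Return the sorted, de-duplicated alignment names found in the lines."""
    names = set()
    for line in lines:
        names.update(SDLALIGN_NAME.findall(line))
    return sorted(names)


# Hypothetical sample lines standing in for an exported TMX file.
sample = [
    '<prop type="x-HypotheticalField">en-a_es-a.sdlalign</prop>\n',
    '<prop type="x-HypotheticalField">en-a_es-a.sdlalign</prop>\n',
    "<seg>Hello</seg>\n",
    '<prop type="x-HypotheticalField">en-b_es-b.sdlalign</prop>\n',
]
print(alignment_names(sample))
```

For a real export, pass an open file instead of the sample list, for example `with open("export.tmx", encoding="utf-8-sig") as f: print(alignment_names(f))`. The encoding is an assumption and may need adjusting. Reading line by line keeps memory use low for large TMs.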

    It works perfectly fine for small TMs, but for heavier ones it does take a bit longer than working with ChatGPT. It's still quicker than scrolling through the TM in Trados.

    Thanks again for your time though! I'm keeping your solution in mind for less sensitive content.

  •  

    Unfortunately, I didn't run these queries in ChatGPT as I'm not comfortable with spreading information about my clients' contents.

    You wouldn't be!  I only used ChatGPT to get the SQLite query for any SDLTM.  Then I ran the query in DB Browser for SQLite... which is about as safe as you searching in Trados Studio :-)  Now that I gave you the query you don't even need to use ChatGPT at all.

    In the meantime, I did find a more manual way to do it:

    Indeed... perhaps worth installing Notepad++.  It would probably suit operations like this better than Word.  Glad you solved it though!

  • perhaps worth installing Notepad++

    Noted!

    I'm sorry I didn't read your reply correctly! I've tried what you wrote and it worked perfectly. Thank you so much for this insanely time-saving method!

  •  

    Thank you so much for this insanely time-saving method!

    Thanks for going back and checking.  I completely agree with the concerns over ChatGPT and your content, but for solving technical problems that I certainly can't solve on my own yet, it's a brilliant tool!  It's easy too, once you see how it all works.
