How to achieve document-level or project-level context for MT and AI

I am looking for a way to achieve at least document-level context for MT and AI. My current workaround is to run an AI platform in parallel and copy/paste from Trados Studio to the AI platform. There, an agent is equipped with the entire document (and reference material, if there is any). I then manually paste the response back into Studio. So far, so good, but this is painfully manual and very slow.

I'd like to be able to tell Studio's AI assistant:

“Include the preceding X segments (or paragraphs) in your request. With (or without) translation.”

“Include the succeeding X segments (or paragraphs) in your request. (With or without translation, for the sake of completeness, although this will usually be without unless I am in the role of the reviewer.)”
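To make the request concrete, here is a minimal sketch of what such a context window could look like when the prompt is assembled. Everything here (the Segment class, the marker format, the window sizes, the prompt wording) is my own illustration, not an existing Studio or AI Assistant API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Segment:
    source: str
    target: Optional[str] = None  # None / empty if not yet translated

def build_prompt(segments: List[Segment], index: int,
                 before: int = 3, after: int = 3,
                 include_translations: bool = True) -> str:
    """Assemble a translation request for one segment that includes
    the preceding and succeeding segments as context."""
    lines = ["Translate the segment marked >>> using the surrounding context."]
    lo = max(0, index - before)
    hi = min(len(segments), index + after + 1)
    for i in range(lo, hi):
        seg = segments[i]
        marker = ">>> " if i == index else "    "
        line = marker + seg.source
        # Show existing translations of the context segments, if requested
        if include_translations and i != index and seg.target:
            line += " => " + seg.target
        lines.append(line)
    return "\n".join(lines)
```

The `before`/`after` parameters correspond to the "preceding/succeeding X segments" knobs described above, and `include_translations` to the "with or without translation" switch.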

I created an idea for this, please support:  Context-awareness for AI Assistant

For terminology, this is very, very important: Include such-and-such a field in your request. This is how I could tell the AI not to use TB entries with the status “deprecated” or “superseded”. There is an idea for this already, please support:  OpenAI Provider for Trados Studio: option to include term information in system prompt

A lot happened recently with the AI Assistant (user can modify the system prompt)! Thank you for that!



Removed AI Suggestion
[edited by: Daniel Hug at 10:17 AM (GMT 0) on 13 Dec 2025]
  • For MT, I remember Globalese (since acquired by memoQ) offered only “asynchronous” MT – you had to send the whole document to their MT engine, wait a while, and get the whole document back (all target segments filled in). I was thinking along these lines when thinking about context-aware MT. It could store the segments in a TM.

    I can do this already, manually: get the whole document translated by MT (or AI), perform an alignment (this works really well with AI), create a TM, and populate it with the TUs resulting from the alignment. It's entirely possible already, just very slow. I could write a script to speed up parts of it, but I think it would be timely for any translation tool to offer this as out-of-the-box functionality. It could be done as part of the core product or as an app.
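The TM-population step of that workflow is easy to script. Below is a hedged sketch that writes already-aligned source/target pairs to a minimal TMX 1.4 file; the language codes, header attributes, and tool name are illustrative, and the alignment itself (e.g. done with AI) is assumed to have happened beforehand:

```python
import xml.etree.ElementTree as ET

def pairs_to_tmx(pairs, src_lang="en", tgt_lang="de"):
    """Build a minimal TMX 1.4 document from (source, target) pairs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "srclang": src_lang, "segtype": "sentence",
        "datatype": "plaintext", "adminlang": "en",
        "creationtool": "align-to-tm-sketch",  # invented tool name
        "creationtoolversion": "0.1",
        "o-tmf": "plain",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

The resulting file can then be imported into a Studio TM in the usual way.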

  • Just to say that I expressed the very same idea during a talk in Luxembourg in October, recommending it as the way forward. But I have no further ideas on its implementation (other than the slow way that you have outlined).

  • Hi,
    Unfortunately, I can't help you with the AI Assistant, and passing document and termbase context to the prompt can be difficult in general.
    However, if you want to export the entire text, along with the terms from the termbase that occur in the document, and then import the AI translation, you can check out the TransAIde plugin. It does just that.

    https://posteditacat.xyz/en/

    https://www.youtube.com/watch?v=VbW-YH-yaw4&t=6s

    https://appstore.rws.com/Plugin/414


    Dariusz Adamczak (posteditacat.xyz)

  • Thank you  ,

    Your solution makes a lot of sense, but (cc: ) I notice there are a lot of attempts at the moment to export content from Trados Studio in order to translate it in context using AI or MT systems and then re-import it into Trados. Michael Beijer's “Supervertaler” is another variation on the theme. I have been tinkering with the XLIFF export function for the same purpose for almost a year now.

    I think the message is clear: while there is still a lot of utility in segmentation, the time for context has arrived. This is functionality that CAT systems should provide natively – and will, I am sure. The market will dictate it; the competitive advantage of being able to do so is overwhelming.

    I am currently using Trados to translate with MT (more reliable than AI), exporting to XLIFF, handing the whole file over to an AI agent, and re-importing the translations. Then I work in Trados for the QA steps or to send projects off to co-workers. So while Trados is still the hub of my translation tech stack (file conversions / file types!), the actual translation happens more and more outside of it. I wish it could move back in.

  • I agree with you on many points.

    Document-level context vs. segment-level context:
    Once someone has tried using AI for translation with document-level context, they will never go back to segment translation. And the CAT industry must take this into account because the first solutions will reap the biggest rewards. However, this is not a trivial problem. Personally, I also would prefer to use Trados only for importing source files, as a QA tool, and for generating target documents. (https://posteditacat.xyz/beyond-segments-the-critical-role-of-context-in-modern-translation/).
    I think there are many translators who work this way, or at least copy text from AI into their CAT tools, because they don't know that tools for this purpose already exist. I only just found out about Supervertaler, for example.

    XLIFF process:
    My attempts to translate large XLIFF files with AI have not been very successful. XLIFF files are simply too large to be processed efficiently by AI. We could process them segment by segment, but then we would lose the context of the entire document. Do you have any ideas for this?
    For projects with a lot of exact and fuzzy matches, I developed a compact JSON format containing only the necessary data (source, existing target, segment identifier, and status), and it works surprisingly well when updating large projects.
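I don't know the exact fields of that format, but based on the description (source, existing target, segment identifier, status), a compact per-segment record might look something like this; all field names and values below are invented for illustration:

```python
import json

# Hypothetical compact records: only what the AI needs to update a project.
segments = [
    {"id": "s12", "src": "Press the start button.",
     "tgt": "Drücken Sie die Starttaste.", "status": "exact"},
    {"id": "s13", "src": "Close the lid.",
     "tgt": "Schließen Sie den Deckel.", "status": "fuzzy"},
    {"id": "s14", "src": "Wait ten seconds.",
     "tgt": "", "status": "untranslated"},
]

# ensure_ascii=False keeps non-ASCII target text readable for the model
payload = json.dumps(segments, ensure_ascii=False, indent=2)
```

Stripping a file down to records like these keeps the payload far smaller than a full XLIFF while preserving the document order, so the model still sees the whole document as context.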

  • Let me describe my own experiments (with small files - medical guidelines - of only about 300 words).
    Step 1: I upload my source text and a glossary to NotebookLM and ask for a translation that respects the glossary. This yields a text-based translation.
    Step 2: I turn source and target into a TMX file using LF Aligner.
    Step 3: I start a project in Studio using the TMX and run through the text, revising where needed.

    My latest experiment shows a few worthwhile improvements with this method compared with an earlier translation of the same text in Studio (i.e. in segmentation mode), with ChatGPT as the engine. Two examples of improvements:
    (1) In the earlier translation, the title was rendered in a shortened version, which would be acceptable in the body of the text but not as a title. The text-based NotebookLM translation didn't make this error.
    (2) In the earlier translation, a feminine French term was referred to with the masculine pronoun "il" in a subsequent sentence; the NotebookLM version correctly chose "elle".

  • Hi guys,

    Supervertaler has been undergoing extremely rapid development (thanks to Claude Code). I am even starting to get .sdlppx/.sdlrpx support working reliably!

    See: https://supervertaler.com/changelog


    Indeed, Supervertaler will soon even be signed and notarized on macOS, and I already have ready-to-run Windows EXEs and macOS DMGs in the latest releases on GitHub!

    https://github.com/michaelbeijer/Supervertaler/releases/tag/v1.9.276

    In the beginning, I spent most of my time exporting bilingual files from memoQ or Trados and pre-translating them in Supervertaler with AI (while creating the prompt on chatgpt.com), but the CAT tool in Supervertaler has now gotten so good that I prefer to do the actual human translation (post-tweaking, or whatever you call it these days) in Supervertaler as well. Claude Code basically allows me to implement new ideas in mere hours that would take a proper team weeks!

    For example, I always loved the novel terminology display system I first encountered in the RYS plugin for Trados (then called "RyS Termbase & Translation Assembler"; now called "RYSTUDIO Post-editing Package"), so I implemented a version of it in Supervertaler. The whole process took just a day!

    See: community.rws.com/.../the-sad-sad-state-of-trados-studio-s-useless-terminology-tools

    Michael

  • “XLIFF files are simply too large to be processed efficiently by AI”

    I'm not sure I understand this. Do you send the full XLIFF file? Have you tried stripping the <internal-file> content first? It can be huge if the original file contained images.
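For anyone who wants to try this, here is a rough sketch of stripping the embedded skeletons from an XLIFF 1.2 file before sending it to an AI. It assumes the standard XLIFF 1.2 namespace and is not tied to any particular tool's export:

```python
import xml.etree.ElementTree as ET

# Default namespace of XLIFF 1.2 documents
NS = "urn:oasis:names:tc:xliff:document:1.2"

def strip_internal_files(xliff_text: str) -> str:
    """Remove every <internal-file> element (often base64-encoded
    originals, including images) from an XLIFF 1.2 document."""
    root = ET.fromstring(xliff_text)
    tag = f"{{{NS}}}internal-file"
    # Snapshot the tree first so removal doesn't disturb iteration
    for parent in list(root.iter()):
        for child in [c for c in parent if c.tag == tag]:
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

The translatable <trans-unit> content is untouched; only the embedded skeleton payload is dropped, which is usually where most of the file size lives.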

  • Hi guys! Supervertaler now has an "Okapi sidecar" — a lightweight Java microservice that runs quietly in the background and handles monolingual file imports and exports using the industry-standard Okapi Framework file filters.

    See: https://github.com/michaelbeijer/Supervertaler/releases/tag/v1.9.342 + https://supervertaler.com/changelog (see: v1.9.342)

    What is the Okapi Framework?

    The Okapi Framework is the same open-source localisation toolkit used under the hood by various professional translation tools. It contains thoroughly battle-tested file filters for dozens of formats — DOCX, XLSX, PPTX, HTML, XML, IDML, and many more — with proper handling of inline formatting, segmentation, and round-trip fidelity.

    What does "sidecar" mean?

    Since Okapi is written in Java and Supervertaler is a Python/Qt application, they can't talk to each other directly. The sidecar is a small Java process that starts automatically in the background when needed. Supervertaler communicates with it over a local REST API — sending files to be extracted into translatable segments, and sending translations back to be merged into a properly formatted output file. You never have to interact with it; it just works behind the scenes.
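As an illustration of the sidecar pattern described above (not Supervertaler's actual API; the port and endpoint name below are invented), the client side of such a local REST call might be built like this:

```python
import json
import urllib.request

# Assumed local-only address for the Java helper process (hypothetical)
SIDECAR = "http://127.0.0.1:8753"

def extract_request(path: str) -> urllib.request.Request:
    """Build the POST that asks the sidecar to extract translatable
    segments from a file. The /extract endpoint is illustrative."""
    body = json.dumps({"file": path}).encode("utf-8")
    return urllib.request.Request(
        f"{SIDECAR}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# In real use the app would send this with urllib.request.urlopen(...)
# and parse the JSON list of segments from the response.
```

Because everything stays on 127.0.0.1, no document content leaves the machine in this step; the HTTP hop only bridges the Python/Java process boundary.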

    What does this mean in practice?

    The previous system used a fully Python-based DOCX importer, which worked reasonably well but struggled with more complex formatting. The Okapi-powered system produces exported files that are exact replicas of the original in terms of formatting and layout — bold, italic, colored text, heading styles, fonts, lists — everything comes through faithfully. It also brings proper SRX segmentation, better paragraph detection, and semantic inline formatting tags (like <b> for bold) that are visible while you translate.

    The new system can already be tested in the latest builds available via pip. I'm also working on a Windows EXE release and a Mac DMG.

  • Just wanted to add to this:

    The Okapi Framework is the same open-source localisation toolkit used under the hood by various professional translation tools. It contains thoroughly battle-tested file filters for dozens of formats — DOCX, XLSX, PPTX, HTML, XML, IDML, and many more — with proper handling of inline formatting, segmentation, and round-trip fidelity.

    Okapi is indeed open-source, well-established, and provides file filters for a wide range of formats.  It's been around since the early 2000s and may well be used as plumbing inside some translation tools.  I think that in the early days of Phrase (Memsource at the time) they might have based their solution on Okapi, but I'd be surprised if that is still the case today.  Whilst the range of supported formats is impressive, and the framework does handle segmentation (via SRX) and inline codes reasonably well, you might be stretching its capabilities a little :-)

    • "Battle-tested" is doing some heavy lifting.  The filters vary in quality.  Some (like HTML, XLIFF, PO) are likely fairly solid.  Others (like IDML) have known quirks and limitations.  The DOCX filter, for instance, handles the basics well but can stumble on more complex documents with nested content controls, tracked changes, or unusual formatting.
    • "Round-trip fidelity" is aspirational rather than universal.  For simpler files it's probably fine, but edge cases in formats like IDML or heavily styled DOCX can and do break.  Anyone working seriously with localisation file filters knows that perfect round-tripping is the hardest part, and Okapi doesn't magically solve that.
    • "Used under the hood by various professional translation tools" - maybe, but not as widespread as the statement implies.  Many major tools (Trados, memoQ, Across) use their own proprietary filters rather than Okapi's.

    Also, I got your message to my alter-ego, but this is probably better directed here.  Everything you need to get started with the APIs can be found in the RWS resources, so a good place to start is here:

    https://developers.rws.com/

    You'll find the SDKs and API documentation for almost every product we have.  The Trados portfolio is certainly well covered.  For Trados Studio plugins specifically, a good starting point is here:

    https://developers.rws.com/studio-api-docs/articles/gettingstarted/studio_plugin_overview.html

    And for technical questions this forum is a must:

     

     

    And lastly, the open-source plugins are a great learning resource... much of the work the Trados AppStore team has done is open source, and they share it here:

    https://github.com/RWS/Sdl-Community

    Hope that helps?

    Paul Filkin | RWS

    Design your own training!
    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub
