How to achieve document-level or project-level context for MT and AI

I am looking for a way to achieve at least document-level context for MT and AI. My current solution is to run an AI platform in parallel and copy/paste from Trados Studio to the AI platform. There, an agent is equipped with the entire document (and reference material, if there is any). I then manually paste the response back into Studio. So far, so good, but this is painfully manual and very slow.

I'd like to be able to tell Studio's AI assistant:

“Include the preceding X segments (or paragraphs) in your request. With (or without) translation.”

“Include the succeeding X segments (or paragraphs) in your request. (With or without translation, for the sake of completeness, although this will usually be without unless I am in the role of the reviewer.)”
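To make the request concrete, here is a minimal sketch of what such a context window could look like if the segments were available as a simple list. `Segment` and `build_context_prompt` are hypothetical names, not part of any Trados Studio API, and the `[context]`/`[translate]` labels are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    source: str
    target: Optional[str] = None  # None while the segment is untranslated


def build_context_prompt(segments: List[Segment], index: int,
                         before: int = 3, after: int = 3,
                         targets_before: bool = True,
                         targets_after: bool = False) -> str:
    """Hypothetical helper: wrap segments[index] in a window of neighbours."""
    lines = []
    # Preceding segments, optionally with their existing translations.
    for seg in segments[max(0, index - before):index]:
        lines.append(f"[context] {seg.source}")
        if targets_before and seg.target:
            lines.append(f"[context translation] {seg.target}")
    # The segment that should actually be translated.
    lines.append(f"[translate] {segments[index].source}")
    # Succeeding segments; their translations usually only matter when reviewing.
    for seg in segments[index + 1:index + 1 + after]:
        lines.append(f"[context] {seg.source}")
        if targets_after and seg.target:
            lines.append(f"[context translation] {seg.target}")
    return "\n".join(lines)
```

The two booleans correspond to the “with or without translation” switches above; `before` and `after` are the X in “the preceding/succeeding X segments”.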

I created an idea for this; please support it: Context-awareness for AI Assistent

For terminology, and this is very, very important: “Include such-and-such a field in your request.” This is how I could tell the AI not to use TB entries with the status “deprecated” or “superseded”. There is already an idea for this; please support it: OpenAI Provider for Trados Studio: option to include term information in system prompt
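A minimal sketch of how such a status field could be used when building the request, assuming termbase entries are available as dictionaries with hypothetical `source`, `target` and `status` keys (real termbase field names will differ). The substring matching is deliberately naive and would need proper term recognition in practice.

```python
from typing import Dict, List

# Assumed status values; a real termbase may use different labels.
AVOID_STATUSES = {"deprecated", "superseded"}


def term_instructions(entries: List[Dict[str, str]], segment: str) -> str:
    """Hypothetical helper: split matching terms into 'use' and 'avoid' lists."""
    seg = segment.lower()
    use, avoid = [], []
    for entry in entries:
        if entry["source"].lower() not in seg:
            continue  # only send terms that actually occur in this segment
        pair = f"{entry['source']} -> {entry['target']}"
        if entry.get("status", "").lower() in AVOID_STATUSES:
            avoid.append(pair)
        else:
            use.append(pair)
    parts = []
    if use:
        parts.append("Use these approved terms:\n" + "\n".join(use))
    if avoid:
        parts.append("Do NOT use these deprecated/superseded terms:\n" + "\n".join(avoid))
    return "\n\n".join(parts)
```

The point is that the status travels with the term, so the model is told explicitly which target variants to avoid instead of just receiving every entry.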

A lot has happened recently with the AI Assistant (users can now modify the system prompt)! Thank you for that!



[edited by: Daniel Hug at 10:17 AM (GMT 0) on 13 Dec 2025]
  • Just to say that I expressed the very same idea during a talk in Luxembourg in October, recommending it as the way forward. But I have no further ideas on its implementation (other than the slow way that you have outlined).

  •    

    Hi, 
    Unfortunately, I can't help you with AI Assistant, and passing the context of the document and termbase to the prompt can be difficult in general.
    However, if you want to export the entire text and the termbase terms used in the document, and then import the AI translation, you can check out the TransAIde plugin. This plugin does just that.

    https://posteditacat.xyz/en/

    https://www.youtube.com/watch?v=VbW-YH-yaw4&t=6s

    https://appstore.rws.com/Plugin/414


    Dariusz Adamczak (posteditacat.xyz)

  • Thank you  ,

    Your solution makes a lot of sense, but (cc: ) I notice there are a lot of attempts at the moment to export content from Trados Studio in order to translate it in context using AI or MT systems and then re-import it into Trados. Michael Beijer's “Supervertaler” is another variation on the theme. I have been tinkering with the XLIFF export function for the same purpose for almost a year now.

    I think the message is clear: while segmentation is still very useful, the time for context has arrived. This is functionality that CAT systems should provide natively, and I am sure they will. The market will dictate it. The competitive advantage of being able to do so is overwhelming.

    I am currently using Trados to translate with MT (more reliable than AI), export to XLIFF, hand the whole file over to an AI agent and re-import the translations. Then I work in Trados to do the QA steps or send projects off to co-workers. So while Trados is still the hub of my translation tech stack (file conversions/file types!), the actual translation happens more and more outside of it. I wish it could move back in.
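For the re-import step, a minimal sketch assuming plain XLIFF 1.2 and translations keyed by `trans-unit` id. As far as I know, SDLXLIFF files from Studio wrap segments in `<mrk mtype="seg">` elements and carry extra `sdl:` metadata, so this would not work on them as-is; `apply_translations` is a hypothetical helper that writes plain text only.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)  # serialize as default namespace instead of ns0:


def apply_translations(xliff_text: str, translations: dict) -> str:
    """Hypothetical helper: write AI translations back into <target> elements."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{NS}}}trans-unit"):
        uid = unit.get("id")
        if uid not in translations:
            continue  # leave units the AI did not return untouched
        target = unit.find(f"{{{NS}}}target")
        if target is None:
            target = ET.SubElement(unit, f"{{{NS}}}target")
        for child in list(target):
            target.remove(child)  # drop inline tags; this sketch writes plain text
        target.text = translations[uid]
        target.set("state", "translated")
    return ET.tostring(root, encoding="unicode")
```

Matching on the unit id rather than on position is what makes it safe to send the AI the whole file and still put each answer back in the right place.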

  •  

    I agree with you on many points.

    Document-level context vs. segment-level context:
    Once someone has tried using AI for translation with document-level context, they will never go back to segment translation. And the CAT industry must take this into account, because the first to offer solutions will reap the biggest rewards. However, this is not a trivial problem. Personally, I would also prefer to use Trados only for importing source files, as a QA tool, and for generating target documents (https://posteditacat.xyz/beyond-segments-the-critical-role-of-context-in-modern-translation/).
    I think there are many translators who work this way, or who at least copy text from the AI into their CAT tools, because they don't know that tools for this purpose already exist. I only just found out about Supervertaler, for example.

    XLIFF process:
    My attempts to translate large XLIFF files with AI have not been very successful. XLIFF files are simply too large to be processed efficiently by AI. We could process them segment by segment, but then we wouldn't be able to retain the context of entire documents. Do you have any ideas for this?
    For projects with a lot of exact and fuzzy matches, I developed a compact JSON format containing only the necessary data (source, existing target, segment ID, and status), and it works surprisingly well for updating large projects.
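A compact JSON export along those lines could look roughly like this. It is a sketch assuming plain XLIFF 1.2 input (not SDLXLIFF); the key names are my own guesses at the fields described above, and inline tags are simply flattened to text, which a real round trip would need to preserve, for example as placeholders.

```python
import json
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def xliff_to_compact_json(xliff_text: str) -> str:
    """Hypothetical helper: keep only id, source, target and status per unit."""
    root = ET.fromstring(xliff_text)
    rows = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        rows.append({
            "id": unit.get("id"),
            # itertext() flattens inline tags such as <g> into plain text (lossy).
            "source": "".join(source.itertext()) if source is not None else "",
            "target": "".join(target.itertext()) if target is not None else "",
            "status": target.get("state", "") if target is not None else "",
        })
    # No indentation and minimal separators to keep the payload small.
    return json.dumps(rows, ensure_ascii=False, separators=(",", ":"))
```

Dropping skeleton data, metadata and markup this way is what keeps the payload small enough for a single AI request while still carrying the whole document.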

  • Let me reveal the nature of my own experiments (with small files - medical guidelines - of about 300 words only).
    STEP 1: I upload my source text and a glossary to NotebookLM and ask for a translation that respects the glossary. This yields a text-based translation.
    STEP 2: I turn source and target into a TMX file using LF Aligner.
    STEP 3: I start a project in Studio using the TMX and run through the text, revising where needed.
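Step 2 essentially produces a TMX file from aligned sentence pairs. If you already have the pairs (for example as a two-column, tab-separated export), a minimal TMX 1.4 writer could look like the sketch below. The language codes and `creationtool` values are placeholders, and `pairs_to_tmx`/`read_tsv_pairs` are hypothetical helpers, not part of LF Aligner.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def read_tsv_pairs(tsv_text: str):
    """Assumes exactly two tab-separated columns: source, target."""
    pairs = []
    for line in tsv_text.splitlines():
        if "\t" in line:
            src, tgt = line.split("\t", 1)
            pairs.append((src.strip(), tgt.strip()))
    return pairs


def pairs_to_tmx(pairs, src_lang="en-GB", tgt_lang="fr-FR") -> str:
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "pairs_to_tmx",  # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per sentence pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Once a TM built this way is attached to the Studio project, the text-based translation comes back as matches that can be revised segment by segment, as in step 3.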

    My latest experiment shows a few worthwhile improvements with this method compared with an earlier translation of the same text in Studio (i.e. in segmented mode), with ChatGPT as the engine. Two examples of improvements:
    (1) in the earlier translation the title was rendered in a shortened version, which would be okay in the body of the text but not as a title. The NotebookLM translation (text-based) didn't make this error.
    (2) in the earlier translation a feminine French term was referred to with the masculine pronoun "il" in a subsequent sentence; the NotebookLM version correctly chose "elle".

  • Hi guys,

    Supervertaler has been undergoing extremely rapid development (thanks to Claude Code). I am even starting to get .sdlppx/.sdlrpx support working reliably!

    See: https://supervertaler.com/changelog


    Indeed, Supervertaler will soon even be signed and notarized on macOS and I currently already have ready-to-run Windows EXE and macOS DMGs in the latest releases on GitHub!   

    https://github.com/michaelbeijer/Supervertaler/releases/tag/v1.9.276

    In the beginning, I spent most of my time exporting bilingual files from memoQ or Trados and then pre-translating them in Supervertaler with AI (while creating the prompt on chatgpt.com), but the CAT tool in Supervertaler has now gotten so good that I prefer to do the actual human translation (post-tweaking, whatever you call it these days) in Supervertaler as well. Claude Code basically allows me to implement new ideas in mere hours that would take a proper team weeks! 

    For example, I always loved the novel terminology display system I first encountered in the RYS plugin for Trados (then called "RyS Termbase & Translation Assembler"; now called "RYSTUDIO Post-editing Package"), so I implemented a version of it in Supervertaler. The whole process was completed in a day!

    See: community.rws.com/.../the-sad-sad-state-of-trados-studio-s-useless-terminology-tools

    Michael

  •  

    XLIFF files are simply too large to be processed efficiently by AI

    Not sure I understand this. Do you send the full XLIFF file? Have you tried stripping the <internal-file> content first? It can be huge if the original file contained images.
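A minimal sketch of that stripping step with Python's standard library, assuming an XLIFF 1.2 file. It removes every `<internal-file>` element regardless of its parent. The result will probably no longer validate (XLIFF 1.2 expects `<skl>` to contain an internal or external file), so treat it only as a lighter copy to send to the AI, never as the file you re-import.

```python
import xml.etree.ElementTree as ET

ET.register_namespace("", "urn:oasis:names:tc:xliff:document:1.2")


def strip_internal_files(xliff_text: str) -> str:
    """Remove embedded <internal-file> payloads (often base64-encoded originals)."""
    root = ET.fromstring(xliff_text)
    # Snapshot the tree first: removing children while iterating is undefined.
    for parent in list(root.iter()):
        for child in list(parent):
            # Compare the local name so the namespace prefix does not matter.
            if child.tag.rsplit("}", 1)[-1] == "internal-file":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Translatable content in the `trans-unit` elements is left untouched, so segment IDs in the stripped copy still line up with the original file.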

  • Hi guys! Supervertaler now has an "Okapi sidecar" — a lightweight Java microservice that runs quietly in the background and handles monolingual file imports and exports using the industry-standard Okapi Framework file filters.

    See: https://github.com/michaelbeijer/Supervertaler/releases/tag/v1.9.342 + https://supervertaler.com/changelog (see: v1.9.342)

    What is the Okapi Framework?

    The Okapi Framework is the same open-source localisation toolkit used under the hood by various professional translation tools. It contains thoroughly battle-tested file filters for dozens of formats — DOCX, XLSX, PPTX, HTML, XML, IDML, and many more — with proper handling of inline formatting, segmentation, and round-trip fidelity.

    What does "sidecar" mean?

    Since Okapi is written in Java and Supervertaler is a Python/Qt application, they can't talk to each other directly. The sidecar is a small Java process that starts automatically in the background when needed. Supervertaler communicates with it over a local REST API — sending files to be extracted into translatable segments, and sending translations back to be merged into a properly formatted output file. You never have to interact with it; it just works behind the scenes.

    What does this mean in practice?

    The previous system used a fully Python-based DOCX importer, which worked reasonably well but struggled with more complex formatting. The Okapi-powered system produces exported files that are exact replicas of the original in terms of formatting and layout — bold, italic, colored text, heading styles, fonts, lists — everything comes through faithfully. It also brings proper SRX segmentation, better paragraph detection, and semantic inline formatting tags (like <b> for bold) that are visible while you translate.
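To illustrate what SRX-style segmentation means: roughly, SRX rules pair a "before break" and an "after break" regular expression with a break/no-break decision, and the first rule that matches at a position wins. The following is a deliberately simplified sketch of that idea with two toy rules. It is not Okapi's implementation, the rule set is made up, and it ignores SRX details such as language maps.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    brk: bool     # True = break here, False = exception (no break)
    before: str   # regex that must match the text ending at the position
    after: str    # regex that must match the text starting at the position


# Toy rules: the abbreviation exception must come before the general break rule.
RULES = [
    Rule(False, r"\b(?:Dr|Mr|Mrs|e\.g|i\.e|etc)\.", r"\s"),
    Rule(True, r"[.?!]+", r"\s+\S"),
]


def segment(text: str, rules: List[Rule] = RULES) -> List[str]:
    compiled = [(r.brk, re.compile("(?:" + r.before + r")\Z"), re.compile(r.after))
                for r in rules]
    segments, start = [], 0
    for i in range(1, len(text)):
        for brk, before, after in compiled:
            if before.search(text[:i]) and after.match(text, i):
                if brk:
                    segments.append(text[start:i].strip())
                    start = i
                break  # first matching rule decides, break or no break
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

Slicing `text[:i]` at every position is quadratic, which is fine for a demonstration but not for production use.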

    The new system can already be tested in the latest Windows builds available via pip. I'm also working on a Windows EXE release and a Mac DMG.

  •  

    Just wanted to add to this:

    The Okapi Framework is the same open-source localisation toolkit used under the hood by various professional translation tools. It contains thoroughly battle-tested file filters for dozens of formats — DOCX, XLSX, PPTX, HTML, XML, IDML, and many more — with proper handling of inline formatting, segmentation, and round-trip fidelity.

    Okapi is indeed open-source, well-established, and provides file filters for a wide range of formats. It's been around since the early 2000s and may well be used as plumbing inside some translation tools. I think that in the early days of Phrase (Memsource at the time) they might have based their solution on Okapi, but I'd be surprised if this is still the case today. Whilst the range of supported formats is impressive, and the framework does handle segmentation (via SRX) and inline codes reasonably well, you might be stretching its capabilities a little :-)

    • "Battle-tested" is doing some heavy lifting.  The filters vary in quality.  Some (like HTML, XLIFF, PO) are likely fairly solid.  Others (like IDML) have known quirks and limitations.  The DOCX filter, for instance, handles the basics well but can stumble on more complex documents with nested content controls, tracked changes, or unusual formatting.
    • "Round-trip fidelity" is aspirational rather than universal.  For simpler files it's probably fine, but edge cases in formats like IDML or heavily styled DOCX can and do break.  Anyone working seriously with localisation file filters knows that perfect round-tripping is the hardest part, and Okapi doesn't magically solve that.
    • "Used under the hood by various professional translation tools" - maybe, but not as widespread as the statement implies.  Many major tools (Trados, memoQ, Across) use their own proprietary filters rather than Okapi's.

    Also, I got your message to my alter-ego, but this is probably better directed here. Everything you need to get started working with the APIs can be found in the RWS resources. So a good place to start is here:

    https://developers.rws.com/

    You'll find the SDKs and API documentation for almost every product we have here.  Certainly the Trados portfolio is well covered.  For Trados Studio plugins specifically a good starting point is here:

    https://developers.rws.com/studio-api-docs/articles/gettingstarted/studio_plugin_overview.html

    And for technical questions this forum is a must:

     

     

    And lastly, the open-source plugins are a great learning resource... much of the work the Trados AppStore team has done is open-source, and they share it here:

    https://github.com/RWS/Sdl-Community

    Hope that helps?

    Paul Filkin | RWS

    Design your own training!
    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi Paul,

    Thanks for all the great information!

    As you might have guessed, Claude wrote most of the blurb for the new Okapi sidecar feature. Claude's first draft was even more enthusiastic, and I had to ask it to tone it down a bit. I think I will definitely need to temper its enthusiasm even more.

    In terms of how I use Supervertaler myself, unless the job involves a very simple Word docx, I almost always run the project through Trados or memoQ first, do the actual translation in Supervertaler (via SDXLIFF or memoQ bilingual docx), and then export back to Trados/memoQ to generate the final product.

    Ideally, I would love for Supervertaler to handle all kinds of complicated Word documents and other formats, but I know that probably won't happen. I don't have the knowledge, budget, or time to work with all the different file types that Trados and memoQ can handle.

    However, Supervertaler already offers things that Trados and memoQ don't. One of these is allowing the AI to see the document as a whole, even images, and I find that, if properly set up, I get a much better translation in Supervertaler than in Trados/memoQ.

    Another area where I vastly prefer Supervertaler is its terminology handling, which is as comfortable and powerful as CafeTran, or even memoQ, if memoQ weren't so damn slow these days.

    My love-hate relationship with Trados continues. Currently, one of my favourite setups is to do the actual translation in Supervertaler but start and end the project in Trados, mainly because the grid in Trados is so much faster than memoQ.

    Regarding my idea to launch an app on the RWS App Store, I have had several ideas.

    One of these is to make it easier for Trados users to quickly open their project in Supervertaler in order to translate it there and then send it back to Trados. Not that it's hard at the moment, since Supervertaler can open multiple SDXLIFF files in a single project and also handle Trados Studio packages.

    Another idea I had for an RWS app was to build something similar to the "RYSTUDIO Post-editing Package" (appstore.rws.com/.../135), which, in my view, is a vastly superior way to handle terminology in Trados, or in any CAT tool for that matter. The way it displays terms in the same format as the actual segment is so useful. This is why I built the same sort of thing in Supervertaler; the original versions of Supervertaler had a more memoQ-based layout. However, even though I love using the "RYSTUDIO Post-editing Package" in Trados, I can hardly get it to work these days. It keeps crashing, and I am always emailing its developer, who often takes weeks to respond, and even then I often can't get it to work properly.

    ---

    Anyway, thanks again for your detailed response! I will look at all the links you mentioned.

    I find myself in the rather odd position of having developed a fairly complicated CAT tool without any actual coding experience. Claude Code has allowed me to effectively manage a software development project, without actually understanding how the underlying code works. Also, since I am usually supposed to translate to make some actual money, I don't have much time to learn the nuts and bolts of programming.

    Michael

  •   

    I find myself in the rather odd position of having developed a fairly complicated CAT tool without any actual coding experience. Claude Code has allowed me to effectively manage a software development project, without actually understanding how the underlying code works.

    You're surely not the only one! I find I can build an application, or a Python or PowerShell script, to solve a problem faster than I could set up a spreadsheet! Once you understand the basics of Visual Studio, Visual Studio Code, or even Android Studio (I built two apps to do things my wife wanted on her Pixel phone), using AI as just another tool to solve a problem is almost child's play. One of our developers used to say to me that coding is like Lego for adults... today it really is!

    Doing this for your own purposes is fantastic and fun.  But it's not quite the same for production tools sold to thousands of people that deliver massive amounts of capabilities, all built by teams of developers working together.  Worth remembering that, and especially remember that these teams did all this before they had AI in the first place!  Every day I marvel at how clever they must be!!

    Paul Filkin | RWS

