Creating a TM from a Single Bilingual File

Hi all! I'm new here. So stupid question time!

I need to translate a massive file. But I'm in luck! It's version 2.0, and I have the following: a txt file for the 1.5 ST, a txt file for the 1.5 TT, and a bilingual HTML file that has the entire thing with the English on top and Japanese right underneath.

I tried to align the TXT files, but it's... just too big. It's around 20,000 segments, things go off in the middle, several parts aren't parsing right... it would take me two weeks to align it manually.

But wait, I already have an "aligned" version! The bilingual HTML file! It's already separated line by line. Is there any way to leverage this into a TM? I found a question like this in the archives, but it was around five years old and the answer seemed to be a big "nope". I've got my fingers crossed that there's a workaround here... otherwise I will just end up having to CTRL+F every single line of this huge thing from scratch.

  • Can you provide a small sample of the html file?  Just the header of the file and a couple of lines showing how the source and target are laid out in the file.  I'm imagining a workaround using the Multilingual XML filetype from the appstore, but if your file is html it'll be helpful to see what it looks like so we can suggest a small change to support it being processed by this filetype.

    https://appstore.sdl.com/language/app/multilingual-xml-filetype/1261/

    Another possibility would be to use a different solution to align your txt files.  So it would also be interesting to see a small sample of what this txt file looks like. Just because it's txt doesn't necessarily mean it only contains the text.  If it does then using something like LF Aligner could be useful as this is light and pretty good with txt files.

    https://sourceforge.net/projects/aligner/

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • I can sure try! Let's see... the header of the file... I think I took one class on HTML about seven years ago, and I've forgotten everything. Is this it?

    h1 {font-size: 200%; font-weight: bold; margin: 0pt; text-align:center;
    border: 5pt outset #08f; padding: 3pt}
    h2 {font-size: 150%; font-weight: bold; text-decoration: none; margin: 20pt 10% 7pt;
    text-align: center; border: 2pt outset #080; padding: 3pt;}
    h3 {font-size: 140%; font-weight: bold; text-decoration: none; margin: 15pt 0pt 0pt;}
    .gloss h3 {font-size: 120%; font-weight: normal; text-decoration: none; margin: 7pt 5pt 2pt 0pt;}
    h4 {font-size: 120%; font-weight: normal; text-decoration: none; margin: 7pt 5pt 2pt 5pt;}

    Screenshot of Trados Studio showing a list of headings and subheadings in both Japanese and English, with numbers and links preceding the text.

    <p><a class="r" href="#r0">0.</a> はじめに</p>
    <p class="eng">Introduction</p>
    <p><a class="r" href="#r1">1.</a> ゲームの考え方</p>
    <div class="subsec">
    <p class="eng">1. Game Concepts</p>
    <p><a class="r" href="#r100">100.</a> 原則</p>
    <p class="eng">100. General</p>
    <p><a class="r" href="#r101">101.</a> マジックの黄金律</p>


    Generated Image Alt-Text
    [edited by: Trados AI at 6:12 AM (GMT 0) on 29 Feb 2024]
  • A small snippet of the file would have been more helpful.  You said it was an HTML file, but this looks like something that has been copied and pasted out of a Word file, or something you just made up?

    If you insert it as code in here:

    Screenshot of a forum post by Benjamin with a red arrow pointing to the 'Code' option in the text editor menu.

    Then we can do something with it.  On the basis of this bit:

    <p><a class="r" href="#r0">0.</a> はじめに</p>
    <p class="eng">Introduction</p>
    <p><a class="r" href="#r1">1.</a> ゲームの考え方</p>
    <div class="subsec">
    <p class="eng">1. Game Concepts</p>
    <p><a class="r" href="#r100">100.</a> 原則</p>
    <p class="eng">100. General</p>
    <p><a class="r" href="#r101">101.</a> マジックの黄金律</p>

    I'd say it might be possible.  But I can't give a confident answer without seeing how this file is structured from start to finish (but only containing your small sample of the translatable parts).  Like this perhaps?

    <!DOCTYPE html>
    <html>
    <body>
    <p><a class="r" href="#r0">0.</a> はじめに</p>
    <p class="eng">Introduction</p>
    <p><a class="r" href="#r1">1.</a> ゲームの考え方</p>
    <div class="subsec">
    <p class="eng">1. Game Concepts</p>
    <p><a class="r" href="#r100">100.</a> 原則</p>
    <p class="eng">100. General</p>
    <p><a class="r" href="#r101">101.</a> マジックの黄金律</p>
    </body>
    </html>

    But don't make it up... tell us exactly what you have in the file you said is html.

    Paul Filkin | RWS Group



    Generated Image Alt-Text
    [edited by: Trados AI at 6:12 AM (GMT 0) on 29 Feb 2024]
  • It was indeed copy/pasted, but out of "view source" from the HTML file I have. I don't even know enough to try to make something like this up, sadly. <^^;

    But thanks for showing me that button! That's what I needed. Let me try this again.

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
    <html lang="ja">
    <head>
    	<link rev="MADE" href="mailto:pao@f-o-rainbow.com">
    	<link rel="contents" href="../index.html">
    	<meta http-equiv="Content-Style-Type" content="text/css">
    	<meta name="Author" content="Pao">
    	<meta http-equiv="Content-Type" content="text/html; charset=shift-jis">
    	<title>マジック総合ルール(和訳 20220218.0 版)</title>
    	<style type="text/css"><!--
    body{color: #000; background: #fff; margin: 2pt 5pt;}
    h1 {font-size: 200%; font-weight: bold; margin: 0pt; text-align:center;
    	border: 5pt outset #08f; padding: 3pt}
    h2 {font-size: 150%; font-weight: bold; text-decoration: none; margin: 20pt 10% 7pt;
    	text-align: center; border: 2pt outset #080; padding: 3pt;}
    h3 {font-size: 140%; font-weight: bold; text-decoration: none; margin: 15pt 0pt 0pt;}
    .gloss h3 {font-size: 120%; font-weight: normal; text-decoration: none; margin: 7pt 5pt 2pt 0pt;}
    h4 {font-size: 120%; font-weight: normal; text-decoration: none; margin: 7pt 5pt 2pt 5pt;}
    p{margin: 3pt 15pt}
    .subsec{margin: 0pt 10pt}
    .gloss{margin: 5pt 10pt; border: 2pt outset; background: #ffd; padding: 3pt}
    a[id]{padding-top: 30pt; margin-top: -30pt;}
    .office{margin: 5pt 10pt; border: 2pt outset; background: #fdf; padding: 3pt}
    .comment{position:relative; top: -3ex; float: right;}
    a.g{color:#080; text-decoration: none; font-weight: bold;}
    a.r{color:#800; text-decoration: none; font-weight: bold;}
    a.cardlink{color:#008; text-decoration: none}
    --></style>
    <script type="text/javascript" src="./gatherlinkl.js"></script>
    </head>
    <body>
    <h1>マジック総合ルール(和訳 20220218.0 版)</h1>
    <p> このルールに関して、英語を正文とし、ウィザーズ・オブ・ザ・コースト社が著作権を持ちます。</p>
    <p> 英文は <a class="c" href="http://magic.wizards.com/en/gameinfo/gameplay/formats/comprehensiverules">http://magic.wizards.com/en/gameinfo/gameplay/formats/comprehensiverules</a> にあります。</p>
    <p>(このファイルの最後にも詳細は記述してあります)</p>
    <p></p>
    <p> このファイルは、マジック:ザ・ギャザリングのComprehensive Rulesを、Japan Netrep <a href="mailto:pao@f-o-rainbow.com">*ぱお*/米村 薫</a> とジャッジ・コミュニティが 2022-02-19 に翻訳したものです。</p>
    <p></p>
    <p> この翻訳に関して疑問等ありましたら、<a href="mailto:pao@f-o-rainbow.com">*ぱお*/米村 薫</a> までメールでご連絡ください。</p>
    <h2>もくじ</h2>
    <p><a class="r" href="#r0">0.</a> はじめに</p>
    <p class="eng">Introduction</p>
    <p><a class="r" href="#r1">1.</a> ゲームの考え方</p>
    <div class="subsec">
    <p class="eng">1. Game Concepts</p>
    <p><a class="r" href="#r100">100.</a> 原則</p>
    <p class="eng">100. General</p>

    Hm... it definitely doesn't look like it does on the page I have. I might still be copying the wrong part, or not enough? ><;

  • I see Kelly had an idea using regular expressions (although he left plenty of space for you to fill the gaps ;-)).  That might be your best approach in reality, as there isn't enough structure to the file to be able to use the Multilingual XML filetype... although I might have a play when I get a chance.
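
    For what it's worth, here is a rough Python sketch of what a regular-expression approach could look like. It is not Kelly's actual suggestion, just one possible reading of it. It assumes, based only on the sample posted above, that every English line is a <p class="eng"> and that its Japanese counterpart is the nearest plain <p> before it (sometimes with a <div> in between). The function names pair_segments and write_tsv are made up for illustration, and the whole thing would need checking against the full file.

```python
import csv
import re
from html import unescape

# Matches one <p ...>...</p> element; \b stops it from matching tags like <pre>.
P_RE = re.compile(r"<p\b(?P<attrs>[^>]*)>(?P<body>.*?)</p>", re.DOTALL | re.IGNORECASE)
# Matches any inline tag (e.g. the <a class="r"> links) so it can be stripped.
TAG_RE = re.compile(r"<[^>]+>")
# Assumption from the sample: English paragraphs carry class="eng".
ENG_RE = re.compile(r'class\s*=\s*"eng"', re.IGNORECASE)


def clean(fragment):
    """Strip inline tags, decode HTML entities, and trim whitespace."""
    return unescape(TAG_RE.sub("", fragment)).strip()


def pair_segments(html_text):
    """Return (english, japanese) tuples.

    Each <p class="eng"> is paired with the nearest preceding plain <p>.
    Plain paragraphs that are never followed by an English one are dropped.
    """
    pairs = []
    pending_ja = None
    for match in P_RE.finditer(html_text):
        text = clean(match.group("body"))
        if ENG_RE.search(match.group("attrs")):
            if pending_ja and text:
                pairs.append((text, pending_ja))
            pending_ja = None
        else:
            # Empty paragraphs such as <p></p> reset the pending segment.
            pending_ja = text or None
    return pairs


def write_tsv(pairs, path):
    """Write the pairs as UTF-8 tab-separated text, one pair per line."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(pairs)


# Hypothetical usage; the file names are placeholders, and the encoding is
# taken from the charset=shift-jis meta tag in the posted header:
#
#   with open("rules.html", encoding="shift_jis") as handle:
#       pairs = pair_segments(handle.read())
#   write_tsv(pairs, "pairs.tsv")
```

    The tab-separated output is only an intermediate step. It would still have to be converted into whatever format your TM tool can import, and it is worth spot-checking a few hundred pairs first, since any line that breaks the Japanese-then-English pattern will be silently skipped or mismatched.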

    Paul Filkin | RWS Group
