How to handle a review of an Excel file with embedded content

Hi

Can anyone give me some advice on the best way to handle the following scenario?

I have received a bunch of Excel files with two columns each, one containing source and the other the target. The assignment is to review the already translated segments and to translate the not yet translated ones.

This alone would be easy: I guess the best way to handle it would be to use the "Bilingual Excel" file type.

But in my case, there is an added difficulty: the text in the Excel documents contains HTML-like tags such as "&bold;" and "&nobold;". These are not syntactically correct HTML tags, but that does not matter. There are only a few of them, and I can easily handle them by defining the tag pair in the "Embedded content" feature of the Excel file type. My problem is that the Excel file type lets me handle these tags but gives me only the source, not source and target, so I can't see which segments are already translated.
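To make the tag-pair idea concrete, here is a minimal Python sketch of how the two tags could be told apart from the translatable text. It is only an illustration: the regular expression and the function name `split_tags` are assumptions made for this example, not the actual "Embedded content" settings in Studio.

```python
import re

# Assumption: the only tags are "&bold;" (opening) and "&nobold;" (closing),
# as described in the post. The pattern is illustrative, not a Studio setting.
TAG_PATTERN = re.compile(r"&(?:no)?bold;")


def split_tags(text):
    """Split text into ("tag", value) and ("text", value) pieces, in order."""
    pieces = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        if match.start() > pos:
            # Plain text between the previous tag (or start) and this tag.
            pieces.append(("text", text[pos:match.start()]))
        pieces.append(("tag", match.group()))
        pos = match.end()
    if pos < len(text):
        # Trailing plain text after the last tag.
        pieces.append(("text", text[pos:]))
    return pieces


print(split_tags("Press &bold;Start&nobold; to begin."))
```

The sample sentence is invented; real cell content may of course contain other entities that would need their own patterns.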

On the other hand, if I use the "Bilingual Excel" file type, I do get source and target, but I don't have the capability to handle the tags, because that file type does not have an "Embedded content" functionality.

Any hint on how to best tackle this project is welcome.

Walter

  • Hello Walter,

    Maybe something like this might work:

    1. Take a copy of the spreadsheet
    2. Sort on the target column
    3. Delete all rows that have source but no target so you are left with only rows containing both source and target
    4. Split them into two files, one for source and one for target
    5. Align them
    6. Now take a copy of the original Excel file and copy the source into the target column.
    7. Hide the source column and translate with your custom filetype using your TM to get the stuff already done

    A bit long-winded, but I think it would work. I haven't tested this theory yet though ;-)
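Steps 3 and 4 of the list above (keep only rows that have both source and target, then split them into two lists for alignment) could be sketched in Python roughly as follows. This is a hedged illustration, not a tested workflow: the function name `split_translated` and the sample rows are made up, and reading the cells out of the real workbooks would need a spreadsheet library, which is left out here.

```python
def split_translated(rows):
    """Return (sources, targets) for rows where both cells are non-empty.

    `rows` is a list of (source, target) pairs. How they are read from the
    Excel files is outside this sketch.
    """
    sources, targets = [], []
    for source, target in rows:
        # Treat None, "" and whitespace-only cells as "no content".
        if source and source.strip() and target and target.strip():
            sources.append(source)
            targets.append(target)
    return sources, targets


# Hypothetical sample data, only to show the shape of the input.
rows = [
    ("Hello", "Hallo"),
    ("Goodbye", ""),         # not yet translated, dropped
    ("Thank you", "Danke"),
    ("Please", None),        # empty cell, dropped
]
sources, targets = split_translated(rows)
print(sources)
print(targets)
```

With 79 files, the same function could be applied per file and the two lists written out as the source and target documents for alignment.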

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi Paul

    Thanks for the creative suggestion. I also thought about creating a TM as a first step, and I could even do this easily without alignment using the Bilingual Excel file type. But this does not work because of the "bold/nobold" tags: I would then have them as plain text in the TM.
    I guess I'll have the same problem if I do the alignment the way you suggest. Or is there a way to tag these "bold/nobold" entities before doing the alignment?
    Or does the new alignment process take into account filetype settings, which would allow me to do this tagging?

    Let me also point out that it is not one Excel file, but 79 of them in a folder.

    Walter

  • Walter said:
    Or does the new alignment process take into account filetype settings, which would allow me to do this tagging?

    I haven't had time to test this yet, but I'm assuming I would be able to do that. So I would create a dummy project with the custom Excel filetype and then make sure that was active prior to aligning. It's not the most flexible of solutions for alignment if you want specific filetype settings, so I'm hoping this would work.

    I think it's supposed to work with the default template, so you could customise that, but in previous versions I found it was using the active project and not the default template... things do change ;-)

    If you try it before me let me know... if it doesn't do either of these things I'll report a bug.

    Paul Filkin | RWS Group


  • Hi Paul

    I may test this.

    In the meantime, I tested a different approach using the Bilingual Excel file type, because this allows me to create a TM from all the segments very easily and quickly. I loaded all 79 Excel files into the project (as Bilingual Excel), then ran the SDLXLIFF converter and replaced all occurrences of "&bold;" with "$B " and "&nobold;" with " $N". This is obviously not as comfortable as protected tags, but at least it should give better matching than the full plain-text version. After this replacement, I ran the batch task "Update main TM", set to also store untranslated segments (this is a special temporary TM).
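For illustration, the replacement described in this post amounts to the following plain string substitution. This is a minimal Python sketch, not what the converter does internally; the function name `to_placeholders` and the sample sentence are assumptions.

```python
# Mapping taken from the post: "&bold;" becomes "$B " and "&nobold;" becomes " $N".
REPLACEMENTS = [("&bold;", "$B "), ("&nobold;", " $N")]


def to_placeholders(segment):
    """Replace the pseudo-HTML tags with short plain-text placeholders."""
    for old, new in REPLACEMENTS:
        segment = segment.replace(old, new)
    return segment


print(to_placeholders("Press &bold;Start&nobold; now"))
```

The order of the two replacements does not matter here, because "&nobold;" does not contain the string "&bold;".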

    Doing this, I discovered something really strange that reminds me of a recent post on the community (I think it was yesterday or the day before) from someone who asked about keeping spaces at the end of a segment. What happens in my case is the following:

    - All segments in my document start with a space followed by an apostrophe, both in source and target.

    - As soon as I put my cursor in a target segment and Studio does a lookup and finds a match, it inserts the text from the TM and the leading space disappears.

    This is how it looks:

    You can see that the leading space in segment 5 has disappeared.

    The reason is that the "Update main TM" task dropped all leading spaces from the segments it stored in the TM. 
    Any idea whether this is intended behaviour and whether there is a way to control this?

    I agree that it is unusual to start a sentence with a space, but in this special case it is quite annoying (although I can correct this with a global search and replace).
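The correction mentioned here could look roughly like this in Python. It is a sketch under assumptions: the function name `restore_leading_space` and the sample segments are invented, and it only models the logic of the fix (re-adding a leading space when the source has one and the target lost it), not how a search and replace would be run inside Studio.

```python
def restore_leading_space(source, target):
    """Re-add a leading space to target if source has one and target lost it."""
    if source.startswith(" ") and target and not target.startswith(" "):
        return " " + target
    # Leave empty targets and already-correct targets untouched.
    return target


# Hypothetical segments: both originally start with a space and an apostrophe.
print(repr(restore_leading_space(" 'Open file'", "'Datei laden'")))
```

Untranslated (empty) targets are deliberately left empty so that they are not mistaken for translated segments.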

    Walter