Unexpected behaviour of CSV file type in Trados Studio 2024

Hi RWS team,

I am using the CSV filetype in Studio 2024 to translate a comma-separated file exported by Premiere Pro. When I select the checkbox "Exclude: First line is column headings" in the "Format" tab of the filetype settings, I can't preview the file anymore. I get a warning saying "Cannot find file type setting for the preview file". When I deselect that option, I can preview the file as expected. Is this intended behaviour or a bug?

Here is a screenshot of my file:

Screenshot of a CSV file with columns for 'Start Time', 'End Time', 'Text', 'Video Track', and 'Layer ID'. Rows contain time-coded text data for video editing.

Thanks, Daniel



[edited by: RWS Community AI at 3:17 PM (GMT 1) on 19 Aug 2024]
  • Hello Daniel,

    I understand you're having trouble with the CSV file type in Trados Studio 2024. The issue you're experiencing seems unusual. When you select the "Exclude: First line is column headings" option, it should not affect your ability to preview the file.

    Here are a few steps you can try to resolve this issue:

    Step 1: Ensure that your Trados Studio 2024 is up to date. You can check for updates within the application.

    Step 2: Try resetting your CSV file type settings to default and then reapply your desired settings.

    Step 3: If the issue persists, you might want to try reinstalling Trados Studio 2024.

    If none of these steps resolve your issue, it might be a bug. In that case, you should report it to the RWS Support Team through the Gateway. They will be able to investigate further and provide a solution.

    Please note that the ability to preview certain file types, like HTML, requires an additional benefit to be added to your account. If you're having trouble previewing other file types, you might need to request this additional benefit.

    I hope this helps! If you have any other questions, feel free to ask.

    Best regards,

    RWS Community AI

  •  

    I had a go with a sample file I made up and don't have this problem (see video).  But perhaps you can:

    • provide a sample file?
    • check if you're doing this in a project template (maybe it's an old one?)

    To be honest, I don't know why this is causing a problem, but I think we’d need more info to validate it.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi  

    Here is a sample file. Yes, it's in a template, but I believe I built it freshly in Studio 2024. (Is there a way to find out from the template file?)

    Daniel4401.Sample.zip


  •  

    I played around a bit more, and when I select "Files not matching formatting requirements", Studio previews the file even with "First line is column headings" selected.

    Also, notice all the soft line breaks (CR):

    Screenshot of a CSV file in a text editor showing columns like 'StartTime', 'EndTime', 'Text', 'VideoTrack', 'LayerID' with example content and soft line breaks (CR).

    When I remove them, it works as expected, whether or not the first line is treated as headers.

    I'm not sure whether a CSV file is allowed to contain soft line breaks, or whether Premiere Pro is taking liberties here.
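    For what it's worth, RFC 4180 does allow line breaks inside a field, provided the field is enclosed in double quotes. A minimal Python sketch, assuming the Text column is quoted (the sample below is made up, not taken from the actual file), shows a standard CSV reader keeping the embedded break inside one field instead of starting a new record. This is Python's csv module, not the Trados Studio parser, so it only illustrates what the format permits.

```python
import csv
import io

# Made-up sample: the second record has a line break inside a quoted field.
sample = 'Text,Layer\r\n"Building\nyour",1\r\n'

# newline="" leaves line endings untouched so the csv module handles them itself.
rows = list(csv.reader(io.StringIO(sample, newline="")))
print(rows)  # the break stays inside the 'Building\nyour' field
```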

    This is how it looks in the editor:

    Partial view of a text editor with a single line of text that reads 'Don't add the top connectors or caps yet!' with a soft line break symbol.

    Daniel

  •  

    Thank you. I actually cannot preview the file at all, with or without the exclude option checked. I believe the problem is that your file has control characters in it that break the format of the file. I see you're using EPP, so if you use "Visualize line breaks" you'll see this:

    Screenshot of a CSV file with columns 'Start Time', 'End Time', 'Text', 'Video Track', 'Layer ID'. Rows contain time codes and text instructions with visible control characters '0D' and '0A'.

    I tried to recreate your file while keeping the control characters consistent, and this worked. You can test it here: Sample_orig_fix_4.zip

    I zipped it to try to avoid any problems in transit. A compare tool makes the difference a little clearer:

    Comparison of two CSV files side by side. The left file has control characters '0D' and '0A' next to the text. The right file shows the same content without visible control characters.

    The file on the left (Sample_orig.csv, your file) shows control characters as 0D (Carriage Return, \r) and 0A (Line Feed, \n). These are the typical end-of-line characters in text files. Their presence indicates that the multiline text fields contain explicit line breaks, which may cause issues.
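    To see exactly which line-ending sequences a file contains without relying on an editor's visualisation, the raw bytes can be counted. A minimal Python sketch; the function name classify_line_endings is hypothetical and the sample bytes are made up to mimic the pattern discussed in this thread:

```python
import re
from collections import Counter


def classify_line_endings(data: bytes) -> Counter:
    """Count line-ending sequences in raw bytes (0x0D = CR, 0x0A = LF)."""
    counts = Counter()
    # Alternatives are tried left to right, so CR CR LF is counted as one
    # sequence instead of a lone CR followed by CRLF.
    for match in re.finditer(rb"\r\r\n|\r\n|\r|\n", data):
        counts[match.group()] += 1
    return counts


# Made-up sample: CR CR LF after each record, a lone CR inside the Text field.
sample = b'"Start Time","Text"\r\r\n"00:00","Building\ryour"\r\r\n'
print(classify_line_endings(sample))

# On a real file, read it in binary mode so no newline translation happens:
# with open("Sample_orig.csv", "rb") as f:
#     print(classify_line_endings(f.read()))
```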

    In the file on the right (Sample_orig_fix_4.csv), which I edited, I tried to handle these control characters consistently, with 0A always following 0D, which is a Windows-style line break (\r\n).

    I'm going to guess that the file originated on a Mac or maybe a Linux system, and that handling it in a different environment at some point in its life broke it. Or maybe it was just created this odd way.

    Anyway... I'm not sure whether this is a bug or the result of a file problem that only an editor dedicated to working with CSV files could be expected to avoid.  What do you think?

    Paul Filkin | RWS Group

  • Hi  

    I did quite a bit of testing on this, as I expect we'll see this kind of file more often in the future. Premiere Pro, the program that exported it, originally comes from the Mac, although in my case the file was exported from Premiere Pro running on a PC. Still, that may be why there are Mac-style line breaks. (I have no idea how Adobe codes its programs, but there are other ways this may have slipped in: https://stackoverflow.com/questions/3348460/csv-file-written-with-python-has-blank-lines-between-each-row)

    What throws the Trados Studio parser is not the CR used as a line break within the fields, but the CRCRLF at the end of each record ("line"). CRLFCRLF causes the same issue.

    CRCRLF and CRLFCRLF effectively insert a blank line between records. The official, but not very extensive, definition of CSV (RFC 4180) seems to rule out empty lines when it says: "Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file." So my reading is that such files may not count as valid CSV. The question, then, is rather why they are accepted when the first line is NOT treated as headers.
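    The "same number of fields" rule can be checked mechanically. A minimal sketch using Python's csv module as a stand-in (it is not the parser Trados Studio uses), with a hypothetical function name: when a lone CR, a lone LF and CRLF all count as line endings, each CRCRLF terminator yields an extra record with zero fields, which is the blank line described above.

```python
import csv
import io


def field_count_mismatches(text: str) -> list[tuple[int, int]]:
    """Return (record number, field count) for records whose field count
    differs from that of the first record."""
    # newline="" enables universal newlines without translating them, so a
    # lone CR, a lone LF and CRLF each end a line outside quoted fields.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    expected = len(rows[0]) if rows else 0
    return [(number, len(row))
            for number, row in enumerate(rows, start=1)
            if len(row) != expected]


# Made-up samples: identical content, different record terminators.
crcrlf = '"Start Time","Text"\r\r\n"00:00","Building\ryour"\r\r\n'
crlf = '"Start Time","Text"\r\n"00:00","Building\ryour"\r\n'
print(field_count_mismatches(crcrlf))  # blank records show up with 0 fields
print(field_count_mismatches(crlf))
```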

    This is all a grey area. For the sake of stability, the filetype could ignore blank lines (or offer an option for that in the settings), although one could argue that "Process files not matching formatting requirements" already is that option.
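    Until the parser handles this, one possible pre-processing workaround is to re-write the file without the blank records, so that every record ends in a single CRLF. This is only a sketch with a hypothetical function name, not something Trados Studio provides. Going through a CSV reader rather than a plain search-and-replace leaves line breaks inside quoted fields alone; quoting every field is an assumption about the preferred output style.

```python
import csv
import io


def drop_blank_records(text: str) -> str:
    """Re-write CSV text without empty records, ending each record with CRLF."""
    rows = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    # QUOTE_ALL quotes every field; lineterminator sets the record separator.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    # csv.reader yields an empty list for a blank line; skip those.
    writer.writerows(row for row in rows if row)
    return out.getvalue()


crcrlf = '"Start Time","Text"\r\r\n"00:00","Building\ryour"\r\r\n'
print(repr(drop_blank_records(crcrlf)))
```

    When reading a real file, open it with newline="" and whatever encoding Premiere Pro actually used (not verified here), so the line endings reach the csv module untouched.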

    In the editor, it does matter which kind of line break is used within fields. CRLF is a hard return, and AFAIK should not occur within a segment:

    Screenshot of Trados Studio interface showing a segment with a source text 'Building' followed by a line break and 'your' and a target text 'Gebaeude' followed by a line break and 'dein'. A red arrow points to the line break in the source segment.

    LF alone is the classic "soft line break" on PCs, and I believe it works best here:
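    If LF is indeed the best choice for breaks inside fields, the same reader/writer approach can convert CRLF and lone CR inside fields to LF while keeping CRLF between records. Again a sketch with a hypothetical function name, based on the observation above rather than on any documented Trados Studio requirement:

```python
import csv
import io


def field_breaks_to_lf(text: str) -> str:
    """Replace CRLF and lone CR inside fields with LF; end records with CRLF."""
    rows = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    for row in rows:
        # Replace CRLF first so it becomes a single LF, not two.
        writer.writerow([field.replace("\r\n", "\n").replace("\r", "\n")
                         for field in row])
    return out.getvalue()


# Made-up sample: one field with a CRLF break, one with a lone CR break.
sample = '"Text"\r\n"Building\r\nyour"\r\n"Building\ryour"\r\n'
print(repr(field_breaks_to_lf(sample)))
```

    It only changes breaks inside fields; it does not remove blank records.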

    Screenshot of Trados Studio interface with a segment displaying source text 'Building' followed by a line break and 'your' and target text 'Gebaeude' followed by a line break and 'dein'. No arrows or highlights present.

    Not sure what CR alone does down the line:

    Screenshot of Trados Studio interface with a segment where the source text 'Building' is followed by a line break and 'your' and the target text 'Gebaeude' is followed by a line break and 'dein'. A red arrow indicates the line break in the source segment.

    So I guess this is the glory and the misery of a very open data exchange format.

    Daniel
