Introduction
Post-Edit Compare is a tool designed to report translation modifications during the post-edit phases of a translation workflow.
The tool works by comparing two versions of the same SDLXLIFF file (before and after changes were applied); the files with changes are highlighted and all modifications are included in a detailed comparison report.
The comparison report is formatted to make it easy to see what changed from one version of the file to the other, and includes a full breakdown of the modifications and the related cost analysis.
Post-Edit Versions is fully integrated with SDL Trados Studio 2014, represented as a 'view' from the application interface. It includes functionality that allows users to create and maintain versions of the SDL project that can be later passed to Post-Edit Compare to generate the comparison reports.
Key features
- Quickly identify and keep track of modified files by comparing the before and after folders.
- Create comparison reports that clearly identify the changes applied to the translations (and segment properties) of the files in a translation workflow.
- Report an overview and full breakdown of the cost analysis related to the translation modifications.
- Customize how the differences are highlighted in the comparison reports.
- Work with SDLXLIFF files directly through integration with the SDL Trados Studio APIs.
- Add a level of project version control with the 'Post-Edit Versions' plugin, which is fully integrated with SDL Trados Studio 2014.
- Maintain comparison reports as a permanent record of the changes applied during each stage of the translation workflow.
Technical Requirements
The following is a list of the minimum technical requirements:
- SDL Trados Studio 2009/2011/2014 CU2
- Windows XP, Vista, 7, 8
- Microsoft .NET Framework version 3.5 SP1+
Support
Post-Edit Compare was originally developed by Patrick Hartnett and is now maintained by the SDL AppStore Team. Please post support questions, feedback and suggestions for future releases in the AppStore Forum.
Related Articles
The article 'Solving the Post Edit puzzle', written by Paul Filkin of SDL, explains in some detail the considerations to take into account when analyzing post-edit modifications and the benefits of using Post-Edit Compare to automate that analysis.
Getting Started
Follow these simple steps to compare two folder structures and display the results in the Folder Viewer, then create a Comparison Report from the selected files.
- Choose the folders to compare by selecting 'Browse folder', or simply drag and drop the folders directly onto the viewer from Windows Explorer.
- Select / activate a filter (e.g. *.sdlxliff) to limit the list of files presented in the viewer.
- Select 'Compare Folders' to start the folder comparison.
- Choose the files you want included in the report and select 'Create Comparison Report'.
Folder Viewer
The 'Folder Viewer' window allows you to browse and compare two folder structures, typically containing the files before and after they were modified. The folders are aligned side by side and displayed much as you would expect when browsing for files in Windows Explorer. The viewer lets you perform some basic file management operations and apply filters to limit the folders and files presented for comparison.
Understanding the Folder Viewer
Folders, sub-folders and files are displayed in rows. Files and sub-folders with the same name (relative to the Base Folder) are automatically aligned and share the same row; files that have no matching file on the opposite side are treated as Orphan Files.
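The row alignment described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names `relative_files`, `align_rows` and `orphans` are hypothetical, and the sketch assumes that files are matched purely by their path relative to each base folder, with case-sensitive names.

```python
from pathlib import Path


def relative_files(base):
    """Return the set of file paths found under base, relative to base."""
    base = Path(base)
    return {p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()}


def align_rows(left_files, right_files):
    """Pair relative paths row by row; None marks the side where a file is missing."""
    left_files, right_files = set(left_files), set(right_files)
    rows = []
    for rel in sorted(left_files | right_files):
        rows.append((rel if rel in left_files else None,
                     rel if rel in right_files else None))
    return rows


def orphans(rows):
    """Return the files that have no matching file on the opposite side."""
    return [left or right for left, right in rows if left is None or right is None]
```

For two real folders, `align_rows(relative_files(left_base), relative_files(right_base))` would give one row per relative path, which roughly mirrors the side-by-side layout of the viewer.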
The folders are color coded to provide information regarding their contents. In the following example, you can see that orphan files exist in both folders (and/or sub-folders) but files that are newer exist only on the right side.
Note: The contents of the sub-folders can be expanded or collapsed by right-clicking on the folder and selecting the command Expand all subfolders or Collapse all subfolders (see Viewer Actions for more information).
The files are color coded to provide information regarding their comparison status and file state. The Comparison Status indicates whether the file is equal, different or an orphan, whereas the File State further distinguishes the type of difference (file size and/or modified date is older or newer). In the following example, you can see the different uses of both the Comparison Status and the File State (see Viewer Styles for more information).
Important: You should not see the Folder Viewer as a substitute for Windows Explorer (or for your default folder/file explorer).
When comparing the base folders (left vs right), the tool searches, compares and attempts to align all folders, sub-folders and files, so the comparison can take several seconds depending on the size and contents of the folder structures. For example, although it is possible to compare your entire C: drive, it is not recommended; you would have time for a coffee break before the comparison completed.
Viewer Actions
The 'Folder Viewer' allows you to perform some of the basic file management operations that you would normally expect from Windows Explorer (i.e. Copy, Move, Delete files) along with the commands that are directly linked to the viewer (i.e. Expand All, Browse Folder etc...).
The following is a list of the commands available from the 'Actions' and 'Edit' menus:
Compare Folders
Starts the comparison process, comparing the selected base folders
Create Comparison Report
Launches the wizard for creating a comparison report
Copy Files
Copies the selected files to the target folder keeping the relative folder structure
Copy to Folder
Copies the selected files to a target folder specified by the user, with the option to keep or ignore the relative folder structure
Move Files
Moves the selected files to the target folder keeping the relative folder structure
Delete Files
Deletes the selected files from the target folders; a confirmation dialog is shown before deletion
Rename File
Renames the selected file
Expand All
Expands all folders in the viewer
Collapse All
Collapses all folders in the viewer
Expand Current Folder
Expands all folders under the currently selected folder
Collapse Current Folder
Collapses all folders under the currently selected folder
Base Folder
Includes all folders & files directly under the base folder in the comparison process
Load Folder
Loads the updated folder path and runs the folder comparison (activated when the base folder path information changes)
Browse Folder
Browses / selects the base folder to compare
Set Base Folder
Browses / selects a sub-folder within the viewer browser control; the selected folder will become the base folder for comparison
Up One Level
Sets the base folder up one level
Both Folders Up One Level
Sets the base folders on both the left and right sides up one level
Viewer Styles
Folder State
The folders are color coded to provide information regarding their contents. Color coding at the folder level is most useful when comparing larger folder structures with multiple sub-folders: the folder hierarchy makes it clear whether files with differences (and/or orphan files) exist in the lower sub-folder levels, without needing to expand and check each one.
Equal files
The current folder and all sub-folders contain files that are equal
Older files
The current folder and/or sub-folders contain modified files that are older
Older files with orphans
The current folder and/or sub-folders contain modified files that are older and orphan files
Newer files
The current folder and/or sub-folders contain modified files that are newer
Newer files with orphans
The current folder and/or sub-folders contain modified files that are newer and orphan files
Contains orphan files
The current folder and/or sub-folders contain orphan files
File State
The files are color coded to provide information regarding their comparison and file states. The File State highlights the type of difference (file size and/or modified date is older or newer).
Equal files
Files that are equal (also used to hint that the file has an older date when files are different)
Newer files - mismatches
The file size is different and the modified date is newer
Newer files - date
The file size is identical but the modified date has changed
Orphan files
Files that have no matching file on the opposite side
Note: The file name & properties also inherit the File State color coding to ensure that it is easier to distinguish the difference in the viewer, as in the following example:
Comparison Status
Equal files
Files that are equal
Mismatches
Files that are different
Mismatches - date
Files whose modified date is different (file size is identical)
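As a rough illustration of how the Comparison Status and File State relate, the following Python sketch classifies a pair of aligned files. It assumes, based only on the descriptions above, that file size and modified date are the sole inputs; the `FileInfo` type and both function names are hypothetical, and the actual tool may apply additional checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    size: int        # file size in bytes
    modified: float  # last modified time, seconds since the epoch


def comparison_status(left: Optional[FileInfo], right: Optional[FileInfo]) -> str:
    """Classify an aligned row as orphan, mismatch, mismatch (date) or equal."""
    if left is None or right is None:
        return "orphan"
    if left.size != right.size:
        return "mismatch"
    if left.modified != right.modified:
        return "mismatch (date)"
    return "equal"


def newer_side(left: FileInfo, right: FileInfo) -> Optional[str]:
    """Return which side holds the newer file, or None if the dates match."""
    if left.modified == right.modified:
        return None
    return "left" if left.modified > right.modified else "right"
```

For example, two files of identical size whose dates differ would be reported as 'mismatch (date)', with `newer_side` indicating which panel would show the 'Newer files - date' state.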
Display & File Filters
To limit the amount of information in the viewer you can choose from the list of Display Filters or add a customized File Filter. The Display Filters allow you to control which files are displayed given their File State (i.e. Equal, Newer, Orphan).
Example: if you don't want to view orphan files in the panel on the left side, select the option 'Hide Orphan files - left side' from the Display Filters toolbar.
Display Filters
Files that are equal
Show / Hide files that are equal
Orphan files (left side)
Show / Hide the files that exist only on the left side
Orphan files (right side)
Show / Hide the files that exist only on the right side
Mismatches (left side newer)
Show / Hide files that were modified with a date that is newer on the left side
Mismatches (right side newer)
Show / Hide files that were modified with a date that is newer on the right side
Hide empty folders
Show / Hide empty folders
File Filter
The File Filter is fully compatible with .NET Regular Expression rules, and it also accepts the wildcard expressions (e.g. *.sdlxliff) that you would normally specify in other file explorers.
You can save a list of filters in the Settings area and then choose from that list in the Folder Viewer. The File Filter can be activated or deactivated by clicking the Filter toggle button.
The File Filter provides several filtering methods so you can choose the one best suited to the files required in the viewer (i.e. File Name, Date, File Attributes).
It is possible to specify simple file filters such as (*.sdlxliff; *.txt; *.log), as you would with Windows Explorer, to return a list of files based on the file extension; however, keep in mind that the File Filter can return a more precise list when the requirements call for it.
Note: If you are not familiar with .NET Regular Expressions, search the internet for 'Regular Expression Language - Quick Reference' to find examples; the expression language is not difficult to understand, and writing your first working Regular Expression can even be fun.
Important: To match a literal period character (. or \u002E), you must precede it with the escape character (\.); this is also true for all other expression language characters (e.g. *()[] etc...): precede them with a backslash \ when matching the literal character.
Examples
Expression
Description
(.*\.sdlxliff|.*\.txt)$
Matches all files with the extensions .sdlxliff & .txt
Note: you could also use the Windows Explorer style expression to produce the same result: *.sdlxliff; *.txt;
The $ character anchors the match to the end of the line
^
The match must start at the beginning of the string or line.
$
The match must occur at the end of the string or before \n at the end of the line or string.
*
Matches the previous element zero or more times.
+
Matches the previous element one or more times.
?
Matches the previous element zero or one time.
.
Wildcard: Matches any single character except \n.
To match a literal period character (. or \u002E), you must precede it with the escape character (\.).
[az]
Matches one of the characters within the group (i.e. equal to a or z)
[a-z]
Matches any character within a range, from a to z.
[a-z0-9]
Matches any character within the range of a to z and 0 to 9
[^az]
Matches any character not in the group (i.e. not equal to a or z)
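The effect of the example expression, and of escaping the period, can be tried out with a short Python sketch. Note the assumptions: this uses Python's `re` module rather than the .NET engine used by the tool (the constructs in these patterns behave the same way in both), and the helper `filter_files` and the file names are invented for illustration.

```python
import re


def filter_files(names, expression):
    """Keep the names in which the regular expression finds a match."""
    pattern = re.compile(expression)
    return [name for name in names if pattern.search(name)]


names = ["chapter1.sdlxliff", "notes.txt", "build.log", "chapter1.sdlxliff.bak"]

# The example expression: keeps names ending in .sdlxliff or .txt
matched = filter_files(names, r"(.*\.sdlxliff|.*\.txt)$")

# An unescaped period is a wildcard, so it also matches the underscore
unescaped = filter_files(["a.txt", "a_txt"], r"a.txt$")

# An escaped period matches only the literal character
escaped = filter_files(["a.txt", "a_txt"], r"a\.txt$")
```

Here `matched` contains only the first two names, because the `$` anchor rejects `chapter1.sdlxliff.bak`; `unescaped` keeps both test names, whereas `escaped` keeps only `a.txt`.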
Comparison Projects
The Projects window enables you to quickly navigate to and compare the 'base folders' containing the project files whose changes you are monitoring.
The following information is maintained in the Comparison Project:
Name
The project name
Created
The date that the project was created
Base Folders
The base folders (left <> right) that you are comparing
File Name Alignment
The File Name Alignment area provides functionality to match and align the file names that have changed or been modified during the course of the translation workflow.
On loading the window, the tool automatically attempts to align all folders and files, first locating the 100% matches and then fuzzy matching the remaining files, respecting the fuzzy match limit % specified in the interface. A percentage reflecting the similarity of the file names is calculated and associated with each linked file.
Use the toolbar options to either link all fuzzy matched files or manually align file names: choose a single file on one side, select the button 'Link selected file to...' and then choose the corresponding file on the opposite side; possible matches to the file name are highlighted when you hover over them.
Use the display filters (exact, fuzzy) and the file filter to limit the amount of information presented in the window.
Note: Only files with the same directory path relative to the base folders can be linked.
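The two-pass alignment (exact matches first, then fuzzy matches above a limit) can be sketched in Python as follows. This is an illustration only: the documentation does not state which similarity measure the tool uses, so `difflib.SequenceMatcher` stands in for it, and the names `similarity` and `align_names`, the greedy matching strategy and the default limit of 75 are all assumptions. The function works on the file names of a single directory, in line with the note above.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity of two file names as a whole percentage (stand-in measure)."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def align_names(left_names, right_names, fuzzy_limit=75):
    """Link identical names first, then the best fuzzy match at or above the limit."""
    links = []
    left_rest = []
    right_rest = list(right_names)

    # Pass 1: 100% matches
    for name in left_names:
        if name in right_rest:
            links.append((name, name, 100))
            right_rest.remove(name)
        else:
            left_rest.append(name)

    # Pass 2: greedy fuzzy matching of the remaining names
    for name in left_rest:
        best = max(right_rest, key=lambda other: similarity(name, other), default=None)
        if best is not None and similarity(name, best) >= fuzzy_limit:
            links.append((name, best, similarity(name, best)))
            right_rest.remove(best)
    return links
```

With `["report.sdlxliff", "manual_v1.sdlxliff"]` on the left and `["report.sdlxliff", "manual_v2.sdlxliff", "other.txt"]` on the right, the first file is linked exactly, the renamed manual is linked as a fuzzy match, and `other.txt` remains unlinked.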
Report Viewer
The 'Report Viewer' window displays the comparison results of the selected files, taking into account the options chosen by the user.
The user can choose some or all of the fields (representing the properties in the files) to display in the report. A full modifications analysis (along with the cost analysis) is included in the header of each compared file, and the accumulated totals are located in the report header.
Understanding the Report Viewer
The Report Wizard provides the user with various options to control which properties are displayed in the comparison report (including the association of the Price Group).
The comparison report is designed to present the relevant File Statistics associated with each file, including the detailed comparison modifications for each segment.
The overall Total Statistics are included in the header of the comparison report, including graphical charts that provide a visual display of the statistical data.
Note: it is important to remember that the charts are generated by the Google charts API, so an internet connection is required; however, in the absence of an internet connection the report can still be generated by deselecting the option 'Show google charts' in either the 'Report Wizard' or the general settings area.
Translation Modifications
The overall 'Translation Modifications' statistics are located in the report header and display the total modifications made to the segments, grouped by the original segment 'Match' status (e.g. Perfect Match, Context Match etc...). For each group, they show the total number of segments and the number of segments that were modified, along with a subtotal of the words, characters and tags that were added/removed.
Note: this statistical block can also be found in the 'File Statistics' for each of the files in the comparison report, without the graphical chart.
Post-Edit Modifications Analysis
The overall 'Post-Edit Modifications Analysis' statistics are located in the report header and display the total weight and associated cost of the modifications applied to all the files. See Post-Edit Modifications Analysis for information about the algorithm used to calculate the PEM %.
Note: this statistical block can also be found in the 'File Statistics' for each of the files in the comparison report, without the graphical chart and including the individual prices from the associated Price Group.
Confirmation Statistics
The overall 'Confirmation Statistics' block is located in the report header and displays the total number of changes made to the segment statuses (e.g. Not Translated, Draft, Translated etc...).
File Statistics
The 'File Statistics' block accounts for all changes that occurred in an individual file; it is located at the header of each file in the comparison report. The layout closely resembles that of the 'Total Statistics', but represents the individual file.
It displays the total modifications made to each of the segments within the file along with the Post-Edit Modifications Analysis.
The 'Translation Modifications' are grouped by the original segment 'Match' status (e.g. Perfect Match, Context Match etc...). For each group, the block shows the total number of segments and the number of segments that were modified, along with a subtotal of the words, characters and tags that were added/removed.
The 'Post-Edit Modifications Analysis' block calculates the cost of translation taking into consideration the Price Group and PEM % (i.e. the weight of changes made to each translation during the post-edit phase represented as a percentage).
Post-Edit Modifications Analysis
The 'Post-Edit Modifications Analysis' (in combination with the Price Group) calculates the cost of all modifications made to existing translations in the file. The segments are grouped using the standard SDL Trados Studio analysis bands (i.e. 100%, 99-95%, 94-85% etc...) taking into consideration that the PEM % quantifies the amount of modifications made to each translation represented as a percentage.
Calculating the PEM % (Post-Edit Modifications percentage)
The Post-Edit Modifications percentage is, in principle, calculated using an algorithm based on the 'Damerau–Levenshtein' edit distance: the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters.
Given the 'Damerau–Levenshtein' edit distance, we can then calculate the PEM % (i.e. the weight of changes made to each translation during the post-edit phase represented as a percentage).
A single character, tag or placeable represents the smallest single unit when calculating the distance.
Note: tags and placeables are each counted as a single unit so that changes made to them are reflected in the result.
PEM %
d = the edit distance between the original segment and the updated segment
n = the maximum number of units (characters, tags, placeables) in either the original or the updated segment
w = (d / n) * 100, the weight of changes represented as a percentage
p = 100 - w, the PEM % (post-edit modifications percentage)
Example
In the following example you will see that the only real difference between the original and updated segments is the addition of one letter 's', although the complete word has been highlighted as changed (including the relevant characters).
Original: Utilisation de masques
Updated: Utilisation des masques
d = 1 (i.e. the addition of one letter 's')
n = 23 (i.e. the number of characters in the updated segment, which is the longer of the two)
w = 4.35 { (1 / 23) * 100 }
p = 95.65 { 100 - 4.35 }
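The calculation can be reproduced with a small Python sketch. It is an approximation under stated assumptions rather than the tool's own code: it implements the restricted 'optimal string alignment' variant of the Damerau–Levenshtein distance (the documentation does not say which variant is used), and the function names `edit_distance` and `pem_percent` are hypothetical. The inputs may be plain strings, or lists of units if tags and placeables are to be counted as single items.

```python
def edit_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions and transpositions of two
    adjacent units. a and b may be strings or lists of units.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all units of a
    for j in range(cols):
        d[0][j] = j  # insert all units of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def pem_percent(original, updated):
    """Return p = 100 - (d / n) * 100, where n is the longer segment's length."""
    n = max(len(original), len(updated))
    if n == 0:
        return 100.0  # two empty segments are treated as identical
    w = edit_distance(original, updated) / n * 100
    return 100 - w


p = round(pem_percent("Utilisation de masques", "Utilisation des masques"), 2)
```

For the example segments, the distance is 1 and the longer segment has 23 characters, so `p` evaluates to 95.65.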
Knowing the PEM %, we can allocate the segments to their respective analysis band categories (i.e. 100%, 99-95% etc...) and calculate the cost of translation with the aid of the Price Groups.
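A minimal Python sketch of the band allocation might look like this. Only the first three bands (100%, 99-95%, 94-85%) are named in this documentation; the lower bands shown here follow the usual SDL Trados Studio defaults and are an assumption, as is the treatment of fractional values (a PEM % of 94.9 is placed in the 94-85% band). The names `BANDS` and `band_for` are hypothetical.

```python
# (lower bound, label) pairs, checked from the highest band downwards.
# Bands below 94-85% are assumed defaults, not taken from this documentation.
BANDS = [
    (100, "100%"),
    (95, "99-95%"),
    (85, "94-85%"),
    (75, "84-75%"),
    (50, "74-50%"),
]


def band_for(pem):
    """Return the analysis band label for a PEM % value."""
    for lower_bound, label in BANDS:
        if pem >= lower_bound:
            return label
    return "New"
```

A segment with a PEM % of 95.65 would therefore fall into the 99-95% band.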
Report Wizard
The 'Report Wizard' creates the comparison report based on the options chosen by the user.
Choose the files you want included in the report and select 'Create Comparison Report' from either the 'Actions' menu or the right-click context menu.
Report Actions - Step 1 of 2
Note: Select the action and optionally choose the 'Price Group'
Actions
Compare selected files
Compare the selected files
Optional: extend selection
If single files were selected on the left and/or right side in the Folder Viewer, this option extends the selection to compare each against its matching file on the opposite side; otherwise, the selected file on the left is matched against the selected file on the right.
Compare all files
Compare all files visible from the Folder Viewer
Optional: include all subfolders
This option will attempt to expand all subfolders and include the files matching the filter criteria in the comparison report
Price Groups
Price Group
Select the Price Group to be used for calculating the cost in the Post-Edit Modifications Analysis area
Note: you can set up the Price Groups from the settings window (i.e. Tools > Settings > Price Groups)
Report Options - Step 2 of 2
Note: Changing the settings from 'Report Wizard' will override the general settings for the current report.
File Summary
Show files with no translation differences
This option allows you to filter out files that are identical
Show google charts in the file summary area
The Google charts API requires an internet connection, so if you are working offline, deselect this option
Calculate the 'Post-Edit Modifications Analysis' based on the segment rows that are filtered in the report
If this option is not checked, all translation modifications are considered in the Post-Edit Modifications Analysis area
Segment Columns
Multiple options that allow the user to choose which columns (segment properties) are shown in the comparison report.
Example: source segment column, target segment column, segment status column etc...
Segment Rows
Show/Hide the named rows
Multiple options that allow the user to filter the segments that are visible in the comparison report, based on conditions of the segment.
Example: show/hide segments with modifications, comments etc...
Price Groups
A 'Price Group' is an accumulation of prices, where each price is associated to an analysis band category (i.e. Perfect Match, Context Match, Repetition etc...) for a specific language set (i.e. en-US->it-IT etc...); this combined with the PEM % permits us to calculate the cost of all translation modifications that occur during the Post-Edit phase of a project.
It is possible to maintain multiple price groups that represent different prices for different uses. The 'Price Group' is selected from the Report Wizard when creating the Comparison Report.
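To make the idea of a Price Group concrete, here is a hedged Python sketch of how a cost could be derived from it. Everything specific in it is hypothetical: the data layout, the band labels, the prices, the `modification_cost` function and, in particular, the assumption that each price is a per-word rate multiplied by the word count in that band. The documentation does not describe how the tool stores Price Groups or computes totals.

```python
from decimal import Decimal

# Hypothetical Price Group: one price per analysis band for each language pair.
# All values are invented for illustration.
price_group = {
    ("en-US", "it-IT"): {
        "100%": Decimal("0.02"),
        "99-95%": Decimal("0.04"),
        "94-85%": Decimal("0.06"),
    },
}


def modification_cost(group, language_pair, words_per_band):
    """Sum price * word count over the bands (assumes per-word prices)."""
    rates = group[language_pair]
    total = Decimal("0")
    for band, words in words_per_band.items():
        total += rates[band] * words
    return total


total = modification_cost(price_group, ("en-US", "it-IT"), {"100%": 100, "99-95%": 50})
```

With these invented prices, 100 words in the 100% band and 50 words in the 99-95% band give a total of 4.00. `Decimal` is used instead of floating point so that currency amounts are not affected by binary rounding.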