While creating and analysing projects using the project automation SDK, I've noticed strange analysis results.
First of all, here's the code which runs the analysis:
private void FormulateAnalysisSettings(IProject project)
{
    // Enable cross-file repetitions and internal fuzzy match leverage for the analysis.
    var settings = project.GetSettings();
    var analyzeSettings = settings.GetSettingsGroup<AnalysisTaskSettings>();
    analyzeSettings.ReportCrossFileRepetitions.Value = true;
    analyzeSettings.ReportInternalFuzzyMatchLeverage.Value = true;
    project.UpdateSettings(settings);
}

private Guid RunFileAnalysis(IProject project)
{
    // Run the 'Analyze Files' task on every target-language file.
    var targetFiles = project.GetTargetLanguageFiles();
    var analyzeTask = project.RunAutomaticTask(targetFiles.GetIds(), AutomaticTaskTemplateIds.AnalyzeFiles);
    // Return the ID of the first report the task produced.
    return analyzeTask.Reports[0].Id;
}
What I've found is that when creating two projects with the same source files, say German > Italian and German > French, the analysis reports can be identical. Once... ok, a coincidence; regularly... something's wrong.
Looking further, I've found that the directory where the project is created contains a sub-directory named 'Reports', which holds the following files (in this case for a DE > IT project):
- Analyze Files de-CH_it-CH(1).xml
- Analyze Files de-CH_it-CH.xml
- Pre-translate Files de-CH_it-CH.xml
- Translation Count de-CH_it-CH.xml
- Word Count .xml
What's puzzling here is why there are two files named 'Analyze Files de-CH_it-CH', particularly as the two contain different results. I've built a small tool which parses these files and adds up the per-file analysis into the same totals that are displayed inside Trados (I've removed the band matches to simplify things):
- Analyze Files de-CH_it-CH(1).xml
Context: 0
CrossFileRepetition: 347
Exact: 2
Locked: 0
NoMatch: 1314
Perfect: 0
Repetition: 712
Total: 2659
- Analyze Files de-CH_it-CH.xml
Context: 0
CrossFileRepetition: 357
Exact: 2
Locked: 0
NoMatch: 1496
Perfect: 0
Repetition: 706
Total: 2659
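A minimal sketch of this kind of comparison (not the actual tool used for the figures above): it sums a per-category word count over every file entry in a report and lists the categories where two reports disagree. The element name `file` and the attribute name `words` are assumptions about the report XML layout, and `ReportTotals` is a hypothetical helper; adjust both to whatever the files actually contain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ReportTotals
{
    // Sums the "words" attribute per element name across all <file> elements.
    // ASSUMPTION: per-file category elements sit below <file> and carry an
    // integer "words" attribute; elements outside <file> are ignored so that
    // any pre-computed batch totals are not counted twice.
    public static SortedDictionary<string, long> SumWords(XDocument report)
    {
        var totals = new SortedDictionary<string, long>(StringComparer.Ordinal);
        var categories = report.Descendants("file")
                               .SelectMany(f => f.Descendants())
                               .Where(e => e.Attribute("words") != null);
        foreach (var category in categories)
        {
            var name = category.Name.LocalName;
            totals.TryGetValue(name, out var current); // current is 0 when the key is missing
            totals[name] = current + (long)category.Attribute("words");
        }
        return totals;
    }

    // Returns one line per category whose total differs between the two reports.
    public static List<string> Diff(XDocument a, XDocument b)
    {
        var totalsA = SumWords(a);
        var totalsB = SumWords(b);
        var lines = new List<string>();
        foreach (var key in totalsA.Keys.Union(totalsB.Keys).OrderBy(k => k, StringComparer.Ordinal))
        {
            totalsA.TryGetValue(key, out var valueA);
            totalsB.TryGetValue(key, out var valueB);
            if (valueA != valueB)
            {
                lines.Add($"{key}: {valueA} vs {valueB}");
            }
        }
        return lines;
    }
}
```

Calling `ReportTotals.Diff(XDocument.Load(pathA), XDocument.Load(pathB))` on the two 'Analyze Files' reports would then show only the categories that differ, which makes it easier to compare them against what Trados displays.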
These projects are getting uploaded to GroupShare. If the GroupShare project is then fetched from the server - or if the project is simply opened from the file-system - then the analysis results shown in Trados are, I think, the same as Analyze Files de-CH_it-CH(1).xml, although I can't find the source of the band matches as these differ.
However, if I then run the analysis on the project from inside Trados (running the 'Analyze Files' task sequence) then the analysis results change again...
So now I have three different sets of analysis results.
Another point is that the project files themselves contain analysis results, and these also differ: before and after the 'Analyze Files' task has been run, and between the local file-based project and the version which is retrieved from GroupShare. I'm not sure whether these differences affect the numeric totals. For example, the GroupShare version contains <Fuzzy> elements in <AnalysisStatistics> which aren't in the locally-created project file.
I'm inclined to believe that this post-analysis-task result is the true result, given that we're sometimes seeing the same source language into two different languages producing identical analysis results. If one of those same-analysis-result projects is then run through the 'Analyze Files' task sequence in Trados then the report differs across languages, which seems a natural expectation.
So can anyone explain why the first analysis - using the code shown earlier - appears to be returning erroneous analysis results? It's my understanding that I'm running exactly the same task sequence programmatically as I'm later running in Trados.
Also, why are there two different analysis files in the project directory?