Differing Analysis Results

While creating and analysing projects using the project automation SDK, I've noticed strange analysis results.

First of all, here's the code which runs the analysis:

// Enables the two optional report sections before the analysis is run.
private void FormulateAnalysisSettings(IProject project)
{
   var settings = project.GetSettings();
   var analyzeSettings = settings.GetSettingsGroup<AnalysisTaskSettings>();

   analyzeSettings.ReportCrossFileRepetitions.Value = true;
   analyzeSettings.ReportInternalFuzzyMatchLeverage.Value = true;

   project.UpdateSettings(settings);
}

// Runs 'Analyze Files' on all target-language files and returns the ID of the first report.
private Guid RunFileAnalysis(IProject project)
{
   var targetFiles = project.GetTargetLanguageFiles();
   var analyzeTask = project.RunAutomaticTask(targetFiles.GetIds(), AutomaticTaskTemplateIds.AnalyzeFiles);

   // Assumes the task produced at least one report; Reports[0] throws otherwise.
   return analyzeTask.Reports[0].Id;
}

What I've found is that when creating two projects with the same source files, say German > Italian and German > French, the analysis reports can be identical. Once... ok, a coincidence; regularly... something's wrong.

Looking further, I've found that the directory where the project is created has a sub-directory named 'Reports', which contains the following files (in this case for a DE > IT project):

Analyze Files de-CH_it-CH(1).xml
Analyze Files de-CH_it-CH.xml
Pre-translate Files de-CH_it-CH.xml
Translation Count de-CH_it-CH.xml
Word Count .xml

What's puzzling here is why there are two files named 'Analyze Files de-CH_it-CH', particularly as the two contain different results. I've built a small tool which parses these files and sums the per-file analysis into the same totals that are displayed inside Trados (I've removed the fuzzy band matches to simplify things):

Analyze Files de-CH_it-CH(1).xml
    Context: 0
    CrossFileRepetition: 347
    Exact: 2
    Locked: 0
    NoMatch: 1314
    Perfect: 0
    Repetition: 712
    Total: 2659
Analyze Files de-CH_it-CH.xml
    Context: 0
    CrossFileRepetition: 357
    Exact: 2
    Locked: 0
    NoMatch: 1496
    Perfect: 0
    Repetition: 706
    Total: 2659
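
For anyone who wants to reproduce totals like these, here is a minimal sketch of that kind of summing. It is not the actual tool, and the report layout it relies on is an assumption: one <file> element per project file, each with an <analyse> child whose category elements carry a numeric "words" attribute. The class name ReportTotals, the method name SumByCategory, and the element and attribute names are all hypothetical and should be checked against a real report file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ReportTotals
{
    // Sums one numeric attribute of every category element found under
    // <file>/<analyse>, keyed by the category element's name.
    // ASSUMPTION: the <file>/<analyse>/<category words="..."/> layout and the
    // "words" attribute name are illustrative, not confirmed by the report schema.
    public static Dictionary<string, int> SumByCategory(XDocument report, string attribute = "words")
    {
        var totals = new Dictionary<string, int>();

        var categories = report.Descendants("file")
                               .Elements("analyse")
                               .Elements();

        foreach (var category in categories)
        {
            // A missing attribute counts as zero rather than failing the run.
            var value = (int?)category.Attribute(attribute) ?? 0;
            var name = category.Name.LocalName;

            totals[name] = totals.TryGetValue(name, out var sum) ? sum + value : value;
        }

        return totals;
    }
}
```

Calling ReportTotals.SumByCategory(XDocument.Load(path)) on each of the two 'Analyze Files' reports and printing the dictionaries side by side is enough to show where the figures diverge. Elements that occur more than once per file (such as separate fuzzy bands, if present) are added together under one key.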

These projects are then uploaded to GroupShare. If the GroupShare project is fetched from the server, or if the project is simply opened from the file-system, the analysis results shown in Trados are, I think, the same as those in Analyze Files de-CH_it-CH(1).xml. The fuzzy band figures differ, though, and I can't find where those come from.

However, if I then run the analysis on the project from inside Trados (running the 'Analyze Files' task sequence) then the analysis results change again...

So now I have three different sets of analysis results.

Another point is that the project files themselves contain analysis results, and these also differ before and after the 'Analyze Files' task has been run, as well as between the local file-based project and the version retrieved from GroupShare. I'm not sure whether these differences affect the numeric totals; for example, the GroupShare version contains <Fuzzy> elements in <AnalysisStatistics> which aren't in the locally-created project file.

I'm inclined to believe that the result produced after running the task in Trados is the true one, given that we're sometimes seeing the same source files analysed into two different target languages produce identical results. If one of those identical-result projects is then run through the 'Analyze Files' task sequence in Trados, the report differs across languages, which is what I'd expect.

So can anyone explain why the first analysis - using the code shown earlier - appears to be returning erroneous analysis results? It's my understanding that I'm running exactly the same task sequence programmatically as I'm later running in Trados.

Also, why are there two different analysis files in the project directory?

[edited by: RWS Community AI at 9:07 AM (GMT 0) on 14 Nov 2024]
