Differing Analysis Results

While creating and analysing projects using the project automation SDK, I've noticed strange analysis results.

First of all, here's the code which runs the analysis:

private void FormulateAnalysisSettings(IProject project)
{
    var settings = project.GetSettings();
    var analyzeSettings = settings.GetSettingsGroup<AnalysisTaskSettings>();

    analyzeSettings.ReportCrossFileRepetitions.Value = true;
    analyzeSettings.ReportInternalFuzzyMatchLeverage.Value = true;

    project.UpdateSettings(settings);
}

private Guid RunFileAnalysis(IProject project)
{
    var targetFiles = project.GetTargetLanguageFiles();
    var analyzeTask = project.RunAutomaticTask(targetFiles.GetIds(), AutomaticTaskTemplateIds.AnalyzeFiles);

    // Return the ID of the first report generated by the analysis task.
    var thisReport = analyzeTask.Reports[0];

    return thisReport.Id;
}

What I've found is that when creating two projects with the same source files, say German > Italian and German > French, the two analysis reports sometimes come out identical. Once... ok, a coincidence; regularly... something's wrong.

Looking further, I've found that the directory where the project is created contains a sub-directory named 'Reports', which holds the following files (in this case for a DE > IT project):

  • Analyze Files de-CH_it-CH(1).xml
  • Analyze Files de-CH_it-CH.xml
  • Pre-translate Files de-CH_it-CH.xml
  • Translation Count de-CH_it-CH.xml
  • Word Count .xml

What's puzzling here is why there are two files named 'Analyze Files de-CH_it-CH' - particularly as the two contain different results. I've built a small tool which parses these files and adds up the per-file analysis into the same totals that Trados displays (I've removed the band matches to simplify things):

  • Analyze Files de-CH_it-CH(1).xml
        Context: 0
        CrossFileRepetition: 347
        Exact: 2
        Locked: 0
        NoMatch: 1314
        Perfect: 0
        Repetition: 712
        Total: 2659

  • Analyze Files de-CH_it-CH.xml
        Context: 0
        CrossFileRepetition: 357
        Exact: 2
        Locked: 0
        NoMatch: 1496
        Perfect: 0
        Repetition: 706
        Total: 2659
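
For anyone wanting to reproduce the totals above, here is a minimal sketch of the kind of summing my tool does. It is not the tool itself. It assumes the report XML has one <file> element per project file, each with an <analyse> child whose category elements (perfect, inContextExact, exact, locked, crossFileRepeated, repeated, new, total) carry a words attribute, and that the report uses no XML namespace. Those element names and the mapping to the labels used in this post are assumptions, so check them against your own report files. Fuzzy band elements are skipped, as in the totals above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AnalysisReportTotals
{
    // Assumed mapping from report element names to the labels used in this post.
    private static readonly Dictionary<string, string> Categories = new Dictionary<string, string>
    {
        { "perfect", "Perfect" },
        { "inContextExact", "Context" },
        { "exact", "Exact" },
        { "locked", "Locked" },
        { "crossFileRepeated", "CrossFileRepetition" },
        { "repeated", "Repetition" },
        { "new", "NoMatch" },
        { "total", "Total" },
    };

    // Sums the "words" attribute of each known category across all
    // <file>/<analyse> elements. Unknown elements (e.g. fuzzy bands) are ignored.
    public static Dictionary<string, int> SumWords(XDocument report)
    {
        var totals = Categories.Values.ToDictionary(label => label, label => 0);

        foreach (var analyse in report.Descendants("file").Elements("analyse"))
        {
            foreach (var category in analyse.Elements())
            {
                if (Categories.TryGetValue(category.Name.LocalName, out var label))
                {
                    // A missing "words" attribute counts as zero.
                    totals[label] += (int?)category.Attribute("words") ?? 0;
                }
            }
        }

        return totals;
    }
}
```

Calling AnalysisReportTotals.SumWords(XDocument.Load(path)) on each of the two 'Analyze Files' reports should give one dictionary per file, which can then be printed side by side.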

These projects are then uploaded to GroupShare. If the GroupShare project is fetched from the server - or if the project is simply opened from the file-system - then the analysis results shown in Trados are, I think, the same as those in Analyze Files de-CH_it-CH(1).xml, although I can't find the source of the band matches, as these differ.

Screenshot of an analysis report for a German (Switzerland) to Italian (Switzerland) translation project showing various match types and their counts, with a total of 2659 words.

However, if I then run the analysis on the project from inside Trados (running the 'Analyze Files' task sequence) then the analysis results change again...

Duplicate of the first screenshot, showing the same analysis report for a German (Switzerland) to Italian (Switzerland) translation project with a total of 2659 words.

So now I have three different sets of analysis results.

Another point is that the project files themselves contain analysis results, and these also differ before and after the 'Analyze Files' task has been run, as well as between the local file-based project and the version retrieved from GroupShare. I'm not sure whether these differences affect the numeric totals - for example, the GroupShare version contains <Fuzzy> elements in <AnalysisStatistics> which aren't in the locally-created project file.

I'm inclined to believe that this post-analysis-task result is the true result, given that we're sometimes seeing the same source language into two different languages producing identical analysis results. If one of those same-analysis-result projects is then run through the 'Analyze Files' task sequence in Trados then the report differs across languages, which seems a natural expectation.
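
To flag the suspicious case automatically (two language pairs, or two report files, with the same totals), a small comparison along these lines could be run over the per-category word counts. This is only a sketch: ReportComparison and Diff are hypothetical names, and the inputs are assumed to be dictionaries of category label to word count, however you obtain them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReportComparison
{
    // Returns the categories whose word counts differ between two reports,
    // mapped to their (first, second) values. An empty result means the
    // totals are identical, which is the case worth investigating when the
    // two reports belong to different target languages.
    public static Dictionary<string, (int First, int Second)> Diff(
        IReadOnlyDictionary<string, int> first,
        IReadOnlyDictionary<string, int> second)
    {
        var diff = new Dictionary<string, (int First, int Second)>();

        foreach (var key in first.Keys.Union(second.Keys))
        {
            // A category missing from one report is treated as zero.
            first.TryGetValue(key, out var a);
            second.TryGetValue(key, out var b);

            if (a != b)
            {
                diff[key] = (a, b);
            }
        }

        return diff;
    }
}
```

Run against the two sets of totals listed earlier in this post, this would report differences in CrossFileRepetition, NoMatch and Repetition only.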

So can anyone explain why the first analysis - using the code shown earlier - appears to be returning erroneous analysis results? It's my understanding that I'm running exactly the same task sequence programmatically as I'm later running in Trados.

Also, why are there two different analysis files in the project directory?


