RunAutomaticTask with AnalyzeFiles runs very slowly when the number of ProjectFileIds in the input is large

Has anyone ever come across an issue when running the RunAutomaticTask API with AnalyzeFiles on a project containing a large number of files?

Here's my comparison:

This one had fewer than 100 files in the project:

This one had about ten times more files:

I would have expected it to take about 10 times longer, but 37 minutes is definitely much longer than 5 minutes.

Has anyone else come across this issue? Is there any way to solve it?

Thank you!

Rieko

  • Analysis is quite a complex process.

    Here is an example of an analysis I ran today:

    >I would expect 10 times longer, but 37 minutes is definitely much longer than 5 minutes

    Unfortunately it isn't that simple. A couple of things to consider:

    1. Are all the files exactly the same size, with exactly the same structure? Unless that is the case, each file will take a different amount of time to analyze.

    2. Each file is located at a different location on your disk. The time it takes to find and open each one varies greatly.

    3. Trados (and PCs in general) has a limited amount of resources; once you use them up, extra work has to be done.

    For example, with a few files there may be no need to release any memory, but once you increase the number of files you process, extra work has to be done to release memory that wasn't necessary when you had only a few files.

    Anyway, one thing you can do is try upgrading your hardware, e.g. adding RAM, a faster CPU, or a solid-state drive.

    However, 783 files in 37 minutes seems quite fast, so maybe you already have a fast PC.

    The other solution is to wait and see if the SDL developers can improve how analysis is performed, but I wouldn't count on that.

  • Hello Jesse,

    Thank you very much for showing me that I'm not the only one suffering with the analysis task!  I use Windows 7 Professional 64-bit with a 3.2 GHz CPU and 12 GB of memory, so it's not too bad, right?

    I have also found out that SaveTaskReportAs with the HTML reportFormat input takes quite some time as well.  You can probably see the effect of this even from the Studio 2014 interface, because when there is a large number of files in the project, loading the analysis report takes a while, like this:
      

    I sure hope the SDL developers will improve the way analysis is performed!  Could we have some comment from them?

    Thank you!

    Rieko

  • Another thing to keep in mind about the analysis is that, during the analysis, the segments of every new file have to be compared with those of the previous files to check for repetitions.

    For example:

    - the last file in a batch of 10 files will be compared with the previous nine files.

    - the last file in a batch of 1000 files has to be compared with the previous 999 files.

    So if you have ten times as many files, you should expect the analysis to take not just ten times longer but even longer, because of this internal comparison (the size of the files will have an impact, of course).
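    To see why this adds up so quickly, here is a simplified back-of-the-envelope sketch (this is just the combinatorics of pairwise comparisons, not how Trados actually implements repetition detection, and it ignores file sizes and segment counts):

    ```python
    def cross_file_comparisons(n_files):
        """If each new file is checked against every previously
        analyzed file, the total number of file-to-file comparisons
        is 0 + 1 + ... + (n-1) = n * (n-1) / 2."""
        return n_files * (n_files - 1) // 2

    for n in (10, 100, 1000):
        print(n, "files ->", cross_file_comparisons(n), "comparisons")
    # 10 files -> 45 comparisons
    # 100 files -> 4950 comparisons
    # 1000 files -> 499500 comparisons
    ```

    So going from 100 files to 1000 files means roughly a hundred times as many cross-file comparisons, not ten, which is why the total time can grow much faster than the file count.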

    How many segments do you have in the two analyses that you compared?  It is really difficult to tell whether the time difference is normal without knowing the size of the files.

    Daniel

  • Ah!  Thanks for the good reminder, Daniel!

    So, does this mean it should take much less time if we disable "Report Cross-file Repetitions" in the Analyze Files settings?

    How about the issue with creating an HTML report?  Like I said, it is very slow even in the Studio environment.  Is there a setting I can use to make it go faster?

    Thank you!

    Rieko
