SDL TMS 11.0 | Incorrect handling of timeouts in SDL TMS multi-server environments

Symptoms: 
In a multi-server environment, when a timeout occurred on server A and server B picked up the tasks, the sequence of database actions was sometimes handled incorrectly, so the work of the two servers overlapped. This overlap created duplicate entries in the database and made it possible to access the same TMS task several times for the same action. As a result, the overall environment does not perform as well as it should.

Explanation: 
The issue can occur as follows:

  • On server A, a plug-in reaches its timeout but has not yet exited.
  • On server B, the same plug-in starts, sees that the instance on server A has exceeded its timeout, assumes that it has crashed, and takes it over.
At this point, both server A and server B are running the same plug-in. This can snowball, needlessly increasing the servers' resource consumption.

Resolution: 
The fix consists of the following changes:

  • Each plug-in is now notified when it times out, so that the appropriate termination action can run.
  • When a plug-in is running on a given server, another server can take ownership of it only if the plug-in has timed out and has been properly terminated on the first server.
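The ownership rule above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not SDL TMS code: the record `PluginRun` and the functions `can_take_over_before_fix`, `can_take_over_after_fix`, `on_timeout`, and `try_take_over` are hypothetical names, and times are plain numbers of seconds. It contrasts the old check (a timeout alone is treated as a crash) with the fixed check (a timeout plus a confirmed termination on the owning server).

```python
from dataclasses import dataclass


@dataclass
class PluginRun:
    """Hypothetical ownership record for one plug-in execution (not an SDL TMS type)."""
    owner: str                # server currently running the plug-in
    started_at: float         # start time, in seconds
    timeout: float            # allowed run time, in seconds
    terminated: bool = False  # set only once the owner has actually stopped the plug-in

    def timed_out(self, now: float) -> bool:
        return now - self.started_at > self.timeout


def can_take_over_before_fix(run: PluginRun, now: float) -> bool:
    # Old behaviour: a timeout alone is read as "the owner crashed",
    # even if the plug-in is still running on the owning server.
    return run.timed_out(now)


def can_take_over_after_fix(run: PluginRun, now: float) -> bool:
    # Fixed behaviour: the run must have timed out AND been terminated by its owner.
    return run.timed_out(now) and run.terminated


def on_timeout(run: PluginRun, now: float) -> None:
    # Hypothetical timeout notification on the owning server: the plug-in's
    # termination action runs, and only then is the run marked as terminated.
    if run.timed_out(now):
        # ... plug-in-specific cleanup would run here ...
        run.terminated = True


def try_take_over(run: PluginRun, server: str, now: float) -> bool:
    # Another server claims the run only when the fixed rule allows it.
    if not can_take_over_after_fix(run, now):
        return False
    run.owner = server
    run.started_at = now
    run.terminated = False
    return True


if __name__ == "__main__":
    run = PluginRun(owner="server-A", started_at=0.0, timeout=60.0)
    # At t=61 the run has timed out, but server A has not exited yet.
    print(can_take_over_before_fix(run, 61.0))   # True: B would start a duplicate
    print(try_take_over(run, "server-B", 61.0))  # False: A still owns the run
    on_timeout(run, 61.0)                        # A is notified and terminates it
    print(try_take_over(run, "server-B", 62.0))  # True
    print(run.owner)                             # server-B
```

In a real multi-server deployment, the check and the ownership change would have to happen atomically (for example, as a single conditional update of the ownership record in the database); otherwise two servers could still pass the check at the same moment. The sketch runs in a single process and does not model that.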

SDL-hosted customers should contact SDL Support to arrange for the hotfix to be deployed.

Customers who host SDL TMS themselves should download the installer, SDLTMS11.0 Hotfix for TMS-3513.exe, from the following FTP site: ftp://sdlpatches:5dlpatch35@ftp-emea.sdlproducts.com/SDL TMS/11.0/Hotfix.

Note: 
This issue was detected in a previous version of SDL TMS. Per SDL Engineering policy, the Engineering team provides hotfixes for the latest release of SDL TMS, to encourage our customers to run the latest version and benefit from the latest features and fixes.