Studio 2022+ TranslationMemoryAPI: GetHashCode() for TM segment gives wrong hash code

Hello,

Recently, we had to develop an algorithm that parses an SDLXLIFF file, finds the matches that were copied from a file-based reference TM during translation, and generates statistics on their metadata and on the modifications made in post-editing. To find each relevant segment in the TM, we used the "SDL:OriginalTranslationHash" value from the SDLXLIFF file.

Then, we used this code snippet to obtain the corresponding segment from the translation memory:

FileBasedTranslationMemory tm = new FileBasedTranslationMemory(pathToTM);
bool matchFound = false;
RegularIterator ri = new RegularIterator();
// Read the TM in batches; the iterator keeps track of the position between calls.
var units = tm.LanguageDirection.GetTranslationUnits(ref ri);
while (units != null && units.Length > 0 && !matchFound)
{
    // Compare the hash of each target segment with the value stored in the SDLXLIFF file.
    var tu = units.FirstOrDefault(u => u.TargetSegment.GetHashCode() == originalTranslationHash);
    if (tu != null)
    {
        // ... process field values and segment content
        matchFound = true;
        break;
    }
    // No match in this batch: fetch the next one.
    units = tm.LanguageDirection.GetTranslationUnits(ref ri);
}

This worked well in Studio 2021 SR2. But with the Studio 2022 and Studio 2024 APIs, this code no longer returned the relevant segment from the TM.

Debugging showed the cause: in the Studio 2022+ API, TargetSegment.GetHashCode() returns a different value for the segment in question than the SDL:OriginalTranslationHash value saved in the SDLXLIFF file.

After some more digging, the reason became clear: the custom GetHashCode() implementation in Sdl.LanguagePlatform.Core.Segment starts from the value of Culture.GetHashCode(). In Studio 2021 the Culture property of the segment was a CultureInfo object, but in Studio 2022+ it is a CultureCode object. Different object = different hash code.
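To illustrate why the starting value matters, here is a minimal standalone sketch in plain .NET, without any Studio assemblies. `HashSeedDemo`, `Fold` and the sample element hash codes are hypothetical, and `seedOld ^ 1` merely stands in for a culture object whose GetHashCode() differs from that of CultureInfo; the actual CultureCode hash is not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class HashSeedDemo
{
    // Shift-and-xor fold as used in the segment hash, with the culture hash
    // passed in as the starting value (seed).
    public static int Fold(int seed, IEnumerable<int> elementHashes)
    {
        int num = seed;
        foreach (int h in elementHashes)
        {
            num = (num << 1) ^ h;
        }
        return num;
    }

    public static void Main()
    {
        // Hypothetical hash codes of three segment elements.
        int[] elementHashes = { 17, 42, 99 };

        // Seed taken from a CultureInfo object, as in Studio 2021.
        int seedOld = new CultureInfo("de-DE").GetHashCode();

        // Assumption: any other culture type yields some different seed;
        // flipping the lowest bit is enough to show the effect.
        int seedNew = seedOld ^ 1;

        // Same elements, different seed: the two results differ.
        Console.WriteLine(Fold(seedOld, elementHashes) == Fold(seedNew, elementHashes));
    }
}
```

Because every element only shifts and xors the running value, a difference in the seed is carried through to the final result, so identical segment content hashes differently as soon as the culture object hashes differently.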

To get the hash code as originally calculated in the Studio 2021 API, we created a custom function by slightly modifying the custom GetHashCode() method taken from the Studio 2022 API:

private static int HashCode(Segment seg)
{
    if (seg.Culture == null || seg.Elements == null)
    {
        return -1;
    }

    int num = new CultureInfo(seg.Culture.Name).GetHashCode();   // was: seg.Culture.GetHashCode()
    foreach (SegmentElement element in seg.Elements)
    {
        num = (num << 1) ^ element.GetHashCode();
    }
    return num;
}

So although the algorithm now works correctly with this workaround under Studio 2022+, some questions remain:

  • Why is the hash code for the Segment calculated differently in the API than in the Studio application when filling in the SDL:OriginalTranslationHash value in the sdlxliff file?
  • Was the change in the GetHashCode result intentional or just an unfortunate side-effect when changing the base object for the Culture property?
  • Is there an "official" way to call the original function that calculates the OriginalTranslationHash value programmatically through the Studio API?

Thank you,

Attila



clarification
[edited by: Attila Grozli-Nagy at 7:33 AM (GMT 0) on 7 Feb 2025]