corrupt TM

Hi, I cannot repair my TM. I tried the plugin, but the result was an empty TM (attachment: tehnic us-ro.zip). Could you please help?

Regards,

Anca

  • Unexpected end of file has occurred. The following elements are not closed: body, tmx. Line 5, position 1.
    
    Source: System.Xml
       at System.Xml.XmlTextReaderImpl.Throw(Exception e)
       at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
       at System.Xml.XmlTextReaderImpl.ThrowUnclosedElements()
       at System.Xml.XmlTextReaderImpl.ParseElementContent()
       at System.Xml.XmlTextReaderImpl.Read()
       at Sdl.MultiTerm.Tools.GlossaryConverter.Filters.TmxReader.GetDefinitionLanguages()
       at Sdl.MultiTerm.Tools.GlossaryConverter.Filters.TmxReader.GetFields(IFilter filter)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Filters.SdltmReader.GetFields(IFilter filter)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Core.FieldReader.Read(List`1 inputFilters, List`1 inputPaths)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Workflow.Converter.Convert(List`1 inputFilters, List`1 inputPaths, IFilter outputFilter)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Workflow.FileHandler.HandleMultipleFiles(List`1 files)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Workflow.FileHandler.ProcessFiles(List`1 files)
    Program version 6.0.7588.22667
    
    Windows version: Windows 10, Build 19045, 64 bit (English (United States))
    
    .net versions:
    v2.0.50727  2.0.50727.4927  SP2
    v3.0  3.0.30729.4926  SP2
    v3.5  3.5.30729.4926  SP1
    Client  4.8.04084
    Full  4.8.04084
    Client  4.0.0.0
    
     Libre/OpenOffice version: 
    
     MultiTerm version: MultiTerm 16.0.0.0
    
    
    1/21/2023 3:24 PM	Could not find LibreOffice.CalcDocument\CurVer in registry
    1/21/2023 3:24 PM	Could not find opendocument.CalcDocument\CurVer in registry
    
    <Settings version="3">
      <AlwaysShowFieldsDialog>false</AlwaysShowFieldsDialog>
      <CheckForUpdates>true</CheckForUpdates>
      <PlaySound>false</PlaySound>
      <FastExcelMode>true</FastExcelMode>
      <ExcelTags>true</ExcelTags>
      <ExcelRawMode>false</ExcelRawMode>
      <DefineTermbaseOutputFormat>false</DefineTermbaseOutputFormat>
      <MergeFiles>false</MergeFiles>
      <UseTermbaseTemplate>false</UseTermbaseTemplate>
      <UseMasterTermbase>false</UseMasterTermbase>
      <MergeLanguages>false</MergeLanguages>
      <MergeSubLanguages>false</MergeSubLanguages>
      <MergeEntryNumber>false</MergeEntryNumber>
      <MasterTermbase></MasterTermbase>
      <MergeField></MergeField>
      <TermbaseTemplate></TermbaseTemplate>
      <Synonyms>
        <Type>OneLine</Type>
        <Column></Column>
        <Repeat>false</Repeat>
        <Separator>|</Separator>
      </Synonyms>
      <DefaultGlossaryFormat>Excel 2007 Workbook</DefaultGlossaryFormat>
      <UiLocale>en</UiLocale>
      <UiTheme>Default</UiTheme>
      <SettingsTab>0</SettingsTab>
      <EmptyOutput>false</EmptyOutput>
      <IgnoreUnknownFields>false</IgnoreUnknownFields>
      <CreateEmptyFields>false</CreateEmptyFields>
      <WriteDocType>false</WriteDocType>
      <LargeFileExcelMode>false</LargeFileExcelMode>
      <MultiFieldMode>ignore</MultiFieldMode>
      <CreationUser>glossaryconverter</CreationUser>
      <TbCopyright></TbCopyright>
      <TbDescription></TbDescription>
      <UseContentFilter>false</UseContentFilter>
      <RegexContentFilter>false</RegexContentFilter>
      <ContentFilterText></ContentFilterText>
      <Tbx>
        <Dialect>Core</Dialect>
        <ResolveNote>false</ResolveNote>
        <MappingFile></MappingFile>
        <UseMappingFile>false</UseMappingFile>
      </Tbx>
    </Settings>
    
    in: E:\anca2018\TM\tehnic us-ro.sdltm
    out: E:\anca2018\TM\tehnic us-ro.sdltb
    
    Conversion start: 1/21/2023 3:24:22 PM
    
    ****  Error connecting to Studio
    Sdl.LanguagePlatform.Core.LanguagePlatformException: The translation memory data file is corrupt.
       at Sdl.LanguagePlatform.TranslationMemoryApi.FileBasedTranslationMemoryLanguageDirection.GetTranslationUnits(RegularIterator& iterator)
       at SdlTmConverter.SdlTmConvertReader.WriteTus(StreamWriter sw, ITranslationMemoryLanguageDirection ld)
       at SdlTmConverter.SdlTmConvertReader.CreateTmxTm(String sdltmPath, String tmxPath)
       at SdlTmConverter.Program.HandleFromSdltm(String[] args)
       at SdlTmConverter.Program.Main(String[] args)
    ****  Conversion from sdltm failed, error code -4
    ****  There was an error creating the output file.
    System.Xml.XmlException: Unexpected end of file has occurred. The following elements are not closed: body, tmx. Line 5, position 1.
       at System.Xml.XmlTextReaderImpl.Throw(Exception e)
       at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
       at System.Xml.XmlTextReaderImpl.ThrowUnclosedElements()
       at System.Xml.XmlTextReaderImpl.ParseElementContent()
       at System.Xml.XmlTextReaderImpl.Read()
       at Sdl.MultiTerm.Tools.GlossaryConverter.Filters.TmxReader.GetDefinitionLanguages()
       at Sdl.MultiTerm.Tools.GlossaryConverter.Filters.TmxReader.GetFields(IFilter filter)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Filters.SdltmReader.GetFields(IFilter filter)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Core.FieldReader.Read(List`1 inputFilters, List`1 inputPaths)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Workflow.Converter.Convert(List`1 inputFilters, List`1 inputPaths, IFilter outputFilter)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Workflow.FileHandler.HandleMultipleFiles(List`1 files)
       at Sdl.MultiTerm.Tools.GlossaryConverter.Workflow.FileHandler.ProcessFiles(List`1 files)
    
    

    Hi, unfortunately it does not work with Glossary Converter; it is a local TM.

    Thanks,

    Anca

  •  

    If your TM is corrupt then none of these solutions is going to help you.  You need to try and repair it.  Use this tool from the appstore: https://appstore.rws.com/plugin/41/

    There is no guarantee here because your TM may be damaged beyond repair, but this is your best bet.

    Going forward, it's a good idea to back up your TM at least weekly, and in my opinion you should also export to TMX when you do this.  If the worst comes to the worst, it's far easier to recover a text-based file like a TMX than it is a database.

  •  

    I also had a play with your sdltm (just noticed you provided it).  SDLTM Repair cannot recover the data.  So I had a play with the SQL and got the same result no matter what I tried.  I then tried an SQLite recovery tool that gave me mixed results, but sometimes I could get something out, and I did notice you seem to have TUs created by multiple users, so I wondered if you are sharing this TM with others in some way?  This is always a risky business and you should either take sensible precautions to never write to the same memory, or use a TM designed for sharing such as a server-based TM in GroupShare, or maybe use Language Cloud.  File-based TMs are not designed to be worked on by multiple users.

    The recovery tool also failed after some hours of processing.  So then I went old-fashioned and did manage to get the translation units from the table just with DB Browser for SQLite.  It'll need some cleaning up, but at least you have the data if you want to try to clean it up and recreate a TM for use.  I'll drop you an email with the link, because the data in the file can now be read.
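
For anyone who would rather script the table export than click through DB Browser, here is a minimal sketch in Python. It is not what was done above. The table name you pass in is an assumption (check the real name in DB Browser first), and on a badly damaged file SQLite may stop partway, so the function keeps whatever rows it managed to read. It writes the CSV with a BOM (`utf-8-sig`) so that Excel reads diacritics correctly.

```python
import csv
import sqlite3


def dump_table_to_csv(db_path, table, csv_path):
    """Copy every readable row of `table` into a UTF-8 CSV that starts with a BOM.

    Returns the number of data rows written. If SQLite reports an error
    (for example because the file is corrupt), the rows read so far are kept.
    """
    # Quote the table name so names containing spaces still work.
    quoted = '"' + table.replace('"', '""') + '"'
    written = 0
    con = sqlite3.connect(db_path)
    try:
        # "utf-8-sig" makes Python write the BOM at the start of the file.
        with open(csv_path, "w", newline="", encoding="utf-8-sig") as fh:
            writer = csv.writer(fh)
            try:
                cur = con.execute(f"SELECT * FROM {quoted}")
                writer.writerow([col[0] for col in cur.description])
                for row in cur:
                    writer.writerow(row)
                    written += 1
            except sqlite3.DatabaseError as exc:
                print(f"Stopped after {written} rows: {exc}")
    finally:
        con.close()
    return written
```

A call would look like `dump_table_to_csv("my.sdltm", "translation_units", "tus.csv")`, where both file names and the table name are placeholders.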

  •  

    I couldn't resist playing with this and in the end redid what I sent you as I found some problems with the content.  So for the benefit of anyone else with a corrupt TM that SDLTM Repair won't fix because it's beyond repair, this is what I did.

    1. Opened the SDLTM in DB Browser for SQLite
    2. Exported the "Translation Units" table to a csv
    3. Added a BOM to the csv (very important if you don't want to corrupt characters using diacritics when you import to Excel... I had to do this exercise twice because I didn't realise this would be a problem the first time)
    4. Imported the csv to Excel
    5. Deleted all the columns I didn't want... in this case I just kept the source and target content.  So I have this sort of stuff:
    6. I deleted all the ones that were missing a target and copied the content of both source and target into a text editor so I could clean them up.  I used this expression...
      Search for:
      (?:"?<Segment.+?<Value>(.+?)</Value>.+?</Segment>"?(\t)"?<Segment.+?<Value>(.+?)</Value>.+?</Segment>"?)
      Replace with:
      $1$2$3
    7. The regex may seem unnecessary in places but as I was cleaning the text I had to adapt this for some odd segments.  Eventually this seemed to do the trick.
    8. This got me clean tab-delimited text for source and target (for example... the screenshot above becomes this)
    9. I pasted that back into Excel as it's easier to clean up in there
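
The search and replace in step 6 can also be run outside a text editor. Below is a minimal sketch using the same expression with Python's `re` module (the replacement `$1$2$3` becomes `\1\2\3`). The sample line in the usage note is made up for illustration and only approximates how a serialized segment looks; real rows may differ and, as noted in step 7, may need the expression adapted.

```python
import re

# The expression from step 6, unchanged: capture the first <Value> of the
# source segment, the tab between the two columns, and the first <Value>
# of the target segment.
SEGMENT_PAIR = re.compile(
    r'(?:"?<Segment.+?<Value>(.+?)</Value>.+?</Segment>"?(\t)'
    r'"?<Segment.+?<Value>(.+?)</Value>.+?</Segment>"?)'
)


def clean_line(line):
    """Reduce one 'source<TAB>target' row of serialized segments to plain text.

    Lines that do not match the expression are returned unchanged, so they
    can be spotted and fixed by hand afterwards.
    """
    return SEGMENT_PAIR.sub(r"\1\2\3", line)
```

For example, a hypothetical row such as `"<Segment><Elements><Text><Value>Hello</Value></Text></Elements></Segment>"`, a tab, and the same structure around `Salut` comes out as `Hello`, a tab, `Salut`. Only the first `<Value>` on each side is kept, exactly as in the editor version.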

    Creating the TMX and then an SDLTM from this is trivial at this point.
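
In case "trivial" needs spelling out, here is one possible way to turn the cleaned tab-delimited text into a TMX with Python's standard library. This is a sketch, not the method used above. The language codes `en-US` and `ro-RO` and the header values (tool name, version, `o-tmf`, `datatype`) are assumptions chosen for illustration, so adjust them to your TM. Rows that do not have exactly two non-empty fields are skipped so they can be checked separately.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_from_tsv(text):
    """Return (source, target) tuples from tab-delimited text, skipping bad rows."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[0].strip() and fields[1].strip():
            pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def build_tmx(pairs, src_lang="en-US", tgt_lang="ro-RO"):
    """Build a minimal TMX 1.4 document with one <tu> per (source, target) pair."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "recovery-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tab-delimited text",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, segment_text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = segment_text
    return ET.ElementTree(tmx)
```

To save the result, call `build_tmx(pairs).write("recovered.tmx", encoding="utf-8", xml_declaration=True)` (the file name is a placeholder) and import the TMX into a new file-based TM in Studio. Because the whole tree is built before writing, the `body` and `tmx` elements are always closed, unlike the truncated TMX in the log above.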

    I'm 100% certain there will be a few problems with the quality of the alignment of all the data (there were some carriage returns within a segment and I'm sure I didn't find them all) and it should be thoroughly checked, but I think this is a useful exercise to be able to recover what would otherwise be over 100k TUs of many years' work completely lost.  So better than nothing at all!
