TD14 Content Importer

We are currently using SDL Knowledge Center 2016 and preparing an upgrade to TD14. With KC 2016, we often use publication export to correct multiple topics at once, because it is not possible to search and replace across multiple topics within the KC environment. After exporting the publication, we perform a search and replace on the batch of topics, then import the corrected topics back into KC using Batch Import.

It seems that this method is not usable with TD14. As you know, Batch Import is not available with TD14, so we have to use Content Importer to import the corrected topics. When I tried to import corrected topics with Content Importer, errors like the following were reported:

2019-08-27T14:49:16.5476875 Error Error updating 'href'. Unable to find 'GUID-2E78A786-F4C9-433E-BAFC-2395FCB2FEA4'. (C:\tmp\SVCDOC-1\GUID-1ED4C826-A446-471D-9CB8-2AFB945E6AAB.xml)

Since the href attributes in the topics and maps are GUID based, none of the references could be resolved. When errors occur during the conversion phase, Content Importer does not import the affected topics and maps.

Is there an alternative way? Or am I doing something wrong?

Kind regards,

Naoki Hirai

  • Hi Ann,

    The 'Without conversion' import does not require the filemap.xml file. The filemap.xml file is generated when performing a 'Standard import' and contains the mapping between each file path and its GUID.

    In my case, I use the filemap.xml file when duplicating a publication. I perform a 'Standard import' with the 'Generate new identifiers' option ON and stop the import after the conversion phase has finished. The filemap.xml then contains both the old and the new GUIDs. My program reads the filemap.xml file and processes the *.xml and *.3sish files accordingly. Then I import the processed *.xml files together with their *.3sish files using 'Without conversion'.
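    For readers curious what such a post-processing step might look like, here is a minimal sketch (my own illustration, not Naoki's actual program; the structure of filemap.xml is not reproduced here, so the old-to-new GUID mapping is supplied by hand):

```python
import re

def remap_guids(text, guid_map):
    """Replace every old GUID in `text` with its new counterpart.

    In practice guid_map would be built from the old/new GUID pairs
    recorded in the filemap.xml that a 'Standard import' generates
    (the exact element names in that file are not shown here).
    """
    if not guid_map:
        return text
    pattern = re.compile("|".join(re.escape(old) for old in guid_map))
    return pattern.sub(lambda m: guid_map[m.group(0)], text)

# Hypothetical mapping of one old GUID to a newly generated one.
mapping = {
    "GUID-2E78A786-F4C9-433E-BAFC-2395FCB2FEA4":
        "GUID-AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE",
}

topic = '<xref href="GUID-2E78A786-F4C9-433E-BAFC-2395FCB2FEA4"/>'
print(remap_guids(topic, mapping))
```

    The same function would be applied to every exported *.xml and *.3sish file before running the 'Without conversion' import.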

    Cheers,

    Naoki

  • This is a good conversation thread around content import. It sounds like you have a well-tuned process for your use case, Naoki, which is great. In case others tracking this thread have a similar need to duplicate a publication, I want to mention that RWS Professional Services has developed a Duplicate Publication software extension for Tridion Docs. The utility installs as a new button, named Duplicate Publication, in the Content Manager web client interface. The button allows the user to take an existing source-language publication and duplicate that publication, its maps, topics, images, and output formats based on the selected publication's baseline (while not duplicating library objects).

    Customers often use this to address use cases such as: using an existing publication as a template to create new publications; using an existing publication to create a similar publication when leveraging conrefs, variables, conditions, and topic reuse is not feasible or too cumbersome; or duplicating a chapter (submap) within a very large existing publication. Customers can contact RWS Professional Services if interested in using this utility.

  • I didn't see a response to the question about the hrefs not being resolved; maybe I missed it. Besides importing without conversion, there is this element in:

    %localappdata%\SDL\InfoShare Client\14.0\Trisoft.ContentImporter.config

    <validateReferences>Error</validateReferences>

    Supposedly, you can change it from Error to one of:

    Hidden: don't show in UI, don't log, import the source
    Info: don't show in UI, log the issue, import the source
    Warning: show in UI, log a warning, import the source

  • Although this question was raised a couple of months ago, I would like to share the following in case somebody else is interested in this thread.

    With TD14 SP4, Content Importer basically supports two variants of the "Standard" conversion scenario.

    When you start a new project in Content Importer and choose "Standard", there are basically two option tabs: "Generic DITA Content" and "Repository DITA Content". With "Repository DITA Content", Content Importer will not report broken links for @href attributes that contain GUIDs instead of existing filenames.

    For me this is the ideal route to export content from one instance using publication export and import it into another instance.

    The metadata in the exported .met files is also preserved, and this works as long as all values are valid in the other instance. There are also metadata fields that are system controlled and not allowed during import. You can remove such fields from the generated .3sish files, or use an XSL that does the same.

    I have an example of such an XSL file for those who are interested.
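    As an alternative to an XSL, the same cleanup can be sketched in a few lines of Python. The field names are taken from examples later in this thread; the sample document (its ishtype value and level attribute) is purely illustrative:

```python
import xml.etree.ElementTree as ET

# System-controlled fields that must not be supplied during import
# (list based on examples in this thread; adjust to your instance).
SYSTEM_FIELDS = {"ED", "CREATED-ON", "MODIFIED-ON", "FUSERGROUP",
                 "FISHREVCOUNTER", "READ-ACCESS",
                 "FISHLASTMODIFIEDON", "FISHLASTMODIFIEDBY"}

def strip_system_fields(ishobject_xml):
    """Remove system-controlled ishfield elements from a .3sish document."""
    root = ET.fromstring(ishobject_xml)
    # Walk every element so we don't assume the wrapper element's name.
    for parent in root.iter():
        for child in list(parent):
            if child.tag == "ishfield" and child.get("name") in SYSTEM_FIELDS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

sample = ('<ishobject ishtype="ISHModule">'
          '<ishfields>'
          '<ishfield name="FSTATUS" level="lng">Draft</ishfield>'
          '<ishfield name="ED" level="lng">abc</ishfield>'
          '</ishfields>'
          '</ishobject>')
print(strip_system_fields(sample))
```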

  • I do not know how to attach files so herewith an example showing the code with some hints what could be changed to tailor to specific needs:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    
      <!-- Example of an XSL file to process a metadata file after conversion and before import. -->
    
      <xsl:output method="xml" />
    
      <xsl:param name="username" select="'Unknown'" />
    
      <!-- Input  folder path, e.g. "C:\Import\In". -->
      <xsl:param name="sourcebasepath" />
    
      <!-- Output folder path, e.g. "C:\Import\Out". -->
      <xsl:param name="targetbasepath" />
    
      <!-- Relative path to content file, e.g. "Sample\Topics\Topic.dita". -->
      <xsl:param name="filepath" />
    
      <xsl:variable name="fullpath" select="concat('file:///', $sourcebasepath, '/', $filepath)" />
      <xsl:variable name="objecttype" select="/ishobject/@ishtype" />
    
      <!-- reset the status to Draft: -->
      <xsl:template match="ishfield[@name = 'FSTATUS']">
        <xsl:copy>
          <xsl:copy-of select="@*" />
          <xsl:text>Draft</xsl:text>
        </xsl:copy>
      </xsl:template>
      
      <!-- example of other fields that can be defaulted to avoid metadata conflicts on the target Tridion Docs instance -->
      <!--
      <xsl:template match="ishfield[@name = 'VERSION']">
        <xsl:copy>
          <xsl:copy-of select="@*" />
          <xsl:text>1</xsl:text>
        </xsl:copy>
      </xsl:template>
      -->
      
      <!--
      <xsl:template match="ishfield[@name = 'FRESOLUTION']">
        <xsl:copy>
          <xsl:copy-of select="@*" />
          <xsl:text>Default</xsl:text>
        </xsl:copy>
      </xsl:template>
      -->
      
      <!--
      <xsl:template match="ishfield[@name = 'FAUTHOR']">
        <xsl:copy>
          <xsl:copy-of select="@*" />
          <xsl:text>Admin</xsl:text>
        </xsl:copy>
      </xsl:template>
      -->
     
      <!-- fields that are system-controlled and not allowed to be specified during import -->
      <xsl:template match="ishfield[@name = 'ED']" />
      <xsl:template match="ishfield[@name = 'CREATED-ON']" />
      <xsl:template match="ishfield[@name = 'MODIFIED-ON']" />
      <xsl:template match="ishfield[@name = 'FUSERGROUP']" />
      <xsl:template match="ishfield[@name = 'FISHREVCOUNTER']" />
      <xsl:template match="ishfield[@name = 'READ-ACCESS']" />
      <xsl:template match="ishfield[@name = 'FRESOLUTION'][not(. != '')]" />
      <xsl:template match="ishfield[@name = 'FISHLASTMODIFIEDON']" />
      <xsl:template match="ishfield[@name = 'FISHLASTMODIFIEDBY']" />
      
      <!-- catch all -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
      </xsl:template>
    
    </xsl:stylesheet>
    

  • I shared the XSL code but that reply has been marked as spam.

    Not sure how I can attach a file.

  • Please see the previous response; there was just some delay due to moderation.

  • Some users had problems using the repository import, and I never looked much into what the problem was, partly for lack of time and partly because there was an alternate solution. Server-side plugins are involved, and the doctype sometimes gets changed on import.

    For example, a topic using a base DITA doctype gets changed to an SDL doctype on import. It would be nice to know where this happens.

    It's confusing that there are multiple options for import, and that multiple options are needed at all to account for broken XML.

    Actually, the changing of the doctype can be managed by this setting in %LOCALAPPDATA%\(SDL|RWS)\InfoShare\14.0\Trisoft.ContentImporter.config:

    <convertDocumentTypes>false</convertDocumentTypes>

    But the implicit conversion that happens on import/check-in is still unclear to me.

  • Hi Kendall,

    The following server-side file controls how the public identifier is updated when using Content Importer: \InfoShare\Web\ASP\DocTypes\documentTypeMap.xml.

    <?xml version="1.0" encoding="utf-8"?>
    <documentTypeMap>
      <!-- Maps a public ID to another public ID. -->
      <!-- It is used by Content Importer to convert a document type into another document type before importing it, e.g. -->
      <!-- Replace "-//OASIS//DTD DITA Concept//EN" with "-//SDL//DTD DITA Concept//EN" -->
      <rewritePublicId publicIdStartString="-//OASIS//DTD DITA 1.2" rewritePrefix="-//OASIS//DTD DITA 1.2" />
      <rewritePublicId publicIdStartString="-//OASIS//DTD DITA 1.1" rewritePrefix="-//OASIS//DTD DITA 1.1" />
      <rewritePublicId publicIdStartString="-//OASIS//DTD DITA 1.0" rewritePrefix="-//OASIS//DTD DITA 1.0" />
      <rewritePublicId publicIdStartString="-//OASIS//DTD DITA" rewritePrefix="-//SDL//DTD DITA" />
    </documentTypeMap>
    

    The logic of this mapping is to keep OASIS public identifiers that contain an explicit DITA version, and otherwise replace them with the corresponding SDL public identifier. The reason for this mapping is that most software maps the unversioned OASIS public identifiers to DITA 1.3, while Tridion Docs maps them to its proprietary DITA 1.2 grammar files from before the move to DITA 1.3. We typically update this file to map to a custom public identifier scheme.
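    To make the rule ordering concrete, here is a small sketch of how such prefix rewriting would behave. The top-to-bottom, first-match-wins evaluation order is my assumption, not documented behavior:

```python
# Rules mirroring the documentTypeMap.xml shown above:
# versioned OASIS IDs map to themselves, unversioned ones to SDL.
RULES = [
    ("-//OASIS//DTD DITA 1.2", "-//OASIS//DTD DITA 1.2"),
    ("-//OASIS//DTD DITA 1.1", "-//OASIS//DTD DITA 1.1"),
    ("-//OASIS//DTD DITA 1.0", "-//OASIS//DTD DITA 1.0"),
    ("-//OASIS//DTD DITA",     "-//SDL//DTD DITA"),
]

def rewrite_public_id(public_id):
    """Rewrite a DTD public identifier using the first matching prefix rule."""
    for prefix, replacement in RULES:
        if public_id.startswith(prefix):
            return replacement + public_id[len(prefix):]
    return public_id  # no rule matched: leave unchanged

# Unversioned OASIS ID falls through to the catch-all SDL rule:
print(rewrite_public_id("-//OASIS//DTD DITA Concept//EN"))
# -> -//SDL//DTD DITA Concept//EN
# Versioned IDs match an identity rule first and stay OASIS:
print(rewrite_public_id("-//OASIS//DTD DITA 1.2 Concept//EN"))
# -> -//OASIS//DTD DITA 1.2 Concept//EN
```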

    Another modification that is done during the convert phase is replacing DITA condition attributes with a single ishcondition attribute.

    On another note: sometimes the server-side plugins can help perform some basic operations on the XML, such as removing unwanted attributes or elements.
