Proposal: conref migration tool

Hi all,

I'd like to ask whether people see a point in having a tool that lets you take elements used as conref targets in one Library Topic and move them to another Library Topic. Currently, moving them breaks all references to those conref targets, because a conref includes the (Library) Topic ID. The migration tool would go through your repository and update all the references.
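To make the idea concrete, here is a minimal Python sketch of the rewrite step for a single topic. It is only an illustration under stated assumptions: the conref value is assumed to look like `LIBRARY-ID#LIBRARY-ID/element-id`, the IDs `LIB-OLD` and `LIB-NEW` and the function name `retarget_conrefs` are made up, and nothing here touches the repository or its API.

```python
import xml.etree.ElementTree as ET


def retarget_conrefs(xml_text, old_topic, new_topic, moved_ids):
    """Point conrefs at new_topic for every element ID listed in moved_ids.

    Assumes conref values of the form "<topic>#<topic>/<element-id>".
    Returns the updated XML as a string and the number of conrefs changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for elem in root.iter():
        conref = elem.get("conref")
        if not conref or "#" not in conref:
            continue
        target, _, fragment = conref.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        # Only touch references into the old library that name a moved element.
        if target == old_topic and topic_id == old_topic and element_id in moved_ids:
            elem.set("conref", f"{new_topic}#{new_topic}/{element_id}")
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical topic with two conrefs; only step-login has been moved.
sample = (
    '<task id="t1"><steps>'
    '<step conref="LIB-OLD#LIB-OLD/step-login"/>'
    '<step conref="LIB-OLD#LIB-OLD/step-reboot"/>'
    "</steps></task>"
)
updated, count = retarget_conrefs(sample, "LIB-OLD", "LIB-NEW", {"step-login"})
print(count)
print(updated)
```

Note that `xml.etree.ElementTree` drops the DOCTYPE declaration and comments when it parses, so a real tool would need a parser that preserves them.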

Use case 1:

I have created one very large Library Topic, containing a large number of reusable steps, reusable strings and so on. I also have a deliverable that uses a small fraction of those steps and strings, and I want to have this deliverable translated. Because the deliverable references my Library Topic, the Library Topic gets sent out for translation as well, and translators spend a lot of time translating content that isn't part of my deliverable.

What I would like to do is take the few reusable steps and strings that are actually used in my deliverable, and split them off, that is, put them in their own Library Topic. However, if I just copy-paste the steps and strings into a new Library Topic, all my references to those steps and strings are broken, and I have to spend a lot of time fixing them by hand.

Use case 2:

I have documentation for a software product that consists of three software components, let's call them Foo, Bar and Fred. My Library Topic contains reusable steps related to all three software components. Now, the Fred software component gets repurposed and used by a completely different software product. The docs for this other product need to reference some steps in my Library Topic, but only steps pertaining to Fred.

Here, too, I want to separate off some of the steps in my Library Topic and put them in a separate, new Library Topic. And again, doing so would break tons of references.

In both use cases, you could argue that I should have organized my Library Topics more "finely," not mixing translatable content with non-translatable content, or not mixing Foo, Bar and Fred content. But it's often hard to predict where and how pieces of content are going to be used.

The resulting problem is almost impossible to fix by hand, but a fix would be very easy to build as a product feature.

  • Hi.

    Define "very easy"...

    Interesting idea. So, suppose you wanted to make this kind of change, would it be forward-looking only or should the references really be updated throughout the database, that is, including released content in all languages? Those two are rather different propositions to start with. I think I would steer clear of the latter option.

    I guess my first reaction would be to limit the functionality so that changes are only ever made to draft content. That is, for any references found in released content, a new version would be (automagically) created and the conrefs updated there, probably with an XML comment that flags the change with a timestamp. The changes would trickle down to translations like any other changes, over time.

    Definitely the functionality should also generate a detailed report of:

    • new objects created
    • objects modified (those which were already in draft state)

    Finding the conrefs and modifying them is not a problem in principle. They could be mined from the metadata fields FISHFRAGMENTLINKS and FISHLINKS.

    I sense a lot of edge cases waiting in the wings; this would have to be carefully analyzed.

    Alternatively, at least as a first step, you could generate a report of the object/version/language combinations that contain the conref you wish to change. That would give an idea of the workload.
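
    A minimal Python sketch of such a report follows. It assumes the references have already been extracted into plain records, one per object/version/language; the record layout, the IDs, and the function name `conref_usage_report` are hypothetical, and how the records would be pulled from the metadata fields is not shown.

```python
from collections import Counter


def conref_usage_report(records, library_id, element_ids):
    """List every object/version/language that references a given element.

    Each record is a dict with "object", "version", "language" and "conrefs";
    conref values are assumed to look like "<topic>#<topic>/<element-id>".
    """
    rows = []
    for rec in records:
        for conref in rec["conrefs"]:
            fragment = conref.partition("#")[2]
            topic_id, _, element_id = fragment.partition("/")
            if topic_id == library_id and element_id in element_ids:
                rows.append((rec["object"], rec["version"], rec["language"], element_id))
    return sorted(rows)


# Hypothetical extracted data.
records = [
    {"object": "GUID-A", "version": "1", "language": "en",
     "conrefs": ["LIB-OLD#LIB-OLD/step-login"]},
    {"object": "GUID-A", "version": "1", "language": "de",
     "conrefs": ["LIB-OLD#LIB-OLD/step-login"]},
    {"object": "GUID-B", "version": "2", "language": "en",
     "conrefs": ["LIB-OLD#LIB-OLD/step-reboot", "LIB-OLD#LIB-OLD/step-login"]},
]

rows = conref_usage_report(records, "LIB-OLD", {"step-login"})
per_element = Counter(row[3] for row in rows)  # hits per reusable element

for row in rows:
    print(*row, sep="\t")
print(dict(per_element))
```

    The per-element count gives a rough measure of how many references would have to change for each item that is split off.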

    Just some thoughts.

    Joakim

  • My input here is to broaden the horizon of this thread. Technically I think a tool can be written based on the API, but a generic tool might not be that easy, because everybody's starting situation regarding information architecture and conref usage is different.

    For those of you who haven't started yet, I would like to point out that the Knowledge Center Content Manager has an alternative called 'Variables'.

    • With the OASIS DITA @conref structure, you have to specify something like A/B#C as the conref value, assisted by the tools. Now in the CMS the A and B-part are usually the same unless you start using advanced nesting structures. The C-part is your identifier in the library identified by A.
    • Variables only have the C-part, in the referencing system we had before OASIS DITA was there. So in one topic you use the variable reference (@varref) and in another library file you resolve the variable (@varid).

    Where the @conref structure is rigid on the A-part, the @varref structure is wide open. Tools like Publication Manager will tell you if you try to resolve a variable usage with more than one assignment, which in theory could come from different library files.
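
    To illustrate the "more than one assignment" case, here is a small Python sketch that scans a set of library files for @varid values assigned in more than one place. This is not how Publication Manager does it; the library names, the `ph` elements, and the function name `find_ambiguous_variables` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_ambiguous_variables(libraries):
    """Return {varid: [library names]} for varids assigned more than once.

    `libraries` maps a library name to its XML text. Any element carrying
    a varid attribute is treated as a variable assignment.
    """
    assignments = defaultdict(list)
    for name, xml_text in libraries.items():
        for elem in ET.fromstring(xml_text).iter():
            var_id = elem.get("varid")
            if var_id is not None:
                assignments[var_id].append(name)
    return {v: names for v, names in assignments.items() if len(names) > 1}


# Hypothetical libraries: company-name is assigned in both of them.
libraries = {
    "company-lib": '<topic id="c"><body><p>'
                   '<ph varid="company-name">Acme</ph></p></body></topic>',
    "product-lib": '<topic id="p"><body><p>'
                   '<ph varid="product-name">Foo</ph> '
                   '<ph varid="company-name">Acme Corp</ph></p></body></topic>',
}

ambiguous = find_ambiguous_variables(libraries)
print(ambiguous)
```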

    A best practice I once picked up in a user group is to organize your variable assignment libraries by:

    1. company level, such as copyright year and company name
    2. product level, such as product name
    3. publication

    It was also mentioned that variables are easier for localization than conrefs. I also remember a statement: "A condition on a conref is probably a variable".

    My 2 cents
    Dave

  • All good points. The problem is more complicated than I at first imagined. I would agree with taking a forward-thinking approach, that is, from this point forward, update all the references in draft content only.

    The best approach would then be to make a clean break, that is, not to remove some items from the old LT, but to completely abandon the old LT in favor of two (or more) new LTs. That would also make it clearer for any topic whether it's old-style or new-style.

    I also agree that a report-generating plugin would be a great first step. It's hard to quantify the effort involved in a manual task without such a tool. I guess you could very primitively get an impression by doing the following:

    1. Identify the GUIDs of the reusable content elements you would want to split off.
    2. Search for those GUIDs in Browse Repository in Publication Manager and count up the number of hits you get.

    Depending on how many items you want to split off, this could take you several hours, but at the end you'd have a rough idea of the work that would be involved.

  • Hi Dave,
    Those are good points. Essentially I guess I see variables and conrefs as two rather different concepts, though. Conref could replace variables in principle, but not the other way around.

    I mean, variables are used to manage simple textual and numerical values or graphics. For the use cases in your list they are perfect and a much better idea than conrefs. But I suspect variables would not cut it in Mathijs' use case, where bigger chunks of markup are referenced (and also validated at the insertion point, of course).

    Indirect addressing is always a good idea, so use @conkeyref instead of @conref.

    P.S. I have never tried what would happen if a variable resolved to markup. I guess it would just show as a string of text? Not that it is relevant to the questions in this thread...