Find reuse of topics across publications

Hi Developers,

I am trying to get the reuse percentage of topics in a publication.

I am getting all baseline topics and then searching for each topic in all publication baselines in the SDL repository; if an item is found in another baseline, I consider that topic reused. For a publication with 50-100 topics it takes 2-4 hours to get the response, and some of the publications have 1000+ topics, so this approach is not useful. Is there any way to find reuse with ISHRemote or any other module?


Thanks

Roopesh

  • There is some report functionality on the web client (ISHCM), but as far as I know the API doesn't offer anything similar, which means that ISHRemote doesn't offer it either.

    PowerShell is not fast at heavy data processing. I would measure the time the same processing needs when implemented in e.g. .NET, and if that is satisfactory, I would switch to that. Otherwise, treat this as a scheduled, long-running process.
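
    For example, a quick way to get a number to compare against a .NET implementation (just a sketch; the script block stands in for your existing reuse-detection code):

    ```powershell
    # Measure how long the current PowerShell processing takes
    $elapsed = Measure-Command {
        # ... your existing reuse-detection logic here ...
    }
    "Processing took $($elapsed.TotalMinutes) minutes"
    ```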

    If you need to create reports from PowerShell, there are a couple of ways that combine different technologies. I used markdown with the help of the MarkdownPS module that I created. If you are interested in such a path, take a look at these posts:

    - sarafian.github.io/.../ (All markdownps related posts)
    - sarafian.github.io/.../simple-markdown-web-server-for-windows.html (how to easily host markdown pages)

    It works very nicely and easily for a dashboard page I've created.
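
    For illustration, rendering reuse numbers as a markdown page could look roughly like this (a minimal sketch; the $report data is made up, New-MDHeader and New-MDTable are MarkdownPS cmdlets, check the module docs for exact parameters):

    ```powershell
    Import-Module MarkdownPS

    # Hypothetical per-publication reuse numbers
    $report = @(
        [PSCustomObject]@{ Publication = "Pub A"; Topics = 120; Reused = 45 }
        [PSCustomObject]@{ Publication = "Pub B"; Topics = 80;  Reused = 12 }
    )

    # Render a header plus a table and write the page to disk
    $lines = @()
    $lines += New-MDHeader "Topic reuse per publication"
    $lines += $report | New-MDTable
    $lines | Out-File "reuse-report.md"
    ```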
  • Hi Roopesh, 2-4 hours is a lot for publications of the size you refer to. Can you extend your paragraph with some code (or perhaps pseudocode) illustrating your approach? Also, at some point - depending on your data sizes - you should consider intermediate storage to track the counts of all those topic versions versus baselines.
  • Hi Roopesh,

    I don't know if I understood you correctly, but it sounds as if you are downloading all baselines in the repository to get at the information you are looking for. That for sure will take a lot of time.

    As Dave said, you basically need some form of intermediate storage here. In terms of (hopefully) useful pointers, I am thinking along the lines of:

    • Initially download all baseline reports, store them as [baseline_id].xml in a local directory.
    • Get basic metadata (Baseline 2.5 GetMetaData) for all baselines, including the following fields (the first time around you can combine this with step #1; see the sketch after this list):
      • MODIFIED-ON
      • FISHLABELRELEASED
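
    Put together with ISHRemote, the initial download could look roughly like this (a sketch, not tested; Find-IshBaseline, Get-IshBaselineItem and Set-IshRequestedMetadataField exist in ISHRemote, but verify the parameter names against your version):

    ```powershell
    $session = New-IshSession -WsBaseUrl "https://example.com/ISHWS/" -PSCredential "admin"

    # Ask for the two fields we need to track changes later on
    $requestedMetadata = Set-IshRequestedMetadataField -IshSession $session -Name "MODIFIED-ON" |
                         Set-IshRequestedMetadataField -IshSession $session -Name "FISHLABELRELEASED"
    $baselines = Find-IshBaseline -IshSession $session -RequestedMetadata $requestedMetadata

    # Cache every baseline's item list (logical id + version per topic) locally
    $cacheDir = "C:\BaselineCache"
    New-Item -ItemType Directory -Path $cacheDir -Force | Out-Null
    foreach ($baseline in $baselines) {
        Get-IshBaselineItem -IshSession $session -IshObject $baseline |
            Export-Clixml -Path (Join-Path $cacheDir "$($baseline.IshRef).xml")
    }
    ```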

    At the end of this exercise you have all the data available locally, so processing will be fast. Processing the data to yield the reports you need should not be a problem.
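
    The local processing itself could then be as simple as grouping the cached items by logical id (again a sketch; adjust the property names to whatever your export actually contains):

    ```powershell
    # logical id -> set of baseline ids the topic occurs in
    $usage = @{}
    Get-ChildItem -Path $cacheDir -Filter "*.xml" | ForEach-Object {
        $baselineId = $_.BaseName
        foreach ($item in (Import-Clixml -Path $_.FullName)) {
            if (-not $usage.ContainsKey($item.LogicalId)) { $usage[$item.LogicalId] = @{} }
            $usage[$item.LogicalId][$baselineId] = $true
        }
    }

    # A topic counts as reused when it occurs in more than one baseline
    $reused = @($usage.GetEnumerator() | Where-Object { $_.Value.Count -gt 1 })
    $percent = [math]::Round(100 * $reused.Count / $usage.Count, 1)
    "Reused topics: $($reused.Count) of $($usage.Count) ($percent%)"
    ```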

    A little while later it is time to update the reports. Baselines that are released will no longer change, so those never have to be downloaded again. So we can cut that chunk out right away - we simply keep what we already have. For all other baselines, download just the baseline metadata. Chances are only a small subset of active baselines have changed since the last iteration, and those are the only ones you need to download again. The metadata is just a few lines of XML per baseline, so that should not take long.

    You'd have to set up a simple tracking mechanism, of course. But all of this could be done using just a local folder on your computer where you download the stuff and keep your tracker file (JSON or whatever).
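
    Continuing the sketch above, the tracker could be a hashtable of baseline id to the MODIFIED-ON value seen at the last run, persisted as JSON (the note-property access on $baseline is an assumption about how your ISHRemote version surfaces requested metadata; adjust as needed):

    ```powershell
    # Load the tracker from the previous run, if any
    $trackerPath = Join-Path $cacheDir "tracker.json"
    $tracker = @{}
    if (Test-Path $trackerPath) {
        (Get-Content $trackerPath -Raw | ConvertFrom-Json).PSObject.Properties |
            ForEach-Object { $tracker[$_.Name] = $_.Value }
    }

    foreach ($baseline in $baselines) {
        $id = $baseline.IshRef
        $modifiedOn = $baseline.'modified-on'  # note property; adjust to your ISHRemote version
        # Unchanged since the last run (this also skips released baselines we already have)
        if ($tracker[$id] -eq $modifiedOn) { continue }
        Get-IshBaselineItem -IshSession $session -IshObject $baseline |
            Export-Clixml -Path (Join-Path $cacheDir "$id.xml")
        $tracker[$id] = $modifiedOn
    }
    $tracker | ConvertTo-Json | Set-Content $trackerPath
    ```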

    PS: Then there is the question of how you define reuse - I mean, how you turn this information into an easy-to-understand format for, say, C-level managers who are not familiar with modular documentation. As the saying goes, there are "lies, damned lies, and statistics", and that is certainly true for reuse reports, too. Oh, a topic for another time...

    HTH

    Joakim

  • Yes, this is a really good idea - it will save API response time. Thank you for the suggestion; I will create a script to download all the baselines.