Need help extracting a large number (8-25k) of files from older SDL (11.1)

Hi all from an SDL novice - my group needs to extract a large number of files (up to 25k) out of SDL (an older version, I believe 11.1?) so they can be placed on an FTP server. We have a tight timeline and budget, and we don't know of a script or any quick way to extract more than a small number of files at a time (maybe 20?) manually.

Does anyone have any experience with this or know of a script that can be used? Any help or input is appreciated. Thanks

Parents
  • Hey

    Not sure if you mean all versions/languages/resolutions of these objects, but what some clients have done is create a temporary/dummy publication into which they insert the maps that reference the objects they need, possibly by creating a temporary 'supermap'.
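
    Purely to illustrate what a 'supermap' is structurally (a map whose entries reference the existing maps), here is a minimal sketch that generates one from a list of map references. The GUID-style hrefs are placeholders, and in practice you would insert the maps through the client so the references resolve against the repository.

        # Minimal sketch only: a 'supermap' is simply a DITA map whose entries
        # reference the existing maps. The GUID hrefs below are placeholders; in
        # practice you would insert the maps through the SDL client so that the
        # references resolve against the repository.
        import xml.etree.ElementTree as ET

        map_refs = ["GUID-MAP-0001", "GUID-MAP-0002", "GUID-MAP-0003"]  # placeholder map references

        supermap = ET.Element("map")
        ET.SubElement(supermap, "title").text = "Temporary extraction supermap"
        for ref in map_refs:
            # format="ditamap" marks the target as a map rather than a topic
            ET.SubElement(supermap, "topicref", href=ref, format="ditamap")

        ET.ElementTree(supermap).write("supermap.ditamap", encoding="utf-8", xml_declaration=True)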

    Make sure you select the appropriate versions of the objects so that the baseline is up to date.

    Assuming you need clean DITA XML (not file system resolved), you could then use the publication report in the web client to export the data to the file system. In this publication report you can specify the language(s) that you want to export.

    Depending on the structure of your data, it may make sense to create a couple of such dummy publications.

    If you use variables/conrefs, you obviously have to make sure they are part of your publication.

    Best regards
    Kurt

Children
  • Thanks for your info Kurt! Bear with my novice question: will creating that dummy publication/supermap allow me to extract the thousands of content files at once (even if that means creating two dummy publications)? And then a script would not be needed?
  • And yes, I believe all versions/languages of the objects, as there are few in use, if that makes sense.
  • Hey Susan,

    The report is available in the Web Client > publications folder > select a publication version > click "Reports". You then get a dialog where you can specify a resolution for the images and a language. Then click the "Show Report" button; a new window will appear with all objects listed. At the top of the screen, you can then click "Export Publication". This starts a background process and exports all objects in one go onto the file system on the server.

    There are some things to consider:
    - if you need multiple resolutions, you should run the report once for each resolution (XML files will be exported as well, but you can ignore these).
    - you need to repeat this per language
    - only the versions specified in the baseline get exported
    - don't make the dummy publication too large as the loading of the report in the browser may take a while
    - by creating a couple of dummy publications you will be able to 'organize' the objects you export in a somewhat meaningful manner

    The above does not require a script as it uses existing functionality and some manual work to create and export the publications.

    An alternative is to create a custom script using the KC APIs. But in that case you also need a way to identify the objects you need.
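
    To give an idea of the shape such a script could take, here is a minimal sketch. The kc_client object and its method names are placeholders for whatever wrapper you put around the KC web services, not the actual API names.

        # Rough sketch only. 'kc_client' and its methods are placeholders for a
        # wrapper around the KC web services; they are NOT the actual API names.
        import os

        def export_objects(kc_client, object_ids, languages, out_dir):
            """Download each object in each requested language to out_dir."""
            for obj_id in object_ids:
                for lang in languages:
                    # placeholder call: fetch the stored content for one object/language
                    data = kc_client.get_object_content(obj_id, language=lang)
                    target = os.path.join(out_dir, lang, obj_id + ".xml")
                    os.makedirs(os.path.dirname(target), exist_ok=True)
                    with open(target, "wb") as f:
                        f.write(data)

        # Usage (placeholder values):
        # export_objects(kc_client, ["GUID-0001", "GUID-0002"], ["en", "de"], r"D:\export")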

    Hope this helps.

    Best regards,
    Kurt
  • Thanks so much for your details and time, Kurt. I am going to share this with our IA and see what we can do. My initial goal was to find a script or a script how-to for this giant extraction, but maybe this method can avoid that. If OK with you, I will circle back later either with a follow-up confirming that this will work, or to see what other info I can get from your team. Thx again
  • Susan,

    A couple of things to consider/balance:
    - Creating a publication or series of publications that contain 25,000 objects or more could take considerable time.
    - If all the objects are already linked into maps, then you can fairly easily link all the maps into a new master map for a publication.
    - If the objects are not linked into maps, then you may spend more time getting everything set up in publications/maps than you would creating a script with the API.
    - The benefit of a script written against the API is greater flexibility to adjust what content you extract, and the script is reusable for other sets of content.
    - If you go the route of creating publications and later want to grab more content, you will be updating the pubs by hand every time, which is a very manual process compared to changing the script's configuration and re-running it.
    If you could explain how you are going to identify the content that needs to be extracted, we can better point you in one direction or another. For example:
    - Are you grabbing all the content from a root folder and all of its subfolders, recursively?
    - Is it all content that has certain metadata set?
    - All content created after or before a certain date?
    - Any combination of the above?
    These are all scenarios that could be scripted with the API; a rough sketch follows below.
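
    A rough sketch of how such selection criteria could be combined in a script. The kc_client calls and the metadata field names are placeholders, not the real KC API; they only show the shape of the logic.

        from datetime import datetime

        def collect_object_ids(kc_client, root_folder, created_after=None, required_meta=None):
            """Walk a folder tree and return IDs of objects matching the filters."""
            selected = []
            for folder in kc_client.walk_folders(root_folder):      # placeholder: recursive folder walk
                for obj in kc_client.list_objects(folder):          # placeholder: objects in one folder
                    if created_after and obj.creation_date < created_after:
                        continue
                    if required_meta and any(obj.metadata.get(k) != v for k, v in required_meta.items()):
                        continue
                    selected.append(obj.id)
            return selected

        # Example: everything under one root folder, created after a date, with a given status
        # ids = collect_object_ids(kc_client, "Publications/ProductX",
        #                          created_after=datetime(2017, 1, 1),
        #                          required_meta={"status": "Released"})
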
    If you are going to pick and choose the objects, then there is really no good way to use the API and the publication route is your only option.
    Hope this helps.
  • Hi,
    With regard to your comment about clean DITA XML: as per my understanding (please comment/correct as needed), there are two ways to export the content:

    1) As resolved DITA XML, in which case the exported source would not contain ishcondition or conref attributes, to mention two examples. In this export, elements with ishcondition attributes are kept or dropped based on the publishing context, content references are resolved (so there are no conrefs left in the resulting XML), variable placeholders are filled in, and so on. (The result is valid DITA XML.)

    2) As SDL custom DITA, that is, the exported content directly reflects what is stored in the database. To use this content in another system would require some reverse engineering, e.g. implementing ditaval instead of ishcondition, and implementing keyrefs instead of variables.

    Or, then again, another system may introduce some spices of its own that in some way or other extend the core DITA feature set, as laid out in the OASIS specification(s).

    Using either alternative, there is some rework to be done.
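
    For alternative 2, a quick way to gauge how much rework is involved is to scan the exported XML for SDL-specific constructs such as ishcondition attributes, and for conrefs. A minimal sketch, assuming the export sits under a local folder (the path is an assumption):

        # Minimal sketch: inventory ishcondition attributes and conrefs in the
        # exported XML to gauge how much rework (e.g. ishcondition -> ditaval) is needed.
        import glob
        import xml.etree.ElementTree as ET
        from collections import Counter

        conditions = Counter()
        conref_count = 0

        for path in glob.glob("export/**/*.xml", recursive=True):   # assumed export location
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError:
                continue
            for elem in root.iter():
                if "ishcondition" in elem.attrib:
                    conditions[elem.attrib["ishcondition"]] += 1
                if "conref" in elem.attrib:
                    conref_count += 1

        print("Distinct condition expressions:", len(conditions))
        print("Elements carrying ishcondition:", sum(conditions.values()))
        print("conref references:", conref_count)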

    Thanks
    Joakim