Downloading large outputs using Get-IshPublicationOutputData is slow; is there anything I can do to speed it up?

I am currently using Get-IshPublicationOutputData to download output from a bunch of publications. I use this with PowerShell's ForEach-Object -Parallel, which executes jobs in parallel without waiting for the current item in the loop to complete. This works quite well for outputs that are relatively small, less than 100MB. I can download 300+ distinct outputs in a couple of minutes.
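Roughly, the loop looks like this (simplified sketch; the variable names, folder path, and the way I collect the publication output objects are placeholders):

```powershell
# Simplified sketch of the parallel download loop; $ishSession is assumed
# to come from an earlier New-IshSession call, and $outputs to hold the
# publication output objects retrieved earlier.
$outputs | ForEach-Object -Parallel {
    $session = $using:ishSession   # pass the authenticated session into the runspace
    Get-IshPublicationOutputData -IshSession $session `
                                 -IshObject $_ `
                                 -FolderPath 'C:\Temp\PubOutputs'
} -ThrottleLimit 5
```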

However, we also have some outputs that are 1GB+, and when trying to download those outputs using Get-IshPublicationOutputData, it is painfully slow to the point of being unusable. Is there anything I can do to speed this up?

  • Hi Mic-uh

    The magic behind Get-IshPublicationOutputData is an HTTP request to the server, probably a secured SOAP request depending on your server version. All your parallel requests arrive at the application server to execute, and for that the server needs CPU/memory available. The application server then goes to the database server to retrieve your small-medium-large files, and for that the database server also needs CPU/memory available. Bigger requests (stretches of 1GB of memory) take longer than small requests. And on these servers you are not alone; you have colleagues who are potentially doing the same or worse.

    So, as often in IT, there is always a next bottleneck... typically maxed-out CPU, memory, or I/O throughput. The system will cope with queueing and retries, but at some point you hit an absolute timeout.

    Based on the above description, for your environment you'll need to find the balance in how many requests the server can run in parallel... leave some room for your colleagues... often, for throughput, fewer parallel requests can still mean you arrive at the finish line faster.

    Best wishes,
    Dave

  • Hey Dave, thanks for the answer. Is there some fundamental difference between the API call and going into the web GUI to download the publication?
  • By default, PowerShell's ForEach-Object -Parallel runs at most 5 script blocks concurrently. You can raise this with the -ThrottleLimit parameter to allow more concurrent downloads, for example -ThrottleLimit 10 for up to 10 parallel downloads at a time. Be cautious not to set it too high, as it may overwhelm the system.
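    As a quick illustration of -ThrottleLimit (nothing ISHRemote-specific, just the throttling behaviour; requires PowerShell 7+):

    ```powershell
    # Only the throttle limit changes how many script blocks run at once;
    # the results are the same either way.
    $results = 1..10 | ForEach-Object -Parallel {
        Start-Sleep -Milliseconds 100   # stand-in for a slow download
        $_ * 2
    } -ThrottleLimit 3                  # at most 3 runspaces at a time (default is 5)
    ($results | Sort-Object) -join ','  # → 2,4,6,8,10,12,14,16,18,20
    ```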

  • Yes there is a difference, and it was tuned over the various Tridion Docs versions. In essence the Public SOAP API has no easy notion of streaming binaries from the database server, over the web/application server, over the network to the client (your PowerShell session). So it does chunked retrieval, fetching blocks of data. The chunk size can be controlled client side (via $ISHRemoteSessionStateIshSession.ChunkSize, which defaults to 10485760 bytes, so 10MB), but the chunks mostly affect the server: requesting larger chunks pressures the shared web/app server, eventually leading to Out-Of-Memory errors.
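    If you want to experiment with the chunk size client side, something like this is the idea (hypothetical sketch; it assumes a session was created with New-IshSession and that the ChunkSize property is writable on your ISHRemote version):

    ```powershell
    # Hypothetical: inspect and lower the chunk size on the current session.
    # $ISHRemoteSessionStateIshSession holds the session stored by New-IshSession.
    $session = $ISHRemoteSessionStateIshSession
    $session.ChunkSize                 # default: 10485760 bytes (10MB)
    $session.ChunkSize = 5242880       # try 5MB chunks to reduce server memory pressure
    ```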

    The web browser, and Private web API behind it, do have better streaming capabilities end-to-end. So a web browser will show you a continuous stream of incoming data in the progress bar; while in Publication Manager you see the progress bar move in chunks/jumps. 

  • The magic of -Parallel happens on the client machine, your laptop. Making your client machine trigger 10 parallel requests (e.g. 10 times 'Print a 1000 page PDF') will not make the server go faster in handling those 10 requests. Just to be clear: firing 10 requests in parallel from the client is fast... but the actual execution happens on the server, which has its own limitations.
