Features for finding fuzzy matches of content in the SDL CCMS to facilitate and drive content reuse

I'd like to hear about any plans to integrate Acrolinx's capabilities more deeply into SDL Tridion DX, both for generating metrics and for identifying fuzzy matches and refactoring content for reuse. Right now, Acrolinx is available only in the SDL Knowledge Center Content Editor (which, to my regret, we don't have). It would be really useful to have Acrolinx integrated into the SDL authoring bridge for SDL Tridion Docs as well. I remember an initial announcement of an integration with SDL a couple of years ago, but not much detail since. People in my org are interested in combining Acrolinx with Schematron as part of our quality assurance, so I've been looking into the possibilities.

We now have Acrolinx client plug-ins for our writers, but that doesn't help us much beyond the initial authoring moment. Acrolinx also has a number of really useful features that are not SDL-aware, such as batch-checking a set of content. I can see batch-checking being useful on subsets of content in the CCMS. I can understand your reluctance to enable batch-checking across the entire CCMS because of system load, just as with data warehouse operations in the CCMS generally. But Oracle databases, at least, give you several ways to ship data to a data warehouse server without affecting OLTP operations in the CCMS, such as Data Pump or transportable tablespaces. Keeping multiple data instances in sync to support both decision support (DSS) and transaction processing is kind of an Oracle thing. And although you are on AWS by default, that could be with Oracle DB instances.

This is the information I have regarding Acrolinx with SDL currently:


As I understand it, there is currently no standard feature for batch-checking content in SDL Knowledge Center, and I didn't hear Acrolinx come up at all on the Roadmap. I also have not heard of any plans to integrate Acrolinx's reuse module, which, in my new role as the reuse evangelist for my group, I think could be of real assistance:


We may be able to do some things on the Acrolinx server end too, of course (sentence bank/reuse repository, which is now just within our reach with Acrolinx 4.3), but it would be interesting to know if you have plans to leverage your partner vendor's capabilities.
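
To make the fuzzy-match idea a bit more concrete, here is a rough sketch of the kind of check I have in mind for a subset of content. It is not how Acrolinx or SDL do it; it just uses Python's standard difflib on sentences that have already been exported from the CCMS. The topic IDs, the sample sentences, and the 0.85 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def fuzzy_pairs(sentences, threshold=0.85):
    """Return (ratio, id_a, id_b) for every pair of sentences whose
    similarity ratio is at or above the threshold, best matches first."""
    # Normalize case and whitespace so trivial differences don't hide a match.
    normalized = {key: " ".join(text.lower().split())
                  for key, text in sentences.items()}
    matches = []
    for (id_a, a), (id_b, b) in combinations(normalized.items(), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            matches.append((round(ratio, 2), id_a, id_b))
    return sorted(matches, reverse=True)


# Hypothetical sentences exported from a few topics (IDs are invented).
sample = {
    "topic-a#s1": "Click Save to store your changes.",
    "topic-b#s4": "Click Save to store the changes.",
    "topic-c#s2": "Restart the server after installation.",
}

for ratio, id_a, id_b in fuzzy_pairs(sample):
    print(ratio, id_a, id_b)
```

Comparing every pair like this only scales to small subsets, which is one more reason to run this sort of thing outside the production CCMS.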

  • Oracle Text is another option. It can use SQL queries on an Oracle database to find fuzzy matches and other strings inside character large objects (CLOBs, such as stored document text) and to classify documents into category sets. I think the best option would be to run an Oracle Data Pump operation to copy the DB to an external data warehouse repository, and then run the indexing there. But I know AWS is the common default for many of your customers. There are other tools out there (Oracle or 3rd party) that could manage similar operations using Hadoop/Hive, though of course we are talking about operations external to SDL at that point.
  • Hi Doug,

    Thank you for your suggestions and insights.

    I want to correct one statement: SDL Knowledge Center and SDL Tridion Docs both support the Acrolinx Plugins for XMetaL Author and oXygen Author. Both authoring tools support batch capabilities with DITA Maps checked out from the repository. I have added a comment to the article on the Acrolinx site.

    Best regards
