Putting Redis to Production - Part II - Availability

A Transformers Analogy 

Optimus: [offers the Matrix to Sentinel] You were our leader, Sentinel. It is your right to lead us again.
Sentinel: In a world I do not know, I am no longer your teacher, Optimus, you are mine.

Apparently there was no agreement on which Prime should be the leader. The reason is simple: there was no majority.

A highly available Redis needs a leader, and a leader needs a majority to be elected. Use an odd number of instances (a minimum of 3) so that a majority is always possible.

We are going to take a look at two popular options for providing this: Redis Sentinel and Zookeeper. At a high level, these two options work in similar ways

  • Separate processes providing monitoring of the actual Redis instances
  • One master / leader and multiple slaves / followers
  • Master / leader takes write requests while slaves / followers take reads (load balancing)
  • Should the master / leader fail, one of the slaves / followers will be elected as the new master / leader (fail-over)

Redis Sentinel

I like the word "sentinel", which literally means guard. Coming back to the Transformers analogy, Sentinel Prime was the leader of the Cybertron Elite Guard. In the medical world, the word means "an indicator of the presence of disease". That is exactly what Redis Sentinel does.

What else does one expect with Redis Sentinel apart from a healthy Redis!

According to the Redis Sentinel official documentation, it provides the following features

  • Monitoring. Sentinel constantly checks if your master and slave instances are working as expected.
  • Notification. Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
  • Automatic failover. If a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master, the other additional slaves are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
  • Configuration provider. Sentinel acts as a source of authority for clients service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.

A typical setup of Redis Sentinel is to have 3 Sentinels (with 3 Redis instances): 1 to manage the master Redis instance and 2 to manage the 2 slave Redis instances. You can run a Sentinel and the Redis instance it manages on the same server. However, if you are concerned that a server failure would bring down both the Sentinel and the Redis instance, you can put the Sentinels on separate servers. The Redis Sentinel official documentation describes and analyses different setups.
 
When you set up Redis Sentinels, part of the configuration is the "quorum". This is the number of Sentinels that have to agree that the master Redis instance has failed before the failover process starts. Once the failover process has started, the rule of "majority" kicks in to elect the new master.
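As a sketch of what that configuration looks like, here is a minimal sentinel.conf for the three-Sentinel setup described above (the master name mymaster and the addresses are placeholders, and the same file would be deployed to each Sentinel):

```conf
# Monitor the master named "mymaster" at 192.168.1.10:6379.
# The final number is the quorum: 2 Sentinels must agree the
# master is down before the failover process starts.
sentinel monitor mymaster 192.168.1.10 6379 2
# Consider the master down after 5 seconds without a valid reply
sentinel down-after-milliseconds mymaster 5000
# Abort a failover attempt that takes longer than 60 seconds
sentinel failover-timeout mymaster 60000
# Reconfigure at most 1 slave at a time to sync with the new master
sentinel parallel-syncs mymaster 1
```

Note that only the master is declared explicitly: Sentinels discover the slaves, and each other, by querying the master.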

Client Libraries

ServiceStack.Redis 

This is a commercial product with a free quota of 20 Types (think of using Redis to store your strongly typed C# objects) and 6,000 requests per hour with the Redis client.
 
ServiceStack.Redis has nice Sentinel support. The idea is that, instead of creating a Redis client that connects to a Redis instance directly, you create a RedisSentinel instance specifying the Sentinel endpoints and the master endpoint. This allows Sentinel to be a true configuration provider: the client application does not need to know where the Redis instances are, or how and which slave is promoted to master during a failover.

ServiceStack has even created a quick Sentinel configuration project on its GitHub and provides an easy guide for setting it up quickly.

StackExchange.Redis

StackExchange.Redis is a free open-source client library. At the time of writing, it does not have official support for Redis Sentinel. The community has made the effort of contributing to this open-source project and several pull requests have been submitted (an example is here). However, these have not been merged into master, so we can still expect some time before the support makes it into a release.

You can still specify multiple Redis instance endpoints to initialise the connection, and in the background StackExchange.Redis will distribute the read and write operations automatically. However, you will get exceptions when an operation hits a Redis instance that has failed.
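As an illustration (host names here are placeholders), the connection is initialised with a comma-separated configuration string listing all the endpoints; the abortConnect=false option tells the client to keep retrying in the background rather than fail at startup when an endpoint is unreachable:

```
redis0:6379,redis1:6379,redis2:6379,abortConnect=false
```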

SDL Web 8.5 .NET Client Interaction Libraries (NuGet) depend on StackExchange.Redis (the strongly named version) in their implementation of RedisCacheHandler. We can safely conclude that you will not get Redis Sentinel support for Redis client-side caching in Web 8.5.

Jedis 

A free Java client library that supports Redis Sentinel. Use JedisSentinelPool, passing the master name and the Sentinel endpoints. The documentation is not great, but it is a library with a really small footprint and it is easy to use.
 
SDL Web 8.5 Java Client Interaction Libraries use Jedis for Redis caching. However, the options in cd_client_conf.xml are limited to Redis host names, and I doubt that the implementation supports Redis Sentinel. I have not investigated undocumented options in depth in case there are any, but if it is not in the documentation, it does not exist!

Zookeeper

The concept of Zookeeper predates Redis Sentinel, and it is not restricted to Redis. You may already be familiar with it if you have set up SolrCloud and used SI4T (Search Integration 4 Tridion) to serve your website's search requirements.

In the context of Redis, it works in much the same way as Redis Sentinel. A typical setup still includes 3 Zookeeper nodes (1 leader, 2 followers - just different terminology from master and slave) managing 3 Redis instances. When the Zookeeper nodes agree that the leader has failed, they elect one of the followers to the leader position, and the other followers are notified of the new leader.
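As a sketch of the ensemble side of that setup (host names and the data directory are placeholders), each of the three Zookeeper nodes carries an essentially identical zoo.cfg:

```conf
# zoo.cfg - the same ensemble definition on all three nodes
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper
clientPort=2181
# The ensemble: port 2888 for peer communication, 3888 for leader election
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each node additionally needs a myid file in its dataDir containing just its own server number (1, 2 or 3), so it knows which server.N line it is.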

Client Libraries

There are many Zookeeper client libraries, such as Apache Zookeeper (Java and .NET). I have yet to find a good Redis client that works with Zookeeper, except for a Ruby client called Redis_Failover. The lack of Java and .NET client libraries makes Zookeeper a second choice for me when I want a high-availability Redis solution in the context of SDL Web. However, since Zookeeper is a generic technology for achieving high availability, as opposed to being restricted to Redis, you may want to unify your solution by writing your own Redis client library that communicates with Zookeeper using the existing Zookeeper libraries.

More "serious" usage of Redis

If you have a very large Redis database with many client connections, you may want to add the following two things to your architecture picture. I had considered these in projects, but have not had any real-life project that made using them worthwhile. It doesn't hurt to write some theory, though :).

Twemproxy

Essentially, Twemproxy is one way of proxying requests to the backend Redis servers or Redis Sentinels. It was developed and used by Twitter, and you can imagine the size of the Twitter data and the number of client-side servers that made this necessary. It mainly does two things

Reducing number of connections

Twemproxy can "pipeline" and forward requests from different clients in batches to the backend Redis. This results in far fewer connections and increases the throughput and backend reliability, even though it adds some overhead to the client-server round trip.

Automatic sharding

Before Redis Cluster was production ready, people relied on client-side sharding for large datasets (i.e. some data is served from and written to one Redis server, and the rest from another). Twemproxy was one of the proven ways of providing sharding: the proxy automatically forwards each request to the right shard.
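To make both points concrete, here is a minimal sketch of a Twemproxy (nutcracker) configuration that shards across three Redis servers (the pool name, addresses and weights are placeholders):

```yaml
# nutcracker.yml - one pool sharding across three Redis servers
alpha:
  listen: 127.0.0.1:22121        # clients connect here instead of to Redis
  hash: fnv1a_64                 # hash function applied to each key
  distribution: ketama           # consistent hashing across the shards
  redis: true                    # speak the Redis protocol, not memcached
  auto_eject_hosts: true         # temporarily drop a failed shard
  server_retry_timeout: 30000    # retry an ejected shard after 30 seconds
  server_failure_limit: 2        # eject after 2 consecutive failures
  servers:                       # host:port:weight
    - 192.168.1.11:6379:1
    - 192.168.1.12:6379:1
    - 192.168.1.13:6379:1
```

Clients then treat 127.0.0.1:22121 as if it were a single Redis server, while the proxy multiplexes connections and routes each key to its shard.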

Redis Cluster

Since Redis 3.0, Redis Cluster has been the recommended way of doing data sharding. But note that, although it will improve Redis reliability, Redis Cluster itself is not a high-availability solution.

To achieve higher availability, in a Redis Cluster you can have multiple masters, each of which has multiple slave replicas to provide failover. Each master-slave set holds a section of the data. Since a majority of working masters is needed to elect slaves into new masters, the cluster stops working when a majority of the masters have failed.

A nice feature of Redis Cluster compared with Twemproxy is that, when the data size changes or a node fails (basically, some Redis keys need to move from one node to another), it provides node re-balancing without downtime. This is made possible by its introduction of hash slots, whose movement from one node to another does not require operations to be stopped. Clients based on consistent hashing, including Twemproxy, on the other hand, would cause some downtime when re-balancing nodes.
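To make the hash-slot idea concrete, here is a small self-contained sketch of the mapping a cluster-aware client performs: the slot for a key is the CRC16-CCITT (XMODEM) checksum of the key modulo 16384. The class and method names are mine; the resulting slot numbers can be cross-checked against a real cluster with the CLUSTER KEYSLOT command.

```java
// Sketch of Redis Cluster's key-to-slot mapping: slot = CRC16(key) mod 16384.
// Redis uses the CRC16-CCITT (XMODEM) variant: polynomial 0x1021, initial value 0.
public class ClusterSlot {

    static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;           // feed the next byte, MSB first
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;                // keep it a 16-bit value
            }
        }
        return crc;
    }

    // The slot number decides which master-slave set serves the key
    static int slot(String key) {
        return crc16(key.getBytes(java.nio.charset.StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        System.out.println("foo -> slot " + slot("foo"));
        System.out.println("bar -> slot " + slot("bar"));
    }
}
```

Real cluster clients also honour "hash tags": if a key contains a {...} section, only that section is hashed, which is how related keys can be forced into the same slot for multi-key operations.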

Conclusion

Here you go. We have introduced some concepts of high-availability Redis using Redis Sentinel and Zookeeper, then additional data sharding and proxying to further increase the overall reliability. Either way, SDL Web 8.5 CIL does not currently support high-availability Redis well. However, it is after all used for client-side caching, and if the cache fails, with your code written in a defensive way, the site should still be functional. If your site is implemented based on DXA 1.7, you should not end up with a broken site, at least under a "normal" load.

If a highly available cache is indeed an absolute requirement, my suggestion is to build it into your test plan, so you have an idea of how the microservices stand up to load in the absence of Redis. Effective monitoring and notifications also need to be set up so that you can fix a failed Redis in time.

Outside SDL CIL, you can achieve a highly available Redis with Sentinel and mature client-side libraries such as ServiceStack.Redis and Jedis. The flexibility is yours.