A Transformers Analogy
Optimus: [offers the Matrix to Sentinel] You were our leader, Sentinel. It is your right to lead us again.
Sentinel: In a world I do not know, I am no longer your teacher, Optimus. You are mine.
Apparently there was no agreement on which Prime should be the leader. The reason is simple: there was no majority.
A highly available Redis needs a leader, and a leader needs a majority to be elected. Use an odd number of instances (a minimum of 3) to make the election possible.
We are going to take a look at two popular options for providing this: Redis Sentinel and Zookeeper. At a high level, the two options work in similar ways:
- Separate processes monitor the actual Redis instances
- One master / leader and multiple slaves / followers
- The master / leader takes write requests while the slaves / followers serve reads (load balancing; see the sketch after this list)
- Should the master / leader fail, one of the slaves / followers is elected as the new master / leader (fail-over)
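To make the read / write split concrete, here is a minimal Jedis sketch. The host names are made up, and the master / slave roles are hard-coded purely for illustration; in a real setup Sentinel or Zookeeper tells you which instance is currently the master.

```java
import redis.clients.jedis.Jedis;

public class ReadWriteSplitSketch {
    public static void main(String[] args) {
        // Hypothetical hosts: in a real setup the master address comes from
        // Sentinel or Zookeeper, not from hard-coded configuration.
        try (Jedis master = new Jedis("redis-master", 6379);
             Jedis replica = new Jedis("redis-slave-1", 6379)) {

            // Writes always go to the master...
            master.set("page:home", "cached home page");

            // ...while reads can be load-balanced across the slaves,
            // which replicate the master asynchronously.
            String cached = replica.get("page:home");
            System.out.println(cached);
        }
    }
}
```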
Redis Sentinel
I like the word "sentinel", which literally means guard. Coming back to the Transformers analogy, Sentinel Prime was the leader of the Cybertron Elite Guard. In the medical world, the word means "an indicator of the presence of disease". Both meanings describe exactly what Redis Sentinel does.
What else would one expect from Redis Sentinel, apart from a healthy Redis!
According to the official Redis Sentinel documentation, it provides the following features:
- Monitoring. Sentinel constantly checks if your master and slave instances are working as expected.
- Notification. Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
- Automatic failover. If a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master, the other additional slaves are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
- Configuration provider. Sentinel acts as a source of authority for client service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address (a small client sketch of this follows the list).
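The configuration provider role is the easiest one to see from the client side. Here is a minimal Jedis sketch that asks a Sentinel for the current master address; the Sentinel host name and the master group name "mymaster" are assumptions:

```java
import java.util.List;
import redis.clients.jedis.Jedis;

public class SentinelDiscoverySketch {
    public static void main(String[] args) {
        // Connect to a Sentinel process (26379 is the conventional Sentinel port),
        // not to a Redis instance itself.
        try (Jedis sentinel = new Jedis("sentinel1", 26379)) {
            // Ask the Sentinel which instance is currently the master
            // for the monitored group named "mymaster".
            List<String> masterAddr = sentinel.sentinelGetMasterAddrByName("mymaster");
            System.out.println("Current master: " + masterAddr.get(0) + ":" + masterAddr.get(1));
        }
        // After a failover the same call returns the address of the newly promoted master.
    }
}
```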
Client Libraries
ServiceStack.Redis
ServiceStack even created a quick Sentinel configuration project on its GitHub and provides an easy guide for setting it up quickly.
StackExchange.Redis
StackExchange.Redis is a free, open-source client library. At the time of writing, it does not officially support Redis Sentinel. The community has contributed to the project and several pull requests have been submitted (an example is here), but they have not been merged into master yet, so we can still expect some time before the support makes it into a release.
You can still specify multiple Redis instance endpoints when initialising the connection, and in the background StackExchange.Redis will distribute the read and write operations automatically. However, you will get exceptions when an operation hits a Redis instance that has failed.
The SDL Web 8.5 .NET Client Interaction Libraries (nuget) depend on StackExchange.Redis (the strongly named version) in their implementation of RedisCacheHandler. We can safely conclude that you will not get Redis Sentinel support for Redis client-side caching in Web 8.5.
Jedis
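Jedis supports Sentinel out of the box through JedisSentinelPool, which resolves the current master via the Sentinels and picks up the new master after a failover. A minimal sketch, with hypothetical Sentinel host names and the assumed master group name "mymaster":

```java
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class JedisSentinelSketch {
    public static void main(String[] args) {
        // The Sentinel endpoints (hypothetical host names), not the Redis instances.
        Set<String> sentinels = new HashSet<>();
        sentinels.add("sentinel1:26379");
        sentinels.add("sentinel2:26379");
        sentinels.add("sentinel3:26379");

        // "mymaster" must match the master group name in sentinel.conf.
        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels)) {
            try (Jedis jedis = pool.getResource()) {
                // The pool resolved the current master via the Sentinels,
                // so this write lands on whichever instance is master right now.
                jedis.set("greeting", "hello");
                System.out.println(jedis.get("greeting"));
            }
            // When Sentinel promotes a new master, the pool picks up the change
            // and subsequent getResource() calls point at the new master.
        }
    }
}
```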
Zookeeper
The concept of Zookeeper predates Redis Sentinel and is not restricted to Redis. You may already be familiar with it if you have set up Solr Cloud and used SI4T (Search Integration 4 Tridion) to serve your website's search requirements.
In the context of Redis, it works in a very similar way to Redis Sentinel. A typical setup still includes 3 Zookeeper nodes (1 leader and 2 followers - just different terminology from master and slave) managing 3 Redis instances. When the Zookeeper nodes agree that the leader has failed, they elect one of the followers to take its place and the other followers are notified of the new leader.
Client Libraries
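There is no single standard Zookeeper-based Redis client. A common pattern is to keep the current master's address in a znode and let clients watch it; whatever component performs the failover updates the znode. The sketch below uses the plain Zookeeper Java client, and the znode path /redis/master as well as the ensemble host names are assumptions:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZookeeperMasterWatchSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the Zookeeper ensemble (hypothetical host names).
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> { });

        Watcher masterWatcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                // Fired when the znode changes, i.e. a new master has been elected.
                System.out.println("Master changed: " + event.getPath());
                // A real client would re-read the znode and reconnect here.
            }
        };

        // Assumed convention: whoever performs the failover writes the
        // current master's "host:port" into this znode.
        byte[] data = zk.getData("/redis/master", masterWatcher, null);
        System.out.println("Current Redis master: " + new String(data));
    }
}
```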
More "serious" usage of Redis
Twemproxy
Essentially, Twemproxy is one way of proxying requests to the backend Redis servers or Redis Sentinels. It was developed and used by Twitter, and you can imagine the size of the Twitter data and the number of client-side servers that made it necessary. It mainly does two things (a client-side sketch follows the list):
- Reducing the number of connections
- Automatic sharding
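From the application's point of view, Twemproxy is just another Redis endpoint: the client opens a single connection to the proxy, and the proxy pools the backend connections and shards the keys. A minimal sketch, with a hypothetical proxy host and the listen port used in Twemproxy's example configuration:

```java
import redis.clients.jedis.Jedis;

public class TwemproxySketch {
    public static void main(String[] args) {
        // The client talks only to the proxy; the actual host and port
        // depend on your nutcracker.yml.
        try (Jedis proxy = new Jedis("twemproxy-host", 22121)) {
            // These keys may land on different backend Redis servers,
            // but the sharding is invisible to the client.
            proxy.set("user:1", "alice");
            proxy.set("user:2", "bob");
            System.out.println(proxy.get("user:1"));
        }
    }
}
```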
Redis Cluster
Since Redis 3.0, Redis Cluster has been the recommended way of doing data sharding. Note, however, that although it improves reliability, Redis Cluster by itself is not a high availability solution.
To achieve higher availability, in a Redis Cluster you can have multiple masters, each with multiple slave replicas to provide failover. Each master-slave set holds a section of the data. Since a majority of masters must be working to promote slaves to new masters, the cluster stops working when a majority of masters have failed.
A nice feature of Redis Cluster compared to Twemproxy is that, when the data size changes or a node fails (basically, some Redis keys need to move from one node to another), it provides node re-balancing without downtime. This is made possible by its introduction of hash slots, which can be moved from one node to another without stopping operations. Clients based on consistent hashing, including Twemproxy, would on the other hand incur some downtime when re-balancing nodes.
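Cluster-aware clients follow the hash slots for you. A minimal Jedis sketch, assuming a cluster whose nodes (hypothetical host names) listen on port 7000; any reachable seed node is enough for the client to discover the full topology:

```java
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class RedisClusterSketch {
    public static void main(String[] args) throws Exception {
        // Seed nodes; the client discovers the remaining masters and
        // their slave replicas from any of them.
        Set<HostAndPort> nodes = new HashSet<>();
        nodes.add(new HostAndPort("redis-node-1", 7000));
        nodes.add(new HostAndPort("redis-node-2", 7000));
        nodes.add(new HostAndPort("redis-node-3", 7000));

        try (JedisCluster cluster = new JedisCluster(nodes)) {
            // Each key maps to one of 16384 hash slots (CRC16(key) mod 16384),
            // and the client routes the command to the master owning that slot.
            cluster.set("article:42", "cached body");
            System.out.println(cluster.get("article:42"));
        }
    }
}
```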
Conclusion
Here you go. We have introduced some concepts of high availability Redis using Redis Sentinel and Zookeeper, then data sharding and proxying to further increase overall reliability. Either way, SDL Web 8.5 CIL currently does not support high availability Redis well. However, it is after all used for client-side caching, and if the cache fails, with your code written in a nice way, the site should still be functional. If your site is implemented based on DXA 1.7, you should not end up with a broken site, at least under a "normal" load.
If a highly available cache is indeed an absolute requirement, my suggestion is to build it into your test plan, so that you have an idea of how the microservices stand up to load in the absence of Redis. Effective monitoring and notifications also need to be set up so that you can fix a failed Redis in time.
Outside SDL CIL, you can achieve a highly available Redis with Sentinel using mature client-side libraries such as ServiceStack.Redis and Jedis. The flexibility is yours.