So much has changed since the last post, and maybe the best change is that my English should have improved slightly, since I have been in San Francisco for more than a year, working at PunchTab.
First of all, to set the context, PunchTab provides a loyalty program for website publishers. After a publisher installs a code snippet on their website, their users earn points for their visits, likes, tweets, +1’s and comments. These points can then be redeemed for prizes like Starbucks coupons. Long story short, I’ve installed the PunchTab loyalty program on this blog so you can see it in action.
As a company, PunchTab explicitly focuses on building a great product, and the performance of the product is a big part of the experience. Redis has always helped improve the performance of parts of our loyalty program.
But after using it as a basic cache engine, our needs led us to use it more extensively. So this post presents the main features of Redis and our use cases. The power of Redis is its data structures. You can find all the documentation for them in the command documentation.
To follow the Redis documentation, I’m going to start with the key commands. All the data you store is referenced by a key. You can set or remove an expiration time on each key, search for keys using glob-style patterns, delete them, etc. All the classic stuff for a cache engine.
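As a rough sketch of those key commands from Python: the functions below assume `client` is a redis-py client (for example `redis.Redis()`), and the function names are hypothetical. Note that `scan_iter` is redis-py's helper around SCAN, used here instead of KEYS so that a large keyspace is walked incrementally; that choice is mine, not something this post relies on.

```python
def cache_with_ttl(client, key, value, seconds):
    """Store a value that Redis expires by itself after `seconds`."""
    # SET with the EX option sets the value and its expiration in one call.
    client.set(key, value, ex=seconds)


def remaining_ttl(client, key):
    """Return the number of seconds left before `key` expires (TTL command)."""
    return client.ttl(key)


def delete_matching(client, pattern):
    """Delete every key matching a glob-style pattern such as 'view:*'."""
    deleted = 0
    # scan_iter yields matching keys a batch at a time (SCAN under the hood).
    for key in client.scan_iter(match=pattern):
        deleted += client.delete(key)
    return deleted
```

Calling `delete_matching(client, "view:*")` would then drop every cached entry whose key starts with `view:` and return how many keys were removed.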
The first data structure we used is the string. That’s what Django uses when you set Redis as the cache backend, and it’s what cache engines commonly use. We use it to cache Django views and to store results that are expensive to compute. Nothing new on this side.
Where it starts to get interesting is that a string can be treated as an integer or a float. Redis provides the INCR and INCRBY commands to increment integer values (with DECR and DECRBY to decrement them, and INCRBYFLOAT for floats). Even more interesting, these commands return the new value. You can then define really efficient counters, and that’s what we use to synchronize parallel tasks while building our leaderboard.
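To make the counter idea concrete, here is a minimal sketch of one way parallel tasks could be synchronized with INCR. It is an illustration, not our actual code: `client` is assumed to be a redis-py client, and `mark_task_done`, `job_key` and `total_tasks` are hypothetical names.

```python
def mark_task_done(client, job_key, total_tasks):
    """Record that one parallel task finished; return True for the last one.

    `client` is assumed to be a redis-py client (e.g. redis.Redis()).
    """
    # INCR is atomic and returns the new value, so exactly one caller
    # observes the counter reaching total_tasks.
    finished = client.incr(job_key)
    return finished == total_tasks
```

Each worker calls `mark_task_done` when it finishes, and only the worker that gets `True` back goes on to the next step, without any lock.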
If you want to store an object or dictionary that you will access all at once, use a hash. It is more efficient than storing each attribute under a different key. You can think of a hash as a dictionary of strings (so you can increment a particular field, for instance). You can get the whole stored dictionary for a key with HGETALL, or just a particular field with HGET. That’s what we use for our leaderboard to store the last activity of each user (the field) for each publisher (the key) as a JSON-serialized string (the value). We could have used the publisher/user pair as a key and avoided the serialization, but the goal was to keep the number of keys from growing.
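The layout described above (publisher as the key, user as the field, JSON as the value) might look like this with redis-py. This is a sketch under assumptions: `client` is a redis-py client, and the `last_activity:<publisher_id>` key naming and both function names are made up for the example.

```python
import json


def save_last_activity(client, publisher_id, user_id, activity):
    """Store a user's last activity as a JSON string in the publisher's hash."""
    key = f"last_activity:{publisher_id}"  # hypothetical key naming
    client.hset(key, user_id, json.dumps(activity))


def load_last_activity(client, publisher_id, user_id):
    """Return the stored activity as a dict, or None if there is none."""
    raw = client.hget(f"last_activity:{publisher_id}", user_id)
    # HGET returns None for a missing field; json.loads accepts str or bytes.
    return json.loads(raw) if raw is not None else None
```

One hash per publisher means the number of keys grows with the number of publishers, not with the number of publisher/user pairs.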
A set is an unordered collection where each value is unique. You can blindly add an element with SADD: if it already exists, it will not be duplicated. You can then get the content of a set with SMEMBERS but, even better, you can check whether an element is in a set with SISMEMBER. Finally, you have the usual set operations like union, intersection and difference. We use a set to store the list of opted-out users: a publisher can remove some users from their loyalty program by opting them out. We needed a really fast way to check whether a user is opted out while storing an activity. So instead of querying a SQL database, or fetching a whole cached list and then doing something like "if user in opted_out_users" on the Django side, we save time by asking Redis directly with SISMEMBER.
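A minimal sketch of that opt-out check, assuming `client` is a redis-py client; the `opted_out:<publisher_id>` key naming and the function names are hypothetical.

```python
def opt_out(client, publisher_id, user_id):
    """Add a user to the publisher's opt-out set (no-op if already there)."""
    client.sadd(f"opted_out:{publisher_id}", user_id)


def is_opted_out(client, publisher_id, user_id):
    """Ask Redis whether the user is in the set, without fetching the set."""
    # SISMEMBER is a single membership lookup on the server side.
    return bool(client.sismember(f"opted_out:{publisher_id}", user_id))
```

The activity-storing code can then simply return early when `is_opted_out(...)` is true.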
Sorted sets are called ZSET in Redis. We started to use them when we needed to optimize the leaderboard. And actually, ZSETs are leaderboards. A ZSET contains unique members, like a SET, with a score associated with each member. For each member, you can set the score (ZADD) or increment it (ZINCRBY). Then you can retrieve the score (ZSCORE) and the rank of a member (ZRANK), or get a range of members with their scores in rank order (ZRANGE, or ZREVRANGE to get the highest scores first). It becomes easy to get a top 10 or the members surrounding a particular user. Everything we needed for our leaderboard.
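Here is a small sketch of leaderboard operations on a ZSET, assuming `client` is a redis-py client (version 3 or later, where `zincrby` takes the amount before the member). The function names and the `board` key are hypothetical, and I use the ZREV* variants so that rank 1 is the highest score.

```python
def add_points(client, board, user_id, points):
    """Increment a member's score; ZINCRBY creates the member if needed."""
    return client.zincrby(board, points, user_id)


def top_members(client, board, count=10):
    """Return the `count` best members as (member, score) pairs."""
    return client.zrevrange(board, 0, count - 1, withscores=True)


def rank_of(client, board, user_id):
    """Return the 1-based rank of a member, or None if it is not ranked."""
    rank = client.zrevrank(board, user_id)  # 0-based, highest score first
    return None if rank is None else rank + 1


def neighbours(client, board, user_id, radius=2):
    """Return the members ranked just above and below a given member."""
    rank = client.zrevrank(board, user_id)
    if rank is None:
        return []
    start = max(rank - radius, 0)
    return client.zrevrange(board, start, rank + radius, withscores=True)
```

With a real redis-py client the members come back as bytes unless the client was created with `decode_responses=True`.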
I was about to forget a really useful command to see what’s happening: MONITOR. Type it in your Redis client and you will see every command run on your server. Perfect while developing, or to see what’s happening in production for a short time.
I have only talked about what we use at PunchTab, but there are other interesting features, like the Publisher/Subscriber pattern, which I tried briefly to implement a chat system with socket.io and gevent during a tutorial at PyCon this year.
The last two important things I want to highlight are the in-memory model and complexity. As I like to say, complexity may be the most important thing I learned at school, because the biggest problems I’ve encountered so far were related to it. Fortunately, each Redis command is documented with its complexity. To me, it shows that the developers really know their business, and it gives you a good idea of how your Redis will scale or how to architect it (like splitting your data logically across different servers).
Concerning the in-memory model, you have to know that Redis stores everything in memory and syncs to disk so the data survives restarts and crashes. To keep it efficient you obviously cannot let it swap, so always be careful to use the best-suited data structure, both to be fast and to consume the least memory. For instance, a hash with 20 fields will be more memory-efficient than 20 keys.
Just have a look at the documentation to see that Redis is far more than a cache engine. It’s an in-memory database which is really well suited to specific purposes, like leaderboards.