
Redis Experimentation: Full List Cache versus the Redis Sorted Set

Posted on: 2015-10-15

I am improving the performance of a system right now with Redis and the StackExchange.Redis library. One particular case was that I needed to cache a list of data ordered by rank, based on a value that changes often. One requirement is that two items can have the same rank. For example:

 Rank - Data   - Value
 1    - User 1 - 100
 2    - User 2 - 99
 3    - User 4 - 99
 4    - User 5 - 97

The data is in fact a serialized object that contains multiple classes. To keep this article light, I will just use a string. Nevertheless, keep in mind that this is not a simple string unique identifier. The value column is required by Redis when using the sorted set. In reality, this value is inside the data, in a property.
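
To give that data a concrete shape, here is a small sketch; the class names and properties are hypothetical and much simpler than the real ones, but the important detail is that the ranking value lives in a property of the serialized object, not in Redis itself.

 // Hypothetical shape of the cached data; the real object graph is much richer.
 public class UserStanding
 {
     public int Id { get; set; }                // unique identifier of the entry
     public UserProfile Profile { get; set; }   // one of several nested classes
     public double Score { get; set; }          // the value that changes often and drives the rank
 }

 public class UserProfile
 {
     public string DisplayName { get; set; }
     public string Country { get; set; }
 }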

This information must also be paged, because the list can reach around five thousand entries.

The first step was to measure the time when using the database. I got an average of 264 ms per query for 20 items on a set of 200 items. The database contains thousands of entries; the page uses a clause to filter down depending on other criteria defined inside the data (inside the class that we serialize).

The next step was to use Redis as a simple cache: once we get the result from the database, we store it for a short time. The first hit has the same average because it goes to the database, but subsequent requests go to Redis instead of the database. This was 50% faster, with an average of 125 ms. The key was determined by the type of list, the filter attribute, and the page number, for example "MyListOfObjectXXX_PartitionYYY_Page_1". The speed was interesting to me; I was aiming for around 100 ms, but I was satisfied with the result. The time also includes deserializing the objects to create a generic list of all 20 results. I count the deserialization time in my benchmark because I was also counting the ORM's time to instantiate the objects.

My concern with that solution is that every object can change its value at any time, and the value changes the rank as a consequence. Since I am also caching the data with a separate key for each instance, I duplicate this information in the cache. The size of the cache can be a problem in the long run, but the bigger problem is that the information becomes desynchronized. In fact, the source of truth is the individually cached version in the system, which looks like this: "MyData_Key_1". I set an expiry on the list cache because it is not the real source of data. I will not invalidate that data the way the rest of the software does when values change on the entity; I will let it expire and then rebuild it. It means that a user who drills down from the list still gets up-to-date data. This is the cost to pay (so far): up to a one-minute delay on the list.

 db.StringSet("MyListOfObjectFoo_PartitionRed_Page_1", myListOfDataForPage1, TimeSpan.FromMinutes(1)); 
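
For completeness, here is a minimal sketch of that first approach, assuming the hypothetical UserStanding class from earlier, Json.Net for serialization, and a placeholder LoadPageFromDatabase method that stands in for the real ORM query.

 using System;
 using System.Collections.Generic;
 using Newtonsoft.Json;
 using StackExchange.Redis;

 public class ListPageCache
 {
     private readonly IDatabase db;

     public ListPageCache(IDatabase db)
     {
         this.db = db;
     }

     // Cache-aside: try Redis first; on a miss, load from the database and
     // store the serialized page with a one-minute expiry.
     public List<UserStanding> GetPage(string listName, string partition, int page)
     {
         string key = string.Format("{0}_{1}_Page_{2}", listName, partition, page);

         RedisValue cached = this.db.StringGet(key);
         if (cached.HasValue)
         {
             return JsonConvert.DeserializeObject<List<UserStanding>>(cached);
         }

         List<UserStanding> fromDatabase = this.LoadPageFromDatabase(listName, partition, page);
         this.db.StringSet(key, JsonConvert.SerializeObject(fromDatabase), TimeSpan.FromMinutes(1));
         return fromDatabase;
     }

     // Placeholder for the real database/ORM query.
     private List<UserStanding> LoadPageFromDatabase(string listName, string partition, int page)
     {
         throw new NotImplementedException();
     }
 }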

To overcome this issue, Redis offers the sorted set, an ordered collection sorted by a score. What is interesting is that the score can be the same for two entries, which produces the same rank. So far, this is exactly the answer to the problem. However, that solution does not fix the problem of having to duplicate the data in the cache. The sorted set can be queried by range, which is interesting for paging, but not by unique key. Thus, it only solves the desynchronized-value problem, since I can easily push an entry into the sorted set with an updated score.

 // Initial push
 db.SortedSetAdd("MyListOfObjectFoo", new[]
 {
     new SortedSetEntry("User 1", 100),
     new SortedSetEntry("User 2", 99),
     new SortedSetEntry("User 3", 99)
 });

 // Later, when one entity changes to a value of 100. This will produce two entries with rank 1.
 db.SortedSetAdd("MyListOfObjectFoo", objectToCache, 100); 

This was surprising in many ways. First of all, the main problem is that if several entries share the same rank, it is not possible to provide a second ordering value from the object. You are stuck with the score you set, which is a double. That allows some mathematical tricks, but if you want a second, alphabetical ordering, you need to do that sort manually in C#, as sketched below.
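
Here is a sketch of that manual secondary sort, assuming the hypothetical UserStanding class from earlier and Json.Net to deserialize the members; SortedSetRangeByRankWithScores and the LINQ ordering are the relevant parts.

 using System.Collections.Generic;
 using System.Linq;
 using Newtonsoft.Json;
 using StackExchange.Redis;

 // Redis orders only by the double score. When scores tie, any further
 // ordering (here, alphabetical by display name) has to be applied on the
 // client after deserializing the members.
 SortedSetEntry[] page = db.SortedSetRangeByRankWithScores("MyListOfObjectFoo", 0, 19, Order.Descending);

 List<UserStanding> ordered = page
     .Select(e => new { Item = JsonConvert.DeserializeObject<UserStanding>(e.Element), e.Score })
     .OrderByDescending(x => x.Score)             // primary order: the Redis score
     .ThenBy(x => x.Item.Profile.DisplayName)     // secondary order: done in C#
     .Select(x => x.Item)
     .ToList();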

I didn't dig deeper into that solution because of the second, bigger problem: the performance. To get the information, you use the get-by-range method (ranks are zero-based, so the first page of 20 items is the range 0 to 19).

 db.SortedSetRangeByRank("MyListOfObjectFoo", 0, 19); 

From that, you need to loop and deserialize every value, which is the same tax we pay when caching the whole page in one Redis key-value entry. However, the performance was disastrous: my average over three runs was 1900 ms. This was really surprising. I double-checked everything because it wasn't making any sense to me. My initial hypothesis was that this structure was highly optimized for this kind of scenario; I was wrong. However, the fault is not Redis's. After some investigation, I found that the serialization, done with the Json.Net library, has a much harder time deserializing a very complex object 20 separate times than deserializing a single list of 20 objects. This is mostly because, when serializing a list, a complex object that has already been serialized is not serialized again; Json.Net uses a reference system instead. For example, instead of repeating a deep object, Json.Net writes "$ref": "20". This has a huge impact on performance.
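
To make the difference concrete, here is a sketch of the two serialization shapes. The settings are an assumption on my part, since the "$ref" markers only appear when reference preservation is enabled, and pageOfTwentyItems is a hypothetical list holding the rich objects of one page.

 using Newtonsoft.Json;

 // With reference preservation, Json.Net writes "$id" on the first occurrence
 // of an object and "$ref" on every later occurrence instead of repeating it.
 var settings = new JsonSerializerSettings
 {
     PreserveReferencesHandling = PreserveReferencesHandling.Objects
 };

 // One payload for the whole page: shared sub-objects are written once and
 // referenced afterwards, so there is a single, smaller deserialization per page.
 string wholePage = JsonConvert.SerializeObject(pageOfTwentyItems, settings);

 // One payload per entry (what the sorted set forces): every member repeats
 // the full graph of its sub-objects, and reading a page means twenty
 // separate deserializations.
 foreach (var item in pageOfTwentyItems)
 {
     string singleEntry = JsonConvert.SerializeObject(item, settings);
 }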

I finally decided to optimize my model classes and use a lighter class for this page. Instead of using a list of objects with a lot of rich sub-objects, a simple list of a basic class with flat properties did an awesome job. The list that was taking 1900 ms to get from Redis and deserialize now takes less than 0.17 ms. That is right and not a typo: less than a single millisecond.
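
As an illustration, here is the kind of flat class I mean; the name and properties are hypothetical, the point being that there is no nested rich object for Json.Net to rebuild.

 // Flat projection cached only for the list page. No navigation properties and
 // no nested classes, so deserializing a page of 20 of these is nearly free.
 public class UserStandingListItem
 {
     public int Id { get; set; }
     public string DisplayName { get; set; }
     public double Score { get; set; }
     public int Rank { get; set; }
 }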

I am still learning how to get the most out of Redis, and so far I like the flexibility it offers compared to Memcached, which I used for more than a decade. It is interesting so far, and I will keep you informed of any new optimization I find. In the short term, I think a solution may be to cache not the whole complex object but just a part of it, in an aggregate view of objects.