Kangaroo: A new flash cache optimized for tiny objects

What the research is:

Kangaroo is a new flash cache that enables more efficient caching of tiny objects (objects ~ 100 bytes or less in size) and overcomes the challenges of existing flash cache designs. Since Kangaroo is implemented in CacheLib, Facebook’s open source caching engine, developers can use Kangaroo via CacheLib’s API to create their own custom cache services.

Our work on Kangaroo won a Best Paper Award at the 2021 Symposium on Operating Systems Principles (SOSP).

CacheLib’s API enables developers to create and customize concurrent caches.

Optimized for tiny objects, Kangaroo minimizes dynamic random access memory (DRAM) usage and the number of flash writes, and introduces a new cache eviction policy that reduces cache misses with minimal DRAM overhead, further lowering the load on back-end storage systems.

Kangaroo’s CacheLib implementation also enabled our researchers to quickly assess the impact of the Kangaroo design on real-world production workloads. We evaluated Kangaroo using traces from production social graph caching workloads at Facebook and Twitter. Our experiments show that Kangaroo reduces misses under realistic system constraints (DRAM and write-rate constraints) by 29 percent compared to previous state-of-the-art flash cache designs. These results were confirmed by a test deployment of Kangaroo in a shadow production setup.

Miss ratio for all three systems on a seven-day Facebook trace. Kangaroo reduces cache misses by 29 percent compared to set-associative (SA) caches and by 56 percent compared to log-structured (LS) caches.

How it works:

Current flash caches fall into two main categories: set-associative caches and log-structured caches. Set-associative caches write many more bytes than necessary, since they must write a full 4 KB flash page (the minimum write granularity) for each object. For tiny objects around 100 bytes or less in size, this leads to significant write amplification (i.e., set-associative caches write ~40x more bytes than strictly necessary).

Log-structured caches have a large DRAM overhead, since they have to keep an index entry for every on-flash object. Therefore, neither set-associative nor log-structured flash caches are well suited for small object flash caches.

Kangaroo combines log-structured and set-associative caches to reduce both DRAM and flash write overhead. Kangaroo consists of two main parts: KLog, a small log-structured flash cache, and KSet, a large set-associative flash cache.

Lookups first check the DRAM cache (1). If the object is not in the DRAM cache, requests check KLog’s index (2a) and, on an index hit, read the object from flash (2b). Finally, requests check the Bloom filter for the object’s set in KSet (3a) and read the set from flash if the Bloom filter indicates the object could be in KSet (3b).
On an insert, Kangaroo first places objects into the DRAM cache (1). Objects evicted from the DRAM cache are either discarded by a pre-flash admission policy (2a) or added to KLog’s DRAM index (2b) and appended to KLog’s flash log (2c). Objects evicted from KLog are either discarded by another admission policy (3a) or inserted into KSet (3b).
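The lookup path above can be sketched in a few lines of code. The classes below are simplified stand-ins of our own (not CacheLib’s actual API): KLog’s Bloom-filter-free DRAM index maps keys to flash offsets, and a plain membership set stands in for KSet’s per-set Bloom filters.

```python
class KLog:
    """Small log-structured cache: a DRAM index maps keys to flash offsets."""
    def __init__(self):
        self.index = {}   # key -> offset into the flash log (in DRAM)
        self.flash = []   # stand-in for the on-flash log, one object per slot

    def read_flash(self, offset):
        return self.flash[offset]

class KSet:
    """Large set-associative cache: sets on flash, Bloom filters in DRAM."""
    def __init__(self, num_sets=4):
        self.num_sets = num_sets
        self.sets = [dict() for _ in range(num_sets)]     # stand-in for flash sets
        self.members = [set() for _ in range(num_sets)]   # stand-in for Bloom filters

    def may_contain(self, set_id, key):
        # A real Bloom filter can return false positives; this stand-in cannot.
        return key in self.members[set_id]

def lookup(key, dram_cache, klog, kset):
    if key in dram_cache:                        # (1) check the DRAM cache
        return dram_cache[key]
    if key in klog.index:                        # (2a) check KLog's DRAM index
        return klog.read_flash(klog.index[key])  # (2b) read the object from flash
    set_id = hash(key) % kset.num_sets
    if kset.may_contain(set_id, key):            # (3a) check the set's Bloom filter
        return kset.sets[set_id].get(key)        # (3b) read the set from flash
    return None                                  # miss: go to back-end storage
```

Note that only steps (2b) and (3b) touch flash; the DRAM cache, KLog’s index, and KSet’s Bloom filters all live in DRAM, so a miss costs at most two flash reads.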

One of Kangaroo’s innovations is how it moves objects from KLog to KSet. Kangaroo uses KLog to find multiple objects that map to the same set in KSet, such as the pink and yellow objects shown above. Whenever an object is evicted from KLog, Kangaroo proactively moves the other KLog objects belonging to the same set along with it, minimizing write amplification in KSet.

Since writing a set always costs 4 KB regardless of how many objects are inserted, writing two objects instead of just one halves the write overhead per object. This means that Kangaroo can amortize writes to KSet across multiple objects, reducing the total number of bytes written to flash. KLog achieves this goal with a small capacity (~5 percent of flash), so Kangaroo needs only a small amount of DRAM to index all of KLog. Thus, Kangaroo addresses both the DRAM and the flash write overhead when caching tiny objects on flash.

In addition to this basic design, Kangaroo introduces further techniques to realize its hierarchical design and increase its effectiveness. Specifically, because Kangaroo is a cache rather than a key-value store, it is free to discard objects to minimize writes. Kangaroo exploits this opportunity with a threshold admission policy that discards objects instead of admitting them to KSet when fewer than n objects would be moved from KLog. This admission policy lets Kangaroo guarantee that the write overhead of moving objects into KSet is much lower than in a set-associative cache. Kangaroo includes further optimizations to reduce DRAM overhead and misses, as described in our SOSP 2021 paper.
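The threshold admission idea can be sketched as follows. The function name and parameters are our own illustration, not Kangaroo’s actual interface: given all KLog objects that map to one KSet set, the set is rewritten only if enough objects would be inserted to justify the fixed 4 KB write.

```python
PAGE_SIZE = 4096  # cost of rewriting one KSet set, in bytes

def admit_to_kset(candidates, threshold=2):
    """Return the objects to insert into a KSet set, or [] to skip the write.

    candidates: all KLog objects that map to the same KSet set.
    threshold:  the minimum number of objects (the paper's n) that makes
                the fixed-cost set write worthwhile.
    """
    if len(candidates) < threshold:
        return []         # too few objects: discard them, skip the 4 KB write
    return candidates     # amortize one set write across several objects

# Per-object write cost drops as more objects share one set write:
admitted = admit_to_kset(["pink", "yellow"])
cost_per_object = PAGE_SIZE // len(admitted)  # 2048 bytes, vs. 4096 for one
```

Discarding under-threshold objects trades a few extra misses for a guaranteed bound on write amplification, which is an acceptable trade for a cache (unlike a key-value store, which must persist every object).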

Together, these optimizations allow Kangaroo to overcome the limitations of log-structured caches and set-associative caches and create a flash cache that achieves the goal of efficient caching for tiny objects.

Why it matters:

Caching plays an important role in large systems because it enables faster access to data and reduces the load on databases and other back-end systems. For large caches, flash is far more efficient than DRAM. However, existing flash caches fall short when it comes to caching tiny objects: they either require too much DRAM, which negates the efficiency benefits of flash, or too many writes, which wears out the flash device too quickly. In either case, flash caching does not live up to its potential as an efficient, large cache for tiny objects, ultimately resulting in smaller caches and higher load on back-end systems.

Many social media and IoT services have huge numbers of tiny objects, each a few hundred bytes or less. For example, the edges in Facebook’s social graph, which connect friends, posts, and pictures, among other things, average under 100 bytes. Efficiently caching these objects on flash enables rapid, large-scale retrieval and minimizes the burden on back-end storage services. Kangaroo enables large-scale flash caching of tiny objects with low DRAM and write overheads, leading to a lower miss ratio in large flash caches.

Read the full paper:

Kangaroo: Caching Billions of Tiny Objects on Flash

Get it on GitHub:

kangaroo and CacheLib

Acknowledgments:

The authors would like to thank Sara McAllister for leading the project and the members of the Parallel Data Lab at Carnegie Mellon University for their contribution to leading the research and collaboration with Facebook.
