Virtualizing Consensus in Delos for Fast Upgrades and Blissful Engineers – @Scale

We will give a talk on our work virtualizing consensus in Delos for fast upgrades and happy engineers at our virtual Systems @Scale event on Wednesday, March 17, at 11 a.m. PT, followed by a live Q&A session. Please send any questions you may have to systemsatscale@fb.com before the event.

When the Delos project started at the end of 2017, we wanted to create a new storage system for the Facebook control plane. Our first use case required a database with a table API; however, we expected that we would eventually have to support a number of other APIs (e.g., ZooKeeper).

To implement and deploy Delos quickly, we used the novel technique of virtual consensus. Here, multiple copies of a database are synchronized via a VirtualLog abstraction, which stores and orders commands. The VirtualLog itself is layered over various consensus protocols, called Loglets, and can switch between them, so Delos can change its consensus subsystem without downtime.
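The layering described above can be sketched in a few lines. This is a minimal, single-process illustration, not the Delos implementation: the class `InMemoryLoglet` and the methods `append`, `read`, `seal`, and `reconfigure` are hypothetical names chosen for the example, and a real system would also have to agree on the chain of Loglets across servers, which this sketch skips.

```python
class InMemoryLoglet:
    """Hypothetical stand-in for a consensus protocol behind a simple log API."""

    def __init__(self, name):
        self.name = name
        self.entries = []
        self.sealed = False

    def append(self, entry):
        if self.sealed:
            raise RuntimeError(f"loglet {self.name} is sealed")
        self.entries.append(entry)
        return len(self.entries) - 1  # position within this loglet

    def read(self, pos):
        return self.entries[pos]

    def seal(self):
        # After sealing, the loglet accepts no more appends, so its tail is fixed.
        self.sealed = True


class VirtualLog:
    """Maps one contiguous address space onto a chain of loglets."""

    def __init__(self, first_loglet):
        # Each chain element is (start position in the virtual log, loglet).
        self.chain = [(0, first_loglet)]

    def tail(self):
        start, loglet = self.chain[-1]
        return start + len(loglet.entries)

    def append(self, entry):
        start, loglet = self.chain[-1]
        return start + loglet.append(entry)

    def read(self, pos):
        # Find the loglet whose range covers the requested position.
        for start, loglet in reversed(self.chain):
            if pos >= start:
                return loglet.read(pos - start)
        raise IndexError(pos)

    def reconfigure(self, new_loglet):
        # Seal the active loglet, then chain the new one at the current tail.
        _, old = self.chain[-1]
        old.seal()
        self.chain.append((self.tail(), new_loglet))


log = VirtualLog(InMemoryLoglet("zookeeper-backed"))
log.append("put a=1")
log.append("put b=2")
log.reconfigure(InMemoryLoglet("native"))  # switch protocols mid-stream
pos = log.append("put c=3")
```

The database only ever sees virtual positions, so entries written before and after the switch are read back through the same `read` call.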

Thanks to virtual consensus, the first Delos-based database (DelosTable) was able to reach production with ZooKeeper as its underlying Loglet. In effect, we leveraged ZooKeeper’s proven consensus protocol while supporting a table API. We later switched DelosTable to a custom NativeLoglet protocol for better performance and to remove ZooKeeper as a critical-path dependency.

However, Delos’ ultimate goal was not to create another monolithic database with a fixed API. Instead, we wanted to enable a number of databases with different APIs so that applications could choose the API that best suited their needs. Specifically, our next immediate task was to replace the ZooKeeper service at Facebook by building a second database that exposes the ZooKeeper API. We had to scale Delos: not only in the usual dimensions of throughput or capacity, but also with respect to the engineers working on the code base, who had to understand, extend, and operate Delos.

Abstraction makes engineers happy … right?

To scale the development and operation of multiple databases, we developed Delos as a platform for creating databases. Every Delos-based database (e.g., DelosTable) consists of a thin layer of API-specific code that runs on top of the Delos platform. The platform provides support for common but complex functions such as consensus, local storage, backup, and so on. As a result, creating, deploying, and operating new databases with custom APIs is greatly simplified: much of the code and operational tooling is identical across databases.

This separation enabled the ZooKeeper team at Facebook to implement a ZooKeeper clone (Zelos) on the Delos platform. In principle, the ZooKeeper team could concentrate on implementing the complex ZooKeeper API as a Delos database and migrating its customers. The DelosTable team could likewise focus on serving use cases that require a table API. In the meantime, a Delos platform team would keep improving the low-level consensus and storage code for better reliability and performance. Each team could focus on its own goals and core competencies while amortizing the costs of developing and operating a complex service. In other words, we would have happy engineers!

However, we immediately encountered a number of challenges. While the vast majority of the code Zelos required was identical to DelosTable, it still required a certain amount of customization. For example, ZooKeeper needs a session-ordering property from its ordering layer that is stronger than what other systems require. In other cases, the Zelos team required changes to the platform (e.g., batching support to match ZooKeeper’s performance) and was blocked waiting on the cycles of the Delos platform team. Somehow, by tying these different teams together through a single code base, we had ended up with unhappy engineers.

Virtualization saves the day … again!

“Use a good idea again.” – Butler Lampson, Hints for Computer System Design

To break this logjam, we invented an abstraction called a log-structured protocol: a protocol that is layered over the VirtualLog. Each protocol consists of a log-structured protocol engine (or just engine) on each end host, which interacts with its counterparts via the VirtualLog. The Delos platform code is organized as a collection of such engines, each of which provides specific functionality. Any database (e.g., DelosTable or Zelos) can run a user-defined set of these protocols.

In a sense, this approach is a form of virtualization. Each log-structured protocol is a virtual replicated state machine running over the VirtualLog. Instead of building the functionality of a database as a single replicated state machine, we modularize it into multiple reusable virtual state machines. A log-structured protocol can only access its own partition of the local database state, which provides isolation between protocols.
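One way to picture the state isolation mentioned above is a local store that hands each engine a view restricted to its own namespace. The sketch below is illustrative only: `LocalStore`, `Partition`, and their methods are hypothetical names, and the source does not describe how Delos actually partitions local state.

```python
class Partition:
    """A view of the shared store limited to one engine's namespace."""

    def __init__(self, data, namespace):
        self._data = data
        self._namespace = namespace

    def put(self, key, value):
        # Keys are stored under (namespace, key), so engines cannot collide.
        self._data[(self._namespace, key)] = value

    def get(self, key, default=None):
        return self._data.get((self._namespace, key), default)


class LocalStore:
    """Hypothetical local database state shared by all engines on a host."""

    def __init__(self):
        self._data = {}

    def partition(self, engine_name):
        return Partition(self._data, engine_name)


store = LocalStore()
view_tracking = store.partition("ViewTrackingEngine")
session_ordering = store.partition("SessionOrderingEngine")

view_tracking.put("count", 3)
session_ordering.put("count", 7)  # same key, different partition
```

Because each engine only holds its own `Partition`, an engine written by one team has no handle through which it could read or overwrite another engine's state.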

As the name suggests, log-structured protocols are also protocols: they can be layered in a stack between the application and the VirtualLog. Such protocols add functionality above the VirtualLog much as network protocols add functionality above a traditional point-to-point network. New log entries flow down the stack towards the VirtualLog, while existing entries in the log flow up the stack towards the database. As with a protocol in a traditional network stack, an engine can piggyback headers on log entries generated by the layers above it. It can filter or reorder log entries before they reach the higher layers, and it can batch, encrypt, compress, or otherwise mutate entries before they reach the lower layers.
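The two directions of flow can be sketched as a chain of objects with a `propose` path (down, towards the log) and an `apply` path (up, towards the database). This is a simplified single-process sketch under stated assumptions: the `Engine` interface, `HeaderEngine`, `LogBottom`, `AppTop`, and `build_stack` are hypothetical names, and the bottom of the stack plays entries straight back up instead of replicating them.

```python
class Engine:
    """Hypothetical base engine: passes entries through unchanged."""

    def __init__(self):
        self.below = None  # next engine towards the log
        self.above = None  # next engine towards the application

    def propose(self, entry):
        self.below.propose(entry)

    def apply(self, entry):
        self.above.apply(entry)


class HeaderEngine(Engine):
    """Piggybacks a header on the way down and strips it on the way up."""

    def __init__(self, key, value):
        super().__init__()
        self.key, self.value = key, value

    def propose(self, entry):
        self.below.propose({"hdr": {self.key: self.value}, "body": entry})

    def apply(self, entry):
        if entry["hdr"].get(self.key) != self.value:
            raise ValueError("unexpected header")
        self.above.apply(entry["body"])


class BatchingEngine(Engine):
    """Groups proposals into one log entry and unpacks them on apply."""

    def __init__(self, batch_size):
        super().__init__()
        self.batch_size = batch_size
        self.pending = []

    def propose(self, entry):
        self.pending.append(entry)
        if len(self.pending) >= self.batch_size:
            batch, self.pending = self.pending, []
            self.below.propose(batch)

    def apply(self, batch):
        for entry in batch:
            self.above.apply(entry)


class LogBottom(Engine):
    """Stand-in for the VirtualLog: stores an entry, then plays it back up."""

    def __init__(self):
        super().__init__()
        self.log = []

    def propose(self, entry):
        self.log.append(entry)
        self.above.apply(entry)


class AppTop(Engine):
    """Stand-in for the database: records the commands it is asked to apply."""

    def __init__(self):
        super().__init__()
        self.applied = []

    def apply(self, entry):
        self.applied.append(entry)


def build_stack(engines):
    # engines[0] is closest to the application, engines[-1] closest to the log.
    for upper, lower in zip(engines, engines[1:]):
        upper.below, lower.above = lower, upper
    return engines[0]


app, bottom = AppTop(), LogBottom()
build_stack([app, HeaderEngine("version", 1), BatchingEngine(2), bottom])
app.propose("set x=1")
app.propose("set y=2")
```

After the two proposals, the log holds a single batched entry while the application has applied both commands in order; neither the application nor the log needed to know that batching or headers were in the stack.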

Log-structured protocols in Delos allowed multiple teams with different goals, customers, and philosophies to share a common code base. We were able to unlock innovation from individual engineers on both teams, who could write new protocols without worrying about side effects and interactions with other code. For example:

  • We incrementally upgraded production databases with no downtime by simply adding new engines. For example, the DelosTable team added a LogBackupEngine that coordinated the upload of the VirtualLog to backup storage.
  • We reused code across databases, bootstrapping Zelos with the production-ready stack of DelosTable. For example, Zelos reused the DelosTable ViewTrackingEngine to keep track of the number of persistent copies of the database.
  • We customized behavior for each database via specific engines. For example, the Zelos team enabled stronger session-ordering guarantees by adding a SessionOrderingEngine to its stack.
  • We improved performance across databases by developing common engines. For example, the Zelos team implemented a batching engine that seamlessly improved DelosTable performance as well.
  • Finally, we supported custom roles within each database, such as passive, read-only replicas that execute a stripped-down subset of the protocol stack.
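The composition pattern in the list above can be pictured as per-database stack definitions drawn from a shared pool of engines. The sketch below is purely illustrative: the engine names LogBackupEngine, ViewTrackingEngine, and SessionOrderingEngine come from the text, but `BatchingEngine` as a name, the ordering within each stack, and which engines a read-only replica omits are all assumptions made for the example.

```python
# Assumed stack contents and ordering; only the engine names
# LogBackupEngine, ViewTrackingEngine, and SessionOrderingEngine
# appear in the text. "BatchingEngine" is a hypothetical name.
STACKS = {
    "DelosTable": ["LogBackupEngine", "ViewTrackingEngine", "BatchingEngine"],
    "Zelos": ["SessionOrderingEngine", "ViewTrackingEngine", "BatchingEngine"],
}

# Assumed: a passive, read-only replica never proposes entries, so it
# skips engines that only matter on the write or backup path.
ROLE_EXCLUDES = {
    "full": set(),
    "read_only": {"BatchingEngine", "LogBackupEngine"},
}


def stack_for(database, role="full"):
    """Return the engine names a replica of this database and role would run."""
    excluded = ROLE_EXCLUDES[role]
    return [name for name in STACKS[database] if name not in excluded]


def shared_engines(db_a, db_b):
    """Engines that two databases have in common, in db_a's order."""
    return [name for name in STACKS[db_a] if name in STACKS[db_b]]


zelos_replica = stack_for("Zelos", role="read_only")
common = shared_engines("DelosTable", "Zelos")
```

Expressing a database as a list of engines is what makes the operations in the list cheap: an upgrade is an added entry, a customization is a database-specific entry, and a role is a filtered copy of the same list.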

Additionally, log-structured protocols allowed the Delos platform team's engineers to focus on the challenging “last mile” of ensuring consistency and durability. Both databases benefit immediately from changes that harden the platform against corner-case failures. In addition, having a single, unified toolchain for the two databases greatly simplifies on-call duty for engineers.

In some ways, virtualizing a replicated database offers the same benefits as virtualization in other parts of the systems stack: faster development and deployment cycles, and simpler operations. Innovation and impact are democratized across several teams and roles: every engineer can write a log-structured protocol for a new feature without having to think about complex failure scenarios. Ultimately, virtualization has had the biggest impact on the metric that really matters: our engineers are happy!
