Tectonic file system: Consolidating storage infra

What the research is:

Tectonic, our data center-scale distributed file system, enables better resource utilization, promotes simpler services, and requires less operational complexity than our previous approach. Our previous storage infrastructure consisted of a number of application-specific storage systems. Clusters, or instances, of these storage systems scaled to tens of petabytes. As Facebook grew, this constellation of specialized storage systems became increasingly resource-inefficient and operationally complex.

Each Tectonic cluster scales to exabytes and can meet the storage needs of an entire data center. Tectonic's consolidated storage architecture improves resource efficiency by pooling resources that would otherwise be stranded in smaller clusters. This consolidation has also greatly simplified our storage operations, as we now manage a single system and fewer clusters.

How it works:

In building this system, we solved three overarching challenges at once: scaling to exabytes, providing performance isolation between tenants, and enabling tenant-specific optimizations.

Exabyte-scale clusters are important for operational simplicity and resource sharing. Tectonic disaggregates the file system metadata into independently scalable layers and hash-partitions each metadata layer into a scalable, sharded key-value store. Combined with a linearly scalable storage node layer, this disaggregated metadata enables the system to meet the storage needs of an entire data center.
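To make the idea of hash-partitioned metadata layers concrete, here is a minimal in-memory sketch. It is not Tectonic's implementation: the layer names ("name", "file"), the key formats, and the shard counts are assumptions chosen for illustration. It only shows how each layer can hash its own partition key to pick a shard, so that layers can be sized independently.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class MetadataLayer:
    """Toy stand-in for one metadata layer backed by a sharded key-value store."""

    def __init__(self, name: str, num_shards: int):
        self.name = name
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, partition_key: str) -> dict:
        # All rows with the same partition key land on the same shard.
        return self.shards[shard_for(partition_key, len(self.shards))]

    def put(self, partition_key: str, row_key: str, value) -> None:
        self._shard(partition_key)[(partition_key, row_key)] = value

    def get(self, partition_key: str, row_key: str):
        return self._shard(partition_key).get((partition_key, row_key))


# Hypothetical layers, each with its own shard count (scaled independently).
name_layer = MetadataLayer("name", num_shards=4)
file_layer = MetadataLayer("file", num_shards=16)

# Hypothetical keys: a directory entry points to a file, a file points to blocks.
name_layer.put("dir:42", "report.csv", "file:1001")
file_layer.put("file:1001", "blocks", ["blk:7", "blk:8"])

file_id = name_layer.get("dir:42", "report.csv")
blocks = file_layer.get(file_id, "blocks")
print(file_id, blocks)  # prints: file:1001 ['blk:7', 'blk:8']
```

Because each layer hashes only its own partition key, a lookup touches one shard per layer, and adding shards to one layer does not require changing the others.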

Tectonic simplifies performance isolation by solving the isolation problem within each tenant for groups of applications with similar traffic patterns and latency requirements. Instead of managing resources among hundreds of applications, Tectonic manages resources among only tens of traffic groups.
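The benefit of grouping can be sketched with a small example. The application names, group names, weights, and the proportional-share policy below are all hypothetical, not Tectonic's actual scheme. The point is only that capacity is divided among a few groups, while applications are merely tagged with the group they belong to.

```python
from collections import defaultdict

# Hypothetical assignment of applications to traffic groups within one tenant.
APP_TO_GROUP = {
    "etl_nightly": "batch",
    "ad_hoc_queries": "interactive",
    "log_ingest": "batch",
    "dashboard": "interactive",
    "backup": "background",
}

# Hypothetical relative priorities; a real system would derive these from
# latency requirements rather than hard-code them.
GROUP_WEIGHTS = {"interactive": 3, "batch": 2, "background": 1}


def split_capacity(total_units: int, weights: dict) -> dict:
    """Divide capacity among groups in proportion to their weights."""
    weight_sum = sum(weights.values())
    return {group: total_units * w // weight_sum for group, w in weights.items()}


def apps_by_group(app_to_group: dict) -> dict:
    """Invert the app-to-group mapping so each group lists its applications."""
    groups = defaultdict(list)
    for app, group in sorted(app_to_group.items()):
        groups[group].append(app)
    return dict(groups)


shares = split_capacity(600, GROUP_WEIGHTS)
members = apps_by_group(APP_TO_GROUP)
print(shares)  # prints: {'interactive': 300, 'batch': 200, 'background': 100}
```

The allocator's input grows with the number of groups, not the number of applications, so adding a new application means assigning it to an existing group rather than renegotiating every share.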

Tectonic uses tenant-specific optimizations to match the performance of specialized storage systems. These optimizations are enabled by a client-driven microservice architecture that includes a rich set of client-side configurations for controlling how tenants interact with Tectonic.
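As a rough illustration of client-side configuration, the sketch below lets two tenants share one write-planning code path while behaving differently because of their settings. Every name here (`ClientConfig`, `durability`, `buffer_bytes`, `plan_write`) is hypothetical and not part of any Tectonic API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical per-tenant client settings; field names are illustrative."""
    tenant: str
    durability: str    # assumed options: "replicated" or "erasure_coded"
    buffer_bytes: int  # how much data the client buffers before each write

    def __post_init__(self):
        if self.durability not in ("replicated", "erasure_coded"):
            raise ValueError(f"unknown durability: {self.durability}")
        if self.buffer_bytes <= 0:
            raise ValueError("buffer_bytes must be positive")


def plan_write(config: ClientConfig, payload_len: int) -> list:
    """Split a payload into write sizes according to the tenant's buffer size."""
    full, rest = divmod(payload_len, config.buffer_bytes)
    return [config.buffer_bytes] * full + ([rest] if rest else [])


# Two hypothetical tenants with different needs, one shared client code path.
warehouse = ClientConfig("warehouse", "erasure_coded", buffer_bytes=4)
blobstore = ClientConfig("blobstore", "replicated", buffer_bytes=1)

print(plan_write(warehouse, 10))  # prints: [4, 4, 2]
print(plan_write(blobstore, 3))   # prints: [1, 1, 1]
```

Keeping these choices in the client means a tenant can be tuned by changing its configuration, without a tenant-specific storage system behind it.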

Why it matters:

Most large cloud services rely on storage. As cloud services become more popular, the need for data storage and processing is growing rapidly. Distributed storage systems must scale and evolve to store and process this data efficiently. As storage requirements grow, for example, the scalability of individual storage clusters can become a bottleneck.

Adopting Tectonic helped us scale our storage and yielded many operational and efficiency improvements. By migrating our data warehouse to Tectonic, we reduced the number of data warehouse clusters tenfold, simplified operations, and freed up resources. Tectonic achieves these efficiency gains while offering performance comparable to or better than that of our previous specialized storage systems.

Read the full paper:

Facebook’s Tectonic Filesystem: Efficiency from Exascale
