Running BGP in large-scale data centers

What the research is:

A unique study detailing the scalable design, software implementation, and operation of Facebook’s data center routing design, which is based on the Border Gateway Protocol (BGP). BGP was originally developed to connect autonomous networks, such as those run by Internet service providers (ISPs), on the global Internet. It is highly scalable, widely recognized as an attractive choice for routing, and it is the routing protocol that connects the entire Internet. Much like an online map service that suggests the best route between two places, BGP lets networks exchange reachability information so that routers can choose an efficient path for forwarding packets.
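To make the "choosing a path" idea concrete, here is a heavily simplified sketch of BGP route selection. It models only three steps: rejecting paths that already contain the local AS number (loop prevention), preferring the highest local preference, and then preferring the shortest AS path. Real BGP implementations apply several more tie-breakers; the final next-hop comparison below is only a deterministic stand-in for them. The `Route` type, the function names, and the ASN values are illustrative assumptions, not part of Facebook's agent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    """One candidate path to a destination prefix, as learned from a neighbor."""
    prefix: str
    next_hop: str
    as_path: tuple[int, ...]  # ASNs the advertisement traversed, nearest first
    local_pref: int = 100     # a commonly used default LOCAL_PREF value


def best_path(routes: list[Route], local_asn: int) -> Optional[Route]:
    """Simplified selection: drop looped paths, prefer the highest local
    preference, then the shortest AS path. The next-hop comparison is only a
    deterministic stand-in for BGP's remaining tie-breakers."""
    # Loop prevention: a path that already contains our own ASN is rejected.
    candidates = [r for r in routes if local_asn not in r.as_path]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path), r.next_hop))


routes = [
    Route("10.0.0.0/24", next_hop="peer-a", as_path=(65001, 65002)),
    Route("10.0.0.0/24", next_hop="peer-b", as_path=(65003,)),
    Route("10.0.0.0/24", next_hop="peer-c", as_path=(65100, 65004)),
]
print(best_path(routes, local_asn=65100).next_hop)  # peer-b
```

Here `peer-c` is discarded because its path already contains the local ASN 65100, and `peer-b` beats `peer-a` on AS path length.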

Based on our experience deploying BGP in our data centers, we found that it can provide a robust routing foundation, but only with tight co-design across the data center topology, configuration, switch software, and the data center-wide operational pipeline. We developed this routing design to build our network quickly and provide high availability for our services while keeping the design itself scalable. Errors are inevitable in any large system, so our routing design aims to minimize their impact.

How it works:

To achieve these goals, we did not use BGP merely as a routing protocol. The design establishes a basic connectivity configuration on top of our existing scalable network topology. We use a uniform AS numbering scheme that is reused across different data center fabrics, which simplifies ASN management in data centers. We apply hierarchical route aggregation at every level of the topology so that the design scales to the size of our data centers while keeping the routing tables in switch hardware small.
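The following sketch illustrates why hierarchical aggregation keeps hardware routing tables small. It assumes a hypothetical addressing plan in which each pod owns an aligned /22 split into four /24 rack prefixes; the post does not describe Facebook's actual prefix sizes or aggregation boundaries. Python's standard `ipaddress.collapse_addresses` merges adjacent prefixes into the fewest covering prefixes.

```python
from __future__ import annotations

import ipaddress


def aggregate(prefixes: list[str]) -> list[str]:
    """Collapse adjacent or overlapping prefixes into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Hypothetical addressing plan: pod p owns 10.1.(4p).0/22, split into four /24 racks.
def rack_prefixes(pod: int, racks_per_pod: int = 4) -> list[str]:
    return [f"10.1.{racks_per_pod * pod + r}.0/24" for r in range(racks_per_pod)]


# Level 1: each pod advertises one aggregate instead of one route per rack.
pod_routes = {pod: aggregate(rack_prefixes(pod)) for pod in range(4)}
# Level 2: the fabric can advertise a single aggregate covering all four pods.
fabric_route = aggregate([p for routes in pod_routes.values() for p in routes])

print(pod_routes[1])  # ['10.1.4.0/22']
print(fabric_route)   # ['10.1.0.0/20']
```

Sixteen rack routes shrink to four pod routes and then to one fabric route. This only works because the addresses are allocated in blocks aligned with the topology, which is one reason the routing design has to be planned together with the topology.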

Our routing policy configuration is tightly integrated with the basic connectivity configuration. The policies ensure reliable communication by using route propagation scopes and predefined backup paths for failures. They also let us keep the network running by gracefully draining traffic away from problematic or faulty devices. Finally, they ensure that services remain reachable even when an instance of a service is added, removed, or migrated.
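Graceful draining can be illustrated with a small policy sketch. The post does not say which mechanism Facebook uses; this example borrows the well-known GRACEFUL_SHUTDOWN community (65535:0) from RFC 8326. A drained switch keeps advertising its routes but tags them, and receivers lower the local preference of tagged routes to 0. Traffic therefore shifts to undrained paths before the device is taken out of service, while the drained path remains available as a backup. The `Route` type, `export`, `import_policy`, `preferred_spine`, and the switch names and ASNs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

GRACEFUL_SHUTDOWN = (65535, 0)  # well-known community defined in RFC 8326


@dataclass(frozen=True)
class Route:
    prefix: str
    advertiser: str
    as_path: tuple[int, ...]
    communities: frozenset[tuple[int, int]] = frozenset()
    local_pref: int = 100


def export(route: Route, drained: bool) -> Route:
    """Sender side: a drained switch keeps advertising its routes but tags them."""
    if drained:
        return replace(route, communities=route.communities | {GRACEFUL_SHUTDOWN})
    return route


def import_policy(route: Route) -> Route:
    """Receiver side: demote tagged routes to LOCAL_PREF 0, as RFC 8326 suggests."""
    if GRACEFUL_SHUTDOWN in route.communities:
        return replace(route, local_pref=0)
    return route


def choose(routes: list[Route]) -> Route:
    # Highest local preference, then shortest AS path, then name as a tie-breaker.
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.advertiser))


def preferred_spine(drained: set[str]) -> str:
    """Which spine a downstream switch sends traffic to for one prefix."""
    learned = [
        Route("10.2.0.0/22", "spine-a", (64601, 64701)),
        Route("10.2.0.0/22", "spine-b", (64602, 64701)),
    ]
    received = [import_policy(export(r, r.advertiser in drained)) for r in learned]
    return choose(received).advertiser


print(preferred_spine(set()))                   # spine-a
print(preferred_spine({"spine-a"}))             # spine-b
print(preferred_spine({"spine-a", "spine-b"}))  # spine-a (still reachable)
```

Because the drained route is demoted rather than withdrawn, the prefix stays reachable even if every alternative is also drained.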

The data center fabric architecture, which consists of server pods and spine planes, supports growing compute and network requirements.
The BGP confederation and AS numbering scheme for server pods and spine planes in the data center, which can be reused across all data centers.
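A short sketch shows why a confederation makes ASN reuse safe. The numbering plan is hypothetical: spine planes and pods receive sub-AS numbers from the 16-bit private range (RFC 6996) based only on their index, so every data center uses the same values. Following RFC 5065, when a route leaves the confederation, the member sub-AS segment is removed and the confederation identifier is prepended, so outside peers never see the reused internal numbers. The base values, function names, and confederation identifiers (taken from the documentation range in RFC 5398) are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical numbering plan inside the 16-bit private ASN range (RFC 6996).
PRIVATE_MIN, PRIVATE_MAX = 64512, 65534
SPINE_PLANE_BASE = 64600
POD_BASE = 64700


def _checked(asn: int) -> int:
    if not PRIVATE_MIN <= asn <= PRIVATE_MAX:
        raise ValueError(f"ASN {asn} is outside the private range")
    return asn


def spine_plane_asn(plane: int) -> int:
    return _checked(SPINE_PLANE_BASE + plane)


def pod_asn(pod: int) -> int:
    # Depends only on the pod index, so every data center reuses the same values.
    return _checked(POD_BASE + pod)


@dataclass(frozen=True)
class AsPath:
    confed_sequence: tuple[int, ...]  # member sub-ASes traversed inside the confederation
    as_sequence: tuple[int, ...]      # ordinary ASNs, e.g. learned from outside


def export_to_external(path: AsPath, confed_id: int) -> tuple[int, ...]:
    """RFC 5065 behavior when leaving the confederation: drop the member
    sub-AS segment and prepend the confederation identifier."""
    return (confed_id,) + path.as_sequence


# The same internal path (spine plane 1, pod 3) exists in two data centers,
# but each one exposes only its own confederation identifier externally.
path = AsPath(confed_sequence=(spine_plane_asn(1), pod_asn(3)), as_sequence=())
print(export_to_external(path, confed_id=64496))  # (64496,)
print(export_to_external(path, confed_id=64497))  # (64497,)
```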

To support the growing scale and evolving routing requirements, our switch-level BGP agent needs periodic updates that add new features, optimizations, and bug fixes. To streamline this process (i.e., to make quick, frequent changes to the network infrastructure while maintaining good route-processing performance), we implemented an in-house BGP agent. We keep the code base simple and implement only the protocol features our data centers require, without deviating from the BGP specifications.

To minimize the impact on production traffic while still releasing the BGP agent at high velocity, we built our own framework for testing and incremental deployment, consisting of unit tests, emulation, and canary testing. We use a multi-phase deployment pipeline to push changes to the agents.
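The staged production push at the end of such a pipeline can be sketched as follows. This covers only the rollout that follows unit tests and emulation: the fleet is split into progressively larger waves, the first small wave plays the role of a canary, and the rollout halts if any switch in a wave fails its health check, so later waves never receive the faulty build. The cumulative percentages, the `deploy` and `healthy` callbacks, and the switch names are hypothetical; the post does not describe the actual phases or health signals.

```python
from __future__ import annotations

from typing import Callable, Sequence


def plan_waves(switches: Sequence[str], percents: Sequence[int]) -> list[list[str]]:
    """Split the fleet into waves; percents are cumulative, e.g. (1, 10, 100)."""
    n = len(switches)
    waves: list[list[str]] = []
    start = 0
    for pct in percents:
        end = min(n, -(-pct * n // 100))  # integer ceiling of pct% of the fleet
        if end > start:
            waves.append(list(switches[start:end]))
            start = end
    return waves


def rollout(switches: Sequence[str], percents: Sequence[int],
            deploy: Callable[[str], None],
            healthy: Callable[[str], bool]) -> tuple[list[str], bool]:
    """Deploy wave by wave; return the switches touched and whether it finished."""
    deployed: list[str] = []
    for wave in plan_waves(switches, percents):
        for switch in wave:
            deploy(switch)
            deployed.append(switch)
        # Gate: stop before the next, larger wave if anything in this wave is unhealthy.
        if not all(healthy(s) for s in wave):
            return deployed, False
    return deployed, True


fleet = [f"sw{i:03d}" for i in range(100)]
print([len(w) for w in plan_waves(fleet, (1, 10, 100))])  # [1, 9, 90]
deployed, ok = rollout(fleet, (1, 10, 100), deploy=lambda s: None,
                       healthy=lambda s: s != "sw005")
print(len(deployed), ok)  # 10 False
```

In this example the unhealthy switch sits in the second wave, so only 10 of the 100 switches ever receive the new build.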

Test and deployment pipeline.

Why it matters:

BGP has gained significant traction in data centers thanks to its scalability, extensive policy control, and proven track record of running the Internet for several decades. Data center operators are known to use BGP for routing, often in different ways. However, because data center requirements are very different from those of the Internet, using BGP to achieve effective routing in data centers is considerably more complex.

Facebook’s BGP-based data center routing design reconciles the stringent requirements of data centers with the functionality of BGP. The design gives us flexible control over routing and keeps the network reliable. Our in-house BGP software implementation, together with its testing and deployment pipelines, lets us treat BGP like any other software component and make rapid incremental updates. Finally, more than two years of operating BGP across our data center fleet has shaped our current and ongoing routing design and operations. Our experience shows that BGP is an effective option for large data centers. We hope that sharing this research helps others looking for a similar solution.

For more information, see our NSDI 2021 presentation.

Read the full paper:

BGP in Facebook data centers
