OCP Summit 2021: Open networking hardware lays the groundwork for the metaverse

Open infrastructure technologies and network hardware will play an important role as we develop new technologies for the metaverse, where one day billions of people will come together in virtual spaces. As we move toward the next major computing platform with an enduring spirit of openness and disaggregation, we are announcing two new milestones for our data centers. First, we are sharing the next-generation network hardware portfolio in our data centers, developed in close collaboration with multiple vendors. Second, we have migrated our data center network hardware to a standard, open API: the Open Compute Project (OCP) Switch Abstraction Interface (SAI).

We have come a long way in the ten years since we decided to design and build our own data centers. Even then, we knew they would be based on the concepts of openness and disaggregation, with modular technologies that make upgrades easy and efficient. Since OCP was founded in 2011, we have shared our data center and component designs and open-sourced our network orchestration software to spark new ideas, both in our own data centers and across the industry.

Today, these ideas have made Meta’s data centers among the most sustainable and efficient in the world. Now, through OCP, we are bringing new open, advanced network technologies into our data centers and to the wider industry for new frontiers in computing, from advanced AI applications to the metaverse.

Wedge 400 / 400C: New TORs for more powerful open networks

The Wedge 400 is Meta’s next generation TOR switch.

We have teamed up with Broadcom, our long-time ASIC partner, and Cisco Systems, our newest ASIC partner, to use their ASICs in our two next-generation top-of-rack (TOR) switches: the Wedge 400 and the Wedge 400C, the latest versions of our Wedge TOR switch. The Wedge 400 uses Broadcom’s Tomahawk 3 ASIC, while the 400C uses Cisco’s Silicon One, our first deployment of Cisco’s new chip. Both TORs offer higher front-panel port density and higher performance for AI and machine learning applications, while also leaving room for future expansion.

The Wedge 400 and 400C have already been deployed in our data centers and bring several improvements over the Wedge 100S, including 4 times the switching capacity (up from 3.2 Tbit/s to 12.8 Tbit/s), 8 times the burst absorption performance, and a field-replaceable CPU subsystem. Both the Wedge 400 and 400C are manufactured by Celestica and are open platforms, so developers of all sizes, from startups to large ISPs, can use them for their own projects.

FBOSS is now powered by SAI

In the past, FBOSS, Meta’s own network operating system for controlling network switches, used each ASIC vendor’s specific API. Now that FBOSS has been adapted to OCP SAI and deployed at scale in Meta’s network, we can work with more silicon vendors. Broadcom worked closely with us on migrating FBOSS from OpenNSA to SAI. Additionally, we worked with Cisco Systems to support FBOSS over SAI on their ASIC.

Adapting and migrating FBOSS to SAI means that we can integrate multiple ASICs from multiple vendors more quickly and easily in the future. SAI’s API enables engineers to configure new network hardware without having to worry about the specifics of the underlying chipset’s SDK. In addition, SAI has been extended to the PHY layer, with Credo Semi supporting FBOSS with its own SAI implementation.
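To illustrate why a common abstraction layer makes it easier to bring in new silicon, here is a minimal sketch in Python. It is not the real SAI API (SAI is defined as a C API) and it is not FBOSS code. Every name in it (`SwitchAbstraction`, `VendorASwitch`, `VendorBSwitch`, `create_port`, `bring_up_ports`) is hypothetical, and the vendor "SDK calls" are just recorded strings. The point is only that the operating system logic is written once against the interface, while each vendor supplies its own adapter.

```python
from abc import ABC, abstractmethod
from typing import List


class SwitchAbstraction(ABC):
    """Hypothetical vendor-neutral interface (illustrative, not the real SAI API)."""

    @abstractmethod
    def create_port(self, lanes: List[int], speed_gbps: int) -> str:
        """Create a port on the given lanes and return an opaque handle."""


class VendorASwitch(SwitchAbstraction):
    """Stand-in adapter for one vendor's SDK (hypothetical)."""

    def __init__(self) -> None:
        self.sdk_calls: List[str] = []

    def create_port(self, lanes: List[int], speed_gbps: int) -> str:
        # A real adapter would call the vendor SDK here; we only record the call.
        self.sdk_calls.append(f"vendor_a_port_add lanes={lanes} speed={speed_gbps}")
        return f"A:port{len(self.sdk_calls)}"


class VendorBSwitch(SwitchAbstraction):
    """Stand-in adapter for a second vendor with a differently shaped SDK (hypothetical)."""

    def __init__(self) -> None:
        self.sdk_calls: List[str] = []

    def create_port(self, lanes: List[int], speed_gbps: int) -> str:
        self.sdk_calls.append(f"vendorB.createPort({speed_gbps}, {lanes})")
        return f"B:port{len(self.sdk_calls)}"


def bring_up_ports(switch: SwitchAbstraction, count: int, speed_gbps: int) -> List[str]:
    """NOS-side logic, written once against the interface rather than any one SDK."""
    handles = []
    for i in range(count):
        lanes = [4 * i + k for k in range(4)]  # assumption: 4 lanes per port
        handles.append(switch.create_port(lanes, speed_gbps))
    return handles
```

Calling `bring_up_ports(VendorASwitch(), 2, 200)` and `bring_up_ports(VendorBSwitch(), 2, 200)` runs the same logic against both adapters. In this sketch, supporting a third vendor would mean adding one more adapter class and leaving `bring_up_ports` untouched.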

Since this hardware is shared through OCP, support for SAI also means closer collaboration with, and feedback from, the broader industry. Developers and engineers from all over the world can work with this open hardware and contribute their own software, which they in turn can use themselves and share with the broader industry. All of this serves our goal of creating a future where networking is both open and disaggregated.

Next generation 200G and 400G fabrics

We have already used 200G optics in our data centers and plan to use 400G in the future.

Meta’s data center fabrics have evolved from 100 Gbit/s to next-generation 200 Gbit/s and 400 Gbit/s. Meta has already deployed 200G-FR4 optics at scale and has contributed to specifications for the 400G-FR4 optics that will be used in the future.

Meta has developed two next-generation 200G fabric switches: the Minipack2 (the latest version of Minipack, Meta’s own modular network switch) and, in collaboration with Arista Networks, the Arista 7388X5. Both are backward compatible with previous 100G switches and support upgrades to 400G.

The Minipack2 is based on the Broadcom Tomahawk4 25.6T switch ASIC and a Broadcom retimer. The Arista 7388X5 is also based on the Broadcom Tomahawk4 25.6T switch ASIC, with some versions of the 7388X5 also using a Credo chipset. These are high-performance switches that use modular line cards to forward up to 25.6 Tbit/s and 10.6 Bpps. They support 128x 200G-FR4 QSFP56 optical modules and maintain a consistent SerDes speed on the switch ASIC, the optical host interface, and the optical line/wavelength. This simplifies connectivity because no gearbox is needed to convert data streams. They also have significantly reduced power per bit compared with their predecessor models (the OCP-accepted Meta Minipack and the OCP-inspired Arista 7368X4, respectively).
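The headline figures above can be checked with some quick arithmetic. The short Python sketch below multiplies the port count by the port speed and models the gearbox point in a simplified way. The port count and port speed come from the text. The split of each 200G module into 4 lanes of 50 Gbit/s is an assumption about how 200G-FR4 QSFP56 modules are typically built, and the function names are hypothetical.

```python
PORTS = 128              # from the article: 128x 200G-FR4 QSFP56 modules
PORT_SPEED_GBPS = 200    # from the article
LANES_PER_PORT = 4       # assumption: 4 electrical and 4 optical lanes per module
LANE_SPEED_GBPS = PORT_SPEED_GBPS // LANES_PER_PORT  # 50 under that assumption


def aggregate_tbps(ports: int, port_speed_gbps: int) -> float:
    """Total front-panel bandwidth in Tbit/s."""
    return ports * port_speed_gbps / 1000


def needs_gearbox(asic_lane_gbps: int, host_lane_gbps: int, optical_lane_gbps: int) -> bool:
    """Simplified model: rate conversion is needed whenever the lane speeds differ."""
    return not (asic_lane_gbps == host_lane_gbps == optical_lane_gbps)


total_tbps = aggregate_tbps(PORTS, PORT_SPEED_GBPS)  # 128 * 200 Gbit/s = 25.6 Tbit/s
total_lanes = PORTS * LANES_PER_PORT                 # 512 lanes under the 4-lane assumption
same_speed_everywhere = not needs_gearbox(
    LANE_SPEED_GBPS, LANE_SPEED_GBPS, LANE_SPEED_GBPS
)
```

With 128 ports at 200 Gbit/s each, the total matches the 25.6 Tbit/s quoted for the switch ASIC. In this simplified model, `needs_gearbox` returns `False` when the ASIC, host interface, and optical lanes all run at the same speed, which is the situation the text describes.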

The Minipack2, Meta’s own modular network switch, developed in collaboration with Broadcom

In addition to sharing the most important features of the Minipack2, the Arista 7388X5 offers hyperscale cloud scalability and operating system flexibility (it can run Arista EOS, FBOSS, or SONiC).

The Arista 7388X5 is a next generation 200G fabric switch developed in collaboration with Arista Networks.

Looking toward the metaverse and beyond

The metaverse will rely on many technologies, including advanced large-scale AI. To serve the multitude of new workloads this creates, we will continue down the path of disaggregation for the global networks and data centers that support all of this. The technologies that Meta and the entire industry develop must of course be fast and flexible, but they must also work efficiently and sustainably, from the data center to edge devices. The only way to achieve this is through collaboration in communities like OCP and through other partnerships.

Open hardware drives the innovation necessary to achieve these goals. Our collaboration with long-standing and new vendors to develop open designs for racks, servers, storage boxes, motherboards, and more will help bring Meta and the entire industry to the next major computing platform. We are only about one percent of the way into the journey, but the path to the metaverse is being paved with open, advanced network hardware.

Acknowledgments

The authors would like to acknowledge the work of many teams within Meta, including the teams from FBOSS, Network Hardware Engineering, DNE, and SOE. We would also like to thank our partners and their engineering teams for working closely together on these contributions.
