Legacy, multi-tiered data center architectures, originally designed for north-south traffic, have been pushed to breaking point by cloud computing, big data, virtualization and more. With east-west traffic now making up the majority of data center traffic, a new approach to network architecture was imperative. Step forward the data center fabric…
The data center fabric architecture has been designed to deal with the proliferation of east-west traffic, the general explosion in the volume and scale of data, the ability to consolidate network, compute and storage resources, spiraling costs associated with vendor lock-in and much more. ResearchAndMarkets recently forecast a CAGR of 24.5% for the fabric market for the period 2019-2024[1] and that looks remarkably on the safe side. Let’s delve a little deeper and look at what a fabric really is, the hardware and software involved and some implementation options.
What is a data center fabric?
A data center fabric can be described as the layout of the connections between the switches and the servers. The connections take on a woven, crisscross appearance, hence the term fabric. As the size of the data center increases, so too does the matrix of connections. Each individual server can reach every other server using this fabric, thus eliminating the issues caused by increased east-west traffic.
The data center fabric usually has 1 or 2 tiers of switches, the most common being the 2-tier spine and leaf architecture, also known as Clos. The leaves in this scenario can also be called ToR (top-of-rack) switches. Traffic between servers traverses the leaf and spine switches, so any two servers are only a small, predictable number of switch hops apart: 1 when they share a leaf, and 3 (leaf, spine, leaf) when they sit under different leaves. This results in the internal traffic within the data center being extremely low latency and highly efficient.
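To make the hop count concrete, here is a minimal sketch that models a small 2-tier leaf and spine fabric as a graph and counts the switches on the shortest path between two servers. The device names, the fabric size and the helper functions are all made up for illustration; they do not come from any vendor's tooling.

```python
from collections import deque
from itertools import product


def build_leaf_spine(num_spines, num_leaves, servers_per_leaf):
    """Return an adjacency map for a toy 2-tier leaf and spine fabric."""
    adjacency = {}

    def link(a, b):
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    spines = [f"spine{i}" for i in range(num_spines)]
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    # Every leaf connects to every spine: this is the "woven" part.
    for leaf, spine in product(leaves, spines):
        link(leaf, spine)
    # Each server hangs off exactly one leaf (its top-of-rack switch).
    for leaf_index, leaf in enumerate(leaves):
        for s in range(servers_per_leaf):
            link(leaf, f"server{leaf_index}-{s}")
    return adjacency


def switches_on_path(adjacency, src, dst):
    """Breadth-first search; return how many switches sit between two servers."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Nodes strictly between src and dst = number of links - 1.
            return distance[node] - 1
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    raise ValueError(f"no path from {src} to {dst}")


fabric = build_leaf_spine(num_spines=2, num_leaves=4, servers_per_leaf=3)
same_rack = switches_on_path(fabric, "server0-0", "server0-1")
cross_rack = switches_on_path(fabric, "server0-0", "server3-2")
print(same_rack, cross_rack)
```

Adding more leaves or spines to the call changes the number of links but not the worst-case path between servers, which is the property that keeps latency predictable as the fabric grows.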
The idea of this networking fabric is not limited to the traditional data center either. The Open Networking Foundation has been working on the CORD (Central Office Re-architected as a Datacenter) project since 2016. The idea behind it is to bring disaggregation, whitebox economics, and open source standards and software to the carrier industry (more information on the carrier industry here). They saw how the hyperscale data centers evolved using bare metal and wanted to get in on the act. Within CORD there is a sub-project called Trellis, which is a data center fabric in the telco central office. This has already been deployed in the production networks of AT&T and Comcast. You can read more about it in a previous blog, Life on the Edge.
Choosing the right solution for your data center fabric is never going to be an easy task, no matter what the vendor tells you! First and foremost, EPS recommends you focus on open networking – this disaggregated approach helps break vendor lock-in and spur innovation of both hardware and software. Next, an in-depth knowledge of the different options will help identify the pitfalls, and the benefits, of the implementation options that are available now in the open networking world. The 2 main options are to go down the route of independent devices or to have a centralized control plane.
Independent Devices – In this scenario each switch acts independently from all the other devices on the network. Even with protocols like BGP/EVPN, each device must be configured and managed separately. Tools like Zero Touch Provisioning, Chef, Puppet, and Ansible can make life easier, but each device must still be managed individually. The internet has been built this way, so it does scale well, but there is a drawback: as the number of devices increases, managing them becomes more and more time consuming for the network technicians. Some examples of software in this space are Cumulus Linux, OcNOS from IP Infusion and PICOS from Pica8.
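To give a feel for why the per-device workload grows, here is a rough sketch of the kind of per-switch variables an automation tool would need to be fed for a BGP-based fabric. Everything in it is an assumption made for illustration: the naming scheme, the private ASN plan (one shared ASN for the spines, one per leaf), the loopback range and the `per_device_vars` helper itself. It is not the output format of any of the tools or network operating systems named above.

```python
import ipaddress


def per_device_vars(num_spines, num_leaves, base_asn=65000,
                    loopback_net="10.0.0.0/24"):
    """Build one settings record per switch (hypothetical numbering plan)."""
    addresses = iter(ipaddress.ip_network(loopback_net).hosts())
    spines = [f"spine{i}" for i in range(num_spines)]
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    devices = {}
    # Assumption: all spines share one ASN and peer with every leaf.
    for name in spines:
        devices[name] = {
            "asn": base_asn,
            "loopback": str(next(addresses)),
            "bgp_peers": list(leaves),
        }
    # Assumption: each leaf gets its own ASN and peers with every spine.
    for offset, name in enumerate(leaves, start=1):
        devices[name] = {
            "asn": base_asn + offset,
            "loopback": str(next(addresses)),
            "bgp_peers": list(spines),
        }
    return devices


small = per_device_vars(num_spines=2, num_leaves=4)
large = per_device_vars(num_spines=4, num_leaves=32)
print(len(small), "records to maintain for the small fabric")
print(len(large), "records to maintain for the large fabric")
```

Each record still has to be pushed to, verified on and kept in sync with its own switch, so the bookkeeping grows with every device added, even when generating the records is automated.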
Centralized Control Plane – This solution abstracts the control and management planes to a separate central location, which frees up the data plane switches to concentrate on what they do best: forwarding traffic. The control and management planes appear as a single entity and the entire fabric can be controlled from a single device. There are multiple ways this can be implemented. In the case of Pluribus’ Adaptive Cloud Fabric, the SDN implementation is “controllerless”, where all the switches federate together to build the management fabric. Each switch has a distributed, mini-SDN controller and distributed database, and the full state of the network fabric is shared with every other switch. The entire fabric can then be controlled from the CLI or via a REST API on any switch, or from the Pluribus UNUM Management Platform. The power of this approach is that it is cost effective, as there are no external controllers to deploy, and it seamlessly stretches across geo-distributed locations to create a Multi-site Data Center Fabric.
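The "full state shared with every switch" idea can be pictured with a deliberately naive toy model. This is not how Pluribus implements its fabric and it does not use any Pluribus API; the `FabricSwitch` class, the `build_fabric` helper and the state keys are all hypothetical. A real fabric would also need transactions, failure handling and conflict resolution, which this sketch ignores.

```python
class FabricSwitch:
    """Toy fabric member that keeps a full copy of the fabric-wide state."""

    def __init__(self, name):
        self.name = name
        self.peers = []   # every other member of the fabric
        self.state = {}   # this switch's replica of the shared state

    def apply(self, key, value):
        """Accept a change on this switch and copy it to every peer."""
        self.state[key] = value
        for peer in self.peers:
            peer.state[key] = value


def build_fabric(names):
    """Create the switches and make each one aware of all the others."""
    switches = [FabricSwitch(name) for name in names]
    for switch in switches:
        switch.peers = [p for p in switches if p is not switch]
    return switches


members = build_fabric(["spine0", "leaf0", "leaf1"])
# The change is entered on leaf1, but it could be entered on any member.
members[2].apply("vlan100", {"description": "web tier"})
in_sync = all(sw.state == members[0].state for sw in members)
print(in_sync)
```

The point of the sketch is only that there is no separate controller object: whichever switch receives the change is responsible for getting it to the rest.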
Hardware and Software
In the last 6 or 7 years the software in this sector has evolved into a minefield. New software springs up, seemingly on a weekly basis, making it difficult to separate the wheat from the chaff. From talking to prospective customers, the main worry is usually around the quality of support they will get from the software vendor and not the product itself. From my own personal experience, I cannot speak highly enough of the support, configuration help, bug fixes and general troubleshooting I have seen customers get from Cumulus Networks, IP Infusion and Pluribus Networks, to name but a few. Cumulus Networks has recently been acquired by Nvidia/Mellanox and the jury is still out on whether or not they will remain an integral part of the open networking ecosystem.
On the bare metal hardware side of things, there are fewer players, but all are established and well-known names in the industry now. They are mainly Taiwanese companies, like Ufi Space, Quanta, Delta Agema and probably the most recognizable name of the 4, Edgecore Networks. Edgecore is part of the Accton Group, which is an ODM/OEM of switches. To give you an idea of scale, they shipped 7.5 million switches last year! Below is a look at a data center fabric using Edgecore switches.
That’s all for this month, folks. Next month we will have a look at Open Networking in the carrier market and some of the whitebox products that have come out of the Telecom Infra Project, such as the Disaggregated Cell Site Gateway and Edgecore’s optical transponder, the ‘Cassini’. For more information on products and services related to Open Networking, come check out our website at www.epsglobal.com.
Slán go fóill,
Barry
P.S. You might be interested in our webinar held on September 3rd 2020: Automating Distributed Data Centers with Controllerless SDN and Open Networking. See it here.