Transcript of Episode 9: What are P4 Programmable Switches and how do they solve power, performance and cost challenges for Data Centers
Barry McGinley: There's a topic I wanted to talk about: P4 switching, or programmable switches. We do a bit of P4 ourselves, but I'll let you explain programmable switching within the data center and what it means. Do NOS vendors have a part to play here? Do your best at explaining P4 switching please!
Nanda Ravindran: Firstly, I want to thank you for inviting me to this podcast by EPS Global. So, P4 programmable switches – ‘P4’ is just one of the languages used to program switches. Let's first talk about what programmable switches are.
Initially, data centers were built by the operators using traditional switches and routers. These switches, as I think we discussed in our previous podcast, came from large OEMs offering proprietary solutions in which the hardware and software were bundled together. Data centers were initially built on this deployment model, and it was pretty limiting.
I think the data center operators soon realized that the requirements for a data center were changing rapidly. The number of customers was scaling, the data they had to process within a data center was increasing exponentially, and the traditional vendors couldn't keep up. So the operators realized they had to build networks based on a completely different model.
They came up with the model of open networking hardware switches, based on off-the-shelf merchant silicon. These open networking switches were called white boxes, and the software was called the network operating system (NOS). The software was based on Linux.
The data center operators separated the hardware from the software: they developed their own network operating systems and started deploying data centers based on this disaggregated, or open networking, model. They soon realized that the flexibility and the level of control they now had over their networks was significant. They could manage the networks well, they had more control, and they were able to deploy the new services their end customers required.
It enabled a new level of flexibility, but there were still some elements in the network that were closed or restricted. Here I'm talking about the ASIC, the network processing silicon, on the white box switches.
These chips, the network processing silicon, processed packets in a very fixed way: they had fixed tables and fixed logic that handled packets in a predetermined manner. That's what is called fixed pipeline silicon.
So this fixed pipeline meant the protocols and services a data center operator could deploy on the switches were predetermined. They could not be changed dynamically for a new feature introduction or for a use case driven by a particular data center requirement. This was a huge restriction, and it showed when VXLAN was first introduced. The VXLAN technology was specified and defined, but it took several years for it to become available in silicon and then eventually be deployed: getting the silicon into white box hardware, getting that hardware into the data center network, and then deploying and actually using the feature. It's a pretty long cycle for silicon chips to adopt a new technology and have it realized in an actual data center network.
This is where programmable switches, or the programmable pipeline, come in. The logic within the off-the-shelf silicon is generic: it has a set of generic tables that can be programmed, so any new technology, protocol, or service can be implemented on the silicon.
So P4 has several benefits:
- Data center operators can come up with a new protocol, or they can decide to remove a protocol for the sake of efficiency.
- They can define it, design it, and write a program; once they have the program, they can deploy it on all the switches, and the data center is now based on the new design they just defined.
All this can be accomplished within a day or two, which significantly improves how quickly data center operators can deploy new features or services.
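As a concrete sketch of what such a program looks like, here is a minimal P4_16 example for the v1model architecture: the operator defines a single match-action table that forwards Ethernet frames by destination MAC. The names (l2_fwd, forward, and so on) are illustrative choices, not from any vendor SDK, and a real deployment would target the vendor's own compiler and architecture model:

```p4
#include <core.p4>
#include <v1model.p4>

header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

struct metadata { }
struct headers { ethernet_t ethernet; }

parser MyParser(packet_in packet, out headers hdr, inout metadata meta,
                inout standard_metadata_t standard_metadata) {
    state start {
        packet.extract(hdr.ethernet);
        transition accept;
    }
}

control MyVerifyChecksum(inout headers hdr, inout metadata meta) { apply { } }

control MyIngress(inout headers hdr, inout metadata meta,
                  inout standard_metadata_t standard_metadata) {
    action drop() { mark_to_drop(standard_metadata); }
    action forward(bit<9> port) { standard_metadata.egress_spec = port; }
    // The operator defines the table: its keys, its actions, its size.
    table l2_fwd {
        key = { hdr.ethernet.dstAddr: exact; }
        actions = { forward; drop; }
        size = 1024;
        default_action = drop();
    }
    apply { l2_fwd.apply(); }
}

control MyEgress(inout headers hdr, inout metadata meta,
                 inout standard_metadata_t standard_metadata) { apply { } }

control MyComputeChecksum(inout headers hdr, inout metadata meta) { apply { } }

control MyDeparser(packet_out packet, in headers hdr) {
    apply { packet.emit(hdr.ethernet); }
}

V1Switch(MyParser(), MyVerifyChecksum(), MyIngress(), MyEgress(),
         MyComputeChecksum(), MyDeparser()) main;
```

The point is that the parser, tables, and actions are whatever the operator defines; changing this pipeline is a recompile-and-redeploy step measured in days, rather than a multi-year silicon cycle.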
Who provides programmable silicon options?
We have two large vendors who provide programmable silicon. One is Broadcom, and the other is Intel.
So, first let us talk about Broadcom. Edgecore has come out with a programmable switch based on the Broadcom Trident4 chipset. This chipset has a programmable module in it, and it is programmed through an open networking language called NPL (Network Programming Language), defined by Broadcom. This is one of the ways data center operators can program the silicon itself. There is another Broadcom platform with programming capabilities as well, the Jericho2, but that silicon is usually deployed in service provider or carrier networks, not in data centers.
Next, let’s discuss the programmable options available from Intel. Intel came out with the Tofino chips, which are programmable, and Edgecore has three different switches based on the Tofino 1 silicon. These models are the “Wedge” series, and we have several 100G and 400G platforms based on this silicon. We also have a newer switch based on the Tofino 2 chipset, called the DCS810.
The Tofino platforms are programmable using the P4 language. We discussed several reasons why programmable platforms are beneficial; they make packet processing more efficient in terms of power consumption and latency. So that's a broad summary of what programmable switches are.
Barry McGinley: When P4 programmable switches came out, I had doubts about them. I didn't really see where the benefit was going to be within the data center. When we look at a Layer-3 Clos topology, spine and leaf, what is the need for multiple programmable pipelines? What's the benefit of this programmability to data centers?
Nanda Ravindran: Programmable switches or programmable silicon is a huge deal for hyperscale data center providers. Hyperscale data center providers want to run their data centers as efficiently as possible, and programmable switches enable higher efficiencies. It allows the data center operators to customize several layers of the infrastructure stack and make the processing of data packets very efficient. The efficiencies are in terms of power consumption and lower latencies. Efficiency is one of the key benefits of programmable switches.
Apart from that, it allows data center operators to introduce new features very quickly. They come up with a new application, or a new type of service, they have the ability to define a program for the switches and implement them quickly. They do not have to wait for these new features and protocols to be implemented by a silicon vendor, and then that silicon getting into a network switch or a white box, and then eventually getting deployed. I think the cycle time of introducing new features, or new protocols, has significantly improved using programmable switches.
We talked about efficiency, and one of the ways efficiency is achieved is by taking out protocols that are not needed; this, again, is possible using programmable switches. Another benefit is visibility into the packets flowing across the network. Data center operators have the ability to tag packets and track them as they flow across the entire network.
This level of visibility is possible, because at runtime, you can program some rules on the programmable switches and track the packets. This feature also allows you to introduce security features, and the ability to look at flows and block them at runtime is also a key feature of programmable switches. Programmable switches or programmable silicon have a pretty bright future. We will see more of them in data centers and probably eventually in the carrier market as well.
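To make the tagging and runtime-rule idea concrete, here is a hypothetical P4_16 fragment (not a complete program; the flow_tag_t header, the field names, and the watch table are invented for illustration). Entries are installed into the table at runtime, over a control interface such as P4Runtime, which is what enables tracking or blocking specific flows without redeploying the pipeline:

```p4
// Hypothetical tag header -- names are illustrative, not a standard format.
header flow_tag_t {
    bit<16> etherType;   // original EtherType, restored when the tag is removed
    bit<32> trace_id;    // per-flow ID inserted at the ingress switch
}

control TagIngress(inout headers hdr, inout metadata meta,
                   inout standard_metadata_t standard_metadata) {
    action tag(bit<32> id) {
        hdr.flow_tag.setValid();
        hdr.flow_tag.trace_id  = id;
        hdr.flow_tag.etherType = hdr.ethernet.etherType;
        hdr.ethernet.etherType = 0x88B5;  // IEEE local-experimental EtherType
    }
    action drop() { mark_to_drop(standard_metadata); }
    // Rules are added to this table while the switch is running,
    // so a flow can be tagged for tracking or blocked on the fly.
    table watch {
        key = { hdr.ipv4.srcAddr: exact; hdr.ipv4.dstAddr: exact; }
        actions = { tag; drop; NoAction; }
        default_action = NoAction();
    }
    apply { watch.apply(); }
}
```

Downstream switches can match on the tag to record where the packet has been, which is the kind of end-to-end visibility described above.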
Barry McGinley: So just to finish up, I want to ask you about the future of open networking. Edgecore has been a big part of this during the last decade, and a lot of data center infrastructure is open networking now. You have Facebook, you have Google, you have all the big cloud operators now developing their own operating systems… Facebook has FBOSS and Microsoft in their Azure sites have SONiC, they're putting a NOS on to bare metal, and obviously, here at EPS Global we’ve been doing a lot of work with our customers on open networking. I feel that the de facto way of building out your data center now is bare metal and putting an operating system on it.
So how far along are we in the telecoms industry with this change? It's only been going since 2016; we've had the Telecom Infra Project (TIP), we've had the Open Networking Foundation (ONF), and Edgecore has been designing products for these guys for the last 4-5 years. Where are we now, and how far can it go?
Nanda Ravindran: The open networking, or disaggregation, model has started gaining significant traction in the service provider or telecom segment of the market, especially in the last couple of years. I think the model is now well understood, and service providers and telecom operators have several options, in terms of both NOS software and white boxes, for deploying networks based on disaggregation.
As you know, service provider adoption of new products and new technology takes longer: anywhere from one to three years for a new technology or product to be adopted in service provider networks. We have started seeing more service providers looking for disaggregated solutions for their networks. I think the supply chain crunch has pushed service providers and telecom operators to start looking at open and disaggregated solutions sooner.
They have started to realize that they need multiple options to source their networking needs. There is no better way to do that than with open and disaggregated deployments.
I think to summarize, the future of Open Networking and disaggregation is bright, we expect over the years that open and disaggregated networking will be the model, the de facto model, of deployment in Service Provider and telecom networks.
Barry McGinley: Good answer! I think it's getting close to dinner time for you, Nanda. Thank you very much for your time this morning.
Nanda Ravindran: Thanks for having me on this podcast.
Glossary of Terms:
- P4 Programming: Programming Protocol-independent Packet Processors (P4) is an open source, domain-specific programming language for network devices, specifying how data plane devices (switches, routers, NICs, filters, etc.) process packets. 
- NOS: Network Operating System
- OEM: Original Equipment Manufacturer
- ASIC: Application Specific Integrated Circuit
- Tofino: Intel's range of P4-programmable Ethernet switch ASICs
- Trident4: Broadcom's range of NPL-programmable Ethernet switch ASICs
- VXLAN: Virtual Extensible LAN
- SONiC: a free and open source network operating system based on Linux, developed by Microsoft and the Open Compute Project.