The Critical Lowdown Podcast Episode 20
How SONiC is Disrupting the Networking World - Breaking Vendor Lock-in at Scale
Barry McGinley, Senior Systems Engineer, EPS Global
In 2016 Microsoft developed Software for Open Networking in the Cloud, better known by its acronym SONiC, for its Azure cloud data centers.
In 2022, Microsoft ceded oversight of the project to the Linux Foundation, which will continue to work with the Open Compute Project for continued ecosystem and developer growth. Despite its increased profile since 2017, SONiC’s growth has mostly been focused among large hyperscalers and enterprises that have the resources to build their own automation and management tools, and according to Forbes it has been the preserve of “Hardcore Networking Geeks”.
So what is SONiC? What makes it unique? And how can businesses make the most of it to cut costs and improve efficiency?
This is Episode 20 of The Critical Lowdown.
Dáire: So we're joined today by our regular guest, Senior Systems Engineer for EMEA at EPS Global, Barry McGinley.
So Barry, SONiC has been getting a lot of focus over the last few years for its potential, and this focus sharpened last year, when it was revealed that Microsoft were ceding oversight of the SONiC project to the Linux Foundation. But Forbes magazine seems to think that it is still - to quote: "mostly known to hardcore networking geeks and has limited management tools". So we're going to get to the bottom of that today, hopefully, and I hope you don't mind me characterizing you as a hardcore networking geek, because I know you are somewhat of an expert in this!
The first question is - can you explain to our listeners, as simply as you can, what is SONiC?
Barry: SONiC, Software for Open Networking in the Cloud, was created by Microsoft in about 2016 for a specific task. Microsoft wanted to move away from the usual vendor lock-in, and it was basically a test as well: they tried it first on their Top-of-Rack switches, which only need a simple feature set. That's how it started.
Then Microsoft open-sourced this, giving it to the OCP before they gave it to the Linux Foundation. It's a Network Operating System, started for a specific task within Microsoft Azure Top-of-Rack sites. In 7 years it's progressed a lot, and we'll talk about the changes later in the podcast.
Dáire: So you've mentioned "vendor lock-in" and "a specific task". What sort of challenges were Microsoft trying to solve when they developed SONiC?
Barry: The first one, and it was a big one, was vendor lock-in: essentially, they wanted to avoid being held over a barrel by Juniper, Cisco, Dell and so on.
The second reason was scalability. As I said, it started with Top-of-Rack switches, which are the ones that sit just above the servers and connect into them. You don't need a huge feature set there. So they said to themselves: "Why don't we build our own operating system?" They also built SAI, the Switch Abstraction Interface, which basically allows SONiC to sit on just about any piece of hardware. There are roughly 150 boxes on the hardware interoperability list now, and it covers basically every vendor that's involved in Open Networking: Edgecore, Ufispace, Celestica, Delta, Quanta, etc.
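To make the SAI idea concrete, here is a minimal Python sketch of an abstraction layer. It is an illustration only: the real SAI is a C API, and every name below (`SwitchAbstraction`, `VendorAAsic`, `VendorBAsic`, `bring_up_ports`) is hypothetical rather than taken from SONiC. The point it tries to show is that the operating system logic is written once against an interface, and each hardware vendor supplies its own implementation underneath.

```python
from abc import ABC, abstractmethod


class SwitchAbstraction(ABC):
    """Hypothetical stand-in for a hardware abstraction layer such as SAI."""

    @abstractmethod
    def set_port_admin_state(self, port: str, up: bool) -> str:
        """Program one port on the underlying ASIC and describe what was done."""


class VendorAAsic(SwitchAbstraction):
    # Hypothetical vendor driver: maps the generic call onto its own SDK wording.
    def set_port_admin_state(self, port: str, up: bool) -> str:
        return f"vendor-a sdk: {port} {'enable' if up else 'disable'}"


class VendorBAsic(SwitchAbstraction):
    # A second hypothetical vendor with a different SDK convention.
    def set_port_admin_state(self, port: str, up: bool) -> str:
        return f"vendor-b sdk: port={port} admin={'up' if up else 'down'}"


def bring_up_ports(asic: SwitchAbstraction, ports: list[str]) -> list[str]:
    """NOS-side logic: written once, unaware of which vendor sits underneath."""
    return [asic.set_port_admin_state(port, True) for port in ports]


if __name__ == "__main__":
    for asic in (VendorAAsic(), VendorBAsic()):
        print(bring_up_ports(asic, ["Ethernet0", "Ethernet4"]))
```

Swapping the hardware here means swapping the object passed to `bring_up_ports`, with no change to the function itself, which is roughly the property that lets one operating system run on boxes from many vendors.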
Thirdly - Azure is huge, so being able to do the same thing over multiple sites was really important.
Next - Automation. They needed to be able to link into their automation tools. The idea was that SONiC was very easy to plug into. When we're looking at our telemetry or automation, it's very, very easy to link in with the SONiC operating system.
Following on from this, interoperability is important, and that goes along with the telemetry and automation. You're not stuck with one vendor; you can build your own tools. You can imagine how many engineers Microsoft has to work on the operating system. Microsoft were the first to deploy it, but then LinkedIn, eBay, Alibaba, and Tencent, organizations with a lot of engineering resources, were able to take it, add more features, and feed their changes back into the community. That is the idea behind SONiC, and there are lots of different versions out there. So to summarize, the key benefits of SONiC are:
- Remove vendor lock-in
- Easy scalability between all the Azure sites
- Interoperability to link in with telemetry and automation
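The scalability and automation points above can be sketched in a few lines of Python: one template, rendered identically for every site. This is a sketch under assumptions, not Microsoft's tooling. The table names are loosely modeled on SONiC's JSON configuration (`config_db.json`), but the exact fields, the site names, and the helper `render_tor_config` are illustrative assumptions.

```python
import json


def render_tor_config(hostname: str, vlan_id: int, ports: list[str]) -> dict:
    """Build a config fragment for one Top-of-Rack switch from shared inputs.

    Table and field names are assumptions loosely modeled on config_db.json.
    """
    vlan = f"Vlan{vlan_id}"
    return {
        "DEVICE_METADATA": {"localhost": {"hostname": hostname}},
        "VLAN": {vlan: {"vlanid": str(vlan_id)}},
        "VLAN_MEMBER": {
            f"{vlan}|{port}": {"tagging_mode": "untagged"} for port in ports
        },
    }


# Hypothetical sites: the same template is applied to each one.
SITES = ["dublin-tor-01", "amsterdam-tor-01"]

if __name__ == "__main__":
    for site in SITES:
        config = render_tor_config(site, vlan_id=100, ports=["Ethernet0", "Ethernet4"])
        print(json.dumps(config, indent=2))
```

Because the output is plain JSON, the same data could be handed to whatever automation or telemetry tooling an operator already runs, which is the kind of easy "link in" described in the conversation.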
Dáire: How does SONiC differ from other Network Operating Systems?
Barry: Most Open Networking software is Linux-based, and SONiC is Linux-based too, but that is where the similarity stops.
There have been open source network operating systems before, but this one has had a very different birth from all the others. Instead of being built by the OCP or the ONF or one of these community groups, it was built by a company to solve its own specific tasks. It was deployed at scale and production-hardened before it went out to the community. As I mentioned, what happened next was that very large organizations such as Tencent, Alibaba and eBay took it on and added features back into the community. It started to build and grow with each new deployment. And again, one of the big things for networking is that at each point it's adding features while already working at scale.
Just to give a little bit of background, there's a community version, and anyone working on it (for example Broadcom, Edgecore, Dell, Mellanox, etc.) or deploying it (LinkedIn, Alibaba, and so on) puts their features back into it. So this open source version, available on GitHub, will keep growing and evolving, which can become a bit messy. Alongside it, we have commercial versions.
Open source network operating systems have been around before, but they have remained purely open source. With SONiC, the fact that companies have started to build their own commercial versions is an indicator to the Open Networking industry that this is going to be the de facto operating system. However, because not everybody has the same engineering resources as Alibaba or LinkedIn to take the open source version and add and remove features, we need hardened and tested commercial versions, and we have them now from companies like Broadcom, Edgecore, Dell, and Mellanox.
Standardization is desired in this situation, rather than an excess of versions. We want people, particularly enterprises without technical resources, to adopt and utilize this system. As it reaches critical mass, we now have a secure, widely-used operating system that can be deployed in any data center. While having too many operating systems may hinder this progress slightly, we are reaching a point where enterprise data centers, not just large ones, are ready to deploy and use it.
Dáire: You've pointed out a few aspects where SONiC can be somewhat messy. Could you elaborate on the drawbacks of using SONiC compared to a conventional operating system?
Barry: We get many enquiries from people looking to buy hardware from us, for example an AS7326 25G switch from Edgecore, which is on the SONiC hardware interoperability list, and they think the operating system will be free because they can use the GitHub version. But it's not as simple as just installing the OS. The full open source version has absolutely everything included, which is not what everybody wants, and that full version hasn't really been tested or hardened. That is a bit of a problem.
Most guys do want support, and luckily there are multiple commercial versions. So what we would advise is to take out support for 1/3/5 years. You're better off buying a commercial version, testing it, getting used to it. More than likely the open source version is going to be the one that's used down the road because it's going to be good enough, but at the minute it's not.
The advantage of going with a commercial version and support is that you can do your first cycle with SONiC knowing you can pick up the phone and get tech support if there's a problem. Bug fixes come in every couple of months, along with new versions, which are again tested at scale.
The drawback at the minute is the number of different versions, but the commercial versions are all good, so that's not a huge drawback. It comes down to which one the customer has decided to use and whether it's the best fit for them. The real issue is the idea that "this is open source, this is free, now send it to me and it's supposed to work perfectly". Open source has never truly worked like that. But people are open to being educated about it. When we talk about the different versions that are out there, the Edgecore version, the Broadcom version, people are more than happy to take our advice and take their 1/3/5 year support. It's not a huge effort, and it's not a huge outlay for the software. For smaller companies, having the peace of mind of being able to pick up the phone to tech support is really important.
Dáire: So what do you say to them when you're explaining it to them?
Barry: You want to be able to pick up the phone, it's as simple as that! The open source version is a bit of a mess. It has everything on it and there are a lot of engineers working on it, but you need to be able to call somebody if there's an issue. You need bug fixes. For the time being, for the next cycle of hardware, so three to five years, I personally don't think the open source GitHub version is ready for daily use unless you have the engineering resources.
Dáire: Yeah, having done a very limited amount of programming in my university days, going through Stack Overflow is not something I would recommend for anybody with a nervous disposition!
Dáire: Our partner Edgecore Networks has released Enterprise SONiC Distribution by Edgecore. How does that differ from the community version of SONiC?
Barry: Edgecore were one of the biggest contributors to SONiC, having the most commits to the community version after Microsoft and Broadcom. That is a huge amount of work for Edgecore, considering that it was Microsoft's operating system and is Broadcom's baby. Broadcom sell ~90% of the switching ASICs in the world, so it's in their interest that open hardware built on their ASICs sells.
I mentioned earlier that the community version has absolutely everything. With Enterprise SONiC Distribution by Edgecore, they've taken the feature set required for three use cases, including overlay and VMware, and hardened that. By hardened I mean they developed and tested it, then deployed it at scale and tested again, and made any bug fixes well before releasing it to the outside world. I like Edgecore's approach of not trying to take on too much at the one time and making sure it's a success. Broadcom have done something similar.
What Edgecore are doing in the background is adding features for their next use cases instead of releasing everything to the world all at once. Once the majority of use cases within the Spine and Leaf topology in the data center are covered, they'll move on to campus networks, then telecoms networks, or at least some form of routing, but that's a bit away. It makes sense to deploy this version, get big deployments and report successes with it, and only then move on to the next release.
SONiC is to Switching as Linux is to Servers
The goal is that SONiC becomes the operating system for everything, just like Linux is within the server world. I like that they're not being too greedy with it, or pushing it out when it's not truly ready. That happens all too much in the Open Networking world. People shouting from the rooftops about something, but they're trying to bite off too much and trying to be a one-size-fits-all. We've seen operating systems trying to work in the data center, in campus enterprise networks, and then trying to do telecom stuff as well.
Pick one use case, stick with it, attain success, and then look at other areas.
Dáire: At EPS Global we've recently launched SONiC Express. Can you tell us what that is?
Barry: SONiC Express is a bundle that takes away all the hassle but includes all the benefits. It is SONiC on a box pre-packaged plus support. We've been involved in Open Networking for over a decade and what we've noticed, especially with the telecoms market is that people don't want to be messing around with a piece of hardware. They want the benefits of Open Networking, but they don't want to have to talk to several companies about the hardware, the software, and then deal with licenses. Then when there's a problem, who do I pick up the phone to? Is it a hardware problem? Is it a software problem?
Bundling the software and the hardware has worked really well in the telecoms world, and we're bringing that to the data center market. It will be Enterprise SONiC Distribution by Edgecore pre-installed on an Edgecore box. You don't have to worry about licenses or installing software, and if you choose we can bundle in optics and DACs along with whatever else you require, and we make sure everything is on the hardware interoperability list with SONiC.
It's a commercial version of SONiC - Edgecore's hardened version of the software. It will work well, and we'll be able to tell you whether it suits your use case or not. Maybe after 3-5 years you could move to the enterprise version or open source SONiC, but for companies who don't have the engineering teams that large organizations such as LinkedIn or Alibaba have, this is a nice segue into SONiC.
The support for the hardware and software is provided by Edgecore. EPS Global can be the first port of call, but basically it’s one number to call, one ticket to submit, whether it's hardware or the software. You're getting the benefits of disaggregation and no vendor lock-in while avoiding the hassles usually associated with purchasing software and hardware and interoperable optics etc separately. We're a one stop shop for a SONiC deployment and everything's prepackaged for you.
Dáire: What's the future for SONiC?
Barry: I briefly mentioned earlier that the community version is going to be highly functional and practical in the future. Because of this, companies might shift their focus from providing the software itself, which is free, to offering an automation platform and support for it instead. This is primarily because of the extensive visibility into the network that SONiC's telemetry and automation features allow. If you have your support with the same crowd that are providing the automation and telemetry, you will obviously have much better insight into what's actually happening in your network.
There are a few companies out there doing this. Aviz Networks are using the community version and providing an automation layer on top of it, and because they're providing that layer, they have much better visibility into the network and can solve problems much quicker. Broadcom have something similar: they can use more of the ASIC, so they get brilliant telemetry and visibility of what's happening in the box and within the fabric. From my conversations with Edgecore, Broadcom, and various automation experts, it appears that enhanced network visibility will be a significant advantage of using the community version.
In two to three years, we could still be using commercial versions with an automation platform sitting above them. But I do think that within maybe one hardware cycle, which is probably three to five years, we will see the community version being deployed more, with an orchestration platform sitting above it and support being provided by guys like EPS Global, Aviz Networks, Hedgehog, Dorado, Apstra (owned by Juniper now, but they used to be quite a big Open Networking and orchestration platform), AtriNet, etc.
That's the way I'd like to see it go in the next three to five years.
Dáire: So thanks a million, Barry. I'm certainly more educated than I was on SONiC. So thanks again and we'll chat to you soon.
Glossary of Terms