This article is the second in my blog series “Audio over IP Networks for Events - An Opinionated Guide”. In the first article, I established why Layer 2 networks with stretched VLANs and Spanning Tree Protocol should be considered harmful. If you haven’t read it yet, I recommend doing so before continuing with this article. You can find the first article here.
I assume that the reader has basic knowledge of networking concepts, which means that you should have heard of IP addresses before and should recognize 192.168.12.0/24 as an IPv4 network.
I will try to guide you through the concepts in a way that lays the foundations for a deeper understanding, but this is by no means a complete guide to networking. Given the starting point laid out above, my goal is that you don’t have to go back and forth, googling every second word, to understand the concepts I am trying to explain. I have tried to find a middle ground for the level of detail - however, if you feel that I am skipping too much, please let me know and I will try to improve the article.
- Part 1: Foundations and why L2 is considered harmful
- Part 2: Layer 3 Network Design Principles
- Part 3: OSPF for self-healing networks that just work (TM)
- Part 4: BGP as advanced routing protocol for when you need a little bit more spice
- Part 5: Using PIM-SM to distribute Multicast
- Part 6: Best Practices: Proven Design Patterns and Reference Designs
- Part 7: Gear Guide: Selecting Hardware That Actually Works
- Part 8: Test Before You Deploy! Network Simulation Tools and Techniques
As the title warns, this is an opinionated guide that reflects my personal opinions and field experience.
I make no claims to absolute truth and it is well within the realms of possibility that some statements I make are just plain wrong or lack exposure to scenarios that would shift my thinking.
I welcome all questions, suggestions and feedback (even if it’s a rant about how you think I’ve completely missed the mark).
Introduction to Layer 3
On Layer 3, we are generally dealing with IP addresses (instead of MAC addresses as on Layer 2).
In contrast to Layer 2 frames, packets on Layer 3 also have a field called “Time To Live” (TTL) that is decremented by one by each router the packet passes. If the TTL reaches zero, the packet is dropped. This is an additional safeguard that prevents packets from looping indefinitely in the network.
IP (Internet Protocol) exists in two common versions: IPv4 and IPv6. While IPv4 has 32-bit addresses, IPv6 has 128-bit addresses. IPv4 should be considered legacy. However, the basic principles of Layer 3 are identical for both, thus the following introduction will work for both.
Basic Terminology - Routing, Bridging, Switching and Forwarding
Before we dive into the details, let’s clarify some basic terminology:
- Forwarding: The process of relaying packets in general. This is the most basic operation of a network device. It is based on some kind of forwarding decision, which is clarified below. Examples for forwarding decisions are Bridging, Routing, Multi Protocol Label Switching (MPLS) or flow based forwarding.
- Bridging: Bridging is the process of forwarding packets based on their MAC address. This is what Network Bridges (or Layer 2 switches) do. They learn the MAC addresses of devices connected to them and use this information to forward packets to the correct port.
- Routing: Routing is the process of forwarding packets based on their IP address. This is what routers (or Layer 3 switches) do. They maintain a routing table that contains information about the networks they are connected to and the next hop for each network. How this works exactly will be explained in more detail below.
- Switching: Switching is an ambiguous term that can refer to both bridging or other means of forwarding depending on the context, e.g. MPLS (Multi Protocol Label Switching) which is based on labels. I try to avoid this term in this article to prevent confusion, but you might encounter it in other contexts.
Routing Tables
Routing is generally based on the destination IP address of a packet. In some cases (like firewalls) it can also take the source IP address into account, but this is more of a niche use case and called Policy Based Routing (PBR).
A router maintains one or multiple routing tables. You will also encounter the terms RIB (Routing Information Base) and FIB (Forwarding Information Base): strictly speaking, the RIB holds all routes the router has learned, while the FIB is the optimized table of best routes actually used for forwarding, but the terms are often used loosely. The routing table contains information about the networks the router is connected to and the next hop for each network, and it is used to make forwarding decisions based on the destination IP address of a packet.
The most common types of next hop are a directly connected network or another router, identified by its IP address. A next hop can also be a tunnel, a physical interface, etc.
Example of a routing table (taken from the Arista docs):

```
switch> show ip route bgp
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type 2, B I - iBGP, B E - eBGP,
       R - RIP, A - Aggregate

 B E    170.44.48.0/23 [20/0] via 170.44.254.78
 B E    170.44.50.0/23 [20/0] via 170.44.254.78
 B E    170.44.52.0/23 [20/0] via 170.44.254.78
 B E    170.44.54.0/23 [20/0] via 170.44.254.78
 B E    170.44.254.112/30 [20/0] via 170.44.254.78
 B E    170.53.0.34/32 [1/0] via 170.44.254.78
 B I    170.53.0.35/32 [1/0] via 170.44.254.2
                             via 170.44.254.13
                             via 170.44.254.20
                             via 170.44.254.67
                             via 170.44.254.35
                             via 170.44.254.98
switch>
```
Classless Inter-Domain Routing (CIDR)
Above I explained that routing is generally based on the destination IP address of a packet. We will now look at how the routing table is actually used to make forwarding decisions based on the destination IP address.
The principle behind routing decisions is called Longest Prefix Matching (LPM), which goes hand in hand with Classless Inter-Domain Routing (CIDR).

In the dark days, we had classful networking, which meant that IP addresses were divided into classes (A, B, C, D, E). This is totally irrelevant nowadays; you don’t have to spend any time researching it. CIDR was introduced in 1993 with RFC 1519 and others. It allows for more flexible allocation of IP addresses and is the basis for modern IP networking. CIDR allows for variable length subnet masks (VLSM), which means that you can have subnets of different sizes within the same network. A subnet is simply a logical division or grouping of IP addresses.
An IPv4 address in CIDR notation looks like this: `192.168.90.50/24`.

An IPv4 address has 32 bits, which are usually represented as a group of four integers between 0 and 255 separated by dots: `[0-255].[0-255].[0-255].[0-255]`. Each number between the dots is called an octet and represents 8 bits of the address. That’s why the maximum value for each octet is 255 (`2^8 - 1`).

The `/24` part is the subnet mask, which indicates that the first 24 bits of the IP address are the network part and the remaining 8 bits are the host part. The network part is also called the prefix.

There exists a different notation for the subnet mask, which is called dotted decimal notation. In this notation, the subnet mask is represented as four octets, just like the IP address. For example, the subnet mask `/24` is represented as `255.255.255.0`.
This alternative notation theoretically allows the specification of non-contiguous subnet masks (where the binary representation of the mask is not a contiguous block of 1s followed by 0s). This practice has effectively been killed by CIDR, and RFC 4632 states that the mask must be contiguous.
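To get a feel for the notation, Python’s standard `ipaddress` module can do these calculations for us (a minimal sketch using the example address from above):

```python
import ipaddress

# Parse an interface address in CIDR notation
iface = ipaddress.ip_interface("192.168.90.50/24")

print(iface.network)                # network part / prefix: 192.168.90.0/24
print(iface.netmask)                # dotted decimal notation: 255.255.255.0
print(iface.network.prefixlen)      # prefix length: 24
print(iface.network.num_addresses)  # 2^8 = 256 addresses in a /24
```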
The easiest subnet masks are `/8`, `/16` and `/24`, which divide the IP address at the octet boundaries (dots). Let’s re-use the example of `192.168.90.50`:

- `192.168.90.50/24` or `192.168.90.0/24` corresponds to the range `192.168.90.[0-255]`, i.e. `192.168.90.0` to `192.168.90.255`
- `192.168.90.50/16` or `192.168.0.0/16` corresponds to the range `192.168.[0-255].[0-255]`, i.e. `192.168.0.0` to `192.168.255.255`
- `192.168.90.50/8` or `192.0.0.0/8` corresponds to the range `192.[0-255].[0-255].[0-255]`, i.e. `192.0.0.0` to `192.255.255.255`
As you can see, when specifying prefixes, you usually use the lowest IP address of the range.
I will not go deeper into details, like the calculation of binary addresses and subnet masks, because enough resources exist on the internet that explain this in detail.
One key fact to remember is that a subnet mask specifies the length of the prefix or network part of an IP address. The longer the prefix, the more specific the route is.
Another key fact to remember is that the prefix length is also used to determine which other hosts can be reached on Layer 2 (that’s why the last part is called host part).
If a host has the IP address `192.168.90.50/24` assigned to one of its interfaces, it assumes that it can reach all other hosts with IP addresses in the range `192.168.90.[0-255]` on Layer 2, i.e. it will attempt to send packets directly to these hosts and not via a router.
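This on-link decision can be sketched with Python’s `ipaddress` module (a simplified illustration of the check a host’s IP stack performs, not a real implementation):

```python
import ipaddress

# Interface configuration of our example host
iface = ipaddress.ip_interface("192.168.90.50/24")

def is_on_link(destination: str) -> bool:
    """True if the host would send directly on Layer 2, False if via a router."""
    return ipaddress.ip_address(destination) in iface.network

print(is_on_link("192.168.90.200"))  # True  -> delivered directly on Layer 2
print(is_on_link("192.168.91.1"))    # False -> sent to the default gateway
```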
Now that we have established the basics of CIDR, we can look at how routing decisions are made based on the routing table and Longest Prefix Matching (LPM).
Longest Prefix Matching (LPM)
As the name says, Longest Prefix Matching (LPM) is the process of finding the longest prefix in the routing table that matches the destination IP address of a packet. This is done by comparing the destination IP address with all prefixes in the routing table and selecting the one with the longest match.
This inherently establishes a hierarchy of prefixes, where longer prefixes are more specific and shorter prefixes are more general. You can think of it as a tree structure, where the root is the most general prefix and the leaves are the more specific prefixes (in fact, the routing table is often implemented as a trie, a tree-like data structure specialized for prefix lookups).
Assume the following routing table (and assume we can somehow reach the next hops):

```
192.168.0.0/16    via 1.1.1.1
192.168.20.0/24   via 8.8.8.8
192.168.20.99/32  via 6.6.6.6
```
A packet with the destination address `192.168.10.10` would be forwarded to `1.1.1.1`, just like a packet with the destination address `192.168.250.250`. This is because the longest prefix that matches both addresses is `192.168.0.0/16`. The other two entries in the routing table do not match the destination address (`192.168.20.0/24` only matches from `192.168.20.0` to `192.168.20.255`, and `192.168.20.99/32` only matches `192.168.20.99`), so they are not considered.

A packet with the destination address `192.168.20.50` would be forwarded to `8.8.8.8`. Even though the prefix `192.168.0.0/16` matches, the prefix `192.168.20.0/24` matches as well and is more specific (longer prefix), so it is selected. The prefix `192.168.20.99/32` is even more specific but does not match, so it is not considered.

A packet with the destination address `192.168.20.99` would be forwarded to `6.6.6.6`. All three prefixes match, but `192.168.20.99/32` is the most specific one, so it is selected.

A packet with the destination address `10.10.10.10` does not match any of the prefixes in the routing table, so it is dropped.
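The lookup described above can be sketched in a few lines of Python; the prefixes and next hops are the example values from this section (real routers use trie-based lookups in hardware, this is only an illustration):

```python
import ipaddress

# Example routing table from above: prefix -> next hop
routes = {
    ipaddress.ip_network("192.168.0.0/16"): "1.1.1.1",
    ipaddress.ip_network("192.168.20.0/24"): "8.8.8.8",
    ipaddress.ip_network("192.168.20.99/32"): "6.6.6.6",
}

def lookup(destination: str):
    """Return the next hop for the longest matching prefix, or None (drop)."""
    dst = ipaddress.ip_address(destination)
    matches = [net for net in routes if dst in net]
    if not matches:
        return None  # no route -> packet is dropped
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

print(lookup("192.168.10.10"))  # 1.1.1.1
print(lookup("192.168.20.50"))  # 8.8.8.8
print(lookup("192.168.20.99"))  # 6.6.6.6
print(lookup("10.10.10.10"))    # None
```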
Subnetting establishes a hierarchy of prefixes
As you can see, subnetting establishes a hierarchy of prefixes, where longer prefixes are more specific and shorter prefixes are more general. This is the basis for routing decisions in IP networks.
A practical example of this is the following routing table (laid out using this wonderful tool):

```
10.0.0.0/8 -> Internal network for event
├── 10.1.0.0/16 -> Front of House
│   └── 10.1.1.1/16 -> Mixing Rack FOH
├── 10.2.0.0/16 -> Stage
│   ├── 10.2.1.0/24 -> Main Hang Left
│   │   ├── 10.2.1.1/24 -> Main Hang Left, Speaker 1
│   │   └── 10.2.1.2/24 -> Main Hang Left, Speaker 2
│   └── 10.2.2.0/24 -> Main Hang Right
│       ├── 10.2.2.1/24 -> Main Hang Right, Speaker 1
│       └── 10.2.2.2/24 -> Main Hang Right, Speaker 2
├── 10.3.0.0/16 -> Delay Line 1
│   ├── 10.3.1.0/24 -> Delay Line 1, Hang Left
│   │   └── ...
│   └── 10.3.2.0/24 -> Delay Line 1, Hang Right
└── 10.4.0.0/16 -> Delay Line 2
    └── ...
```
We will dive into more practical examples in Part 6 of this series, but this should give you an idea of how subnetting and longest prefix matching establishes a hierarchy of prefixes.
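The containment relation that makes up this hierarchy can also be checked programmatically; `subnet_of()` from Python’s `ipaddress` module tells you whether one prefix nests inside another (shown here with prefixes from the example plan above):

```python
import ipaddress

event = ipaddress.ip_network("10.0.0.0/8")       # internal network for event
stage = ipaddress.ip_network("10.2.0.0/16")      # stage
main_left = ipaddress.ip_network("10.2.1.0/24")  # main hang left

print(main_left.subnet_of(stage))  # True: the /24 nests inside the /16
print(stage.subnet_of(event))      # True: the /16 nests inside the /8
print(stage.subnet_of(main_left))  # False: containment only works downwards
```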
Routing Protocols
Routing protocols are a fundamental part of Layer 3 networks. They are used to exchange routing information between routers and to build the routing table. There are many different routing protocols; they are often classified into Interior Gateway Protocols (IGPs) and Exterior Gateway Protocols (EGPs).
Interior Gateway Protocols (IGPs) are used to exchange routing information within an autonomous system (AS), which is a collection of networks and routers under the control of a single organization. Exterior Gateway Protocols (EGPs) are used to exchange routing information between different autonomous systems.
However, in datacenter networks there has been a trend towards using EGPs like eBGP (external BGP) as an Interior Gateway Protocol as well. So don’t be fooled by the names; they are not as strict as they sound.
The advantage of routing protocols is that they are dynamic. They reduce the configuration workload, prevent errors in manual configuration and can adapt to changes in the network topology like link failures, router failures or new devices being added. They can also be used to implement redundancy and load balancing. One primary task of routing protocols is also to prevent routing loops.
As a simple example, assume the following topology, where R1, R2 and R3 are routers and speak a common routing protocol. R1 and R3 are connected through R2. Assume that clients are connected to all routers.
```
R1 -- R2 -- R3
```
Now if R1 can reach the network `1.2.3.0/24` locally and announces this to R2, R2 can automatically share this information with R3, along with the fact that the route goes through R2.

Therefore, R3 knows that it can reach the subnet `1.2.3.0/24` through R2, without any manual configuration of the route `1.2.3.0/24` on R2 or R3. R2 also knows that it can reach the subnet `1.2.3.0/24` through its neighbour R1.
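A toy model of this propagation could look as follows. This is a deliberately simplified, hypothetical sketch - real routing protocols add metrics, timers and loop prevention on top of this basic idea:

```python
# Toy model: each router holds {prefix: next_hop} and floods its routes to
# direct neighbours, which record the advertising router as next hop.
def propagate(tables, links):
    changed = True
    while changed:
        changed = False
        for a, b in links:
            for src, dst in ((a, b), (b, a)):
                for prefix in list(tables[src]):
                    if prefix not in tables[dst]:
                        tables[dst][prefix] = src  # reachable via this neighbour
                        changed = True

tables = {"R1": {"1.2.3.0/24": "local"}, "R2": {}, "R3": {}}
propagate(tables, links=[("R1", "R2"), ("R2", "R3")])
print(tables["R2"])  # {'1.2.3.0/24': 'R1'}
print(tables["R3"])  # {'1.2.3.0/24': 'R2'}
```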
The most common routing protocols are BGP (Border Gateway Protocol) and OSPF (Open Shortest Path First). EIGRP (Enhanced Interior Gateway Routing Protocol) and IS-IS (Intermediate System to Intermediate System) are also somewhat common. Many more routing protocols exist, but most of them are not relevant for our use case.
We will look at OSPF in more detail in Part 3 of this series and at BGP in Part 4.
Layer 3 Network Design Principles
In this section, we will look at some basic design principles for Layer 3 networks. These principles are not set in stone, but they are a good starting point for designing Layer 3 networks.
Please note that many books have been written about Layer 3 network design, so this is by no means a complete guide. It is a collection of principles that are relevant for our particular use case: Audio over IP Networks for Events. As an example, it wouldn’t make sense to discuss CLOS (Leaf-Spine) topologies here, because the specific circumstances of event networks make them impractical.
Keep Layer 2 Broadcast Domains small
This is a direct consequence of the goal of building a Layer 3 fabric.
Broadcast domains should be confined to a single switch or router, or even better, to a single port (so that no BUM traffic is relayed between ports).
EVPN-VXLAN
EVPN-VXLAN is a technology that allows extending L2 networks over L3 fabrics by encapsulating/tunneling packets (VXLAN encapsulation).
EVPN-VXLAN is a very cool and powerful technology, but it can also be quite complex to set up and troubleshoot. In most setups, it also requires hardware support in the network switches (unless your end-nodes speak EVPN-VXLAN and take care of the encapsulation themselves).
EVPN-VXLAN has become kind of a buzzword. It is a widely used technology - and many people deploy it simply because everyone else does and it’s the new hot thing. The usage of EVPN-VXLAN is often a result of a lack of knowledge (that a better result could be achieved with an L3 fabric) or ignorance (customers specifying “I want my flat L2, because that’s how we’ve always done it!!”).
If you absolutely have to stretch a VLAN across your network (which is VERY unlikely), consider EVPN-VXLAN first. Only if you absolutely cannot source EVPN-VXLAN-capable hardware and personnel should you consider a normal stretched Broadcast Domain with all its disadvantages.
Broadcast helper / ip helper-address
I have heard of some manufacturers using Broadcast for autodiscovery. This is bad. If you encounter such devices, you should chase the manufacturer until they get rid of Broadcast and use a Multicast-based mechanism instead.
Before you begin stretching a Broadcast Domain, consider alternatives.
If a device uses Broadcast for autodiscovery and it is imperative to have this working, you should look at a feature commonly called “ip helper-address” or “broadcast helper address”. Sometimes this can be used to relay Broadcast traffic to another IP address.
This other IP would be the IP of the computer that runs the control software for the devices that need to be discovered.
In summary, you would create a special VLAN for the devices that require autodiscovery by Broadcast and configure ip / broadcast helper-addresses on that VLAN.
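On many switches the configuration follows this rough pattern. This is a hypothetical Cisco/Arista-style snippet; the VLAN number and addresses are placeholders, and the exact syntax and the set of relayed protocols vary by vendor, so check your documentation:

```
interface Vlan100
   description Devices using Broadcast autodiscovery
   ip address 10.10.100.1/24
   ip helper-address 10.1.1.50   ! IP of the control computer
```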
Bring routing as close to the end-node as possible
Although this best practice is not necessarily applicable to audio over IP networks for events - simply because the devices used almost never support any routing protocols - it still needs to be discussed.
Your goal should be to run routing protocols on each and every end-node.
The advantage of this approach is that it moves responsibility from the fabric towards the end-nodes. This means that you can potentially use cheaper switches with less features (e.g. no tunneling), because your end-nodes can perform this task.
This is similar to approaches like source routing, where the source device pre-determines the path a packet will take through the fabric, instead of the routers along the path. Many internet service providers operate their core networks on this principle (MPLS, Segment Routing).
It allows for some more design freedoms, e.g. for Quality of Service or Traffic Engineering.
For a datacenter scenario, this means that your goal is to run the routing protocols on the servers, instead of terminating them on the ToR switches (Top of Rack switches, the switches to which the servers are connected).
Build a proper IP address plan
One big advantage of our specific scenario is that our networks are mostly temporary.
We don’t have to deal with inherited burdens or legacy issues. Every deployment is basically greenfield and started from scratch.
That gives you huge design freedom.
Therefore, you should build a proper IP address plan. You can basically build it however you see fit. A typical recommendation (see above) is to create subnets based on physical location and purpose.
A proper IP address plan will greatly simplify configuration and troubleshooting (“I know exactly where this subnet belongs and what its purpose is!”).
And even if you realize that your IP address plan contains an error or is not ideal, you’re not out of luck. Routing protocols give you a lot of flexibility.
In the case of “Oh snap! I need this device to be connected to a totally different switch than initially planned”, you could simply announce a /32 route in the fabric for this specific device/exception.
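This works because of longest prefix matching: the /32 host route always wins over the broader subnet route. A quick sketch with Python’s `ipaddress` module (the addresses are hypothetical):

```python
import ipaddress

dst = ipaddress.ip_address("10.2.1.1")  # the relocated device

# Both the planned subnet route and the /32 exception match the destination
subnet_route = ipaddress.ip_network("10.2.1.0/24")  # points to the planned switch
host_route = ipaddress.ip_network("10.2.1.1/32")    # points to the new switch

matches = [net for net in (subnet_route, host_route) if dst in net]
best = max(matches, key=lambda net: net.prefixlen)
print(best)  # 10.2.1.1/32 -> traffic follows the exception, not the plan
```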
Prevent Single Points of Failure
Routing protocols give you the freedom to build basically as many redundant paths as you want. They will figure out the best way throughout the fabric.
Make use of this!
Use the right equipment for the job („Don‘t rig shit!“)
When it comes to network equipment, there are huge differences in quality. I will give some specific recommendations in Part 7.
Enterprise network equipment loses value extremely quickly. Huge amounts of used enterprise network equipment are available on the refurbished market for comparatively little money.
My recommendation is always to buy used enterprise network equipment. This hardware is built to the highest standards and the software is usually extremely robust and polished.
Of course, you can’t just buy everything…
Use automation
Automation can massively reduce your workloads as well as prevent configuration errors.
Your goal should be to have your entire network setup configuration automated, templated, etc.