This article is the fourth article of my blog series “Audio over IP Networks for Events - An Opinionated Guide”. In the third article, we covered OSPF as a solid, easy-to-use and well-supported routing protocol for Layer 3 networks. You can find the third article here.

I assume that the reader has read and understood the previous articles in this series and knows how OSPF works.

I will try to guide you through the concepts in a way that lays the foundations for you to get a deeper understanding, but this is by no means a complete guide to networking. Given the starting point laid out above, my goal is that you don’t have to go back and forth, googling every second word to understand the concepts I am trying to explain. I have tried to find a middle ground for the level of detail - however, if you feel that I am skipping too much detail, please let me know and I will try to improve the article.

Part 1: Foundations and why L2 is considered harmful
Part 2: Layer 3 Network Design Principles
Part 3: OSPF for self-healing networks that just work (TM)
Part 4: BGP as advanced routing protocol for when you need a little bit more spice
Part 5: Using PIM-SM to distribute Multicast
Part 6: Best Practices: Proven Design Patterns and Reference Designs
Part 7: Gear Guide: Selecting Hardware That Actually Works
Part 8: Test Before You Deploy! Network Simulation Tools and Techniques

Disclaimer:
As the title warns, this is an opinionated guide that reflects my personal opinions and field experience.
I make no claims to absolute truth and it is well within the realms of possibility that some statements I make are just plain wrong or lack exposure to scenarios that would shift my thinking.
I welcome all questions, suggestions and feedback (even if it’s a rant about how you think I’ve completely missed the mark).

BGP - The man, the myth, the legend

“The Internet’s Dark Art” - That’s the reputation the Border Gateway Protocol has earned among those who don’t work with it (and sometimes also among those who do). BGP is the routing protocol that holds the internet together… and it has developed an almost mystical reputation among network engineers. While other protocols are taken for granted, BGP is spoken of in hushed, reverent tones, like an ancient spell that must be cast exactly right or everything burns down.

BGP is one of the most versatile and powerful protocols that exists. It is a protocol built for scale, designed to handle the vastness of the internet. It is also a protocol that is extremely flexible, allowing for complex routing policies and configurations that can be tailored to specific needs.

BGP vs OSPF - Both are just routing protocols, right?

Writing a good introduction to BGP for a target audience that has never worked with it before is surprisingly hard. I believe a good approach is to explain the motivation and answer the question why we even need BGP, even though we already have OSPF. Hence the question in the title: “Both are just routing protocols, right?”

The short answer is: No and… Yes. Both BGP and OSPF are routing protocols. But they are designed for different purposes and have different strengths and weaknesses.

To break it down into simple terms: BGP is a protocol built for large-scale route distribution (think: around one million routes for a full table of the internet) and pathfinding on a global scale through various Autonomous Systems (systems under different administrative control, e.g. different ISPs) with complex policies (e.g. due to business decisions, security reasons, etc.).

OSPF (or other IGPs, like IS-IS) are protocols built to establish connectivity within a single Autonomous System (a network under a single administrative control).

While sometimes you can (ab)use one to do the job of the other, it is often a better idea to use the right tool for the job. One prime example is modern EVPN-VXLAN fabrics, which sometimes use an approach often called “eBGP everywhere”. In these fabrics, eBGP (instead of OSPF or IS-IS) is used to establish basic connectivity within the fabric, but it is also used to transport additional routing information that allows virtualization (essentially multi-tenancy) of the fabric.

IGP, EGP, iBGP, eBGP, and oh my… What are all these acronyms?

Before we dive deeper into the guts of BGP, why we need it and how it works differently from OSPF or IS-IS, we need to clarify some terminology.

First, IS-IS is a protocol that is similar to OSPF. We are not going to cover IS-IS in this article, but almost all concepts that apply to OSPF also apply to IS-IS.

You oftentimes hear the terms EGP and IGP when talking about routing protocols. EGP stands for Exterior Gateway Protocol, while IGP stands for Interior Gateway Protocol. Keep in mind that these categories are just abstractions to help us understand the purpose of different protocols. They are not absolute definitions or categories.

Both OSPF and IS-IS are classified as Interior Gateway Protocols (IGPs). More IGPs exist, but they are mostly irrelevant today. IGPs are designed to establish connectivity (and to some extent, distribute routes) within a single AS (Autonomous System). An AS is a network under a single administrative control.

eBGP (the “e” stands for “exterior”) is classified as an Exterior Gateway Protocol (EGP). EGPs are designed to exchange routing information between different ASes. They allow pathfinding on a global scale through various Autonomous Systems, large-scale route distribution (think: around one million routes for a full table of the internet) and complex policies (e.g. due to business decisions, security reasons, etc.).

In addition to eBGP, there is also iBGP (the “i” stands for “interior”). iBGP operates within a single AS. But that does not necessarily mean that iBGP is an IGP.
iBGP is in a weird place. While in theory, iBGP could be classified as an IGP, it is usually not used like an IGP in practice. Most of the time iBGP is used to distribute routing information within a single AS, but not to establish basic connectivity within the AS. Usually it is paired with an IGP (like OSPF or IS-IS) that provides basic connectivity between the routers. On top of that, iBGP is used to distribute routing information (e.g. routes learned via eBGP from other networks, but also routing information for network virtualization such as EVPN-VXLAN, etc.).

Why you should not use OSPF to distribute a large number of routes

In the previous section we established that OSPF is an IGP designed to establish connectivity within a single AS.

OSPF is not a protocol designed to carry a large number of routes. It is designed to react quickly to changes and propagate those changes throughout the network. It is also designed for (semi-)automatic topology discovery. This is simply a tradeoff that comes with the design of OSPF.

When you try to use OSPF to distribute a large number of routes, bad things will happen. We will not go into the details of why this is the case, but there is an excellent blog article by Dmytro Shypovalov that explains this in detail. A similar, but slightly less dramatic situation applies to IS-IS.

BGP is the polar opposite of OSPF in this regard. BGP is designed for stability, to carry a large number of routes and to provide powerful policy control (e.g. route filtering, path manipulation, etc.). It is not designed to react quickly to topology changes or to discover topology automatically. This is also a tradeoff that comes with the design of BGP.

Use eBGP to establish trust boundaries

We established that OSPF is designed to establish connectivity within a single AS. Remember that an AS is defined as a network under a single administrative control. The other nodes are implicitly trusted because they are assumed to be under the same administrative control.

If you inject an absurd amount of routes into OSPF, it might take down your entire network, as laid out in Dmytro Shypovalov’s excellent blog article (really, read it!). Therefore, a network using OSPF is, to some degree, a single failure domain - similar to a broadcast domain as discussed in Part 1 of this series, but not quite as tragic.

Imagine you were using OSPF to peer with a network under different administrative control (essentially a different AS), and that network started injecting a large number of routes into your OSPF network (either maliciously or by accident). Your entire network could come crashing down. That is obviously not a desirable situation, and hence OSPF must ONLY be used within a single AS and trusted administrative domain and NOT between networks under different administrative control.

This is where BGP or specifically eBGP comes into play. eBGP was inherently designed to establish trust boundaries and exchange routing information between different ASes (hence the “Exterior” in eBGP). With BGP’s powerful policy controls, we can filter exactly which routes we want to accept from our eBGP peers to protect our and other networks. BGP also allows complex traffic engineering to control exactly how traffic flows.

Use the right tool for the job!

We’ve covered the differences between OSPF, IS-IS, BGP, IGP and EGP in the previous sections. The key takeaway? Use the right tool for the job. You know the saying: “if all you have is a hammer, everything looks like a nail.” Being a good engineer means having a proper toolbox and knowing the strengths and limitations of each tool in it.

As a rule of thumb, most small to medium-sized networks should use a combination of an IGP (like OSPF or IS-IS) for basic connectivity within the network and BGP for route distribution (iBGP within the network and eBGP to peer with other networks).

You might see eBGP-only designs (“eBGP everywhere”) that use eBGP as the IGP. These are valid designs, but they come with a different set of advantages and disadvantages. They mostly stem from hyperscaler designs where the scale and complexity of the network justify the tradeoffs. Don’t adopt one just because it sounds cool or is recommended by your favourite vendor.

Please don’t fall for the fallacy of using eBGP as IGP and then iBGP on top for route distribution. This approach is just adding unnecessary complexity without any real benefits and some people (rightfully) call it an abomination.

Know what each protocol does well, understand the tradeoffs, and pick the approach that actually fits your requirements, not the one that’s just trendy.

How BGP works - An overview

If you’re at this point, I’ve hopefully been able to motivate why we need BGP and you should add it to your networking toolbox. Now, let’s take a look at how BGP works. We won’t go into the deep technical details, but I want to give you a good overview of the key concepts.

Contents of a simple BGP-4 UPDATE Message

The fundamental specification for BGP-4 (simple BGP with only IPv4) is defined in RFC 4271. BGP knows different kinds of messages, but the most important one is the UPDATE message. This message is used to advertise new routes or withdraw previously advertised routes.

Quoting from the RFC:

An UPDATE message is used to advertise feasible routes that share common path attributes to a peer, or to withdraw multiple unfeasible routes from service (see 3.1). An UPDATE message MAY simultaneously advertise a feasible route and withdraw multiple unfeasible routes from service.

Therefore, a basic BGP-4 update message contains the following key components:

Withdrawn Routes: A list of IP prefixes that are being withdrawn (i.e. no longer reachable via this route).
Path Attributes: A set of attributes that describe the characteristics of the route.
Network Layer Reachability Information (NLRI): A list of IP prefixes that are being advertised as reachable via this route.

The most important attributes are AS_PATH (the sequence of ASes that the route has traversed) and NEXT_HOP (the IP address of the next hop router for this route).
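To make the three components a bit more tangible, here is a minimal Python sketch that splits the body of an IPv4 UPDATE message into withdrawn routes, raw path attributes and NLRI. It assumes the 19-byte BGP header has already been removed and does not decode the attributes themselves. The names `BgpUpdate`, `parse_update` and `_read_prefixes` are made up for this illustration and the sample bytes are hand-crafted, not taken from a real capture.

```python
import ipaddress
import struct
from dataclasses import dataclass


@dataclass
class BgpUpdate:
    withdrawn: list         # prefixes that are no longer reachable
    path_attributes: bytes  # raw attribute bytes, left undecoded in this sketch
    nlri: list              # prefixes advertised as reachable


def _read_prefixes(data: bytes) -> list:
    """Decode a run of (length in bits, prefix octets) pairs into IPv4 networks."""
    prefixes, pos = [], 0
    while pos < len(data):
        bits = data[pos]
        nbytes = (bits + 7) // 8  # only the significant octets are on the wire
        raw = data[pos + 1:pos + 1 + nbytes].ljust(4, b"\x00")
        address = int.from_bytes(raw, "big")
        prefixes.append(ipaddress.IPv4Network((address, bits), strict=False))
        pos += 1 + nbytes
    return prefixes


def parse_update(body: bytes) -> BgpUpdate:
    """Split an UPDATE body (header already stripped) into its three parts."""
    (withdrawn_len,) = struct.unpack_from("!H", body, 0)
    withdrawn = _read_prefixes(body[2:2 + withdrawn_len])
    (attrs_len,) = struct.unpack_from("!H", body, 2 + withdrawn_len)
    attrs_start = 4 + withdrawn_len
    attrs = body[attrs_start:attrs_start + attrs_len]
    nlri = _read_prefixes(body[attrs_start + attrs_len:])
    return BgpUpdate(withdrawn, attrs, nlri)


# Hand-crafted sample: withdraw 10.0.0.0/8, one ORIGIN attribute,
# advertise 192.0.2.0/24 and 198.51.100.0/24.
sample = bytes.fromhex("0002" "080a" "0004" "40010100" "18c00002" "18c63364")
update = parse_update(sample)
print(update.withdrawn, update.nlri)
```

Note that all advertised prefixes in one UPDATE share the same set of path attributes, which is why the attributes appear only once per message.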

If you dissect a BGP-4 UPDATE from the Wireshark example dataset, you can see these components in action:

Example BGP-4 dissected message showing multiple NLRI

You can see that the UPDATE message advertises multiple NLRI (Network Layer Reachability Information) / prefixes (in this case, IPv4 prefixes).

BGP Attributes

BGP attributes are a key part of BGP routing. They provide additional information about the route and can be used to influence routing decisions.

BGP attributes themselves have different classifications:

Well-Known vs. Optional - Well-known attributes must be recognized and supported by all BGP implementations, while optional attributes may not be understood by all routers.
Mandatory vs. Discretionary - Mandatory attributes must be present in every BGP UPDATE message that advertises routes (i.e. carries NLRI), while discretionary attributes may or may not be included.
Transitive vs. Non-Transitive - Transitive attributes are passed along to other BGP peers even if not understood, while non-transitive attributes are removed if not recognized. This only applies to optional attributes, as all well-known attributes are inherently transitive.

When a BGP router passes on a transitive attribute it does not understand, it marks it as “partial” to indicate that it did not fully process and understand the attribute.
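On the wire, these classifications are encoded in the high-order bits of a one-octet flags field in front of every attribute. The following Python sketch decodes that field and mimics what a router does with an attribute it does not recognize. The bit values follow RFC 4271, but the function names `describe_flags` and `forward_unrecognized` are my own invention for this example.

```python
OPTIONAL = 0x80         # highest bit: 0 = well-known, 1 = optional
TRANSITIVE = 0x40       # second bit: attribute is passed on to other peers
PARTIAL = 0x20          # third bit: some router on the way did not understand it
EXTENDED_LENGTH = 0x10  # fourth bit: the length field is two octets instead of one


def describe_flags(flags: int) -> str:
    """Turn the flags octet into the classification used in the text."""
    kind = "optional" if flags & OPTIONAL else "well-known"
    scope = "transitive" if flags & TRANSITIVE else "non-transitive"
    partial = ", partial" if flags & PARTIAL else ""
    return f"{kind} {scope}{partial}"


def forward_unrecognized(flags: int):
    """Flags to use when passing on an unrecognized attribute, or None if it is dropped."""
    if not flags & OPTIONAL:
        raise ValueError("well-known attributes must be recognized by every implementation")
    if flags & TRANSITIVE:
        return flags | PARTIAL  # pass it on, but mark it as partial
    return None                 # optional non-transitive: quietly removed


print(describe_flags(0x40))
print(describe_flags(forward_unrecognized(0xC0)))
```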

BGP-4 defines the following standard attributes:

Well-Known Mandatory (automatically transitive):

AS_PATH - Lists all autonomous systems a route has traversed, used to prevent loops and prefer shorter paths
NEXT_HOP - IP address of the next router to reach the destination
ORIGIN - Indicates how the route was introduced into BGP (IGP, EGP, or incomplete). You will likely never see EGP.

Well-Known Discretionary (automatically transitive):

LOCAL_PREF - Used within an AS to prefer certain exit points (higher values preferred)
ATOMIC_AGGREGATE - Indicates route summarization occurred

Optional Transitive:

AGGREGATOR - Identifies the router that performed route aggregation
COMMUNITY - Tags routes for grouping and policy application - more important than it seems, will be covered later in detail!

Optional Non-Transitive:

MED (Multi-Exit Discriminator) - Used between ASes to suggest preferred entry points into an AS (lower values preferred)

Please note that this only applies to basic BGP-4. There are many extensions to BGP that introduce additional attributes, such as MP-BGP (Multiprotocol BGP) which adds support for IPv6, VPNs, etc. In this case some mandatory attributes may become optional and new attributes are introduced. This will be covered later.

Looking at our Wireshark example again, you can see some of these attributes in action:

Example BGP-4 dissected message showing various attributes

BGP is a path vector protocol

In maths, physics and computer science, a vector is a collection or ordered list of elements to express something that cannot be expressed by a single value. Oftentimes a vector is also interpreted as an arrow (something that points into a certain direction).

In the context of BGP, the “path vector” or simply “path” refers to the sequence of ASes (Autonomous Systems) that a route has traversed. This shows some parallels to the interpretation of vectors as arrows - the path vector points from the source AS to the destination AS.

For each route that a BGP router learns, it also learns and keeps the path that the route has taken through the network of ASes. This allows BGP to prevent loops and to implement complex routing policies based on the AS path.

Autonomous Systems are represented by ASNs (AS Numbers)

Autonomous Systems (AS) are identified by numbers called ASNs (Autonomous System Numbers). These numbers are allocated in blocks by the Internet Assigned Numbers Authority (IANA) to the Regional Internet Registries (RIRs), which assign them to network operators. They are used to identify each AS in BGP routing.

ASNs can be either 2-byte or 4-byte numbers. If you want to participate in global routing in the Internet, you need to get a unique ASN assigned. For use within private networks, the ranges 64512-65534 and 4200000000-4294967294 are reserved.

The ranges 64496-64511 and 65536-65551 are reserved for documentation and sample code.
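If you generate configs with scripts, a small sanity check for these ranges can save you from accidentally leaking a private or documentation ASN. This is a minimal Python sketch that only knows the ranges mentioned above. The function name `classify_asn` is hypothetical, and "other" lumps together public ASNs and a few special reserved values that I am not covering here.

```python
def classify_asn(asn: int) -> str:
    """Classify an ASN using only the ranges named in this article."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("ASN must fit into 4 bytes")
    if 64496 <= asn <= 64511 or 65536 <= asn <= 65551:
        return "documentation"
    if 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294:
        return "private"
    return "other"  # public ASNs and special reserved values, not distinguished here


for asn in (64496, 65000, 4200000001, 6939):
    print(asn, classify_asn(asn))
```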

4-byte ASN representation

ASNs can be represented in three formats: Asplain, Asdot and Asdot+.

Asplain is simply a decimal number.

Asdot is Asplain for ASNs smaller than 65536 and Asdot+ for ASNs starting at 65536.

Asdot+ works by breaking the ASN into an upper and a lower 2-byte half, which are written as two decimal numbers separated by a dot (upper.lower).

AS64496 -> 0.64496
AS99999 -> 1.34463 = 1x 65536 + 34463
AS4200000001 -> 64086.59905 =  64086x 65536 + 59905
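The conversion is just an integer division by 65536 with remainder. Here is a short Python sketch that reproduces the three examples above. The helper names `to_asdot_plus`, `to_asdot` and `from_asdot` are chosen for this illustration only.

```python
def to_asdot_plus(asn: int) -> str:
    """Always write the ASN as upper.lower 2-byte halves."""
    high, low = divmod(asn, 65536)
    return f"{high}.{low}"


def to_asdot(asn: int) -> str:
    """Asplain below 65536, Asdot+ from 65536 upwards."""
    return str(asn) if asn < 65536 else to_asdot_plus(asn)


def from_asdot(text: str) -> int:
    """Convert Asplain, Asdot or Asdot+ notation back to a plain integer."""
    if "." not in text:
        return int(text)
    high, low = (int(part) for part in text.split("."))
    return high * 65536 + low


for asn in (64496, 99999, 4200000001):
    print(asn, to_asdot_plus(asn), to_asdot(asn))
```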

iBGP and eBGP: BGP’s two operating modes

BGP can operate in two modes:

iBGP (interior Border Gateway Protocol)
eBGP (exterior Border Gateway Protocol)

iBGP is used between peers in the same AS with the same ASN (ignoring tricks such as “Local AS”).

eBGP is used between peers in different ASes with different ASNs.

In many ways they operate similarly, but there are some important differences.

The most important thing to know is that iBGP has a full-mesh requirement. This means that every participating router must have an iBGP session with every other participating router. This is due to iBGP’s loop-avoidance mechanism. This requirement is lifted if you use Route Reflectors, which we will cover later. When you use Route Reflectors, only sessions between the routers and the Route Reflectors are required, not between the routers themselves. The full-mesh requirement does NOT mean that every router has to be directly connected to every other router. Only the iBGP sessions must exist, and they can span multiple hops (e.g. when loopback connectivity is ensured by an IGP such as OSPF).

BGP Loop Avoidance in iBGP and eBGP

As a routing protocol, BGP needs a mechanism to avoid routing loops. This loop avoidance mechanism works differently between iBGP and eBGP.

eBGP loop avoidance

When a router sends a route to its eBGP neighbour, it prepends its own ASN to the AS_PATH and changes the next-hop to itself. Sometimes routers prepend their ASN multiple times for traffic engineering purposes, but this does not change how loop avoidance works.

Therefore, eBGP loop avoidance works by checking the AS_PATH of each received route. If a router detects its own ASN within this path, it considers the path invalid.
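Here is a tiny Python sketch of that mechanism, using documentation ASNs. It models the AS_PATH as a plain list and ignores everything else a real implementation does. The function names `advertise_to_ebgp_peer` and `accept_from_ebgp_peer` are hypothetical.

```python
def advertise_to_ebgp_peer(as_path: list, local_asn: int, prepend: int = 1) -> list:
    """Prepend our own ASN (possibly several times) before sending the route."""
    return [local_asn] * prepend + list(as_path)


def accept_from_ebgp_peer(as_path: list, local_asn: int) -> bool:
    """Reject any route whose AS_PATH already contains our own ASN."""
    return local_asn not in as_path


# AS64496 originates a route; it travels 64496 -> 64497 -> 64498.
path = advertise_to_ebgp_peer([], 64496)
path = advertise_to_ebgp_peer(path, 64497)
path = advertise_to_ebgp_peer(path, 64498)
print(path)

# If AS64498 offers the route back to AS64496, it is rejected there.
print(accept_from_ebgp_peer(path, 64496))
```

Prepending the ASN more than once only makes the path look longer. The loop check itself does not care how often an ASN appears.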

iBGP loop avoidance

In iBGP, all routers are in the same AS and share the same ASN. Therefore, a router does not prepend its own ASN when it sends a route to an iBGP peer. This means the eBGP approach of checking the AS_PATH for the router’s own ASN cannot work here.

iBGP uses a technique called “Split-Horizon”. Fundamentally, an iBGP router will NOT pass on routing information it received from one iBGP peer to another iBGP peer (except Route Reflectors, which we will cover later). In order to make sure that every peer receives all the routes, every router must peer with all other routers. If every router has a session with every other router, there is no need to pass on iBGP routing information received from a peer. This is the Full-Mesh Requirement that was described before.
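Boiled down to a single rule, the split-horizon behaviour looks like the following Python sketch. It deliberately ignores Route Reflectors and any outbound policy, and the names `Source` and `may_advertise` are made up for this example.

```python
from enum import Enum


class Source(Enum):
    LOCAL = "locally originated"
    EBGP = "learned from an eBGP peer"
    IBGP = "learned from an iBGP peer"


def may_advertise(learned_from: Source, peer_is_ibgp: bool) -> bool:
    """A route learned via iBGP is never passed on to another iBGP peer."""
    return not (learned_from is Source.IBGP and peer_is_ibgp)


print(may_advertise(Source.EBGP, peer_is_ibgp=True))   # eBGP route into iBGP
print(may_advertise(Source.IBGP, peer_is_ibgp=True))   # iBGP route to another iBGP peer
print(may_advertise(Source.IBGP, peer_is_ibgp=False))  # iBGP route out to an eBGP peer
```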

This Full-Mesh requirement has the downside of quadratic scaling of the total number of sessions. If you double the number of routers, the number of sessions you must configure roughly quadruples! At some point the number of sessions per router can also become too large, but I’ve seen ISP networks with around 400 routers in full mesh without any issues.
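To put numbers on it: a full mesh of n routers needs n * (n - 1) / 2 sessions in total, and each router terminates n - 1 of them. A quick Python sketch (the function name `full_mesh_sessions` is just for illustration):

```python
def full_mesh_sessions(routers: int) -> int:
    """Total number of iBGP sessions in a full mesh of the given size."""
    return routers * (routers - 1) // 2


for n in (10, 20, 400):
    print(f"{n} routers: {full_mesh_sessions(n)} sessions in total, {n - 1} per router")
```

Going from 10 to 20 routers takes you from 45 to 190 sessions, and the 400-router mesh mentioned above adds up to 79800 sessions.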

Essentially, iBGP is only used as a protocol to distribute routes, but not to establish basic connectivity and find shortest path within the AS. Thus iBGP is typically paired with an IGP such as OSPF. The IGP establishes basic connectivity between the routers and finds shortest paths, iBGP then distributes further routing information.

iBGP will also not change the next-hop when it redistributes eBGP routes. This may lead to complications, which can be fixed by a config setting usually called next-hop self that will be covered later.

iBGP Route Reflectors & eBGP Route Servers

In the previous section we learned that iBGP has a Full-Mesh requirement. This means that every router must peer with every other router. This quickly becomes unmanageable as the number of routers increases.

To solve this problem, BGP introduces the concept of Route Reflectors for iBGP (RFC 4456) and Route Servers for eBGP (RFC 7947).

Both operate in a very similar way, with some minor differences. Both perform a route brokering service - they receive routes from clients and redistribute them to other clients.

Route Reflectors do exactly what the name suggests - they reflect routes. When a Route Reflector receives a route from a client, it reflects it to all other clients (with some exceptions). This way, clients only need to peer with the Route Reflector, not with each other. This breaks the Full-Mesh requirement and reduces the number of required sessions significantly. You only need to configure sessions between the Route Reflector and the clients. If you add a new client, you only need to configure a session between the new client and the Route Reflector and not between the new client and all other clients.
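The "with some exceptions" part mostly comes down to whether the route was learned from a client or from a regular (non-client) iBGP peer. The following Python sketch shows my reading of the basic reflection rules from RFC 4456. It only covers iBGP-learned routes, leaves out ORIGINATOR_ID and CLUSTER_LIST handling, and uses the hypothetical function name `reflect_to` and made-up peer names.

```python
def reflect_to(source: str, peers: dict) -> list:
    """Return the peers that receive a route learned from `source`.

    `peers` maps a peer name to True if that peer is a client of this reflector.
    """
    source_is_client = peers[source]
    targets = []
    for name, is_client in peers.items():
        if name == source:
            continue  # simplification: never send the route back to its sender
        # Client routes go to everyone; non-client routes go to clients only.
        if source_is_client or is_client:
            targets.append(name)
    return targets


peers = {"leaf1": True, "leaf2": True, "core1": False, "core2": False}
print(reflect_to("leaf1", peers))
print(reflect_to("core1", peers))
```

The second call shows that the classic split-horizon rule still applies between non-clients: a route learned from core1 is not passed on to core2.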

Route Servers operate in a similar way, but they are used in eBGP scenarios. A Route Server is typically deployed in a large peering environment, such as an Internet Exchange Point (IXP). Instead of requiring all peers to establish direct connections with each other, they can connect to the Route Server (although sometimes some peers still establish direct connections over the IXP). The Route Server then takes on the responsibility of redistributing routes between all connected peers.

Note that both Route Reflectors and Route Servers typically don’t insert themselves into the traffic path (imagine a Route Server at an IXP having to handle all the traffic). They only redistribute routing information. The actual traffic still flows directly between the clients/peers. This does not mean that Route Reflectors in particular can never be in the traffic path. Oftentimes spine switches in a leaf-spine fabric are used as Route Reflectors and are therefore in the traffic path as well, but this is not required.

The BGP best path selection algorithm

When a BGP router receives multiple routes to the same destination, it goes through the following steps to select the best path. The router stops as soon as one path wins. This best path is the one that is installed into the routing table and used for forwarding traffic. This path is also advertised to other BGP peers (by default, BGP only advertises the best path, but often it can be configured to advertise multiple paths).

Usually there are around 12 steps in this process. The process may vary slightly between different vendors and implementations.

For the open source routing daemon [FRR (Free Range Routing)](https://docs.frrouting.org/en/latest/bgp.html), the steps are as follows:

Weight check: A special attribute that is local to the router to make absolutely sure a certain path is preferred. Higher weight wins.
Local preference check: Prefer higher LOCAL_PREF routes to lower
Local route check: Prefer local routes (statics, aggregates, redistributed) to received routes, optionally with AIGP (Accumulated IGP Metric Attribute)
AS path length check: Prefer shortest hop-count AS_PATHs.
Origin check: Prefer the lowest origin type route. That is, prefer IGP origin routes to EGP, to Incomplete routes.
MED check: Where routes with a Multi-Exit Discriminator were received from the same AS, prefer the route with the lowest MED.
External check: Prefer the route received from an external, eBGP peer over routes received from other types of peers.
IGP cost check: Prefer the route with the lower IGP cost.
Multi-path check: If multi-pathing is enabled, then check whether the routes not yet distinguished in preference may be considered equal.
Already-selected external check: Where both routes were received from eBGP peers, then prefer the route which is already selected.
Router-ID check: Prefer the route with the lowest router-ID.
Cluster-List length check: Prefer the route with the shortest cluster-list length.
Peer address check: Prefer the route received from the peer with the higher transport layer address, as a last-resort tie-breaker.

This list looks intimidating, but in practice, most of the time the decision is made in the first few steps. Except in complex scenarios with multiple paths and traffic engineering, you usually don’t have to worry about most of these steps, as they were designed to deliver intuitive and sensible results in most scenarios.
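One way to think about the algorithm is as a sort key: each step contributes one field, and the first field that differs decides. The following Python sketch models only a subset of the steps listed above and is by no means how FRR implements it. It skips the multi-path, already-selected, cluster-list and peer address checks, compares MED between all paths instead of only between paths from the same AS, and assumes a default LOCAL_PREF of 100. The names `Path`, `preference_key` and `best_path` are my own.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100       # assumed default for this sketch
    locally_originated: bool = False
    as_path: tuple = ()
    origin: int = 0             # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int = 0
    from_ebgp: bool = False
    igp_cost: int = 0
    router_id: int = 0


def preference_key(p: Path) -> tuple:
    """Smaller tuples are better; fields are ordered like the steps in the list."""
    return (
        -p.weight,                 # higher weight wins
        -p.local_pref,             # higher LOCAL_PREF wins
        not p.locally_originated,  # local routes first
        len(p.as_path),            # shorter AS_PATH wins
        p.origin,                  # IGP < EGP < incomplete
        p.med,                     # lower MED wins (simplified, see lead-in)
        not p.from_ebgp,           # eBGP over other peer types
        p.igp_cost,                # lower IGP cost to the next-hop wins
        p.router_id,               # lowest router-ID as tie-breaker
    )


def best_path(paths: list) -> Path:
    return min(paths, key=preference_key)


short = Path("short", as_path=(64497, 64496), from_ebgp=True)
preferred = Path("preferred", local_pref=200, as_path=(64498, 64499, 64496), from_ebgp=True)
print(best_path([short, preferred]).name)
```

In the example, the path with the higher LOCAL_PREF wins even though its AS_PATH is longer, because LOCAL_PREF is checked before the AS path length.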

BGP communities

BGP communities provide a mechanism to attach extra information to routes. BGP communities could also be called “attributes”, “tags” or “labels”. But because “attribute” is already taken, they are called “communities”. The RFC (RFC 1997) justifies the name as follows:

Community
A community is a group of destinations which share some common property.
Each autonomous system administrator may define which communities a destination belongs to.

BGP communities themselves are transitive optional attributes, meaning they are carried along as routes are propagated between autonomous systems (unless explicitly stripped).

BGP communities are one of the most powerful features of BGP. They allow network operators to tag routes with specific information that can be used to influence routing decisions and policies, such as traffic engineering, route filtering, debugging purposes, etc. Communities are often classified as Informational tags (purely informational) or Action tags (to influence routing decisions).

Some pre-defined BGP communities exist (e.g. so-called well-known communities for normal 16-bit communities), but most communities are locally defined. This means that each AS can define their own communities and their meanings.

BGP Communities exist in different formats:

Standard Communities: 32-bit values (4 octets), usually represented as two 16-bit numbers separated by a colon, i.e. ASN:COMMUNITY (e.g. 65000:100)
Extended Communities: 64-bit values (8 octets), used for more complex tagging and policies (e.g. VPNs), has a field to indicate whether this extended community is transitive or non-transitive, consists of a type, sub-type and value fields depending on the type
Large Communities: 96-bit values (12 octets), introduced to provide even more flexibility and scalability, usually represented as three 32-bit numbers separated by colons, i.e. ASN:VALUE1:VALUE2 (e.g. 2914:65400:38016)

Some examples for BGP communities are:

NO_EXPORT (well-known community), 65535:65281 or 0xFFFFFF01: Prevents the route from being advertised outside the local AS.
NO_ADVERTISE (well-known community), 65535:65282 or 0xFFFFFF02: Prevents the route from being advertised to any peer.
GRACEFUL_SHUTDOWN (well-known community), 65535:0 or 0xFFFF0000: Signal the graceful shutdown of paths
ROUTE_TARGET (extended community): Used in VPNs to identify which routes belong to which VPN.
6939:9001 - Hurricane Electric (AS6939), “Learned in North America”
34927:9500 - iFog (AS34927), “Do not export to Deutsche Telekom”
8283:8:3512 - ColoClue (AS8283), “Received via Frys-IX”
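The colon notation and the hex notation of a standard community are the same 32-bit value, just written differently. This short Python sketch converts between the two and packs a large community into its 12 octets. The function names `encode_standard`, `decode_standard` and `pack_large` are made up for this example.

```python
import struct


def encode_standard(text: str) -> int:
    """Turn 'ASN:VALUE' into the 32-bit number carried in the COMMUNITY attribute."""
    high, low = (int(part) for part in text.split(":"))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("both halves of a standard community must fit into 16 bits")
    return (high << 16) | low


def decode_standard(value: int) -> str:
    """Turn the 32-bit number back into 'ASN:VALUE' notation."""
    return f"{value >> 16}:{value & 0xFFFF}"


def pack_large(text: str) -> bytes:
    """Pack 'ASN:VALUE1:VALUE2' into three 32-bit numbers in network byte order."""
    asn, value1, value2 = (int(part) for part in text.split(":"))
    return struct.pack("!III", asn, value1, value2)


print(hex(encode_standard("65535:65281")))  # NO_EXPORT
print(decode_standard(0xFFFF0000))          # GRACEFUL_SHUTDOWN
print(len(pack_large("8283:8:3512")))       # size of a large community in octets
```

The 16-bit limit on the first half is also the reason why large communities exist: a 4-byte ASN simply does not fit into a standard community.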

Route filtering and manipulation with route-maps and vendor specific policy languages

While BGP provides the routing protocol framework, the real power comes from policy languages that allow you to filter and manipulate routes. Different vendors have different approaches, but the concepts are similar.

The classic BGP filtering mechanism is called route-maps (e.g. classic Cisco IOS, Arista, FRR). Some vendors have their own policy languages (e.g. Arista’s RCF, Juniper’s routing policies, BIRD’s filtering language), but the concepts are similar.

Route filtering & manipulation can for example be applied on inbound direction (to filter or manipulate routes received from a peer) and outbound direction (to filter or manipulate routes sent to a peer) or for redistribution between different routing protocols (e.g. when redistributing routes from OSPF into BGP or vice versa).

Some great examples for route filtering & manipulation can be found in the DENOG Routing Guide and the NLNOG BGP Filter Guide.

Let’s look at the Filtering Small Prefixes example from the NLNOG BGP Filter Guide in Arista EOS:

ip prefix-list TINY-PREFIX-V4
   seq 1 permit 0.0.0.0/0 ge 25 le 32
!
ipv6 prefix-list TINY-PREFIX-V6
   seq 1 permit ::/0 ge 49 le 128
!
route-map NAME-IN-V4 deny 50
    match ip address prefix-list TINY-PREFIX-V4
!
route-map NAME-IN-V6 deny 50
    match ipv6 address prefix-list TINY-PREFIX-V6
!

Route-maps are processed sequentially. Each entry has:

Sequence number (10, 20, 30, etc.): Determines processing order
Action (permit/deny): What to do if the match succeeds
Match statements: Criteria to match routes
Set statements: Optional actions to take on matched routes

The ip prefix-lists will match small prefixes, such as IPv4 prefixes between /25 and /32 and IPv6 prefixes between /49 and /128. In the route map itself, there is a deny statement. That means prefixes matching the prefix list will be denied (i.e. filtered).
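If the ge/le syntax is new to you: the entry matches any prefix that lies inside the given base prefix and whose length is between ge and le. Here is a rough Python equivalent of the two prefix lists above, using the standard `ipaddress` module. This is only meant to illustrate the matching logic, not how any vendor implements it, and the function names `prefix_list_permits` and `is_tiny` are hypothetical.

```python
import ipaddress


def prefix_list_permits(prefix: str, base: str, ge: int, le: int) -> bool:
    """True if `prefix` lies within `base` and its length is between ge and le."""
    net = ipaddress.ip_network(prefix)
    base_net = ipaddress.ip_network(base)
    return (
        net.version == base_net.version  # never compare IPv4 with IPv6
        and net.subnet_of(base_net)
        and ge <= net.prefixlen <= le
    )


def is_tiny(prefix: str) -> bool:
    """Mirror TINY-PREFIX-V4 and TINY-PREFIX-V6 from the example above."""
    if ipaddress.ip_network(prefix).version == 4:
        return prefix_list_permits(prefix, "0.0.0.0/0", 25, 32)
    return prefix_list_permits(prefix, "::/0", 49, 128)


for prefix in ("192.0.2.0/24", "192.0.2.128/25", "2001:db8::/48", "2001:db8:0:8000::/49"):
    print(prefix, "filtered" if is_tiny(prefix) else "not matched by this entry")
```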

If you want to set a community in a route map, you can do it like this:

route-map specific-peer-out permit 50
    set community 1234:4567

Let’s look at something more complex written in Arista’s RCF language from the DENOG Routing Guide.

This function could be used to filter routes with a very long AS_PATH, specifically paths longer than 50 AS hops.

router general
control-functions
   code unit example
function as_path_to_long() {
  return not as_path.length <= 50;
}
function example_in() {
  if as_path_to_long() {
    exit false;
  }
}

Route Redistribution

Route redistribution is the process of taking routes learned via one routing protocol and injecting them into another routing protocol. Typical examples are the redistribution of routes from OSPF (or other IGPs) into BGP or redistribution of connected routes (subnets from interfaces) into BGP.

Route-maps can (and should) be used to filter and manipulate routes during redistribution. A typical example would be:

router bgp 64496
    redistribute ospf route-map my-ospf-redist-map

Multiprotocol BGP (MP-BGP) and more attributes

As you may have noticed, I have used the term “Routing Information” multiple times. I tried to avoid the term “routes” in some places, because BGP is not limited to just distributing IPv4 routes.

BGP has been extended multiple times to support additional address families and use cases. One of the most important extensions is called Multiprotocol BGP (MP-BGP), defined in RFC 4760. MP-BGP is transmitted with the BGP attributes MP_REACH_NLRI and MP_UNREACH_NLRI.

MP-BGP enables support for VPNs, tunnels, multicast, virtual wires, etc. For this purpose, MP-BGP introduces the concept of Address Family Identifiers (AFI) and Subsequent Address Family Identifiers (SAFI). Together, these define what type of routing information is being carried.

One very common use case is the distribution of IPv4 routes with IPv6 nexthops. This happens when you only use IPv6 for your underlay (BGP Peering), oftentimes due to a technique called “BGP unnumbered”, but still want to distribute IPv4 routes.

AFI 1 is IPv4, AFI 2 is IPv6. SAFI 1 is Unicast, SAFI 2 is Multicast. SAFI 128 is used for L3VPNs (MPLS VPNs), SAFI 70 is used for EVPN. Combined, some common AFI/SAFI combinations are:

IPv4 Unicast (AFI 1, SAFI 1): Traditional IPv4 routing
IPv6 Unicast (AFI 2, SAFI 1): IPv6 routing
IPv4 Multicast (AFI 1, SAFI 2): IPv4 multicast routing information
IPv6 Multicast (AFI 2, SAFI 2): IPv6 multicast routing information
L3VPN IPv4 (AFI 1, SAFI 128): MPLS VPN routing for IPv4
L3VPN IPv6 (AFI 2, SAFI 128): MPLS VPN routing for IPv6
EVPN (AFI 25, SAFI 70): Ethernet VPN for VXLAN overlays
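On the wire, the AFI and SAFI are the first three octets of the MP_REACH_NLRI attribute value, followed by the length of the next-hop and the next-hop itself. The following Python sketch decodes just that fixed part, based on the layout in RFC 4760. The lookup tables only contain the values mentioned in this article, the function name `describe_mp_reach` is invented, and the sample bytes are hand-crafted.

```python
import struct

AFI_NAMES = {1: "IPv4", 2: "IPv6", 25: "L2VPN"}
SAFI_NAMES = {1: "Unicast", 2: "Multicast", 70: "EVPN", 128: "L3VPN (MPLS VPN)"}


def describe_mp_reach(value: bytes) -> dict:
    """Decode AFI, SAFI and the raw next-hop from an MP_REACH_NLRI value."""
    afi, safi, next_hop_len = struct.unpack_from("!HBB", value, 0)
    return {
        "afi": AFI_NAMES.get(afi, f"unknown ({afi})"),
        "safi": SAFI_NAMES.get(safi, f"unknown ({safi})"),
        "next_hop": value[4:4 + next_hop_len],  # everything after this is not decoded here
    }


# AFI 25, SAFI 70, 4-byte next-hop 192.0.2.1, then the reserved octet.
sample = bytes.fromhex("0019" "46" "04" "c0000201" "00")
print(describe_mp_reach(sample))
```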

Even more BGP extensions and attributes exist, such as BGP-LS which delivers network topology information and carries interior gateway protocol link-state database information.

Common pitfalls

BGP has some common pitfalls that you should be aware of. We will be looking into two of them here.

next-hop self for eBGP routes redistributed into iBGP

When you redistribute eBGP routes into iBGP, the next-hop attribute of the eBGP route is not changed by default. Your iBGP peers usually cannot reach this next-hop, because it is the IP address of the eBGP peer itself and there is simply no route to it (the eBGP peer is not taking part in your IGP).

One way to fix this is to inject the route to this specific eBGP peer into your IGP (e.g. OSPF). But this is cumbersome and not scalable.

The better way to fix this is to use the next-hop-self command in your iBGP configuration. This command changes the next-hop of redistributed eBGP routes to the IP address of the advertising router itself (typically its loopback). This way, your iBGP peers can reach the next-hop via the IGP.
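
The effect of next-hop-self can be sketched in a few lines of Python. This is a toy model, not a BGP implementation: the IGP table, the eBGP peer address 192.0.2.1 and the function names are all hypothetical and only serve to show why an unchanged next-hop is unusable for the iBGP peers.

```python
import ipaddress


def is_resolvable(next_hop: str, igp_prefixes: list[str]) -> bool:
    """A BGP route is only usable if its next-hop is covered by a known route."""
    address = ipaddress.ip_address(next_hop)
    return any(address in ipaddress.ip_network(prefix) for prefix in igp_prefixes)


def advertise_to_ibgp(route: dict, own_loopback: str, next_hop_self: bool) -> dict:
    """Return the route as it would be sent to an iBGP peer."""
    advertised = dict(route)
    if next_hop_self:
        # Replace the eBGP peer's address with our own loopback.
        advertised["next_hop"] = own_loopback
    return advertised


# Hypothetical IGP table of an iBGP peer: only loopbacks are known.
igp = ["10.1.0.1/32", "10.2.0.1/32", "10.3.0.1/32"]
# Hypothetical eBGP route whose next-hop is the external peer itself.
ebgp_route = {"prefix": "10.9.10.0/24", "next_hop": "192.0.2.1"}

unchanged = advertise_to_ibgp(ebgp_route, "10.1.0.1", next_hop_self=False)
rewritten = advertise_to_ibgp(ebgp_route, "10.1.0.1", next_hop_self=True)

print(is_resolvable(unchanged["next_hop"], igp))  # False
print(is_resolvable(rewritten["next_hop"], igp))  # True
```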

eBGP multihop

By default, eBGP peers are expected to be directly connected. If they are not, the eBGP session will not come up, because the TTL (Time To Live) of the eBGP packets is set to 1 by default.

If you want to peer with an eBGP peer that is not directly connected, you need to use the ebgp-multihop feature. This feature allows you to set the TTL of eBGP packets to a higher value, allowing them to traverse multiple hops.
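
Why a TTL of 1 only works for directly connected peers can be shown with a tiny simulation. This is a simplified sketch under the assumption that every router between the two peers decrements the TTL by one and drops the packet when it reaches zero; the function name is made up for this example.

```python
def reaches_peer(initial_ttl: int, routers_in_between: int) -> bool:
    """Return True if a packet sent with initial_ttl survives all
    forwarding routers between the two BGP peers."""
    ttl = initial_ttl
    for _ in range(routers_in_between):
        ttl -= 1  # every forwarding router decrements the TTL
        if ttl <= 0:
            return False  # packet is dropped before it reaches the peer
    return ttl >= 1


print(reaches_peer(initial_ttl=1, routers_in_between=0))  # True: directly connected
print(reaches_peer(initial_ttl=1, routers_in_between=1))  # False: dropped on the way
print(reaches_peer(initial_ttl=2, routers_in_between=1))  # True: multihop with TTL 2
```

In other words, the TTL you configure for a multihop session needs to be at least the number of routers in between plus one.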

Common pattern: Peering via loopbacks & unnumbered links

We have already discussed that you often want to use an IGP such as OSPF or IS-IS to establish basic connectivity or loopback connectivity within your network. Loopback interfaces are virtual interfaces that are always up as long as the router is up. They provide a stable and unique IP address for the router.

As we have seen with OSPF, Loopback addresses can be used for unnumbered interfaces. Unnumbered interfaces do not have their own IP address, but borrow the IP address of another interface (usually a loopback). This way, you don’t need to assign IP addresses to point-to-point links between routers, saving IP address space and configuration effort.

When peering BGP, it is a common pattern to establish BGP sessions via loopback addresses. This way, the BGP session remains stable even if the physical interface goes down or the IP address of the physical interface changes. The IGP (e.g. OSPF) ensures that the loopback addresses are reachable. Sometimes you also see BGP layered on top of BGP, i.e. one BGP session is used to establish loopback connectivity and another BGP session then works on top of these loopbacks.

BGP unnumbered works differently from OSPF unnumbered, and it is not true unnumbered: instead of borrowing an IPv4 address from another interface, it uses the IPv6 link-local addresses of the interfaces to establish the BGP session. You still don’t need to assign IP addresses to the point-to-point link.
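
Link-local addresses exist on an interface as soon as IPv6 is enabled, without any addressing plan. A common way to build them is from the interface MAC address via modified EUI-64 (RFC 4291). The sketch below assumes exactly that derivation; whether a given platform uses EUI-64 or another scheme is an assumption on my side, and the MAC address is a hypothetical example.

```python
import ipaddress


def link_local_from_mac(mac: str) -> str:
    """Derive an IPv6 link-local address from a MAC address (modified EUI-64)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac}")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to stretch 48 bits to a 64-bit interface ID.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix followed by the interface ID.
    address = b"\xfe\x80" + b"\x00" * 6 + interface_id
    return str(ipaddress.IPv6Address(address))


# Hypothetical MAC address of a point-to-point interface.
print(link_local_from_mac("aa:c1:ab:2f:25:43"))  # fe80::a8c1:abff:fe2f:2543
```

Because link-local addresses are only unique per link, a next-hop like this is always tied to an interface, which is why BGP unnumbered neighbours are configured per interface rather than per IP address.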

Example topologies

We will use the topology from the last article, Part 3: OSPF for self-healing networks that just work (TM) as basis for some modifications.

Example topology: OSPF + iBGP + eBGP

Due to a bug in the OSPF unnumbered ECMP logic in Arista EOS (at least in versions 4.34.3M and 4.35.0F), we are setting maximum-paths 1 on the OSPF configuration. For more details, check my blog article The curious case of an OSPF unnumbered ECMP bug in Arista EOS.

Example topology for the OSPF unnumbered + iBGP + eBGP scenario

In this example, we use unnumbered OSPF as our IGP (like in the previous example), but we add iBGP on top for route distribution within the AS. We assume that broadcast-truck is operated by a different administrative entity (e.g. a rental company) and we want to peer with them via eBGP. Therefore we add eBGP between foh and broadcast-truck to establish a trust boundary between the two networks. For eBGP, we use a technique called BGP unnumbered (which is not true unnumbered, because it works via IPv6, but it works), so that we don’t need to assign IP addresses to the point-to-point link between foh and broadcast-truck.

There are only minimal modifications to the OSPF section. The most important thing is that we remove the network A.B.C.D/24 area 0.0.0.0 statements, as those subnets will be announced via iBGP. We also remove the OSPF adjacency between foh and broadcast-truck, as those two routers will peer via eBGP.

For stage-left, our OSPF config looks like this:

router ospf 1
   router-id 10.3.0.1
   auto-cost reference-bandwidth 800000
   bfd default
   network 10.3.0.1/32 area 0.0.0.0
   maximum-paths 1
   max-lsa 12000

The iBGP configuration is very similar on all nodes, let’s take stage-center as an example:

router bgp 64496
   router-id 10.2.0.1
   no bgp default ipv4-unicast
   maximum-paths 8
   neighbor ibgp-peers peer group
   neighbor ibgp-peers remote-as 64496
   neighbor ibgp-peers update-source Loopback1
   neighbor 10.1.0.1 peer group ibgp-peers
   neighbor 10.3.0.1 peer group ibgp-peers
   neighbor 10.4.0.1 peer group ibgp-peers
   neighbor 10.5.0.1 peer group ibgp-peers
   neighbor 10.6.0.1 peer group ibgp-peers
   neighbor 10.7.0.1 peer group ibgp-peers
   neighbor 10.8.0.1 peer group ibgp-peers
   !
   address-family ipv4
      neighbor ibgp-peers activate

router bgp 64496 - activates BGP with the correct ASN
router-id 10.2.0.1 - sets the router ID for BGP, just like in OSPF
no bgp default ipv4-unicast - prevents the IPv4 unicast address family from being enabled by default for every neighbour - this is simply good practice, because the default behaviour is often not desired
maximum-paths 8 - enable ECMP
neighbor ibgp-peers peer group - creates a peer group for iBGP peers. A peer group lets you configure multiple peers with the same settings easily.
neighbor ibgp-peers remote-as 64496 - sets the remote AS for the peer group to the same AS (iBGP)
neighbor ibgp-peers update-source Loopback1 - sets the source for the BGP session to the address of Loopback 1 - this is important, because we want to peer via loopbacks (loopbacks stay stable even when the IGP topology changes). In this case it would also work without it, but only by “accident”: due to the usage of OSPF unnumbered, BGP simply does not have any other address available. It is still good practice to set it explicitly.
neighbor X.X.X.X peer group ibgp-peers - adds the iBGP peers to the peer group - this is full mesh, every other router in the AS and we peer via loopbacks
address-family ipv4 neighbor ibgp-peers activate - activates IPv4 unicast for the peer group (if no address family is configured, the BGP session will not be established)
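
Since every router needs a neighbor line for every other router in the AS, the full-mesh neighbour list is a good candidate for generation instead of typing. The following is a minimal sketch that assumes the loopback scheme 10.1.0.1 to 10.8.0.1 from the config above; the function name is hypothetical and the output only covers the neighbor lines, not a complete BGP config.

```python
# Loopbacks of the eight routers in AS 64496 (10.1.0.1 .. 10.8.0.1).
LOOPBACKS = [f"10.{i}.0.1" for i in range(1, 9)]


def full_mesh_neighbors(own_loopback: str, all_loopbacks: list[str],
                        peer_group: str = "ibgp-peers") -> list[str]:
    """Return one neighbor statement per router in the AS, except ourselves."""
    return [
        f"neighbor {loopback} peer group {peer_group}"
        for loopback in all_loopbacks
        if loopback != own_loopback
    ]


# Neighbour statements for stage-center (10.2.0.1).
for line in full_mesh_neighbors("10.2.0.1", LOOPBACKS):
    print(line)
```

For stage-center this prints seven lines, one per other router, matching the neighbor statements in the config above.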

On stage-left and delay-row2-right we also have a network 10.X.10.0/24 statement under address-family ipv4 to advertise the connected client subnets into BGP!

The configuration for foh and broadcast-truck is a bit more complex - let’s look at broadcast-truck as it only has the eBGP in isolation, and foh is simply a mix of both.

interface Ethernet1
   speed 100g-4
   no switchport
   ipv6 enable

ipv6 enable is new - because BGP unnumbered works via IPv6, and IPv6 is not enabled by default on interfaces on Arista, we need to enable it explicitly

interface Vlan1
   ip address 10.9.10.1/24

We have also added a new client on broadcast-truck, so we need to assign an IP address to the VLAN interface.

ip routing ipv6 interfaces - A magic flag that is needed on Arista to make this work. Apparently it enables IPv4 routing on interfaces that don’t have an IPv4 address assigned. It’s sparsely documented… We just accept that we need it, because otherwise routing traffic through this interface won’t work.

ipv6 unicast-routing - Less obscure, this simply enables IPv6 routing on the router. Necessary for BGP unnumbered.

router bgp 64497
   router-id 10.9.0.1
   no bgp default ipv4-unicast
   maximum-paths 8
   neighbor event-network-ebgp peer group
   neighbor interface Et1-6 peer-group event-network-ebgp remote-as 64496
   !
   address-family ipv4
      neighbor event-network-ebgp activate
      neighbor event-network-ebgp next-hop address-family ipv6 originate
      network 10.9.0.1/32
      network 10.9.10.0/24

router bgp 64497 - activates BGP with the correct ASN for broadcast-truck
router-id 10.9.0.1 - sets the router ID for BGP, just like in OSPF
no bgp default ipv4-unicast - prevents the IPv4 unicast address family from being enabled by default for every neighbour - this is simply good practice, because the default behaviour is often not desired
maximum-paths 8 - enable ECMP
neighbor event-network-ebgp peer group - creates a peer group for eBGP peers. A peer group lets you configure multiple peers with the same settings easily. Mandatory for eBGP unnumbered!
neighbor interface Et1-6 peer-group event-network-ebgp remote-as 64496 - adds eBGP unnumbered neighbours on Ethernet Interfaces 1-6 to the peer group with the correct remote AS
address-family ipv4 - starts the IPv4 unicast address family
- neighbor event-network-ebgp activate - activates IPv4 unicast for the peer group (if no address family is configured, the BGP session will not be established). We need it because we want to exchange IPv4 routes, such as the other routers’ loopbacks.
- neighbor event-network-ebgp next-hop address-family ipv6 originate - because we are using BGP unnumbered, we need to tell BGP to advertise IPv4 routes with an IPv6 next-hop
- network 10.9.0.1/32 - advertises the loopback address into BGP
- network 10.9.10.0/24 - advertises the VLAN interface into BGP

On foh, we additionally have redistribute ospf - this redistributes OSPF routes (our loopbacks) into BGP so that broadcast-truck can learn about them.

We also need neighbor ibgp-peers next-hop-self - we already discussed next-hop-self in the pitfalls section. Because foh is redistributing eBGP routes into iBGP, we need to make sure that the next-hop is reachable by the iBGP peers. In this special case, due to BGP unnumbered on the eBGP links, we wouldn’t even see the routes on the other iBGP peers without this command, because the next-hop is an IPv6 address and the iBGP neighbours are not activated for the IPv6 address family!

In an IPv4 scenario without next-hop-self you would see the routes, but they would be unusable because the next-hop (the broadcast truck) would not be reachable.

Let’s look at stage-center:

stage-center(config)#show ip route bgp

VRF: default
Source Codes:
       C - connected, S - static, K - kernel,
       O - OSPF, O IA - OSPF inter area, O E1 - OSPF external type 1,
       O E2 - OSPF external type 2, O N1 - OSPF NSSA external type 1,
       O N2 - OSPF NSSA external type2, O3 - OSPFv3,
       O3 IA - OSPFv3 inter area, O3 E1 - OSPFv3 external type 1,
       O3 E2 - OSPFv3 external type 2,
       O3 N1 - OSPFv3 NSSA external type 1,
       O3 N2 - OSPFv3 NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

 B I      10.3.10.0/24 [200/0]
           via 10.3.0.1, Ethernet7
 B I      10.8.10.0/24 [200/0]
           via 10.1.0.1, Ethernet1
 B I      10.9.0.1/32 [200/0]
           via 10.1.0.1, Ethernet1
 B I      10.9.10.0/24 [200/0]
           via 10.1.0.1, Ethernet1

We can see that we have learned multiple BGP routes via iBGP (denoted by the B I source code):

10.3.10.0/24 and 10.8.10.0/24 are the client subnets from stage-left and delay-row2-right which were previously distributed via OSPF
10.9.0.1/32 is the loopback interface of broadcast-truck
10.9.10.0/24 is the client subnet of broadcast-truck

If we do show ip bgp 10.9.10.0/24:

stage-center(config)#show ip bgp 10.9.10.0/24
BGP routing table information for VRF default
Router identifier 10.2.0.1, local AS number 64496
BGP routing table entry for 10.9.10.0/24
 Paths: 1 available
  64497
    10.1.0.1 from 10.1.0.1 (10.1.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 18, weight 0, tag 0
      Received 00:28:11 ago, valid, internal, best
      Rx SAFI: Unicast

We can see that stage-center received the 10.9.10.0/24 route from 10.1.0.1, which is our foh router. The AS path (64497) shows that the route originates in AS 64497, which is broadcast-truck.

With show ip bgp we get some more interesting information:

stage-center(config)#show ip bgp 
BGP routing table information for VRF default
Router identifier 10.2.0.1, local AS number 64496
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Pending FIB install
                    % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
[...]
 * >      10.9.0.1/32            10.1.0.1              0       -          100     0       64497 i
 * >      10.9.10.0/24           10.1.0.1              0       -          100     0       64497 i

We can see all the BGP routes learned from broadcast-truck (AS 64497) with foh as next-hop (10.1.0.1)

Let’s look at foh:

foh(config-router-ospf)#show ip route bgp

VRF: default
Source Codes:
       C - connected, S - static, K - kernel,
       O - OSPF, O IA - OSPF inter area, O E1 - OSPF external type 1,
       O E2 - OSPF external type 2, O N1 - OSPF NSSA external type 1,
       O N2 - OSPF NSSA external type2, O3 - OSPFv3,
       O3 IA - OSPFv3 inter area, O3 E1 - OSPFv3 external type 1,
       O3 E2 - OSPFv3 external type 2,
       O3 N1 - OSPFv3 NSSA external type 1,
       O3 N2 - OSPFv3 NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

 B I      10.3.10.0/24 [200/0]
           via 10.2.0.1, Ethernet1
 B I      10.8.10.0/24 [200/0]
           via 10.6.0.1, Ethernet9
 B E      10.9.0.1/32 [200/0]
           via fe80::a8c1:abff:fe2f:2543, Ethernet11
           via fe80::a8c1:abff:fe4d:7a64, Ethernet12
           via fe80::a8c1:abff:fe2b:ce95, Ethernet13
           via fe80::a8c1:abff:fe9c:f18f, Ethernet14
           via fe80::a8c1:abff:feeb:aadb, Ethernet15
           via fe80::a8c1:abff:febb:c483, Ethernet16
 B E      10.9.10.0/24 [200/0]
           via fe80::a8c1:abff:fe2f:2543, Ethernet11
           via fe80::a8c1:abff:fe4d:7a64, Ethernet12
           via fe80::a8c1:abff:fe2b:ce95, Ethernet13
           via fe80::a8c1:abff:fe9c:f18f, Ethernet14
           via fe80::a8c1:abff:feeb:aadb, Ethernet15
           via fe80::a8c1:abff:febb:c483, Ethernet16

We can see the intra-AS client subnets 10.3.10.0/24 and 10.8.10.0/24 learned via iBGP. We can also see the eBGP routes from broadcast-truck (10.9.0.1/32 as loopback and 10.9.10.0/24 as client subnet) with multiple next-hops (the link-local IPv6 addresses of the Ethernet interfaces towards broadcast-truck due to BGP unnumbered), as well as ECMP kicking in. With show ip bgp this is even more obvious:

foh(config-router-ospf)#show ip bgp 
BGP routing table information for VRF default
Router identifier 10.1.0.1, local AS number 64496
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Pending FIB install
                    % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
[...]
 * >Ec    10.9.0.1/32            fe80::a8c1:abff:fe4d:7a64%Et12 0       -          100     0       64497 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:feeb:aadb%Et15 0       -          100     0       64497 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:fe2f:2543%Et11 0       -          100     0       64497 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:fe2b:ce95%Et13 0       -          100     0       64497 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:febb:c483%Et16 0       -          100     0       64497 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:fe9c:f18f%Et14 0       -          100     0       64497 i
 * >Ec    10.9.10.0/24           fe80::a8c1:abff:fe4d:7a64%Et12 0       -          100     0       64497 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:feeb:aadb%Et15 0       -          100     0       64497 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:fe2f:2543%Et11 0       -          100     0       64497 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:fe2b:ce95%Et13 0       -          100     0       64497 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:febb:c483%Et16 0       -          100     0       64497 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:fe9c:f18f%Et14 0       -          100     0       64497 i
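
How ECMP decides which of the six links a packet takes can be sketched as follows. The usual idea is that the switch hashes header fields of a flow and uses the result to pick one of the equal-cost next-hops, so all packets of one flow stay on the same link. This is only an illustration: real switches use vendor-specific hardware hash functions, SHA-256 is just a stand-in here, and the host addresses and ports are hypothetical.

```python
import hashlib

# The six interfaces towards broadcast-truck from the output above.
NEXT_HOPS = ["Ethernet11", "Ethernet12", "Ethernet13",
             "Ethernet14", "Ethernet15", "Ethernet16"]


def pick_next_hop(src_ip: str, dst_ip: str, protocol: str,
                  src_port: int, dst_port: int, next_hops: list[str]) -> str:
    """Map a flow (5-tuple) onto one of the equal-cost next-hops."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the hash as an index into the next-hop list.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical audio stream from a client on stage-left to a client on broadcast-truck.
first = pick_next_hop("10.3.10.20", "10.9.10.30", "udp", 5004, 5004, NEXT_HOPS)
second = pick_next_hop("10.3.10.20", "10.9.10.30", "udp", 5004, 5004, NEXT_HOPS)
print(first == second)  # True: the same flow always takes the same link
```

The consequence for audio streams is that a single stream never gets spread across multiple links, so ECMP balances the load between flows, not within one flow.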

Last but not least, let’s check what the broadcast-truck router sees:

broadcast-truck>show ip route

VRF: default
Source Codes:
       C - connected, S - static, K - kernel,
       O - OSPF, O IA - OSPF inter area, O E1 - OSPF external type 1,
       O E2 - OSPF external type 2, O N1 - OSPF NSSA external type 1,
       O N2 - OSPF NSSA external type2, O3 - OSPFv3,
       O3 IA - OSPFv3 inter area, O3 E1 - OSPFv3 external type 1,
       O3 E2 - OSPFv3 external type 2,
       O3 N1 - OSPFv3 NSSA external type 1,
       O3 N2 - OSPFv3 NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

Gateway of last resort is not set

 B E      10.1.0.1/32 [200/0]
           via fe80::a8c1:abff:fe77:620, Ethernet1
           via fe80::a8c1:abff:fe22:33cd, Ethernet2
           via fe80::a8c1:abff:fe1b:77b7, Ethernet3
           via fe80::a8c1:abff:fe09:4522, Ethernet4
           via fe80::a8c1:abff:fefa:f8bc, Ethernet5
           via fe80::a8c1:abff:fe3a:fcea, Ethernet6
 B E      10.2.0.1/32 [200/0]
           via fe80::a8c1:abff:fe77:620, Ethernet1
           via fe80::a8c1:abff:fe22:33cd, Ethernet2
           via fe80::a8c1:abff:fe1b:77b7, Ethernet3
           via fe80::a8c1:abff:fe09:4522, Ethernet4
           via fe80::a8c1:abff:fefa:f8bc, Ethernet5
           via fe80::a8c1:abff:fe3a:fcea, Ethernet6
[...]

Here we see our redistributed OSPF Loopbacks, reachable via eBGP!

Example topology: OSPF + iBGP with Route Reflector + eBGP

Example topology for the OSPF unnumbered + iBGP Route Reflector + eBGP scenario

In this example, we modify the previous example by introducing a Route Reflector (RR) to avoid a full mesh iBGP configuration. We will use a single, separate Route Reflector server for simplicity (do not do this at home) - in production you would want to have at least two RRs for redundancy.

This route reflector gets the loopback 10.10.0.1 and is part of the OSPF unnumbered network like all other routers. All other routers peer with the RR via iBGP.

Let’s look at the configs and take stage-center as an example again:

router bgp 64496
   router-id 10.2.0.1
   no bgp default ipv4-unicast
   maximum-paths 8
   neighbor rr peer group
   neighbor rr remote-as 64496
   neighbor rr update-source Loopback1
   neighbor 10.10.0.1 peer group rr
   !
   address-family ipv4
      neighbor rr activate

As you can see, the config has become much simpler. We have got rid of all the iBGP peers and replaced them with a single RR peer group.

Now let’s look at the RR config:

router bgp 64496
   router-id 10.10.0.1
   no bgp default ipv4-unicast
   maximum-paths 8
   neighbor ibgp-peers peer group
   neighbor ibgp-peers remote-as 64496
   neighbor ibgp-peers update-source Loopback1
   neighbor ibgp-peers route-reflector-client
   neighbor 10.1.0.1 peer group ibgp-peers
   neighbor 10.2.0.1 peer group ibgp-peers
   neighbor 10.3.0.1 peer group ibgp-peers
   neighbor 10.4.0.1 peer group ibgp-peers
   neighbor 10.5.0.1 peer group ibgp-peers
   neighbor 10.6.0.1 peer group ibgp-peers
   neighbor 10.7.0.1 peer group ibgp-peers
   neighbor 10.8.0.1 peer group ibgp-peers
   !
   address-family ipv4
      neighbor ibgp-peers activate

As you can see, the RR peers with all other routers via iBGP and is configured as the route reflector for the ibgp-peers peer group.

The important part is the neighbor ibgp-peers route-reflector-client line. This tells the RR that all peers in the ibgp-peers peer group are route-reflector clients. Therefore, the RR will reflect routes between these clients.
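
How much a Route Reflector saves becomes obvious when you count sessions. The sketch below compares a full mesh with a Route Reflector design. It assumes that every client peers with every RR and that multiple RRs are fully meshed among themselves, which is a common setup but not the only possible one; the function names are made up for this example.

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions if every router peers with every other router."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(clients: int, reflectors: int = 1) -> int:
    """Number of iBGP sessions if every client peers with every RR.
    Assumption: the RRs themselves are fully meshed with each other."""
    return clients * reflectors + full_mesh_sessions(reflectors)


print(full_mesh_sessions(8))            # 28 sessions for our 8 routers
print(route_reflector_sessions(8))      # 8 sessions with a single RR
print(route_reflector_sessions(8, 2))   # 17 sessions with two RRs
```

With only eight routers the difference is manageable, but the full mesh grows quadratically while the RR design grows linearly with the number of routers.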

For FOH, I have made a little change for the sake of demonstration:

route-map NEXT-HOP-SELF-FOR-EBGP permit 10
   match route-type external
   set ip next-hop 10.1.0.1

router bgp 64496
   router-id 10.1.0.1
   ...
   !
   address-family ipv4
      ...
      neighbor rr activate
      neighbor rr route-map NEXT-HOP-SELF-FOR-EBGP out

Instead of using next-hop-self, we are using a route-map to set the next-hop for all external routes (eBGP routes) to foh itself (10.1.0.1) when they are sent out to the Route Reflector. This is just to demonstrate that there are multiple ways to achieve the same goal.

Let’s look at the RR:

rr1>show ip bgp
BGP routing table information for VRF default
Router identifier 10.10.0.1, local AS number 64496
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Pending FIB install
                    % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      10.3.10.0/24           10.3.0.1              0       -          100     0       i
 * >      10.8.10.0/24           10.8.0.1              0       -          100     0       i
 * >      10.9.0.1/32            10.1.0.1              0       -          100     0       64497 i
 * >      10.9.10.0/24           10.1.0.1              0       -          100     0       64497 i

Here we can see that the RR has learned all routes from the clients, both the internal ones and the external ones from broadcast-truck. The next-hop for the external routes is correctly set to foh (10.1.0.1).

Looking at one of the routes in more detail:

rr1>show ip bgp 10.9.10.0/24 detail 
BGP routing table information for VRF default
Router identifier 10.10.0.1, local AS number 64496
BGP routing table entry for 10.9.10.0/24
 Paths: 1 available
  64497 (Received from a RR-client)
    10.1.0.1 from 10.1.0.1 (10.1.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 18, weight 0, tag 0
      Received 00:56:49 ago, valid, internal, best
      Rx SAFI: Unicast
 Advertised to 7 peers:
  peer-group ibgp-peers:
    10.2.0.1                10.3.0.1                10.4.0.1
    10.5.0.1                10.6.0.1                10.7.0.1
    10.8.0.1

We can also see that it’s reflected / advertised to all peers, except 10.1.0.1, which is the original client (foh).

rr1>show ip bgp sum
BGP summary information for VRF default
Router identifier 10.10.0.1, local AS number 64496
Neighbor Status Codes: m - Under maintenance
  Neighbor V AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc PfxAdv
  10.1.0.1 4 64496             70        71    0    0 00:55:35 Estab   2      2      2
  10.2.0.1 4 64496             70        71    0    0 00:55:32 Estab   0      0      4
  10.3.0.1 4 64496             70        70    0    0 00:55:30 Estab   1      1      3
  10.4.0.1 4 64496             68        72    0    0 00:55:31 Estab   0      0      4
  10.5.0.1 4 64496             69        72    0    0 00:55:30 Estab   0      0      4
  10.6.0.1 4 64496             70        73    0    0 00:55:34 Estab   0      0      4
  10.7.0.1 4 64496             69        72    0    0 00:55:29 Estab   0      0      4
  10.8.0.1 4 64496             70        70    0    0 00:55:31 Estab   1      1      3

We can see that all iBGP peers are established with the RR.
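
If you want to check this automatically, for example in a pre-show checklist script, the summary output can be parsed with a few lines of Python. This is a rough sketch that assumes the column layout shown above (neighbour address in the first column, state in the ninth); the function names are hypothetical and the parsing is not hardened against other EOS versions or output formats.

```python
SUMMARY = """\
  Neighbor V AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc PfxAdv
  10.1.0.1 4 64496             70        71    0    0 00:55:35 Estab   2      2      2
  10.2.0.1 4 64496             70        71    0    0 00:55:32 Estab   0      0      4
  10.3.0.1 4 64496             70        70    0    0 00:55:30 Estab   1      1      3
"""


def parse_summary(text: str) -> dict[str, str]:
    """Return a mapping of neighbour address to session state."""
    sessions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip empty lines, headers and anything that is not a neighbour row.
        if len(fields) < 9 or not fields[0][0].isdigit():
            continue
        sessions[fields[0]] = fields[8]
    return sessions


def not_established(text: str) -> list[str]:
    """Return all neighbours whose session is not in state Estab."""
    return [peer for peer, state in parse_summary(text).items() if state != "Estab"]


print(parse_summary(SUMMARY))
print(not_established(SUMMARY))  # [] means all sessions are up
```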

Now let’s look at stage-center again:

stage-center>show ip bgp sum
BGP summary information for VRF default
Router identifier 10.2.0.1, local AS number 64496
Neighbor Status Codes: m - Under maintenance
  Neighbor  V AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc PfxAdv
  10.10.0.1 4 64496             76        74    0    0 00:59:13 Estab   4      4      0

It only has one neighbour - the RR.

stage-center>show ip bgp
BGP routing table information for VRF default
Router identifier 10.2.0.1, local AS number 64496
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Pending FIB install
                    % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      10.3.10.0/24           10.3.0.1              0       -          100     0       i Or-ID: 10.3.0.1 C-LST: 10.10.0.1 
 * >      10.8.10.0/24           10.8.0.1              0       -          100     0       i Or-ID: 10.8.0.1 C-LST: 10.10.0.1 
 * >      10.9.0.1/32            10.1.0.1              0       -          100     0       64497 i Or-ID: 10.1.0.1 C-LST: 10.10.0.1 
 * >      10.9.10.0/24           10.1.0.1              0       -          100     0       64497 i Or-ID: 10.1.0.1 C-LST: 10.10.0.1

We can see that stage-center has learned all routes via the RR, with the correct next-hops. We also see two new fields: Or-ID (Originator ID) and C-LST (Cluster List). These are used by the RR to prevent routing loops.

The Originator ID is the router ID of the router that originally advertised the route into the AS, and the Cluster List is the list of Cluster IDs of the RRs that have reflected the route.

iBGP Route Reflectors violate the iBGP split-horizon rule, which states that iBGP learned routes should not be advertised to other iBGP peers. Therefore, the Originator ID is introduced to identify the original source of the route. If a client receives a route with its own router ID as the Originator ID, it knows that it is the original source and will not process the route again.

If an RR sees its own Cluster ID (by default its router ID) in the Cluster List, it knows that it has already processed the route and will not reflect it again.
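
The two loop-prevention checks can be modelled in a short Python sketch. This is a simplified model of the mechanism described in RFC 4456, not a real BGP implementation: the class and function names are my own, and everything apart from Originator ID and Cluster List is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: list[str] = field(default_factory=list)


def reflect(route: Route, learned_from_router_id: str, cluster_id: str) -> Route:
    """What an RR does before reflecting a route: set the Originator ID
    if it is missing and prepend its own Cluster ID to the Cluster List."""
    return Route(
        prefix=route.prefix,
        originator_id=route.originator_id or learned_from_router_id,
        cluster_list=[cluster_id] + route.cluster_list,
    )


def accepts(route: Route, own_router_id: str,
            own_cluster_id: Optional[str] = None) -> bool:
    """Loop check on the receiving side (own_cluster_id is only set on RRs)."""
    if route.originator_id == own_router_id:
        return False  # we originated this route ourselves
    if own_cluster_id is not None and own_cluster_id in route.cluster_list:
        return False  # this RR cluster has already reflected the route
    return True


# foh (10.1.0.1) sends the route to the RR (10.10.0.1), which reflects it.
reflected = reflect(Route("10.9.10.0/24"), "10.1.0.1", "10.10.0.1")
print(reflected.originator_id, reflected.cluster_list)
print(accepts(reflected, own_router_id="10.2.0.1"))  # True: stage-center accepts
print(accepts(reflected, own_router_id="10.1.0.1"))  # False: foh ignores its own route
```

The values of the reflected route match the Or-ID and C-LST columns in the stage-center output above.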

The question as to whether in scenarios with multiple Route Reflectors the Cluster-ID of each RR should be identical or unique cannot be answered definitively. It depends…

Finally, let’s look at stage-right and broadcast-truck:

stage-right>show ip route bgp

VRF: default
Source Codes:
       C - connected, S - static, K - kernel,
       O - OSPF, O IA - OSPF inter area, O E1 - OSPF external type 1,
       O E2 - OSPF external type 2, O N1 - OSPF NSSA external type 1,
       O N2 - OSPF NSSA external type2, O3 - OSPFv3,
       O3 IA - OSPFv3 inter area, O3 E1 - OSPFv3 external type 1,
       O3 E2 - OSPFv3 external type 2,
       O3 N1 - OSPFv3 NSSA external type 1,
       O3 N2 - OSPFv3 NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

 B I      10.3.10.0/24 [200/0]
           via 10.2.0.1, Ethernet3
 B I      10.8.10.0/24 [200/0]
           via 10.6.0.1, Ethernet1
 B I      10.9.0.1/32 [200/0]
           via 10.2.0.1, Ethernet3
 B I      10.9.10.0/24 [200/0]
           via 10.2.0.1, Ethernet3

All routes are there with resolved next-hops. Nice!

On broadcast-truck the output is a bit longer, because everything is ECMP’d:

broadcast-truck>show ip bgp
BGP routing table information for VRF default
Router identifier 10.9.0.1, local AS number 64497
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Pending FIB install
                    % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >Ec    10.1.0.1/32            fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 * >Ec    10.2.0.1/32            fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.2.0.1/32            fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.2.0.1/32            fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.2.0.1/32            fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.2.0.1/32            fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.2.0.1/32            fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 * >Ec    10.3.0.1/32            fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.3.0.1/32            fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.3.0.1/32            fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.3.0.1/32            fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.3.0.1/32            fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.3.0.1/32            fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 * >Ec    10.3.10.0/24           fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.3.10.0/24           fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.3.10.0/24           fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.3.10.0/24           fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.3.10.0/24           fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.3.10.0/24           fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 * >Ec    10.4.0.1/32            fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.4.0.1/32            fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.4.0.1/32            fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.4.0.1/32            fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.4.0.1/32            fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.4.0.1/32            fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 * >Ec    10.5.0.1/32            fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 * >Ec    10.6.0.1/32            fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 * >Ec    10.7.0.1/32            fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 * >Ec    10.8.0.1/32            fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 * >Ec    10.8.10.0/24           fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 * >      10.9.0.1/32            -                     -       -          -       0       i
 * >      10.9.10.0/24           -                     -       -          -       0       i
 * >Ec    10.10.0.1/32           fe80::a8c1:abff:fe76:2007%Et5 0       -          100     0       64496 i
 *  ec    10.10.0.1/32           fe80::a8c1:abff:fef9:2e09%Et4 0       -          100     0       64496 i
 *  ec    10.10.0.1/32           fe80::a8c1:abff:fe8d:e6f%Et6 0       -          100     0       64496 i
 *  ec    10.10.0.1/32           fe80::a8c1:abff:feb1:b6c1%Et2 0       -          100     0       64496 i
 *  ec    10.10.0.1/32           fe80::a8c1:abff:fe7c:dafd%Et3 0       -          100     0       64496 i
 *  ec    10.10.0.1/32           fe80::a8c1:abff:fe3d:e982%Et1 0       -          100     0       64496 i

Example topology: eBGP everywhere

In this example, we start from scratch. No OSPF, no iBGP. Just eBGP everywhere.

Should you do that? Probably not. But nevertheless you should know how to implement it.

Example topology for the eBGP everywhere scenario

In this example, all routers peer with each other via eBGP unnumbered. There is no OSPF and no iBGP. Each router gets its own AS number. The routers distribute their loopbacks and some attached subnets via eBGP.
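The numbering in the outputs of this example appears to follow a simple pattern: router n announces loopback 10.n.0.1/32 and uses AS 64495 + n (foh is 10.1.0.1 in AS 64496, stage-center is 10.2.0.1 in AS 64497, and so on). As a minimal sketch, assuming exactly that pattern, such a plan could be generated like this. The function name `plan` and the constant `FIRST_ASN` are made up for illustration.

```python
import ipaddress

# Assumption: first ASN of the example (foh), taken from the outputs in this article.
FIRST_ASN = 64496


def plan(router_count):
    """Return {n: (loopback, asn)} for routers 1..router_count.

    Hypothetical scheme: router n gets loopback 10.n.0.1/32 and ASN 64495 + n.
    """
    result = {}
    for n in range(1, router_count + 1):
        loopback = ipaddress.ip_interface(f"10.{n}.0.1/32")
        result[n] = (loopback, FIRST_ASN + n - 1)
    return result


for n, (loopback, asn) in plan(9).items():
    print(f"router {n}: loopback {loopback}, AS {asn}")
```

Giving every router its own AS is what makes every session an eBGP session, so a generated plan like this mostly helps to avoid accidentally reusing an ASN.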

We already learned how to configure eBGP unnumbered in the previous examples, so the following config should look familiar. Let’s go through foh’s config.

interface Ethernet1
   speed 100g-4
   no switchport
   ipv6 enable

Nothing surprising here. The important bits are no switchport (as always for our router-to-router links) and ipv6 enable, which turns on IPv6 on the interface so that it gets a link-local address.
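The link-local addresses that show up as next hops in this article all contain ff:fe in the middle, which suggests they are derived from the interface MAC address via modified EUI-64. As a sketch under that assumption, the derivation looks like this. The MAC address used below is reconstructed for illustration and does not appear in the article.

```python
import ipaddress


def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive an fe80::/64 link-local address from a MAC using modified EUI-64."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address like aa:bb:cc:dd:ee:ff")
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the MAC to get the 64-bit interface ID
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return ipaddress.IPv6Address(prefix + interface_id)


# Hypothetical MAC, chosen so that it maps to one of the next hops seen in the outputs
print(eui64_link_local("aa:c1:ab:41:19:b9"))
```

This is why unnumbered peering needs no address planning on the links: every interface with ipv6 enable already has a usable address.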

ip routing ipv6 interfaces 

ipv6 unicast-routing

Again, we need the magic ip routing ipv6 interfaces to enable IPv4 routing on interfaces that have no IPv4 address assigned.

peer-filter FILTER-ANY
   10 match as-range 1-4294967295 result accept
!
router bgp 64496
   router-id 10.1.0.1
   no bgp default ipv4-unicast
   maximum-paths 8
   neighbor broadcast-ebgp peer group
   neighbor internal-ebgp peer group
   neighbor interface Et1-10 peer-group internal-ebgp peer-filter FILTER-ANY
   neighbor interface Et11-16 peer-group broadcast-ebgp remote-as 64504
   !
   address-family ipv4
      neighbor broadcast-ebgp activate
      neighbor broadcast-ebgp next-hop address-family ipv6 originate
      neighbor internal-ebgp activate
      neighbor internal-ebgp next-hop address-family ipv6 originate
      network 10.1.0.1/32

This needs explanation.

peer-filter FILTER-ANY
   10 match as-range 1-4294967295 result accept

This creates a peer filter that accepts any AS number. Due to another ~~stupid bug~~ surprise behaviour in EOS, it is necessary. For the internal eBGP peers, we want to be lazy and not specify the remote AS for each interface neighbour, so we create a peer filter that matches any AS number.

Older EOS versions require you to specify either a remote-as or peer-filter for interface neighbours. Newer EOS versions do not… But if you don’t specify it, they simply don’t enable the neighbours. It’s very tricky to debug. Thanks for nothing, Arista.
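To make the idea of the peer filter a bit more tangible, here is a simplified model in Python. It is not how EOS implements it. In particular, the rule that a peer is rejected when no entry matches is my assumption for this sketch, and the names `peer_filter_accepts` and `FILTER_ANY` are made up.

```python
MAX_ASN = 2**32 - 1  # 4294967295, the highest 4-byte ASN


def peer_filter_accepts(rules, asn):
    """Evaluate (sequence, low, high, result) rules in sequence order.

    Assumption for this sketch: if no rule matches, the peer is not accepted.
    """
    for _seq, low, high, result in sorted(rules):
        if low <= asn <= high:
            return result == "accept"
    return False


# Model of: 10 match as-range 1-4294967295 result accept
FILTER_ANY = [(10, 1, MAX_ASN, "accept")]

print(peer_filter_accepts(FILTER_ANY, 64497))    # stage-center's AS
print(peer_filter_accepts(FILTER_ANY, MAX_ASN))  # upper end of the range
```

The range 1-4294967295 covers every 2-byte and 4-byte ASN a neighbour could present, which is exactly the "be lazy" behaviour we want for the internal links.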

After that, nothing surprising:

router bgp 64496 - BGP process with AS 64496
no bgp default ipv4-unicast - stops BGP from activating the IPv4 unicast address family for every neighbour by default - this is simply good practice, because the default behaviour is often not desired
maximum-paths 8 - enable ECMP with up to 8 paths
neighbor broadcast-ebgp peer group - create a peer group for the broadcast truck eBGP peers
neighbor internal-ebgp peer group - create a peer group for the internal eBGP peers
neighbor interface Et1-10 peer-group internal-ebgp peer-filter FILTER-ANY - assign all internal eBGP interface neighbours (Et1 to Et10) to the internal-ebgp peer group and apply the FILTER-ANY peer filter (allow all AS numbers)
neighbor interface Et11-16 peer-group broadcast-ebgp remote-as 64504 - assign all broadcast truck eBGP interface neighbours (Et11 to Et16) to the broadcast-ebgp peer group and set the remote AS to 64504
address-family ipv4 - enter the IPv4 address family
neighbor broadcast-ebgp activate - activate the broadcast-ebgp peer group for IPv4 unicast
neighbor broadcast-ebgp next-hop address-family ipv6 originate - because we are using BGP unnumbered, we need to tell BGP to advertise IPv4 prefixes with an IPv6 next hop
neighbor internal-ebgp activate - activate the internal-ebgp peer group for IPv4 unicast
neighbor internal-ebgp next-hop address-family ipv6 originate - same as above for the internal eBGP peers
network 10.1.0.1/32 - advertise the loopback of foh

Note that we do not need next-hop-self or a route-map to change the next-hop, because a router sets the next-hop to itself by default when it advertises a route to an eBGP peer.

Let’s look at stage-center.

stage-center(config-router-bgp)#show ip bgp
BGP routing table information for VRF default
Router identifier 10.2.0.1, local AS number 64497
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Pending FIB install
                    % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >Ec    10.1.0.1/32            fe80::a8c1:abff:fe41:19b9%Et4 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:fe39:e083%Et3 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:fee7:f1e3%Et2 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:fe98:5ff0%Et5 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:fe6d:9aab%Et1 0       -          100     0       64496 i
 *  ec    10.1.0.1/32            fe80::a8c1:abff:fe41:b1b7%Et6 0       -          100     0       64496 i
 *  E     10.1.0.1/32            fe80::a8c1:abff:fe15:c17c%Et10 0       -          100     0       64499 64501 64496 i
 *  e     10.1.0.1/32            fe80::a8c1:abff:febc:ada0%Et9 0       -          100     0       64499 64501 64496 i
 * >      10.2.0.1/32            -                     -       -          -       0       i
 * >Ec    10.3.0.1/32            fe80::a8c1:abff:fed8:303%Et7 0       -          100     0       64498 i
 *  ec    10.3.0.1/32            fe80::a8c1:abff:fe46:a13f%Et8 0       -          100     0       64498 i
 * >Ec    10.3.10.0/24           fe80::a8c1:abff:fe46:a13f%Et8 0       -          100     0       64498 i
 *  ec    10.3.10.0/24           fe80::a8c1:abff:fed8:303%Et7 0       -          100     0       64498 i
 * >Ec    10.4.0.1/32            fe80::a8c1:abff:fe15:c17c%Et10 0       -          100     0       64499 i
 *  ec    10.4.0.1/32            fe80::a8c1:abff:febc:ada0%Et9 0       -          100     0       64499 i
 *  E     10.4.0.1/32            fe80::a8c1:abff:fe98:5ff0%Et5 0       -          100     0       64496 64501 64499 i
 *  e     10.4.0.1/32            fe80::a8c1:abff:fe41:b1b7%Et6 0       -          100     0       64496 64501 64499 i
 *  e     10.4.0.1/32            fe80::a8c1:abff:fe39:e083%Et3 0       -          100     0       64496 64501 64499 i
 *  e     10.4.0.1/32            fe80::a8c1:abff:fee7:f1e3%Et2 0       -          100     0       64496 64501 64499 i
 *  e     10.4.0.1/32            fe80::a8c1:abff:fe6d:9aab%Et1 0       -          100     0       64496 64501 64499 i
 *  e     10.4.0.1/32            fe80::a8c1:abff:fe41:19b9%Et4 0       -          100     0       64496 64501 64499 i
 * >Ec    10.5.0.1/32            fe80::a8c1:abff:fe46:a13f%Et8 0       -          100     0       64498 64500 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fed8:303%Et7 0       -          100     0       64498 64500 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fe98:5ff0%Et5 0       -          100     0       64496 64500 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fe41:19b9%Et4 0       -          100     0       64496 64500 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fee7:f1e3%Et2 0       -          100     0       64496 64500 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fe39:e083%Et3 0       -          100     0       64496 64500 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fe41:b1b7%Et6 0       -          100     0       64496 64500 i
 *  ec    10.5.0.1/32            fe80::a8c1:abff:fe6d:9aab%Et1 0       -          100     0       64496 64500 i
 *  E     10.5.0.1/32            fe80::a8c1:abff:fe15:c17c%Et10 0       -          100     0       64499 64501 64496 64500 i
 *  e     10.5.0.1/32            fe80::a8c1:abff:febc:ada0%Et9 0       -          100     0       64499 64501 64496 64500 i
 * >Ec    10.6.0.1/32            fe80::a8c1:abff:fe98:5ff0%Et5 0       -          100     0       64496 64501 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fe41:b1b7%Et6 0       -          100     0       64496 64501 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fe39:e083%Et3 0       -          100     0       64496 64501 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fee7:f1e3%Et2 0       -          100     0       64496 64501 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fe6d:9aab%Et1 0       -          100     0       64496 64501 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fe41:19b9%Et4 0       -          100     0       64496 64501 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:fe15:c17c%Et10 0       -          100     0       64499 64501 i
 *  ec    10.6.0.1/32            fe80::a8c1:abff:febc:ada0%Et9 0       -          100     0       64499 64501 i
 *  E     10.6.0.1/32            fe80::a8c1:abff:fe46:a13f%Et8 0       -          100     0       64498 64500 64496 64501 i
 *  e     10.6.0.1/32            fe80::a8c1:abff:fed8:303%Et7 0       -          100     0       64498 64500 64496 64501 i
 * >Ec    10.7.0.1/32            fe80::a8c1:abff:fe46:a13f%Et8 0       -          100     0       64498 64500 64502 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fe98:5ff0%Et5 0       -          100     0       64496 64500 64502 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fed8:303%Et7 0       -          100     0       64498 64500 64502 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fe41:b1b7%Et6 0       -          100     0       64496 64500 64502 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fe39:e083%Et3 0       -          100     0       64496 64500 64502 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fee7:f1e3%Et2 0       -          100     0       64496 64500 64502 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fe6d:9aab%Et1 0       -          100     0       64496 64500 64502 i
 *  ec    10.7.0.1/32            fe80::a8c1:abff:fe41:19b9%Et4 0       -          100     0       64496 64500 64502 i
 * >Ec    10.8.0.1/32            fe80::a8c1:abff:fe98:5ff0%Et5 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fe41:b1b7%Et6 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fe39:e083%Et3 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fee7:f1e3%Et2 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fe6d:9aab%Et1 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fe41:19b9%Et4 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:fe15:c17c%Et10 0       -          100     0       64499 64501 64503 i
 *  ec    10.8.0.1/32            fe80::a8c1:abff:febc:ada0%Et9 0       -          100     0       64499 64501 64503 i
 * >Ec    10.8.10.0/24           fe80::a8c1:abff:fe15:c17c%Et10 0       -          100     0       64499 64501 64503 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:febc:ada0%Et9 0       -          100     0       64499 64501 64503 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fe98:5ff0%Et5 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fe41:b1b7%Et6 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fe39:e083%Et3 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fee7:f1e3%Et2 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fe6d:9aab%Et1 0       -          100     0       64496 64501 64503 i
 *  ec    10.8.10.0/24           fe80::a8c1:abff:fe41:19b9%Et4 0       -          100     0       64496 64501 64503 i
 *  E     10.8.10.0/24           fe80::a8c1:abff:fe46:a13f%Et8 0       -          100     0       64498 64500 64496 64501 64503 i
 *  e     10.8.10.0/24           fe80::a8c1:abff:fed8:303%Et7 0       -          100     0       64498 64500 64496 64501 64503 i
 * >Ec    10.9.0.1/32            fe80::a8c1:abff:fe39:e083%Et3 0       -          100     0       64496 64504 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:fe41:19b9%Et4 0       -          100     0       64496 64504 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:fe98:5ff0%Et5 0       -          100     0       64496 64504 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:fe41:b1b7%Et6 0       -          100     0       64496 64504 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:fe6d:9aab%Et1 0       -          100     0       64496 64504 i
 *  ec    10.9.0.1/32            fe80::a8c1:abff:fee7:f1e3%Et2 0       -          100     0       64496 64504 i
 *  E     10.9.0.1/32            fe80::a8c1:abff:fe15:c17c%Et10 0       -          100     0       64499 64501 64496 64504 i
 *  e     10.9.0.1/32            fe80::a8c1:abff:febc:ada0%Et9 0       -          100     0       64499 64501 64496 64504 i
 * >Ec    10.9.10.0/24           fe80::a8c1:abff:fe39:e083%Et3 0       -          100     0       64496 64504 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:fe98:5ff0%Et5 0       -          100     0       64496 64504 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:fee7:f1e3%Et2 0       -          100     0       64496 64504 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:fe6d:9aab%Et1 0       -          100     0       64496 64504 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:fe41:19b9%Et4 0       -          100     0       64496 64504 i
 *  ec    10.9.10.0/24           fe80::a8c1:abff:fe41:b1b7%Et6 0       -          100     0       64496 64504 i
 *  E     10.9.10.0/24           fe80::a8c1:abff:fe15:c17c%Et10 0       -          100     0       64499 64501 64496 64504 i
 *  e     10.9.10.0/24           fe80::a8c1:abff:febc:ada0%Et9 0       -          100     0       64499 64501 64496 64504 i

That’s a lot of output. But we can see that stage-center has learned all routes via eBGP from its neighbours and that they are properly ECMP’d.

Let’s look at a route in detail:

stage-center(config-router-bgp)#show ip bgp 10.9.10.0/24 detail
BGP routing table information for VRF default
Router identifier 10.2.0.1, local AS number 64497
BGP routing table entry for 10.9.10.0/24
 Paths: 8 available
  64496 64504
    fe80::a8c1:abff:fe39:e083%Et3 from fe80::a8c1:abff:fe39:e083%Et3 (10.1.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, tag 0
      Received 05:06:18 ago, valid, external, ECMP head, ECMP, best, ECMP contributor
      Rx SAFI: Unicast
  64496 64504
    fe80::a8c1:abff:fe98:5ff0%Et5 from fe80::a8c1:abff:fe98:5ff0%Et5 (10.1.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, tag 0
      Received 05:06:18 ago, valid, external, ECMP, ECMP contributor
      Not best: ECMP-Fast configured
      Rx SAFI: Unicast
  64496 64504
    fe80::a8c1:abff:fee7:f1e3%Et2 from fe80::a8c1:abff:fee7:f1e3%Et2 (10.1.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, tag 0
      Received 05:06:18 ago, valid, external, ECMP, ECMP contributor
      Not best: ECMP-Fast configured
      Rx SAFI: Unicast
  64496 64504
    fe80::a8c1:abff:fe6d:9aab%Et1 from fe80::a8c1:abff:fe6d:9aab%Et1 (10.1.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, tag 0
      Received 05:06:18 ago, valid, external, ECMP, ECMP contributor
      Not best: ECMP-Fast configured
      Rx SAFI: Unicast
  64496 64504
    fe80::a8c1:abff:fe41:19b9%Et4 from fe80::a8c1:abff:fe41:19b9%Et4 (10.1.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, tag 0
      Received 05:06:18 ago, valid, external, ECMP, ECMP contributor
      Not best: ECMP-Fast configured
      Rx SAFI: Unicast
  64496 64504
    fe80::a8c1:abff:fe41:b1b7%Et6 from fe80::a8c1:abff:fe41:b1b7%Et6 (10.1.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, tag 0
      Received 05:06:18 ago, valid, external, ECMP, ECMP contributor
      Not best: ECMP-Fast configured
      Rx SAFI: Unicast
  64499 64501 64496 64504
    fe80::a8c1:abff:fe15:c17c%Et10 from fe80::a8c1:abff:fe15:c17c%Et10 (10.4.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, tag 0
      Received 05:03:36 ago, valid, external, ECMP head, ECMP
      Not best: AS path length
      Rx SAFI: Unicast
  64499 64501 64496 64504
    fe80::a8c1:abff:febc:ada0%Et9 from fe80::a8c1:abff:febc:ada0%Et9 (10.4.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 1, weight 0, tag 0
      Received 05:03:36 ago, valid, external, ECMP
      Not best: AS path length
      Rx SAFI: Unicast
 Advertised to 9 peers:
  peer-group internal-ebgp:
    fe80::a8c1:abff:fe15:c17c%Et10    fe80::a8c1:abff:fe41:19b9%Et4    fe80::a8c1:abff:fe41:b1b7%Et6
    fe80::a8c1:abff:fe46:a13f%Et8    fe80::a8c1:abff:fe6d:9aab%Et1    fe80::a8c1:abff:fe98:5ff0%Et5
    fe80::a8c1:abff:febc:ada0%Et9    fe80::a8c1:abff:fed8:303%Et7    fe80::a8c1:abff:fee7:f1e3%Et2

We can see that stage-center has 8 available paths to reach 10.9.10.0/24. The path through fe80::a8c1:abff:fe39:e083%Et3 is the best one, but only because it was received first.

The other paths through 64496 64504 are marked Not best: ECMP-Fast configured, meaning they lost the best-path tie-break because of the default setting bgp bestpath ecmp-fast. The docs explain it like this:

By default, within an ECMP group the BGP best-path selection process prefers the active path (the first path received by the switch) unless a relevant tie-breaker is enabled.

But still, the other paths with Not best: ECMP-Fast configured contribute to ECMP, as we can see by ECMP contributor in the Received line.
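The behaviour above can be sketched in a few lines of Python. This is a heavily simplified model, not the real BGP best path algorithm: it only compares AS path length, and it assumes that the order of the paths in the detail output is the order in which they were received. The class `Path` and the function `select` are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    interface: str   # interface part of the link-local next hop
    as_path: tuple   # AS path as received


def select(paths, maximum_paths=8):
    """Return (best, ecmp_contributors) for paths given in arrival order.

    Simplification: only AS path length is compared. Among equally short
    paths, the first one received stays best (modelling ecmp-fast).
    """
    shortest = min(len(p.as_path) for p in paths)
    candidates = [p for p in paths if len(p.as_path) == shortest]
    return candidates[0], candidates[:maximum_paths]


SHORT = (64496, 64504)
LONG = (64499, 64501, 64496, 64504)
# The eight paths to 10.9.10.0/24, in the order of the detail output
paths = [Path(i, SHORT) for i in ("Et3", "Et5", "Et2", "Et1", "Et4", "Et6")]
paths += [Path(i, LONG) for i in ("Et10", "Et9")]

best, contributors = select(paths)
print("best:", best.interface)
print("ECMP contributors:", [p.interface for p in contributors])
```

In this model the six paths via AS 64496 all end up forwarding traffic, while the two longer paths via AS 64499 stay in the table as alternatives.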

More paths through 64499 64501 64496 64504 exist, but they are not best because their AS path is longer (Not best: AS path length). Still, BGP understands that they could be used for ECMP if needed (valid, external, ECMP head, ECMP), but they are not used right now.

This also shows that the AS path is “in order”: the leftmost AS is the neighbour we learned the route from, and the rightmost AS is the one that originated the prefix (here AS 64504 for 10.9.10.0/24). That is why BGP peers prepend their own AS instead of appending it.
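Prepending is easy to replay in Python. The following sketch walks the prefix 10.9.10.0/24 along the longer path from the output above and also shows the eBGP loop check. The function names `advertise` and `accepts` are made up, and the model ignores everything except the AS path.

```python
def advertise(as_path, local_as):
    """An eBGP speaker prepends its own AS when it advertises a route."""
    return (local_as,) + tuple(as_path)


def accepts(as_path, local_as):
    """eBGP loop avoidance: drop routes that already contain our own AS."""
    return local_as not in as_path


def replay(hops):
    """Replay a route passing through the given ASes, originator first."""
    path = ()
    for asn in hops:
        path = advertise(path, asn)
    return path


# 64504 originates 10.9.10.0/24, then the route travels via 64496, 64501 and 64499
path = replay((64504, 64496, 64501, 64499))
print(path)                  # what stage-center (AS 64497) receives
print(accepts(path, 64497))  # stage-center is not in the path yet
print(accepts(path, 64496))  # foh would see its own AS and drop the route
```

Because every hop adds itself on the left, the originator always stays on the right and the path reads like the route the traffic will take.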

show ip bgp sum shows our eBGP neighbours via IPv6 link-local addresses.

stage-center(config-router-bgp)#show ip bgp sum
BGP summary information for VRF default
Router identifier 10.2.0.1, local AS number 64497
Neighbor Status Codes: m - Under maintenance
  Neighbor                       V AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc PfxAdv
  fe80::a8c1:abff:fe15:c17c%Et10 4 64499            391       392    0    0 05:15:56 Estab   8      8      10
  fe80::a8c1:abff:fe39:e083%Et3  4 64496            387       383    0    0 05:18:38 Estab   9      9      10
  fe80::a8c1:abff:fe41:19b9%Et4  4 64496            391       385    0    0 05:18:38 Estab   9      9      11
  fe80::a8c1:abff:fe41:b1b7%Et6  4 64496            386       382    0    0 05:18:39 Estab   9      9      12
  fe80::a8c1:abff:fe46:a13f%Et8  4 64498            390       389    0    0 05:18:20 Estab   6      6      9
  fe80::a8c1:abff:fe6d:9aab%Et1  4 64496            387       389    0    0 05:18:38 Estab   9      9      12
  fe80::a8c1:abff:fe98:5ff0%Et5  4 64496            389       386    0    0 05:18:38 Estab   9      9      10
  fe80::a8c1:abff:febc:ada0%Et9  4 64499            394       396    0    0 05:15:56 Estab   8      8      12
  fe80::a8c1:abff:fed8:303%Et7   4 64498            386       390    0    0 05:18:21 Estab   6      6      11
  fe80::a8c1:abff:fee7:f1e3%Et2  4 64496            382       384    0    0 05:18:38 Estab   9      9      12

And of course, connectivity works as expected.

[*]─[stagebox1]─[~]
└──> sudo traceroute 10.9.10.10
traceroute to 10.9.10.10 (10.9.10.10), 30 hops max, 46 byte packets
 1  10.3.10.1 (10.3.10.1)  0.843 ms  0.791 ms  0.583 ms
 2  10.5.0.1 (10.5.0.1)  1.236 ms  1.413 ms  10.2.0.1 (10.2.0.1)  1.259 ms
 3  10.1.0.1 (10.1.0.1)  2.071 ms  2.630 ms  1.957 ms
 4  10.9.0.1 (10.9.0.1)  2.582 ms  3.310 ms  2.662 ms
 5  10.9.10.10 (10.9.10.10)  3.702 ms  3.012 ms  2.932 ms

[*]─[stagebox1]─[~]
└──> ping -c 5 10.9.10.10
PING 10.9.10.10 (10.9.10.10) 56(84) bytes of data.
64 bytes from 10.9.10.10: icmp_seq=1 ttl=60 time=3.09 ms
64 bytes from 10.9.10.10: icmp_seq=2 ttl=60 time=2.68 ms
64 bytes from 10.9.10.10: icmp_seq=3 ttl=60 time=2.50 ms
64 bytes from 10.9.10.10: icmp_seq=4 ttl=60 time=2.75 ms
64 bytes from 10.9.10.10: icmp_seq=5 ttl=60 time=2.27 ms

--- 10.9.10.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 2.266/2.657/3.092/0.273 ms

[*]─[stagebox1]─[~]
└──> curl 10.9.10.10
based on WBITT Network MultiTool (https://github.com/wbitt/Network-MultiTool)

traceroute shows our expected path through the network and ping and curl confirm end-to-end connectivity. Note that it is important to test connectivity not only with ping, but also with actual TCP connections (for example via curl), because ICMP traffic from ping is often treated differently by network devices.
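If curl is not available on a host, a TCP check can be sketched with nothing but the Python standard library. This is a minimal example, assuming that a plain TCP handshake is enough proof for you. The function name `tcp_reachable` is made up, and it does not speak HTTP like curl does.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full three-way handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks
        return False
```

On stagebox1, calling tcp_reachable("10.9.10.10", 80) would be the rough equivalent of the curl test above, since curl talks to port 80 when no port is given.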

Conclusion

In this article, we explored BGP as an advanced routing protocol for event networks. We discussed the fundamental differences between BGP and other protocols such as OSPF, covering concepts like IGPs vs EGPs, iBGP vs eBGP, Autonomous Systems, and trust boundaries.

We examined three distinct BGP deployment patterns: OSPF with iBGP and eBGP for multi-AS scenarios, iBGP with Route Reflectors to scale within larger networks, and the modern “eBGP everywhere” approach for simplified, scalable fabrics. For each pattern, we set up the topology using Containerlab and configured BGP alongside or instead of OSPF. Finally, we inspected the networks to verify connectivity and BGP operation, including path selection and ECMP behavior.

We have shown that while BGP has a reputation for complexity, it is a powerful and flexible routing protocol that can be configured with relative ease when you understand its core concepts. BGP extends your networking toolkit beyond OSPF’s capabilities, enabling trust boundaries, large-scale route distribution, and sophisticated traffic engineering - capabilities that become essential as your event networks grow in size and complexity.

Audio over IP Networks for Events - An Opinionated Guide, Part 4: BGP as advanced routing protocol for when you need a little bit more spice

BGP - The man, the myth, the legend

BGP vs OSPF - Both are just routing protocols, right?

IGP, EGP, iBGP, eBGP, and oh my… What are all these acronyms?

Why you should not use OSPF to distribute a large number of routes

Use eBGP to establish trust boundaries

Use the right tool for the job!

How BGP works - An overview

Contents of a simple BGP-4 UPDATE Message

BGP Attributes

BGP is a path vector protocol

Autonomous Systems are represented by ASNs (AS Numbers)

4-byte ASN representation

iBGP and eBGP: BGP’s two operating modes

BGP Loop Avoidance in iBGP and eBGP

eBGP loop avoidance

iBGP loop avoidance

iBGP Route Reflectors & eBGP Route Servers

The BGP best path selection algorithm

BGP communities

Route filtering and manipulation with route-maps and vendor specific policy languages

Route Redistribution

Multiprotocol BGP (MP-BGP) and more attributes

Common pitfalls

next-hop self for eBGP routes redistributed into iBGP

eBGP multihop

Common pattern: Peering via loopbacks & unnumbered links

Example topologies

Example topology: OSPF + iBGP + eBGP

Example topology: OSPF + iBGP with Route Reflector + eBGP

Example topology: eBGP everywhere

Conclusion

Assets / Documents
