While preparing the labs for “Audio over IP Networks for Events - An Opinionated Guide, Part 4: BGP as advanced routing protocol for when you need a little bit more spice”, I stumbled across a curious bug in Arista EOS related to OSPF unnumbered, ECMP and iBGP.

In short, when using OSPF unnumbered with ECMP, Arista EOS (specifically cEOS64-4.35.0F) fails to install some iBGP routes into the FIB for no obvious reason. The bug also seems to be present in 4.34.3M, and other people have reported similar issues on physical hardware platforms dating back to at least 4.28.

The bug is a bit tricky: I can trigger it consistently, but the results are not always exactly the same.

This is the topology:

Example topology for the OSPF unnumbered + iBGP + eBGP lab

Really nothing special, right?

The issue

Let’s look at stage-center:

stage-center(config)#show ip route

VRF: default
WARNING: Some of the routes are not programmed in     
kernel, and they are marked with '%'.                 
Source Codes:
       C - connected, S - static, K - kernel,
       O - OSPF, O IA - OSPF inter area, O E1 - OSPF external type 1,
       O E2 - OSPF external type 2, O N1 - OSPF NSSA external type 1,
       O N2 - OSPF NSSA external type2, O3 - OSPFv3,
       O3 IA - OSPFv3 inter area, O3 E1 - OSPFv3 external type 1,
       O3 E2 - OSPFv3 external type 2,
       O3 N1 - OSPFv3 NSSA external type 1,
       O3 N2 - OSPFv3 NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

Gateway of last resort is not set

[...]
 B I      10.3.10.0/24 [200/0]
           via 10.3.0.1, Ethernet7
           via 10.3.0.1, Ethernet8
[...]
 B I%     10.8.10.0/24 [200/0]
           via 10.1.0.1, Ethernet1
           via 10.1.0.1, Ethernet2
           via 10.1.0.1, Ethernet3
           via 10.1.0.1, Ethernet4
           via 10.1.0.1, Ethernet5
           via 10.1.0.1, Ethernet6
           via 10.4.0.1, Ethernet9
           via 10.4.0.1, Ethernet10
 B I      10.9.0.1/32 [200/0]
           via 10.1.0.1, Ethernet1
           via 10.1.0.1, Ethernet2
           via 10.1.0.1, Ethernet3
           via 10.1.0.1, Ethernet4
           via 10.1.0.1, Ethernet5
           via 10.1.0.1, Ethernet6
 B I      10.9.10.0/24 [200/0]
           via 10.1.0.1, Ethernet1
           via 10.1.0.1, Ethernet2
           via 10.1.0.1, Ethernet3
           via 10.1.0.1, Ethernet4
           via 10.1.0.1, Ethernet5
           via 10.1.0.1, Ethernet6

Uhm - why is 10.8.10.0/24 not installed into the kernel?

The nexthops are there:


 O        10.1.0.1/32 [110/18]
           directly connected, Ethernet1
           directly connected, Ethernet2
           directly connected, Ethernet3
           directly connected, Ethernet4
           directly connected, Ethernet5
           directly connected, Ethernet6

 O        10.4.0.1/32 [110/18]
           directly connected, Ethernet9
           directly connected, Ethernet10
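
With more than a handful of routes, spotting the '%' flag by eye gets tedious. The following is a minimal sketch, assuming the "show ip route" output has been saved as plain text. The function name unprogrammed_routes and the regular expression are my own illustration, not an Arista tool, and the sample is an abbreviated excerpt of the output above.

```python
import re

# Matches a route header line such as " B I%     10.8.10.0/24 [200/0]".
# The optional '%' directly after the source code marks a route that
# EOS reports as not programmed in the kernel.
ROUTE_RE = re.compile(
    r"^\s*(?P<code>[A-Z][A-Z0-9 ]*?)(?P<flag>%?)\s+"
    r"(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+)\s+\["
)


def unprogrammed_routes(show_ip_route):
    """Return the prefixes flagged with '%' in saved 'show ip route' text."""
    flagged = []
    for line in show_ip_route.splitlines():
        match = ROUTE_RE.match(line)
        if match and match.group("flag") == "%":
            flagged.append(match.group("prefix"))
    return flagged


# Abbreviated excerpt of the stage-center output (path lists shortened).
SAMPLE = """\
 B I      10.3.10.0/24 [200/0]
           via 10.3.0.1, Ethernet7
           via 10.3.0.1, Ethernet8
 B I%     10.8.10.0/24 [200/0]
           via 10.1.0.1, Ethernet1
 B I      10.9.0.1/32 [200/0]
           via 10.1.0.1, Ethernet1
"""

print(unprogrammed_routes(SAMPLE))
```

Running this against the full output of each switch gives a quick list of affected prefixes per device, which helps when the bug shows up differently after every trigger.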

Alright, let’s look at delay-row2-left:

delay-row2-left>show ip route

VRF: default
WARNING: Some of the routes are not programmed in     
kernel, and they are marked with '%'.                 
[...]
 B I%     10.3.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
[...]
 B I%     10.8.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
 B I%     10.9.0.1/32 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
 B I%     10.9.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2

Interesting, here none of the iBGP routes get installed. The next hop is there as well, and it is reachable:

delay-row2-left>show ip route 10.5.0.1/32

VRF: default
WARNING: Some of the routes are not programmed in     
kernel, and they are marked with '%'.                 
[...]

 O        10.5.0.1/32 [110/18]
           directly connected, Ethernet1
           directly connected, Ethernet2
delay-row2-left#ping 10.5.0.1 source Loopback 1
PING 10.5.0.1 (10.5.0.1) from 10.7.0.1 : 72(100) bytes of data.
80 bytes from 10.5.0.1: icmp_seq=1 ttl=64 time=0.047 ms
80 bytes from 10.5.0.1: icmp_seq=2 ttl=64 time=0.036 ms
80 bytes from 10.5.0.1: icmp_seq=3 ttl=64 time=0.010 ms
80 bytes from 10.5.0.1: icmp_seq=4 ttl=64 time=0.009 ms
80 bytes from 10.5.0.1: icmp_seq=5 ttl=64 time=0.012 ms

--- 10.5.0.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.009/0.022/0.047/0.015 ms, ipg/ewma 0.039/0.034 ms

Let’s dig a bit deeper.

delay-row2-left#show ip bgp 10.3.10.0/24 detail 
BGP routing table information for VRF default
Router identifier 10.7.0.1, local AS number 64496
BGP routing table entry for 10.3.10.0/24
 Paths: 1 available
  Local
    10.3.0.1 from 10.3.0.1 (10.3.0.1)
      Origin IGP, metric 0, localpref 100, IGP metric 26, weight 0, tag 0
      Received 00:12:29 ago, valid, internal, best
      Rx SAFI: Unicast
 Not advertised to any peer.

That looks good. The iBGP route is there, valid and best.

delay-row2-left#show ip bgp neighbors 10.3.0.1 received-routes 
BGP routing table information for VRF default
Router identifier 10.7.0.1, local AS number 64496
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Pending FIB install
                    % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      10.3.10.0/24           10.3.0.1              -       -          100     -       i

Yep, that also looks correct.

delay-row2-left#show rib route ip 10.3.10.0/24 debug 
VRF: default
Codes: C - Connected, S - Static, P - Route Input, G - Gribi
       B - BGP, O - Ospf, O3 - Ospf3, I - Isis, R - Rip, VL - VRF Leak
       > - Best Route, * - Unresolved Next hop
       EM - Exact match of the SR-TE Policy
       NM - Null endpoint match of the SR-TE Policy
       AM - Any endpoint match of the SR-TE Policy
       L - Part of a recursive route resolution loop
       A - Next hop not resolved in ARP/ND
       NF - Not in FEC
>B    10.3.10.0/24 [set ID 3, 200 pref/0 MED] updated 00:14:06 ago
         via [config: VRF ID 0, ipv4, ID 19] 10.3.0.1 [110 pref/26 metric] type ipv4
            via [status: ipv4, ID 7] 10.5.0.1, Ethernet1
            via [status: ipv4, ID 8] 10.5.0.1, Ethernet2

The RIB entry also looks good, and the next hops are resolved. The BGP next hop for 10.3.10.0/24 is 10.3.0.1, which is resolved recursively via the IGP (OSPF) to the next hop 10.5.0.1 on Ethernet1 and Ethernet2.
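
To make the recursive resolution concrete, here is a small sketch of the idea: the BGP next hop is looked up in the IGP routes with a longest-prefix match, and the matching IGP paths become the forwarding paths. This is an illustration only, not how EOS implements it. The table IGP_ROUTES is hand-built from the values in the output above, and the /32 prefix length for 10.3.0.1 is an assumption.

```python
import ipaddress

# Hypothetical IGP table: prefix -> list of (next hop, interface).
# Values are taken from the RIB output above; the /32 for 10.3.0.1
# is an assumption, the output does not show the prefix length.
IGP_ROUTES = {
    "10.3.0.1/32": [("10.5.0.1", "Ethernet1"), ("10.5.0.1", "Ethernet2")],
    "10.5.0.1/32": [("10.5.0.1", "Ethernet1"), ("10.5.0.1", "Ethernet2")],
}


def resolve(bgp_next_hop, igp_routes):
    """Return the IGP paths of the longest prefix covering bgp_next_hop."""
    addr = ipaddress.ip_address(bgp_next_hop)
    best_net, best_paths = None, []
    for prefix, paths in igp_routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_paths = net, paths
    return best_paths  # empty list means the next hop is unresolved


# 10.3.10.0/24 has the BGP next hop 10.3.0.1.
print(resolve("10.3.0.1", IGP_ROUTES))
```

In this model the route resolves to two equal-cost paths, which matches what the RIB shows. So whatever goes wrong happens after resolution, somewhere between the RIB and the FIB.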

Let’s do a Wireshark capture, just to be sure…

delay-row2-left(config)#clear bgp *
! Peerings for all neighbors were hard reset

delay-row2-left(config)#show ip route bgp

VRF: default
WARNING: Some of the routes are not programmed in     
kernel, and they are marked with '%'.                 
Source Codes:
       C - connected, S - static, K - kernel,
       O - OSPF, O IA - OSPF inter area, O E1 - OSPF external type 1,
       O E2 - OSPF external type 2, O N1 - OSPF NSSA external type 1,
       O N2 - OSPF NSSA external type2, O3 - OSPFv3,
       O3 IA - OSPFv3 inter area, O3 E1 - OSPFv3 external type 1,
       O3 E2 - OSPFv3 external type 2,
       O3 N1 - OSPFv3 NSSA external type 1,
       O3 N2 - OSPFv3 NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

 B I%     10.3.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
 B I%     10.8.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
 B I%     10.9.0.1/32 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
 B I%     10.9.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2

Nope, that wasn’t it. The BGP capture also looks completely normal:

Wireshark capture of BGP update for 10.3.10.0/24

It gets even more cursed. Not only do some routes fail to install; even where a route shows up as installed, it seems to be only partially installed.

Let’s go to stage-left, where the route to 10.8.10.0/24 is installed:

stage-left#show ip route 10.8.10.0/24

VRF: default
WARNING: Some of the routes are not programmed in     
kernel, and they are marked with '%'.                 
[...]

 B I      10.8.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
           via 10.2.0.1, Ethernet3
           via 10.2.0.1, Ethernet4
stage-left#ping 10.8.10.10 source Loopback 1
PING 10.8.10.10 (10.8.10.10) from 10.3.0.1 : 72(100) bytes of data.
From 10.2.0.1 icmp_seq=1 Destination Net Unreachable
From 10.6.0.1 icmp_seq=1 Destination Net Unreachable
80 bytes from 10.8.10.10: icmp_seq=1 ttl=60 time=3.35 ms
80 bytes from 10.8.10.10: icmp_seq=2 ttl=60 time=3.51 ms
80 bytes from 10.8.10.10: icmp_seq=3 ttl=60 time=2.36 ms

--- 10.8.10.10 ping statistics ---
3 packets transmitted, 3 received, +2 errors, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 2.360/3.072/3.507/0.507 ms, pipe 2, ipg/ewma 1.693/3.243 ms

So… Some packets trigger Destination Net Unreachable errors, others get through. Looks like ECMP shenanigans.

The workaround

A really great engineer and DisNOG community member offered their help, and we dove into a nice late-night debugging session. We double-checked everything and even went through the show tech-support extended tfa output, but couldn’t find anything obviously wrong. We did spot one small oddity in that output, and said engineer suggested testing maximum-paths 8 in the OSPF configuration. This did not change anything… But then I had a brainwave and thought “Let’s try maximum-paths 1”.

Lo and behold - all routes got installed correctly!

delay-row2-left(config)#show ip route bgp

VRF: default
WARNING: Some of the routes are not programmed in     
kernel, and they are marked with '%'.                 
[...]

 B I%     10.3.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
 B I%     10.8.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
 B I%     10.9.0.1/32 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2
 B I%     10.9.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
           via 10.5.0.1, Ethernet2

delay-row2-left(config)#router ospf 1
delay-row2-left(config-router-ospf)#maximum-paths 1
delay-row2-left(config-router-ospf)#end
delay-row2-left#show ip route bgp

VRF: default
[...]

 B I      10.3.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
 B I      10.8.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1
 B I      10.9.0.1/32 [200/0]
           via 10.5.0.1, Ethernet1
 B I      10.9.10.0/24 [200/0]
           via 10.5.0.1, Ethernet1

Look at that!

So, the workaround for this bug is to set maximum-paths 1 in the OSPF configuration. This seems to prevent the bug from occurring, and all iBGP routes get installed correctly. The price is OSPF ECMP: as the output above shows, each route now uses a single path via Ethernet1.

TAC Case incoming…
