EVPN and BGP extended communities on IPv4 sessions

Symptoms
Traffic drop on specific VLANs/VxLAN tunnels
Environment
  • EVPN
  • SLX or VDX
User-added image
Cause

Topology and design

The issue can be seen in the topology above for a multitenant EVPN network, where multiple customer VRFs are configured together with a primary VRF (MAIN, associated with L3 VNI 100) that may provide external access or shared services.

10.128.20.0/22 is a network in MAIN VRF extended between Data Centres. 10.128.128.0/27 is an example of a customer subnet that has extra services associated with it. It is connected to DC2LEAF1, but to reach it from outside, packets must first pass through an extra service chain on the Borders, as depicted by the green dashed line on the diagram.

In this particular case the packets must first pass through the firewall. The firewall is connected to the DC2BRDR1 border pair. For every customer it forms one BGP session in the customer-specific VRF to learn the destinations, and a second BGP session in MAIN VRF to advertise the prefixes back to the Border Leaf switches.

Every Leaf is a pair of routers (A and B), but that is only important for understanding the following outputs and does not have any impact on the problem. The addresses on the routers (e.g. 10.128.8.4) are the ones used for VxLAN tunnel origination/termination and as BGP next hops.

10.128.20.0/22 subnet

The hosts on 10.128.20.0/22 on the DC1LEAF1 (not on the diagram) and DC2LEAF1 pairs are learned via BGP EVPN ipv4-prefix and ARP route types. Based on each /32 ARP EVPN route, an IPv4 route is injected into the corresponding VRF and associated with the tunnel towards the destination. For example, for the Border Leaf pair in DC1 and a host in DC2, the state on DC1BRDR1A is:

DC1BRDR1A# sh ip route 10.128.20.13/32 deta vrf MAIN
IP Routing Table for VRF "MAIN"
Total number of IP routes: 20
'*' denotes best ucast next-hop
'[x/y]' denotes [preference/metric]

10.128.20.13/32,
    *via 10.128.40.7%default-vrf, Ve 459, [20/0], 2m25s, eBgp, tag 0, (VNI 100, GW MAC 0027.f8df.d5d9, Tu 61442)

DC1BRDR1A# sh mac | in df.d5d9
300      0027.f8df.d5d9    EVPN     Active       Tu 61442
459      0027.f8df.d5d9    EVPN     Active       Tu 61442
2605     0027.f8df.d5d9    EVPN     Active       Tu 61442
2613     0027.f8df.d5d9    EVPN     Active       Tu 61442
2618     0027.f8df.d5d9    EVPN     Active       Tu 61442
2620     0027.f8df.d5d9    EVPN     Active       Tu 61442
2621     0027.f8df.d5d9    EVPN     Active       Tu 61442
2708     0027.f8df.d5d9    EVPN     Active       Tu 61442

DC1BRDR1A# sh tun 61442
Tunnel 61442, mode VXLAN, rbridge-ids 1-2
Ifindex 2080436226, Admin state up, Oper state up
Overlay gateway "DC2", ID 1
Source IP 10.128.8.4 ( Loopback 2 ), Vrf default-vrf
Destination IP 10.128.40.7

where

  • 10.128.20.13/32 is the host connected to DC2LEAF1
  • 10.128.40.7 is the tunnel destination address for DC2LEAF1
  • Tu 61442 is the tunnel number on DC1BRDR1A towards the address above. Tunnel numbers are locally significant and can differ both between routers and on both sides of a tunnel
  • VLAN/Ve 459 is the one mapped to L3 VNI 100 in EVPN configuration
  • GW MAC 0027.f8df.d5d9 is the MAC address for VE interface on DC2LEAF1 on the VLAN where 10.128.20.13 host is
DC2LEAF1A# sh int vlan 105
Vlan 105
Address is 0027.f8df.d5d9, Current address is 0027.f8df.d5d9

So far this is a standard, simple EVPN setup, and a test ping from a host behind DC1BRDR1 works fine:

DC1HOST1# ping 10.128.20.13
Type Control-c to abort
PING 10.128.20.13 (10.128.20.13): 56 data bytes
64 bytes from 10.128.20.13: icmp_seq=0 ttl=62 time=2.726 ms
64 bytes from 10.128.20.13: icmp_seq=1 ttl=62 time=1.640 ms
64 bytes from 10.128.20.13: icmp_seq=2 ttl=62 time=2.513 ms
64 bytes from 10.128.20.13: icmp_seq=3 ttl=62 time=1.443 ms
64 bytes from 10.128.20.13: icmp_seq=4 ttl=62 time=3.073 ms

10.128.128.0/27 subnet

10.128.128.0/27 is one of the customer-specific protected subnet prefixes, and it is also originated into BGP EVPN on the DC2LEAF1 pair. When the IPv4 prefix is injected into BGP, a number of attributes and extended communities are attached to it. One of them carries the Router MAC, which is the same VE interface MAC on DC2LEAF1 as in the previous section, 0027.f8df.d5d9, encoded as ExtCom:06:03:00:27:f8:df:d5:d9:

DC2LEAF1A# show bgp evpn routes type ipv4-prefix 10.128.128.0/27 t 0
Status A:AGGREGATE B:BEST b:NOT-INSTALLED-BEST C:CONFED_EBGP D:DAMPED
       E:EBGP H:HISTORY I:IBGP L:LOCAL M:MULTIPATH m:NOT-INSTALLED-MULTIPATH
       S:SUPPRESSED F:FILTERED s:STALE
1       Prefix: IP4Prefix:[0][10.128.128.0/27],  Status: BL,  Age: 3h40m42s
         NEXT_HOP: 10.128.40.7, Learned from Peer: Local Router
          LOCAL_PREF: 100,  MED: 0,  ORIGIN: incomplete,  Weight: 0
         AS_PATH:
            Extended Community: RT 4200100000:2701 ExtCom:06:03:00:27:f8:df:d5:d9 ExtCom:03:0c:00:00:00:00:00:08 ExtCom:03:0d:00:00:00:00:00:00 RT 62305:2701
            Default Extd Gw  Community: Received
            Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
            Adj_RIB_out count: 2,  Admin distance 0
            L3_vni: 2701 Router Mac : 0027.f8df.d5d9
            RD: 10.128.40.8:2701
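
The ExtCom tokens in the output above can be read octet by octet: the first two octets are the type and sub-type, and the remaining six are the value. The following is a minimal Python sketch of that decoding, not a vendor tool. The function name `decode_extcom` and the `KNOWN_TYPES` table are hypothetical, and the type names are assumptions based on the EVPN standards and on the labels the switch itself prints ("Router Mac", "Tunnel Encapsulation", "Default Extd Gw").

```python
import re

# (type, sub-type) pairs seen in this article; names are assumptions, see lead-in.
KNOWN_TYPES = {
    (0x06, 0x03): "EVPN Router's MAC",
    (0x03, 0x0C): "Encapsulation",
    (0x03, 0x0D): "Default Gateway",
}

def decode_extcom(text):
    """Decode one 'ExtCom:tt:ss:v1:..:v6' token into (name, value)."""
    match = re.fullmatch(r"ExtCom:((?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2})", text)
    if match is None:
        raise ValueError(f"not an 8-octet extended community: {text!r}")
    octets = bytes(int(part, 16) for part in match.group(1).split(":"))
    kind = (octets[0], octets[1])
    name = KNOWN_TYPES.get(kind, "unknown")
    value = octets[2:]
    if kind == (0x06, 0x03):
        # Render the six value octets in the dotted form used by 'sh mac'.
        digits = value.hex()
        return name, ".".join(digits[i:i + 4] for i in range(0, 12, 4))
    # For the other types, return the value as a plain integer.
    return name, int.from_bytes(value, "big")

print(decode_extcom("ExtCom:06:03:00:27:f8:df:d5:d9"))
print(decode_extcom("ExtCom:03:0c:00:00:00:00:00:08"))
```

Decoded this way, the first token yields the same MAC that appears on the "Router Mac" line of the output, and the encapsulation value 8 corresponds to the "Type Vxlan" label.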

This prefix is originated in the customer-specific VRF D701, which is mapped to VNI/VE/VLAN 2701. It reaches the border routers in VRF D701 and is then advertised further to the firewall in the same VRF. The relevant BGP configuration on DC2BRDR1 is:

  address-family ipv4 unicast vrf D701
   ...
   neighbor 10.128.33.1 remote-as 4200002500
   neighbor 10.128.33.1 activate
   neighbor 10.128.33.1 bfd
   neighbor 10.128.33.1 send-community extended

Since sending of extended communities is configured for this BGP session, the communities are present and visible on the firewall:

NL2FW#sh ip bgp routes detail 10.128.128.0
Number of BGP Routes matching display condition : 1
Status A:AGGREGATE B:BEST b:NOT-INSTALLED-BEST C:CONFED_EBGP D:DAMPED
       E:EBGP H:HISTORY I:IBGP L:LOCAL M:MULTIPATH m:NOT-INSTALLED-MULTIPATH
       S:SUPPRESSED F:FILTERED s:STALE
1       Prefix: 10.128.128.0/27,  Status: BE,  Age: 0h0m4s
         NEXT_HOP: 10.128.33.5, Metric: 0, Learned from Peer: 10.128.33.5 (4200102701)
          LOCAL_PREF: 100,  MED: none,  ORIGIN: incomplete,  Weight: 0
         AS_PATH: 4200002500 4200102701 4200002100 4200002401
            Extended Community: RT 4200100000:2701 ExtCom:06:03:00:27:f8:df:d5:d9 ExtCom:03:0c:00:00:00:00:00:08 ExtCom:03:0d:00:00:00:00:00:00 RT 62305:2701

The firewall is also configured to send extended communities to the DC2BRDR1 pair. The prefix comes back from the firewall to DC2BRDR1 in MAIN VRF / VNI 100, the extended community with the MAC is received, and the prefix is set to "L3_vni: 100 Router Mac : 0027.f8df.d5d9". Note that there is now a second Router MAC extended community carrying the MAC of DC2BRDR1A, which inserts the prefix into EVPN for MAIN VRF, but the original DC2LEAF1A MAC is the one chosen for BGP updates:

DC2BRDR1A# show bgp evpn routes type ipv4-prefix 10.128.128.0/27 t 0
...
1       Prefix: IP4Prefix:[0][10.128.128.0/27],  Status: BE,  Age: 0h0m11s
         NEXT_HOP: 10.128.40.4, Learned from Peer: 10.128.33.1 (4200002090)
          LOCAL_PREF: 100,  MED: none,  ORIGIN: incomplete,  Weight: 0
         AS_PATH: 4200002090 4200002500 4200102701 4200002100 4200002401
            Extended Community: RT 4200100000:2701 ExtCom:06:03:00:27:f8:df:d5:d9 ExtCom:03:0c:00:00:00:00:00:08 ExtCom:03:0d:00:00:00:00:00:00 RT 62305:2701 RT 4200100000:2 ExtCom:06:03:c4:f5:7c:67:51:5b ExtCom:03:0c:00:00:00:00:00:08 RT 62304:100
            Default Extd Gw  Community: Received
            Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
            Adj_RIB_out count: 6,  Admin distance 0
            L3_vni: 100 Router Mac : 0027.f8df.d5d9
            RD: 10.128.40.5:2
...
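
A prefix that carries more than one Router MAC community, as in the output above, is a hint that a remote MAC has been carried across the IPv4 session and back into EVPN. The sketch below, in Python, extracts every Router MAC community from an "Extended Community:" line so that such prefixes can be spotted in saved output. The function name `router_macs` is hypothetical, and the sketch assumes the communities are printed in the `ExtCom:06:03:...` form shown in this article.

```python
import re

# Matches the Router MAC community and captures its six MAC octets.
ROUTER_MAC = re.compile(r"ExtCom:06:03:((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.IGNORECASE)

def router_macs(extcom_line):
    """Return every Router MAC on the line, in dotted notation, in printed order."""
    macs = []
    for raw in ROUTER_MAC.findall(extcom_line):
        digits = raw.replace(":", "").lower()
        macs.append(".".join(digits[i:i + 4] for i in range(0, 12, 4)))
    return macs

line = ("Extended Community: RT 4200100000:2701 ExtCom:06:03:00:27:f8:df:d5:d9 "
        "ExtCom:03:0c:00:00:00:00:00:08 ExtCom:03:0d:00:00:00:00:00:00 RT 62305:2701 "
        "RT 4200100000:2 ExtCom:06:03:c4:f5:7c:67:51:5b ExtCom:03:0c:00:00:00:00:00:08 "
        "RT 62304:100")

found = router_macs(line)
print(found)
if len(found) > 1:
    print("more than one Router MAC community; a remote MAC may have leaked back in")
```

For the line taken from DC2BRDR1A, the list contains both the DC2LEAF1 MAC and the DC2BRDR1A MAC, which is the condition described in this section.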

There are now two conflicting records in BGP for the DC2LEAF1A MAC address 0027.f8df.d5d9: one is associated with tunnel 61442, the tunnel towards 10.128.40.7/DC2LEAF1, and the other with tunnel 61441, the one towards 10.128.40.4/DC2BRDR1. The MAC table on DC1BRDR1 is updated so that, in VLAN 459, the MAC now points to the tunnel towards DC2BRDR1:

DC1BRDR1# sh mac | in df.d5d9
300      0027.f8df.d5d9    EVPN     Active       Tu 61442
459      0027.f8df.d5d9    EVPN     Active       Tu 61441
2605     0027.f8df.d5d9    EVPN     Active       Tu 61442
2613     0027.f8df.d5d9    EVPN     Active       Tu 61442
2618     0027.f8df.d5d9    EVPN     Active       Tu 61442
2620     0027.f8df.d5d9    EVPN     Active       Tu 61442
2621     0027.f8df.d5d9    EVPN     Active       Tu 61442
2708     0027.f8df.d5d9    EVPN     Active       Tu 61442
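
The symptom in the table above is a single MAC that points to different tunnels depending on the VLAN. A small Python sketch like the one below can flag that in captured `sh mac` output. The function name `tunnels_per_mac` is hypothetical, and the parsing assumes the six-column layout shown in this article (VLAN, MAC, type, state, "Tu", tunnel number); other software releases may format the table differently.

```python
from collections import defaultdict

def tunnels_per_mac(show_mac_output):
    """Map each MAC to {tunnel: [vlans]} from 'VLAN MAC TYPE STATE Tu N' lines."""
    table = defaultdict(lambda: defaultdict(list))
    for line in show_mac_output.splitlines():
        fields = line.split()
        # Expect: vlan, mac, type, state, "Tu", tunnel number. Skip anything else.
        if len(fields) == 6 and fields[4] == "Tu" and fields[0].isdigit():
            vlan, mac, tunnel = int(fields[0]), fields[1], int(fields[5])
            table[mac][tunnel].append(vlan)
    return {mac: dict(tunnels) for mac, tunnels in table.items()}

sample = """\
300      0027.f8df.d5d9    EVPN     Active       Tu 61442
459      0027.f8df.d5d9    EVPN     Active       Tu 61441
2605     0027.f8df.d5d9    EVPN     Active       Tu 61442
"""

for mac, tunnels in tunnels_per_mac(sample).items():
    if len(tunnels) > 1:
        print(f"{mac} is split across tunnels: {tunnels}")
```

A MAC reported by this check is not necessarily a fault in every design, but for a router MAC that belongs to one leaf pair it is worth comparing the odd VLAN against the L3 VNI mapping, as done in this article for VLAN 459.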

Now, when pinging as before from the same host behind DC1BRDR1, the packets take the incorrect tunnel 61441 and are dropped on DC2BRDR1.

Ping from behind DC1BRDR1 is now broken:

DC1HOST1# ping 10.128.20.13
Type Control-c to abort
PING 10.128.20.13 (10.128.20.13): 56 data bytes
--- 10.128.20.13 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss

 

 

Resolution

There are several different steps that can be taken to alter the design and avoid the problem:

  • Do not send extended communities on IPv4 unicast BGP sessions from Border Leafs to Firewall
  • Do not send extended communities on IPv4 unicast BGP sessions from Firewall to Border Leafs
  • Filter the communities sent so that the Router MAC is dropped from the list, if possible

The first option can be configured on SLX/VDX as shown below, where the VRF session towards the firewall is reconfigured to not send extended communities:

DC2BRDR1A(config)# rb 1
DC2BRDR1A(config-rbridge-id-1)# router bgp
DC2BRDR1A(config-bgp-router)#  add ipv4 uni vrf D701
DC2BRDR1A(config-bgp-ipv4u-vrf)#   no neighbor 10.128.33.1 send-community extended
%Warning: Please clear the neighbor session for the parameter change to take effect!
DC2BRDR1A(config-bgp-ipv4u-vrf)#

After clearing the session, DC2BRDR1A's own MAC is now set in BGP EVPN prefixes for VRF MAIN:

1       Prefix: IP4Prefix:[0][10.128.128.0/27],  Status: BE,  Age: 0h0m56s
         NEXT_HOP: 10.128.40.4, Learned from Peer: 10.128.33.1 (4200002090)
          LOCAL_PREF: 100,  MED: none,  ORIGIN: incomplete,  Weight: 0
         AS_PATH: 4200002090 4200002500 4200102701 4200002100 4200002401
            Extended Community: RT 4200100000:2 ExtCom:06:03:c4:f5:7c:67:51:5b ExtCom:03:0c:00:00:00:00:00:08 RT 62304:100
            Extended Community: ExtCom: Tunnel Encapsulation (Type Vxlan)
            Adj_RIB_out count: 6,  Admin distance 0
            L3_vni: 100 Router Mac : c4f5.7c67.515b

Once BGP sessions and tunnels are refreshed in the network, DC1BRDR1A has the correct MAC table again:

DC1BRDR1A# sh mac | in df.d5d9
300      0027.f8df.d5d9    EVPN     Active       Tu 61442
2605     0027.f8df.d5d9    EVPN     Active       Tu 61442
2613     0027.f8df.d5d9    EVPN     Active       Tu 61442
2618     0027.f8df.d5d9    EVPN     Active       Tu 61442
2620     0027.f8df.d5d9    EVPN     Active       Tu 61442
2621     0027.f8df.d5d9    EVPN     Active       Tu 61442
2708     0027.f8df.d5d9    EVPN     Active       Tu 61442
DC1BRDR1A# clear bgp evpn neighbor all rb all
DC1BRDR1A# sh mac | in df.d5d9
300      0027.f8df.d5d9    EVPN     Active       Tu 61442
459      0027.f8df.d5d9    EVPN     Active       Tu 61442
2605     0027.f8df.d5d9    EVPN     Active       Tu 61442
2613     0027.f8df.d5d9    EVPN     Active       Tu 61442
2618     0027.f8df.d5d9    EVPN     Active       Tu 61442
2620     0027.f8df.d5d9    EVPN     Active       Tu 61442
2621     0027.f8df.d5d9    EVPN     Active       Tu 61442
2708     0027.f8df.d5d9    EVPN     Active       Tu 61442

Ping from behind DC1BRDR1A is successful again:

DC1HOST1# ping 10.128.20.13
Type Control-c to abort
PING 10.128.20.13 (10.128.20.13): 56 data bytes
64 bytes from 10.128.20.13: icmp_seq=0 ttl=62 time=2.010 ms
64 bytes from 10.128.20.13: icmp_seq=1 ttl=62 time=2.603 ms
64 bytes from 10.128.20.13: icmp_seq=2 ttl=62 time=0.818 ms
64 bytes from 10.128.20.13: icmp_seq=3 ttl=62 time=1.715 ms
64 bytes from 10.128.20.13: icmp_seq=4 ttl=62 time=2.013 ms
--- 10.128.20.13 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.818/1.832/2.603/0.583 ms