NSX-T Tier-0 Gateway Inter-SR Routing Deep Dive

In my last post I briefly talked about the transit subnets that get created when a T1 gateway is attached to a T0 gateway. In this post we will dig into how the SR (Service Router) components, which get deployed when we set up logical routing in NSX-T, actually work.

In this post we will learn about the following:

  • Inter-SR Architecture
  • How to Enable Inter-SR routing
  • Ingress/Egress traffic patterns
  • Failure scenarios & remediation when an edge node loses northbound connectivity with the upstream router

If you are new to NSX-T, I would recommend reading my NSX-T series at the links below:

1: NSX-T Management & Control Plane Setup

2: Uplink Profiles in NSX-T

3: Transport Zones & Transport Node Profiles

4: NSX-T Data Plane Setup

5: Configure Logical Routing in NSX-T

Let’s get started.

What is Tier-0 Inter-SR Routing?

A Tier-0 gateway in active-active mode supports inter-SR iBGP. In active-active mode, the SR components form an internal connection with each other over a pre-defined, NSX-managed subnet, 169.254.0.0/25.

The inter-SR iBGP feature helps the T0 tolerate failures on the SR component of an edge node, for example an uplink failure on one of the edge nodes. In such a scenario, all traffic flowing through the SR component of the failed edge is routed to the SR component of the working edge over the inter-SR link that gets established when the T0 gateway is deployed and BGP is enabled.
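To make the failover decision concrete, here is a minimal sketch of how one T0 SR chooses its northbound exit. The class `T0ServiceRouter` and the function `northbound_hop` are hypothetical names for illustration only, not NSX APIs; the logic simply restates the behaviour described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class T0ServiceRouter:
    """Hypothetical model of a T0 SR on one edge node (not an NSX API)."""
    edge: str
    uplink_up: bool  # True while this SR still has an eBGP path to the upstream router


def northbound_hop(local: T0ServiceRouter, peer: T0ServiceRouter) -> Optional[str]:
    """Decide where the local SR sends northbound traffic.

    Normal case: out of its own uplink. If that uplink has failed, the traffic
    is handed to the peer SR over the inter-SR link, provided the peer still
    has a working uplink. Returns None if neither edge can reach north.
    """
    if local.uplink_up:
        return f"uplink on {local.edge}"
    if peer.uplink_up:
        return f"inter-SR link to {peer.edge}"
    return None


edge01 = T0ServiceRouter("edge01", uplink_up=False)  # edge01 has lost its uplink
edge02 = T0ServiceRouter("edge02", uplink_up=True)
print(northbound_hop(edge01, edge02))  # inter-SR link to edge02
print(northbound_hop(edge02, edge01))  # uplink on edge02
```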

Inter-SR Architecture

Before I proceed, I want to show the logical topology of my NSX-T infrastructure so that it is easier to correlate things.

In my lab I have 2 upstream routers (VyOS) with which my T0 gateway peers over BGP. The two uplinks used by the edge nodes for northbound connectivity are backed by subnets 172.16.60.0/24 and 172.16.70.0/24 on VLANs 600 and 700 respectively.

The green and red arrows show the direction of egress and ingress traffic respectively.

As we know, logical segments attached to a T1 GW route east-west traffic through the DR components (host + edge). If stateful services such as NAT or load balancing are configured on the T1 GW, SR components are also instantiated for the T1.

Note: SR component is always present on T0 GW.

Unlike DR components, SR components are not distributed by nature: they always sit on edge nodes and are used for routing north-south traffic.

Even if the T0 is deployed in active-active mode, the SR components of a T1 GW are always active/standby. The standby SR component is kept in an operationally down state and only comes into play when the active SR fails.

Note: For Active-Active T0 deployment, SR components on T0 are in Active-Active state.

  • The DR and SR components of the T0 and T1 gateways are connected to each other via an internal (transit) link, which is backed by the NSX-managed subnet 169.254.0.0/28.
  • When we connect a T1 GW to the T0, the SR component of the T1 is attached to the DR component of the T0 via another transit link, called the router link, which is backed by the NSX-managed subnet 100.64.0.0/16. The router ports on the T0 and T1 get IP addresses 100.64.0.0/31 and 100.64.0.1/31 respectively.
  • The SR components on the T0 connect to each other via another internal segment called the inter-SR link. This link has its own VNI and is used implicitly for routing between SRs. It is backed by the NSX-managed subnet 169.254.0.0/25. IPs in this subnet are non-floating, meaning that if an SR goes down, its IP will not move to another SR.
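The subnet arithmetic behind these links can be checked with Python's standard `ipaddress` module. This is a sketch under one stated assumption: that each T1 attachment takes the next /31 from the router-link pool in order, with the T0 port on the lower address and the T1 port on the upper one. The post only confirms the first pair (100.64.0.0 for T0, 100.64.0.1 for T1); `router_link_pairs` is a hypothetical helper.

```python
import ipaddress
from typing import Iterator, Tuple

# NSX-managed subnets quoted in this post.
INTER_SR = ipaddress.ip_network("169.254.0.0/25")         # T0 SR <-> SR inter-SR link
ROUTER_LINK_POOL = ipaddress.ip_network("100.64.0.0/16")  # T0 <-> T1 router links


def router_link_pairs(pool: ipaddress.IPv4Network, count: int) -> Iterator[Tuple[str, str]]:
    """Yield (t0_port_ip, t1_port_ip) for the first `count` /31 router links.

    Assumption: /31s are handed out in order, T0 port on the lower address,
    T1 port on the upper address.
    """
    for _, link in zip(range(count), pool.subnets(new_prefix=31)):
        yield str(link[0]), str(link[1])


print(list(router_link_pairs(ROUTER_LINK_POOL, 2)))
# [('100.64.0.0', '100.64.0.1'), ('100.64.0.2', '100.64.0.3')]
print(len(list(INTER_SR.hosts())))          # usable inter-SR addresses: 126
print(ROUTER_LINK_POOL.num_addresses // 2)  # /31 router links in the pool: 32768
```

A /31 has no separate network or broadcast address, which is why both addresses of each pair can be used as router ports.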

Information about this interface can be obtained by entering the VRF corresponding to the SR and running the command: get interfaces

The DR component of the T0 connects to both SR components for load distribution: it has two default routes, each pointing to one of the T0 SR components as the next hop.
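Two default routes to two SRs means the DR has to spread flows across equal-cost next hops. The sketch below shows the general idea of hash-based ECMP on a flow's 5-tuple. The hash (SHA-256 modulo the number of next hops) and the next-hop names are assumptions for illustration; the post does not describe the hash NSX actually uses.

```python
import hashlib
from typing import Sequence

# Hypothetical next-hop names: one default route per T0 SR, as held by the T0 DR.
DEFAULT_ROUTE_NEXT_HOPS = ("t0-sr-edge01", "t0-sr-edge02")


def pick_next_hop(src_ip: str, dst_ip: str, proto: str, src_port: int, dst_port: int,
                  next_hops: Sequence[str] = DEFAULT_ROUTE_NEXT_HOPS) -> str:
    """Map a flow's 5-tuple onto one of the equal-cost default routes.

    Hashing the whole 5-tuple keeps every packet of a flow on the same SR
    while different flows are spread across both SRs.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return next_hops[bucket % len(next_hops)]


# The same flow always maps to the same SR...
print(pick_next_hop("10.1.1.10", "8.8.8.8", "tcp", 51514, 443))
# ...while many different flows land on both SRs.
print({pick_next_hop("10.1.1.10", "8.8.8.8", "tcp", p, 443) for p in range(50000, 50100)})
```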

Both edge nodes connect to the upstream routers via 2 uplinks (VLAN 600 and 700 in my case). The SR component of the T0 uses these uplinks to perform north-south routing, either over BGP or via static routes.

Ingress/Egress Traffic Flow

For north-south routing, any traffic that reaches the SR component of the T0 will prefer the eBGP path to the upstream router over the iBGP path to the other SR component. This is because, all else being equal, eBGP-learnt paths are preferred over iBGP-learnt paths in BGP best-path selection. The inter-SR link is used only in failure scenarios.

The same rule applies to ingress traffic. A northbound router forwards traffic to the T0 SR component with which it has an eBGP relationship, rather than using the iBGP link to its peer router.
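The eBGP-over-iBGP preference can be illustrated with a heavily simplified best-path comparison. Only three real tie-breakers are modelled here (higher LOCAL_PREF, shorter AS_PATH, then eBGP over iBGP); weight, origin, MED, IGP metric and router ID are deliberately left out. `BgpPath`, `best_path` and the next-hop addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BgpPath:
    prefix: str
    next_hop: str
    learned_via: str      # "ebgp" or "ibgp"
    local_pref: int = 100
    as_path_len: int = 1


def best_path(paths: List[BgpPath]) -> BgpPath:
    """Simplified selection: higher LOCAL_PREF, then shorter AS_PATH, then eBGP over iBGP."""
    return min(
        paths,
        key=lambda p: (-p.local_pref, p.as_path_len, 0 if p.learned_via == "ebgp" else 1),
    )


# Hypothetical next hops: the upstream VyOS router and the peer SR's inter-SR IP.
via_upstream = BgpPath("0.0.0.0/0", "172.16.60.1", "ebgp")
via_peer_sr = BgpPath("0.0.0.0/0", "169.254.0.3", "ibgp")

print(best_path([via_peer_sr, via_upstream]).learned_via)  # ebgp
```

Because the iBGP path ties with the eBGP path on LOCAL_PREF and AS_PATH length, the eBGP step decides; the iBGP path only wins once the eBGP path has been withdrawn.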

Local AS for iBGP Peering

If a local AS is configured on the T0, the same AS is used for iBGP peering between all SRs. Otherwise, the private AS 65000 is used.

Note: iBGP sessions use 1/3 KeepAlive/Hold timer settings until BFD over VTEP is tied to these sessions.
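The AS rule above fits in a few lines. This is a sketch with hypothetical helper names; reading the timer note as a 1-second keepalive and a 3-second hold time is my assumption, and AS 65100 in the example is an arbitrary illustrative value.

```python
from typing import Optional

DEFAULT_INTER_SR_AS = 65000  # private AS used when no local AS is configured on the T0
INTER_SR_KEEPALIVE_S = 1     # assumption: "1/3" read as 1 s keepalive...
INTER_SR_HOLD_S = 3          # ...and 3 s hold time


def inter_sr_as(t0_local_as: Optional[int]) -> int:
    """AS number every SR uses for its inter-SR iBGP sessions."""
    return t0_local_as if t0_local_as is not None else DEFAULT_INTER_SR_AS


def session_type(local_as: int, remote_as: int) -> str:
    """A BGP session is iBGP when both ends share the same AS, eBGP otherwise."""
    return "ibgp" if local_as == remote_as else "ebgp"


print(inter_sr_as(None))   # 65000
print(inter_sr_as(65100))  # 65100 (hypothetical local AS)
# Every SR derives the same AS, so the SR-to-SR sessions are always iBGP.
print(session_type(inter_sr_as(65100), inter_sr_as(65100)))  # ibgp
```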

How Are Routes Exchanged Between SRs?

Before NSX 3.0, the inter-SR iBGP mesh ran in the default VRF. All static and connected routes were redistributed into BGP. The eBGP-learnt routes as well as the connected and user-defined static routes were exchanged over the iBGP sessions, so every SR was aware of the routes on the other SRs.

In NSX 3.0, the logic for inter-SR routing has changed. A new control VRF for inter-SR routing (inter_sr_vrf) is introduced, and the iBGP sessions run in this new VRF. Currently, inter-SR routing is only used to sync routes of the default VRF.

The inter_sr_vrf exists only in the control plane, not in the data plane. The edge control plane takes care of programming any additional routes learnt through inter-SR routing into the data plane. Routes learnt from the inter-SR VRF are installed into the default VRF in the datapath only if they are not already present in the default VRF.

Let’s understand this with the help of the example below:

1: Route 10.10.10.0/24 is received from the inter-SR VRF:

  • Route is not present in the default VRF: install the route into the datapath default VRF.
  • Route is already present in the default VRF: ignore the route entry.

2: Route 10.10.10.0/24 is received from the default VRF:

  • Accept this route and install it in the datapath default VRF.

This way, we only see one entry for 10.10.10.0/24 in the T0 forwarding table.
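The two cases above boil down to a simple merge rule. Below is a minimal sketch that models each routing table as a hypothetical prefix-to-next-hop dictionary; `program_datapath`, the 10.20.20.0/24 prefix and the next-hop addresses are illustrative, not part of NSX.

```python
from typing import Dict


def program_datapath(default_vrf: Dict[str, str], inter_sr_vrf: Dict[str, str]) -> Dict[str, str]:
    """Build the datapath default-VRF table from the two control-plane VRFs.

    Case 2: routes from the default VRF are always installed (and always win).
    Case 1: an inter-SR route is installed only if its prefix is absent.
    """
    datapath = dict(default_vrf)
    for prefix, next_hop in inter_sr_vrf.items():
        datapath.setdefault(prefix, next_hop)  # no-op if the prefix already exists
    return datapath


default_vrf = {"10.10.10.0/24": "172.16.60.1"}     # learnt locally (hypothetical next hop)
inter_sr_vrf = {"10.10.10.0/24": "169.254.0.3",    # same prefix via the peer SR
                "10.20.20.0/24": "169.254.0.3"}    # only known via the peer SR

print(program_datapath(default_vrf, inter_sr_vrf))
# {'10.10.10.0/24': '172.16.60.1', '10.20.20.0/24': '169.254.0.3'}
```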

How to Enable Inter-SR Routing?

Inter-SR routing can be enabled during deployment or afterwards by editing the T0 GW configuration and toggling the Inter-SR iBGP option under the BGP section. NSX-T takes care of the backend configuration automatically.

We can check the SR instances under the Manager view in NSX-T. In the image below we can see the 2 SR instances of the T0 gateway.

Switching to the Routing > BGP tab on the T0, we can verify that iBGP is established between the SR components.

To verify the inter-SR iBGP connection, log in to each edge node over SSH, run the command get service router config, and look for entries similar to those shown below:

On Edge 01

On Edge 02

Auto-generated route-maps are used on the inter-SR iBGP neighbors to prevent these routes from being advertised to eBGP neighbors. This is done by setting the no-export community on the routes exchanged between the SRs. Next-hop-self is also configured on both SR components, so that traffic is routed to the correct SR.
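The effect of those route-maps can be sketched as two small policy functions. NO_EXPORT is the well-known community 65535:65281 defined in RFC 1997; everything else here (`Route`, `to_inter_sr_peer`, `advertise_to_ebgp`, the addresses) is a hypothetical illustration of the behaviour, not the configuration NSX generates.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Tuple

NO_EXPORT: Tuple[int, int] = (65535, 65281)  # well-known NO_EXPORT community (RFC 1997)


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    communities: FrozenSet[Tuple[int, int]] = frozenset()


def to_inter_sr_peer(route: Route, my_inter_sr_ip: str) -> Route:
    """Outbound policy towards the other SR: set next-hop-self and tag NO_EXPORT."""
    return replace(route, next_hop=my_inter_sr_ip,
                   communities=route.communities | {NO_EXPORT})


def advertise_to_ebgp(routes: List[Route]) -> List[Route]:
    """Drop every route tagged NO_EXPORT before advertising to an eBGP neighbor."""
    return [r for r in routes if NO_EXPORT not in r.communities]


learnt = Route("10.10.10.0/24", "172.16.60.1")      # hypothetical eBGP-learnt route
shared = to_inter_sr_peer(learnt, "169.254.0.2")    # hypothetical local inter-SR IP
print(shared.next_hop)              # 169.254.0.2
print(advertise_to_ebgp([shared]))  # [] (the peer SR will not leak it upstream)
```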

And that’s it for this post.

I hope you enjoyed reading this post. Feel free to share it on social media if you found it useful 🙂
