Troubleshoot VMware NSX Connectivity Issues

Monitor and analyze virtual machine traffic with Flow Monitoring

Flow Monitoring captures the ingress and egress traffic of VMs in an NSX environment. It is disabled by default, so you need to enable it before you can use it. Once it is enabled, give it some time to gather data about your vSphere environment (much like vROps gathers data before it generates reports and recommendations).

Flow monitoring can be enabled by navigating to Networking & Security > Flow Monitoring > Configuration and clicking on Enable.

Under Flow Exclusion, you can exclude any object that you don’t want to monitor. For example, you can select the “Destination” option under Exclusion Settings and click the + button to specify a destination container for which flow monitoring data won’t be gathered.

Flow Monitoring Dashboard

Here you can see Top Flows, Top Destinations and Top Sources of your environment.

Top Flows: This tab shows what type of traffic (HTTP, ping, DNS, ARP etc.) is flowing through your environment.

In my lab, where not much is configured, the top traffic reported is ICMP (as I keep testing connectivity between VMs after configuring anything new).

Top Destinations: This gives you a list of destination VMs that receive the most traffic. In production it can be a mail server, a web server or any other VM that end users access very frequently.

Top Sources: The VMs that produce the most outgoing traffic.
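To make the three views concrete, here is a minimal Python sketch of how such rankings can be derived from raw flow records. The record fields (src, dst, service, bytes) and the sample values are hypothetical; they are not the format NSX uses internally.

```python
from collections import Counter

# Hypothetical flow records; field names and values are illustrative only.
flows = [
    {"src": "172.16.10.11", "dst": "172.16.20.11", "service": "ICMP", "bytes": 840},
    {"src": "172.16.10.11", "dst": "172.16.20.11", "service": "HTTP", "bytes": 5200},
    {"src": "172.16.30.11", "dst": "172.16.20.11", "service": "ICMP", "bytes": 420},
    {"src": "172.16.20.11", "dst": "172.16.10.11", "service": "DNS", "bytes": 120},
]


def top_by(records, key, n=3):
    """Sum bytes per distinct value of `key` and return the n largest totals."""
    totals = Counter()
    for rec in records:
        totals[rec[key]] += rec["bytes"]
    return totals.most_common(n)


top_flows = top_by(flows, "service")       # ranked by service type
top_destinations = top_by(flows, "dst")    # ranked by destination VM
top_sources = top_by(flows, "src")         # ranked by source VM
print(top_flows, top_destinations, top_sources, sep="\n")
```

The same grouping idea applies to session counts: replace the byte total with an increment of 1 per record.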

Details by Service

This tab categorizes the traffic flow based on what type of services are being accessed. It will show you the amount of data collected and the number of sessions.

On selecting a Service you can see the traffic from the source and destination. You can even add a DFW Rule by selecting any object and clicking on Add Rule. You can specify whether traffic from a given source will be allowed or blocked.

Live Flow

Live Flow shows real-time traffic for a specific VM network interface. You can configure the time interval for which you want to see live traffic, and if a VM has more than one network interface, you can select the specific NIC for which flows are captured.

From the Live Flow tab, click on “Select vNIC”.

Select a VM from the list and click the > button to choose a specific NIC.

Once a VM NIC is selected, click the Start button to capture real-time flows.

The view then populates with details such as Source IP/Port, Destination IP/Port, Incoming Bytes and Outgoing Bytes.

In my lab, I initiated a ping from one of my VMs towards a web server, and 2 records were populated here.

Troubleshoot Virtual Machine Connectivity

VMs connected to the same logical switch should be able to communicate with each other. If they cannot, the cause is usually a configuration mistake. Common things to check when troubleshooting this issue are:

Ensure the 2 VMs that need to communicate have IP addresses from the same subnet.
The firewall running inside the guest OS is not blocking ping or other test traffic.
The NICs of both VMs are connected.
The VMs are in a cluster that is part of the Transport Zone. If the VMs are in different clusters, ensure both clusters were added to the Transport Zone where you created the logical switch (virtualwire).
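The first check in the list above is easy to get wrong when the masks differ, so here is a small Python sketch that compares the networks of two addresses using the standard ipaddress module. The VM addresses are hypothetical examples, not values from my lab.

```python
import ipaddress


def same_subnet(ip_a, ip_b):
    """Both arguments are strings in address/prefix form, e.g. '172.16.10.11/24'."""
    net_a = ipaddress.ip_interface(ip_a).network
    net_b = ipaddress.ip_interface(ip_b).network
    return net_a == net_b


# Hypothetical VM addresses
print(same_subnet("172.16.10.11/24", "172.16.10.12/24"))  # same network
print(same_subnet("172.16.10.11/24", "172.16.20.11/24"))  # different networks
print(same_subnet("172.16.10.11/24", "172.16.10.12/25"))  # same range, mismatched mask
```

Note that a mismatched mask is reported as a different subnet, which is usually what you want to catch.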

To establish communication between 2 logical switches, you need a router such as a DLR. Logical switches are added to the DLR as internal interfaces (LIFs), and while adding a LIF you specify an IP address for it. This IP address acts as the gateway IP for all the VMs connected to that LIF.

If you are unable to ping between 2 VMs that reside on different logical switches, check the following:

The VM’s IP address should belong to the subnet that you configured on the LIF, and the LIF IP should be set as the default gateway inside the guest OS.
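The two conditions above can be sketched as a small Python check. The LIF address 172.16.10.1/24 matches the connected route shown later in this post; the VM address and gateway values are hypothetical.

```python
import ipaddress


def check_vm_against_lif(vm_ip, vm_gateway, lif):
    """Return a list of problems; an empty list means the settings look consistent.

    vm_ip and lif are in address/prefix form; vm_gateway is a bare address.
    """
    vm = ipaddress.ip_interface(vm_ip)
    lif_if = ipaddress.ip_interface(lif)
    problems = []
    if vm.network != lif_if.network:
        problems.append(
            f"VM subnet {vm.network} does not match LIF subnet {lif_if.network}"
        )
    if ipaddress.ip_address(vm_gateway) != lif_if.ip:
        problems.append(
            f"default gateway {vm_gateway} is not the LIF address {lif_if.ip}"
        )
    return problems


# Hypothetical VM settings checked against the 172.16.10.1/24 LIF
print(check_vm_against_lif("172.16.10.11/24", "172.16.10.1", "172.16.10.1/24"))
print(check_vm_against_lif("172.16.10.11/24", "172.16.10.254", "172.16.10.1/24"))
```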

You can also test logical switch connectivity by double-clicking a logical switch; from the Monitor tab you can run Ping and Broadcast tests.

Select the source and destination host by clicking on browse button and click on “Start Test”

If the test is successful, you will see similar results shown below.

Troubleshoot dynamic routing protocols

You can configure dynamic routing protocols such as OSPF, BGP and IS-IS on NSX Edges, DLRs and/or between other network components. If routing is not working as expected between 2 devices, you can use the commands below to debug and troubleshoot routing issues.

Case 1: BGP is configured between DLR and NSX Edge

In my lab, the NSX Edge advertises network 192.168.20.0/29 to the DLR via BGP, and the DLR advertises 172.16.10.0/24, 172.16.20.0/24 and 172.16.30.0/24 to the ESG. I ran the following commands to verify that routing is working as expected.

Verify BGP Neighbor

DLR-01-Site-A-0> show ip bgp neighbor

BGP neighbor is 192.168.10.2, remote AS 65001,
BGP state = Established, up
Hold time is 180, Keep alive interval is 60 seconds
Neighbor capabilities:
Route refresh: advertised and received
Address family IPv4 Unicast:advertised and received
Graceful restart Capability:advertised and received
Restart remain time: 0
Received 22665 messages, Sent 22642 messages
Default minimum time between advertisement runs is 30 seconds
For Address family IPv4 Unicast:advertised and received
Index 1 Identifier 0x8ba77cc4
Route refresh request:received 0 sent 0
Prefixes received 3 sent 4 advertised 4
Connections established 2, dropped 2
Local host: 192.168.10.3, Local port: 55092
Remote host: 192.168.10.2, Remote port: 179
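If you collect this output from several devices, the key fields can be pulled out with a few regular expressions. The sketch below is only an illustration that parses pasted text; the patterns are written against the sample output above and may need adjusting for other NSX versions.

```python
import re

# A few lines copied from the output above
SAMPLE = """\
BGP neighbor is 192.168.10.2, remote AS 65001,
BGP state = Established, up
Hold time is 180, Keep alive interval is 60 seconds
Prefixes received 3 sent 4 advertised 4
Connections established 2, dropped 2
"""

PATTERNS = {
    "neighbor": r"BGP neighbor is (\S+?),",
    "remote_as": r"remote AS (\d+)",
    "state": r"BGP state = (\w+)",
    "prefixes_received": r"Prefixes received (\d+)",
    "dropped": r"Connections established \d+, dropped (\d+)",
}


def parse_bgp_neighbor(text):
    """Return the matched fields as strings, or None where a field is absent."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


info = parse_bgp_neighbor(SAMPLE)
# Treat the session as healthy if it is Established and at least one prefix arrived
healthy = info["state"] == "Established" and int(info["prefixes_received"] or 0) > 0
print(info, healthy)
```

A session that is Established but shows 0 received prefixes usually points at what the peer is advertising rather than at the neighborship itself.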

DLR-01-Site-A-0> show ip bgp neighbors summary
Codes: I-Idle, C-Connect, OS-OpenSent, OC-OpenConfirm, A-Active, E-Established

BGP summary information for VRF default
Router ID: 192.168.10.1   Local AS: 65001

   Neighbor        AS          UpDown  InMsgs  OutMsgs InPfx   OutPfx  Flaps

E  192.168.10.2    65001       1w6d    22665   22642   3       4       1
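The summary view is easier to scan programmatically than the detailed one. As a rough sketch, the Python below splits each neighbor row into columns and expands the state code using the legend printed in the output; the 9-column layout is an assumption based on the sample above.

```python
# State codes as listed in the "Codes:" legend of the output above
STATE_CODES = {
    "I": "Idle", "C": "Connect", "OS": "OpenSent",
    "OC": "OpenConfirm", "A": "Active", "E": "Established",
}

SAMPLE = """\
Codes: I-Idle, C-Connect, OS-OpenSent, OC-OpenConfirm, A-Active, E-Established

   Neighbor        AS          UpDown  InMsgs  OutMsgs InPfx   OutPfx  Flaps
E  192.168.10.2    65001       1w6d    22665   22642   3       4       1
"""


def parse_summary(text):
    """Return one dict per neighbor row found in the summary output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A neighbor row has 9 columns and starts with a known state code
        if len(fields) == 9 and fields[0] in STATE_CODES:
            rows.append({
                "state": STATE_CODES[fields[0]],
                "neighbor": fields[1],
                "as": int(fields[2]),
                "uptime": fields[3],
                "in_prefixes": int(fields[6]),
                "out_prefixes": int(fields[7]),
                "flaps": int(fields[8]),
            })
    return rows


rows = parse_summary(SAMPLE)
not_established = [r["neighbor"] for r in rows if r["state"] != "Established"]
print(rows, not_established)
```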

Verify routes learned via BGP

DLR-01-Site-A-0> show ip route bgp

Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
C - connected, S - static, L1 - IS-IS level-1, L2 - IS-IS level-2,
IA - OSPF inter area, E1 - OSPF external type 1, E2 - OSPF external type 2,
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2

B       192.168.20.0/29      [200/0]       via 192.168.10.2

DLR-01-Site-A-0> show ip route
Total number of routes: 7

Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
C - connected, S - static, L1 - IS-IS level-1, L2 - IS-IS level-2,
IA - OSPF inter area, E1 - OSPF external type 1, E2 - OSPF external type 2,
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2

S       0.0.0.0/0            [1/0]         via 192.168.10.2
C       172.16.10.0/24       [0/0]         via 172.16.10.1
C       172.16.20.0/24       [0/0]         via 172.16.20.1
C       172.16.30.0/24       [0/0]         via 172.16.30.1
C       192.168.10.0/29      [0/0]         via 192.168.10.3
B       192.168.20.0/29      [200/0]       via 192.168.10.2
C       192.168.109.0/24     [0/0]         via 192.168.109.240
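To confirm that a specific prefix was learned via the expected protocol, the route table can be parsed line by line. This is a minimal Python sketch working on pasted text; the regular expression is written against the sample output above and is an assumption, not a documented format.

```python
import re

# Matches lines such as "B   192.168.20.0/29   [200/0]   via 192.168.10.2"
# and also two-part codes such as "O E2"
ROUTE_RE = re.compile(
    r"^(?P<code>[A-Za-z][A-Za-z0-9 ]*?)\s+"
    r"(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+)\s+"
    r"\[(?P<distance>\d+)/(?P<metric>\d+)\]\s+"
    r"via\s+(?P<next_hop>\S+)$"
)

SAMPLE = """\
Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
S       0.0.0.0/0            [1/0]         via 192.168.10.2
C       172.16.10.0/24       [0/0]         via 172.16.10.1
C       172.16.20.0/24       [0/0]         via 172.16.20.1
C       172.16.30.0/24       [0/0]         via 172.16.30.1
C       192.168.10.0/29      [0/0]         via 192.168.10.3
B       192.168.20.0/29      [200/0]       via 192.168.10.2
C       192.168.109.0/24     [0/0]         via 192.168.109.240
"""


def parse_routes(text):
    """Return a dict per route line; legend and header lines are skipped."""
    routes = []
    for line in text.splitlines():
        match = ROUTE_RE.match(line.strip())
        if match:
            routes.append(match.groupdict())
    return routes


routes = parse_routes(SAMPLE)
bgp_prefixes = [r["prefix"] for r in routes if r["code"] == "B"]
# The prefix the Edge is expected to advertise in this lab
missing = {"192.168.20.0/29"} - set(bgp_prefixes)
print(bgp_prefixes, missing)
```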

Verify Forwarding Table

DLR-01-Site-A-0> show ip forwarding
Codes: C - connected, R - remote,
> - selected route, * - FIB route

R>* 0.0.0.0/0 via 192.168.10.2, vNic_2
C>* 172.16.10.0/24 is directly connected, VDR
C>* 172.16.20.0/24 is directly connected, VDR
C>* 172.16.30.0/24 is directly connected, VDR
C>* 192.168.10.0/29 is directly connected, vNic_2
R>* 192.168.20.0/29 via 192.168.10.2, vNic_2
C>* 192.168.109.0/24 is directly connected, vNic_0
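The forwarding table is consulted using longest-prefix match: of all entries that contain the destination, the most specific one wins. The Python sketch below illustrates that rule using the entries from the output above; it is a teaching aid, not how the DLR implements lookups internally.

```python
import ipaddress

# (prefix, next hop or None if directly connected, interface), from the output above
FIB = [
    ("0.0.0.0/0", "192.168.10.2", "vNic_2"),
    ("172.16.10.0/24", None, "VDR"),
    ("172.16.20.0/24", None, "VDR"),
    ("172.16.30.0/24", None, "VDR"),
    ("192.168.10.0/29", None, "vNic_2"),
    ("192.168.20.0/29", "192.168.10.2", "vNic_2"),
    ("192.168.109.0/24", None, "vNic_0"),
]


def lookup(dest):
    """Return (prefix, next_hop, interface) of the most specific matching entry."""
    addr = ipaddress.ip_address(dest)
    candidates = [
        (ipaddress.ip_network(prefix), next_hop, interface)
        for prefix, next_hop, interface in FIB
        if addr in ipaddress.ip_network(prefix)
    ]
    # The default route matches every IPv4 address, so candidates is never empty here
    best = max(candidates, key=lambda entry: entry[0].prefixlen)
    return str(best[0]), best[1], best[2]


print(lookup("192.168.20.2"))   # remote network behind the Edge
print(lookup("172.16.20.50"))   # directly connected logical switch
print(lookup("8.8.8.8"))        # falls through to the default route
```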

Important: When configuring BGP between a DLR and an ESG, always use the DLR’s protocol address to form the BGP neighborship. If you use the DLR’s forwarding address in place of the protocol address, routes won’t be learnt between the ESG and the DLR.

Case 2: OSPF is configured between 2 NSX Edges

You can run the following commands on both Edges to troubleshoot OSPF issues.

Verify OSPF neighbors

Peri-GW01-0> show ip ospf neighbors
NeighborID       Pri  Address          DeadTime  State                 Interface
192.168.20.1     128  192.168.20.1     32        Full/DR/1w6d          vNic_1
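For OSPF, the first thing to look for is an adjacency in the Full state. The Python sketch below extracts that from pasted output; it assumes the 6-column layout and the State field format (adjacency/role/uptime) seen in the sample above.

```python
import ipaddress

SAMPLE = """\
NeighborID       Pri  Address          DeadTime  State                 Interface
192.168.20.1     128  192.168.20.1     32        Full/DR/1w6d          vNic_1
"""


def parse_ospf_neighbors(text):
    """Return one dict per neighbor row; header and unrelated lines are skipped."""
    neighbors = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # header row or other non-neighbor line
        state_parts = fields[4].split("/")
        neighbors.append({
            "neighbor_id": fields[0],
            "address": fields[2],
            "adjacency": state_parts[0],
            "role": state_parts[1] if len(state_parts) > 1 else None,
            "interface": fields[5],
        })
    return neighbors


neighbors = parse_ospf_neighbors(SAMPLE)
not_full = [n["neighbor_id"] for n in neighbors if n["adjacency"] != "Full"]
print(neighbors, not_full)
```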

Verify routes learned via OSPF

Peri-GW01-0> show ip route
Total number of routes: 7

Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
C - connected, S - static, L1 - IS-IS level-1, L2 - IS-IS level-2,
IA - OSPF inter area, E1 - OSPF external type 1, E2 - OSPF external type 2,
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2

S 0.0.0.0/0 [1/0] via 192.168.109.1
O E2 172.16.10.0/24 [110/0] via 192.168.20.1
O E2 172.16.20.0/24 [110/0] via 192.168.20.1
O E2 172.16.30.0/24 [110/0] via 192.168.20.1
O E2 192.168.10.0/29 [110/0] via 192.168.20.1
C 192.168.20.0/29 [0/0] via 192.168.20.2
C 192.168.109.0/24 [0/0] via 192.168.109.241

If you have verified that your OSPF/BGP configuration is correct on both sides, make sure that you are actually advertising the routes. A very common mistake is forgetting to configure Route Redistribution after configuring the dynamic routing protocol. Route Redistribution is configured under NSX Edge > Manage > Routing > Route Redistribution.
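A quick way to spot missing redistribution is to compare the prefixes you expect a peer to learn with the prefixes it actually has. The Python sketch below does that comparison on plain lists; the expected prefixes are the DLR networks from my lab, while the "partial" list is a made-up example of what a peer might show when redistribution is incomplete.

```python
import ipaddress


def missing_prefixes(expected, learned):
    """Return the expected prefixes that do not appear among the learned ones."""
    expected_nets = {ipaddress.ip_network(p) for p in expected}
    learned_nets = {ipaddress.ip_network(p) for p in learned}
    return [str(net) for net in sorted(expected_nets - learned_nets)]


# Networks behind the DLR that the peer should learn
expected = ["172.16.10.0/24", "172.16.20.0/24", "172.16.30.0/24"]

# Prefixes the peer learned in the working lab setup
learned_ok = ["172.16.10.0/24", "172.16.20.0/24", "172.16.30.0/24", "192.168.10.0/29"]

# Hypothetical case where only some networks are being redistributed
learned_partial = ["172.16.10.0/24", "192.168.10.0/29"]

print(missing_prefixes(expected, learned_ok))
print(missing_prefixes(expected, learned_partial))
```

If the neighborship is up and this list is not empty, the Route Redistribution settings on the advertising side are the first place to look.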

And that’s it for this post.
