Troubleshooting vRSLCM Deployment Failure in VCF

Last week, while working in my VCF lab, I ran into an issue with the vRSLCM deployment. The deployment failed at the step where SDDC Manager tries to configure the vRSLCM NTP settings.

I started troubleshooting by checking domainmanager.log on the SDDC Manager appliance, where I saw the following log entries:

2020-05-30T14:44:37.013+0000 INFO [847f3151af714e35,a502] [c.v.e.s.v.v.ImportNtpSettingInVrslcm,dm-exec-4] Importing NTP Server 10.84.x.x in vRSLCM with name VCF NTP Server 1

2020-05-30T14:44:47.189+0000 DEBUG [847f3151af714e35,a502] [c.v.e.s.r.c.LoggingHttpRequestInterceptor,dm-exec-4] Request URI: https://vrslcm.vstellar.local/lcm/lcops/api/settings/productntpsetting
Request method: POST
Request body: {"name":"VCF NTP Server 1","hostName":"10.84.x.x"}
Response code: 400 BAD_REQUEST

2020-05-30T14:45:17.923+0000 ERROR [847f3151af714e35,f1d1] [c.v.e.s.o.model.error.ErrorFactory,dm-exec-10] [L5UVJG] INTERNAL_SERVER_ERROR Invocation of prefix '' part of task ImportNtpSettingInVrslcm in plugin VrslcmPlugin failed with exception.
com.vmware.evo.sddc.common.core.error.InternalServerErrorException: Invocation of prefix '' part of task ImportNtpSettingInVrslcm in plugin VrslcmPlugin failed with exception.

Caused by: org.springframework.web.client.HttpClientErrorException$BadRequest: 400 : [{"status":"ERROR","statusCode":"BAD_REQUEST","message":"Please check if the provided IP/FQDN is of an NTP server and is reachable.","resourceIdentifier":null,"errorStackTrace":["com.vmware.vrealize.lc... (13471 bytes)]
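The useful part of these entries is the "message" field buried in the "Caused by:" line. As a minimal sketch (the function name and regex are my own illustration, not part of any VCF tooling), the message can be pulled out of the log text with a few lines of Python. Since the JSON payload is truncated in the log, the sketch uses a regular expression rather than a JSON parser.

```python
import re

# Matches the "message" field inside the (truncated) JSON fragment.
# A regex is used because the payload in the log is cut off and is not valid JSON.
MESSAGE_RE = re.compile(r'"message":"((?:[^"\\]|\\.)*)"')


def extract_api_messages(log_text):
    """Return the "message" values found on 'Caused by:' lines of the log text."""
    messages = []
    for line in log_text.splitlines():
        if not line.startswith("Caused by:"):
            continue
        match = MESSAGE_RE.search(line)
        if match:
            messages.append(match.group(1))
    return messages


# The "Caused by:" line exactly as it appears in domainmanager.log above.
sample = (
    'Caused by: org.springframework.web.client.HttpClientErrorException$BadRequest: '
    '400 : [{"status":"ERROR","statusCode":"BAD_REQUEST","message":"Please check if '
    'the provided IP/FQDN is of an NTP server and is reachable.",'
    '"resourceIdentifier":null,"errorStackTrace":["com.vmware.vrealize.lc... (13471 bytes)]'
)

for message in extract_api_messages(sample):
    print(message)
```

In this case the message points straight at the real problem: vRSLCM could not validate the NTP server it was handed.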

It turned out that the vRSLCM appliance was unable to reach the NTP server that I am using throughout my VCF deployment. Initially this looked strange to me, as every other component of my deployment was able to reach the NTP server.
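A quick way to confirm this kind of reachability problem is to send a plain SNTP query from a machine on the affected network. The following is a rough sketch, assuming Python 3 is available on a host in the same segment; the function names are hypothetical and this is not how vRSLCM itself validates the server.

```python
import socket
import struct

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request():
    """Build a 48-byte SNTP client request (LI=0, version=3, mode=3 client)."""
    first_byte = (0 << 6) | (3 << 3) | 3  # 0x1B
    return bytes([first_byte]) + bytes(47)


def parse_sntp_response(data):
    """Return (mode, stratum, unix_transmit_time) from a raw NTP response."""
    if len(data) < 48:
        raise ValueError("NTP response shorter than 48 bytes")
    mode = data[0] & 0x07
    stratum = data[1]
    # The seconds part of the transmit timestamp lives in bytes 40..43.
    transmit_seconds = struct.unpack("!I", data[40:44])[0]
    return mode, stratum, transmit_seconds - NTP_EPOCH_OFFSET


def ntp_reachable(host, timeout=3.0):
    """Return True if the host answers an SNTP query with a server-mode reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_sntp_request(), (host, NTP_PORT))
            data, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts, name resolution failures and ICMP unreachable errors.
            return False
    try:
        mode, _, _ = parse_sntp_response(data)
    except ValueError:
        return False
    return mode == 4  # mode 4 = server
```

Calling ntp_reachable() with the NTP server address from a host on the affected segment should return False while the problem exists and True once routing is fixed.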

When I dug deeper into the issue, I found that the NTP server was not reachable from the “xreg-seg” segment that is created for an AVN-enabled bring-up. A traceroute showed that packets were being dropped at the downlink interface of my T0 router; the !N flag in the output means the hop reported the destination network as unreachable.

root@vrslcm [ ~ ]# traceroute 10.84.x.x
traceroute to 10.84.x.x (10.84.55.42), 30 hops max, 60 byte packets
1 _gateway (192.168.11.1) 0.988 ms 0.909 ms 0.748 ms
2 100.64.32.0 (100.64.32.0) 3.237 ms !N 4.051 ms !N 4.037 ms !N
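If you collect traceroute output from several appliances, spotting the failing hop by eye gets tedious. Here is a small sketch that scans Linux traceroute output for the "!" annotations and reports which hop returned them. The function name and the flag table are my own; the flag meanings follow the Linux traceroute man page.

```python
import re

# A hop line looks like: "2 100.64.32.0 (100.64.32.0) 3.237 ms !N ..."
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([^)]+)\)")
FLAG_RE = re.compile(r"!(?:[HNPSFXVC]|\d+)")

# Meanings of the most common annotations, per the Linux traceroute man page.
FLAG_MEANINGS = {
    "!H": "host unreachable",
    "!N": "network unreachable",
    "!P": "protocol unreachable",
    "!X": "communication administratively prohibited",
}


def find_unreachable_hops(output):
    """Return a list of (hop_number, hop_address, flags) for hops that reported errors."""
    results = []
    for line in output.splitlines():
        hop = HOP_RE.match(line)
        if not hop:
            continue  # skips the header line and blank lines
        flags = sorted(set(FLAG_RE.findall(line)))
        if flags:
            results.append((int(hop.group(1)), hop.group(3), flags))
    return results


# The traceroute output from the vRSLCM appliance shown above.
sample = """traceroute to 10.84.x.x (10.84.55.42), 30 hops max, 60 byte packets
1 _gateway (192.168.11.1) 0.988 ms 0.909 ms 0.748 ms
2 100.64.32.0 (100.64.32.0) 3.237 ms !N 4.051 ms !N 4.037 ms !N
"""

for number, address, flags in find_unreachable_hops(sample):
    reasons = ", ".join(FLAG_MEANINGS.get(flag, flag) for flag in flags)
    print(f"hop {number} ({address}): {reasons}")
```

For the output above, the sketch flags hop 2 as the one reporting the network as unreachable.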

This seemed odd to me, as I had a working end-to-end BGP setup in my lab. I then reached out to my good friend Roshan, who has a lot of experience with VCF and NSX-T.

We discussed the issue and, after brainstorming for about 45 minutes, discovered that the network on which the NTP server resides was not being advertised via BGP. As a result, my T0 and T1 routers had no route to that network, and the NTP server was unreachable from the segment.
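The check we ended up doing by hand boils down to a simple question: is the target address covered by any prefix the router is advertising? A minimal sketch of that logic using Python's ipaddress module is shown here. The addresses are placeholders from the RFC 5737 documentation ranges, not the real lab values, and the function name is hypothetical.

```python
import ipaddress


def covering_prefixes(target_ip, advertised_prefixes):
    """Return the advertised prefixes that contain target_ip (empty list if none)."""
    target = ipaddress.ip_address(target_ip)
    return [
        prefix
        for prefix in advertised_prefixes
        if target in ipaddress.ip_network(prefix)
    ]


# Placeholder values (RFC 5737 documentation ranges), NOT the lab's real networks.
ntp_server = "203.0.113.42"
advertised = ["192.0.2.0/24", "198.51.100.0/24"]

# Before the fix: no advertised prefix covers the NTP server.
print(covering_prefixes(ntp_server, advertised))

# After advertising the NTP subnet, the server is covered.
advertised.append("203.0.113.0/24")
print(covering_prefixes(ntp_server, advertised))
```

An empty result means routers that learn routes only from these advertisements have no path to the target, which matches the behaviour I saw from the T0 router.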

I am using VyOS in my lab as the ToR router for my VCF stack. As soon as I advertised the NTP subnet via BGP and retried the task in SDDC Manager, the issue was resolved.

vyos@mj-vyos# set protocols bgp 65001 address-family ipv4 network 10.84.x.x/24

And that’s it for this post.

I hope you enjoyed reading this post. Feel free to share this on social media if it is worth sharing 🙂
