Recently, while playing with VCF 3.7.2 in my lab, I encountered an issue where the SDDC bringup process was halted because of an NFS mount problem.
If you are experienced with VCF, you will know that during the bringup process, an NFS share from the SDDC Manager VM is mounted as an NFS datastore across the management domain under the name “lcm-bundle-repo”.
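A quick way to check whether that datastore is actually present on a host is to list the NFS mounts with esxcli (a generic check, not taken from the original bringup logs); when the mount fails, as it did in my lab, the datastore simply does not show up here:

# On the ESXi host: list NFS v3 datastores; after a successful bringup, "lcm-bundle-repo" appears in this list
esxcli storage nfs list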
On checking the hostd.log on the ESXi host, I saw the following log entries:
2019-09-25T04:47:25.303Z info hostd[2098744] [Originator@6876 sub=Solo.Vmomi opID=0297e199 user=root] Activation [N5Vmomi10ActivationE:0x0000001ae16a7db0] : Invoke done [createNasDatastore] on [vim.host.DatastoreSystem:ha-datastoresystem]
2019-09-25T04:47:25.303Z info hostd[2098744] [Originator@6876 sub=Solo.Vmomi opID=0297e199 user=root] Throw vim.fault.PlatformConfigFault
2019-09-25T04:47:25.303Z info hostd[2098744] [Originator@6876 sub=Solo.Vmomi opID=0297e199 user=root] Result:
--> (vim.fault.PlatformConfigFault) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = (vmodl.LocalizableMessage) [
-->       (vmodl.LocalizableMessage) {
-->          key = "vob.vmfs.nfs.mount.error.perm.denied",
-->          arg = (vmodl.KeyAnyValue) [
-->             (vmodl.KeyAnyValue) {
-->                key = "1",
-->                value = "10.62.x.244"
-->             },
-->             (vmodl.KeyAnyValue) {
-->                key = "2",
-->                value = "/nfs/vmware/vcf/nfs-mount"
-->             }
-->          ],
-->          message = "NFS mount 10.62.x.244:/nfs/vmware/vcf/nfs-mount failed: The mount request was denied by the NFS server. Check that the export exists and that the client is permitted to mount it.
--> "
-->       }
The vmkernel.log was full of the below error messages:
2019-09-25T04:47:18.015Z cpu7:2104329 opID=21588528)World: 11943: VC opID 0297e0ff maps to vmkernel opID 21588528
2019-09-25T04:47:18.015Z cpu7:2104329 opID=21588528)NFS: 160: Command: (mount) Server: (10.62.x.244) IP: (10.62.x.244) Path: (/nfs/vmware/vcf/nfs-mount) Label: (lcm-bundle-repo) Options: (ro)
2019-09-25T04:47:18.015Z cpu7:2104329 opID=21588528)StorageApdHandler: 977: APD Handle 173b5164-9515fadb Created with lock[StorageApd-0x430f271e6700]
2019-09-25T04:47:18.015Z cpu7:2104329 opID=21588528)SunRPC: 1099: Destroying world 0x242f34
2019-09-25T04:47:18.016Z cpu10:2113924)WARNING: NFS: 221: Got error 13 from mount call
2019-09-25T04:47:18.016Z cpu7:2104329 opID=21588528)SunRPC: 1099: Destroying world 0x242f35
2019-09-25T04:47:18.016Z cpu7:2104329 opID=21588528)StorageApdHandler: 1063: Freeing APD handle 0x430f271e6700 [173b5164-9515fadb]
2019-09-25T04:47:18.016Z cpu7:2104329 opID=21588528)StorageApdHandler: 1147: APD Handle freed!
2019-09-25T04:47:18.016Z cpu7:2104329 opID=21588528)NFS: 193: NFS mount 10.62.x.244:/nfs/vmware/vcf/nfs-mount failed: The mount request was denied by the NFS server. Check that the export exists and that the client is permitted to mount it.
Clearly, the NFS server (SDDC Manager) was denying the requests from the ESXi host to mount the NFS share “/nfs/vmware/vcf/nfs-mount”. The “Got error 13 from mount call” warning points the same way: error 13 is the standard “permission denied” error code.
During troubleshooting I came across VMware KB 1005948, and my issue was exactly the same as the one described there.
As per KB 1005948:
You may see this issue if you have more than one vmkernel port on the same network segment. VMware recommends only having one vmkernel port per network segment unless port binding is being used.
Since this was a nested lab, I had kept all port groups (Management/vMotion/vSAN) on the same subnet, and to my surprise the default route was going out of vmk2 instead of vmk0.
[root@mgmt-esxi01:/var/log] esxcfg-route -l
Network        Netmask          Gateway        Interface
10.62.x.0      255.255.255.0    Local Subnet   vmk2
default        0.0.0.0          10.62.x.253    vmk2
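The vmkernel interface configuration and routing table can also be cross-checked with esxcli; these are generic commands, shown here only as a reference and not taken from the original session:

# Show each vmkernel interface with its IPv4 address and netmask
esxcli network ip interface ipv4 get

# Show the IPv4 routing table, including which vmk interface the default route uses
esxcli network ip route ipv4 list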
So one part was sorted out. Still, I was wondering why SDDC Manager was refusing the mount request, as ESXi and SDDC Manager are on the same subnet and I was not using any kind of firewall to restrict traffic.
Next, I checked the /etc/exports file on the SDDC Manager VM and got the answer to my question immediately. By default, the vmk0 IP address of every host in the management domain is explicitly whitelisted in /etc/exports.
root@sddc-manager [ ~ ]# cat /etc/exports
/nfs/vmware/vcf/nfs-mount 10.62.x.232(ro,sync,no_subtree_check)
/nfs/vmware/vcf/nfs-mount 10.62.x.233(ro,sync,no_subtree_check)
/nfs/vmware/vcf/nfs-mount 10.62.x.234(ro,sync,no_subtree_check)
/nfs/vmware/vcf/nfs-mount 10.62.x.235(ro,sync,no_subtree_check)
So when ESXi was trying to mount the NFS share, it was doing so via the IP configured on vmk2, and that is why the NFS server was rejecting the request. To fix the issue, I could have replaced the vmk0 IPs with the vmk2 IPs in the above file, but I took a shortcut and allowed the entire subnet in the NFS configuration.
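For reference, the more targeted fix would have been to whitelist the vmk2 address of each management host in /etc/exports; a sketch with placeholder addresses (the actual IPs would come from your own hosts):

/nfs/vmware/vcf/nfs-mount <vmk2-IP-of-esxi01>(ro,sync,no_subtree_check)
/nfs/vmware/vcf/nfs-mount <vmk2-IP-of-esxi02>(ro,sync,no_subtree_check)
/nfs/vmware/vcf/nfs-mount <vmk2-IP-of-esxi03>(ro,sync,no_subtree_check)
/nfs/vmware/vcf/nfs-mount <vmk2-IP-of-esxi04>(ro,sync,no_subtree_check)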
Instead, I commented out all the existing lines and added the below line:
/nfs/vmware/vcf/nfs-mount 10.62.x.0/24(ro,sync,no_subtree_check)
and re-exported the share by running the exportfs -ar command, followed by an NFS service restart:
# systemctl restart nfs-mountd.service
# systemctl restart nfs-server.service
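Before retrying bringup, you can optionally confirm that the new export is active and that an ESXi host can now mount it. This is a manual sanity check, not part of the official procedure, and the test mount should be removed again before retrying bringup:

# On the SDDC Manager: list the currently active exports and their options
exportfs -v

# On an ESXi host: test-mount the share, confirm it shows up, then remove the test mount
esxcli storage nfs add -H 10.62.x.244 -s /nfs/vmware/vcf/nfs-mount -v lcm-bundle-repo
esxcli storage nfs list
esxcli storage nfs remove -v lcm-bundle-repo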
I retried the SDDC bringup task and it completed without any further issues.
Final Thoughts
In a production deployment you are less likely to hit this issue, as you would have a proper networking setup in place, with VLANs defined for your Management, vMotion, vSAN and VXLAN traffic and the uplinks trunked on the physical switches.
If you are doing a POC/lab setup and running everything on the same subnet, you may well hit this issue. But even then, if your ESXi hosts' default route does not move to another vmkernel port during the bringup process, you will not face this.
If you have any other thoughts, do leave your comments and I will be happy to discuss further.
I hope you found this post informative. Feel free to share it on social media if you think it is worth sharing.