Overview
Load balancing in Tanzu Kubernetes Grid (when installed with NSX ALB) is accomplished by leveraging the Avi Kubernetes Operator (AKO), which delivers L4 and L7 load balancing for the Kubernetes API endpoint and the applications deployed in Tanzu Kubernetes clusters. AKO runs as a pod in Tanzu Kubernetes clusters and serves as an ingress controller and load balancer.
The Global Server Load Balancing (GSLB) function of NSX ALB enables load-balancing for globally distributed applications/workloads (usually, different data centers and public clouds). GSLB offers efficient traffic distribution across widely scattered application servers. This enables an organization to run several sites in either Active-Active (load balancing and disaster recovery) or Active-Standby (DR) mode.
With the growing footprint of containerized workloads in data centers, organizations are deploying those workloads across multi-cluster/multi-site environments, which creates the need for a way to load-balance applications globally.
To meet this requirement, NSX ALB provides the Avi Multi-Cluster Kubernetes Operator (AMKO), a Kubernetes operator that facilitates application delivery across multiple clusters. AMKO runs as a pod in the Tanzu Kubernetes clusters and works in conjunction with AKO to facilitate multi-cluster application deployment, mapping the same application deployed on multiple clusters to a single GSLB service and extending application ingresses across multi-region and multi-availability-zone deployments.
How AMKO Works with AKO & GSLB
When GSLB is enabled in NSX ALB, the ALB controller can be either a leader or a follower. The active site from which the initial GSLB site configuration is performed is the designated GSLB leader. Changes to GSLB configuration are permitted only from the leader node, which propagates those changes to all accessible followers.
To achieve global load balancing, AKO is deployed (with the layer-7 flag enabled) across all Tanzu Kubernetes clusters and acts as the default ingress controller, facilitating the creation and management of Virtual Services, VIPs, FQDNs, and so on. AMKO recognizes these new VIPs and hostnames in the status field of the ingress object. AMKO then calls the NSX ALB APIs to create a new GSLB service with the new VIP on the leader cluster and to configure the GSLB services and DNS/IPAM settings, which are synchronized across all the follower clusters automatically.
The diagram below shows a high-level workflow of how GSLB, AKO, and AMKO work together to provide global load balancing.
My Lab Setup
My lab setup is based on the following BOM:
Software Component | Version
vCenter Server | 7.0 Update 3k
ESXi | 7.0 Update 3k
TKGm | 2.1.1
NSX ALB | 22.1.2
AKO | 1.8.2
AMKO | 1.9.2
TKG Bootstrapper | CentOS-7
DNS | Windows Server 2022 STD
Network Architecture
I have deployed the following reference architecture in my lab. Virtual networking for both sites is provided by VyOS. The VyOS router is connected to an L3 router to facilitate communication between the two sites. The domain used in my environment is sddc.lab.
Deployment Workflow
1: Deploy TKG management & workload clusters with NSX ALB as the load balancer. (Not demonstrated in this post)
2: Configure GSLB sites in NSX ALB.
3: Create an ingress service in the workload clusters of Site-A & Site-B.
4: Deploy AMKO in Site-A & Site-B.
5: Configure DNS zone delegation.
6: Verify AMKO & GSLB configuration.
Deploy TKG management & workload clusters
In both sites, I have one management cluster and one workload cluster, deployed using the prod plan for multiple control plane nodes. I am using NSX ALB on both sites to provide L4/L7 load balancing.
TKG Management Cluster
TKG Workload Cluster
I’m using AKO 1.8.2, which is included with TKG, and I’m using AkoDeploymentConfig (ADC) to deploy AKO. To learn more about AKO installation using ADC, please see this article.
My ADC YAML for Site-A follows the structure shown below.
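The following is a trimmed-down sketch rather than my exact file; the controller IP, cloud name, Service Engine Group, networks, and cluster-selector label are placeholders to adapt to your environment.

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: sitea-ako-l7                      # placeholder ADC name
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  controller: 172.26.0.10                 # placeholder Site-A NSX ALB controller IP
  cloudName: sitea-vcenter-cloud          # placeholder cloud name
  serviceEngineGroup: sitea-workload-seg  # placeholder SE group
  clusterSelector:
    matchLabels:
      ako-l7-enabled: "true"              # placeholder label applied to the workload cluster
  dataNetwork:
    name: sitea-vip-network               # placeholder VIP network
    cidr: 172.26.17.0/24
  extraConfigs:
    disableStaticRouteSync: false
    ingress:
      disableIngressClass: false
      defaultIngressController: true
      serviceType: NodePortLocal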
Site-B TKG Management Cluster
Site-B TKG Workload Cluster
Configure GSLB Sites
In my environment, the NSX ALB deployed in Site-A is configured as the GSLB leader, while the NSX ALB deployed in Site-B is configured as the GSLB follower.
It is best practice to create a separate Service Engine Group for the DNS virtual service that serves GSLB.
Create DNS Service Engine Group
To create a Service Engine Group, log in to NSX ALB, navigate to Infrastructure > Cloud Resources > Service Engine Group, select the appropriate cloud, click the Create button, and configure the following settings.
Note: For Active-Active SE, you must have an Enterprise license configured in ALB.
Under the Advanced tab:
- Configure a Service Engine Name Prefix that helps you recognize the DNS SE VMs easily in the vCenter inventory.
- Select the compute and storage placement container for the SE VMs.
Create DNS Virtual Service
To create the global DNS virtual service, go to Applications > Virtual Services > Create Virtual Service, select the Advanced Setup option, and configure the following settings:
- Name: provide a name for the DNS VS.
- Application Profile: System-DNS
For the Service Port of the VS, switch to the advanced view and configure the settings as shown below.
To configure the VIP network for the DNS VS, click the Create VS VIP button. In my environment, I chose the same VIP network that I had configured for my workload cluster.
Provide a name for the VIP network and click the Add button.
Choose the VIP network and the placement network that the DNS VS will use.
Return to the VS creation wizard by clicking the Save button, then navigate to the Advanced page and pick the Service Engine Group that you created for the DNS VS.
Observe the creation of the Service Engine VMs in the vCenter inventory.
Repeat the steps for Site B and ensure that the DNS VS is up and healthy on both sites.
Enable GSLB and Configure GSLB Sites
To enable GSLB and configure GSLB sites, log in to the ALB controller of Site-A, navigate to Infrastructure > GSLB > Site Configuration, click the pencil icon on the right, and configure the following settings:
- Name: Provide a name for the GSLB Site.
- Credentials: Provide the credentials of the controller node.
- IP Address: Provide the IP address of the controller node and select port 443. If you have configured the Controller cluster, use the cluster IP here.
- Client Group IP Address Type: Public
- GSLB Subdomain: The subdomain that you want to use for GSLB services. This is the domain for which you will configure DNS delegation later.
Click the ‘Save and Set DNS Virtual Services’ button, select the Site-A DNS VS, and map it to the subdomain that you configured.
Click the Save button and then click Add New Site. Provide the details of the Site-B ALB controller. Ensure that the Active Member checkbox is selected.
Click the ‘Save and Set DNS Virtual Services’ button, select the Site-B DNS VS, and map it to the subdomain.
After hitting the Save button, both sites are configured as GSLB sites, with Site-A serving as the GSLB leader and Site-B serving as the GSLB follower.
Deploy Demo Application in Both Sites
I’m using a demo application called ‘Online Boutique’ to test the multi-site ingress. You can get the application manifest files from here. The simplest method is to perform a git clone of the project on the bootstrap machine from which you can access the workload cluster.
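For example, the Online Boutique source lives in the GoogleCloudPlatform/microservices-demo repository on GitHub, and cloning it produces the microservices-demo directory used in the commands later in this post:

[root@sitea-bootstrap ~]# git clone https://github.com/GoogleCloudPlatform/microservices-demo.git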
The following are the high-level steps for deploying the application:
1: Switch the context to the workload cluster.
[root@sitea-bootstrap ~]# kubectl config use-context sitea-tkg-wld01-admin@sitea-tkg-wld01
Switched to context "sitea-tkg-wld01-admin@sitea-tkg-wld01".
2: Create a namespace for the demo application.
kubectl create ns boutique-app |
3: Create a registry secret to pull application images from Docker Hub.
[root@sitea-bootstrap ~]# docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: vstellar
Password:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.

Login Succeeded

[root@sitea-bootstrap ~]# kubectl create secret generic regcred --from-file=.dockerconfigjson=/root/.docker/config.json --type=kubernetes.io/dockerconfigjson -n boutique-app
4: Update the default service account with the registry credential. You need to append an ‘imagePullSecrets’ entry to the service account, as shown below.
[root@sitea-bootstrap ~]# kubectl edit sa default -n boutique-app
apiVersion: v1
imagePullSecrets:
- name: regcred
kind: ServiceAccount
metadata:
  name: default
  namespace: boutique-app
5: Deploy the application
[root@sitea-bootstrap]# kubectl apply -f microservices-demo/release/kubernetes-manifests.yaml -n boutique-app |
6: Verify the application deployment
As part of the application deployment, one service (frontend-external) is deployed as type LoadBalancer. This is the service that I will expose via ingress later.
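To check the deployment from the bootstrap machine, list the pods and services in the application namespace and confirm that all pods are Running and that frontend-external has been assigned an external IP by NSX ALB:

[root@sitea-bootstrap ~]# kubectl get pods,svc -n boutique-app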
Repeat the above steps to deploy the demo application in Site-B as well.
Deploy AMKO in Site-A & Site-B
You must first generate a file named gslb-members before you can deploy AMKO. This file is essentially a kubeconfig file that has been merged from all Kubernetes clusters where AMKO will be deployed. AMKO assumes connectivity to all Kubernetes API servers in the member clusters; without this, it cannot monitor the Kubernetes resources in those clusters. AMKO accesses all member Kubernetes clusters using the gslb-members file.
The steps to generate the gslb-members file are given below:
Step 1: Generate a Kubeconfig file of workload clusters deployed in both sites.
[root@sitea-bootstrap ~]# tanzu cluster kubeconfig get sitea-tkg-wld01 --admin --export-file sitea-tkg-wld01-kubeconfig

[root@siteb-bootstrap ~]# tanzu cluster kubeconfig get siteb-tkg-wld01 --admin --export-file siteb-tkg-wld01-kubeconfig
Copy the Site-B kubeconfig file to the Site-A bootstrap VM.
Step 2: Merge Kubeconfig files
[root@sitea-bootstrap ~]# export KUBECONFIG=sitea-tkg-wld01-kubeconfig:siteb-tkg-wld01-kubeconfig
[root@sitea-bootstrap ~]# kubectl config view --flatten > gslb-members
Step 3: Verify that the gslb-members file contains the contexts of both member clusters.
[root@sitea-bootstrap ~]# kubectl config get-contexts --kubeconfig gslb-members
CURRENT   NAME                                    CLUSTER           AUTHINFO                NAMESPACE
*         sitea-tkg-wld01-admin@sitea-tkg-wld01   sitea-tkg-wld01   sitea-tkg-wld01-admin
          siteb-tkg-wld01-admin@siteb-tkg-wld01   siteb-tkg-wld01   siteb-tkg-wld01-admin
Step 4: Create a generic secret that AMKO can use to authenticate to workload clusters.
[root@sitea-bootstrap ~]# kubectl config use-context sitea-tkg-wld01-admin@sitea-tkg-wld01
[root@sitea-bootstrap ~]# kubectl create secret generic gslb-config-secret --from-file gslb-members -n avi-system
secret/gslb-config-secret created
Repeat this step for Site-B as well.
[root@sitea-bootstrap ~]# kubectl config use-context siteb-tkg-wld01-admin@siteb-tkg-wld01
Switched to context "siteb-tkg-wld01-admin@siteb-tkg-wld01".
[root@sitea-bootstrap ~]# kubectl create secret generic gslb-config-secret --from-file gslb-members -n avi-system
secret/gslb-config-secret created
Step 5: Deploy AMKO using Helm
Switch to Site-A workload cluster context before performing the below steps.
Note: If your bootstrap machine does not have Helm installed, then follow the instructions provided here to install Helm.
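One common approach (an assumption here; any supported installation method works) is the Helm project's installer script:

[root@sitea-bootstrap ~]# curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash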
5.1: Add AKO helm repository
[root@sitea-bootstrap ~]# helm repo add ako https://projects.registry.vmware.com/chartrepo/ako
"ako" has been added to your repositories

[root@sitea-bootstrap ~]# helm search repo | grep ako
ako/ako            1.9.2    1.9.2    A helm chart for Avi Kubernetes Operator
ako/ako-operator   1.3.1    1.3.1    A Helm chart for Kubernetes AKO Operator
ako/amko           1.9.1    1.9.1    A helm chart for Avi Kubernetes Operator
5.2: Generate AMKO values.yaml file
[root@sitea-bootstrap]# helm show values ako/amko --version 1.9.1 > sitea-amko-values.yaml |
Modify the generated values.yaml file and fill in the fields relevant to your environment.
The sections I modified are summarized below for reference.
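This is a minimal sketch assuming the layout of the 1.9.1 chart; the controller IP and password are placeholders, and the field names should be checked against your generated values.yaml file.

# Sketch of the key sections in sitea-amko-values.yaml
federation:
  # Context of the cluster this AMKO instance runs in; only one site is the leader
  currentCluster: 'sitea-tkg-wld01-admin@sitea-tkg-wld01'
  currentClusterIsLeader: true
  memberClusters:
    - 'sitea-tkg-wld01-admin@sitea-tkg-wld01'
    - 'siteb-tkg-wld01-admin@siteb-tkg-wld01'

configs:
  # GSLB leader controller (Site-A NSX ALB) and its version
  gslbLeaderController: '172.26.0.10'     # placeholder controller IP
  controllerVersion: '22.1.2'
  memberClusters:
    - clusterContext: 'sitea-tkg-wld01-admin@sitea-tkg-wld01'
    - clusterContext: 'siteb-tkg-wld01-admin@siteb-tkg-wld01'
  logLevel: 'INFO'

gslbLeaderCredentials:
  username: 'admin'
  password: '<controller-password>'       # placeholder

globalDeploymentPolicy:
  # Ingresses carrying this label are picked up by AMKO (see the ingress example later)
  appSelector:
    label:
      app: gslb
  matchClusters:
    - cluster: 'sitea-tkg-wld01-admin@sitea-tkg-wld01'
    - cluster: 'siteb-tkg-wld01-admin@siteb-tkg-wld01'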
Step 6: Install AMKO
[root@sitea-bootstrap ~]# helm install ako/amko --generate-name --version 1.9.1 -f sitea-amko-values.yaml --namespace=avi-system
NAME: amko-1681371765
LAST DEPLOYED: Thu Apr 13 13:12:46 2023
NAMESPACE: avi-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
Step 7: Validate that the AMKO pod is running
[root@sitea-bootstrap ~]# kubectl get po -n avi-system
NAME     READY   STATUS    RESTARTS   AGE
ako-0    1/1     Running   0          24h
amko-0   2/2     Running   0          70s
AMKO is now installed and running in Site-A. Repeat the same steps for Site-B. The values.yaml file for Site-B is slightly different: change the value of ‘currentCluster’ to the Site-B cluster context and the value of ‘currentClusterIsLeader’ to false.
The changed section is shown below for reference.
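A sketch of the Site-B federation block, assuming the same chart layout as above:

federation:
  currentCluster: 'siteb-tkg-wld01-admin@siteb-tkg-wld01'
  currentClusterIsLeader: false
  memberClusters:
    - 'sitea-tkg-wld01-admin@sitea-tkg-wld01'
    - 'siteb-tkg-wld01-admin@siteb-tkg-wld01'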
Verify that the AMKO pod is running in Site-B as well.
[root@siteb-bootstrap ~]# kubectl get po -n avi-system
NAME     READY   STATUS    RESTARTS   AGE
ako-0    1/1     Running   0          2d1h
amko-0   2/2     Running   0          2d1h
Deploy Ingress for the Demo Application
For the demo application that you deployed earlier, expose the frontend-external service using ingress. This needs to be configured on both sites. The ingress yaml is provided below for reference.
Note: Change the FQDN to reflect your environment's values. In addition, the app label must match the label defined in the AMKO values.yaml file; otherwise, AMKO will not be able to create the corresponding GSLB service in NSX ALB.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: onlineshop-ingress
  labels:
    app: gslb
spec:
  rules:
  - host: onlineshop.gslb.sddc.lab
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: frontend-external
            port:
              number: 80
To deploy the ingress, run the following command
[root@sitea-bootstrap ~]# kubectl apply -f sample-ingress.yaml -n boutique-app |
Verify that the ingress object is created
[root@sitea-bootstrap ~]# kubectl get ing -n boutique-app
NAME                 CLASS    HOSTS                      ADDRESS        PORTS   AGE
onlineshop-ingress   avi-lb   onlineshop.gslb.sddc.lab   172.26.17.21   80      2d1h

[root@siteb-bootstrap ~]# kubectl get ing -n boutique-app
NAME                 CLASS    HOSTS                      ADDRESS        PORTS   AGE
onlineshop-ingress   avi-lb   onlineshop.gslb.sddc.lab   172.27.17.21   80      2d1h
When the ingress app label matches the label defined in the AMKO configuration, AMKO creates the GSLB service and propagates its status to the GSLB sites.
Click on the GSLB Service to check the status of the members from both sites.
Verify that AMKO from both sites is in sync and that the GSLB configuration is copied to both sites.
Configure DNS Delegation
In order for GSLB to handle incoming requests for the ingress FQDN (onlineshop.gslb.sddc.lab), the DNS server must be configured to route all name-resolution requests ending in gslb.sddc.lab to the GSLB DNS VS IPs. To accomplish this, we must configure the DNS server for zone delegation.
First, create two A records corresponding to the GSLB DNS VS IPs from both sites.
Then right-click the forward lookup zone and select New Delegation. In the delegated domain field, enter the subdomain that you want to delegate.
Next, add the two A records you created earlier and finish the wizard. At this point, your configuration should look like this.
Verify AMKO and GSLB Deployment
To verify that AMKO and GSLB are load-balancing HTTP sessions to the demo app, you can use a simple dig loop with a pause of a few seconds between queries. If GSLB is working correctly, you should see the ingress IPs from both sites returned in a round-robin fashion.
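For example, from any machine that resolves against the lab DNS server, query the GSLB FQDN every few seconds and watch the returned IP alternate between the two sites:

[root@sitea-bootstrap ~]# while true; do dig +short onlineshop.gslb.sddc.lab; sleep 5; done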
Note: Because GSLB does not balance every single request in a strict round-robin fashion across two geographically separate sites, you may see the IP address of one site returned several times in a row.
Troubleshooting Tips
After creating the ingress object, if you see errors like the ones shown below in the AMKO log:
2023-04-17T07:32:16.121Z ERROR nodes/avi_model_nodes.go:667 gsName: onlineshop.gslb.sddc.lab, cluster: siteb-tkg-wld01-admin@siteb-tkg-wld01, namespace: boutique-app, member: onlineshop-ingress/onlineshop.gslb.sddc.lab, msg: controller UUID or VS UUID missing from the object, won't update member
2023-04-17T07:32:17.158Z ERROR nodes/avi_model_nodes.go:649 gsName: onlineshop.gslb.sddc.lab, cluster: sitea-tkg-wld01-admin@sitea-tkg-wld01, namespace: boutique-app, msg: controller UUID or VS UUID missing from the object, won't update member onlineshop-ingress/onlineshop.gslb.sddc.lab
2023-04-17T07:32:17.159Z ERROR nodes/avi_model_nodes.go:667 gsName: onlineshop.gslb.sddc.lab, cluster: siteb-tkg-wld01-admin@siteb-tkg-wld01, namespace: boutique-app, member: onlineshop-ingress/onlineshop.gslb.sddc.lab, msg: controller UUID or VS UUID missing from the object, won't update member
2023-04-17T07:32:17.165Z ERROR rest/dq_nodes.go:146 key: admin/onlineshop.gslb.sddc.lab, msg: no cache object for this GS was found, can't delete
then edit the GlobalDeploymentPolicy using the command ‘kubectl edit gdp global-gdp -n avi-system’ and set the parameter ‘syncVipOnly: true’ under the matchClusters section.
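After the edit, the matchClusters section should look roughly like this (a sketch using my cluster contexts; the other GDP fields are omitted):

spec:
  matchClusters:
  - cluster: sitea-tkg-wld01-admin@sitea-tkg-wld01
    syncVipOnly: true
  - cluster: siteb-tkg-wld01-admin@siteb-tkg-wld01
    syncVipOnly: true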
I hope you enjoyed reading this post. Feel free to share this on social media if it is worth sharing.