TKG Multi-Site Global Load Balancing using Avi Multi-Cluster Kubernetes Operator (AMKO)

Overview

Load balancing in Tanzu Kubernetes Grid (when installed with NSX ALB) is accomplished by leveraging the Avi Kubernetes Operator (AKO), which delivers L4 and L7 load balancing to the Kubernetes API endpoint and to the applications deployed in Tanzu Kubernetes clusters. AKO runs as a pod in Tanzu Kubernetes clusters and serves as an ingress controller and load balancer.

The Global Server Load Balancing (GSLB) function of NSX ALB enables load balancing for applications and workloads distributed globally, typically across different data centers and public clouds. GSLB offers efficient traffic distribution across widely scattered application servers, enabling an organization to run several sites in either Active-Active (load balancing and disaster recovery) or Active-Standby (DR) mode.

With the growing footprint of containerized workloads in data centers, organizations are deploying these workloads across multi-cluster/multi-site environments, creating the need for a way to load-balance applications globally.

To meet this requirement, NSX ALB provides AMKO (Avi Multi-Cluster Kubernetes Operator), a Kubernetes operator that facilitates application delivery across multiple clusters. AMKO runs as a pod in the Tanzu Kubernetes clusters and works in conjunction with AKO to facilitate multi-cluster application deployment, mapping the same application deployed on multiple clusters to a single GSLB service and extending application ingresses across multi-region and multi-availability-zone deployments.

How AMKO Works with AKO & GSLB

When GSLB is enabled in NSX ALB, the ALB controller can be either a leader or a follower. The active site from which the initial GSLB site configuration is performed is the designated GSLB leader. Changes to GSLB configuration are permitted only from the leader node, which propagates those changes to all accessible followers. 

To achieve global load balancing, AKO is deployed (with the layer-7 flag enabled) across all Tanzu Kubernetes clusters and acts as the default ingress controller, facilitating the creation and management of virtual services, VIPs, FQDNs, and so on. AMKO recognizes these new VIPs and hostnames in the status field of the ingress object. AMKO then calls the NSX ALB APIs to create a new GSLB service with the new VIP on the leader cluster and to configure the GSLB service and DNS/IPAM settings, which are synchronized across all the follower clusters automatically.

The below diagram shows a high-level workflow for how GSLB, AKO, and AMKO work together to provide global load balancing.

My Lab Setup

My lab setup is based on the following BOM:

Software Component     Version
vCenter Server         7.0 Update 3k
ESXi                   7.0 Update 3k
TKGm                   2.1.1
NSX ALB                22.1.2
AKO                    1.8.2
AMKO                   1.9.2
TKG Bootstrapper       CentOS-7
DNS                    Windows Server 2022 STD

Network Architecture

I have deployed the following reference architecture in my lab. Virtual networking for both sites is provided by VyOS. The VyOS router is connected to an L3 router to facilitate communication between the two sites. The domain used in my environment is sddc.lab.

Deployment Workflow

1: Deploy TKG management & workload clusters with NSX ALB as the load balancer. (Not demonstrated in this post)

2: Configure GSLB sites in NSX ALB.

3: Create an ingress service in the workload clusters of Site-A & Site-B.

4: Deploy AMKO in Site-A & Site-B.

5: Configure DNS zone delegation. 

6: Verify AMKO & GSLB configuration.

Deploy TKG Management & Workload Clusters

In both sites, I have one management cluster and one workload cluster deployed using the prod plan for multiple control plane nodes. I am using NSX ALB on both sites to provide L4/L7 load balancing.

TKG Management Cluster

TKG Workload Cluster

I’m using AKO 1.8.2, which is included with TKG, and I’m using AkoDeploymentConfig (ADC) to deploy AKO. To learn more about AKO installation using ADC, please see this article.

My ADC yaml for Site-A is shown below.
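A minimal sketch of what such an AKODeploymentConfig can look like is shown here; the controller address, cloud name, Service Engine Group, VIP network, and cluster selector label are illustrative placeholders and must be replaced with the values from your own environment.

    apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
    kind: AKODeploymentConfig
    metadata:
      name: install-ako-for-l7-site-a          # placeholder name
    spec:
      adminCredentialRef:
        name: avi-controller-credentials
        namespace: tkg-system-networking
      certificateAuthorityRef:
        name: avi-controller-ca
        namespace: tkg-system-networking
      cloudName: Default-Cloud                 # placeholder cloud name
      controller: 172.16.10.10                 # placeholder ALB controller/cluster IP
      serviceEngineGroup: Default-Group        # placeholder SE group for workloads
      clusterSelector:
        matchLabels:
          ako-l7: "enabled"                    # label the workload cluster with this
      controlPlaneNetwork:
        name: VIP-Network                      # placeholder VIP network
        cidr: 172.16.80.0/24
      dataNetwork:
        name: VIP-Network
        cidr: 172.16.80.0/24
      extraConfigs:
        disableStaticRouteSync: false
        ingress:
          disableIngressClass: false
          defaultIngressController: true       # AKO acts as the default ingress controller
          serviceType: NodePortLocal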



Site-B TKG Management Cluster

Site-B TKG Workload Cluster

Configure GSLB Sites

In my environment, the NSX ALB deployed in Site-A is configured as the GSLB leader, while the NSX ALB deployed in Site-B is configured as the GSLB follower. 

It is best practice to create a separate Service Engine Group for the DNS virtual service that serves GSLB.

Create DNS Service Engine Group

To create a Service Engine Group, log in to NSX ALB, navigate to Infrastructure > Cloud Resources > Service Engine Group, select the appropriate cloud, click the Create button, and configure the following settings.

Note: For Active-Active SE, you must have an Enterprise license configured in ALB.

Under the advanced tab:

  • Configure the Service Engine Name Prefix so that the DNS SE VMs are easy to recognize in the vCenter inventory.
  • Select the compute and storage placement container for the SE VMs.

Create DNS Virtual Service

To create the global DNS virtual service, go to Applications > Virtual Services > Create Virtual Service, select the Advanced Setup option, and configure the following settings:

  • Name: provide a name for the DNS VS. 
  • Application Profile: System-DNS

For the Service Port of the VS, switch to the advanced view and configure the settings as shown below.

To configure the VIP network for the DNS VS, click the Create VS VIP button. In my environment, I chose the same VIP network that I had configured for my workload cluster.

Provide a name for the VIP network and click the Add button.

Choose the VIP network and the placement network that DNS VS will use. 

Return to the VS creation wizard by clicking the Save button, then navigate to the Advanced page and pick the Service Engine Group that you created for DNS VS.

Observe the creation of the Service Engine VMs in the vCenter inventory.

Repeat the steps for Site B and ensure that the DNS VS is up and healthy on both sites.

Enable GSLB and Configure GSLB Sites

To enable GSLB and configure GSLB sites, log in to the ALB controller of Site-A, navigate to Infrastructure > GSLB > Site Configuration, click the pencil icon on the right, and configure the following settings:

  • Name: Provide a name for the GSLB Site.
  • Credentials: Provide the credentials of the controller node. 
  • IP Address: Provide the IP address of the controller node and select port 443. If you have configured the Controller cluster, use the cluster IP here.
  • Client Group IP Address Type: Public
  • GSLB Subdomain: The subdomain that you want to use. This is the domain that you will be delegating to your DNS server later.

Click the ‘Save and Set DNS Virtual Services’ button, select the Site-A DNS VS, and map it to the subdomain that you configured.

Click the Save button and then click Add New Site. Provide the details of the Site-B ALB controller. Ensure that the Active Member checkbox is selected.

Click the Save and Set DNS Virtual Services button, select the Site-B DNS VS, and map it to the subdomain.

After hitting the Save button, both sites are configured as GSLB sites, with Site-A serving as the GSLB leader and Site-B serving as the GSLB follower.

Deploy Demo Application in Both Sites

I’m using a demo application called ‘Online Boutique’ to test the multi-site ingress. You can get the application manifest files from here. The simplest method is to perform a git clone of the project on the bootstrap machine from which you can access the workload cluster.

The following are the high-level steps for deploying the application:

1: Switch the context to the workload cluster.

2: Create a namespace for the demo application.

3: Create a registry secret to pull application images from Docker Hub.

4: Update the service account with the registry credential. You need to append an imagePullSecrets entry to the service account, as shown below.

5: Deploy the application.

6: Verify the application deployment.

As part of the application deployment, one service is deployed as type LoadBalancer. This is the service that I will be exposing as an ingress later.
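A condensed, command-line version of these steps might look like the sketch below; the cluster context, namespace, secret name, and registry credentials are placeholders, and the manifest path assumes a git clone of the Online Boutique project on the bootstrap machine.

    # 1: Switch the context to the workload cluster (placeholder context name)
    kubectl config use-context site-a-wld01-admin@site-a-wld01

    # 2: Create a namespace for the demo application
    kubectl create ns online-boutique

    # 3: Create a registry secret to pull images from Docker Hub (placeholder credentials)
    kubectl create secret docker-registry dockerhub-creds \
      --docker-server=https://index.docker.io/v1/ \
      --docker-username=<user> --docker-password=<password> -n online-boutique

    # 4: Append the imagePullSecrets entry to the service account (default SA shown as an example)
    kubectl patch serviceaccount default -n online-boutique \
      -p '{"imagePullSecrets": [{"name": "dockerhub-creds"}]}'

    # 5: Deploy the application (path assumes a clone of the project)
    kubectl apply -f microservices-demo/release/kubernetes-manifests.yaml -n online-boutique

    # 6: Verify the application deployment
    kubectl get pods,svc -n online-boutique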

Repeat the above steps to deploy the demo application in Site-B as well.

Deploy AMKO in Site-A & Site-B

You must first generate a file named gslb-members before you can deploy AMKO. This file is essentially a kubeconfig file merged from all Kubernetes clusters where AMKO will be deployed. AMKO assumes connectivity to all Kubernetes API servers in the member clusters; without this file, it cannot monitor the Kubernetes resources in those clusters. AMKO uses the gslb-members file to access all member Kubernetes clusters.

The steps to generate the gslb-members file are given below:

Step 1: Generate kubeconfig files for the workload clusters deployed in both sites.
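Assuming the Tanzu CLI is available on each site's bootstrap machine, the admin kubeconfig of each workload cluster can be exported roughly like this (cluster and file names are placeholders):

    # Run on the Site-A bootstrap VM
    tanzu cluster kubeconfig get site-a-wld01 --admin --export-file site-a-wld01-kubeconfig

    # Run on the Site-B bootstrap VM
    tanzu cluster kubeconfig get site-b-wld01 --admin --export-file site-b-wld01-kubeconfig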

Copy the Site-B kubeconfig file to the Site-A bootstrap VM.

Step 2: Merge Kubeconfig files
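One way to merge the two files into a single gslb-members file with kubectl (file names are the placeholders from step 1):

    KUBECONFIG=site-a-wld01-kubeconfig:site-b-wld01-kubeconfig \
      kubectl config view --flatten > gslb-members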

Step 3: Verify that the gslb-members file shows both member clusters' contexts.
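For example:

    kubectl config get-contexts --kubeconfig=gslb-members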

Step 4: Create a generic secret that AMKO can use to authenticate to workload clusters.
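Following the secret name and namespace used in the AMKO documentation, run this against the Site-A workload cluster context:

    kubectl create secret generic gslb-config-secret --from-file gslb-members -n avi-system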

Repeat this step for Site-B as well.

Step 5: Deploy AMKO using Helm

Switch to Site-A workload cluster context before performing the below steps.

Note: If your bootstrap machine does not have Helm installed, then follow the instructions provided here to install Helm.

5.1: Add AKO helm repository
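Assuming the charts are consumed from VMware's public chart registry (the repository URL used in the AKO/AMKO documentation), this step looks like:

    helm repo add ako https://projects.registry.vmware.com/chartrepo/ako
    helm repo update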

5.2: Generate AMKO values.yaml file
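For example, pulling the default values file from the chart version used in this post:

    helm show values ako/amko --version 1.9.2 > values.yaml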

Modify the values.yaml file and fill in the fields shown in the below screenshot.

I have pasted my values.yaml here for reference.
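A minimal sketch of the fields that matter for this setup is shown below; the leader controller IP, credentials, cluster contexts, and app selector label are illustrative placeholders, so check the structure against the values.yaml generated from your chart version.

    configs:
      gslbLeaderController: "172.16.10.10"     # placeholder: GSLB leader (Site-A) controller IP
      controllerVersion: "22.1.2"
      memberClusters:                          # contexts from the gslb-members kubeconfig
        - clusterContext: "site-a-wld01-admin@site-a-wld01"
        - clusterContext: "site-b-wld01-admin@site-b-wld01"
      refreshInterval: 1800
      logLevel: "INFO"

    gslbLeaderCredentials:
      username: "admin"
      password: "<leader-controller-password>" # placeholder

    globalDeploymentPolicy:
      appSelector:                             # ingresses carrying this label become GSLB services
        label:
          app: gslb
      matchClusters:
        - cluster: "site-a-wld01-admin@site-a-wld01"
        - cluster: "site-b-wld01-admin@site-b-wld01"

    federation:
      currentCluster: "site-a-wld01-admin@site-a-wld01"  # context of the cluster AMKO is installed on
      currentClusterIsLeader: true                       # true on Site-A, false on Site-B
      memberClusters:
        - "site-a-wld01-admin@site-a-wld01"
        - "site-b-wld01-admin@site-b-wld01"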



Step 6: Install AMKO
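A typical install command, using the values file prepared above and the avi-system namespace:

    helm install ako/amko --generate-name --version 1.9.2 -f values.yaml --namespace=avi-system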

Step 7: Validate that the AMKO pod is running
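For example:

    # the AMKO pod (typically amko-0, part of a StatefulSet) should be in Running state
    kubectl get pods -n avi-system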

AMKO is now installed and running in Site-A. Repeat the same for Site-B. The values.yaml file for Site-B is slightly different: you have to change the value of currentCluster to the Site-B cluster context and the value of currentClusterIsLeader to false.

I have provided the full yaml file below for reference.
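The Site-B file is otherwise identical to the Site-A one; the federation section changes roughly as follows (cluster contexts are the same placeholders used above):

    federation:
      currentCluster: "site-b-wld01-admin@site-b-wld01"  # context of the Site-B workload cluster
      currentClusterIsLeader: false                      # Site-B AMKO is a follower
      memberClusters:
        - "site-a-wld01-admin@site-a-wld01"
        - "site-b-wld01-admin@site-b-wld01"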



Verify that the AMKO pod is running in Site-B as well.

Deploy Ingress for the Demo Application

For the demo application that you deployed earlier, expose the frontend-external service using ingress. This needs to be configured on both sites. The ingress yaml is provided below for reference.

Note: Change the fqdn to reflect your environment's values. In addition, the app label must match what you have defined in the AMKO values.yaml file; only then will AMKO be able to create the corresponding GSLB service in NSX ALB.
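A sketch of such an ingress is shown below; the ingress name and namespace are placeholders, the host matches the FQDN used later in this post, and the app: gslb label matches the appSelector defined in the AMKO values.yaml.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: onlineshop-ingress
      namespace: online-boutique
      labels:
        app: gslb                          # must match the AMKO appSelector label
    spec:
      ingressClassName: avi-lb             # ingress class created by AKO
      rules:
        - host: onlineshop.gslb.sddc.lab   # GSLB FQDN under the delegated subdomain
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: frontend-external
                    port:
                      number: 80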

To deploy the ingress, run the following command:
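Assuming the manifest above is saved as onlineshop-ingress.yaml:

    kubectl apply -f onlineshop-ingress.yaml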

Verify that the ingress object is created
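For example:

    # the HOSTS column should show the GSLB FQDN and the ADDRESS column the AKO-assigned VIP
    kubectl get ingress -n online-boutique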

When the ingress app label matches the label defined in the AMKO configuration, AMKO creates the GSLB Service and its status is updated on the GSLB sites.

Click on the GSLB Service to check the status of the members from both sites.

Verify that AMKO from both sites is in sync and that the GSLB configuration is copied to both sites.

Configure DNS Delegation

In order for GSLB to handle incoming requests to the ingress (onlineshop.gslb.sddc.lab), the DNS server must be configured to route all name resolution requests ending in gslb.sddc.lab to the GSLB DNS VS IPs. To accomplish this, we must configure our DNS server for zone delegation.

First, create two A records corresponding to the GSLB DNS VS IPs of both sites.

Then right-click the forward lookup zone and select New Delegation. For the delegated domain field, enter the subdomain that you want to delegate.

Next, add the two A records you created earlier and finish the wizard. At this point, your configuration should look like this.

Verify AMKO and GSLB Deployment

To verify that AMKO and GSLB are load-balancing HTTP sessions to the demo app, you can use a simple dig command with a pause of a few seconds between queries. If GSLB is working correctly, you should see the ingress IPs from both sites returned in a round-robin fashion.
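A simple loop like the one below, run from a client that resolves through the delegating DNS server, illustrates this; the FQDN is the one configured in the ingress.

    while true; do
      dig +short onlineshop.gslb.sddc.lab
      sleep 5
    done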

Note: Because GSLB does not strictly round-robin every single request across two geographically separate sites, you may see an IP address from one site appear several times in a row.

Troubleshooting Tips

After creating the ingress object, if you see errors like the ones shown below in the AMKO log,

then edit the Global Deployment Policy using the command kubectl edit gdp global-gdp -n avi-system and configure the parameter syncVipOnly: true under the matchClusters section.
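After the edit, the matchClusters section of the GlobalDeploymentPolicy should look roughly like this (cluster contexts are the placeholders used earlier):

    spec:
      matchClusters:
      - cluster: site-a-wld01-admin@site-a-wld01
        syncVipOnly: true
      - cluster: site-b-wld01-admin@site-b-wld01
        syncVipOnly: true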

I hope you enjoyed reading this post. Feel free to share this on social media if it is worth sharing.
