How to make NSX ALB 21.1.3 work with TKGm 1.5.1

To test TKGm 1.5.1 against the latest version of NSX ALB, I upgraded my ALB deployment to 21.1.3. The TKG management and workload clusters deployed smoothly.

However, when I deployed a sample load balancer application that uses a dedicated SEG and VIP network, the service remained stuck waiting for an external IP assignment.
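A minimal sketch of how the "waiting for an external IP" state can be detected programmatically. It assumes the Service definition has been exported as JSON (for example with `kubectl get svc <name> -o json`); the two sample documents and the address below are hypothetical and trimmed to the relevant fields.

```python
import json


def external_addresses(service):
    """Return the IPs/hostnames assigned to a LoadBalancer Service, or [] if none yet."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    addresses = [entry.get("ip") or entry.get("hostname") for entry in ingress]
    # Drop entries that carry neither an IP nor a hostname
    return [address for address in addresses if address]


# Hypothetical, trimmed Service objects: one still pending, one with a VIP assigned
pending = json.loads('{"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}')
assigned = json.loads(
    '{"spec": {"type": "LoadBalancer"},'
    ' "status": {"loadBalancer": {"ingress": [{"ip": "192.0.2.10"}]}}}'
)

print(external_addresses(pending))
print(external_addresses(assigned))
```

An empty list corresponds to the pending state described above: the load balancer provider has not yet published a VIP into the Service status.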


NSX ALB Signed Certificates and TKGm Installation Gotcha

The Problem

I recently replaced the self-signed NSX ALB certificates with a CA-signed (Microsoft CA) certificate, which caused an unexpected issue with TKGm deployment.

The TKGm installer wizard was complaining about the certificate validity. I knew there was nothing wrong with the certificate validity on NSX ALB because I had replaced the certificate just a few hours earlier. Nonetheless, I double-checked the certificate expiration date, which is set to 2024.

After some digging, I checked the terminal on the bootstrap machine, where I had issued the tanzu management-cluster create command, and spotted the main problem right away.

This is the error shown in the CLI.

Since the certificate is not signed by a public CA, the bootstrap machine does not trust the CA server that signed this certificate.
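The trust problem can be illustrated with a small, generic sketch. This is not how the Tanzu CLI validates certificates; it only shows the general mechanism: a TLS client accepts a certificate only if the issuing CA is in its trust store. The function names and the optional CA bundle path are assumptions for illustration.

```python
import socket
import ssl


def context_trusting(ca_bundle_path=None):
    """Build a client TLS context, optionally trusting an extra CA bundle (PEM)."""
    ctx = ssl.create_default_context()  # system trust store, verification enabled
    if ca_bundle_path is not None:
        # Add the private CA (e.g. an exported Microsoft CA root) to the trusted set
        ctx.load_verify_locations(cafile=ca_bundle_path)
    return ctx


def probe(host, port, ctx, timeout=5.0):
    """Attempt a TLS handshake and report whether the certificate chain is trusted."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "trusted"
    except ssl.SSLCertVerificationError as exc:
        return f"untrusted: {exc.verify_message}"


ctx = context_trusting()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Calling `probe()` against a controller whose certificate was issued by a private CA would be expected to report an untrusted chain with the default context, and a trusted one once the CA bundle is passed to `context_trusting()`.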

Replacing NSX ALB Certificates with Signed Certificates

In this post, I will walk through the steps of replacing NSX ALB self-signed certificates with a CA-signed certificate. For the purpose of this demonstration, I am using Active Directory Certificate Services in my lab. I have a Windows Server 2019 machine deployed, with the additional roles configured for AD-integrated Certificate Services.

Follow the procedure below to replace the NSX ALB certificates.

Step 1: Generate Certificate Signing Request (CSR)

A CSR includes information such as the domain name, organization name, locality, and country. The request also contains the public key that will be embedded in the issued certificate; the matching private key is not part of the CSR and stays on the system that generated it. A CSR can be generated directly from the NSX ALB portal, but that requires configuring a Certificate Management Profile or using the OpenSSL utility.
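For the OpenSSL route, the sketch below assembles an `openssl req` invocation that creates a new key pair and a CSR. The subject values, file names, and the `openssl_csr_command` helper are hypothetical; the `-addext` option assumes OpenSSL 1.1.1 or later.

```python
import shlex


def openssl_csr_command(common_name, organization, locality, country,
                        key_path="controller.key", csr_path="controller.csr"):
    """Return the argv list for generating a private key and a CSR with OpenSSL."""
    subject = f"/C={country}/L={locality}/O={organization}/CN={common_name}"
    return [
        "openssl", "req", "-new",
        "-newkey", "rsa:2048",   # generate a fresh 2048-bit RSA key pair
        "-nodes",                # leave the private key unencrypted on disk
        "-keyout", key_path,     # the private key stays local
        "-out", csr_path,        # the CSR is what gets submitted to the CA
        "-subj", subject,
        "-addext", f"subjectAltName=DNS:{common_name}",
    ]


# Hypothetical subject values for illustration only
cmd = openssl_csr_command("alb.example.com", "Example Org", "Example City", "IN")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)` on a machine where OpenSSL is installed.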

To generate a CSR via the NSX ALB portal, go to Templates > Security > SSL/TLS Certificates, click the Create button, and select Controller Certificate from the drop-down menu.

Backup and Restore TKG Clusters using Velero Standalone

In my last post, I discussed how Tanzu Mission Control Data Protection can be used to back up and restore stateful Kubernetes applications. In this tutorial, I’ll show you how to back up and restore applications using Velero standalone.

Why Use Velero Standalone?

You might wonder why you need Velero standalone when TMC Data Protection exists to make backing up and restoring Kubernetes applications easier. The answer is pretty simple. Currently, TMC does not have the ability to restore data across clusters. Backup and restore are limited to a single cluster. This is not an ideal solution from a business continuity point of view.

You can work around this limitation by using Velero standalone. In this approach, Velero is installed separately in both the source and target clusters. Both clusters have access to the S3 bucket where backups will be kept. If your source cluster is completely lost due to a disaster, you can redeploy the applications on the target cluster by restoring them from the backup stored in S3.
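The cross-cluster flow can be sketched as two Velero CLI invocations, one per cluster. This is only an outline under assumptions: the kubeconfig context names, backup name, and namespace are hypothetical, and both Velero installations are assumed to point at the same S3 bucket.

```python
import shlex


def velero_backup_cmd(context, backup_name, namespace):
    """argv for creating a backup of one namespace on the source cluster."""
    return ["velero", "--kubecontext", context,
            "backup", "create", backup_name,
            "--include-namespaces", namespace]


def velero_restore_cmd(context, backup_name):
    """argv for restoring that backup on the target cluster."""
    return ["velero", "--kubecontext", context,
            "restore", "create", "--from-backup", backup_name]


# Hypothetical context, backup, and namespace names
backup = velero_backup_cmd("source-cluster", "app-backup", "demo-app")
restore = velero_restore_cmd("target-cluster", "app-backup")

print(shlex.join(backup))
print(shlex.join(restore))
```

Because both clusters read from the same bucket, the target cluster's Velero can see the backup created by the source cluster once it has synchronized with the storage location.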

Backing Up Stateful Applications using TMC Data Protection

Introduction

Kubernetes is frequently thought of as a platform for stateless workloads because the majority of its resources are ephemeral. However, as Kubernetes grows in popularity, enterprises are deploying more and more stateful apps. Because stateful workloads require permanent storage for application data, you can no longer simply reload them in the event of a disaster.

As businesses invest extensively in Kubernetes and deploy more and more containerized applications across multi-clouds, providing adequate data protection in a distributed environment becomes a challenge that must be addressed.

Data Protection in Tanzu Mission Control (TMC) is provided by Velero, which is an open-source project. Velero backups typically include application and cluster data like config maps, custom resource definitions, secrets, and so on, which are then re-applied to a cluster during restoration. Resources that use a persistent volume are backed up using Restic.

In this post, I’ll show how to back up and recover a stateful application running in a Tanzu Kubernetes cluster.

Using Custom S3 Storage (MinIO) with TMC Data Protection

Introduction

Data protection in TMC is provided by Velero, an open-source project that came with the Heptio acquisition.

When data protection is enabled on a Kubernetes cluster, the backup data is stored outside TMC. TMC supports both AWS S3 and custom S3 storage locations for storing backups. Configuring the AWS S3 endpoint is pretty simple, as TMC provides a CloudFormation script that does all the backend tasks such as creating S3 buckets, assigning permissions, etc.

AWS S3 might not be a suitable solution in some use cases. For instance, a customer may have already invested heavily in an S3 solution (MinIO, Cloudian, etc.). TMC allows customers to bring their own self-provisioned AWS S3 bucket or S3-compatible on-prem storage locations for their Kubernetes clusters.

In this post, I will show how you can use on-prem S3 storage to store Kubernetes backups taken with TMC Data Protection.

Tanzu Kubernetes Grid 1.4 Installation in Internet-Restricted Environment

An air-gapped (aka internet-restricted) installation method is used when the TKG environment (bootstrapper and cluster nodes) is unable to connect to the internet to download the installation binaries from the public VMware registry during TKG installation or upgrades.

Internet-restricted environments can use an internal private registry in place of the VMware public registry. An example of a commonly used registry solution is Harbor.

This blog post covers how to install TKGm using a private registry configured with a self-signed certificate.

Pre-requisites of Internet-Restricted Environment

Before you can deploy TKG management and workload clusters in an Internet-restricted environment, you must have:

  • An Internet-connected Linux jumphost machine that has:
    • A minimum of 2 GB RAM, 2 vCPU, and 30 GB hard disk space.
    • Docker client installed.
    • Tanzu CLI installed. 
    • Carvel Tools installed.
    • yq version 4.9.2 or later installed.
  • An internet-restricted Linux machine with Harbor installed.
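The yq version requirement in the list above can be checked with a short sketch like the following. It compares versions numerically rather than as strings, so that 4.10.0 is correctly treated as newer than 4.9.2. The sample version strings are hypothetical; in practice the text would come from running `yq --version` on the jumphost.

```python
import re

MIN_YQ = (4, 9, 2)  # minimum version named in the prerequisites


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def yq_is_supported(version_output):
    """True if the reported yq version is at least MIN_YQ."""
    return parse_version(version_output) >= MIN_YQ


# Hypothetical version strings for illustration
for sample in ("yq version 4.9.2", "yq version 4.10.0", "yq version 3.4.1"):
    print(sample, "->", yq_is_supported(sample))
```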

Resizing TKGm Cluster in VCD

This blog post explains how to resize (scale horizontally) a CSE-provisioned TKGm cluster in VCD.

In my lab, I deployed a TKGm cluster with one control plane and one worker node. 

To resize the cluster through the VCD UI, go to the Kubernetes Container Clusters page and select the TKGm cluster to resize. Click on the Resize option.

Select the number of worker nodes you want in your TKGm cluster and click the Resize button.

Error Deploying Container Service Extension 3.1.1 – No module named ‘_sqlite3’

Container Service Extension 3.1.1 was released a few days back with new enhancements. The release announcements were made here and here.

Although the deployment procedure hasn’t changed much, mine was not smooth and I faced a couple of hiccups. This blog post discusses the problem I experienced and how I resolved it.

After installing VCD-CLI using pip, I was unable to execute any VCD command. The command was throwing an error as shown below:

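The error named in the title generally indicates that the Python interpreter in use was built without the SQLite extension module. A quick way to confirm this, independent of vcd-cli, is to ask the interpreter whether it can locate `_sqlite3`. The `has_sqlite3` helper is a name chosen here for illustration.

```python
import importlib.util


def has_sqlite3():
    """True if this interpreter can locate the _sqlite3 extension module."""
    return importlib.util.find_spec("_sqlite3") is not None


print("sqlite3 available:", has_sqlite3())
```

Run it with the same interpreter that pip used to install vcd-cli; a result of `False` would point at the Python build rather than at the CSE or vcd-cli packages.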

Unable to delete TKGm clusters in VCD

I encountered an issue while playing with Container Service Extension 3.1.1 in my lab where I was unable to create TKGm clusters. During troubleshooting, I discovered that the Rights Bundle “cse:nativeCluster Entitlement” was missing certain critical rights that are newly added with CSE 3.1.1.

When I attempted to delete the failed clusters, they got stuck in the “DELETE:IN_PROGRESS” state.

When I attempted to delete the failed cluster via vcd-cli, the operation failed with the error “RDE_ENTITY_NOT_RESOLVED”.
