Tanzu Kubernetes Grid 1.4 Installation in Internet-Restricted Environment

An air gap (aka internet-restricted) installation method is used when the TKG environment (bootstrapper and cluster nodes) is unable to connect to the internet to download the installation binaries from the public VMware Registry during TKG install, or upgrades.

Internet restricted environments can use an internal private registry in place of the VMware public registry. An example of a commonly used registry solution is Harbor.

This blog post covers how to install TKGm using a private registry configured with a self-signed certificate.

Pre-requisites of Internet-Restricted Environment

Before you can deploy TKG management and workload clusters in an Internet-restricted environment, you must have:

An Internet-connected Linux jumphost machine that has:
- A minimum of 2 GB RAM, 2 vCPU, and 30 GB hard disk space.
- Docker client installed.
- Tanzu CLI installed.
- Carvel Tools installed.
- A version of yq greater than or equal to 4.9.2 is installed.
An internet-restricted Linux machine with Harbor installed.
A way for TKG cluster VMs to access images in the private registry.
A base image template containing the OS and Kubernetes versions, that will be used to deploy management & workload clusters, is imported in vSphere. Instructions for the same are here

Network Layout

The below diagram shows the network layout of how components are connected with each other in my lab.

The networks utilized for TKG (management & workload clusters) deployment are not connected to the internet. All essential binaries for TKG deployment are pushed into the internal harbor registry, and during deployment, you point your cluster configuration to use the internal registry. I placed the harbor registry on the TKG management network to keep the architecture simple.

The required firewall rules for communication are created on the infrastructure’s firewall device, which is beyond the scope of this article.

It’s time to dive into the lab and see things in action.

The below steps shows the procedure for deploying TKG 1.4 in an internet restricted environment. The below steps assumes that you have already configured your vSphere environment as per TKG requirements.

Step 1: Deploy and Configure NSX Advanced Load Balancer

Please see this article for the steps of deploying and configuring NSX ALB for TKG

Step 2: Deploy and Configure Harbor Registry

I already have a post on this topic, so I am not repeating the steps.

Step 3: Deploy & Configure Linux Jumphost

In my lab, I have deployed a CentOS 7 VM which is acting as a Linux jumphost and installed Tanzu CLI, Carvel Tools, Docker client, etc. The steps are here

When the jumphost configuration is ready, execute the tanzu init and tanzu management-cluster create command. When these commands are executed for the first time, they install the necessary Tanzu Kubernetes Grid configuration files in the ~/.config/tanzu/tkg folder on the jumphost.

The script that you create and run in the subsequent steps requires TKG Bill of Materials (BoM) YAML files to be present on the jumphost. The scripts in this procedure use the BoM files to identify the correct versions of the different Tanzu Kubernetes Grid component images to pull.

Step 4: If your environment has a DNS server, please ensure that you have created a DNS entry for the harbor registry.

Step 5: Set environment variables as shown below:

5a: Set the IP address or FQDN of your local registry.

# export TKG_CUSTOM_IMAGE_REPOSITORY=”PRIVATE-REGISTRY”

Where PRIVATE-REGISTRY is the IP address or FQDN of your private registry and the name of the project. For example, registry.example.com/library

5b: Set the repository from which to fetch Bill of Materials (BoM) YAML files.

# export TKG_IMAGE_REPO=”projects.registry.vmware.com/tkg”

5c: If your private registry uses a self-signed certificate, provide the CA certificate of the registry in base64 encoded format e.g. base64 -w 0 your-ca.crt

# export TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE=LS0t[…]tLS0tLQ==

This CA certificate is automatically injected into all Tanzu Kubernetes clusters that you create in this Tanzu Kubernetes Grid instance.

Step 6: Generate the publish-images Script

Step 7: Run the publish-images Script

When the script finishes, verify that the harbor registry contains the tkg installation binaries.

Turn off the internet connection on the Linux jumphost once you’ve confirmed the availability of tkg installation binaries.

Step 8: Deploy TKG Management Cluster

To deploy the TKG Management cluster, start the installer interface by running the below command:

# tanzu management-cluster create --ui --bind <bootstrapper IP:Port> --browser none

1	# tanzu management-cluster create --ui --bind <bootstrapper IP:Port> --browser none

The installer interface launches in a browser and takes you through steps to configure the management cluster.

Once you’ve entered all of the necessary information and reached the Review Configuration screen, don’t click the Deploy Management Cluster button; instead, make a note of the location of the cluster config yaml file.

Append the entries for your private registry to the cluster configuration yaml file. An example yaml file is shown below for reference.

AVI_CA_DATA_B64: LS0tLS1CRUdJTiBDjlVSVElGSUNBVEUtLS0tLQo=
AVI_CLOUD_NAME: tkgvsphere-cloud01
AVI_CONTROLLER: 172.16.80.10
AVI_DATA_NETWORK: SS-VIP-NW
AVI_DATA_NETWORK_CIDR: 172.16.86.0/24
AVI_ENABLE: 'true'
AVI_LABELS: |
  'type': 'management'
AVI_PASSWORD: <encoded:Vk13YXJlMSE=>
AVI_SERVICE_ENGINE_GROUP: tkgvsphere-tkgmgmt-group01
AVI_USERNAME: admin
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: tkg14-mgmt-mj
CLUSTER_PLAN: prod
ENABLE_CEIP_PARTICIPATION: 'true'
ENABLE_MHC: 'true'
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: vsphere
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: false
DEPLOY_TKG_ON_VSPHERE7: 'true'
VSPHERE_DATACENTER: /vStellar-Mgmt-DC
VSPHERE_DATASTORE: /vStellar-Mgmt-DC/datastore/Mgmt-vSAN
VSPHERE_FOLDER: /vStellar-Mgmt-DC/vm/tkg-vsphere-tkg-mgmt
VSPHERE_NETWORK: TKG-Management
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /vStellar-Mgmt-DC/host/Mgmt-CL01/Resources/tkg-vsphere-tkg-Mgmt
VSPHERE_SERVER: mgmt-vc01.vstellar.local
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAADAQZKPCNLO2zihRBcSw==
  administrator@vsphere.local
VSPHERE_USERNAME: administrator@vsphere.local
CONTROLPLANE_SIZE: medium
WORKER_SIZE: medium
VSPHERE_INSECURE: 'true'
AVI_CONTROL_PLANE_HA_PROVIDER: 'true'
ENABLE_AUDIT_LOGGING: 'true'
OS_ARCH: amd64
OS_NAME: photon
OS_VERSION: '3'
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: TKG-Cluster-VIP
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 172.16.83.0/24
TKG_CUSTOM_IMAGE_REPOSITORY: registry.vstellar.local/library
TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY: 'False'
TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE: LS0tLS1CRUdJTiBDRRJRklDQVRFLS0tLS0K

AVI_CA_DATA_B64: LS0tLS1CRUdJTiBDjlVSVElGSUNBVEUtLS0tLQo=

AVI_CLOUD_NAME: tkgvsphere-cloud01

AVI_CONTROLLER: 172.16.80.10

AVI_DATA_NETWORK: SS-VIP-NW

AVI_DATA_NETWORK_CIDR: 172.16.86.0/24

AVI_ENABLE: 'true'

AVI_LABELS: |

'type': 'management'

AVI_PASSWORD: <encoded:Vk13YXJlMSE=>

AVI_SERVICE_ENGINE_GROUP: tkgvsphere-tkgmgmt-group01

AVI_USERNAME: admin

CLUSTER_CIDR: 100.96.0.0/11

CLUSTER_NAME: tkg14-mgmt-mj

CLUSTER_PLAN: prod

ENABLE_CEIP_PARTICIPATION: 'true'

ENABLE_MHC: 'true'

IDENTITY_MANAGEMENT_TYPE: none

INFRASTRUCTURE_PROVIDER: vsphere

SERVICE_CIDR: 100.64.0.0/13

TKG_HTTP_PROXY_ENABLED: false

DEPLOY_TKG_ON_VSPHERE7: 'true'

VSPHERE_DATACENTER: /vStellar-Mgmt-DC

VSPHERE_DATASTORE: /vStellar-Mgmt-DC/datastore/Mgmt-vSAN

VSPHERE_FOLDER: /vStellar-Mgmt-DC/vm/tkg-vsphere-tkg-mgmt

VSPHERE_NETWORK: TKG-Management

VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>

VSPHERE_RESOURCE_POOL: /vStellar-Mgmt-DC/host/Mgmt-CL01/Resources/tkg-vsphere-tkg-Mgmt

VSPHERE_SERVER: mgmt-vc01.vstellar.local

VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAADAQZKPCNLO2zihRBcSw==

administrator@vsphere.local

VSPHERE_USERNAME: administrator@vsphere.local

CONTROLPLANE_SIZE: medium

WORKER_SIZE: medium

VSPHERE_INSECURE: 'true'

AVI_CONTROL_PLANE_HA_PROVIDER: 'true'

ENABLE_AUDIT_LOGGING: 'true'

OS_ARCH: amd64

OS_NAME: photon

OS_VERSION: '3'

AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: TKG-Cluster-VIP

AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 172.16.83.0/24

TKG_CUSTOM_IMAGE_REPOSITORY: registry.vstellar.local/library

TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY: 'False'

TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE: LS0tLS1CRUdJTiBDRRJRklDQVRFLS0tLS0K

To deploy the management cluster, execute command: tanzu management-cluster create -f <cluster-config-file> -v 6

Once the management cluster is deployed, verify the health of the cluster by running the command: tanzu management-cluster get

Issues and Troubleshooting

1: Jumphost is unable to push tkg binaries to the harbor because of the self-signed certificate.

On the jumphost vm, navigate to the /etc/docker directory and execute the command: mkdir -p certs.d/<harbor-ip or fqdn>. In my lab, I’ve set up the harbor to use fqdn, so the directory structure looks like this.

[root@tkg-jump etc]# tree docker/
docker/
├── certs.d
│   └── registry.vstellar.local
│       └── ca.crt
└── key.json

[root@tkg-jump etc]# tree docker/

docker/

├── certs.d

│ └── registry.vstellar.local

│ └── ca.crt

└── key.json

Copy the <cert>.crt file from harbor registry and place it in the harbor directory which you just created and restart docker service.

2: If cert-manager or any other pod is stuck in ImagePullBackOff state and you are seeing errors similar to as shown below:

  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  18m                   default-scheduler  Successfully assigned cert-manager/cert-manager-595b5c4f75-b2gxl to tkg-kind-c6sbhhiinqel5l99rhq0-control-plane
  Normal   Pulling    14m (x4 over 18m)     kubelet            Pulling image "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1"
  Warning  Failed     13m (x4 over 17m)     kubelet            Failed to pull image "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1": rpc error: code = Unknown desc = failed to pull and unpack image "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1": failed to resolve reference "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1": failed to do request: Head https://registry.vstellar.local/v2/library/cert-manager-controller/manifests/v1.1.0_vmware.1: dial tcp: lookup registry.vstellar.local: no such host
  Warning  Failed     13m (x4 over 17m)     kubelet            Error: ErrImagePull
  Warning  Failed     13m (x6 over 17m)     kubelet            Error: ImagePullBackOff
  Normal   BackOff    2m55s (x48 over 17m)  kubelet            Back-off pulling image "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1"

Type Reason Age From Message

---- ------ ---- ---- -------

Normal Scheduled 18m default-scheduler Successfully assigned cert-manager/cert-manager-595b5c4f75-b2gxl to tkg-kind-c6sbhhiinqel5l99rhq0-control-plane

Normal Pulling 14m (x4 over 18m) kubelet Pulling image "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1"

Warning Failed 13m (x4 over 17m) kubelet Failed to pull image "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1": rpc error: code = Unknown desc = failed to pull and unpack image "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1": failed to resolve reference "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1": failed to do request: Head https://registry.vstellar.local/v2/library/cert-manager-controller/manifests/v1.1.0_vmware.1: dial tcp: lookup registry.vstellar.local: no such host

Warning Failed 13m (x4 over 17m) kubelet Error: ErrImagePull

Warning Failed 13m (x6 over 17m) kubelet Error: ImagePullBackOff

Normal BackOff 2m55s (x48 over 17m) kubelet Back-off pulling image "registry.vstellar.local/library/cert-manager-controller:v1.1.0_vmware.1"

Create vsphere-overlay.yaml file for manually propagating harbor IP and fqdn in the /etc/hosts file of the control plane and worker nodes of the TKG clusters. Save the yaml file in the directory “~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt”.

A sample yaml is shown below for reference.

#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    preKubeadmCommands:
      #! Add nameserver to all k8s nodes
      #@overlay/append
      - printf  "\n172.16.81.30   registry.vstellar.local" >> /etc/hosts

#@overlay/match by=overlay.subset({"kind":"KubeadmConfigTemplate"})
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
spec:
  template:
    spec:
      preKubeadmCommands:
        #! Add nameserver to all k8s nodes
        #@overlay/append
        - printf "\n172.16.81.30   registry.vstellar.local" >> /etc/hosts

#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})

---

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3

kind: KubeadmControlPlane

spec:

kubeadmConfigSpec:

preKubeadmCommands:

#! Add nameserver to all k8s nodes

#@overlay/append

- printf "\n172.16.81.30 registry.vstellar.local" >> /etc/hosts

#@overlay/match by=overlay.subset({"kind":"KubeadmConfigTemplate"})

---

apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3

kind: KubeadmConfigTemplate

spec:

template:

spec:

preKubeadmCommands:

#! Add nameserver to all k8s nodes

#@overlay/append

- printf "\n172.16.81.30 registry.vstellar.local" >> /etc/hosts

Redeploy the TKG cluster and it should go through.

That’s it for this post. I hope you enjoyed reading this post. Feel free to share this on social media if it is worth sharing.

Tanzu Kubernetes Grid 1.4 Installation in Internet-Restricted Environment

Like this:

Related

Leave a ReplyCancel reply

Spread the Love

Like this:

Related

Leave a ReplyCancel reply