In my last post, I discussed how Tanzu Mission Control Data Protection can be used to back up and restore stateful Kubernetes applications. In this tutorial, I’ll show you how to back up and restore applications using Velero standalone.
Why Use Velero Standalone?
You might wonder why you would use Velero standalone when TMC Data Protection exists to make backing up and restoring Kubernetes applications easier. The answer is pretty simple. Currently, TMC does not have the ability to restore data across clusters; backup and restore are limited to a single cluster. This is not an ideal solution from a business continuity point of view.
You can work around this limitation by using Velero standalone. In this approach, Velero is installed separately in both the source and target clusters, and both clusters have access to the S3 bucket where backups are kept. If your source cluster is completely lost due to a disaster, you can redeploy the applications by restoring the backup from S3 into the target cluster.
When Can You Use Velero?
Velero is more than just a backup solution. It can be used in the following scenarios:
- Back up your cluster and restore it in case of loss.
- Recover from disaster.
- Copy cluster resources to other clusters.
- Replicate your production environment to create development and testing environments.
- Take a snapshot of your application’s state before upgrading a cluster.
In this tutorial, I will be demonstrating the first use case.
For the purpose of demonstration, I am using the same Acme Fitness app, which I used in my last demo. Please see my previous post for instructions on installing and configuring the app. This application is running in a Tanzu Kubernetes cluster provisioned via Tanzu Mission Control.
For storing backups, I am using an S3 bucket (acme-backup) provisioned in MinIO. Instructions for configuring MinIO are documented here.
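If you still need to create the bucket, a minimal sketch using the MinIO client (mc) could look like the following; the alias name minio is an assumption, and the endpoint and credentials should match the values used later in this post.

# mc alias set minio http://172.19.10.3:9000 <minio-root-user> <minio-root-passwd>
# mc mb minio/acme-backup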
Install Velero CLI
The Velero CLI can be installed on a Linux jumpbox from which you can access your Tanzu Kubernetes clusters. Before installing the Velero CLI, please check the supported Velero and Kubernetes versions in the interop matrix published here.
The instructions for installing the Velero CLI are listed below.
# wget https://github.com/vmware-tanzu/velero/releases/download/v1.6.2/velero-v1.6.2-linux-amd64.tar.gz
# tar -zxvf velero-v1.6.2-linux-amd64.tar.gz
# chmod +x velero-v1.6.2-linux-amd64/velero
# mv velero-v1.6.2-linux-amd64/velero /usr/local/bin/velero
# velero version --client-only
Client:
        Version: v1.6.2
        Git commit: 8c9cdb9603446760452979dc77f93b17054ea1cc
Create MinIO Credentials Store
To integrate MinIO with Velero, you need to provide the MinIO credentials that Velero will use to interact with the S3 bucket. Create a file to store your MinIO credentials:
# cat credentials-minio
[default]
aws_access_key_id = <minio-root-user>
aws_secret_access_key = <minio-root-passwd>
Install Velero in the Source Cluster
Log in to the source Tanzu Kubernetes cluster and switch the context to the cluster where you want to enable Velero protection. An example is shown below.
# kubectl vsphere login --vsphere-username=administrator@vsphere.local --server=172.19.83.11 --insecure-skip-tls-verify --tanzu-kubernetes-cluster-name=mj-tkgs-wld01 --tanzu-kubernetes-cluster-namespace=workload

# kubectl config use-context mj-tkgs-wld01
Switched to context "mj-tkgs-wld01".
Run the following command to install Velero. In the example below, replace the IP 172.19.10.3 and port 9000 with the values configured in your environment.
# velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.2.1 \
    --bucket acme-backup \
    --secret-file ./credentials-minio \
    --use-volume-snapshots=false \
    --use-restic \
    --backup-location-config \
    region=minio,s3ForcePathStyle="true",s3Url=http://172.19.10.3:9000,publicUrl=http://172.19.10.3:9000
Note: For a full list of configurable options, run the command velero install --help.
Running the above command installs the Velero components in the velero namespace: the velero deployment and, because --use-restic was specified, the restic daemonset, along with the Velero CRDs.
Verify that all components in the velero namespace are in a running/ready state.
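A quick way to check is with kubectl; this is just a sketch, so adjust it if you installed Velero in a different namespace.

# kubectl get deployment,daemonset,pods -n velero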
Note: If the pods are stuck in the ImagePullBackOff state, then follow the steps from the troubleshooting section to fix the problem.
You should now be able to see the configured backup location.
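For example, the Velero CLI can list the backup storage location created by the install command:

# velero backup-location get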
Install Velero in the Target Cluster
Installing Velero in the target cluster follows the same process as in the source cluster. Connect to the target cluster and switch to the proper context before running the velero install command.
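As a sketch, the target cluster in this post appears to be mj-tkgs-wld02 (based on the node names in the troubleshooting section below); log in to it as shown earlier, switch context, and rerun the same velero install command unchanged, since both clusters use the same MinIO bucket.

# kubectl config use-context mj-tkgs-wld02
# velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.2.1 \
    --bucket acme-backup \
    --secret-file ./credentials-minio \
    --use-volume-snapshots=false \
    --use-restic \
    --backup-location-config \
    region=minio,s3ForcePathStyle="true",s3Url=http://172.19.10.3:9000,publicUrl=http://172.19.10.3:9000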
Verify that Velero is installed in the target cluster and that the pods are in the Running state. The target cluster should also be able to see the configured backup location.
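These are the same checks used on the source cluster, just run against the target context:

# kubectl get pods -n velero
# velero backup-location get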
Test Backup and Restore
Now that you’ve installed Velero in both the source and target clusters, you can test whether a backup from the source cluster can be restored in the target cluster.
Step 1: Connect to the source cluster and perform the backup. For this demonstration, I am performing the backup of a namespace called acme.
# velero backup create acme-backup --include-namespaces acme
Backup request "acme-backup" submitted successfully.
Run `velero backup describe acme-backup` or `velero backup logs acme-backup` for more details.
Running the command velero backup describe acme-backup fetches additional details about the backup.
Step 2: Connect to the target cluster and verify that you are able to see the backup.
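For example, listing backups from the target cluster should show acme-backup, since both clusters point at the same backup location:

# velero backup get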
Step 3: Perform the restore
Run the following command to initiate the restore.
# velero restore create acme-restore --from-backup acme-backup
Restore request "acme-restore" submitted successfully.
Run `velero restore describe acme-restore` or `velero restore logs acme-restore` for more details.
Running the command velero restore describe <restore-name> fetches additional details about the restore operation.
Once the restore has been triggered, verify that the backed-up namespace appears in the target cluster and that all items in the namespace are in the Running state.
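A minimal check, assuming the application was deployed only in the acme namespace:

# kubectl get ns acme
# kubectl get all -n acme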
And that’s it for the backup and restore demo using Velero standalone.
Troubleshooting Tips
I had a problem where the Velero and Restic pods would not initialize and would remain in the ImagePullBackOff state.
On checking the events of the stuck pods, I found I was hitting the Docker Hub rate limit.
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m20s                default-scheduler  Successfully assigned velero/restic-7nxs8 to mj-tkgs-wld02-default-nodepool-r4ffn-6ddffb6b4-hwt9c
  Normal   Pulling    32s (x4 over 2m19s)  kubelet            Pulling image "velero/velero:v1.6.2"
  Warning  Failed     25s (x4 over 2m13s)  kubelet            Failed to pull image "velero/velero:v1.6.2": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/velero/velero:v1.6.2": failed to copy: httpReaderSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/velero/velero/manifests/sha256:c2d9fcaaa10dea1028e4249d4583aef86a7e4ba908675651608fc74bb1dbf4fd: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
To fix the problem, edit the velero deployment and the restic daemonset as shown below.
# kubectl edit deploy velero -n velero
# kubectl edit daemonset.apps/restic -n velero
Search for the image field and replace velero/velero:v1.6.2 with projects.registry.vmware.com/velero/velero:v1.6.2.
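Alternatively, kubectl set image can make the same change without opening an editor; this sketch assumes the default container names velero and restic used by the Velero install.

# kubectl set image deployment/velero velero=projects.registry.vmware.com/velero/velero:v1.6.2 -n velero
# kubectl set image daemonset/restic restic=projects.registry.vmware.com/velero/velero:v1.6.2 -n velero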
Velero Cleanup
To clean up the Velero installation, use the following commands:
# velero restore delete acme-restore
# velero backup delete acme-backup
# velero backup-location delete default
# kubectl delete namespace/velero clusterrolebinding/velero
# kubectl delete crds -l component=velero
# kubectl delete CustomResourceDefinition/backups.velero.io
And that’s it for this post. I hope you enjoyed reading it. Feel free to share it on social media if you found it useful.