NSX ALB Upgrade Breaking AKO Integration

Recently I upgraded NSX ALB from 20.1.4 to 20.1.5 in my lab and observed odd behavior whenever I deployed or deleted a Kubernetes workload of type LoadBalancer.

The Issue

On deploying a new K8s application, AKO was unable to create a load balancer for it. In the NSX ALB UI, I could see that a pool had been created and a VIP assigned, but no VS was present. I also verified that the ‘ako-essential’ role had the necessary permission, “PERMISSION_VIRTUALSERVICE”, to create a new VS.

On attempting to delete a K8s application, the application was deleted on the TKG side, but lingering items (VS, pools, etc.) were left behind in the ALB UI. To investigate further, I tried deleting the server pool manually and captured the response using the browser’s network inspector.

As expected, the delete operation failed with an error stating that the object I was trying to delete is associated with an ‘L4PolicySet’.

However, the L4PolicySet was empty.

Investigation

On checking portal_exception.log on the ALB controller, I found the root cause. The error was loud and clear.

Log Trace

On checking with the ALB engineering team, I found out that this is a known bug and that it is resolved in ALB 20.1.6. To conclude: after the ALB upgrade, the permission ‘PERMISSION_L4POLICYSET’ gets wiped out from the AKO user role, the role that is created when you instantiate a new workload cluster.
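To make the failure mode concrete, here is a minimal Python sketch of the check I effectively did by hand: given a role object, list the permissions AKO needs that are no longer granted. The role dictionary is a hypothetical, trimmed-down stand-in, and the field names (`privileges`, `resource`, `type`) and the `WRITE_ACCESS` value are my assumptions about how the controller represents a role, so verify them against your own controller before relying on this.

```python
# Permissions named in this post that AKO needs on its role.
REQUIRED = {"PERMISSION_VIRTUALSERVICE", "PERMISSION_L4POLICYSET"}


def missing_permissions(role, required=REQUIRED):
    """Return the required resources that lack WRITE_ACCESS in the role."""
    writable = {
        p["resource"]
        for p in role.get("privileges", [])
        if p.get("type") == "WRITE_ACCESS"
    }
    return sorted(set(required) - writable)


# Hypothetical role as it might look after the upgrade (assumed structure,
# not real controller output): the L4 policy set permission is gone.
role_after_upgrade = {
    "name": "example-ako-role",
    "privileges": [
        {"resource": "PERMISSION_VIRTUALSERVICE", "type": "WRITE_ACCESS"},
    ],
}

print(missing_permissions(role_after_upgrade))
# ['PERMISSION_L4POLICYSET']
```

A non-empty result would match the symptoms described here: pools and VIPs get created, but anything that touches an L4 policy set fails.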

The Fix

Manually add PERMISSION_L4POLICYSET to the AKO user role via the Controller CLI.

1: Connect to the ALB controller over SSH and obtain shell access

2: Find the AKO user role

3: Configure the AKO user role to add the necessary permission
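The effect of step 3 on the role can be sketched in Python as follows. This is only an illustration of the change, not the Controller CLI itself: the role dictionary is hypothetical, and the field names (`privileges`, `resource`, `type`) and the `WRITE_ACCESS` value are my assumptions about the role's structure.

```python
import copy


def with_permission(role, resource, access="WRITE_ACCESS"):
    """Return a copy of the role with `resource` granted at `access`."""
    updated = copy.deepcopy(role)  # leave the caller's object untouched
    privileges = updated.setdefault("privileges", [])
    for privilege in privileges:
        if privilege.get("resource") == resource:
            # Entry already exists: just make sure the access level is right.
            privilege["type"] = access
            return updated
    privileges.append({"resource": resource, "type": access})
    return updated


# Hypothetical role with the L4 policy set permission missing (assumed structure).
broken_role = {
    "name": "example-ako-role",
    "privileges": [
        {"resource": "PERMISSION_VIRTUALSERVICE", "type": "WRITE_ACCESS"},
    ],
}

fixed_role = with_permission(broken_role, "PERMISSION_L4POLICYSET")
for privilege in fixed_role["privileges"]:
    print(privilege["resource"], privilege["type"])
```

The helper is idempotent, so running it against a role that already has the permission changes nothing. That matters if you end up reapplying the fix after another upgrade.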

After applying the fix, I tried creating and deleting the K8s load balancer application again, and this time things worked like a charm.

And that’s it for this post. I hope you enjoyed reading it. Feel free to share it on social media if you found it worth sharing.
