Recently, while deploying vDefend SSP in my nested lab, I encountered an issue in which the SSP platform became unstable as soon as I activated platform services.
When platform services (Security Intelligence, Rule Analysis, etc.) are activated, SSP creates new pods. If the worker nodes are starved of CPU at that moment because of resource constraints on the physical host, overall platform health degrades. The pods that make up the core service restart repeatedly and never return to a healthy state.
You can use the SSPI diagnostic tool to gain visibility into problematic worker nodes and the affected namespaces and pods.
To view pod information, SSH to the SSPI VM and list the pods.
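Once you have a pod listing, filtering it for unhealthy or frequently restarting pods can be scripted. The following is a minimal sketch, assuming the listing comes from `kubectl get pods -A` in its default table layout (the post does not name the exact command available on the SSPI VM, so treat that as an assumption). The namespace and pod names in `SAMPLE` are invented placeholders, not real SSP output, and the restart threshold of 5 is arbitrary.

```python
from typing import NamedTuple


class PodStatus(NamedTuple):
    namespace: str
    name: str
    ready: str
    status: str
    restarts: int


def parse_pod_listing(text: str) -> list[PodStatus]:
    """Parse the default table printed by `kubectl get pods -A` (assumed format)."""
    pods = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip the header row and anything too short to be a pod row.
        if len(parts) < 5 or parts[0] == "NAMESPACE":
            continue
        # RESTARTS may look like "7 (2m ago)"; only the leading integer is used.
        pods.append(PodStatus(parts[0], parts[1], parts[2], parts[3], int(parts[4])))
    return pods


def suspicious_pods(pods, restart_threshold=5):
    """Return pods that are not in a healthy state or have restarted too often."""
    healthy = {"Running", "Completed"}
    return [p for p in pods if p.status not in healthy or p.restarts >= restart_threshold]


# Invented sample input; namespace and pod names are placeholders.
SAMPLE = """\
NAMESPACE    NAME    READY   STATUS             RESTARTS       AGE
example-ns   pod-a   1/1     Running            0              3h
example-ns   pod-b   0/1     CrashLoopBackOff   12 (2m ago)    3h
example-ns   pod-c   1/1     Running            7 (10m ago)    3h
"""

for pod in suspicious_pods(parse_pod_listing(SAMPLE)):
    print(f"{pod.namespace}/{pod.name}: {pod.status}, {pod.restarts} restarts")
```

In practice you would feed the function the real listing text instead of `SAMPLE`. A pod that reports `Running` but keeps accumulating restarts is flagged as well, which matches the symptom described above.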
The SSP UI won’t let you log in and throws unexpected errors. An example is shown below.
If you attempt to query the platform status, you will see a gateway timeout error.
The root cause of this problem was that each worker node is deployed with 16 vCPUs, and a minimum of 4 worker nodes is deployed.…
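To put those numbers in perspective, here is a small back-of-the-envelope calculation. It uses only the two figures stated above (16 vCPUs per worker, 4 workers minimum) and counts worker nodes only. The physical host CPU count is a hypothetical value chosen for illustration, not a detail from my lab.

```python
VCPUS_PER_WORKER = 16  # stated above
MIN_WORKERS = 4        # stated above


def total_worker_vcpus(workers=MIN_WORKERS, vcpus_per_worker=VCPUS_PER_WORKER):
    """Total vCPUs requested by the worker nodes alone."""
    return workers * vcpus_per_worker


def overcommit_ratio(requested_vcpus, host_logical_cpus):
    """Ratio of requested vCPUs to logical CPUs on the physical host."""
    return requested_vcpus / host_logical_cpus


requested = total_worker_vcpus()
print(requested)  # 64

# Hypothetical host size, for illustration only.
HYPOTHETICAL_HOST_CPUS = 32
print(overcommit_ratio(requested, HYPOTHETICAL_HOST_CPUS))  # 2.0
```

In a nested lab, any other VMs sharing the same physical host add to this total, so the real overcommit is higher than the worker-only figure.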





