A couple of weeks ago, I spent some time with our support and engineering teams helping a customer solve a problem that occurred after they enabled Group Managed Service Accounts (gMSA) on Azure Kubernetes Service (AKS).
I decided to write this blog so other customers with the same issue can avoid going through it altogether. I’m writing the blog in the sequence as I experienced it, but if you’re just looking for the solution, feel free to skip to the end.
When that customer enabled gMSA on their cluster, a few things started to happen:
- Any gMSA-enabled deployment/container/pod entered a failed state. The events from the deployments would show the pods with the following error: Event Detail: Failed to setup the external credentials for Container ‘…’: The RPC server is unavailable.
- Any non-gMSA deployment/container/pod using the customer’s private images and running on Windows nodes also entered a failed state. The deployments were showing an event of ErrImagePull.
- All other deployments/containers/pods, on both Windows and Linux nodes, that weren’t using private images kept their healthy state.
Removing the gMSA configuration from the cluster would automatically return the entire cluster to a healthy state.
The error with the gMSA pods immediately reminded me of other cases in which I’ve seen customers having similar issues because of network connectivity. The most common gMSA issues I’ve seen so far are:
- Blocked ports: Having a firewall between your AKS cluster and the Active Directory (AD) Domain Controllers (DCs). AD uses multiple protocols for communication between clients and DCs. I even created a simple script that validates the ports.
- Incorrect DNS configuration: AD uses DNS for service discovery. Domain Controllers have an “SRV” entry in DNS that clients query so they can find not only all DCs, but the closest one. If either the nodes or pods can’t resolve the domain FQDN to a DC, gMSA won’t work.
- Incorrect secret on Azure Key Vault (AKV): A user account is used by the Windows nodes, rather than a computer account, since the nodes are not domain-joined. The format of the secret should be <domain>\<username>:<password>.
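The blocked-ports check in the first item can be sketched in a few lines of Python. This is a minimal illustration, not the script mentioned above: the function names and the port list are my own choices, and the list covers only the commonly documented TCP ports (it skips UDP and the dynamic RPC port range, which a full validation would also need to consider).

```python
import socket

# Commonly documented TCP ports between AD clients and domain controllers.
# Assumption: this list is illustrative, not exhaustive.
AD_TCP_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
    464: "Kerberos password change",
    636: "LDAPS",
    3268: "Global Catalog",
}


def check_tcp_port(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_dc_ports(host, ports=AD_TCP_PORTS, timeout=2.0):
    """Return a dict mapping each port to True (reachable) or False."""
    return {port: check_tcp_port(host, port, timeout) for port in ports}
```

Calling `check_dc_ports("dc01.contoso.com")` (a hypothetical DC name) from a Windows node or pod returns one True/False entry per port. A False on port 135 would be consistent with the “RPC server is unavailable” error, although a failed name lookup produces the same result, so DNS should be checked separately.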
There are other minor issues that I’ve seen, but these are the main ones. In this customer’s case, we reviewed the above and everything seemed to be configured properly.
At that point, I brought in other folks and they caught on to something that I knew existed, but had not yet seen used with gMSA: AKS private clusters.
This customer has a security policy in place that mandates that Azure resources use private endpoints whenever possible. That was true for the AKS cluster, and it introduced a behavior that broke the cluster.
I mentioned above that gMSA uses DNS for DC discovery. Let me explain what the default config is and what happened after enabling gMSA:
By default, Linux and Windows nodes on AKS use the Azure vNet DNS server for DNS queries. Windows and Linux pods use CoreDNS for DNS queries. Azure DNS can’t resolve AD domain FQDNs since these are usually private to on-premises or private cloud networks.
For that reason, when you enable gMSA and pass the DNS server parameter, two things change in the AKS cluster. First, the Windows nodes start using the DNS server provided. Second, the CoreDNS setting is changed to add a forwarder, which sends anything related to the domain FQDN to the specified DNS server. With these two configs, Windows nodes and Windows pods can now “find” the DCs.
Azure Portal showing the CoreDNS configuration with a DNS forwarder after gMSA has been configured.
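To make the forwarder idea concrete, here is a small sketch that renders a CoreDNS server block using the `forward` plugin. The helper name, the example domain, and the IP address are all hypothetical, and the exact stanza that AKS writes when gMSA is enabled may look different. This only shows the general shape of a per-domain forwarder.

```python
import ipaddress


def render_forward_block(domain, dns_server, port=53):
    """Render a CoreDNS server block that forwards one domain to one DNS server."""
    ipaddress.ip_address(dns_server)  # raises ValueError if this is not an IP
    zone = domain.strip().rstrip(".").lower()
    if not zone:
        raise ValueError("domain must not be empty")
    return (
        f"{zone}:{port} {{\n"
        "    errors\n"
        "    cache 30\n"
        f"    forward . {dns_server}\n"
        "}\n"
    )
```

For example, `print(render_forward_block("contoso.com", "10.0.0.4"))` produces a block that tells CoreDNS to send every query under `contoso.com` to `10.0.0.4`, while all other names keep following the default path.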
However, this introduces another issue when combined with a private AKS cluster. Private endpoints are behind a private DNS zone. Azure DNS servers can resolve these zones, but non-Azure DNS servers can’t. Since the Windows nodes and Windows pods are now using a DNS server outside of Azure, the private zone of the AKS cluster can’t be resolved, so the DCs can’t access the Windows nodes and Windows pods.
Not only that, but this customer also had their Azure Container Registry (ACR) behind a private endpoint. The second symptom above was also caused by this configuration: the Windows nodes can no longer resolve the private zone of the ACR registry and consequently can’t pull their private images.
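One quick way to confirm this kind of failure is to check what a service name resolves to from the affected node or pod. The sketch below is an assumption-laden illustration: the function names are mine, and it relies on the typical behavior where a working private endpoint zone returns a private IP address, while a missing zone returns either a public address or nothing at all.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry's sockaddr is (address, port); de-duplicate the addresses.
    return sorted({info[4][0] for info in infos})


def classify_addresses(addresses):
    """Return 'unresolved', 'private', or 'public' for a list of IP strings.

    'public' is returned as soon as any address is not a private one.
    """
    if not addresses:
        return "unresolved"
    if all(ipaddress.ip_address(addr).is_private for addr in addresses):
        return "private"
    return "public"
```

Running `classify_addresses(resolve_ipv4("myregistry.azurecr.io"))` (a hypothetical registry name) on a Windows node should return `private` when the private zone is reachable through the configured DNS server. Any other result points at the resolution problem described above.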
For reference, these are the container-related services and their private zones:

| Private link resource type | Subresource | Private DNS zone name | Public DNS zone forwarders |
| --- | --- | --- | --- |
| Azure Kubernetes Service – Kubernetes API (Microsoft.ContainerService/managedClusters) | management | privatelink.{regionName}.azmk8s.io | {regionName}.azmk8s.io |
| Azure Container Apps (Microsoft.App/ManagedEnvironments) | managedEnvironments | privatelink.{regionName}.azurecontainerapps.io | azurecontainerapps.io |
| Azure Container Registry (Microsoft.ContainerRegistry/registries) | registry | privatelink.azurecr.io | azurecr.io |

For a full list of zones, check out the Azure documentation.
The solution here is simple. For the non-Azure DNS servers to resolve Private Endpoint zones, a DNS forwarder can be created.
This customer had a very specific implementation, but in general what you need to configure is a DNS forwarder for the zones related to the services you’re using. For example:
- For AKS clusters: Create a forwarder of azmk8s.io to 168.63.129.16.
- For ACR registries: Create a forwarder of azurecr.io to 168.63.129.16.
168.63.129.16 is the virtual IP address of the Azure platform that serves as the communication channel to the platform resources. One of its services is DNS. In fact, this is the original service that the Windows nodes and Windows pods were using before gMSA was enabled.
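The effect of those forwarders can be modeled with a longest-suffix match, which is roughly how a DNS server with conditional forwarders decides where to send a query. This is a simplified sketch with hypothetical names (`pick_forwarder`, the `"local"` label for names the server answers itself, and the sample host names). It also assumes the DNS server runs inside an Azure virtual network, since 168.63.129.16 is only reachable from there.

```python
AZURE_DNS = "168.63.129.16"

# Conditional forwarders from the list above: zone suffix -> target DNS server.
FORWARDERS = {
    "azmk8s.io": AZURE_DNS,
    "azurecr.io": AZURE_DNS,
}


def pick_forwarder(name, forwarders, default):
    """Return the DNS server for name, using the longest matching zone suffix."""
    labels = name.rstrip(".").lower().split(".")
    # Try the full name first, then progressively shorter suffixes.
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in forwarders:
            return forwarders[zone]
    return default


def route_queries(names, forwarders=FORWARDERS, default="local"):
    """Map each queried name to the server that would handle it."""
    return {name: pick_forwarder(name, forwarders, default) for name in names}
```

With this configuration, a query for a name under `azmk8s.io` or `azurecr.io` is sent to 168.63.129.16, while a name such as `dc01.contoso.com` is still answered by the AD DNS server itself. Both the private endpoints and the DCs stay resolvable.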
It’s at all times DNS!
If you are using gMSA on AKS, keep in mind that Windows nodes and Windows pods will start using a DNS server outside of Azure (or one that has no direct visibility into Azure platform resources such as Private Endpoint zones). You might need to configure DNS forwarders once you start using gMSA on AKS, although the same is true for any service.
I hope this blog post helps you avoid this issue, or helps you troubleshoot it. Let us know in the comments!