GKE in-cluster DNS resolutions stopped working

So this has been working forever. I have a few simple services running in GKE and they refer to each other via the standard service.namespace DNS names.

Today all DNS name resolution stopped working. I haven't changed anything, although this may have been triggered by a master upgrade.

/ambassador # nslookup ambassador-monitor.default
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'ambassador-monitor.default': Try again


/ambassador # cat /etc/resolv.conf  
search default.svc.cluster.local svc.cluster.local cluster.local c.snowcloud-01.internal google.internal 
nameserver 10.207.0.10 
options ndots:5

Master version 1.14.7-gke.14

I can talk cross-service using their IP addresses; it's just DNS that's not working.

Not really sure what to do about this...

A related report, "kube-dns is not working as expected on GKE Cluster 1.14.7-gke.14" (#329): As of 11:30 PM ET on 2019-11-17, it seems as though all DNS requests -- both for hosts inside the cluster (e.g. redis.myapp-dev) and for hosts outside the cluster (e.g. myapp.mysql.database.azure.com) -- have stopped being resolved. If I SSH into a node in the cluster, DNS queries to outside hostnames like google.com will resolve.

It appears that I hit a bug that caused the gke-metadata-server to start crash looping (which in turn prevented kube-dns from working).

Creating a new node pool with a previous version (1.14.7-gke.10) and migrating the workloads to it fixed everything.
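
Roughly the shape of that workaround, as a sketch rather than the exact commands I ran (the pool, cluster, and zone names are placeholders, and the gke-metadata-server label is my assumption about how those pods are labelled):

# check whether the metadata server pods are crash looping
kubectl get pods -n kube-system -l k8s-app=gke-metadata-server -o wide

# create a replacement node pool pinned to the older node version
gcloud container node-pools create rollback-pool \
  --cluster my-cluster --zone us-central1-a \
  --node-version 1.14.7-gke.10 --num-nodes 3

# then move workloads off the broken pool, one node at a time
kubectl cordon <old-pool-node>
kubectl drain <old-pool-node> --ignore-daemonsets --delete-local-data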

I am told a fix has already been submitted.

Thank you for your suggestions.

Debugging DNS Resolution: for a general overview of how DNS is used in Kubernetes clusters, see the DNS for Services and Pods documentation. On GKE, name resolution is provided by kube-dns, a cluster add-on that is deployed by default in all GKE clusters; short names that are not fully qualified, like myservice, are completed using the search domains first. For more scalable DNS resolution there is also NodeLocal DNSCache.

Update on our story: it seems we were partially overloading kube-dns, plus there was a bug in the GKE-provided DNS server. Updating our kube-dns ConfigMap alleviated the issue, and the GKE team then supplied a new DNS configuration for their managed service, which would likely have resolved the issue for us without the kube-dns changes.
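
The ConfigMap change itself isn't included above, so purely as an illustration, this is the general shape of a kube-dns override on GKE; the stubDomains/upstreamNameservers values below are made-up examples, not the actual fix we applied:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  # forward a private zone to a specific resolver (example values only)
  stubDomains: |
    {"internal.example.com": ["10.0.0.53"]}
  # send everything else to explicit upstreams instead of the node's resolver
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]
EOF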

Start by debugging your kubernetes services [1]. This will tell you whether it is a k8s resource issue or kubernetes itself that is failing. Once you understand that, you can proceed to fix it. You can post the results here if you want to follow up.

[1] https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/
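
A minimal sketch of that first step, assuming a throwaway test pod (the dnsutils image path is the one the debug docs used around that time; swap in whatever service you're actually testing):

# throwaway pod with nslookup/dig available
kubectl run dnsutils --image=gcr.io/kubernetes-e2e-test-images/dnsutils:1.3 \
  --restart=Never --command -- sleep 3600

# try both the short name and the fully qualified name
kubectl exec -ti dnsutils -- nslookup ambassador-monitor.default
kubectl exec -ti dnsutils -- nslookup ambassador-monitor.default.svc.cluster.local

# confirm the pod's resolv.conf points at the kube-dns service ClusterIP
kubectl exec -ti dnsutils -- cat /etc/resolv.conf
kubectl get svc -n kube-system kube-dns
kubectl get pods -n kube-system -l k8s-app=kube-dns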

Service discovery and DNS: I ran into DNS issues on a GKE 1.10 kubernetes cluster as well. Occasionally pods would fail with nslookup: can't resolve '(null)': Name does not resolve. Once the pod was restarted on the same node, DNS resolution worked fine again. That issue has since been fixed and the fix is available from GKE 1.12.6-gke.7.

Setting up NodeLocal DNSCache: all services in the Kubernetes environment are resolvable as <serviceName>.<namespace>.svc.cluster.local, and DNS resolution is done on a per-service basis. NodeLocal DNSCache runs a DNS caching agent on each node so pods don't have to hit the kube-dns service for every lookup.
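
If NodeLocal DNSCache looks relevant, on GKE it is enabled as a cluster add-on, roughly like this (a sketch; the cluster and zone names are placeholders, and on 1.14-era clusters this may have required the beta gcloud track):

gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --update-addons NodeLocalDNS=ENABLED

# nodes have to be recreated (e.g. by upgrading or rotating the node pools)
# before the per-node cache actually starts serving DNS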

Internal DNS Service in Kubernetes Environments: I saw the same behavior with DNS resolution for cluster DNS entries. (Note that this is not the default dnsPolicy for pods generally; the default is ClusterFirst.) Attempts to resolve cluster.local names do end up working.
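
For completeness, a minimal pod spec with the dnsPolicy spelled out (ClusterFirst is the default anyway; the names and image here are placeholders):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dns-test
spec:
  dnsPolicy: ClusterFirst   # default: cluster names are resolved via kube-dns first
  containers:
  - name: main
    image: busybox:1.28
    command: ["sleep", "3600"]
EOF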