#kubernetes 如何调试k8s集群启动失败的应用
任何应用的开发过程,都不总会一帆风顺,那么,怎么调试,就是一个很重要的问题。
对于k8s集群,即使是按照文档一步步去部署一个很成熟的服务时,依然可能会出现各种各样的错误。
例如,最近在按照文档部署elk时,就出现过各种各样的问题。本文以此为例,示范如何进行调试,查找错误原因。
1. 查看pod列表
kubectl get pods -n <namespace>
示例
kubectl get pod -n k8s-logging
NAME READY STATUS RESTARTS AGE
es-logging-es-default-0 0/1 Init:CrashLoopBackOff 7 14m
2. 查看pod的详细信息
可以用以下命令查看失败状态的pod的详细信息:
kubectl describe pod <pod name> -n <namespace>
该命令会输出pod的Events列表,可以看到该pod运行过程中的相关事件。
有时候,这个Events列表中就已经包含了详细的错误原因。
示例
kubectl describe pod es-logging-es-default-0 -n k8s-logging
Name: es-logging-es-default-0
Namespace: k8s-logging
Priority: 0
Node: k8s-node-02/192.168.1.15
Start Time: Fri, 14 May 2021 02:35:49 +0000
Labels: common.k8s.elastic.co/type=elasticsearch
controller-revision-hash=es-logging-es-default-7ffcbbf5
elasticsearch.k8s.elastic.co/cluster-name=es-logging
elasticsearch.k8s.elastic.co/config-hash=1754400308
elasticsearch.k8s.elastic.co/http-scheme=https
elasticsearch.k8s.elastic.co/node-data=true
elasticsearch.k8s.elastic.co/node-ingest=true
elasticsearch.k8s.elastic.co/node-master=true
elasticsearch.k8s.elastic.co/node-ml=true
elasticsearch.k8s.elastic.co/node-remote_cluster_client=true
elasticsearch.k8s.elastic.co/node-transform=true
elasticsearch.k8s.elastic.co/node-voting_only=false
elasticsearch.k8s.elastic.co/statefulset-name=es-logging-es-default
elasticsearch.k8s.elastic.co/version=7.12.1
statefulset.kubernetes.io/pod-name=es-logging-es-default-0
Annotations: co.elastic.logs/module: elasticsearch
update.k8s.elastic.co/timestamp: 2021-05-14T02:35:52.442769569Z
Status: Pending
IP: 10.244.2.114
IPs:
IP: 10.244.2.114
Controlled By: StatefulSet/es-logging-es-default
Init Containers:
elastic-internal-init-filesystem:
Container ID: docker://ef9155f4c23f12cc95baf4eb56256497ef6495355c0c2a87adfd4c8973686855
Image: docker.elastic.co/elasticsearch/elasticsearch:7.12.1
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:561bf27aa989803bfbac48ebd48e32daadb4215cf7940c599a62c13f225427fa
Port: <none>
Host Port: <none>
Command:
bash
-c
/mnt/elastic-internal/scripts/prepare-fs.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 14 May 2021 02:56:55 +0000
Finished: Fri, 14 May 2021 02:56:55 +0000
Ready: False
Restart Count: 9
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_IP: (v1:status.podIP)
POD_NAME: es-logging-es-default-0 (v1:metadata.name)
NODE_NAME: (v1:spec.nodeName)
NAMESPACE: k8s-logging (v1:metadata.namespace)
HEADLESS_SERVICE_NAME: es-logging-es-default
Mounts:
/mnt/elastic-internal/downward-api from downward-api (ro)
/mnt/elastic-internal/elasticsearch-bin-local from elastic-internal-elasticsearch-bin-local (rw)
/mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
/mnt/elastic-internal/elasticsearch-config-local from elastic-internal-elasticsearch-config-local (rw)
/mnt/elastic-internal/elasticsearch-plugins-local from elastic-internal-elasticsearch-plugins-local (rw)
/mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
/mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
/mnt/elastic-internal/transport-certificates from elastic-internal-transport-certificates (ro)
/mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
/mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
/usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
/usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
/usr/share/elasticsearch/data from elasticsearch-data (rw)
/usr/share/elasticsearch/logs from elasticsearch-logs (rw)
Containers:
elasticsearch:
Container ID:
Image: docker.elastic.co/elasticsearch/elasticsearch:7.12.1
Image ID:
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
memory: 2Gi
Requests:
memory: 2Gi
Readiness: exec [bash -c /mnt/elastic-internal/scripts/readiness-probe-script.sh] delay=10s timeout=5s period=5s #success=1 #failure=3
Environment:
POD_IP: (v1:status.podIP)
POD_NAME: es-logging-es-default-0 (v1:metadata.name)
NODE_NAME: (v1:spec.nodeName)
NAMESPACE: k8s-logging (v1:metadata.namespace)
PROBE_PASSWORD_PATH: /mnt/elastic-internal/probe-user/elastic-internal-probe
PROBE_USERNAME: elastic-internal-probe
READINESS_PROBE_PROTOCOL: https
HEADLESS_SERVICE_NAME: es-logging-es-default
NSS_SDB_USE_CACHE: no
Mounts:
/mnt/elastic-internal/downward-api from downward-api (ro)
/mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
/mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
/mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
/mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
/mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
/usr/share/elasticsearch/bin from elastic-internal-elasticsearch-bin-local (rw)
/usr/share/elasticsearch/config from elastic-internal-elasticsearch-config-local (rw)
/usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
/usr/share/elasticsearch/config/transport-certs from elastic-internal-transport-certificates (ro)
/usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
/usr/share/elasticsearch/data from elasticsearch-data (rw)
/usr/share/elasticsearch/logs from elasticsearch-logs (rw)
/usr/share/elasticsearch/plugins from elastic-internal-elasticsearch-plugins-local (rw)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
elasticsearch-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: elasticsearch-data-es-logging-es-default-0
ReadOnly: false
downward-api:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
elastic-internal-elasticsearch-bin-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-elasticsearch-config:
Type: Secret (a volume populated by a Secret)
SecretName: es-logging-es-default-es-config
Optional: false
elastic-internal-elasticsearch-config-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-elasticsearch-plugins-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-http-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: es-logging-es-http-certs-internal
Optional: false
elastic-internal-probe-user:
Type: Secret (a volume populated by a Secret)
SecretName: es-logging-es-internal-users
Optional: false
elastic-internal-remote-certificate-authorities:
Type: Secret (a volume populated by a Secret)
SecretName: es-logging-es-remote-ca
Optional: false
elastic-internal-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: es-logging-es-scripts
Optional: false
elastic-internal-transport-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: es-logging-es-default-es-transport-certs
Optional: false
elastic-internal-unicast-hosts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: es-logging-es-unicast-hosts
Optional: false
elastic-internal-xpack-file-realm:
Type: Secret (a volume populated by a Secret)
SecretName: es-logging-es-xpack-file-realm
Optional: false
elasticsearch-logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 24m default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 23m default-scheduler Successfully assigned k8s-logging/es-logging-es-default-0 to k8s-node-02
Normal Pulled 22m (x5 over 23m) kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.12.1" already present on machine
Normal Created 22m (x5 over 23m) kubelet Created container elastic-internal-init-filesystem
Normal Started 22m (x5 over 23m) kubelet Started container elastic-internal-init-filesystem
Warning BackOff 3m57s (x92 over 23m) kubelet Back-off restarting failed container
从示例可以看到,该pod最后的错误警告是:
Warning BackOff 3m57s (x92 over 23m) kubelet Back-off restarting failed container
很不幸,Events事件列表中,只包含的比较简单的信息:容器启动失败。
3. 查看pod日志
当事件列表中找不到详细错误时,需要查看pod的详细日志来定位:
kubectl logs <pod name> -n <namespace>
示例
kubectl logs -n k8s-logging es-logging-es-default-0
Error from server (BadRequest): container "elasticsearch" in pod "es-logging-es-default-0" is waiting to start: PodInitializing
表示该pod的默认container是elasticsearch,而它还没有初始化成功,所以没有运行日志。
说明出错的不是默认的container。
4. 查看对应container的日志
这时候需要查看特定contaienr的日志:
kubectl logs <pod name> -n <namespace> -c <container>
示例
从刚刚pod的详情里,找到Init Containers的列表。
示例中,Init Containers只有一个elastic-internal-init-filesystem,这个信息与Events列表中也是一致的。
kubectl logs -c elastic-internal-init-filesystem -n k8s-logging es-logging-es-default-0
# 以下是输出日志
Starting init script
Linking /mnt/elastic-internal/xpack-file-realm/users to /usr/share/elasticsearch/config/users
Linking /mnt/elastic-internal/xpack-file-realm/roles.yml to /usr/share/elasticsearch/config/roles.yml
Linking /mnt/elastic-internal/xpack-file-realm/users_roles to /usr/share/elasticsearch/config/users_roles
Linking /mnt/elastic-internal/elasticsearch-config/elasticsearch.yml to /usr/share/elasticsearch/config/elasticsearch.yml
Linking /mnt/elastic-internal/unicast-hosts/unicast_hosts.txt to /usr/share/elasticsearch/config/unicast_hosts.txt
File linking duration: 0 sec.
Copying /usr/share/elasticsearch/config/* to /mnt/elastic-internal/elasticsearch-config-local/
removed '/mnt/elastic-internal/elasticsearch-config-local/elasticsearch.yml'
'/usr/share/elasticsearch/config/elasticsearch.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/elasticsearch.yml'
'/usr/share/elasticsearch/config/http-certs/..2021_05_14_02_35_50.501721750/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2021_05_14_02_35_50.501721750/ca.crt'
'/usr/share/elasticsearch/config/http-certs/..2021_05_14_02_35_50.501721750/tls.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2021_05_14_02_35_50.501721750/tls.crt'
'/usr/share/elasticsearch/config/http-certs/..2021_05_14_02_35_50.501721750/tls.key' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2021_05_14_02_35_50.501721750/tls.key'
removed '/mnt/elastic-internal/elasticsearch-config-local/http-certs/ca.crt'
'/usr/share/elasticsearch/config/http-certs/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/ca.crt'
removed '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.crt'
'/usr/share/elasticsearch/config/http-certs/tls.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.crt'
removed '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.key'
'/usr/share/elasticsearch/config/http-certs/tls.key' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.key'
removed '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..data'
'/usr/share/elasticsearch/config/http-certs/..data' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..data'
'/usr/share/elasticsearch/config/jvm.options' -> '/mnt/elastic-internal/elasticsearch-config-local/jvm.options'
'/usr/share/elasticsearch/config/log4j2.file.properties' -> '/mnt/elastic-internal/elasticsearch-config-local/log4j2.file.properties'
'/usr/share/elasticsearch/config/log4j2.properties' -> '/mnt/elastic-internal/elasticsearch-config-local/log4j2.properties'
'/usr/share/elasticsearch/config/role_mapping.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/role_mapping.yml'
removed '/mnt/elastic-internal/elasticsearch-config-local/roles.yml'
'/usr/share/elasticsearch/config/roles.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/roles.yml'
'/usr/share/elasticsearch/config/transport-remote-certs/..2021_05_14_02_35_50.623420157/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..2021_05_14_02_35_50.623420157/ca.crt'
removed '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/ca.crt'
'/usr/share/elasticsearch/config/transport-remote-certs/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/ca.crt'
removed '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..data'
'/usr/share/elasticsearch/config/transport-remote-certs/..data' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..data'
removed '/mnt/elastic-internal/elasticsearch-config-local/unicast_hosts.txt'
'/usr/share/elasticsearch/config/unicast_hosts.txt' -> '/mnt/elastic-internal/elasticsearch-config-local/unicast_hosts.txt'
removed '/mnt/elastic-internal/elasticsearch-config-local/users'
'/usr/share/elasticsearch/config/users' -> '/mnt/elastic-internal/elasticsearch-config-local/users'
removed '/mnt/elastic-internal/elasticsearch-config-local/users_roles'
'/usr/share/elasticsearch/config/users_roles' -> '/mnt/elastic-internal/elasticsearch-config-local/users_roles'
Empty dir /usr/share/elasticsearch/plugins
Copying /usr/share/elasticsearch/bin/* to /mnt/elastic-internal/elasticsearch-bin-local/
'/usr/share/elasticsearch/bin/elasticsearch' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch'
'/usr/share/elasticsearch/bin/elasticsearch-certgen' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-certgen'
'/usr/share/elasticsearch/bin/elasticsearch-certutil' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-certutil'
'/usr/share/elasticsearch/bin/elasticsearch-cli' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-cli'
'/usr/share/elasticsearch/bin/elasticsearch-croneval' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-croneval'
'/usr/share/elasticsearch/bin/elasticsearch-env' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-env'
'/usr/share/elasticsearch/bin/elasticsearch-env-from-file' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-env-from-file'
'/usr/share/elasticsearch/bin/elasticsearch-keystore' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-keystore'
'/usr/share/elasticsearch/bin/elasticsearch-migrate' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-migrate'
'/usr/share/elasticsearch/bin/elasticsearch-node' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-node'
'/usr/share/elasticsearch/bin/elasticsearch-plugin' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-plugin'
'/usr/share/elasticsearch/bin/elasticsearch-saml-metadata' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-saml-metadata'
'/usr/share/elasticsearch/bin/elasticsearch-setup-passwords' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-setup-passwords'
'/usr/share/elasticsearch/bin/elasticsearch-shard' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-shard'
'/usr/share/elasticsearch/bin/elasticsearch-sql-cli' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-sql-cli'
'/usr/share/elasticsearch/bin/elasticsearch-sql-cli-7.12.1.jar' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-sql-cli-7.12.1.jar'
'/usr/share/elasticsearch/bin/elasticsearch-syskeygen' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-syskeygen'
'/usr/share/elasticsearch/bin/elasticsearch-users' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-users'
'/usr/share/elasticsearch/bin/x-pack-env' -> '/mnt/elastic-internal/elasticsearch-bin-local/x-pack-env'
'/usr/share/elasticsearch/bin/x-pack-security-env' -> '/mnt/elastic-internal/elasticsearch-bin-local/x-pack-security-env'
'/usr/share/elasticsearch/bin/x-pack-watcher-env' -> '/mnt/elastic-internal/elasticsearch-bin-local/x-pack-watcher-env'
Files copy duration: 0 sec.
chowning /usr/share/elasticsearch/data to elasticsearch:elasticsearch
chown: changing ownership of '/usr/share/elasticsearch/data': Operation not permitted
failed to change ownership of '/usr/share/elasticsearch/data' from 1024:users to elasticsearch:elasticsearch
在日志的最后,可以看到出错原因:
failed to change ownership of '/usr/share/elasticsearch/data' from 1024:users to elasticsearch:elasticsearch
根据对应的错误,查找原因即可。
更多调试方法
https://kubernetes.io/zh/docs/tasks/debug-application-cluster/debug-running-pod/