Troubleshooting: No Pods Are Created After Creating a Deployment



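Problem description
While deploying the Ingress service on a Kubernetes cluster installed from binaries, I applied the YAML manifest to create the Deployment resources, but no Pods were ever created.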

[root@two-master Install]# kubectl apply -f ingress-nginx-controller.yaml
deployment.apps/default-http-backend created
service/default-http-backend created
serviceaccount/nginx-ingress-serviceaccount created
clusterrole.rbac.authorization.k8s.io/nginx-ingress-clusterrole created
role.rbac.authorization.k8s.io/nginx-ingress-role created
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-clusterrole-nisa-binding created
deployment.apps/nginx-ingress-controller created




[root@two-master Install]# kubectl -n kube-system get pods




Troubleshooting
Inspect the Deployment details
[root@two-master Install]# kubectl -n kube-system describe deployments.apps nginx-ingress-controller


This yielded nothing useful.


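Check the kube-controller-manager service status
The Pods of a Deployment are created by controllers that run inside kube-controller-manager, so that service is the next thing to check.
[root@two-master Install]# systemctl status kube-controller-manager.service
......
6月 25 18:54:54 two-master kube-controller-manager[759]: E0625 18:54:54.563462     759 leaderelection.go:325] error retrieving resource lo...nager)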


The kube-controller-manager service is clearly unhealthy.
  • Look at the detailed error output of the kube-controller-manager service
[root@two-master Install]# systemctl status kube-controller-manager.service > kube-controller.log    # dump to a log file for easier reading
[root@two-master Install]# vim + kube-controller.log
known reason (get leases.coordination.k8s.io kube-controller-manager)
6月 25 19:10:25 two-master kube-controller-manager[759]: E0625 19:10:25.198986     759 leaderelection.go:325]
error retrieving resource lock kube-system/kube-controller-manager:
the server rejected our request for an unknown reason (get leases.coordination.k8s.io kube-controller-manager)


The relevant error message:
6月 25 19:10:25 two-master kube-controller-manager[759]: E0625 19:10:25.198986     759 leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: the server rejected our request for an unknown reason (get leases.coordination.k8s.io kube-controller-manager)
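
When the log is noisy, a small filter can reduce it to the distinct leader-election errors. The following is a minimal sketch, assuming log lines arrive on stdin in the klog format shown above; the function name lock_errors is hypothetical and not part of any Kubernetes tooling.

```shell
# Hypothetical helper: read log lines on stdin and print each distinct
# leader-election error message once, without timestamp or PID prefix.
lock_errors() {
  grep 'leaderelection\.go' \
    | sed -E 's/^.*leaderelection\.go:[0-9]+\] *//' \
    | sort -u
}
```

One possible way to feed it is `journalctl -u kube-controller-manager --no-pager | lock_errors`, which reads the unit's journal instead of the shortened view printed by systemctl status.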



In other words:
  kube-controller-manager failed to read its leader-election lock, the Lease kube-system/kube-controller-manager, because the API server rejected the GET request on leases.coordination.k8s.io for an unknown reason.


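Check etcd cluster status and alarms
  Searching for the error message turned up suggestions that etcd might be at fault, so I checked whether my etcd was healthy.
My Kubernetes cluster has just one master and two nodes.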


  • Check whether etcd is healthy
[root@two-master Install]# etcdctl endpoint health --endpoints=https://192.168.2.70:2379 \
> --write-out=table \
> --cacert=/etc/kubernetes/pki/etcd/ca.pem \
> --cert=/etc/kubernetes/pki/etcd/etcd.pem \
> --key=/etc/kubernetes/pki/etcd/etcd-key.pem
+---------------------------+--------+------------+-------+
|         ENDPOINT          | HEALTH |    TOOK    | ERROR |
+---------------------------+--------+------------+-------+
| https://192.168.2.70:2379 | true   | 4.965087ms |       |
+---------------------------+--------+------------+-------+
# healthy, no errors

  • Check etcd alarms
[root@two-master Install]# etcdctl --endpoints=192.168.2.70:2379 alarm list \
> --cacert=/etc/kubernetes/pki/etcd/ca.pem \
> --cert=/etc/kubernetes/pki/etcd/etcd.pem \
> --key=/etc/kubernetes/pki/etcd/etcd-key.pem


No alarms were reported and etcd is healthy, so the root cause is still unknown.


Restart the kube-controller-manager service
Next, try the classic fix: a restart.

[root@two-master ~]# systemctl restart kube-controller-manager.service
[root@two-master ~]# systemctl status kube-controller-manager.service


The service is still failing.


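Inspect the resource lock
Leases are lightweight resource locks that replace the ConfigMap-based and Endpoints-based locks of older versions. Running kubectl get lease kube-controller-manager -n kube-system -o yaml shows the following YAML.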
$ kubectl get lease kube-controller-manager -n kube-system -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2022-06-08T07:52:17Z"
  managedFields:
  - apiVersion: coordination.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:acquireTime: {}
        f:holderIdentity: {}
        f:leaseDurationSeconds: {}
        f:leaseTransitions: {}
        f:renewTime: {}
    manager: kube-controller-manager
    operation: Update
    time: "2022-06-08T07:52:17Z"
  name: kube-controller-manager
  namespace: kube-system
  resourceVersion: "977951"
  uid: 758e5b3d-422f-4254-9839-3581f532b7e5
spec:
  acquireTime: "2022-06-24T02:08:11.905250Z"
  holderIdentity: two-master_f1deccfa-7a21-4b6c-97b6-611eaaff083c
  leaseDurationSeconds: 15
  leaseTransitions: 7
  renewTime: "2022-06-24T03:01:34.576989Z"



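This resource records which instance holds the lock, when the lease was last renewed, when the lock was acquired, and so on. The leader-election client itself (LeaderElectionConfig in client-go) is configured with the following fields:
  • LeaseDuration: how long the lock is held, that is, the length of one term. During this time no other LeaderElector client can take over leadership, even if the current leader is not working properly.
  • RenewDeadline: how long the current leader keeps retrying to renew the lease before it gives up leadership. This field applies only to the leader. RenewDeadline must be less than LeaseDuration.
  • RetryPeriod: how long each LeaderElector client waits between attempts to become leader.
  • Callbacks: functions invoked whenever an instance becomes leader or loses leadership.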
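
The timing rules above can be sketched with plain shell arithmetic. This is a simplified illustration, not the client-go implementation: it trusts renewTime directly, whereas client-go compares against the local time at which it last observed the record change. Both function names are hypothetical, and the RenewDeadline of 10 seconds assumes this cluster runs kube-controller-manager with its default settings.

```shell
# Illustrative sketch only; both function names are hypothetical.

# lease_state RENEW_EPOCH DURATION_SECONDS NOW_EPOCH
# Prints "held" while now <= renewTime + leaseDurationSeconds, else "expired".
lease_state() {
  renew=$1
  duration=$2
  now=$3
  if [ "$now" -gt $((renew + duration)) ]; then
    echo expired
  else
    echo held
  fi
}

# timings_ok LEASE_DURATION RENEW_DEADLINE
# Succeeds only if RenewDeadline is strictly less than LeaseDuration.
timings_ok() {
  [ "$2" -lt "$1" ]
}

# renewTime 2022-06-24T03:01:34Z from the Lease above is 1656039694
# in epoch seconds (fractional part dropped).
lease_state 1656039694 15 1656039700   # 6 s after the last renewal
lease_state 1656039694 15 1656039710   # 16 s after the last renewal
timings_ok 15 10 && echo "timings valid"
```

A lease whose renewTime lies far in the past, as in the YAML above, indicates that the holder has stopped renewing it, which matches the errors reported by kube-controller-manager.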


Request the resource lock
[root@two-master ~]# curl -X GET https://192.168.2.70:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager -k
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
[root@two-master ~]#
The request returns a 401 (Unauthorized) error.
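
Note that the curl call above sends no credentials, so on its own a 401 only shows that the API server rejects unauthenticated requests. To test the access that kube-controller-manager actually has, the request would need to carry its credentials. The sketch below is one way to interpret the result; the helper name explain_status and the certificate file names in the comment are hypothetical placeholders, not paths from this cluster.

```shell
# explain_status HTTP_CODE  (hypothetical helper)
# Maps the HTTP status returned by kube-apiserver to a short diagnosis.
explain_status() {
  case "$1" in
    200) echo "authenticated and authorized" ;;
    401) echo "authentication failed: no valid credentials were presented" ;;
    403) echo "identity accepted (possibly system:anonymous), but not authorized" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Possible usage, with placeholder certificate files:
#   code=$(curl -sk -o /dev/null -w '%{http_code}' \
#     --cert CLIENT.pem --key CLIENT-key.pem \
#     https://192.168.2.70:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager)
#   explain_status "$code"
```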



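Addendum:
Further investigation showed that /etc/kubernetes/kube-controller-manager.conf contained an extra parameter pointing to kube-apiserver. After removing that parameter and restarting the service, everything returned to normal.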







