RKE2 apiserver pod does not pick up /etc/hosts changes after scale-out, so the API cannot read logs from new nodes

Version:

root@oc-master-01:~# rancherd --version
rancherd version v2.5.8 (HEAD)
go version go1.14.15

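Changes on the server node:

After the new nodes were added, /etc/hosts on the server oc-master-01 gained several new entries.

Problem description:

After scaling the cluster out with new nodes, the apiserver cannot fetch logs from them. On inspection, the apiserver cannot resolve the new nodes' hostnames to IP addresses.

Troubleshooting:

Inside the apiserver pod's container, /etc/hosts still holds the old configuration and has not changed.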
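
The mismatch can also be confirmed without opening a shell in the pod. The following is a minimal sketch, assuming the pod name used elsewhere in this post (kube-apiserver-oc-master-01) and that cat is available in the image. normalise_hosts and missing_hosts_entries are hypothetical helper names, not part of RKE2 or kubectl.

```shell
# Strip comments, collapse whitespace, drop blank lines, and sort.
normalise_hosts() {
  sed -e 's/#.*//' "$1" | awk 'NF { $1 = $1; print }' | sort -u
}

# Print entries that are in the first hosts file but not in the second.
missing_hosts_entries() {
  a=$(mktemp)
  b=$(mktemp)
  normalise_hosts "$1" > "$a"
  normalise_hosts "$2" > "$b"
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Usage on the master node (assumed pod name; left commented out so the
# block can be sourced on a machine without cluster access):
# kubectl exec -n kube-system kube-apiserver-oc-master-01 -- cat /etc/hosts > /tmp/pod-hosts
# missing_hosts_entries /etc/hosts /tmp/pod-hosts
```

Any line printed is an entry the host has and the pod does not, which in this case should be the newly added nodes.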

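Other notes:

The apiserver runs as a standalone pod on the master node oc-master-01.
It is started from a pod manifest file on the host, and its contents cannot be modified from the UI or from the command line: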
/opt/rancher/rke2/agent/pod-manifests/kube-apiserver.yaml

Temporary workaround:

kubectl exec -it kube-apiserver-oc-master-01 -n kube-system -- /bin/bash
cat >>/etc/hosts<<EOF
10.66.231.114 oc-master-07
10.66.231.115 oc-master-08
10.66.231.116 oc-master-09
10.66.231.117 oc-master-10
EOF
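
Because this workaround is temporary and may need re-applying, it helps if the append is idempotent, so that running it twice does not duplicate lines. The following is a minimal sketch, where add_host_entry is a hypothetical helper name and awk is assumed to be available wherever it runs (this post does not confirm that for the apiserver image):

```shell
# add_host_entry FILE IP NAME
# Append "IP NAME" to FILE unless NAME already appears as a hostname in it.
add_host_entry() {
  file=$1; ip=$2; name=$3
  if awk -v n="$name" '
      { sub(/#.*/, "") }   # ignore comments
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }
    ' "$file"; then
    return 0               # already present, nothing to do
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Usage against a scratch copy of a hosts file:
# add_host_entry /tmp/hosts.new 10.66.231.114 oc-master-07
# add_host_entry /tmp/hosts.new 10.66.231.115 oc-master-08
```

To use it on the pod's own /etc/hosts the function would have to run inside the container, for example by pasting it into the bash session opened above.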

Installation configuration:

Server (apiserver) node configuration file

token: my-shared-secret
tls-san:
  - oc-master-01
kubelet-arg:
  - "root-dir=/opt/rancher/kubelet"
  - "max-pods=500"
data-dir: "/opt/rancher/rke2"
disable: rke2-ingress-nginx

Agent node configuration file

/etc/rancher/rke2/config.yaml

server: https://oc-master-01:9345
token: my-shared-secret
kubelet-arg:
  - "root-dir=/opt/rancher/kubelet"
  - "max-pods=500"
data-dir: "/opt/rancher/rke2"

Installation:

  • rancherd-server
    {
    rm -f /usr/local/lib/systemd/system/rancherd-agent.service
    systemctl daemon-reload
    systemctl enable rancherd-server.service
    systemctl start rancherd-server.service
    }
    
  • rancherd-agent
    {
    rm -f /usr/local/lib/systemd/system/rancherd-server.service
    systemctl daemon-reload
    systemctl enable rancherd-agent.service
    systemctl start rancherd-agent.service
    }
    
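Resolved: mounting the host's /etc/hosts into the container fixes it.
For example, here is kube-scheduler.yaml with the mount added as volume file5; kube-apiserver.yaml takes the same kind of change: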

root@oc-master-01:/opt/rancher/rke2/agent/pod-manifests# cat kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=127.0.0.1
    - --bind-address=127.0.0.1
    - --kubeconfig=/opt/rancher/rke2/server/cred/scheduler.kubeconfig
    - --port=10251
    - --profiling=false
    - --secure-port=0
    env:
    - name: FILE_HASH
      value: 50c09a0096dc5c5bddedbdff4bd8972de5f27f8f9e5ab0caa4d961884c53a53a
    - name: NO_PROXY
      value: .svc,.cluster.local,10.42.0.0/16,10.43.0.0/16
    image: index.docker.io/rancher/hardened-kubernetes:v1.20.5-rke2r1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    volumeMounts:
    - mountPath: /opt/rancher/rke2/server/db/etcd/name
      name: file0
      readOnly: true
    - mountPath: /opt/rancher/rke2/server/cred/scheduler.kubeconfig
      name: file1
      readOnly: true
    - mountPath: /opt/rancher/rke2/server/tls/client-scheduler.crt
      name: file2
      readOnly: true
    - mountPath: /opt/rancher/rke2/server/tls/client-scheduler.key
      name: file3
      readOnly: true
    - mountPath: /opt/rancher/rke2/server/tls/server-ca.crt
      name: file4
      readOnly: true
    - mountPath: /etc/hosts
      name: file5
      readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /opt/rancher/rke2/server/db/etcd/name
      type: File
    name: file0
  - hostPath:
      path: /opt/rancher/rke2/server/cred/scheduler.kubeconfig
      type: File
    name: file1
  - hostPath:
      path: /opt/rancher/rke2/server/tls/client-scheduler.crt
      type: File
    name: file2
  - hostPath:
      path: /opt/rancher/rke2/server/tls/client-scheduler.key
      type: File
    name: file3
  - hostPath:
      path: /opt/rancher/rke2/server/tls/server-ca.crt
      type: File
    name: file4
  - hostPath:
      path: /etc/hosts
      type: File
    name: file5
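
After editing a manifest this way, a quick text check can confirm that both halves of the change are in place (the volumeMount and the hostPath volume). This is a rough sketch rather than a YAML-aware validation; manifest_mounts_hosts is a hypothetical helper name.

```shell
# Succeeds only if the manifest contains both a volumeMount at /etc/hosts
# and a hostPath volume whose path is /etc/hosts.
manifest_mounts_hosts() {
  manifest=$1
  grep -Eq '^[[:space:]]*-?[[:space:]]*mountPath:[[:space:]]*/etc/hosts[[:space:]]*$' "$manifest" &&
    grep -Eq '^[[:space:]]*path:[[:space:]]*/etc/hosts[[:space:]]*$' "$manifest"
}

# Usage on the master node:
# manifest_mounts_hosts /opt/rancher/rke2/agent/pod-manifests/kube-apiserver.yaml \
#   && echo "hosts mount present" || echo "hosts mount missing"
```

The check only matches lines of text, so it does not verify that the mount and the volume share the same name.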
