Pods Land on Different Subnets After Adding Nodes to a K8S Cluster

Published on 2019-12-26

Symptoms

Spun up a workload with replicas set to 3, but kubectl get pods -n rich showed that the three pods were not even on the same subnet and could not communicate with one another.

# web-rich-0 is on the 172 subnet and cannot reach the other two pods; those two share a subnet and can talk to each other fine
[root@kubernetes-master01 rich]# kubectl get pods -n sarich -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
web-rich-0   1/1     Running   0          15m   172.17.0.5    node06   <none>           <none>
web-rich-1   1/1     Running   0          15m   10.244.5.6    node02   <none>           <none>
web-rich-2   1/1     Running   0          15m   10.244.57.6   node01   <none>           <none>
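
The break can be confirmed from inside one of the healthy pods (a minimal sketch; it assumes the container image ships ping):

# from web-rich-1 (10.244.5.6): first its flannel-subnet sibling, then web-rich-0
[root@kubernetes-master01 rich]# kubectl exec -n sarich web-rich-1 -- ping -c 2 10.244.57.6  # replies: the overlay works
[root@kubernetes-master01 rich]# kubectl exec -n sarich web-rich-1 -- ping -c 2 172.17.0.5   # times out: web-rich-0 is unreachable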

Additional Findings

It turned out that pods on node5 and node6 were all landing on the 172 subnet.

Troubleshooting

  1. Log in to an affected node and check the routing table: the docker0 interface is, unexpectedly, on the 172 subnet
    [root@kubernetes-master01 ~]# route
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    default         gateway         0.0.0.0         UG    0      0        0 eth0
    10.244.5.0      10.244.5.0      255.255.255.0   UG    0      0        0 flannel.1
    10.244.24.0     10.244.24.0     255.255.255.0   UG    0      0        0 flannel.1
    10.244.30.0     10.244.30.0     255.255.255.0   UG    0      0        0 flannel.1
    10.244.31.0     10.244.31.0     255.255.255.0   UG    0      0        0 flannel.1
    10.244.52.0     10.244.52.0     255.255.255.0   UG    0      0        0 flannel.1
    10.244.57.0     10.244.57.0     255.255.255.0   UG    0      0        0 flannel.1
    172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0  # this route is wrong
    10.244.79.0     10.244.79.0     255.255.255.0   UG    0      0        0 flannel.1
    10.244.86.0     10.244.86.0     255.255.255.0   UG    0      0        0 flannel.1
    link-local      0.0.0.0         255.255.0.0     U     1002   0        0 eth0
    172.18.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
    
  2. The flannel configuration on this node checks out and matches the other nodes; a quick comparison is sketched below.
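    The check below is a minimal sketch; it assumes flannel's lease file is at its default location. Run it on a healthy node and on the affected one and compare:
    [root@kubernetes-master01 ~]# cat /run/flannel/subnet.env   # the per-node options flannel computed at startup
    [root@kubernetes-master01 ~]# ip -d addr show flannel.1     # the VXLAN interface carrying the 10.244.0.0/16 overlay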
  3. Check Docker's status: there is no flannel network information
    [root@kubernetes-master01 ~]# systemctl status docker
    ● docker.service - Docker Application Container Engine
      Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
      Active: active (running) since Mon 2019-12-16 14:05:00 CST; 4 days ago
        Docs: https://docs.docker.com
    Main PID: 2951 (dockerd)
       Tasks: 40
      Memory: 220.2M
      CGroup: /system.slice/docker.service
              └─2951 /usr/bin/dockerd --bip=10.244.64.1/24 --ip-masq=false --mtu=1450 # this is where it goes wrong
    
    Dec 16 14:04:59 kubernetes-master01 dockerd[2951]: time="2019-12-16T14:04:59.803667915+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
    Dec 16 14:04:59 kubernetes-master01 dockerd[2951]: time="2019-12-16T14:04:59.803716188+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000790140, CONNECTING" module=grpc
    Dec 16 14:04:59 kubernetes-master01 dockerd[2951]: time="2019-12-16T14:04:59.803826846+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000790140, READY" module=grpc
    Dec 16 14:05:00 kubernetes-master01 dockerd[2951]: time="2019-12-16T14:05:00.071742027+08:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
    Dec 16 14:05:00 kubernetes-master01 dockerd[2951]: time="2019-12-16T14:05:00.072256296+08:00" level=info msg="Loading containers: start."
    Dec 16 14:05:00 kubernetes-master01 dockerd[2951]: time="2019-12-16T14:05:00.321669668+08:00" level=info msg="Loading containers: done."
    Dec 16 14:05:00 kubernetes-master01 dockerd[2951]: time="2019-12-16T14:05:00.430940916+08:00" level=info msg="Docker daemon" commit=039a7df graphdriver(s)=overlay2 version=18.09.9
    Dec 16 14:05:00 kubernetes-master01 dockerd[2951]: time="2019-12-16T14:05:00.431253641+08:00" level=info msg="Daemon has completed initialization"
    Dec 16 14:05:00 kubernetes-master01 dockerd[2951]: time="2019-12-16T14:05:00.442545522+08:00" level=info msg="API listen on /var/run/docker.sock"
    Dec 16 14:05:00 kubernetes-master01 systemd[1]: Started Docker Application Container Engine.
    

Root Cause

It turned out that when node5 and node6 were added, Docker's networking was wired up through CNM rather than CNI. When flannel is deployed on top of Docker's CNM model, /run/flannel/subnet.env has to be loaded as an environment file for the Docker daemon, and dockerd has to be started with the flannel subnet options.
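
The CNM-side symptom is easy to see on an affected node (a small sketch; the addresses match the outputs above):

[root@kubernetes-master01 ~]# ip addr show docker0     # 172.17.0.1/16: Docker's built-in default bridge
[root@kubernetes-master01 ~]# ip route | grep docker0  # the stray 172.17.0.0/16 route from the table above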

Solution

  1. Edit the Docker unit file
    [root@kubernetes-master01 ~]# vim /usr/lib/systemd/system/docker.service # full file contents below
    # under [Service], add the EnvironmentFile line and put $DOCKER_NETWORK_OPTIONS on the existing ExecStart
    EnvironmentFile=/run/flannel/subnet.env # full file contents below
    ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS -H fd:// --containerd=/run/containerd/containerd.sock
      
  2. Reload systemd and restart Docker
    systemctl daemon-reload
    systemctl restart docker
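
After the restart, docker0 comes up on the flannel-assigned subnet, but pods that already hold a 172.x address keep it until they are recreated. A hedged sketch of the final check (pod and namespace names taken from the outputs above; the -0/-1/-2 suffixes suggest a StatefulSet, whose controller will recreate the deleted pod):

[root@kubernetes-master01 ~]# ip addr show docker0                # now 10.244.64.1/24, matching subnet.env
[root@kubernetes-master01 ~]# kubectl delete pod web-rich-0 -n sarich
[root@kubernetes-master01 ~]# kubectl get pods -n sarich -o wide  # web-rich-0 should return with a 10.244.x address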
      

Full subnet.env contents

DOCKER_OPT_BIP="--bip=10.244.64.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=10.244.64.1/24 --ip-masq=false --mtu=1450"
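
Only the last line is actually consumed here: systemd loads this file via EnvironmentFile= and expands $DOCKER_NETWORK_OPTIONS on the ExecStart line; the three DOCKER_OPT_* entries are the same flags in individual form. A quick way to confirm the flags reached the daemon after the restart:

[root@kubernetes-master01 ~]# ps -fp "$(pidof dockerd)"
# expect the command line to read: /usr/bin/dockerd --bip=10.244.64.1/24 --ip-masq=false --mtu=1450 -H fd:// ...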

Full docker.service contents

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile=/run/flannel/subnet.env
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS  -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target