Quickly setting up a Kubernetes cluster with kubeadm (single master node)
I. Planning before deployment
Host | OS | IP | Docker version | Kubernetes version |
---|---|---|---|---|
k8s-master1 | CentOS 7.9 | 192.168.15.139 | 20.10.12 | 1.23.4-0 |
k8s-node1 | CentOS 7.9 | 192.168.8.135 | 20.10.12 | 1.23.4-0 |
k8s-node2 | CentOS 7.9 | 192.168.8.136 | 20.10.12 | 1.23.4-0 |
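II. Main steps
- Prepare the nodes (all nodes)
- Deploy the Docker container runtime (all nodes)
- Install kubeadm, kubelet, and kubectl (all nodes)
- Initialize the master node (master node only)
- Join the worker nodes to the cluster with kubeadm join (all worker nodes)
III. Node preparation
# Set the hostname (use each node's own name; master1 is shown here)
hostnamectl set-hostname master1
# Synchronize time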
yum install -y chrony
systemctl enable chronyd && systemctl restart chronyd
timedatectl set-ntp true
# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld
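# Disable SELinux: set SELINUX=disabled permanently and switch to permissive mode for the current boot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0
# Turn swap off now and comment out the swap entries in /etc/fstab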
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
# Check whether the br_netfilter module is loaded
lsmod | grep br_netfilter
# Load it explicitly if it is not
sudo modprobe br_netfilter
# Let iptables see bridged traffic
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
# Load the IPVS kernel modules
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
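# Make the script executable, run it, and confirm that the modules are loaded
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
# Install the ipset and ipvsadm management tools
yum install ipset ipvsadm -y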
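The preparation steps above can be spot-checked before moving on. The following is a minimal sketch, not part of kubeadm: the function names are hypothetical, and each function takes the file to inspect as an argument (defaulting to the real /proc path) so that it can also be tried against sample files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the node preparation steps.

# Succeeds when no swap area is active.
# /proc/swaps has one header line followed by one line per active swap area.
swap_is_off() {
  local swaps_file="${1:-/proc/swaps}"
  ! awk 'NR > 1' "$swaps_file" | grep -q .
}

# Succeeds when the given sysctl file contains the value 1,
# for example /proc/sys/net/bridge/bridge-nf-call-iptables.
sysctl_is_enabled() {
  local sysctl_file="${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}"
  [ "$(cat "$sysctl_file" 2>/dev/null)" = "1" ]
}

# Succeeds when the named kernel module is listed in the modules file
# (same layout as /proc/modules: module name in the first column).
module_is_loaded() {
  local module="$1" modules_file="${2:-/proc/modules}"
  awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Usage on a prepared node (not executed here):
#   swap_is_off && module_is_loaded br_netfilter && sysctl_is_enabled && echo "node looks ready"
```

Passing the file path as a parameter keeps the checks side-effect free; nothing on the node is changed.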
IV. Install Docker
# 1. If an older Docker is already installed, remove it first (Kubernetes supports only certain Docker versions)
yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
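# 2. Install the prerequisites
# yum-utils provides yum-config-manager; the device mapper storage driver needs device-mapper-persistent-data and lvm2
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# 3. Add the Docker package repository
# yum-config-manager writes the repo file under /etc/yum.repos.d
# The official Docker repository is used here; substitute a mirror such as Aliyun if it is unreachable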
sudo yum-config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
# 4. List the available versions
yum list docker-ce --showduplicates | sort -r
# 5. Install the latest version...
yum -y install docker-ce docker-ce-cli containerd.io
# ...or install a specific version instead; replace <VERSION_STRING> with a version listed in step 4
yum -y install docker-ce-<VERSION_STRING> docker-ce-cli-<VERSION_STRING> containerd.io
# 6. Start Docker and enable it at boot
systemctl start docker && systemctl enable docker
# 7. Check that Docker is running
docker version
# 8. Configure Docker to use systemd as the cgroup driver
sudo mkdir -p /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
# 9. Restart Docker
sudo systemctl daemon-reload
sudo systemctl restart docker
V. Install kubeadm, kubelet, and kubectl
# The official repository is not reachable from this network, so the Aliyun mirror is used
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF
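# Install kubelet, kubeadm, and kubectl, pinned to the planned version 1.23.4-0
# --disableexcludes=kubernetes lifts the exclude=kube* rule of the kubernetes repo for this one command
yum install -y kubelet-1.23.4-0 kubeadm-1.23.4-0 kubectl-1.23.4-0 --disableexcludes=kubernetes
systemctl enable kubelet && systemctl start kubelet
At this point systemctl status kubelet shows that kubelet has not started successfully. This is expected:
kubelet starts properly once kubeadm init succeeds on the master node, or once a worker node has joined the cluster.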
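VI. Initialize the master node
1. Generate the default init configuration
kubeadm config print init-defaults > kubeadm-init.yaml
2. Edit kubeadm-init.yaml
Change advertiseAddress: 1.2.3.4 to this host's IP address.
Change imageRepository: k8s.gcr.io to imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers (the Aliyun registry).
Change the node name under nodeRegistration; if left alone it stays at the default "node".
If Calico is the network plugin, add podSubnet: 192.168.0.0/16 below serviceSubnet: 10.96.0.0/12.
This range is Calico's default pool. It must not overlap the node network; the Calico "no route to host" item in the appendix shows what happens when it does.
The edited kubeadm-init.yaml: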
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.15.139
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: master1
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.23.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 192.168.0.0/16
scheduler: {}
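3. Run kubeadm init
kubeadm init --config kubeadm-init.yaml
On success, save the kubeadm join command printed at the end of the output; the worker nodes need it to join the cluster:
kubeadm join 192.168.15.139:6443 --token 1812pi.ejiahyyg5978c5oh --discovery-token-ca-cert-hash sha256:06feacafb8dc352f2432e9a121e440840144f1f746bdeb8173274dcb510a7e12
If init fails because a step was missed, reset with the command below and run init again:
kubeadm reset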
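4. Configure kubectl
# Grant access to the cluster
# To run kubectl as root, point KUBECONFIG at the admin kubeconfig
export KUBECONFIG=/etc/kubernetes/admin.conf
# To let a non-root user run kubectl, run the following commands (they are also printed by kubeadm init)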
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
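The node status is NotReady at this point because no network plugin is installed yet.
5. Install a network plugin
Install either Calico or flannel, not both.
Install Calico:
curl https://docs.projectcalico.org/manifests/calico.yaml -O
kubectl apply -f calico.yaml
Install flannel (its manifest defaults to the pod network 10.244.0.0/16, so podSubnet must match that value):
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml
Once the plugin is running, the node status changes to Ready:
kubectl get node
NAME STATUS ROLES AGE VERSION
master01 Ready control-plane,master 4h4m v1.23.3
6. Enable IPVS mode in kube-proxy
# In the kube-proxy ConfigMap in kube-system, set mode: "ipvs" in config.conf
kubectl edit cm kube-proxy -n kube-system
# Then restart the kube-proxy pods on every node
kubectl get pod -n kube-system | grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}'
# Check the logs (substitute the name of one of your own kube-proxy pods)
kubectl logs kube-proxy-2696f -n kube-system
The log line "Using ipvs Proxier" confirms that IPVS mode is enabled.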
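The interactive kubectl edit step can also be scripted. The sketch below assumes the ConfigMap still carries the kubeadm default of an empty mode (mode: ""); the function name set_ipvs_mode is hypothetical. It only rewrites text from stdin to stdout, so the result can be inspected before anything is applied.

```shell
#!/usr/bin/env bash
# Reads a kube-proxy ConfigMap (YAML) on stdin and writes it to stdout
# with an empty proxy mode replaced by "ipvs". Any other mode value is left alone.
set_ipvs_mode() {
  sed -E 's/^([[:space:]]*mode:[[:space:]]*)""[[:space:]]*$/\1"ipvs"/'
}

# Usage on a cluster (not executed here):
#   kubectl get cm kube-proxy -n kube-system -o yaml | set_ipvs_mode | kubectl apply -f -
```

The kube-proxy pods still have to be restarted afterwards, exactly as with the manual edit.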
VII. Join the worker nodes to the cluster
On each worker node, run the kubeadm join command that kubeadm init printed on the master node:
kubeadm join 192.168.15.139:6443 --token 1812pi.ejiahyyg5978c5oh --discovery-token-ca-cert-hash sha256:06feacafb8dc352f2432e9a121e440840144f1f746bdeb8173274dcb510a7e12
If you did not save the command, print a new one on the master node:
kubeadm token create --print-join-command
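A join command copied out of a web page or chat often arrives with its double hyphens turned into a typographic dash, which kubeadm rejects. As a rough sanity check, the sketch below validates a pasted command before it is run. The function name check_join_command is hypothetical; the token and hash patterns follow the shapes visible in the command above (a 6.16-character token and a sha256: hash of 64 hex digits).

```shell
#!/usr/bin/env bash
# Hypothetical validator for a pasted "kubeadm join" command line.
# Prints "ok", or a reason plus a non-zero status when the command looks damaged.
check_join_command() {
  local cmd="$1"
  # Reject any byte outside printable ASCII, such as an en dash pasted instead of "--".
  if printf '%s' "$cmd" | LC_ALL=C grep -q '[^ -~]'; then
    echo "non-ASCII character found (check the dashes)"
    return 1
  fi
  # Bootstrap tokens look like <6 chars>.<16 chars>, lowercase letters and digits only.
  if ! printf '%s' "$cmd" | grep -Eq -- '--token [a-z0-9]{6}\.[a-z0-9]{16}( |$)'; then
    echo "missing or malformed --token"
    return 1
  fi
  # The CA hash is "sha256:" followed by 64 hex digits.
  if ! printf '%s' "$cmd" | grep -Eq -- '--discovery-token-ca-cert-hash sha256:[0-9a-f]{64}( |$)'; then
    echo "missing or malformed --discovery-token-ca-cert-hash"
    return 1
  fi
  echo "ok"
}

# Usage (not executed here):
#   check_join_command "$(kubeadm token create --print-join-command)"
```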
Removing a node
Deleting a node outright makes the services running on it unavailable, so first evict all of its pods with kubectl drain, then delete the node:
kubectl drain nodename --delete-emptydir-data --force --ignore-daemonsets
kubectl delete node nodename
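The two commands belong together, and the delete should not run if the drain fails. A small wrapper along these lines makes that explicit. It is a sketch: remove_node and the KUBECTL override are hypothetical names, and it uses --delete-emptydir-data, the current name of the flag that older kubectl releases call --delete-local-data.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: drain a node, then delete it from the cluster.
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
remove_node() {
  local node="$1"
  local kubectl="${KUBECTL:-kubectl}"
  if [ -z "$node" ]; then
    echo "usage: remove_node <node-name>" >&2
    return 2
  fi
  # Stop here if the drain fails, so a node that still runs workloads is not deleted.
  "$kubectl" drain "$node" --delete-emptydir-data --force --ignore-daemonsets || return 1
  "$kubectl" delete node "$node"
}

# Preview without touching the cluster (not executed here):
#   KUBECTL=echo remove_node node2
```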
-------------------- (Everything below is optional) --------------------
VIII. Install the Dashboard
- Download the YAML and run the Dashboard
1. Download the YAML
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml
2. Change the type of the kubernetes-dashboard Service
vim recommended.yaml
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort # added
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30009 # added
  selector:
    k8s-app: kubernetes-dashboard
3. Deploy
kubectl create -f recommended.yaml
4. List the resources in the kubernetes-dashboard namespace
[root@master1 kubernetes]# kubectl get pod,svc -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-79459f84f-v5995 1/1 Running 0 60s
pod/kubernetes-dashboard-76dc96b85f-tqctb 1/1 Running 0 60s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 10.99.123.73 <none> 8000/TCP 60s
service/kubernetes-dashboard NodePort 10.103.73.202 <none> 443:30009/TCP 60s
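If the download in step 1 fails (for example because raw.githubusercontent.com cannot be resolved), add a hosts entry:
vim /etc/hosts
199.232.96.133 raw.githubusercontent.com
- Create an access account and get its token
# Create the service account
kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
# Grant it the cluster-admin role
kubectl create clusterrolebinding dashboard-admin-rb --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin
# Get the account's token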
[root@master1 kubernetes]# kubectl get secrets -n kubernetes-dashboard | grep dashboard-admin
dashboard-admin-token-hmphn kubernetes.io/service-account-token 3 57s
[root@master1 mnt]# kubectl describe secrets dashboard-admin-token-bssq7 -n kubernetes-dashboard
Name: dashboard-admin-token-bssq7
Namespace: kubernetes-dashboard
Labels: <none>
Annotations: kubernetes.io/service-account.name: dashboard-admin
kubernetes.io/service-account.uid: c81ce85c-1903-4fd9-97df-e66ee8cba593
Type: kubernetes.io/service-account-token
Data
====
token: eyJhbGciOiJSUzI1NiIsImtpZCI6ImVyV1g5dW5uR2NHVVd3ZkkzcEtST2ViOUIzbXVUSmlPcEVlYkNSZEFOd28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tdGo1anMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMGYwNTFhOWQtOWYxYy00MDdiLTgwZDYtNTVlN2EzYmZkNjM4Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmVybmV0ZXMtZGFzaGJvYXJkOmRhc2hib2FyZC1hZG1pbiJ9.Cc-uva1RsQYmJb3bN2BTVmzUyklfzYM4qd9l5caz4XFWtplZT3kNmNELX_N9X8dg7lb-h9pOptIFA1FeuEVU5Q0mMeuV5PVQlZAUs3OUAW4A9R4HQ-f5_4UIXAGCz5hSf55ChwmOxLsSi16orFnfR96YIC-uQvY7VVP_KJB2oIhhraX-Mbzu-LzOSrSIjhhmf3HBTPud9H3GoLZUyNGrG6VNzkG6XUanF2P36aLLolq8V-7IPRezKGnjhF7W3cjPDxj0vzdwVd9IAOhMDeXkXU011GuW04YSRJ4FzMjQVASaB3GVj7c4-tSINCv3Wto9o48PVC6tsloQuoxzwsr_CQ
ca.crt: 1099 bytes
namespace: 20 bytes
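The grep and describe steps can be collapsed into one helper that prints only the token. This is a sketch under one assumption: the cluster still auto-creates a token Secret for each ServiceAccount, which holds for the v1.23 release used here but not from v1.24 on. The function name dashboard_token is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bearer token of a Dashboard service account.
# Assumes the ServiceAccount has an auto-generated token Secret (Kubernetes <= v1.23).
dashboard_token() {
  local ns="kubernetes-dashboard" sa="${1:-dashboard-admin}" secret
  # Name of the first Secret attached to the ServiceAccount.
  secret=$(kubectl -n "$ns" get serviceaccount "$sa" -o jsonpath='{.secrets[0].name}') || return 1
  if [ -z "$secret" ]; then
    echo "no token secret found for $sa" >&2
    return 1
  fi
  # Secret data is base64-encoded; decode it to get the usable token.
  kubectl -n "$ns" get secret "$secret" -o jsonpath='{.data.token}' | base64 -d
}

# Usage on the master node (not executed here):
#   dashboard_token
```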
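- Open the Dashboard UI in a browser
Browse to https://<node IP>:30009 (the NodePort set above) and enter the token on the login page.
If the Dashboard page loads after login, the installation succeeded.
IX. Install kubectl command completion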
yum install -y bash-completion
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
X. Install Helm
Helm versions supported by each Kubernetes version:
https://helm.sh/zh/docs/topics/version_skew/
1. Install the Helm client
wget https://get.helm.sh/helm-v3.8.0-linux-amd64.tar.gz
tar xf helm-v3.8.0-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
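2. Add repositories
# Add the Aliyun chart repository
helm repo add aliyun https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
# Add the Bitnami chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
# Add the ingress-nginx chart repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
3. Update the repositories
# Refresh the chart repositories
helm repo update
# List the configured chart repositories
helm repo list
4. Remove a repository
# Remove a chart repository
helm repo remove aliyun
5. Basic Helm usage
Search for and download a chart
# Find memcached in the Aliyun chart repository
helm search repo aliyun |grep memcached
# Show the chart's metadata
helm show chart aliyun/memcached
# Download the chart package to the local directory
helm pull aliyun/memcached
# Install the chart: helm install <release-name> <repo>/<chart> -n <namespace>
helm install memcached aliyun/memcached -n namespace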
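XI. Install an ingress controller
An ingress provides layer-7 load balancing; install it only if you need it.
Option 1: install from a YAML file
1. Download deploy.yaml from GitHub
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/baremetal/1.19/deploy.yaml
# The images referenced in deploy.yaml cannot be pulled from mainland China, so switch them to a mirror
vi deploy.yaml
# Three image fields need changing. Replace each original image with the mirror listed below it:
k8s.gcr.io/ingress-nginx/controller:v1.1.1 (first occurrence), replace with:
anjia0532/google-containers.ingress-nginx.controller:v1.1.1
k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1 (second and third occurrences), replace with:
anjia0532/google-containers.ingress-nginx.kube-webhook-certgen:v1.1.1
# Deploy
kubectl apply -f deploy.yaml
# Check the result
kubectl get pod,svc -n ingress-nginx
# Finally, remember to open the ports exposed by the Service in any firewall in front of the nodes
Option 2: install with Helm
1. Search for the chart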
[root@master1 ~]# helm search repo bitnami |grep ingress
bitnami/contour 7.3.11 1.20.1 Contour is an open source Kubernetes ingress co...
bitnami/nginx-ingress-controller 9.1.9 1.1.1 NGINX
2. Pull the chart; the nginx-ingress-controller chart from the Bitnami repository is used here
helm pull bitnami/nginx-ingress-controller
3. Install the chart (create the ingress-nginx namespace beforehand)
helm install nginx-ingress-controller bitnami/nginx-ingress-controller -n ingress-nginx
4. Check the result
kubectl get pod,svc -n ingress-nginx
XII. Install metrics-server
1. Download the YAML
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
2. Edit the YAML
Change the image to bitnami/metrics-server.
Add the --kubelet-insecure-tls argument so that metrics-server skips verification of the kubelet serving certificates:
spec:
  containers:
  - args:
    .........
    .........
    - --kubelet-insecure-tls
    image: bitnami/metrics-server
3. Deploy
kubectl apply -f components.yaml
4. Verify
kubectl top node
kubectl top pod
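XIII. Deploy a StorageClass
Prerequisite: an NFS server (see the earlier NFS deployment guide)
Create the shared directory
mkdir -p /nfs/k8s && chmod 777 /nfs/k8s
Edit the export configuration (no space between * and the options, otherwise the options do not apply to the * clients)
vim /etc/exports
/nfs/k8s *(rw,async,no_subtree_check)
Apply the configuration
exportfs -r
List the exports offered by the NFS server
showmount -e 192.168.15.139
1. Authorize with RBAC
Create a ServiceAccount for nfs-client-provisioner and bind the required permissions to it (rbac.yaml):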
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storgeclass
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update", "create"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: storgeclass
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storgeclass
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storgeclass
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: storgeclass
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
# Create the storgeclass namespace first if it does not exist yet, then apply the manifest
kubectl create namespace storgeclass
kubectl apply -f rbac.yaml
2. Create the Deployment (nfs-client-provisioner.yaml)
Change the NFS server IP and the shared directory to match your environment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storgeclass
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: quay.io/external_storage/nfs-client-provisioner:latest
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: fuseim.pri/ifs
            - name: NFS_SERVER
              value: 192.168.15.139
            - name: NFS_PATH
              value: /nfs/k8s
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.15.139
            path: /nfs/k8s
kubectl apply -f nfs-client-provisioner.yaml
3. Create the default StorageClass (storageclass.yaml)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: fuseim.pri/ifs # or choose another name; it must match the Deployment's PROVISIONER_NAME env value
parameters:
  archiveOnDelete: "false"
kubectl apply -f storageclass.yaml
4. Verify
[root@master1 storgeclass]# kubectl get pod -n storgeclass
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-57b584586b-v4xz7 1/1 Running 0 4h1m
[root@master1 storgeclass]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
managed-nfs-storage (default) fuseim.pri/ifs Delete Immediate false 178m
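A Running provisioner pod and a listed StorageClass do not yet prove that volumes are actually provisioned. One way to check, sketched below, is to create a throwaway claim and watch whether it becomes Bound. The function name make_test_pvc, the claim name test-claim, and the 1Mi size are all arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# Prints a small PersistentVolumeClaim manifest for testing dynamic provisioning.
# The claim name "test-claim" and the 1Mi request are arbitrary values for this sketch.
make_test_pvc() {
  local storage_class="${1:-managed-nfs-storage}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: ${storage_class}
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
}

# Usage on a cluster (not executed here):
#   make_test_pvc | kubectl apply -f -
#   kubectl get pvc test-claim            # STATUS should become Bound
#   make_test_pvc | kubectl delete -f -
```

If the claim stays Pending, the provisioner log is the first place to look; the appendix covers one such failure.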
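XIV. Deploy KubeSphere
KubeSphere needs a StorageClass, so deploy one first (see the StorageClass section above).
1. Deploy
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.2.1/kubesphere-installer.yaml
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.2.1/cluster-configuration.yaml
2. Edit cluster-configuration.yaml
To customize the installation, download cluster-configuration.yaml, edit it, and apply the local copy instead of the URL above.
Change localhost in endpointIps to the address of etcd:
endpointIps: 192.168.15.139 # etcd cluster EndpointIps. It can be a bunch of IPs here.
port: 2379 # etcd port.
To enable pluggable components, see the official documentation:
https://kubesphere.com.cn/docs/pluggable-components/devops/
3. Check the installation log
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f
The installation has succeeded when the log prints the console address and the account shown in step 4.
4. Access KubeSphere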
Console: http://192.168.15.139:30880
Account: admin
Password: P@88w0rd
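Appendix:
1. The certificates that Kubernetes needs to serve requests securely are generated by kubeadm and stored in /etc/kubernetes/pki on the master node.
For example, when a user runs a streaming operation such as fetching container logs with kubectl, kube-apiserver sends a request to kubelet, and that connection must also be secure. For this step kubeadm generates apiserver-kubelet-client.crt, with the private key apiserver-kubelet-client.key.
The full list of certificate files follows; the files ending in .key are private keys: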
apiserver.crt
apiserver-etcd-client.crt
apiserver-etcd-client.key
apiserver.key
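apiserver-kubelet-client.crt # used by kube-apiserver for requests to kubelet (for example when kubectl fetches container logs)
apiserver-kubelet-client.key # private key for apiserver-kubelet-client.crt
ca.crt # the cluster root CA certificate
ca.key # private key for ca.crt
etcd # this is a directory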
front-proxy-ca.crt
front-proxy-ca.key
front-proxy-client.crt
front-proxy-client.key
sa.key
sa.pub
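kubeadm also generates configuration files in /etc/kubernetes/. They record the master node's server address, listening port, certificate directory, and so on, so that kubelet, kubectl, and the scheduler can load the matching conf file and use it to connect securely to the API server.
A separate kubeconfig file named admin.conf is generated for administrative use. For the remaining steps, see the official documentation.
# Full list of configuration files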
[root@master01 kubernetes]# ll /etc/kubernetes/|grep conf|awk '{print $NF}'
admin.conf
controller-manager.conf
kubelet.conf
scheduler.conf
1. Deleting a namespace reports "The resource may continue to run on the cluster indefinitely"
kubesphere-monitoring-federated Terminating 7h28m
If the namespace stays in Terminating and even
kubectl delete ns kubesphere-monitoring-federated --force --grace-period=0
does not remove it, edit the namespace:
kubectl edit ns kubesphere-monitoring-federated
and remove the finalizer entry
finalizers:
- finalizers.kubesphere.io/namespaces
The namespace is then deleted normally.
2. Dynamic PV provisioning through the StorageClass fails
Background: after installing KubeSphere, the Prometheus pods never start
kubectl get pod -n kubesphere-monitoring-system
.......
prometheus-k8s-0 0/2 Pending 0 3h47m
prometheus-k8s-1 0/2 Pending 0 3h47m
..........
The pod events report:
pod has unbound immediate PersistentVolumeClaims
The PVCs are stuck in Pending:
kubectl get pvc -n kubesphere-monitoring-system
.........
prometheus-k8s-db-prometheus-k8s-0 Pending managed-nfs-storage 16h
prometheus-k8s-db-prometheus-k8s-1 Pending managed-nfs-storage 16h
.........
The PVs were never created automatically (PVs are cluster-scoped, so no namespace is needed):
kubectl get pv
The nfs-client-provisioner log shows the error:
kubectl logs -n storgeclass nfs-client-provisioner-57b584586b-v4xz7
.........
unexpected error getting claim reference: selfLink was empty, can't make reference
.........
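Cause:
The selfLink field is populated in Kubernetes releases before v1.20. From v1.20 on it is disabled by default (the RemoveSelfLink feature gate defaults to true), and this provisioner image still depends on it. On v1.20 to v1.23 the field can be restored by adding a flag to /etc/kubernetes/manifests/kube-apiserver.yaml:
Add --feature-gates=RemoveSelfLink=false
vim /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    ...........
    - kube-apiserver
    - --feature-gates=RemoveSelfLink=false # added
    ..............
kube-apiserver runs as a static pod, so kubelet restarts it automatically once the manifest is saved; do not run kubectl apply on this file.
This resolves the problem.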
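To confirm that the flag really made it into the manifest, and was not left commented out, a quick text check is enough. The function name has_selflink_gate is hypothetical; the default path is the manifest location used above.

```shell
#!/usr/bin/env bash
# Succeeds when the kube-apiserver manifest sets RemoveSelfLink=false
# on an active (uncommented) --feature-gates argument.
has_selflink_gate() {
  local manifest="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  grep -Eq -- '^[[:space:]]*-[[:space:]]+--feature-gates=([^#[:space:]]*,)?RemoveSelfLink=false' "$manifest"
}

# Usage on the master node (not executed here):
#   has_selflink_gate && echo "RemoveSelfLink=false is set"
```

The pattern also accepts the gate when it follows other comma-separated gates in the same argument.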
3. Fixing the monitoring failure:
kubectl get pod -n kubesphere-monitoring-system
prometheus-k8s-0 0/2 Pending 0 3h47m
prometheus-k8s-1 0/2 Pending 0 3h47m
The Prometheus pods never start:
kubectl describe pod prometheus-k8s-0 -n kubesphere-monitoring-system
Warning FailedScheduling 94s (x237 over 3h57m) default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
The cause is a missing etcd client certificate secret.
# Locate the etcd client certificates used for monitoring
ps -ef | grep kube-apiserver
...........
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
...........
Fix: create the secret from those certificates
kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs --from-file=etcd-client-ca.crt=/etc/kubernetes/pki/etcd/ca.crt --from-file=etcd-client.crt=/etc/kubernetes/pki/apiserver-etcd-client.crt --from-file=etcd-client.key=/etc/kubernetes/pki/apiserver-etcd-client.key
4. After installing the Calico network plugin, cluster nodes cannot reach each other (no route to host)
Background: after Calico was installed, the master node could no longer ping the other nodes although it could still reach the internet. One Calico pod also failed to start, logging: calico/node is not ready: BIRD is not ready: BGP not established with 192.168.8.xxx,192.168.8.xxx
[root@master1 ~]# ping 192.168.8.131
connect: No route to host
[root@master1 ~]# ping 192.168.8.132
connect: No route to host
The nodes flap between Ready and NotReady:
[root@master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 22d v1.23.4
node1 NotReady <none> 22d v1.23.4
node2 NotReady <none> 22d v1.23.4
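Root cause
The Pod CIDR conflicts with the node IPs. Calico's Pod CIDR (the --pod-network-cidr value) defaults to 192.168.0.0/16, and when the cluster nodes also sit in 192.168.0.0/16 the two ranges inevitably collide.
When the pod subnet and the host network overlap, node-to-node and pod-to-pod traffic is interrupted by routing problems. Check the network settings carefully and make sure the Pod CIDR does not overlap any VLAN or VPC range. If it does, specify a different IP range in the CNI plugin or through the kubelet pod-cidr parameter.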
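Whether two ranges collide can be checked before the cluster is built. The sketch below does the IPv4 arithmetic in plain bash; the function names are hypothetical and the helper is not part of Calico or kubeadm. A single node address is passed as a /32.

```shell
#!/usr/bin/env bash
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the first and last address of a CIDR block as two integers.
cidr_range() {
  local ip="${1%/*}" prefix="${1#*/}"
  local n mask start end
  n=$(ip_to_int "$ip")
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  start=$(( n & mask ))
  end=$(( start | (~mask & 0xFFFFFFFF) ))
  echo "$start $end"
}

# Succeed when the two CIDR blocks share at least one address.
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<<"$(cidr_range "$1")"
  read -r s2 e2 <<<"$(cidr_range "$2")"
  (( s1 <= e2 && s2 <= e1 ))
}

# Example with the addresses from this guide:
if cidrs_overlap 192.168.0.0/16 192.168.8.135/32; then
  echo "Pod CIDR 192.168.0.0/16 overlaps node 192.168.8.135"
fi
```

With the planned node address 192.168.8.135 the default Calico pool overlaps, while 172.16.0.0/16 does not.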
Solution
Reconfigure Calico's Pod CIDR
vim calico.yaml
..............
- name: CALICO_IPV4POOL_CIDR
  #value: "192.168.0.0/16"
  value: "172.16.0.0/16"
..............
kubectl delete -f calico.yaml
kubectl apply -f calico.yaml
After this change all Calico pods start successfully and the nodes can ping each other again, which resolves the problem.