Installing and Deploying Kubernetes 1.14.2 on CentOS 7 with kubeadm (comprehensive guide, 2019-05-28)

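GPUs: 2080 Ti × 4

Kubernetes is usually installed in one of two ways: with kubeadm, or from binaries (comparatively tedious).

Test environment:

  • CentOS 7.6, master, 192.168.0.108

  • CentOS 7.6, node1, 192.168.0.107

  • CentOS 7.6, node2, 192.168.0.109

  • Network plugin: Calico

Steps: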

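1. Install CentOS 7.6 (all hosts)

2. Install the NVIDIA GPU driver (all hosts)

If a driver was installed previously, remove it first. List all installed packages:

    yum list installed

Find the packages whose names start with or relate to NVIDIA and remove each of them:

    yum remove -y <package>

If no driver was installed before, skip the removal above. Run the following steps as root.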

(1) Add the ELRepo repository

    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm

(2) Install the driver detection tool

    yum install -y nvidia-detect

(3) Run nvidia-detect -v. The output shows the driver version to install.

(4) Download the .run installer for that version from:

https://www.nvidia.com/Download/index.aspx?lang=en-us

(5) Prepare for the installation

    yum -y update    # note: this upgrades the whole system, use with discretion
    yum -y groupinstall "Development Tools"    # common build tools
    yum -y install kernel-devel
    yum -y install epel-release
    yum -y install dkms

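(6) Edit the GRUB configuration

    vi /etc/default/grub

Add the following to the value of GRUB_CMDLINE_LINUX:

    rd.driver.blacklist=nouveau nouveau.modeset=0

Then regenerate the GRUB configuration (on UEFI systems the output file is /boot/efi/EFI/centos/grub.cfg instead):

    grub2-mkconfig -o /boot/grub2/grub.cfg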
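The manual edit in this step can also be scripted. What follows is only a minimal sketch, not part of the original procedure: add_grub_args is a hypothetical helper that appends the parameters to the GRUB_CMDLINE_LINUX line of a given file and leaves the file alone if they are already there. It assumes the value is double-quoted on a single line, as in the stock CentOS 7 file, and that the arguments contain no characters that are special to sed.

```bash
#!/bin/bash
# add_grub_args FILE ARGS (hypothetical helper, for illustration only)
# Appends ARGS inside the quoted value of GRUB_CMDLINE_LINUX in FILE.
# Assumes the value is double-quoted and sits on one line.
add_grub_args() {
  local file="$1" args="$2" tmp
  # Nothing to do if the arguments are already on the line.
  if grep -q "^GRUB_CMDLINE_LINUX=.*${args}" "$file"; then
    return 0
  fi
  tmp=$(mktemp)
  # Insert the arguments just before the closing quote, then write the
  # result back into the original file so its owner and mode are kept.
  sed "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 ${args}\"/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

It would be called as add_grub_args /etc/default/grub "rd.driver.blacklist=nouveau nouveau.modeset=0", followed by the grub2-mkconfig command from this step.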

(7) Blacklist nouveau

    vi /etc/modprobe.d/blacklist.conf

Add the line:

    blacklist nouveau

(8) Rebuild the initramfs

Back up the current image, then generate a new one:

    mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
    dracut /boot/initramfs-$(uname -r).img $(uname -r)

(9) Reboot

    reboot

(10) Confirm that nouveau is disabled

    lsmod | grep nouveau

No output means nouveau was disabled successfully.

(11) Run the installer

    sh NVIDIA-Linux-x86_64-418.74.run
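Steps (10) and (11) can be chained so that the installer only starts once nouveau is really gone. A small sketch, assuming nothing beyond the standard lsmod output format (module name in the first column); nouveau_absent is a hypothetical helper name:

```bash
#!/bin/bash
# nouveau_absent (hypothetical helper): reads `lsmod` output on stdin and
# succeeds only if no loaded module is named exactly "nouveau".
nouveau_absent() {
  ! awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Possible usage:
#   lsmod | nouveau_absent && sh NVIDIA-Linux-x86_64-418.74.run
```

Matching on the first column avoids a false alarm from other modules that merely list nouveau in their "Used by" column.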

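3. Install CUDA (all hosts)

(1) Check whether the gcc and g++ compilers are installed (on CentOS the g++ compiler is provided by the gcc-c++ package)

    yum list installed | grep gcc
    yum list installed | grep gcc-c++

(2) Install gcc and g++

    yum install gcc
    yum install gcc-c++

(3) Download CUDA from:

https://developer.nvidia.com/cuda-downloads

(4) Install CUDA by following the installer prompts.

(5) Edit ~/.bashrc

    vi ~/.bashrc

Add the following lines (adjust cuda-<version> to match the version you installed):

export PATH=$PATH:/usr/local/cuda-10.1/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64

Save and exit, then apply the change to the current shell:

    source ~/.bashrc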
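If this guide is run more than once, the same export lines pile up in .bashrc. One way around that is a small guard like the sketch below; append_once is a hypothetical helper, and the paths are the cuda-10.1 ones used above.

```bash
#!/bin/bash
# append_once LINE FILE (hypothetical helper): appends LINE to FILE only
# if FILE does not already contain exactly that line.
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Possible usage (single quotes keep $PATH literal in the file):
#   append_once 'export PATH=$PATH:/usr/local/cuda-10.1/bin' ~/.bashrc
#   append_once 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64' ~/.bashrc
```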

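4. Install cuDNN

(1) Register with NVIDIA and download cuDNN. (The downloaded file may have a long, unfamiliar extension. Rename it to .tgz.)

(2) Extract the archive

    tar -xzvf cudnn-10.1-linux-x64-v7.5.1.10.tgz

(3) Copy the files into the CUDA directories (adjust the version number as needed)

sudo cp cuda/include/cudnn.h /usr/local/cuda-10.1/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.1/lib64

(4) Make the files readable by all users

    sudo chmod a+r /usr/local/cuda-10.1/include/cudnn.h /usr/local/cuda-10.1/lib64/libcudnn*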
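To confirm that the copied header is the version you expect, the version macros can be read back from it. This is a sketch under one assumption: that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, which holds for the cuDNN 7 release used here. cudnn_version is a hypothetical helper name.

```bash
#!/bin/bash
# cudnn_version HEADER (hypothetical helper): prints MAJOR.MINOR.PATCH as
# defined by the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros.
cudnn_version() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { print major "." minor "." patch }
  ' "$1"
}

# Possible usage:
#   cudnn_version /usr/local/cuda-10.1/include/cudnn.h
```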

5. Install Kubernetes

(1) Prepare the environment (all hosts)

Stop and disable firewalld:

    systemctl stop firewalld && systemctl disable firewalld

Disable SELinux:

    setenforce 0 && sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config

Turn off swap and comment out its entry in /etc/fstab (the pattern assumes the default centos-swap LVM volume):

    swapoff -a && sed -i "s/\/dev\/mapper\/centos-swap/\#\/dev\/mapper\/centos-swap/g" /etc/fstab

Switch to the Aliyun yum mirror, backing up the existing repo file first:

    mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
    wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

Add the following entries to /etc/hosts:

    192.168.0.108 master
    192.168.0.107 node1
    192.168.0.109 node2
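The swap command in this step only matches a swap device named /dev/mapper/centos-swap. On hosts partitioned differently (for example with a UUID= entry) a more general variant may be needed. The sketch below is one possibility, not the original author's method; comment_swap_entries is a hypothetical helper that comments out every active line whose fields include the word swap.

```bash
#!/bin/bash
# comment_swap_entries FILE (hypothetical helper): prefixes "#" to every
# line of FILE that is not already a comment and contains "swap" as a
# whitespace-delimited field. Running it twice changes nothing further.
comment_swap_entries() {
  local file="$1" tmp
  tmp=$(mktemp)
  sed -e '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Possible usage:
#   swapoff -a && comment_swap_entries /etc/fstab
```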

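(2) Install the Docker engine (all hosts)

Remove old Docker versions:

    yum remove docker docker-common docker-selinux docker-engine

Install the dependencies:

    yum install -y yum-utils device-mapper-persistent-data lvm2

Add the Aliyun docker-ce repository:

yum-config-manager \
    --add-repo \
    https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

Install docker-ce, then enable and start it (the package does not start the service on its own):

    yum makecache fast
    yum install -y docker-ce
    systemctl enable docker && systemctl start docker

Install nvidia-docker2. First remove any old nvidia-docker 1.0 installation together with its containers:

docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

Add the nvidia-docker repository:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo

Install the package and reload the Docker daemon configuration:

    yum install -y nvidia-docker2
    sudo pkill -SIGHUP dockerd

Adjust the Docker daemon settings. Note that dockerd reads /etc/docker/daemon.json by default, so of the two files written below it is the second one that takes effect:

mkdir -p /etc/docker
tee /etc/docker/daemon.conf <<-'EOF'
{
  "registry-mirrors": ["https://5twf62k1.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
tee /etc/docker/daemon.json <<-'EOF'
{
    "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"],
    "exec-opts": ["native.cgroupdriver=systemd"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

Reload the daemon configuration again:

    sudo pkill -SIGHUP dockerd

Test nvidia-smi inside a container. The output should match what nvidia-smi prints on the host:

    docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Check Docker's cgroup driver:

    docker info | grep Cgroup

The expected output is:

    Cgroup Driver: systemd

Warnings may appear here, and a web search usually resolves them. If the driver is still shown as cgroupfs, restart Docker with systemctl restart docker, because exec-opts is not among the settings that a SIGHUP reloads.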
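Both settings that matter for the rest of this guide (the systemd cgroup driver and nvidia as the default runtime) can be checked in one go. The sketch below relies on docker info accepting a Go template via --format and on the CgroupDriver and DefaultRuntime fields of its output. docker_ready_for_gpu is a hypothetical helper name.

```bash
#!/bin/bash
# docker_ready_for_gpu (hypothetical helper): succeeds only if Docker
# reports the systemd cgroup driver and nvidia as its default runtime.
docker_ready_for_gpu() {
  local info
  info=$(docker info --format '{{.CgroupDriver}} {{.DefaultRuntime}}') || return 1
  [ "$info" = "systemd nvidia" ]
}

# Possible usage:
#   docker_ready_for_gpu || echo "check /etc/docker/daemon.json and restart docker"
```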

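(3) Install the Kubernetes bootstrap tools (all hosts)

Use the Aliyun Kubernetes repository:

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Install the packages. At the time of writing this installed 1.14.2, the latest version. To reproduce this guide later, pin the version, for example kubelet-1.14.2 kubeadm-1.14.2 kubectl-1.14.2.

    yum install -y kubelet kubeadm kubectl

Enable and start kubelet:

    systemctl enable kubelet && systemctl start kubelet

At this point systemctl status kubelet may report a failure. kubelet starts successfully once the cluster has been initialized in a later step.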

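(4) Pre-pull the images (on the master)

List the images, with version tags, that cluster initialization requires:

    kubeadm config images list

These images are hosted on k8s.gcr.io, which is blocked in mainland China, so they have to be downloaded separately before the cluster can be initialized. Sample output (it shows v1.14.1; use the versions that your own kubeadm reports):

……
k8s.gcr.io/kube-apiserver:v1.14.1
k8s.gcr.io/kube-controller-manager:v1.14.1
k8s.gcr.io/kube-scheduler:v1.14.1
k8s.gcr.io/kube-proxy:v1.14.1
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.10
k8s.gcr.io/coredns:1.3.1

Download script:

#!/bin/bash

set -e

# Image versions. They must match the Kubernetes version being installed.
KUBE_VERSION=v1.14.2
KUBE_PAUSE_VERSION=3.1
ETCD_VERSION=3.3.10
CORE_DNS_VERSION=1.3.1

GCR_URL=k8s.gcr.io
ALIYUN_URL=registry.cn-hangzhou.aliyuncs.com/google_containers

images=(kube-proxy:${KUBE_VERSION}
kube-scheduler:${KUBE_VERSION}
kube-controller-manager:${KUBE_VERSION}
kube-apiserver:${KUBE_VERSION}
pause:${KUBE_PAUSE_VERSION}
etcd:${ETCD_VERSION}
coredns:${CORE_DNS_VERSION})

# Pull each image from the Aliyun mirror, retag it with the k8s.gcr.io name
# that kubeadm expects, then remove the mirror tag.
for imageName in "${images[@]}"; do
  docker pull "$ALIYUN_URL/$imageName"
  docker tag "$ALIYUN_URL/$imageName" "$GCR_URL/$imageName"
  docker rmi "$ALIYUN_URL/$imageName"
done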
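The version numbers in the script above are hard-coded and have to be kept in sync with kubeadm by hand. As an alternative sketch, the image list can be taken straight from kubeadm config images list and rewritten to the mirror. This assumes the Aliyun mirror carries every image under the same name and tag as k8s.gcr.io, which the script above already relies on. to_mirror and pull_via_mirror are hypothetical helper names.

```bash
#!/bin/bash
ALIYUN_URL=registry.cn-hangzhou.aliyuncs.com/google_containers

# to_mirror IMAGE (hypothetical helper): swaps the k8s.gcr.io prefix of an
# image reference for the mirror prefix.
to_mirror() {
  local image="$1"
  printf '%s/%s\n' "$ALIYUN_URL" "${image#k8s.gcr.io/}"
}

# pull_via_mirror (hypothetical helper): reads image references on stdin,
# one per line, pulls each from the mirror and retags it under its
# original k8s.gcr.io name. Lines for other registries are skipped.
pull_via_mirror() {
  local image mirror
  while read -r image; do
    case "$image" in
      k8s.gcr.io/*) ;;
      *) continue ;;
    esac
    mirror=$(to_mirror "$image")
    docker pull "$mirror"
    docker tag "$mirror" "$image"
    docker rmi "$mirror"
  done
}

# Possible usage:
#   kubeadm config images list --kubernetes-version=v1.14.2 | pull_via_mirror
```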

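(5) Initialize the cluster (on the master)

    kubeadm init --kubernetes-version=v1.14.2 --pod-network-cidr=192.168.0.0/16

Initialization has succeeded when the output ends with a line like kubeadm join 192.168.0.108:6443 --token …

Note that 192.168.0.0/16 is Calico's default pod range, and here it overlaps the hosts' own 192.168.0.x addresses. Calico's documentation advises choosing a different pod CIDR when the default range is already in use on your network.

The output also lists the remaining required steps and the command for joining nodes to the cluster. Follow them:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

List the running pods:

    kubectl get pod -n kube-system -owide

Some pods are still Pending because no network plugin is installed yet. Once Calico is installed in the next step they change to Running (this can take a while, so check again after a few minutes).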

(6) Install Calico (on the master)

kubectl apply -f \
https://docs.projectcalico.org/v3.5/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml

Check the node status:

    kubectl get node -owide

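(7) Join the cluster (on each worker node)

First pull the required images on every node that is going to join. Download script:

#!/bin/bash

set -e

# Must match the Kubernetes version initialized on the master.
KUBE_VERSION=v1.14.2
KUBE_PAUSE_VERSION=3.1

GCR_URL=k8s.gcr.io
ALIYUN_URL=registry.cn-hangzhou.aliyuncs.com/google_containers

images=(kube-proxy:${KUBE_VERSION}
pause:${KUBE_PAUSE_VERSION})

# Pull from the Aliyun mirror and retag with the k8s.gcr.io name.
for imageName in "${images[@]}"; do
  docker pull "$ALIYUN_URL/$imageName"
  docker tag "$ALIYUN_URL/$imageName" "$GCR_URL/$imageName"
  docker rmi "$ALIYUN_URL/$imageName"
done

Then copy the join command from the master's kubeadm init output and run it on the worker node:

    kubeadm join 192.168.0.108:6443 --token …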

(8) Check the status of each node (on the master)

    kubectl get nodes
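Nodes usually take a little while to become Ready after joining, so it can be handy to poll instead of re-running the command by hand. A sketch, assuming the usual column layout of kubectl get nodes (NAME first, STATUS second); all_nodes_ready is a hypothetical helper name.

```bash
#!/bin/bash
# all_nodes_ready (hypothetical helper): reads the output of
# `kubectl get nodes --no-headers` on stdin and succeeds only if at least
# one node is listed and every node's STATUS column is exactly "Ready".
all_nodes_ready() {
  awk '
    { total++ }
    $2 != "Ready" { not_ready++ }
    END { if (total > 0 && not_ready == 0) exit 0; exit 1 }
  '
}

# Possible usage:
#   until kubectl get nodes --no-headers | all_nodes_ready; do sleep 5; done
```

A status such as Ready,SchedulingDisabled counts as not ready here, which is the stricter reading.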

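(9) Add more nodes later

The token created when the master was initialized expires after 24 hours. List the current tokens on the master (after 24 hours the list is empty):

    kubeadm token list

Create a new token (master):

    kubeadm token create

Compute the CA certificate hash (master):

    openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

Join from the new node, substituting the token and hash produced by the two commands above:

    kubeadm join 192.168.0.108:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

Confirm on the master:

    kubectl get nodes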
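The pieces of this step can be assembled into the final join line with two small functions. This is only a sketch: ca_cert_hash wraps the openssl pipeline from this step, and join_command is a hypothetical formatter. kubeadm token create also accepts --print-join-command, which prints a complete join line directly and may be the simpler route.

```bash
#!/bin/bash
# ca_cert_hash CERT (hypothetical helper): prints the SHA-256 hash of the
# CA public key, using the same openssl pipeline as this step.
ca_cert_hash() {
  openssl x509 -pubkey -in "$1" |
    openssl rsa -pubin -outform der 2>/dev/null |
    openssl dgst -sha256 -hex |
    sed 's/^.* //'
}

# join_command ENDPOINT TOKEN HASH (hypothetical helper): prints the
# kubeadm join line to run on a worker node.
join_command() {
  printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
    "$1" "$2" "$3"
}

# Possible usage on the master:
#   join_command 192.168.0.108:6443 "$(kubeadm token create)" \
#     "$(ca_cert_hash /etc/kubernetes/pki/ca.crt)"
```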

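(10) Configure nvidia-device-plugin on the worker nodes

Copy the admin kubeconfig from the master to each node (repeat for 192.168.0.109):

    scp -r /etc/kubernetes/admin.conf 192.168.0.107:/etc/kubernetes/

On each node, point kubectl at that file:

    echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
    source ~/.bash_profile

Still on the node, download the device plugin manifest, remove any previously deployed copy (the delete only matters if an older copy exists), deploy it, and restart kubelet:

    wget https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
    kubectl delete -f nvidia-device-plugin.yml
    kubectl create -f nvidia-device-plugin.yml
    systemctl restart kubelet

The setup has succeeded when kubectl describe node <name> (for master, node1 or node2) shows nvidia.com/gpu: 4.

If the resource does not appear, check whether the GPU driver has stopped working by running nvidia-smi.

Many image downloads fail during the installation. Retrying a few times usually gets them through.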
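The GPU check can be turned into a one-line test per node. The sketch below assumes that kubectl describe node lists the resource as a line of the form nvidia.com/gpu: N under Capacity. gpu_count is a hypothetical helper name.

```bash
#!/bin/bash
# gpu_count (hypothetical helper): reads `kubectl describe node` output on
# stdin and prints the first nvidia.com/gpu value it finds (the Capacity
# entry), or 0 if the resource is not listed at all.
gpu_count() {
  awk '$1 == "nvidia.com/gpu:" { print $2; found = 1; exit }
       END { if (!found) print 0 }'
}

# Possible usage:
#   for n in master node1 node2; do
#     echo "$n: $(kubectl describe node "$n" | gpu_count)"
#   done
```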


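Copyright notice: This is an original article by XDhughie, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.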