Nodes: master IP 10.4.7.139
node01 IP 10.4.7.140
node02 IP 10.4.7.141
1 Pre-installation preparation (all nodes)
1.1 Disable SELinux and firewalld
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
setenforce 0    # switch to permissive mode now; the config file change takes effect at the next boot
systemctl stop firewalld
systemctl disable firewalld
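The edit to /etc/selinux/config is easy to get wrong silently, because sed reports nothing when the pattern does not match. A minimal sketch for checking what the file now says; the function name selinux_config_mode is hypothetical, and the file path is a parameter so the check can be tried on a copy first:

```shell
# Print the SELINUX= value from an SELinux config file (default: /etc/selinux/config).
# selinux_config_mode is a hypothetical helper name, not part of any tool.
selinux_config_mode() {
    # Comment lines and SELINUXTYPE= do not match; the last SELINUX= assignment wins.
    awk -F= '/^[[:space:]]*SELINUX=/ { v = $2 } END { print v }' "${1:-/etc/selinux/config}"
}
```

After the sed command, selinux_config_mode should print disabled. The getenforce command reports the mode of the running system.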
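1.2 Set the hostname and static name resolution (all nodes)
Note: the example below is for master. On node01 and node02, replace master with that node's own hostname in the hostnamectl command and on the 127.0.0.1 line.
hostnamectl set-hostname master
cat /etc/hosts
127.0.0.1 master localhost localhost.localdomain localhost4 localhost4.localdomain4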
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.4.7.139 master
10.4.7.140 node01
10.4.7.141 node02
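Since a name can appear on more than one line of /etc/hosts, it can help to see which address a name maps to first. A minimal sketch, assuming a hypothetical helper named first_hosts_ip that only reads the file and does not query DNS:

```shell
# Print the IP of the first entry in a hosts file that lists the given name.
# Usage: first_hosts_ip FILE NAME   (first_hosts_ip is a hypothetical helper)
first_hosts_ip() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }                 # skip comment lines
        { for (i = 2; i <= NF; i++)               # field 1 is the IP, the rest are names
              if ($i == name) { print $1; exit } }
    ' "$1"
}
```

On master, with the file shown above, first_hosts_ip /etc/hosts node01 should print 10.4.7.140.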
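1.3 Create the LSF administrator user (all nodes)
useradd -m lsfadmin
1.4 Set up passwordless SSH login (all nodes; repeat ssh-copy-id for each of the other nodes)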
ssh-keygen
ssh-copy-id root@10.4.7.140
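The ssh-copy-id step has to be repeated for every host in the cluster. A minimal sketch of a loop that does this; copy_key_to_hosts and the RUN variable are hypothetical names introduced for illustration, and setting RUN=echo turns the call into a dry run that only prints the commands:

```shell
# Copy the local public key to every host given as an argument.
# Set RUN=echo to print the commands instead of executing them (dry run).
# copy_key_to_hosts is a hypothetical helper name.
copy_key_to_hosts() {
    for host in "$@"; do
        ${RUN:-} ssh-copy-id "root@$host" || return 1
    done
}
```

For this cluster the call would be copy_key_to_hosts 10.4.7.139 10.4.7.140 10.4.7.141, run once on each node after ssh-keygen.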
1.5 Create the shared directory (master node)
mkdir /opt/lsf
cat /etc/exports
/opt/lsf 10.4.7.140(rw,async,no_root_squash)
/opt/lsf 10.4.7.141(rw,async,no_root_squash)
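The export entries follow one pattern per client, so they can be generated rather than typed. A minimal sketch; export_lines is a hypothetical helper, and the follow-up commands in the comments assume a RHEL/CentOS-style system where nfs-utils is installed and the NFS server unit is named nfs-server:

```shell
# Print one /etc/exports line per client for a shared directory.
# Usage: export_lines DIR CLIENT...   (export_lines is a hypothetical helper)
# The options match the entries shown above.
export_lines() {
    dir=$1; shift
    for client in "$@"; do
        printf '%s %s(rw,async,no_root_squash)\n' "$dir" "$client"
    done
}

# Typical follow-up on the master (assumes the unit name nfs-server):
#   export_lines /opt/lsf 10.4.7.140 10.4.7.141 >> /etc/exports
#   systemctl start nfs-server
#   systemctl enable nfs-server
#   exportfs -ra        # re-read /etc/exports
```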
1.6 Mount the shared directory (node01, node02)
showmount -e master
[root@node01 ~]# echo "master:/opt/lsf /opt/lsf nfs defaults 0 0">>/etc/fstab
[root@node01 ~]# mount -a
[root@node02 ~]# echo "master:/opt/lsf /opt/lsf nfs defaults 0 0">>/etc/fstab
[root@node02 ~]# mount -a
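Appending with >> adds a duplicate fstab entry every time the step is repeated. A minimal sketch of an append that is safe to re-run; append_once is a hypothetical helper name:

```shell
# Append a line to a file only if that exact line is not already present.
# Usage: append_once FILE LINE   (append_once is a hypothetical helper)
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# On node01 and node02 (mkdir -p makes sure the mount point exists):
#   mkdir -p /opt/lsf
#   append_once /etc/fstab "master:/opt/lsf /opt/lsf nfs defaults 0 0"
#   mount -a
```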
2 Installation (master node)
2.1 Upload the installation package to /opt/lsf and extract it (master node)
2.1.1 Extract the Community Edition package lsfsce10.2.0.6-x86_64.tar.gz
# pwd
/opt/lsf
# tar -zxvf lsfsce10.2.0.6-x86_64.tar.gz
2.1.2 Move the extracted tar.Z files into the shared directory /opt/lsf
# mv /opt/lsf/lsfsce10.2.0.6-x86_64/lsf/*.tar.Z /opt/lsf
# ll
-rw-rw-r-- 1 33209 10007 1138872309 Jun 15 2018 lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z
-rw-rw-r-- 1 33209 10007 118877581 Jun 15 2018 lsf10.1_lsfinstall_linux_x86_64.tar.Z
2.1.3 Extract lsf10.1_lsfinstall_linux_x86_64.tar.Z, but do not extract lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z
# tar -xvf lsf10.1_lsfinstall_linux_x86_64.tar.Z
2.2 Edit the configuration file
echo 'LSF_TOP="/opt/lsf"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="DigitalChina-No.1"
LSF_MASTER_LIST="master"
LSF_TARDIR="/opt/lsf/"
LSF_ADD_SERVERS="node01 node02"'>>/opt/lsf/lsf10.1_lsfinstall/install.config
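Before running the installer it can be worth confirming that the settings actually landed in install.config uncommented. A minimal sketch; missing_install_keys is a hypothetical helper, and its key list simply mirrors the settings written above rather than an official list of mandatory parameters:

```shell
# Print the keys from the list below that have no uncommented KEY= line in the file.
# Usage: missing_install_keys FILE   (missing_install_keys is a hypothetical helper)
missing_install_keys() {
    for key in LSF_TOP LSF_ADMINS LSF_CLUSTER_NAME LSF_MASTER_LIST LSF_TARDIR; do
        grep -q "^${key}=" "$1" || echo "$key"
    done
}
```

Running missing_install_keys /opt/lsf/lsf10.1_lsfinstall/install.config should print nothing once all five settings are present.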
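2.3 Run the installer (master node)
cd /opt/lsf/lsf10.1_lsfinstall
./lsfinstall -f install.config
Tip: the installer prompts for input several times; enter option 1 each time.
Tip: read the installer output carefully. When installation finishes, it generates a page named lsf_quick_admin.html, which you can consult for the remaining steps.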
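2.4 Load the LSF environment automatically at login (all nodes)
echo ". /opt/lsf/conf/profile.lsf" >> /etc/profile
2.5 After installation the cluster hosts communicate over rsh by default; switch this to ssh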
echo "LSF_RSH=ssh" >> /opt/lsf/conf/lsf.conf
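Appending with >> leaves two LSF_RSH lines in lsf.conf if the step is run twice or if the key is already set. A minimal sketch that replaces an existing line and appends only when the key is absent; set_conf_key is a hypothetical helper, and it assumes GNU sed (for -i) and keys and values without characters special to sed such as "/" or "&":

```shell
# Set KEY=VALUE in a config file: replace an existing KEY= line, otherwise append.
# Usage: set_conf_key FILE KEY VALUE   (set_conf_key is a hypothetical helper)
set_conf_key() {
    if grep -q "^$2=" "$1"; then
        sed -i "s/^$2=.*/$2=$3/" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}
```

The equivalent of the command above would be set_conf_key /opt/lsf/conf/lsf.conf LSF_RSH ssh.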
3 Start the cluster and test (all nodes)
3.1 Start the cluster
lsadmin limstartup
lsadmin resstartup
badmin hstartup
[root@node01 ~]# lshosts
HOST_NAME   type     model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
master      X86_64   Intel_E5  12.5  1      1.7G    2G      Yes     (mg)
node01      X86_64   Intel_E5  12.5  1      1.7G    2G      Yes     ()
node02      X86_64   Intel_E5  12.5  1      1.7G    2G      Yes     ()
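With more than a handful of hosts, scanning the lshosts table by eye gets tedious. A minimal sketch that filters the output; non_server_hosts is a hypothetical helper and assumes the default column layout shown above, where server is the 8th whitespace-separated field:

```shell
# Read `lshosts` output on stdin and print the hosts whose "server" column is not "Yes".
# non_server_hosts is a hypothetical helper name.
non_server_hosts() {
    awk 'NR > 1 && $8 != "Yes" { print $1 }'   # NR > 1 skips the header line
}
```

Used as lshosts | non_server_hosts, it prints nothing when every listed host is an LSF server.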
3.2 Test
[root@node01 lsf10.1_lsfinstall]# bsub sleep 120
User permission denied. Job not submitted.
[root@node01 lsf10.1_lsfinstall]# su - lsfadmin
[lsfadmin@node01 ~]$ bsub sleep 120
Job <101> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 130
Job <102> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 140
Job <103> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 150
Job <104> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 160
Job <105> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 170
Job <106> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bjobs
JOBID  USER     STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
101    lsfadmi  RUN   normal  node01     node01     sleep 120  Nov 19 22:50
102    lsfadmi  RUN   normal  node01     master     sleep 130  Nov 19 22:51
103    lsfadmi  RUN   normal  node01     node02     sleep 140  Nov 19 22:51
104    lsfadmi  PEND  normal  node01                sleep 150  Nov 19 22:51
105    lsfadmi  PEND  normal  node01                sleep 160  Nov 19 22:51
106    lsfadmi  PEND  normal  node01                sleep 170  Nov 19 22:51
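The bjobs listing above shows three running and three pending jobs. A minimal sketch that produces this summary directly; jobs_by_state is a hypothetical helper and assumes the default layout shown above, where STAT is the 3rd whitespace-separated field:

```shell
# Read `bjobs` output on stdin and print one "STATE COUNT" line per job state.
# jobs_by_state is a hypothetical helper name.
jobs_by_state() {
    # NR > 1 skips the header; sort makes the output order stable.
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Used as bjobs | jobs_by_state. Because each host in this cluster has a single CPU, only one job runs per host and the rest stay in PEND until a slot frees up.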
4 Enable start at boot (all nodes)
/opt/lsf/10.1/install/hostsetup --top="/opt/lsf" --boot="y"