IBM LSF Community Edition Setup Notes



Nodes:

master        IP 10.4.7.139

node01        IP 10.4.7.140

node02        IP 10.4.7.141

1 Pre-installation preparation (all nodes)

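1.1 Disable SELinux and firewalld

# The config change takes effect after a reboot; setenforce 0 switches to permissive mode immediately.
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
setenforce 0
systemctl stop firewalld
systemctl disable firewalld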
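
The sed expression above only rewrites a line that reads SELINUX=enforcing, so a file already set to permissive is left unchanged. As a minimal sketch for checking the result, the helper below prints the mode configured in the file. The function name selinux_config_mode is hypothetical, and the default path is the one used above.

```shell
# Print the mode configured on the SELINUX= line of a config file.
# selinux_config_mode is a hypothetical helper name; the path defaults to /etc/selinux/config.
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}"
}
```

Running selinux_config_mode with no argument should print disabled once the edit has been applied; getenforce reports the mode that is currently in effect.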

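1.2 Configure the hostname and static name resolution (all nodes)

Set each node's own hostname (master, node01, or node02). The first name on the 127.0.0.1 line of /etc/hosts is likewise the local hostname; the file below is shown as it appears on master.

hostnamectl set-hostname master
cat /etc/hosts
127.0.0.1   master localhost localhost.localdomain localhost4 localhost4.localdomain4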
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.4.7.139 master
10.4.7.140 node01
10.4.7.141 node02
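
To see what a name maps to in the static file, here is a small sketch that prints the first address listed for a name, following the top-to-bottom order of the file. The helper name hosts_lookup is hypothetical; on a running system, getent hosts node01 shows what the resolver actually returns.

```shell
# Print the first IP address that a hosts file lists for a given name.
# hosts_lookup is a hypothetical helper; the file defaults to /etc/hosts.
hosts_lookup() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        {
            # field 1 is the address, the remaining fields are names
            for (i = 2; i <= NF; i++) {
                if ($i == name) { print $1; exit }
            }
        }
    ' "${2:-/etc/hosts}"
}
```

For example, hosts_lookup node01 should print 10.4.7.140 on every node. A node's own hostname returns 127.0.0.1 when it is also listed on the loopback line, because that line comes first.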

1.3 Create the user (all nodes)

useradd -m lsfadmin

1.4 Set up passwordless SSH login (all nodes)

ssh-keygen
# Repeat ssh-copy-id for each of the other nodes.
ssh-copy-id root@10.4.7.140
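
Since the key has to reach every other node, one possible way to avoid typing the command per host is to generate the list. This is only a sketch: copy_id_commands is a hypothetical helper that prints the commands rather than running them, and the first argument is the local node's address, which is skipped.

```shell
# Print one ssh-copy-id command per remote node, skipping the local address.
# copy_id_commands is a hypothetical helper; pipe its output to sh to run the commands.
copy_id_commands() {
    self=$1; shift
    for ip in "$@"; do
        [ "$ip" = "$self" ] && continue
        printf 'ssh-copy-id root@%s\n' "$ip"
    done
}
```

On master, copy_id_commands 10.4.7.139 10.4.7.139 10.4.7.140 10.4.7.141 prints the commands for node01 and node02 only.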

1.5 Create the shared directory (master node)

mkdir /opt/lsf
cat /etc/exports
/opt/lsf 10.4.7.140(rw,async,no_root_squash)
/opt/lsf 10.4.7.141(rw,async,no_root_squash)
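
The export lines above follow one pattern per client, so they can be generated instead of typed. The sketch below assumes the same options for every client; export_lines is a hypothetical helper name. After /etc/exports is edited, the NFS server also has to be running and the export table reloaded. On a CentOS/RHEL 7 style system with nfs-utils installed (an assumption, since the distribution is not stated here), that would be systemctl start nfs-server, systemctl enable nfs-server, and exportfs -ra.

```shell
# Emit one /etc/exports line per client, using the same options as above.
# export_lines is a hypothetical helper name.
export_lines() {
    dir=$1; shift
    for client in "$@"; do
        printf '%s %s(rw,async,no_root_squash)\n' "$dir" "$client"
    done
}
```

For example, export_lines /opt/lsf 10.4.7.140 10.4.7.141 prints the two lines shown in /etc/exports.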

1.6 Mount the shared directory (node01, node02)

showmount -e master

[root@node01 ~]# mkdir -p /opt/lsf
[root@node01 ~]# echo "master:/opt/lsf /opt/lsf nfs defaults 0 0">>/etc/fstab
[root@node01 ~]# mount -a

[root@node02 ~]# mkdir -p /opt/lsf
[root@node02 ~]# echo "master:/opt/lsf /opt/lsf nfs defaults 0 0">>/etc/fstab
[root@node02 ~]# mount -a
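
Appending with >> adds a duplicate fstab entry every time the command is repeated. A minimal sketch of a repeat-safe alternative follows; append_once is a hypothetical helper name.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name.
append_once() {
    line=$1
    file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage would be append_once "master:/opt/lsf /opt/lsf nfs defaults 0 0" /etc/fstab, followed by mount -a; df -h /opt/lsf then shows whether the NFS share is mounted.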

2 Installation (master node)

2.1 Upload the installation package to /opt/lsf and extract it (master node)

2.1.1 Extract the Community Edition package lsfsce10.2.0.6-x86_64.tar.gz

# pwd
/opt/lsf
# tar -zxvf lsfsce10.2.0.6-x86_64.tar.gz

2.1.2 Move the extracted tar.Z files into the shared directory /opt/lsf

# mv /opt/lsf/lsfsce10.2.0.6-x86_64/lsf/*.tar.Z  /opt/lsf
# ll
-rw-rw-r-- 1 33209 10007 1138872309 Jun 15  2018 lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z
-rw-rw-r-- 1 33209 10007  118877581 Jun 15  2018 lsf10.1_lsfinstall_linux_x86_64.tar.Z

2.1.3 Extract lsf10.1_lsfinstall_linux_x86_64.tar.Z, but do not extract lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z

# tar -xvf lsf10.1_lsfinstall_linux_x86_64.tar.Z

2.2 Edit the configuration file

echo 'LSF_TOP="/opt/lsf"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="DigitalChina-No.1"
LSF_MASTER_LIST="master"
LSF_TARDIR="/opt/lsf/"
LSF_ADD_SERVERS="node01 node02"'>>/opt/lsf/lsf10.1_lsfinstall/install.config 
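
Before running the installer it can help to confirm that every setting actually landed in install.config as an uncommented line. The sketch below checks only the six keys written above; missing_keys is a hypothetical helper name.

```shell
# List the install.config keys from this guide that have no uncommented KEY= line.
# missing_keys is a hypothetical helper; the key list mirrors the settings written above.
missing_keys() {
    file=$1
    for key in LSF_TOP LSF_ADMINS LSF_CLUSTER_NAME LSF_MASTER_LIST LSF_TARDIR LSF_ADD_SERVERS; do
        grep -q "^${key}=" "$file" || echo "$key"
    done
}
```

Running missing_keys /opt/lsf/lsf10.1_lsfinstall/install.config prints nothing when all six keys are set.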

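2.3 Run the installer (master node)

cd /opt/lsf/lsf10.1_lsfinstall
./lsfinstall -f install.config

Tip: the installer prompts for a choice several times; enter 1 each time.

Tip: read the installer output carefully. When installation finishes it generates an lsf_quick_admin.html page, which you can consult for the remaining steps.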

2.4 Load the environment variables automatically (all nodes)

echo ". /opt/lsf/conf/profile.lsf" >> /etc/profile

2.5 After installation the cluster hosts communicate over rsh by default, so change this to ssh

echo "LSF_RSH=ssh" >> /opt/lsf/conf/lsf.conf

3 Start the cluster and test (all nodes)

3.1 Start the cluster

lsadmin limstartup
lsadmin resstartup
badmin hstartup
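
The daemons may need a short while before they answer queries, so a command such as lsid can fail right after startup. One way to handle this is a small retry loop. This is a sketch under that assumption; retry is a hypothetical helper name.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# retry is a hypothetical helper: retry <attempts> <delay-seconds> <command...>
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@" >/dev/null 2>&1; do
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

For example, retry 10 5 lsid && lshosts polls lsid up to ten times, five seconds apart, and lists the hosts once it responds.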

[root@node01 ~]# lshosts
HOST_NAME      type    model  cpuf ncpus maxmem maxswp server RESOURCES
master       X86_64 Intel_E5  12.5     1   1.7G     2G    Yes (mg)
node01       X86_64 Intel_E5  12.5     1   1.7G     2G    Yes ()
node02       X86_64 Intel_E5  12.5     1   1.7G     2G    Yes ()

3.2 Test

Submitting as root is rejected in this setup, so switch to lsfadmin before submitting jobs.

[root@node01 lsf10.1_lsfinstall]# bsub sleep 120
User permission denied. Job not submitted.
[root@node01 lsf10.1_lsfinstall]# su - lsfadmin
[lsfadmin@node01 ~]$ bsub sleep 120
Job <101> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 130
Job <102> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 140
Job <103> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 150
Job <104> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 160
Job <105> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 170
Job <106> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
101     lsfadmi RUN   normal     node01      node01      sleep 120  Nov 19 22:50
102     lsfadmi RUN   normal     node01      master      sleep 130  Nov 19 22:51
103     lsfadmi RUN   normal     node01      node02      sleep 140  Nov 19 22:51
104     lsfadmi PEND  normal     node01                  sleep 150  Nov 19 22:51
105     lsfadmi PEND  normal     node01                  sleep 160  Nov 19 22:51
106     lsfadmi PEND  normal     node01                  sleep 170  Nov 19 22:51
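
In the listing above three jobs run and three are pending, which is consistent with each host reporting ncpus 1 in lshosts and therefore taking one job at a time. To summarize a longer listing, a small sketch that counts jobs per state from the STAT column follows; count_by_stat is a hypothetical helper name.

```shell
# Count jobs per state from bjobs output read on stdin (STAT is the third column).
# count_by_stat is a hypothetical helper name.
count_by_stat() {
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Used as bjobs | count_by_stat, it prints PEND 3 and RUN 3 for the six jobs shown above.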

4 Enable start at boot (all nodes)

/opt/lsf/10.1/install/hostsetup --top="/opt/lsf" --boot="y"

References:

LSF cluster setup notes, weixin_44064258's blog (CSDN)

IBM Spectrum LSF 10.1.0, IBM Documentation

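Copyright notice: this is an original article by wo4owen, released under the CC 4.0 BY-SA license. When reposting, include a link to the original source and this notice.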