Nodes: master IP 10.4.7.139
node01 IP 10.4.7.140
node02 IP 10.4.7.141
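1 Pre-installation preparation (all nodes)
1.1 Disable SELinux and firewalld
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
systemctl stop firewalld
systemctl disable firewalld
Note: the change to /etc/selinux/config only takes effect after a reboot. To switch to permissive mode immediately, run setenforce 0.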
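To confirm that the sed edit above actually landed, the configured mode can be read back from the file. The following is a minimal sketch; the function name selinux_config_mode is made up for illustration and is not part of any SELinux tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original steps): print the SELINUX=
# value configured in the given file, defaulting to /etc/selinux/config.
selinux_config_mode() {
    config_file="${1:-/etc/selinux/config}"
    # Only an uncommented SELINUX= line matches; SELINUXTYPE= is skipped
    # because the pattern requires '=' directly after SELINUX.
    sed -n 's/^SELINUX=\(.*\)$/\1/p' "$config_file" | tail -n 1
}
```

Running selinux_config_mode with no argument should print disabled once step 1.1 is done. The runtime state is a separate matter and is reported by getenforce.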
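1.2 Configure the hostname and static name resolution (all nodes, take care)
Set the hostname on each node to its own name: master on the master, node01 or node02 on the other two. The 127.0.0.1 line in /etc/hosts carries the same per-node name.
hostnamectl set-hostname master
cat /etc/hosts
127.0.0.1 master localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.4.7.139 master
10.4.7.140 node01
10.4.7.141 node02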
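Since the same three address lines have to end up in /etc/hosts on every node, it may help to add them with a small script that is safe to run more than once. This is only a sketch: both function names are hypothetical, and the node list is the one used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print one "IP hostname" line per cluster node.
# The node list is the one from this guide; adjust it for other clusters.
cluster_hosts_entries() {
    cat <<'EOF'
10.4.7.139 master
10.4.7.140 node01
10.4.7.141 node02
EOF
}

# Append only the entries that are missing from the target file, so that
# running the function twice does not create duplicate lines.
add_cluster_hosts() {
    hosts_file="${1:-/etc/hosts}"
    cluster_hosts_entries | while IFS= read -r entry; do
        grep -qxF "$entry" "$hosts_file" || printf '%s\n' "$entry" >> "$hosts_file"
    done
}
```

Calling add_cluster_hosts as root updates /etc/hosts. Passing a path as the first argument writes to that file instead, which is handy for a dry run on a copy.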
1.3 Create the user (all nodes)
useradd -m lsfadmin
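1.4 Set up passwordless SSH login (all nodes)
ssh-keygen
ssh-copy-id root@10.4.7.140
Repeat ssh-copy-id for each of the other nodes so that every node can reach every other node without a password.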
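With three nodes, each node needs its key copied to the two others. A minimal sketch that only prints the commands to run, assuming the hostnames from step 1.2 resolve; the function name print_copy_id_commands is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the ssh-copy-id command for every
# cluster node other than the one named in the first argument.
print_copy_id_commands() {
    self="$1"
    for node in master node01 node02; do
        # Skip the node we are currently on.
        [ "$node" = "$self" ] && continue
        printf 'ssh-copy-id root@%s\n' "$node"
    done
}
```

For example, print_copy_id_commands node01 lists the commands to run on node01. Printing rather than executing keeps the password prompts under manual control.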
1.5 Create the shared directory (master node)
mkdir /opt/lsf
cat /etc/exports
/opt/lsf 10.4.7.140(rw,async,no_root_squash)
/opt/lsf 10.4.7.141(rw,async,no_root_squash)
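The /etc/exports lines follow one pattern per client, so they can be generated rather than typed. The sketch below uses a hypothetical function, render_exports, and hard-codes the same options as above (rw,async,no_root_squash).

```shell
#!/bin/sh
# Hypothetical helper: print one /etc/exports line per client for a directory.
# Usage: render_exports DIR CLIENT...
render_exports() {
    export_dir="$1"
    shift
    for client in "$@"; do
        printf '%s %s(rw,async,no_root_squash)\n' "$export_dir" "$client"
    done
}
```

For example, render_exports /opt/lsf 10.4.7.140 10.4.7.141 prints the two lines shown above. After the lines are in /etc/exports, exportfs -ra re-exports them, assuming the NFS server is already installed and running on the master.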
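1.6 Mount the shared directory (node01, node02)
The mount point /opt/lsf must exist on each node before mount -a is run (create it with mkdir -p /opt/lsf if needed).
showmount -e master
[root@node01 ~]# echo "master:/opt/lsf /opt/lsf nfs defaults 0 0">>/etc/fstab
[root@node01 ~]# mount -a
[root@node02 ~]# echo "master:/opt/lsf /opt/lsf nfs defaults 0 0">>/etc/fstab
[root@node02 ~]# mount -a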
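mount -a does not say much when it succeeds, so it can be worth checking that /opt/lsf really is an NFS mount before installing into it. A minimal sketch, assuming a Linux /proc/mounts table; the function name is_nfs_mounted is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MOUNT_POINT appears as an NFS mount in a
# mounts table in /proc/mounts format (device, mount point, type, ...).
# Usage: is_nfs_mounted MOUNT_POINT [MOUNTS_FILE]
is_nfs_mounted() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v mp="$mount_point" '
        # Match both "nfs" and "nfs4" in the filesystem type column.
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}
```

On node01 and node02, is_nfs_mounted /opt/lsf && echo mounted gives a quick yes or no.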
2 Installation (master node)
2.1 Upload the installation package to /opt/lsf and extract it (master node)
2.1.1 Extract the Community Edition package lsfsce10.2.0.6-x86_64.tar.gz
# pwd
/opt/lsf
# tar -zxvf lsfsce10.2.0.6-x86_64.tar.gz
2.1.2 Move the extracted .tar.Z files into the shared directory /opt/lsf
# mv /opt/lsf/lsfsce10.2.0.6-x86_64/lsf/*.tar.Z /opt/lsf
# ll
-rw-rw-r-- 1 33209 10007 1138872309 Jun 15 2018 lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z
-rw-rw-r-- 1 33209 10007 118877581 Jun 15 2018 lsf10.1_lsfinstall_linux_x86_64.tar.Z
2.1.3 Extract lsf10.1_lsfinstall_linux_x86_64.tar.Z, but do not extract lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z (the installer reads that archive itself from LSF_TARDIR)
# tar -xvf lsf10.1_lsfinstall_linux_x86_64.tar.Z
2.2 Edit the configuration file
echo 'LSF_TOP="/opt/lsf"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="DigitalChina-No.1"
LSF_MASTER_LIST="master"
LSF_TARDIR="/opt/lsf/"
LSF_ADD_SERVERS="node01 node02"'>>/opt/lsf/lsf10.1_lsfinstall/install.config
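Before running the installer, it may be worth checking that each parameter ended up on its own line in install.config. The sketch below checks only the six parameters this guide sets; the function name check_install_config is hypothetical and is not an LSF tool.

```shell
#!/bin/sh
# Hypothetical helper: report which of the parameters set in this guide are
# not present at the start of a line in install.config.
# Prints each missing name and returns non-zero if any is missing.
check_install_config() {
    config_file="$1"
    missing=0
    for key in LSF_TOP LSF_ADMINS LSF_CLUSTER_NAME LSF_MASTER_LIST \
               LSF_TARDIR LSF_ADD_SERVERS; do
        if ! grep -q "^${key}=" "$config_file"; then
            printf 'missing: %s\n' "$key"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_install_config /opt/lsf/lsf10.1_lsfinstall/install.config prints nothing and returns 0 when all six lines are present.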
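2.3 Install (master node)
cd /opt/lsf/lsf10.1_lsfinstall
./lsfinstall -f install.config
Tip: during installation you are prompted several times; enter option 1 each time.
Tip: read the installer output carefully. When installation finishes, it generates an lsf_quick_admin.html page that you can refer to for the following steps.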
2.4 Load the LSF environment automatically (all nodes)
echo ". /opt/lsf/conf/profile.lsf">>/etc/profile
2.5 After installation, hosts in the cluster communicate over rsh by default, so change this to ssh
echo "LSF_RSH=ssh" >> /opt/lsf/conf/lsf.conf
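Appending with >> adds a second LSF_RSH line if the step is repeated. A replace-or-append helper avoids that. This is a minimal sketch with a hypothetical function name, set_conf_param, and it assumes the value contains no | or & characters.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a config file, replacing an existing
# KEY= line or appending one if the key is absent.
# Assumes VALUE contains no '|' or '&' (both are special in the sed command).
# Usage: set_conf_param FILE KEY VALUE
set_conf_param() {
    conf_file="$1"
    key="$2"
    value="$3"
    if grep -q "^${key}=" "$conf_file"; then
        # Rewrite through a temporary file rather than relying on sed -i.
        tmp_file="$(mktemp)"
        sed "s|^${key}=.*|${key}=${value}|" "$conf_file" > "$tmp_file"
        cat "$tmp_file" > "$conf_file"
        rm -f "$tmp_file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$conf_file"
    fi
}
```

For example, set_conf_param /opt/lsf/conf/lsf.conf LSF_RSH ssh leaves exactly one LSF_RSH line in the file no matter how many times it is run.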
3 Start the cluster and test it (all nodes)
3.1 Start the cluster
lsadmin limstartup
lsadmin resstartup
badmin hstartup
[root@node01 ~]# lshosts
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
master     X86_64  Intel_E5  12.5  1      1.7G    2G      Yes     (mg)
node01     X86_64  Intel_E5  12.5  1      1.7G    2G      Yes     ()
node02     X86_64  Intel_E5  12.5  1      1.7G    2G      Yes     ()
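On a larger cluster, scanning the lshosts table by eye gets tedious. The sketch below reads lshosts output on stdin and lists any expected host that is absent. The function name missing_hosts is hypothetical, and it assumes the host name is the first column and the first line is the header, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read lshosts output on stdin and print every expected
# host that does not appear in the HOST_NAME column.
# Usage: lshosts | missing_hosts master node01 node02
missing_hosts() {
    # Collect the first column, skipping the header line.
    seen="$(awk 'NR > 1 { print $1 }')"
    rc=0
    for host in "$@"; do
        if ! printf '%s\n' "$seen" | grep -qxF "$host"; then
            printf '%s\n' "$host"
            rc=1
        fi
    done
    return "$rc"
}
```

An empty result with exit status 0 means every listed host has joined the cluster.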
3.2 Test
Submitting a job as root is rejected, so switch to lsfadmin before running bsub.
[root@node01 lsf10.1_lsfinstall]# bsub sleep 120
User permission denied. Job not submitted.
[root@node01 lsf10.1_lsfinstall]# su - lsfadmin
[lsfadmin@node01 ~]$ bsub sleep 120
Job <101> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 130
Job <102> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 140
Job <103> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 150
Job <104> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 160
Job <105> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bsub sleep 170
Job <106> is submitted to default queue <normal>.
[lsfadmin@node01 ~]$ bjobs
JOBID  USER     STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
101    lsfadmi  RUN   normal  node01     node01     sleep 120  Nov 19 22:50
102    lsfadmi  RUN   normal  node01     master     sleep 130  Nov 19 22:51
103    lsfadmi  RUN   normal  node01     node02     sleep 140  Nov 19 22:51
104    lsfadmi  PEND  normal  node01                sleep 150  Nov 19 22:51
105    lsfadmi  PEND  normal  node01                sleep 160  Nov 19 22:51
106    lsfadmi  PEND  normal  node01                sleep 170  Nov 19 22:51
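With one slot per host, three jobs run and three wait, as the bjobs listing shows. A quick way to get that summary is to count the STAT column. This is a minimal sketch with a hypothetical function name, count_jobs_by_state, and it assumes the default bjobs column layout shown above (STAT is the third field).

```shell
#!/bin/sh
# Hypothetical helper: read bjobs output on stdin and print "STATE COUNT"
# for each job state found in the STAT column, sorted by state name.
# Usage: bjobs | count_jobs_by_state
count_jobs_by_state() {
    awk 'NR > 1 { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

Piping the listing above through it gives one line for PEND and one for RUN, each with a count of 3.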
4 Enable start at boot (all nodes)
/opt/lsf/10.1/install/hostsetup --top="/opt/lsf" --boot="y"
References: LSF cluster setup notes (LSF集群搭建笔记), weixin_44064258's blog on CSDN
IBM Spectrum LSF 10.1.0 - IBM documentation



