Case 1:
After submitting a job, squeue shows the following: the new job stays pending, while older jobs keep running.
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
53 debug G5_extra user PD 0:00 1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
Running the following command returns the node to service, after which the job starts:
scontrol update nodename=localhost.localdomain state=resume
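Before resuming a node, it can help to see why Slurm marked it unavailable. The following is a minimal sketch, not the author's procedure: the helper name pending_on_down_nodes and the NODE variable are hypothetical, and the awk filter assumes the default squeue column layout shown above (state in the fifth column, job names without spaces). The Slurm commands only run if they are installed.

```shell
#!/usr/bin/env bash
# Sketch: inspect node state before running "scontrol update ... state=resume".
# NODE is an assumption taken from the command above; replace it with your node name.
NODE="localhost.localdomain"

# Hypothetical helper: reads squeue output on stdin and prints the IDs of
# pending (PD) jobs whose reason mentions DOWN/DRAINED nodes.
pending_on_down_nodes() {
    awk '$5 == "PD" && /DOWN, DRAINED/ { print $1 }'
}

# Only touch Slurm if the tools are actually installed.
if command -v scontrol >/dev/null 2>&1; then
    sinfo -R                       # list the recorded reason for each unavailable node
    scontrol show node "$NODE"     # full node record, including State and Reason
    squeue | pending_on_down_nodes # job IDs blocked by unavailable nodes
fi
```

If the recorded reason points to a real hardware or configuration problem, resuming the node will likely only hide it until the node is marked down again.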
Case 2:
Both squeue and sinfo print:
slurm_load_jobs error: Unable to contact slurm controller (connect failure)
Restart mysql:
systemctl restart mysql
Then restart slurmctld, and the commands work again.
systemctl restart slurmctld
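The restart sequence above can be wrapped in a quick check, so that services are only inspected when the controller really is unreachable. This is a rough sketch under assumptions: the helper name controller_unreachable is hypothetical, and the service names mysql and slurmctld are taken from the commands above and may differ on other systems. The post does not say why MySQL is involved; a common setup stores Slurm accounting data there, but that is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: detect the "Unable to contact slurm controller" error and
# show service status before deciding to restart anything.

# Hypothetical helper: succeeds if the text on stdin contains the
# controller connection error quoted above.
controller_unreachable() {
    grep -q 'Unable to contact slurm controller'
}

# Only run the checks if the Slurm client tools are installed.
if command -v squeue >/dev/null 2>&1; then
    if squeue 2>&1 | controller_unreachable; then
        # Service names are assumptions copied from the restart commands above.
        systemctl is-active mysql
        systemctl is-active slurmctld
        # Recent controller log lines often show why it stopped.
        journalctl -u slurmctld -n 20 --no-pager
    else
        scontrol ping   # reports whether the controller responds
    fi
fi
```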
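Copyright notice: This is an original article by jack_magpie, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.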