
cromwell: Configuring Cluster/Cloud Job Management Systems


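cromwell supports not only task scheduling on a local machine but also cluster and cloud job management systems. With a small amount of configuration, it can run computations at large scale.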

Configuration files

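As with the cromwell command-line setup described in an earlier post, running cromwell on a cluster requires a configuration file. The project provides example configuration files for the different cluster and cloud job management systems (https://github.com/broadinstitute/cromwell/tree/develop/cromwell.example.backends). In essence, each of them embeds the scheduler's own commands in the configuration. The sections below use two commonly used backends, SGE and Docker, as examples.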

The backend example files are not complete configuration files on their own. Their content must be added to the backend section of https://github.com/broadinstitute/cromwell/blob/develop/cromwell.example.backends/cromwell.examples.conf

SGE
# This is an example of how you can use the Sun Grid Engine backend
# for Cromwell. *This is not a complete configuration file!* The
# content here should be copy pasted into the backend -> providers section
# of cromwell.example.backends/cromwell.examples.conf in the root of the repository.
# You should uncomment lines that you want to define, and read carefully to customize
# the file. If you have any questions, please open an issue at
# https://www.github.com/broadinstitute/cromwell/issues

# documentation:
# https://cromwell.readthedocs.io/en/stable/backends/SGE

backend {
  # Select the default provider; the name must match one defined under providers
  default = SGE

  providers {
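    # Configure the SGE provider
    SGE {

      # All scheduler backends of this kind are built on this config-based actor factory
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      # The backend-specific settings
      config {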
      config {
        
        # Limits the number of concurrent jobs; mainly intended for cromwell server mode
        concurrent-job-limit = 5

        # If an 'exit-code-timeout-seconds' value is specified:
        # - check-alive will be run at this interval for every job
        # - if a job is found to be not alive, and no RC file appears after this interval
        # - Then it will be marked as Failed.
        # Warning: If set, Cromwell will run 'check-alive' for every job at this interval
        # Interval in seconds between check-alive runs for each job; set to 120 s here
        exit-code-timeout-seconds = 120

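        # Runtime attributes. The names must match the attributes in the `runtime` section of the task/workflow
        # and the variables used in the submit command below.
        # This also means you can edit this configuration to define whatever `runtime` attributes your tasks need.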
        runtime-attributes = """
        Int cpu = 1
        Float? memory_gb
        String? sge_queue
        String? sge_project
        """

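        # submit/kill/check-alive wrap the corresponding scheduler commands, with cromwell variables embedded in them.

        # job_name, cwd, out, err, and job_id used in these commands are cromwell built-in variables;
        # any other variable must first be declared in runtime-attributes.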

        submit = """
        qsub \
        -terse \
        -V \
        -b y \
        -N ${job_name} \
        -wd ${cwd} \
        -o ${out}.qsub \
        -e ${err}.qsub \
        -pe smp ${cpu} \
        ${"-l mem_free=" + memory_gb + "g"} \
        ${"-q " + sge_queue} \
        ${"-P " + sge_project} \
        /usr/bin/env bash ${script}
        """

        kill = "qdel ${job_id}"
        check-alive = "qstat -j ${job_id}"
        job-id-regex = "(\\d+)"
      }
    }
  }
}
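
The `${"-q " + sge_queue}` form in the submit command is what makes an attribute optional: when the attribute is not set in the task's `runtime` section, the whole placeholder expands to nothing, so the flag is left out. The Python sketch below is only a rough model of that behaviour, not cromwell's implementation. The helper names `optional` and `render_submit` and the sample values are made up for illustration.

```python
from typing import Optional


def optional(prefix: str, value: Optional[object], suffix: str = "") -> str:
    """Mimic ${"-q " + sge_queue}: empty when the attribute is unset."""
    if value is None:
        return ""
    return f"{prefix}{value}{suffix}"


def render_submit(job_name: str, cwd: str, out: str, err: str, script: str,
                  cpu: int = 1,
                  memory_gb: Optional[float] = None,
                  sge_queue: Optional[str] = None,
                  sge_project: Optional[str] = None) -> str:
    """Assemble the qsub command line the SGE template would produce."""
    parts = [
        "qsub", "-terse", "-V", "-b y",
        f"-N {job_name}",
        f"-wd {cwd}",
        f"-o {out}.qsub",
        f"-e {err}.qsub",
        f"-pe smp {cpu}",
        optional("-l mem_free=", memory_gb, "g"),
        optional("-q ", sge_queue),
        optional("-P ", sge_project),
        f"/usr/bin/env bash {script}",
    ]
    # Drop the fragments that expanded to nothing.
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    # Hypothetical values: only cpu and sge_queue are set by the task.
    print(render_submit("hello", "/work/call-hello", "stdout", "stderr",
                        "script", cpu=2, sge_queue="all.q"))
```

In this run, `-l mem_free=...` and `-P ...` do not appear in the printed command because `memory_gb` and `sge_project` were left unset.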

Docker
backend {
  default = Docker

  providers {

    # Example backend that _only_ runs workflows that specify docker for every command.
    Docker {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        run-in-background = true
        runtime-attributes = "String docker"
        # Root directory for job results inside the container; /cromwell-executions is the default
        dockerRoot = "/cromwell-executions"
        # Embed the docker run command
        # docker_cwd is derived from dockerRoot and corresponds to the job directory under ./cromwell-executions on the host (${cwd})
        submit-docker = "docker run --rm -v ${cwd}:${docker_cwd} -i ${docker} /bin/bash < ${docker_script}"
      }
    }
  }
}
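
To make the relationship between `${cwd}` and `${docker_cwd}` more concrete, here is a small Python sketch that re-roots a host job directory under `dockerRoot`. It assumes the container path is simply the part of the host path below `cromwell-executions`, re-attached to `dockerRoot`. That is an assumption inferred from the mount paths, not something taken from cromwell's source, and `to_docker_cwd` and the host path are hypothetical.

```python
from pathlib import PurePosixPath


def to_docker_cwd(host_cwd: str, host_root: str,
                  docker_root: str = "/cromwell-executions") -> str:
    """Re-root a host call directory under the container's dockerRoot."""
    relative = PurePosixPath(host_cwd).relative_to(PurePosixPath(host_root))
    return str(PurePosixPath(docker_root) / relative)


if __name__ == "__main__":
    # Hypothetical host location of the cromwell-executions directory.
    host_root = "/data/project/cromwell-executions"
    host_cwd = host_root + "/TestHelloWorld/3888ab6f-4dcf-4d03-8d84-17ee5623c2bb/call-HelloWorld"
    docker_cwd = to_docker_cwd(host_cwd, host_root)
    # The pair that would end up in the -v option of docker run.
    print(f"-v {host_cwd}:{docker_cwd}")
```

The two sides of the `-v` pair differ only in their prefix, which is why absolute host paths written into a workflow do not resolve inside the container unless they are mounted separately.
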
SGE + docker

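SGE + Docker is probably one of the most common combinations in bioinformatics analysis today. The official examples, however, cover only SGE alone and Docker alone, with no combined SGE + Docker configuration. I put together a complete configuration that can be used directly from the command line.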

# cromwell.sge.docker.config
# Complete configuration file
include required(classpath("application"))

backend {
  default = SGE_Docker

  providers {
    SGE_Docker {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {

        # Limits the number of concurrent jobs
        concurrent-job-limit = 500

        # If an 'exit-code-timeout-seconds' value is specified:
        # - check-alive will be run at this interval for every job
        # - if a job is found to be not alive, and no RC file appears after this interval
        # - Then it will be marked as Failed.
        # Warning: If set, Cromwell will run 'check-alive' for every job at this interval

        # exit-code-timeout-seconds = 120

        # `script-epilogue` configures a shell command to run after the execution of every command block.
        #
        # If this value is not set explicitly, the default value is `sync`, equivalent to:
        # script-epilogue = "sync"
        #
        # To turn off the default `sync` behavior set this value to an empty string:
        # script-epilogue = ""

        script-epilogue = "sync && sleep 8 "

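        # Runtime attributes. The names must match the attributes in the `runtime` section of the task/workflow
        # and the variables used in the submit command below.
        # This also means you can edit this configuration to define whatever `runtime` attributes your tasks need.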
        runtime-attributes = """
        String docker
        String? root = '/'
        Int? cpu = 1
        Int? memory_gb = 2
        String? sge_queue
        """

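        # submit/kill/check-alive wrap the corresponding scheduler commands, with cromwell variables embedded in them.
        # job_name, cwd, out, err, and job_id used in these commands are cromwell built-in variables;
        # any other variable must first be declared in runtime-attributes.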
        submit-docker = """
        qsub \
        -terse \
        -V \
        -b y \
        -N ${job_name} \
        -wd ${cwd} \
        -o ${out}.qsub \
        -e ${err}.qsub \
        -l vf=${memory_gb}G \
        ${"-pe smp " + cpu} \
        ${"-q " + sge_queue} \
        docker run --rm --user $(id -u):$(id -g) -a STDERR -v ${root}:${root} ${docker} /usr/bin/env bash ${docker_script}
        """
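        # By default the docker run command carries the directory mapping -v ${cwd}:${docker_cwd}.
        # docker_cwd is derived from dockerRoot (default /cromwell-executions) and corresponds to the job directory
        # under ./cromwell-executions on the host (${cwd}), for example:
        # -v /your/current_work_path/cromwell-executions/TestHelloWorld/3888ab6f-4dcf-4d03-8d84-17ee5623c2bb/call-HelloWorld/shard-0:/cromwell-executions/TestHelloWorld/3888ab6f-4dcf-4d03-8d84-17ee5623c2bb/call-HelloWorld/shard-0
        # The drawback: if output files are referenced by absolute path, the path inside the container differs
        # from the path outside, which causes confusion.
        # So I usually drop -v ${cwd}:${docker_cwd} and introduce a root variable that mounts the same path inside
        # and outside the container, which makes reading and writing files straightforward.
        # Mind how high up root points: the root directory / is the simplest choice, but it exposes the whole
        # filesystem to the container, so it also carries the most risk.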

        kill = "qdel ${job_id}"
        check-alive = "qstat -j ${job_id}"
        job-id-regex = "(\\d+)"

        kill-docker = "qdel ${job_id}"
        check-alive-docker = "qstat -j ${job_id}"
        job-id-regex-docker = "(\\d+)"
      }
    }
  }
}
docker.hash-lookup.enabled = false
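
The job id that `kill` and `check-alive` receive as `${job_id}` is captured from the output of the submit command using `job-id-regex`. With `qsub -terse`, that output is just the job number. Below is a rough Python illustration, assuming the first match of the pattern is the one used. The function name and the sample output are hypothetical.

```python
import re
from typing import Optional

# The digit-capturing pattern behind job-id-regex.
JOB_ID_REGEX = re.compile(r"(\d+)")


def extract_job_id(submit_stdout: str) -> Optional[str]:
    """Return the first captured group, or None if no job id is found."""
    match = JOB_ID_REGEX.search(submit_stdout)
    return match.group(1) if match else None


if __name__ == "__main__":
    job_id = extract_job_id("4721\n")  # hypothetical `qsub -terse` output
    # The id is then substituted into the kill and check-alive commands.
    print(f"qdel {job_id}")
    print(f"qstat -j {job_id}")
```

Without `-terse`, qsub prints a full sentence around the job number, so a bare digit pattern like this one relies on the job id being the first number in the output.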

Pass the file above, cromwell.sge.docker.config, as the configuration file and cromwell will schedule jobs through SGE + Docker:

java -Dconfig.file=cromwell.sge.docker.config -jar cromwell.jar run ...

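Note: if a compute node does not have the Docker image locally, it is pulled automatically when the job runs. If the image is not on Docker Hub either, you have to build it yourself with docker build.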

Configuration files for other scheduling systems

Cloud Providers
  • AWS (AWS.conf): Amazon Web Services (https://cromwell.readthedocs.io/en/stable/tutorials/AwsBatch101/)
  • BCS (BCS.conf): Alibaba Cloud Batch Compute (BCS) backend (https://cromwell.readthedocs.io/en/stable/backends/BCS/)
  • TES (TES.conf): a backend that submits jobs to a server using the protocol defined by GA4GH (https://cromwell.readthedocs.io/en/stable/backends/TES/)
  • PAPIv2 (PAPIv2.conf): Google Pipelines API backend, version 2 (https://cromwell.readthedocs.io/en/stable/backends/Google/)
Containers
  • Docker (Docker.conf): an example backend that only runs workflows that specify docker for every command
  • Singularity (singularity.conf): run Singularity containers locally
  • Singularity+Slurm (singularity.slurm.conf): an example using Singularity with SLURM
  • TESK (TESK.conf): the same as TES, but intended for Kubernetes; see the TES documentation linked above
  • udocker (udocker.conf): interact with udocker locally
  • udocker+Slurm (udocker.slurm.conf): interact with udocker on SLURM
Workflow Managers
  • HTCondor (HtCondor.conf): a workload manager from UW-Madison (https://cromwell.readthedocs.io/en/stable/backends/HTcondor/)
  • LSF (LSF.conf): the Platform Load Sharing Facility backend (https://cromwell.readthedocs.io/en/stable/backends/LSF/)
  • SGE (SGE.conf): a backend for Sun Grid Engine (https://cromwell.readthedocs.io/en/stable/backends/SGE)
  • slurm (slurm.conf): SLURM workload manager (https://cromwell.readthedocs.io/en/stable/backends/SLURM/)
Custom
  • LocalExample: what you should use if you want to define a new backend provider