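Cromwell can be configured through a configuration file or Java command-line options, covering areas such as call caching, filesystems, and the database.
java -Dconfig.file=/path/to/cromwell.conf -jar cromwell.jar ...
For details on command-line usage, see the earlier article on the Cromwell command line.
The configuration file is written in [HOCON]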
https://github.com/lightbend/config/blob/master/HOCON.md#hocon-human-optimized-config-object-notation
A custom configuration file such as cromwell.conf must include application.conf at the top:
# include the application.conf at the top
include required(classpath("application"))
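Call caching configuration
Call caching lets Cromwell detect whether a job has already been run, so its results do not have to be recomputed.
In my own experience, caching does not take effect in some environments.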
call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}
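- enabled (default: false): whether call caching is used; it is off by default. When set to true, Cromwell references or copies the results of previously run jobs where appropriate.
- invalidate-bad-cache-results (default: true): Cromwell invalidates any cache entry that contains files which cannot be accessed on a cache hit.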
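The two options above can be pictured with a toy model. The sketch below is only a conceptual illustration, not Cromwell's implementation: the class `ToyCallCache`, its methods, and the way the key is built from the command and input hashes are all hypothetical names and simplifications chosen for this example.

```python
import hashlib
import os
import tempfile


class ToyCallCache:
    """Conceptual model only; NOT how Cromwell is implemented."""

    def __init__(self, enabled=False, invalidate_bad_cache_results=True):
        self.enabled = enabled
        self.invalidate_bad_cache_results = invalidate_bad_cache_results
        self.entries = {}  # call key -> path of the cached output file

    @staticmethod
    def call_key(command, input_hashes):
        # Assumption: a call is identified by its command and its input hashes.
        h = hashlib.md5(command.encode("utf-8"))
        for name in sorted(input_hashes):
            h.update(name.encode("utf-8"))
            h.update(input_hashes[name].encode("utf-8"))
        return h.hexdigest()

    def store(self, key, output_path):
        if self.enabled:
            self.entries[key] = output_path

    def lookup(self, key):
        """Return a cached output path, or None if the job has to run."""
        if not self.enabled:
            return None
        path = self.entries.get(key)
        if path is None:
            return None
        if os.path.exists(path):
            return path  # cache hit: the earlier result can be reused
        # The cached file can no longer be accessed.
        if self.invalidate_bad_cache_results:
            del self.entries[key]
        return None


cache = ToyCallCache(enabled=True)
key = ToyCallCache.call_key("echo hello", {"in.txt": "abc123"})
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "out.txt")
    with open(out, "w") as fh:
        fh.write("hello\n")
    cache.store(key, out)
    hit = cache.lookup(key)   # the file exists, so this is a hit
# The temporary directory is gone, so the cached file is inaccessible.
miss = cache.lookup(key)      # the entry is invalidated and the job reruns
```

With `enabled` left at false, `lookup` always returns None, which mirrors the default of never reusing results.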
config {
  filesystems {
    local {
      # When localizing a file, what type of file duplication should occur.
      # possible values: "hard-link", "soft-link", "copy", "cached-copy".
      # For more information check: https://cromwell.readthedocs.io/en/stable/backends/HPC/#shared-filesystem
      localization: [
        "hard-link", "soft-link", "copy"
      ]
      caching {
        # When copying a cached result, what type of file duplication should occur.
        # possible values: "hard-link", "soft-link", "copy", "cached-copy".
        # For more information check: https://cromwell.readthedocs.io/en/stable/backends/HPC/#shared-filesystem
        # Attempted in the order listed below:
        duplication-strategy: [
          "hard-link", "soft-link", "copy"
        ]
        # Possible values: md5, xxh64, fingerprint, path, path+modtime
        # For extended explanation check: https://cromwell.readthedocs.io/en/stable/Configuring/#call-caching
        # "md5" will compute an md5 hash of the file content.
        # "xxh64" will compute an xxh64 hash of the file content. Much faster than md5
        # "fingerprint" will take last modified time, size and hash the first 10 mb with xxh64 to create a file fingerprint.
        # This strategy will only be effective if the duplication-strategy (above) is set to "hard-link", as copying changes the last modified time.
        # "path" will compute an md5 hash of the file path. This strategy will only be effective if the duplication-strategy (above) is set to "soft-link",
        # in order to allow for the original file path to be hashed.
        # "path+modtime" will compute an md5 hash of the file path and the last modified time. The same conditions as for "path" apply here.
        # Default: "md5"
        hashing-strategy: "md5"
        # When the 'fingerprint' strategy is used set how much of the beginning of the file is read as fingerprint.
        # If the file is smaller than this size the entire file will be read.
        # Default: 10485760 (10MB).
        fingerprint-size: 10485760
        # When true, will check if a sibling file with the same name and the .md5 extension exists, and if it does, use the content of this file as a hash.
        # If false or the md5 does not exist, will proceed with the above-defined hashing strategy.
        # Default: false
        check-sibling-md5: false
      }
    }
  }
}
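The localization and duplication-strategy lists in the configuration above are ordered: the first strategy that succeeds is used. A minimal sketch of that fallback idea follows. The function `localize` and the `STRATEGIES` table are hypothetical names for this illustration, the "cached-copy" option is left out, and none of this is Cromwell's own code.

```python
import os
import shutil
import tempfile


def _hard_link(src, dst):
    os.link(src, dst)


def _soft_link(src, dst):
    # Link to the absolute path so the link works from any directory.
    os.symlink(os.path.abspath(src), dst)


def _copy(src, dst):
    shutil.copy2(src, dst)


STRATEGIES = {"hard-link": _hard_link, "soft-link": _soft_link, "copy": _copy}


def localize(src, dst, order=("hard-link", "soft-link", "copy")):
    """Try each strategy in order and return the name of the one that worked."""
    errors = []
    for name in order:
        try:
            STRATEGIES[name](src, dst)
            return name
        except OSError as exc:
            # For example, hard links fail across filesystems; try the next one.
            errors.append(f"{name}: {exc}")
    raise OSError("all strategies failed: " + "; ".join(errors))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    with open(src, "w") as fh:
        fh.write("data\n")
    # Force a plain copy by listing only that strategy.
    used = localize(src, os.path.join(tmp, "task_input.txt"), order=("copy",))
    with open(os.path.join(tmp, "task_input.txt")) as fh:
        localized_text = fh.read()
    # Default order: whichever strategy succeeds first on this machine.
    used_default = localize(src, os.path.join(tmp, "task_input2.txt"))
```

Putting "hard-link" first avoids duplicating data when source and destination share a filesystem, while "copy" at the end serves as the fallback that works almost everywhere.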
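- localization / duplication-strategy: WDL is statically typed, so the type of every variable must be declared up front. The File type, which represents a file, deserves a note here. Before a task runs, Cromwell localizes each File input into that task's execution directory. The configuration above lists three ways of doing so, tried in the order given: hard-link, soft-link, and copy (a plain copy). Cached results are duplicated in the same way through duplication-strategy.
- hashing-strategy: how a file's hash is computed; one of md5, xxh64, fingerprint, path, or path+modtime.
md5: an MD5 hash of the whole file content. Two different files are extremely unlikely to share the same value in practice; the drawback is that it is slow.
xxh64: another content hash, much faster than md5.
fingerprint: combines the last modified time, the file size, and an xxh64 hash of the first 10 MB (set by fingerprint-size) into a file fingerprint.
path and path+modtime: an md5 hash of the file path, optionally together with the last modified time; according to the config comments these are only effective when duplication-strategy is soft-link.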
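To make the difference between the strategies concrete, here is a rough sketch using only the Python standard library. It is not Cromwell's code. The function names are hypothetical, the way the fields are combined is an assumption, and because xxh64 is not in the standard library, MD5 stands in for it inside `fingerprint`.

```python
import hashlib
import os
import tempfile


def md5_of_content(path, chunk_size=1 << 20):
    """'md5' strategy: hash the whole file content, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def md5_of_path(path, with_modtime=False):
    """'path' / 'path+modtime' strategies: hash the path, not the content."""
    text = os.path.abspath(path)
    if with_modtime:
        text += str(os.stat(path).st_mtime_ns)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def fingerprint(path, fingerprint_size=10485760):
    """'fingerprint' strategy: modtime + size + hash of the first bytes.

    Assumption: MD5 is used here as a stand-in for xxh64.
    """
    st = os.stat(path)
    h = hashlib.md5()
    h.update(str(st.st_mtime_ns).encode("utf-8"))
    h.update(str(st.st_size).encode("utf-8"))
    with open(path, "rb") as fh:
        # Files smaller than fingerprint_size are read in full.
        h.update(fh.read(fingerprint_size))
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    for p in (a, b):
        with open(p, "wb") as fh:
            fh.write(b"same content\n")
    # Identical content gives identical content hashes.
    same_content = md5_of_content(a) == md5_of_content(b)
    # Different paths give different path hashes, whatever the content.
    same_path_hash = md5_of_path(a) == md5_of_path(b)
    fp_a = fingerprint(a)
    fp_a_again = fingerprint(a)
```

The fingerprint only reads a bounded prefix of the file, which is why it stays fast on very large inputs, at the cost of depending on the modification time being preserved.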
Workflow logs
workflow-options {
  workflow-log-dir = "cromwell-workflow-logs"
  workflow-log-temporary = true
}
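Shell configuration
Cromwell uses bash by default, but many Docker images do not have bash installed, so you can tell it to run jobs with sh instead.
# -Dsystem.job-shell makes all backends use sh
java -Dsystem.job-shell=/bin/sh -Dconfig.file=/path/to/cromwell.conf -jar cromwell.jar ...
# -Dbackend.providers.Local.config.job-shell=/bin/sh makes one specific backend (here Local) use sh
Database configuration
Running Cromwell from the command line (run mode) does not require configuring a database, whereas the web server mode does. Later articles will cover this part in detail.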