GATK4 & Cromwell workflow¶
The GATK4 & Cromwell workflow is available only for the Institute of Genomics.
Important
This guide is for existing Cromwell users. If you want to start using Cromwell, please write to support@hpc.ut.ee .
Introduction to WDL workflows¶
Workflow components¶
<workflow_name>.inputs.json
- an input file that contains the paths to all input files and references needed to execute the workflow. Here is an inputs.json example for a test sample .
<workflow_name>.option.json
- an options file; its main purpose is to specify the output directory of the workflow. Here is an option.json example for a test sample .
Note
JavaScript Object Notation (JSON) is an open-standard file format and data-interchange format that uses human-readable text to store and transmit data objects consisting of attribute-value pairs and arrays. JSON objects are used for transferring data between server and client. See the JSON examples and read more on Wikipedia .
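As a minimal generic illustration (a sketch only; the values are placeholders, the keys are borrowed from the wgs.inputs.json example further down), a JSON object with attribute-value pairs and an array looks like this:
{
    "sample_name": "EXAMPLE",
    "unmapped_bam_suffix": ".bam",
    "flowcell_unmapped_bams": ["EXAMPLE_A.bam", "EXAMPLE_B.bam"]
}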
Workflow tools¶
- Cromwell server - a workflow execution engine that runs WDL workflows on the UTHPC cluster. The Cromwell server talks to Slurm and handles jobs on its own. See the Cromwell github page and docs .
Note
For the Institute of Genomics, the Cromwell server is run under UTHPC management.
- cromshell - a command-line tool to talk to the Cromwell server.
Note
Automation tools are specific to a particular workflow. Make sure to use the correct ones.
- WholeGenomeGermlineSingleSample (WGS):
  - createInputWGS.py - an automation script that creates 'inputs.json' and 'option.json' for all samples in a given dir. For details, see createInputWGS.py .
  - submit-WGS-batch.sh - an automation script that uses cromshell for easy submission of workflows. For details, see submit-WGS-batch.sh .
- VariantCalling (VC):
  - createVCinput.py - an automation script that creates a vc_output dir with a sample-name dir and 'inputs.json' and 'option.json' for all specified samples. For details, see createVCinput.py .
  - submitVC-batch - an automation script that uses cromshell for easy submission of workflows. For details, see submitVC-batch .
Basic steps to run a workflow¶
- Create inputs.json and option.json files for each sample.
- Submit the workflow for one or more samples. The submission lists all samples to run and a workflow ID for each sample. A workflow ID looks like this: {"id":"xxxxxxxx-xxxx-xxxx-xxxx","status":"Submitted"}.
- Check whether the workflow has finished with cromshell list -u.
A command-level sketch of these steps follows below.
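Put together, a typical run of the WGS workflow looks roughly like the sketch below. This is only a summary of the commands documented in the manual sections further down; the sample names are placeholders.
module load any/cromwell-tools

# 1. create inputs.json and option.json for all samples
cd <path/to/catalogue/of/samples>
createInputWGS.py -i ./

# 2. submit one or more samples, keeping a log of the returned workflow IDs
submit-WGS-batch.sh <sample_name1> <sample_name2> | tee -a cromshell_logs

# 3. check the completion status of all unfinished workflows
cromshell list -u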
GATK4 & Cromwell module¶
Available GATK4 and Cromwell module versions:
- GATK4: 4.2.0.0, module any/gatk4.
- Cromwell: 53.1, 63.1, module any/cromwell/63.1.
Important
There is no need to load the gatk4/cromwell modules to interact with the Cromwell server. You only need to load the cromwell-tools module. Read more below.
module load any/cromwell-tools
Cromwell¶
Cromwell server¶
UTHPC runs a Singularity Cromwell server and an ordinary (native) Cromwell server. They have different CROMWELL_URLs, so be careful when choosing the CROMWELL_URL. However, you can always change the CROMWELL_URL later .
- The native Cromwell server has CROMWELL_URL 172.17.63.1:15000 and runs the WGS pipeline and the VC pipeline .
- The Singularity Cromwell server has CROMWELL_URL 172.17.63.3:14000 and runs the MoChA pipeline .
Configure cromshell¶
To interact with the Cromwell server, use the cromshell command. Before first use, you have to configure it. You need to do the configuration only once.
module load any/cromwell-tools
Depending on what pipeline you intend to run, choose your Cromwell server. For running the WholeGenomeGermlineSingleSample and VariantCalling pipelines, select 'Native Cromwell'. For the MoChA pipeline, choose 'Singularity Cromwell'. Insert the appropriate CROMWELL_URL when cromshell prompts for it.
| | Native Cromwell | Singularity Cromwell |
|---|---|---|
| CROMWELL_URL | 172.17.63.1:15000 | 172.17.63.3:14000 |
| Executable pipelines | WGS & VariantCalling | MoChA |
To connect your cromshell to the GI Cromwell server, enter the details as stated below:
- Run the cromshell command.
- Enter the following info at the prompt:
  - Cromwell URL: <insert the required CROMWELL_URL>.
  - Confirmation: yes.
Here is an example of setting up for Native Cromwell:
cromshell
Welcome to Cromshell
What is the URL of the Cromwell server you use to execute
your jobs?
Cromwell URL, please: 172.17.63.1:15000
Oh my! It looks like your server is: 172.17.63.1:15000
Is this correct [Yes/No] : yes
OK. Im now setting your Cromwell server to be: 172.17.63.1:15000
Dont worry - this wont hurt a bit...DONE
OK, you should be ALL SET!
Interacting with Cromwell server¶
To interact with the Cromwell server, use the cromshell command. Make sure cromshell has been configured as described in the section above, then load the module:
module load any/cromwell-tools
Cromshell usage examples¶
Note
To learn how to submit a workflow, please read the 'Running your own workflow' section of the pipeline description: WGS , VC and MoChA .
For further interaction with Cromwell, here are useful cromshell examples:
# abort a workflow
cromshell abort <workflow id>
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
# Check metadata
cromshell -t 20 metadata <workflow id>
Change CROMWELL_URL¶
To change an already set CROMWELL_URL, edit the cromshell configuration file located in your $HOME. You can comment out the existing CROMWELL_URL and insert the new one on the second line.
vim ~/.cromshell/cromwell_server.config
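For example, after switching from the native server to the Singularity server, the file could look like the two lines below. This is only an illustrative sketch, assuming that commented-out lines start with '#'; check the actual contents of your own configuration file.
#172.17.63.1:15000
172.17.63.3:14000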
Workflows¶
Tested and optimised workflows on UTHPC cluster:
- WholeGenomeGermlineSingleSample (WGS)
- VariantCalling (reduced WGS version)
- MoChA WDL pipeline
WholeGenomeGermlineSingleSample¶
All workflow parameters and procedures have been pre-optimised for running on UTHPC cluster.
You can view an NA12878 example output produced by the pipeline at the following path:
/gpfs/space/software/gatk4_pipeline/cromwell/workflows/test_catalogue_wgs/wgs_output
Read more:
- The GitHub release used: WholeGenomeGermlineSingleSample v2.3.2
- WGS Broad institute docs: WGS Overview
- WGS Methods link
Variant calling¶
Variant Calling pipeline is a reduced version of WholeGenomeGermlineSingleSample pipeline.
The pipeline takes an aligned CRAM with its index as input. The pipeline's output includes a GVCF containing variant calls with a corresponding index.
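As an illustration of the expected input and output layout (a sketch only: the output file names below are assumptions, not taken from the pipeline documentation), a sample catalogue before and after a VC run could look like this:
samples-catalogue/
├── V00000.cram                  # input: aligned CRAM
├── V00000.cram.crai             # input: CRAM index
└── vc_output/
    └── V00000/
        ├── V00000.g.vcf.gz      # output: GVCF with variant calls (name assumed)
        └── V00000.g.vcf.gz.tbi  # output: GVCF index (name assumed)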
MoChA WDL pipeline¶
The pipeline runs under Singularity on UTHPC cluster.
Warning
The MoChA pipeline runs under Singularity Cromwell; make sure your cromshell has been set up correctly. The instructions on how to set up cromshell or change the CROMWELL_URL are here .
Read more about: MoChA WDL pipeline
WholeGenomeGermlineSingleSample (WGS) manual¶
Running a test WGS workflow¶
This test workflow is based on the public GATK4 test sample NA12878_20k. First, load the Cromwell-tools module:
module load any/cromwell-tools
To run a test workflow, you have to copy the test directory to your project dir:
cp -R /gpfs/software/soft/manual/any/cromwell/workflows/test_catalogue_wgs /gpfs/hpc/projects/egv_hg38/wgs_output/
Warning
Make sure to configure cromshell before continuing with the next steps.
Create 'inputs' and 'option' files for workflow submission. From inside your sample directory, run the createInputWGS.py script. It creates a wgs_output dir inside the sample dir and two files: WGGSS_NA12878_20k.inputs.json and WGGSS_NA12878_20k.option.json.
Here is how to do it:
#Go inside the sample catalogue
cd /gpfs/hpc/projects/egv_hg38/wgs_output/test_catalogue_wgs
#From inside test_catalogue_wgs
createInputWGS.py -i ./
Output:
CATALOGUE PATH: /path/to/sample/catalogue/
WGS Input created: /path/to/sample/catalogue/wgs_output/test_catalogue_wgs/NA12878_20k/WGGSS_NA12878_20k.inputs.json
WGS Options created: /path/to/sample/catalogue/wgs_output/test_catalogue_wgs/NA12878_20k/WGGSS_NA12878_20k.option.json
Submit the workflow for sample ’NA12878_20k’
submit-WGS-batch.sh NA12878_20k | tee -a cromshell_logs
Output:
Workflow will run only for samples: NA12878_20k
WGGSS workflow is submitted for: NA12878_20k
sample_dir: /gpfs/hpc/projects/egv_hg38/wgs_output/test_catalogue_wgs/NA12878_20k
Sub-Command: submit
Submitting job to server: 172.17.63.1:15000
{"id":"xxxxxxxx-xxxx-xxxx-xxxx","status":"Submitted"}
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
Running your own workflow¶
Below is an example of how to run your own workflow.
- Go inside your sample directory (the directory with all samples).
  Warning
  If you don't have write permission inside the sample directory, check whether a wgs_output dir already exists and whether you have write permission to it. If you don't, ask the owner to create the wgs_output dir for you and to give you permissions for it.
- Run createInputWGS.py -i ./ from inside your sample directory. It creates wgs_output in the sample directory, if it doesn't exist already, and sample directories with inputs.json & option.json files for all samples inside wgs_output. <sample-catalogue>/wgs_output/<sample-name> is the location of all output files after the workflow finishes.
  Warning
  Make sure to configure cromshell before continuing with the next steps.
- Submit a batch of samples with the submit-WGS-batch.sh script. Please ensure that sample_name has no / character at the end. submit-WGS-batch.sh takes 'sample_names', not directories.
  Submission example:
  submit-WGS-batch.sh <sample_name1> <sample_name2> <sample_name3> <sample_name4> | tee -a cromshell_logs
  Important
  cromshell_logs is an important file for identifying the sample name and workflow ID later, in case you have several samples to submit (see the lookup sketch after this list).
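Because cromshell_logs keeps the submission output for every sample, you can later look up which workflow ID belongs to which sample. A minimal sketch, assuming the log contains the submission output shown in the test run above (the sample name is a placeholder):
# print the submission block, including the workflow ID, for one sample
grep -A 4 "submitted for: <sample_name>" cromshell_logs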
List of commands to see the submission status:
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
Automation tools¶
Create inputs.json & option.json¶
createInputWGS.py is a Python script that looks for the wgs_output dir in the sample directory and creates a sample dir inside wgs_output, which contains inputs.json and option.json for a given sample. The tool assumes that samples are in a directory of samples as shown below. Please note that submit-WGS-batch.sh also relies on the following structure.
Here is a structure example of a sample directory:
samples-catalogue/
├── V00000
│ ├── H7G2M.1.bam
│ ├── H7G2M.2.bam
│ ├── H7G2M.3.bam
│ ├── H7G2M.4.bam
│ ├── H7G2M.5.bam
│ ├── H7G2M.6.bam
│ └── H7G2M.7.bam
└── V00001
├── H1G1M.1.bam
├── H1G1M.2.bam
├── H1G1M.3.bam
├── H1G1M.4.bam
├── H1G1M.5.bam
├── H1G1M.6.bam
└── H1G1M.7.bam
Usage example:
#run the tool
createInputWGS.py -i <path/to/catalogue/of/samples>
# help
createInputWGS.py -h
Sample directory after running createInputWGS.py:
samples-catalogue/
├── V00000
│ ├── H7G2M.1.bam
│ ├── H7G2M.2.bam
│ ├── H7G2M.3.bam
│ ├── H7G2M.4.bam
│ ├── H7G2M.5.bam
│ ├── H7G2M.6.bam
│ └── H7G2M.7.bam
├── V00001
│ ├── H1G1M.1.bam
│ ├── H1G1M.2.bam
│ ├── H1G1M.3.bam
│ ├── H1G1M.4.bam
│ ├── H1G1M.5.bam
│ ├── H1G1M.6.bam
│ └── H1G1M.7.bam
└── wgs_output
├── V00000
│ ├── WGGSS_V00000.inputs.json
│ └── WGGSS_V00000.option.json
└── V00001
├── WGGSS_V00001.inputs.json
└── WGGSS_V00001.option.json
Submit several samples¶
submit-WGS-batch.sh is a bash script that submits several samples to the Cromwell server. It must be run from the root of the sample directory.
Note
It's recommended to use | tee -a cromshell_logs as seen in the example below, as it appends the submission info to cromshell_logs, making it possible to match a 'workflowID' with a sample name.
cd <path/to/catalogue/of/samples>
submit-WGS-batch.sh <V00000> <V00001> ... | tee -a cromshell_logs
Examples wgs.inputs.json and wgs.option.json¶
wgs.inputs.json¶
{
"WholeGenomeGermlineSingleSample.sample_and_unmapped_bams": {
"sample_name": "NA12878_20k",
"base_file_name": "NA12878_20k",
"flowcell_unmapped_bams": [
"/gpfs/space/home/<user>/test_catalogue_wgs/NA12878_20k/NA12878_A.bam",
"/gpfs/space/home/<user>/test_catalogue_wgs/NA12878_20k/NA12878_B.bam",
"/gpfs/space/home/<user>/test_catalogue_wgs/NA12878_20k/NA12878_C.bam"
],
"final_gvcf_base_name": "NA12878_20k",
"unmapped_bam_suffix": ".bam"
},
"WholeGenomeGermlineSingleSample.references": {
"contamination_sites_ud": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.UD",
"contamination_sites_bed": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.bed",
"contamination_sites_mu": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.mu",
"calling_interval_list": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list",
"reference_fasta": {
"ref_dict": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dict",
"ref_fasta": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta",
"ref_fasta_index": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
"ref_alt": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt",
"ref_sa": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa",
"ref_amb": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb",
"ref_bwt": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt",
"ref_ann": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann",
"ref_pac": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac"
},
"known_indels_sites_vcfs": [
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz"
],
"known_indels_sites_indices": [
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
],
"dbsnp_vcf": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",
"dbsnp_vcf_index": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx",
"evaluation_interval_list": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/wgs_evaluation_regions.hg38.interval_list",
"haplotype_database_file": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt"
},
"WholeGenomeGermlineSingleSample.scatter_settings": {
"haplotype_scatter_count": 10,
"break_bands_at_multiples_of": 100000
},
"WholeGenomeGermlineSingleSample.wgs_coverage_interval_list": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/wgs_coverage_regions.hg38.interval_list",
"WholeGenomeGermlineSingleSample.papi_settings": {
"preemptible_tries": 3,
"agg_preemptible_tries": 3
}
}
wgs.option.json¶
{
"final_workflow_outputs_dir": "/gpfs/space/home/<user>/test_catalogue_wgs/wgs_output/NA12878_20k",
"use_relative_output_paths": "true"
}
VariantCalling manual¶
Running a test VariantCalling workflow¶
This test workflow is based on the CRAMs of the public GATK4 test sample NA12878_20k.
Warning
Make sure to correctly configure cromshell . Check that CROMWELL_URL=172.17.63.1:15000 by issuing the cromshell -h command.
First, load the Cromwell-tools module:
module load any/cromwell-tools
To run a test workflow, you have to copy the test directory to your project dir (make sure that the gvaramu group has write permission to the dir):
cp -R /gpfs/space/software/gatk4_pipeline/cromwell/workflows/test_catalogue_wgs/NA12878_20k_cram </path/to/project/dir>
Create 'inputs' and 'option' files for workflow submission. From inside your sample directory containing the CRAMs, run the createVCinput.py script. It creates a vc_output dir with an NA12878_20k sample dir and two files: vc_NA12878_20k.inputs.json and vc_NA12878_20k.option.json.
Here is how to do it:
#Go inside the sample catalogue
cd </path/to/project/dir/>NA12878_20k_cram
#From inside NA12878_20k_cram
createVCinput.py NA12878_20k
SAMPLES ['NA12878_20k']
CATALOGUE PATH: </path/to/project/dir/>/NA12878_20k_cram/
vc_output dir created in CATALOGUE PATH
VC Input created: </path/to/project/dir/>/NA12878_20k_cram/vc_output/NA12878_20k/vc_NA12878_20k.inputs.json
VC Options created: </path/to/project/dir/>/NA12878_20k_cram/vc_output/NA12878_20k/vc_NA12878_20k.option.json
Submit the workflow for sample NA12878_20k
submitVC-batch NA12878_20k | tee -a cromshell_logs
Workflow will run only for samples: NA12878_20k
WGGSS workflow is submitted for: NA12878_20k
sample_dir: </path/to/project/dir/>/NA12878_20k_cram
Sub-Command: submit
Submitting job to server: 172.17.63.1:15000
{"id":"xxxxxxxx-xxxx-xxxx-xxxx","status":"Submitted"}
Here is a list of commands to see the submission status:
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
Running your own VC workflow¶
Please read the 'Running a test VariantCalling workflow' section, as it provides more exhaustive instructions.
Warning
Make sure to correctly configure cromshell . Check that CROMWELL_URL=172.17.63.1:15000 by issuing the cromshell -h command.
- Go inside your sample directory, which is the directory with all samples.
  Warning
  If you don't have write permission inside the sample directory, check whether a vc_output dir already exists and whether you have write permission to it. If you don't, ask the owner to create the vc_output dir for you and to give you permissions for it.
- Run createVCinput.py <sample name> from inside your sample directory. It creates vc_output in the sample directory, if it doesn't already exist, and sample directories with inputs.json & option.json files for all samples inside vc_output. <sample-catalogue>/vc_output/<sample-name> is the location of all output files after the workflow finishes.
- Submit a batch of samples with the submitVC-batch command. Please ensure that 'sample_names' have no extension like .cram. submitVC-batch takes 'sample_names', not directories.
  Submission example:
  submitVC-batch <sample_name1> <sample_name2> <sample_name3> <sample_name4> | tee -a cromshell_logs
  Important
  cromshell_logs is an important file for identifying the sample name and workflow ID later, in case you have several samples to submit.
Here is a list of commands to see the submission status:
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
VC automation tools¶
Create vc.inputs.json and vc.option.json¶
createVCinput.py is a Python script that creates a sample dir containing vc.inputs.json and vc.option.json for a given sample inside vc_output. The tool assumes that the sample CRAMs are in the directory of samples as shown below.
Here is a structure example of a sample directory:
samples-catalogue/
├── V00000.cram
├── V00000.cram.crai
├── V00001.cram
└── V00001.cram.crai
# create inputs for one or more samples
createVCinput.py <sample1> <sample2> ... <samplen>
# help
createVCinput.py -h
Sample directory after running createVCinput.py:
samples-catalogue/
├── V00000.cram
├── V00000.cram.crai
├── V00001.cram
├── V00001.cram.crai
└── vc_output
├── V00000
│ ├── vc_V00000.inputs.json
│ └── vc_V00000.option.json
└── V00001
├── vc_V00001.inputs.json
└── vc_V00001.option.json
Submit samples to VC workflow¶
submitVC-batch is a bash script that submits several samples to the Cromwell server. It takes care of defining all the variables required for running the VC workflow, so you only have to provide the sample name. It must be run from the root of the sample directory.
Note
It's recommended to use | tee -a cromshell_logs as seen in the example below, as it appends the submission info to cromshell_logs, making it possible to match a workflowID with a sample name.
Usage example:
cd <path/to/catalogue/of/samples>
submitVC-batch <V00000> <V00001> ... | tee -a cromshell_logs
Examples vc.inputs.json and vc.option.json¶
vc.inputs.json¶
{
"WholeGenomeGermlineSingleSample.sample_and_mapped_crams": {
"sample_name": "NA12878_20k",
"base_file_name": "NA12878_20k",
"final_gvcf_base_name": "NA12878_20k",
"input_cram": "<path/to/test/catalogue>/NA12878_20k_cram/NA12878_20k.cram",
"input_cram_index": "<path/to/test/catalogue>/NA12878_20k_cram/NA12878_20k.cram.crai"
},
"WholeGenomeGermlineSingleSample.references": {
"contamination_sites_ud": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.UD",
"contamination_sites_bed": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.bed",
"contamination_sites_mu": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.mu",
"calling_interval_list": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list",
"reference_fasta": {
"ref_dict": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dict",
"ref_fasta": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta",
"ref_fasta_index": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
"ref_alt": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt",
"ref_sa": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa",
"ref_amb": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb",
"ref_bwt": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt",
"ref_ann": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann",
"ref_pac": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac"
},
"known_indels_sites_vcfs": [
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz"
],
"known_indels_sites_indices": [
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
],
"dbsnp_vcf": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",
"dbsnp_vcf_index": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx",
"evaluation_interval_list": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/wgs_evaluation_regions.hg38.interval_list",
"haplotype_database_file": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt"
},
"WholeGenomeGermlineSingleSample.scatter_settings": {
"haplotype_scatter_count": 10,
"break_bands_at_multiples_of": 100000
},
"WholeGenomeGermlineSingleSample.wgs_coverage_interval_list": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/wgs_coverage_regions.hg38.interval_list",
"WholeGenomeGermlineSingleSample.papi_settings": {
"preemptible_tries": 3,
"agg_preemptible_tries": 3
}
}
vc.option.json¶
{
"final_workflow_outputs_dir": "<path/to/test/catalogue>/NA12878_20k_cram/vc_output/NA12878_20k",
"use_relative_output_paths": "true"
}
MoChA manual¶
Warning
Make sure to correctly configure cromshell . Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
Running a test Illumina example¶
Please read the GitHub page for the Illumina example and download all the files necessary for running the Illumina example: GitHub Illumina example
- Create/change illumina_example.json and illumina.option.json according to the examples below. Substitute the paths.
  illumina_example.json example:
  {
      "mocha.sample_set_id": "hapmap370k",
      "mocha.mode": "idat",
      "mocha.realign": true,
      "mocha.max_win_size_cm": 300.0,
      "mocha.overlap_size_cm": 5.0,
      "mocha.ref_name": "GRCh38",
      "mocha.ref_path": "</path/to/downloaded/>GRCh38",
      "mocha.manifest_path": "</path/to/downloaded/>manifests",
      "mocha.data_path": "</path/to/downloaded/>idats",
      "mocha.batch_tsv_file": "</path/to/downloaded/>tsvs/hapmap370k.batch.tsv",
      "mocha.sample_tsv_file": "</path/to/downloaded/>tsvs/hapmap370k.sample.tsv",
      "mocha.ped_file": "</path/to/downloaded/>hapmap370k.ped",
      "mocha.docker_registry": "/tmp/singularity_img/",
      "mocha.do_not_check_bpm": true
  }
  Note
  Make sure that the gvaramu group has write permission to the project dir and to the workflow output dir specified in option.json.
  chmod 775 <dir>
  illumina.option.json example:
  {
      "final_workflow_outputs_dir": "</path/to/upload/the/output>",
      "use_relative_output_paths": "true"
  }
- Load the Cromwell-tools module:
  module load any/cromwell-tools
- Submit the workflow to Cromwell. The $MOCHA variable is loaded with the Cromwell-tools module, and it defines the path to the mocha.wdl tested on the UTHPC cluster.
  cromshell submit $MOCHA illumina_example.json options.json
Running your own MoChA workflow¶
Warning
Make sure to correctly configure cromshell . Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
- Create inputs.json similar to illumina_example.json from the 'Running a test Illumina example' section.
- Create options.json from the example below and replace the output dir path:
  {
      "final_workflow_outputs_dir": "</path/to/upload/the/output>",
      "use_relative_output_paths": "true"
  }
- Load the Cromwell-tools module and submit the workflow:
  module load any/cromwell-tools
  cromshell submit $MOCHA <your/inputs>.json <your/options>.json
Imputation pipeline¶
Warning
Make sure to correctly configure cromshell . Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
Running a test imputation example¶
Please read the GitHub page for the Imputation example and download all the files necessary for running the Imputation example: GitHub Imputation example .
- Create/change impute.inputs.json and impute.options.json according to the examples below. Substitute the paths.
  impute.inputs.json example:
  {
      "impute.sample_set_id": "hapmap370k",
      "impute.mode": "pgt",
      "impute.target": "ext",
      "impute.batch_tsv_file": "</path/to/>tsvs/impute.hapmap370k.batch.tsv",
      "impute.max_win_size_cm": 50.0,
      "impute.overlap_size_cm": 5.0,
      "impute.target_chrs": ["chr12", "chrX"],
      "impute.ref_name": "GRCh38",
      "impute.ref_path": "</path/to/>GRCh38",
      "impute.data_path": "</path/to/>output",
      "impute.beagle": false
  }
  Note
  Make sure that the gvaramu group has write permission to the project dir and to the workflow output dir specified in 'option.json'.
  chmod 775 <dir>
  impute.options.json example:
  {
      "final_workflow_outputs_dir": "</path/to/upload/the/output>",
      "use_relative_output_paths": "true"
  }
- Load the Cromwell-tools module:
  module load any/cromwell-tools
- Submit the workflow to Cromwell. The $IMPUTE variable is loaded with the Cromwell-tools module, and it defines the path to the impute.wdl tested on the UTHPC cluster.
  cromshell submit $IMPUTE impute.inputs.json impute.options.json
Running your own impute workflow¶
Warning
Make sure to correctly configure cromshell . Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
- Create inputs.json similar to impute.inputs.json from the 'Running a test imputation example' section.
- Create options.json from the example below and replace the output dir path:
  {
      "final_workflow_outputs_dir": "</path/to/upload/the/output>",
      "use_relative_output_paths": "true"
  }
- Load the Cromwell-tools module and submit the workflow:
  module load any/cromwell-tools
  cromshell submit $IMPUTE <your/inputs>.json <your/options>.json
Allelic Shift pipeline¶
Warning
Make sure to correctly configure cromshell . Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
Running a test Allelic Shift example¶
Please read the GitHub page for the Allelic Shift pipeline and download all the files necessary for running the Allelic Shift pipeline example: GitHub Allelic Shift Pipeline
- Create/change shift.json and shift.options.json according to the examples below. Substitute the paths.
  shift.json example:
  {
      "shift.sample_set_id": "hapmap370k",
      "shift.region": "chrX",
      "shift.samples_file": "</path/to/>tsvs/hapmap370k.mLOX.lines",
      "shift.batch_tsv_file": "</path/to/>tsvs/shift.hapmap370k.batch.tsv",
      "shift.ref_path": "</path/to/>GRCh38",
      "shift.data_path": "</path/to/>impute_output"
  }
  Note
  Make sure that the gvaramu group has write permission to the project dir and to the workflow output dir specified in 'option.json'.
  chmod 775 <dir>
  shift.options.json example:
  {
      "final_workflow_outputs_dir": "</path/to/upload/the/output>",
      "use_relative_output_paths": "true"
  }
- Load the Cromwell-tools module:
  module load any/cromwell-tools
- Submit the workflow to Cromwell. The $SHIFT variable is loaded with the Cromwell-tools module, and it defines the path to the shift.wdl tested on the UTHPC cluster.
  cromshell submit $SHIFT shift.json shift.options.json
Running your own Shift workflow¶
Warning
Make sure to correctly configure cromshell . Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
- Create inputs.json similar to shift.json from the 'Running a test Allelic Shift example' section.
- Create shift.options.json from the example below and replace the output dir path:
  {
      "final_workflow_outputs_dir": "</path/to/upload/the/output>",
      "use_relative_output_paths": "true"
  }
- Load the Cromwell-tools module and submit the workflow:
  module load any/cromwell-tools
  cromshell submit $SHIFT <your/inputs>.json <your/options>.json
Troubleshooting¶
If there are any issues or questions regarding workflows, Cromwell, and/or GATK4, please contact support@hpc.ut.ee for support.