GATK4 & Cromwell workflow¶
The GATK4 & Cromwell workflow is available only for the Institute of Genomics.
Important
This guide is for existing Cromwell users. If you want to start using Cromwell, please write to support@hpc.ut.ee.
Introduction to WDL workflows¶
Workflow components¶
<workflow_name>.inputs.json - an input file that contains paths to all input files and references needed to execute the workflow. An inputs.json example for a test sample is shown in the examples section below.
<workflow_name>.option.json - an option file; its main purpose is to specify the output directory of the workflow. An option.json example for a test sample is shown in the examples section below.
Note
JavaScript Object Notation (JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute-value pairs and arrays. JSON objects are used for transferring data between server and client. Read more on Wikipedia.
Workflow tools¶
-
Cromwell server - a workflow execution engine that runs WDL workflows on the UTHPC cluster. The Cromwell server talks to Slurm and handles jobs on its own. See the Cromwell GitHub page and docs.
Note
For the Institute of Genomics, the Cromwell server is run under UTHPC management.
-
cromshell - a command-line tool for talking to the Cromwell server.
Note
Automation tools are specific to a particular workflow. Make sure to use the correct ones.
- WholeGenomeGermlineSingleSample (WGS):
  - createInputWGS.py - an automation script that creates 'inputs.json' and 'option.json' for all samples in a given dir. For details, see createInputWGS.py below.
  - submit-WGS-batch.sh - an automation script that utilises cromshell for easy submission of workflows. For details, see submit-WGS-batch.sh below.
- VariantCalling (VC):
  - createVCinput.py - an automation script that creates a vc_output dir with a sample name dir and 'inputs.json' and 'option.json' for all specified samples. For details, see createVCinput.py below.
  - submitVC-batch - an automation script that utilises cromshell for easy submission of workflows. For details, see submitVC-batch below.
Basic steps to run a workflow¶
- Create inputs.json and option.json files for each sample.
- Submit the workflow for one or more samples. The submission output lists all samples to run and a workflow ID for each sample. A workflow ID looks like this: {"id":"xxxxxxxx-xxxx-xxxx-xxxx","status":"Submitted"}.
- Check whether the workflow has finished with cromshell list -u. A sketch of these steps for the WGS pipeline is shown below.
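Here is a rough sketch of these steps for the WGS pipeline, using the automation scripts described later in this guide (sample names are placeholders):
# 1. create inputs.json and option.json for every sample in the catalogue
createInputWGS.py -i ./
# 2. submit one or more samples and keep the submission log
submit-WGS-batch.sh <sample_name1> <sample_name2> | tee -a cromshell_logs
# 3. check whether the submitted workflows have finished
cromshell list -u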
GATK4 & Cromwell module¶
Available GATK4 and Cromwell module versions:
- GATK4: 4.2.0.0 - module any/gatk4.
- Cromwell: 53.1, 63.1 - module any/cromwell/63.1.
Important
There is no need to load the gatk4 or cromwell modules to interact with the Cromwell server. You only need to load the cromwell-tools module. Read more below.
module load any/cromwell-tools
Cromwell¶
Cromwell server¶
UTHPC has a Singularity Cromwell server and a native Cromwell server. They have different CROMWELL_URL values, so be careful when choosing the CROMWELL_URL. However, you can always change the CROMWELL_URL later.
- The native Cromwell server has CROMWELL_URL 172.17.63.1:15000 and runs the WGS pipeline and the VC pipeline.
- The Singularity Cromwell server has CROMWELL_URL 172.17.63.3:14000 and runs the MoChA pipeline.
Configure cromshell¶
To interact with the Cromwell server, use the cromshell command. To start using it, you have to configure it first. You only need to do the configuration once.
module load any/cromwell-tools
Depending on what pipeline you intend to run, choose your Cromwell server. For running the WholeGenomeGermlineSingleSample and VariantCalling pipelines, select 'Native Cromwell'. For the MoChA pipeline, choose 'Singularity Cromwell'. Insert the appropriate CROMWELL_URL when cromshell prompts for it.
| | Native Cromwell | Singularity Cromwell |
|---|---|---|
| CROMWELL_URL | 172.17.63.1:15000 | 172.17.63.3:14000 |
| Executable Pipelines | WGS & VariantCalling | MoChA |
To connect your cromshell to the GI Cromwell server, enter the details as stated below:
- Run the cromshell command.
- Enter the following info at the prompt:
  - Cromwell URL: <insert the required CROMWELL_URL>.
  - Confirmation: yes.
Here is an example of setting up for Native Cromwell:
cromshell
Welcome to Cromshell
What is the URL of the Cromwell server you use to execute
your jobs?
Cromwell URL, please: 172.17.63.1:15000
Oh my! It looks like your server is: 172.17.63.1:15000
Is this correct [Yes/No] : yes
OK. Im now setting your Cromwell server to be: 172.17.63.1:15000
Dont worry - this wont hurt a bit...DONE
OK, you should be ALL SET!
Interacting with Cromwell server¶
To interact with the Cromwell server, use the cromshell command. To start using it, you have to configure it first. You only need to do the configuration once.
module load any/cromwell-tools
Cromshell usage examples¶
Note
To learn how to submit a workflow, please read the 'Running your own workflow' section of the corresponding pipeline description: WGS, VC, and MoChA.
For further interaction with Cromwell, here are useful cromshell examples:
# abort a workflow
cromshell abort <workflow id>
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
# Check metadata
cromshell -t 20 metadata <workflow id>
Change CROMWELL_URL¶
To change an already set CROMWELL_URL, edit the cromshell configuration file located in your $HOME. You can comment out the existing CROMWELL_URL and insert the new one on the second line.
vim ~/.cromshell/cromwell_server.config
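For example, following the comment-out approach described above, the file might look like this after switching from the native server to the Singularity server (a sketch; the exact contents on your system may differ):
#172.17.63.1:15000
172.17.63.3:14000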
Workflows¶
Tested and optimised workflows on the UTHPC cluster:
- WholeGenomeGermlineSingleSample (WGS)
- VariantCalling (reduced WGS version)
- MoChA WDL pipeline
WholeGenomeGermlineSingleSample¶
All workflow parameters and procedures have been pre-optimised for running on the UTHPC cluster.
You can view an NA12878 example output produced by the pipeline at the following path:
/gpfs/space/software/gatk4_pipeline/cromwell/workflows/test_catalogue_wgs/wgs_output
Read more:
- The GitHub release used: WholeGenomeGermlineSingleSample v2.3.2
- WGS Broad institute docs: WGS Overview
- WGS Methods link
Variant calling¶
The Variant Calling pipeline is a reduced version of the WholeGenomeGermlineSingleSample pipeline.
The pipeline takes an aligned CRAM with its index as input. The pipeline's output includes a GVCF containing variant calls with a corresponding index.
MoChA WDL pipeline¶
The pipeline runs under Singularity on the UTHPC cluster.
Warning
The MoChA pipeline runs under Singularity Cromwell, so make sure your cromshell has been set up correctly. The instructions on how to set up cromshell or change the CROMWELL_URL are here.
Read more about: MoChA WDL pipeline
WholeGenomeGermlineSingleSample (WGS) manual¶
Running a test WGS workflow¶
This test workflow is based on the public gatk4 test sample NA12878_20k. First, load the Cromwell-tools module:
module load any/cromwell-tools
To run a test workflow, you have to copy the test directory to your project dir:
cp -R /gpfs/software/soft/manual/any/cromwell/workflows/test_catalogue_wgs /gpfs/hpc/projects/egv_hg38/wgs_output/
Warning
Make sure to configure cromshell before continuing with the next steps.
Create the 'inputs' and 'option' files for workflow submission. From inside your sample directory, run the createInputWGS.py script. It creates a wgs_output dir inside the sample dir and two files, WGGSS_NA12878_20k.inputs.json and WGGSS_NA12878_20k.option.json.
Here is how to do it:
#Go inside the sample catalogue
cd /gpfs/hpc/projects/egv_hg38/wgs_output/test_catalogue_wgs
#From inside test_catalogue_wgs
createInputWGS.py -i ./
Output:
CATALOGUE PATH: /path/to/sample/catalogue/
WGS Input created: /path/to/sample/catalogue/wgs_output/test_catalogue_wgs/NA12878_20k/WGGSS_NA12878_20k.inputs.json
WGS Options created: /path/to/sample/catalogue/wgs_output/test_catalogue_wgs/NA12878_20k/WGGSS_NA12878_20k.option.json
Submit the workflow for the sample 'NA12878_20k':
submit-WGS-batch.sh NA12878_20k | tee -a cromshell_logs
Output:
Workflow will run only for samples: NA12878_20k
WGGSS workflow is submitted for: NA12878_20k
sample_dir: /gpfs/hpc/projects/egv_hg38/wgs_output/test_catalogue_wgs/NA12878_20k
Sub-Command: submit
Submitting job to server: 172.17.63.1:15000
{"id":"xxxxxxxx-xxxx-xxxx-xxxx","status":"Submitted"}
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
Running your own workflow¶
Below is an example of how to run your own workflow.
-
Go inside your sample directory, which is the directory containing all samples.
Warning
If you don't have write permission inside the sample directory, check whether the wgs_output dir already exists and whether you have write permission for it. If you don't, ask the owner to create the wgs_output dir for you and give you permissions for wgs_output.
-
Run createInputWGS.py -i ./ from inside your sample directory. It creates wgs_output in the sample directory, if it doesn't exist already, and sample directories with inputs.json & option.json files for all samples inside wgs_output. <sample-catalogue>/wgs_output/<sample-name> is the location of all output files after the workflow finishes.
Warning
Make sure to configure cromshell before continuing with the next steps.
-
Submit a batch of samples with the submit-WGS-batch.sh script. Please ensure that sample_name has no / character at the end. submit-WGS-batch.sh takes 'sample_names', not directories.
Submission example:
submit-WGS-batch.sh <sample_name1> <sample_name2> <sample_name3> <sample_name4> | tee -a cromshell_logs
Important
cromshell_logs is an important file for identifying the sample name and workflow ID later, in case you have several samples to submit.
List of commands to see the submission status:
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
Automation tools¶
Create inputs.json & option.json¶
createInputWGS.py is a Python script that looks for a wgs_output dir in the sample directory and creates a sample dir inside wgs_output, which contains inputs.json and option.json for a given sample. The tool assumes that samples are in a directory of samples as shown below. Please note that submit-WGS-batch.sh also relies on the following structure.
Here is a structure example of a sample directory:
samples-catalogue/
├── V00000
│ ├── H7G2M.1.bam
│ ├── H7G2M.2.bam
│ ├── H7G2M.3.bam
│ ├── H7G2M.4.bam
│ ├── H7G2M.5.bam
│ ├── H7G2M.6.bam
│ └── H7G2M.7.bam
└── V00001
├── H1G1M.1.bam
├── H1G1M.2.bam
├── H1G1M.3.bam
├── H1G1M.4.bam
├── H1G1M.5.bam
├── H1G1M.6.bam
└── H1G1M.7.bam
Usage example:
#run the tool
createInputWGS.py -i <path/to/catalogue/of/samples>
# help
createInputWGS.py -h
Sample directory after running createInputWGS.py:
samples-catalogue/
├── V00000
│ ├── H7G2M.1.bam
│ ├── H7G2M.2.bam
│ ├── H7G2M.3.bam
│ ├── H7G2M.4.bam
│ ├── H7G2M.5.bam
│ ├── H7G2M.6.bam
│ └── H7G2M.7.bam
├── V00001
│ ├── H1G1M.1.bam
│ ├── H1G1M.2.bam
│ ├── H1G1M.3.bam
│ ├── H1G1M.4.bam
│ ├── H1G1M.5.bam
│ ├── H1G1M.6.bam
│ └── H1G1M.7.bam
└── wgs_output
├── V00000
│ ├── WGGSS_V00000.inputs.json
│ └── WGGSS_V00000.option.json
└── V00001
├── WGGSS_V00001.inputs.json
└── WGGSS_V00001.option.json
Submit several samples¶
submit-WGS-batch.sh is a bash script that submits several samples to the Cromwell server. It must be run from the root of the sample directory.
Note
It's recommended to use | tee -a cromshell_logs as seen in the example below, since it appends info to cromshell_logs and makes it possible to match a 'workflowID' with a sample name.
cd <path/to/catalogue/of/samples>
submit-WGS-batch.sh <V00000> <V00001> ... | tee -a cromshell_logs
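Later, to match a sample with its workflow ID, you can search cromshell_logs. A minimal sketch, assuming the submission output format shown in the WGS test run earlier in this guide:
# show the submission block for sample V00000, including the line with the workflow ID
grep -A 4 "submitted for: V00000" cromshell_logs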
Examples: wgs.inputs.json and wgs.option.json¶
wgs.inputs.json¶
{
"WholeGenomeGermlineSingleSample.sample_and_unmapped_bams": {
"sample_name": "NA12878_20k",
"base_file_name": "NA12878_20k",
"flowcell_unmapped_bams": [
"/gpfs/space/home/<user>/test_catalogue_wgs/NA12878_20k/NA12878_A.bam",
"/gpfs/space/home/<user>/test_catalogue_wgs/NA12878_20k/NA12878_B.bam",
"/gpfs/space/home/<user>/test_catalogue_wgs/NA12878_20k/NA12878_C.bam"
],
"final_gvcf_base_name": "NA12878_20k",
"unmapped_bam_suffix": ".bam"
},
"WholeGenomeGermlineSingleSample.references": {
"contamination_sites_ud": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.UD",
"contamination_sites_bed": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.bed",
"contamination_sites_mu": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.mu",
"calling_interval_list": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list",
"reference_fasta": {
"ref_dict": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dict",
"ref_fasta": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta",
"ref_fasta_index": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
"ref_alt": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt",
"ref_sa": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa",
"ref_amb": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb",
"ref_bwt": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt",
"ref_ann": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann",
"ref_pac": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac"
},
"known_indels_sites_vcfs": [
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz"
],
"known_indels_sites_indices": [
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
],
"dbsnp_vcf": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",
"dbsnp_vcf_index": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx",
"evaluation_interval_list": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/wgs_evaluation_regions.hg38.interval_list",
"haplotype_database_file": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt"
},
"WholeGenomeGermlineSingleSample.scatter_settings": {
"haplotype_scatter_count": 10,
"break_bands_at_multiples_of": 100000
},
"WholeGenomeGermlineSingleSample.wgs_coverage_interval_list": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/wgs_coverage_regions.hg38.interval_list",
"WholeGenomeGermlineSingleSample.papi_settings": {
"preemptible_tries": 3,
"agg_preemptible_tries": 3
}
}
wgs.option.json¶
{
"final_workflow_outputs_dir": "/gpfs/space/home/<user>/test_catalogue_wgs/wgs_output/NA12878_20k",
"use_relative_output_paths": "true"
}
VariantCalling manual¶
Running a test VariantCalling workflow¶
This test workflow is based on the CRAMs of the public gatk4 test sample NA12878_20k.
Warning
Make sure to correctly configure cromshell. Check that CROMWELL_URL=172.17.63.1:15000 by issuing the cromshell -h command.
First, load the Cromwell-tools module:
module load any/cromwell-tools
To run a test workflow, you have to copy the test directory to your project dir (make sure that the gvaramu group has write permission to the dir):
cp -R /gpfs/space/software/gatk4_pipeline/cromwell/workflows/test_catalogue_wgs/NA12878_20k_cram </path/to/project/dir>
Create the inputs and option files for workflow submission. From inside your sample directory containing the CRAMs, run the createVCinput.py script. It creates a vc_output dir with an NA12878_20k sample dir and two files, vc_NA12878_20k.inputs.json and vc_NA12878_20k.option.json.
Here is how to do it:
#Go inside the sample catalogue
cd </path/to/project/dir/>NA12878_20k_cram
#From inside NA12878_20k_cram
createVCinput.py NA12878_20k
SAMPLES ['NA12878_20k']
CATALOGUE PATH: </path/to/project/dir/>/NA12878_20k_cram/
vc_output dir created in CATALOGUE PATH
VC Input created: </path/to/project/dir/>/NA12878_20k_cram/vc_output/NA12878_20k/vc_NA12878_20k.inputs.json
VC Options created: </path/to/project/dir/>/NA12878_20k_cram/vc_output/NA12878_20k/vc_NA12878_20k.option.json
Submit the workflow for the sample NA12878_20k:
submitVC-batch NA12878_20k | tee -a cromshell_logs
Workflow will run only for samples: NA12878_20k
WGGSS workflow is submitted for: NA12878_20k
sample_dir: </path/to/project/dir/>/NA12878_20k_cram
Sub-Command: submit
Submitting job to server: 172.17.63.1:15000
{"id":"xxxxxxxx-xxxx-xxxx-xxxx","status":"Submitted"}
Here is a list of commands to see the submission status:
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
Running your own VC workflow¶
Please read the 'Running a test VariantCalling workflow' section, as it provides more exhaustive instructions.
Warning
Make sure to correctly configure cromshell. Check that CROMWELL_URL=172.17.63.1:15000 by issuing the cromshell -h command.
-
Go inside your sample directory, which is the directory with all samples.
Warning
If you don't have write permission inside the sample directory, check whether the vc_output dir already exists and whether you have write permission for it. If you don't, ask the owner to create the vc_output dir for you and give you permissions for vc_output.
-
Run createVCinput.py <sample name> from inside your sample directory. It creates vc_output in the sample directory, if it doesn't already exist, and sample directories with inputs.json & option.json files for all samples inside vc_output. <sample-catalogue>/vc_output/<sample-name> is the location of all output files after the workflow finishes.
-
Submit a batch of samples with the submitVC-batch command. Please ensure that 'sample_names' have no extension like .cram. submitVC-batch takes 'sample_names', not directories.
Submission example:
submitVC-batch <sample_name1> <sample_name2> <sample_name3> <sample_name4> | tee -a cromshell_logs
Important
cromshell_logs is an important file for identifying the sample name and workflow ID later, in case you have several samples to submit.
Here is a list of commands to see the submission status:
# check the last workflow status
cromshell status
# Display a list of jobs submitted through cromshell
cromshell list -c
# Check completion status of all unfinished jobs
cromshell list -u
VC automation tools¶
Create vc.inputs.json and vc.option.json¶
createVCinput.py is a Python script that creates a sample dir containing vc.inputs.json and vc.option.json for a given sample inside vc_output. The tool assumes that sample CRAMs are in the directory of samples as shown below.
Here is a structure example of a sample directory:
samples-catalogue/
├── V00000.cram
├── V00000.cram.crai
├── V00001.cram
└── V00001.cram.crai
# create inputs for one or more samples
createVCinput.py <sample1> <sample2> ... <samplen>
# help
createVCinput.py -h
Sample directory after running createVCinput.py:
samples-catalogue/
├── V00000.cram
├── V00000.cram.crai
├── V00001.cram
├── V00001.cram.crai
└── vc_output
├── V00000
│ ├── vc_V00000.inputs.json
│ └── vc_V00000.option.json
└── V00001
├── vc_V00001.inputs.json
└── vc_V00001.option.json
Submit samples to VC workflow¶
submitVC-batch is a bash script that submits several samples to the Cromwell server. It takes care of defining all the variables required for running the VC workflow, so you only have to provide the sample name. It must be run from the root of the sample directory.
Note
It's recommended to use | tee -a cromshell_logs as seen in the example below, since it appends info to cromshell_logs and makes it possible to match a workflowID with a sample name.
Usage example:
cd <path/to/catalogue/of/samples>
submitVC-batch <V00000> <V00001> ... | tee -a cromshell_logs
Examples: vc.inputs.json and vc.option.json¶
vc.inputs.json¶
{
"WholeGenomeGermlineSingleSample.sample_and_mapped_crams": {
"sample_name": "NA12878_20k",
"base_file_name": "NA12878_20k",
"final_gvcf_base_name": "NA12878_20k",
"input_cram": "<path/to/test/catalogue>/NA12878_20k_cram/NA12878_20k.cram",
"input_cram_index": "<path/to/test/catalogue>/NA12878_20k_cram/NA12878_20k.cram.crai"
},
"WholeGenomeGermlineSingleSample.references": {
"contamination_sites_ud": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.UD",
"contamination_sites_bed": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.bed",
"contamination_sites_mu": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/contamination-resources/1000g/1000g.phase3.100k.b38.vcf.gz.dat.mu",
"calling_interval_list": "/gpfs/hpc/databases/broadinstitute/gpc-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list",
"reference_fasta": {
"ref_dict": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dict",
"ref_fasta": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta",
"ref_fasta_index": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
"ref_alt": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt",
"ref_sa": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa",
"ref_amb": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb",
"ref_bwt": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt",
"ref_ann": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann",
"ref_pac": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac"
},
"known_indels_sites_vcfs": [
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz"
],
"known_indels_sites_indices": [
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
"/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
],
"dbsnp_vcf": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",
"dbsnp_vcf_index": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx",
"evaluation_interval_list": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/wgs_evaluation_regions.hg38.interval_list",
"haplotype_database_file": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt"
},
"WholeGenomeGermlineSingleSample.scatter_settings": {
"haplotype_scatter_count": 10,
"break_bands_at_multiples_of": 100000
},
"WholeGenomeGermlineSingleSample.wgs_coverage_interval_list": "/gpfs/hpc/databases/broadinstitute/references/hg38/v0/wgs_coverage_regions.hg38.interval_list",
"WholeGenomeGermlineSingleSample.papi_settings": {
"preemptible_tries": 3,
"agg_preemptible_tries": 3
}
}
vc.option.json¶
{
"final_workflow_outputs_dir": "<path/to/test/catalogue>/NA12878_20k_cram/vc_output/NA12878_20k",
"use_relative_output_paths": "true"
}
MoChA manual¶
Warning
Make sure to correctly configure cromshell. Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
Running a test Illumina example¶
Please read the GitHub page for the Illumina example and download all the necessary files for running the Illumina example: GitHub Illumina example.
-
Create/change illumina_example.json and illumina.option.json to the following examples below. Substitute the paths.
illumina_example.json example:
{
  "mocha.sample_set_id": "hapmap370k",
  "mocha.mode": "idat",
  "mocha.realign": true,
  "mocha.max_win_size_cm": 300.0,
  "mocha.overlap_size_cm": 5.0,
  "mocha.ref_name": "GRCh38",
  "mocha.ref_path": "</path/to/downloaded/>GRCh38",
  "mocha.manifest_path": "</path/to/downloaded/>manifests",
  "mocha.data_path": "</path/to/downloaded/>idats",
  "mocha.batch_tsv_file": "</path/to/downloaded/>tsvs/hapmap370k.batch.tsv",
  "mocha.sample_tsv_file": "</path/to/downloaded/>tsvs/hapmap370k.sample.tsv",
  "mocha.ped_file": "</path/to/downloaded/>hapmap370k.ped",
  "mocha.docker_registry": "/tmp/singularity_img/",
  "mocha.do_not_check_bpm": true
}
Note
Make sure that the gvaramu group has write permission to the project dir and the workflow output dir specified in option.json.
chmod 775 <dir>
illumina.option.json example:
{
  "final_workflow_outputs_dir": "</path/to/upload/the/output>",
  "use_relative_output_paths": "true"
}
-
Load the Cromwell-tools module:
module load any/cromwell-tools
-
Submit the workflow to Cromwell. The $MOCHA variable is loaded with the Cromwell-tools module and defines a path to the mocha.wdl tested on the UTHPC cluster.
cromshell submit $MOCHA illumina_example.json illumina.option.json
Running your own MoChA workflow¶
Warning
Make sure to correctly configure cromshell. Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
- Create inputs.json similar to illumina_example.json from the 'Running a test Illumina example' section.
- Create options.json from the example below and replace the output dir path:
{
  "final_workflow_outputs_dir": "</path/to/upload/the/output>",
  "use_relative_output_paths": "true"
}
- Load the Cromwell-tools module and submit the workflow:
module load any/cromwell-tools
cromshell submit $MOCHA <your/inputs>.json <your/options>.json
Imputation pipeline¶
Warning
Make sure to correctly configure cromshell. Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
Running a test imputation example¶
Please read the GitHub page for the Imputation example and download all the necessary files for running the Imputation example: GitHub Imputation example.
-
Create/change impute.inputs.json and impute.options.json to the following examples below. Substitute the paths.
impute.inputs.json example:
{
  "impute.sample_set_id": "hapmap370k",
  "impute.mode": "pgt",
  "impute.target": "ext",
  "impute.batch_tsv_file": "</path/to/>tsvs/impute.hapmap370k.batch.tsv",
  "impute.max_win_size_cm": 50.0,
  "impute.overlap_size_cm": 5.0,
  "impute.target_chrs": ["chr12", "chrX"],
  "impute.ref_name": "GRCh38",
  "impute.ref_path": "</path/to/>GRCh38",
  "impute.data_path": "</path/to/>output",
  "impute.beagle": false
}
Note
Make sure that the gvaramu group has write permission to the project dir and the workflow output dir specified in 'option.json'.
chmod 775 <dir>
impute.options.json example:
{
  "final_workflow_outputs_dir": "</path/to/upload/the/output>",
  "use_relative_output_paths": "true"
}
-
Load the Cromwell-tools module:
module load any/cromwell-tools
-
Submit the workflow to Cromwell. The $IMPUTE variable is loaded with the Cromwell-tools module and defines a path to the imputation WDL tested on the UTHPC cluster.
cromshell submit $IMPUTE impute.inputs.json impute.options.json
Running your own impute workflow¶
Warning
Make sure to correctly configure cromshell. Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
- Create inputs.json similar to impute.inputs.json from the 'Running a test imputation example' section.
- Create options.json from the example below and replace the output dir path:
{
  "final_workflow_outputs_dir": "</path/to/upload/the/output>",
  "use_relative_output_paths": "true"
}
- Load the Cromwell-tools module and submit the workflow:
module load any/cromwell-tools
cromshell submit $IMPUTE <your/inputs>.json <your/options>.json
Allelic Shift pipeline¶
Warning
Make sure to correctly configure cromshell. Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
Running a test Allelic Shift example¶
Please read the GitHub page for the Allelic Shift pipeline and download all the necessary files for running the Allelic Shift pipeline example: GitHub Allelic Shift Pipeline.
-
Create/change shift.json and shift.options.json to the following examples below. Substitute the paths.
shift.json example:
{
  "shift.sample_set_id": "hapmap370k",
  "shift.region": "chrX",
  "shift.samples_file": "</path/to/>tsvs/hapmap370k.mLOX.lines",
  "shift.batch_tsv_file": "</path/to/>tsvs/shift.hapmap370k.batch.tsv",
  "shift.ref_path": "</path/to/>GRCh38",
  "shift.data_path": "</path/to/>impute_output"
}
Note
Make sure that the gvaramu group has write permission to the project dir and the workflow output dir specified in 'option.json'.
chmod 775 <dir>
shift.options.json example:
{
  "final_workflow_outputs_dir": "</path/to/upload/the/output>",
  "use_relative_output_paths": "true"
}
-
Load the Cromwell-tools module:
module load any/cromwell-tools
-
Submit the workflow to Cromwell. The $SHIFT variable is loaded with the Cromwell-tools module and defines a path to the shift.wdl tested on the UTHPC cluster.
cromshell submit $SHIFT shift.json shift.options.json
Running your own Shift workflow¶
Warning
Make sure to correctly configure cromshell. Check that CROMWELL_URL=172.17.63.3:14000 by issuing the cromshell -h command.
- Create inputs.json similar to shift.json from the 'Running a test Allelic Shift example' section.
- Create shift.options.json from the example below and replace the output dir path:
{
  "final_workflow_outputs_dir": "</path/to/upload/the/output>",
  "use_relative_output_paths": "true"
}
- Load the Cromwell-tools module and submit the workflow:
module load any/cromwell-tools
cromshell submit $SHIFT <your/inputs>.json <your/options>.json
Troubleshooting¶
If there are any issues or questions regarding workflows, Cromwell, and/or GATK4, please contact support@hpc.ut.ee for support.