4. Using iCOMIC for DNA-Seq or RNA-Seq analysis
4.1. Analysis steps: DNA-Seq
DNA-Seq is the Whole Genome/Exome Sequencing data analysis pipeline. It lets you call variants from the input samples and annotate them. iCOMIC integrates 3 aligners, 4 variant callers and 2 annotators, along with tools for quality control. MultiQC is incorporated to produce comprehensive analysis statistics.
Aligners: GEM-Mapper, BWA-MEM, Bowtie2
Variant callers: GATK HC, samtools mpileup, freebayes, GATK Mutect2
Annotators: Annovar, SnpEff
4.1.1. Input Requirements
The main requirement is raw fastq files, which can be either single-end or paired-end. You can specify the fastq read details in two ways: by uploading a folder containing the reads, or by providing a tab-separated file that describes the reads. If you choose the Upload from folder mode, specify the path to the folder containing the fastq files. All files in that folder must be named in the format given in the Input file format section.
Alternatively, in the Upload from table mode, supply a tab-separated file with the sample details. Refer to the Input file format section for how to format this table. An example tab-separated file named units_sample.tsv is available at this link.
iCOMIC also requires two more inputs. The first is the path to the reference genome, as a fasta file. The second is a gzipped vcf file of known variants that corresponds to that reference genome.
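You may want to check these inputs from a script before filling in the fields. The sketch below is a hypothetical pre-flight helper and is not part of iCOMIC. The function names and the fastq suffix list are assumptions, and the reference fasta is assumed to be uncompressed. The helper makes three checks:

- the reads folder contains at least one fastq file;
- the reference starts with a fasta header;
- the known-variants file begins with the gzip magic bytes (bgzip output begins with them too).

```python
from pathlib import Path

# Hypothetical helper for illustration; iCOMIC performs its own checks in the GUI.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")  # assumed file naming


def is_gzipped(path: Path) -> bool:
    """Return True if the file begins with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def check_dna_inputs(fastq_dir, reference_fasta, known_vcf_gz) -> list[str]:
    """Return a list of problems; an empty list means the inputs look usable."""
    fastq_dir, reference_fasta, known_vcf_gz = (
        Path(fastq_dir), Path(reference_fasta), Path(known_vcf_gz))
    problems = []

    # 1. The reads folder must exist and contain at least one fastq file.
    fastqs = ([p for p in fastq_dir.iterdir() if p.name.endswith(FASTQ_SUFFIXES)]
              if fastq_dir.is_dir() else [])
    if not fastqs:
        problems.append(f"no fastq files found in {fastq_dir}")

    # 2. An uncompressed fasta file starts with a '>' header line.
    if not reference_fasta.is_file():
        problems.append(f"reference fasta not found: {reference_fasta}")
    else:
        with open(reference_fasta) as handle:
            if handle.read(1) != ">":
                problems.append(f"{reference_fasta} does not start with a fasta header")

    # 3. The known-variants vcf must be gzip-compressed.
    if not known_vcf_gz.is_file():
        problems.append(f"known-variants vcf not found: {known_vcf_gz}")
    elif not is_gzipped(known_vcf_gz):
        problems.append(f"{known_vcf_gz} is not gzip-compressed")

    return problems
```

Run it on your own paths and fix any reported problem before entering the paths in iCOMIC.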
Once all the fields are filled in, use the Next button to proceed to the Quality Control tab.
4.1.2. Review of Input Samples
In the Quality Control widget, click the Quality Control Results button to examine the quality of your samples. MultiQC provides a consolidated report of the quality statistics that FastQC generates for all samples. iCOMIC also lets you trim the reads with Cutadapt if required. These quality control steps are optional, and you can move ahead without running them.
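For reference, the sketch below shows roughly what a Cutadapt trimming call looks like when run by hand. iCOMIC sets these options through its interface, so the helper name and default values are assumptions. The example adapter, the common Illumina prefix AGATCGGAAGAGC, is also an assumption. The helper only builds the argument list, using these standard Cutadapt options:

- -a and -A: 3' adapters on read 1 and read 2;
- -q: quality cutoff;
- -m: minimum read length;
- -o and -p: output files for read 1 and read 2.

```python
import subprocess


def cutadapt_command(r1, out1, adapter, r2=None, out2=None, adapter2=None,
                     quality=20, min_length=20):
    """Build a Cutadapt argument list for single-end or paired-end trimming."""
    cmd = ["cutadapt",
           "-a", adapter,            # 3' adapter on read 1
           "-q", str(quality),       # quality cutoff for 3' quality trimming
           "-m", str(min_length),    # discard reads shorter than this after trimming
           "-o", out1]               # trimmed read 1 output
    if r2 is None:
        return cmd + [r1]
    if out2 is None:
        raise ValueError("paired-end trimming needs an output path for read 2")
    cmd += ["-A", adapter2 or adapter,  # 3' adapter on read 2
            "-p", out2]                 # trimmed read 2 output
    return cmd + [r1, r2]


def run_cutadapt(cmd):
    """Run the command; raises CalledProcessError if Cutadapt exits with an error."""
    subprocess.run(cmd, check=True)
```

For example, `run_cutadapt(cutadapt_command("s1_R1.fastq.gz", "trim_R1.fastq.gz", "AGATCGGAAGAGC", r2="s1_R2.fastq.gz", out2="trim_R2.fastq.gz"))` trims one paired-end sample, assuming cutadapt is on the PATH.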
4.1.3. Setting up a pipeline
In the tool selection widget, you will be asked to choose your desired set of tools for analysis.
Aligner
Choose an aligner from the drop-down menu. You will also need to provide the genome index that corresponds to the chosen aligner. If you do not have one, iCOMIC can generate it for you with the Generate index button. You can change the values of the mandatory parameters displayed. Experienced users can also adjust the advanced parameters: clicking the Advanced button opens a pop-up listing all parameters of the tool.
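The Generate index button builds the index for you. For comparison, the sketch below lists the standard command-line calls for two of the DNA-Seq aligners: `bwa index -p PREFIX ref.fa` and `bowtie2-build ref.fa PREFIX`. The helper name and prefix layout are assumptions. GEM-Mapper is deliberately left out rather than guessing its indexer options.

```python
def index_command(aligner: str, reference_fasta: str, prefix: str) -> list[str]:
    """Return the index-building command for a supported aligner (hypothetical helper)."""
    if aligner == "BWA-MEM":
        # BWA-MEM uses the index built by `bwa index`; -p sets the output prefix.
        return ["bwa", "index", "-p", prefix, reference_fasta]
    if aligner == "Bowtie2":
        # bowtie2-build takes the fasta first, then the index basename.
        return ["bowtie2-build", reference_fasta, prefix]
    raise ValueError(f"no index recipe in this sketch for {aligner!r}")
```

To execute the returned list, pass it to `subprocess.run(..., check=True)`.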
Variant Caller
This section lets you choose a variant caller from the integrated tools. If the input consists of tumor-normal samples, only tools that call variants by comparing the tumor and normal samples are displayed. If you instead want to call variants against the reference genome, only variant callers of that type are displayed. You can set both mandatory and advanced parameters for the selected tool.
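The filtering described above can be pictured as a lookup from each integrated caller to the kind of comparison it performs. This is a sketch, not iCOMIC's code. The mode labels are assumptions based on how the tools are usually used: Mutect2 as the somatic tumor-normal caller, and the other three as germline callers against the reference.

```python
# Assumed classification of the integrated variant callers (illustrative only).
CALLER_MODES = {
    "GATK HC": "reference",
    "samtools mpileup": "reference",
    "freebayes": "reference",
    "GATK Mutect2": "tumor-normal",
}


def callers_for(tumor_normal: bool) -> list[str]:
    """List the callers that would be offered for the given sample type."""
    wanted = "tumor-normal" if tumor_normal else "reference"
    return [name for name, mode in CALLER_MODES.items() if mode == wanted]
```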
Annotator
This section lets you choose a tool for annotating the called variants and set its parameters.
4.1.4. Initialization of the analysis
The Run tab displays an Unlock button and a Run button. Click Run to start the analysis. If a warning icon appears near the Unlock button when the analysis starts, click Unlock to unlock the working directory, then click Run again to continue. The progress bar in the tab shows how far the analysis has progressed.
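This section does not say how the working-directory lock is implemented. The sketch below approximates what the Unlock button does, assuming a Snakemake-based workflow. Snakemake keeps lock files under .snakemake/locks in the working directory and clears them with `snakemake --unlock`. The helper names and the Snakefile path are hypothetical.

```python
import subprocess
from pathlib import Path


def has_lock(workdir) -> bool:
    """True if a Snakemake-style lock directory in workdir contains lock files."""
    lock_dir = Path(workdir) / ".snakemake" / "locks"
    return lock_dir.is_dir() and any(lock_dir.iterdir())


def unlock_command(workdir, snakefile="Snakefile") -> list[str]:
    """Build the command that removes a stale lock (assumes Snakemake is the engine)."""
    return ["snakemake", "--snakefile", snakefile, "--directory", str(workdir), "--unlock"]


def unlock_if_needed(workdir, snakefile="Snakefile") -> bool:
    """Run the unlock only when a lock is present; return True if it was run."""
    if not has_lock(workdir):
        return False
    subprocess.run(unlock_command(workdir, snakefile), check=True)
    return True
```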
4.1.5. Results: A quick check
Once the analysis is complete, iCOMIC automatically takes you to the Results tab, which displays three main results:
Analysis Statistics: Displays a consolidated MultiQC report of the overall analysis statistics, including FastQC reports, alignment statistics and variant statistics.
Variants called: Clicking this button opens a pop-up that displays the vcf file of the called variants.
Annotated variants: Displays the annotated vcf file.
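You can also inspect the called-variant vcf outside iCOMIC. The sketch below assumes a standard VCF whose first five tab-separated columns are CHROM, POS, ID, REF and ALT. It counts records per chromosome and separates single-nucleotide records from everything else (indels, multi-base and symbolic alleles). The function name is hypothetical.

```python
import gzip
from collections import Counter


def summarize_vcf(path):
    """Count records per chromosome and roughly split them into SNV vs. other records."""
    opener = gzip.open if str(path).endswith(".gz") else open
    per_chrom, kinds = Counter(), Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # meta-information and column header lines
            chrom, _pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            per_chrom[chrom] += 1
            alts = alt.split(",")  # multi-allelic sites list several ALT alleles
            if len(ref) == 1 and all(len(a) == 1 for a in alts):
                kinds["SNV"] += 1
            else:
                kinds["other"] += 1  # indels, MNPs and symbolic alleles such as <DEL>
    return per_chrom, kinds
```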
4.2. Analysis steps: RNA-Seq
The RNA-Seq pipeline identifies differentially expressed genes from RNA sequencing data. iCOMIC integrates 2 aligners, 2 expression modellers and 2 differential expression tools, along with tools for quality control. MultiQC is incorporated to produce comprehensive analysis statistics.
Aligners: STAR, HISAT2
Expression modellers: StringTie, HTSeq
Differential expression: DESeq2, ballgown
Available pipelines:
- HISAT2-StringTie-ballgown
- STAR-StringTie-ballgown
- HISAT2-HTSeq-DESeq2
- STAR-HTSeq-DESeq2
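The four pipelines above follow one rule. Either aligner can feed either expression modeller, but each modeller is paired with a fixed differential expression tool. A small sketch of that rule follows; the names `DE_TOOL_FOR`, `is_supported` and `available_pipelines` are illustrative, not iCOMIC's.

```python
ALIGNERS = ("HISAT2", "STAR")
# Each expression modeller pairs with exactly one differential expression tool.
DE_TOOL_FOR = {"StringTie": "ballgown", "HTSeq": "DESeq2"}


def is_supported(aligner: str, modeller: str, de_tool: str) -> bool:
    """True if the three tools form one of the available pipelines."""
    return aligner in ALIGNERS and DE_TOOL_FOR.get(modeller) == de_tool


def available_pipelines() -> list[str]:
    """Enumerate every supported aligner-modeller-DE combination."""
    return [f"{a}-{m}-{d}" for m, d in DE_TOOL_FOR.items() for a in ALIGNERS]
```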
4.2.1. Input Requirements
As in the DNA-Seq pipeline, the main requirement is raw fastq files, either single-end or paired-end. You can specify the fastq read details in two ways: by uploading a folder containing the reads, or by providing a tab-separated file that describes the reads. Refer to the Input file format section for preparing the input reads. The RNA-Seq pipeline also requires three more inputs: the path to the reference genome (a fasta file), an annotation file in gtf format, and a transcript file.
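A common source of empty or missing counts is a mismatch between the fasta sequence names and the gtf's first column (for example chr1 versus 1). The sketch below is a hypothetical sanity check for this, assuming both files are uncompressed. It is not part of iCOMIC.

```python
def fasta_names(path) -> set[str]:
    """Sequence names from fasta headers: the text after '>' up to the first whitespace."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_seqnames(path) -> set[str]:
    """Values of the first (seqname) column of a gtf file, skipping comments and blanks."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_reference(fasta_path, gtf_path) -> list[str]:
    """Sequence names used in the gtf that do not appear in the fasta."""
    return sorted(gtf_seqnames(gtf_path) - fasta_names(fasta_path))
```

An empty list from `missing_from_reference` means every sequence the annotation refers to exists in the reference.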
Once all the fields are filled in, use the Next button to proceed to the Quality Control tab.
4.2.2. Review of Input Samples
In the Quality Control widget, click the Quality Control Results button to examine the quality of your samples. MultiQC provides a consolidated report of the quality statistics that FastQC generates for all samples. iCOMIC also lets you trim the reads with Cutadapt if required. These quality control steps are optional, and you can move ahead without running them.
4.2.3. Setting up a pipeline
In the tool selection widget, you will be asked to choose your desired set of tools for analysis.
Aligner
Choose an aligner from the drop-down menu. You will also need to provide the genome index that corresponds to the chosen aligner. If you do not have one, iCOMIC can generate it for you with the Generate index button. You can change the values of the mandatory parameters displayed. Experienced users can also adjust the advanced parameters: clicking the Advanced button opens a pop-up listing all parameters of the tool.
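For the RNA-Seq aligners, the standard index-building calls look roughly like the sketch below. The helper name, thread default and the "genome" index basename for HISAT2 are hypothetical. For STAR, the --genomeDir directory should be created beforehand. Passing the gtf via --sjdbGTFfile adds annotated splice junctions to the STAR index.

```python
def rna_index_command(aligner, reference_fasta, gtf, index_dir, threads=4):
    """Return an index-building command for STAR or HISAT2 (hypothetical helper)."""
    if aligner == "STAR":
        # STAR writes the index into index_dir (create it first); the gtf adds
        # annotated splice junctions to the index.
        return ["STAR", "--runMode", "genomeGenerate",
                "--genomeDir", index_dir,
                "--genomeFastaFiles", reference_fasta,
                "--sjdbGTFfile", gtf,
                "--runThreadN", str(threads)]
    if aligner == "HISAT2":
        # hisat2-build takes the fasta, then an index basename; this plain form
        # does not use the gtf. The "genome" basename is an arbitrary choice.
        return ["hisat2-build", "-p", str(threads), reference_fasta, f"{index_dir}/genome"]
    raise ValueError(f"no index recipe in this sketch for {aligner!r}")
```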
Expression Modeller
This section lets you choose an expression modeller from the integrated tools. The modeller counts the reads over the features defined in the annotation (gtf) file. You can also set the parameters of the selected tool.
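If HTSeq is selected, its htseq-count program writes one two-column file per sample (feature ID, read count). These files end with summary rows whose names start with "__", such as __no_feature. The sketch below merges such files into the gene-by-sample integer matrix that DESeq2 takes as input. The function names are hypothetical.

```python
def read_htseq_counts(path) -> dict[str, int]:
    """Read one htseq-count output file, dropping the '__' summary rows."""
    counts = {}
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            feature, value = line.rstrip("\n").split("\t")
            if feature.startswith("__"):
                continue  # summary rows such as __no_feature or __ambiguous
            counts[feature] = int(value)
    return counts


def count_matrix(sample_files: dict) -> tuple[list[str], dict[str, list[int]]]:
    """Merge per-sample files into (sorted gene IDs, sample -> counts in that order)."""
    per_sample = {sample: read_htseq_counts(path) for sample, path in sample_files.items()}
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    # Genes absent from a sample's file are filled with 0.
    return genes, {s: [c.get(g, 0) for g in genes] for s, c in per_sample.items()}
```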
Differential Expression tool
Here you choose the tool for differential expression analysis and set its parameters.
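DESeq2, one of the two differential expression tools, first corrects for sequencing depth. It does this with size factors computed by the median-of-ratios method. A sample's factor is the median, over genes with no zero counts, of the ratio between the gene's count in that sample and the gene's geometric mean across all samples. The sketch below covers only this normalization step in pure Python. It is not DESeq2's code, and the function name is hypothetical.

```python
import math
from statistics import median


def size_factors(matrix: dict[str, list[int]]) -> dict[str, float]:
    """Median-of-ratios size factors; matrix maps sample -> counts in a shared gene order."""
    samples = list(matrix)
    n_genes = len(matrix[samples[0]])

    # Log geometric mean of each gene across samples; genes with any zero count are skipped.
    log_geo_means = []
    for i in range(n_genes):
        row = [matrix[s][i] for s in samples]
        if all(c > 0 for c in row):
            log_geo_means.append((i, sum(math.log(c) for c in row) / len(row)))

    if not log_geo_means:
        raise ValueError("every gene has a zero count in some sample")

    # Each sample's factor is the median ratio of its counts to the gene geometric means.
    factors = {}
    for s in samples:
        log_ratios = [math.log(matrix[s][i]) - lg for i, lg in log_geo_means]
        factors[s] = math.exp(median(log_ratios))
    return factors
```

Dividing each sample's counts by its factor gives depth-normalized counts. DESeq2 does this internally, so the sketch is only meant to show the idea.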
4.2.4. Initialization of the analysis
The Run tab displays an Unlock button and a Run button. Click Run to start the analysis. If a warning icon appears near the Unlock button when the analysis starts, click Unlock to unlock the working directory, then click Run again to continue. The progress bar in the tab shows how far the analysis has progressed.
4.2.5. Results: A quick check
Once the analysis is complete, iCOMIC automatically takes you to the Results tab, which displays three main results:
Analysis Statistics: Displays a consolidated MultiQC report of the overall analysis statistics, including FastQC reports and alignment statistics.
Differentially Expressed Genes: Clicking this button opens a pop-up listing the differentially expressed genes.
Plots: Displays the differentially expressed genes in R plots such as an MA plot, heatmap, PCA plot and box plot.
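In the MA plot, each point is a gene g. Let q_{g,1} and q_{g,2} be its normalized expression in the two conditions being compared. The usual definitions of the two axes are:

```latex
M_g = \log_2 \frac{q_{g,2}}{q_{g,1}}, \qquad
A_g = \frac{1}{2}\left(\log_2 q_{g,1} + \log_2 q_{g,2}\right)
```

M is the log2 fold change (y-axis) and A is the average log2 expression (x-axis). Genes with M far from 0 are the candidates for differential expression. DESeq2's own MA plot uses a slightly different x-axis: the mean of normalized counts on a log scale, plotted against the estimated log2 fold change.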