4. Using iCOMIC for DNA-Seq or RNA-Seq analysis

4.1. Analysis steps DNA-Seq

The DNA-Seq module is the whole genome/exome sequencing data analysis pipeline, which lets you call variants from the input samples and annotate them. iCOMIC integrates 3 aligners, 4 variant callers and 2 annotators, along with tools for quality control. MultiQC is incorporated to render comprehensive analysis statistics.

  1. Aligners: GEM-Mapper, BWA-MEM, Bowtie2
  2. Variant callers: GATK HC, samtools mpileup, freebayes, GATK Mutect2
  3. Annotators: Annovar, SnpEff

4.1.1. Input Requirements

The main requirement is raw fastq files, which can be either single-end or paired-end. Fastq read details can be specified in two ways: by uploading a folder containing the reads, or by using a tab-separated file describing the reads. If you choose the Upload from folder mode, you need to specify the path to the folder containing the fastq files. Additionally, all the files in the folder must be named in the format specified in the Input file format section.

Alternatively, if you use the Upload from table mode, you need to provide a tab-separated file describing the samples. Refer to the Input file format section for formatting the table. An example tab-separated file named units_sample.tsv is available at this link.
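As a rough illustration of the Upload from table mode, the sketch below builds a tab-separated sample table with Python's csv module. The column names (sample, unit, condition, fq1, fq2), the sample values and the file paths are assumptions made for this example only; the authoritative layout is the one given in the Input file format section and in units_sample.tsv.

```python
import csv
import io

# Hypothetical column names, used only for illustration.
# Check the Input file format section for the columns iCOMIC actually expects.
COLUMNS = ["sample", "unit", "condition", "fq1", "fq2"]


def build_units_table(rows):
    """Return the given rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Two made-up paired-end samples.
rows = [
    {"sample": "A", "unit": "1", "condition": "normal",
     "fq1": "reads/A_R1.fastq", "fq2": "reads/A_R2.fastq"},
    {"sample": "B", "unit": "1", "condition": "tumor",
     "fq1": "reads/B_R1.fastq", "fq2": "reads/B_R2.fastq"},
]
table_text = build_units_table(rows)
```

Writing table_text to a .tsv file gives a table that can be adapted to the real column layout before it is selected in the interface.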

Furthermore, iCOMIC requires the path to the reference genome (a fasta file) and to a gzipped vcf of the known variants corresponding to that reference genome.
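Before filling in these fields, it can help to confirm that the files are what they claim to be. The following is a minimal sketch and not part of iCOMIC: the function names are hypothetical, and the checks are deliberately shallow (the two-byte gzip magic number, and a leading ">" for fasta).

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def looks_like_fasta(path):
    """Return True if the first byte of the file is '>' (a fasta header)."""
    with open(path, "rb") as handle:
        return handle.read(1) == b">"
```

These checks only catch obvious mix-ups, such as an uncompressed vcf supplied where a gzipped one is expected; they do not validate the file contents.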

Once all the fields are filled, you can proceed to the Quality Control tab using the next button.

4.1.2. Review of Input Samples

In the Quality Control widget, you can examine the quality of your samples by clicking on the Quality Control Results button. MultiQC provides a consolidated report of the quality statistics generated by FastQC for all the samples. Additionally, iCOMIC lets you trim the reads using Cutadapt if required. However, you are free to move ahead without going through the quality check steps.

4.1.3. Setting up a pipeline

In the tool selection widget, you will be asked to choose your desired set of tools for analysis.

Aligner

You can choose an alignment tool from the drop-down menu. You also need to provide the genome index corresponding to the chosen aligner. If you do not have one, iCOMIC can generate the required index through the Generate index button. You can change the values of the mandatory parameters displayed. Moreover, if you are an expert bioinformatician, iCOMIC lets you adjust the advanced parameters: clicking on the Advanced button opens a pop-up listing all the parameters associated with the tool.

Variant Caller

This section lets you choose a variant caller from the set of integrated tools. If the input samples are normal-tumor pairs, only those tools that call variants by comparing the normal and tumor samples are displayed. If instead you want to call variants against the reference genome, variant callers of that type are displayed. iCOMIC allows you to set mandatory as well as advanced parameters for the selected tool.

Annotator

This section allows you to choose a tool for annotating your called variants and specify the parameters.

4.1.4. Initialization of the analysis

The Run tab displays an Unlock button and a Run button. Run starts the analysis. When the analysis starts, if a warning icon pops up near the Unlock button, click Unlock to unlock the working directory and then click Run to proceed with the analysis. The progress bar in the tab lets you monitor the progress of the analysis.

4.1.5. Results: A quick check

Once the analysis is completed, iCOMIC automatically takes you to the Results tab, which displays three major results.

  1. Analysis Statistics: Displays a MultiQC consolidated report of overall analysis statistics. This includes FastQC reports, alignment statistics and variant statistics.
  2. Variants called: Clicking this button displays a pop-up with the vcf file of called variants.
  3. Annotated variants: Displays the annotated vcf file.
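A vcf file is plain text: lines starting with "#" are header lines, and every other non-empty line is one variant record. As a quick sanity check outside the interface, the sketch below counts the records in a plain or gzipped vcf. The function name is hypothetical and the code is not part of iCOMIC.

```python
import gzip


def count_variants(vcf_path):
    """Count variant records (non-header, non-empty lines) in a vcf file."""
    # Use gzip for .gz files, the built-in open otherwise.
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    count = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                count += 1
    return count
```

Comparing the count for the called and the annotated vcf is one simple way to spot records that were dropped between the two steps.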

4.2. Analysis steps RNA-Seq

The RNA-Seq part lets you identify differentially expressed genes from RNA sequencing data. iCOMIC integrates 2 aligners, 2 expression modellers and 2 differential expression tools, along with tools for quality control. MultiQC is incorporated to render comprehensive analysis statistics.

  1. Aligners: STAR, HISAT2
  2. Expression modellers: StringTie, HTSeq
  3. Differential expression: DESeq2, ballgown

Available pipelines:

  1. HISAT2-StringTie-ballgown
  2. STAR-StringTie-ballgown
  3. HISAT2-HTSeq-DESeq2
  4. STAR-HTSeq-DESeq2
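In the list above, either aligner can be used, but each expression modeller appears with exactly one differential expression tool (StringTie with ballgown, HTSeq with DESeq2). A small sketch of that rule, with hypothetical function names that are not part of iCOMIC:

```python
from itertools import product

ALIGNERS = ("HISAT2", "STAR")
# Each expression modeller is listed with exactly one differential expression tool.
DOWNSTREAM = {"StringTie": "ballgown", "HTSeq": "DESeq2"}


def available_pipelines():
    """Return the pipeline names implied by the pairing rule."""
    return [
        f"{aligner}-{modeller}-{de_tool}"
        for aligner, (modeller, de_tool) in product(ALIGNERS, DOWNSTREAM.items())
    ]


def is_available(aligner, modeller, de_tool):
    """Return True if the combination matches one of the listed pipelines."""
    return aligner in ALIGNERS and DOWNSTREAM.get(modeller) == de_tool
```

For example, is_available("STAR", "HTSeq", "ballgown") is False, because HTSeq is only listed together with DESeq2.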

4.2.1. Input Requirements

As in the DNA-Seq pipeline, the main requirement is raw fastq files, either single-end or paired-end. Fastq read details can be specified in two ways: by uploading a folder containing the reads, or by using a tab-separated file describing the reads. Refer to the Input file format section for preparing the input reads. Furthermore, the RNA-Seq part requires paths to the reference genome (a fasta file), an annotation file in gtf format, and a transcript file.

Once all the fields are filled, you can proceed to the Quality Control tab using the next button.

4.2.2. Review of Input Samples

In the Quality Control widget, you can examine the quality of your samples by clicking on the Quality Control Results button. MultiQC provides a consolidated report of the quality statistics generated by FastQC for all the samples. Additionally, iCOMIC lets you trim the reads using Cutadapt if required. However, you are free to move ahead without going through the quality check steps.

4.2.3. Setting up a pipeline

In the tool selection widget, you will be asked to choose your desired set of tools for analysis.

Aligner

You can choose an alignment tool from the drop-down menu. You also need to provide the genome index corresponding to the chosen aligner. If you do not have one, iCOMIC can generate the required index through the Generate index button. You can change the values of the mandatory parameters displayed. Moreover, if you are an expert bioinformatician, iCOMIC lets you adjust the advanced parameters: clicking on the Advanced button opens a pop-up listing all the parameters associated with the tool.

Expression Modeller

This section lets you choose an expression modeller from the integrated list of tools, which count the reads with the help of the annotation file. You can also set the parameters corresponding to the tool.

Differential Expression tool

Here you can choose a tool for quantifying differential expression and set its parameters.

4.2.4. Initialization of the analysis

The Run tab displays an Unlock button and a Run button. Run starts the analysis. When the analysis starts, if a warning icon pops up near the Unlock button, click Unlock to unlock the working directory and then click Run to proceed with the analysis. The progress bar in the tab lets you monitor the progress of the analysis.

4.2.5. Results: A quick check

Once the analysis is completed, iCOMIC automatically takes you to the Results tab, which displays three major results.

  1. Analysis Statistics: Displays a MultiQC consolidated report of overall analysis statistics. This includes FastQC reports and alignment statistics.
  2. Differentially Expressed Genes: Clicking this button displays a pop-up with the list of differentially expressed genes.
  3. Plots: Displays differentially expressed genes in R plots such as MA plot, heatmap, PCA plot and box plot.
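For readers unfamiliar with the MA plot, the sketch below computes the two coordinates of one gene under the classical definition: M is the log2 ratio of expression between two conditions, and A is the mean of the two log2 values. This is an illustration only; the function name and the pseudocount are assumptions, and the plots produced by the tools in iCOMIC may use a different x-axis (for example, mean normalized counts).

```python
import math


def ma_point(expr_a, expr_b, pseudocount=1.0):
    """Return (M, A) for one gene from its expression in two conditions.

    M = log2(b) - log2(a)      (log fold change, y-axis)
    A = (log2(a) + log2(b)) / 2  (average log expression, x-axis)
    The pseudocount avoids taking log2 of zero.
    """
    log_a = math.log2(expr_a + pseudocount)
    log_b = math.log2(expr_b + pseudocount)
    return log_b - log_a, (log_a + log_b) / 2.0
```

Genes with no change between conditions have M = 0, so they fall on the horizontal midline of the plot, while strongly up- or down-regulated genes sit far above or below it.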