Metagenomic analysis is used to understand the dynamics of microbial communities in an environmental sample. It provides species-level taxonomic classification and a prediction of the metabolic pathway activities of the microbes present. Whole metagenome sequencing (WGS) analysis generally involves sequencing the complete metagenome of a sample, followed by de novo assembly of the genomes of the multiple species represented in the sequence reads. During assembly, the paired-end reads are compared with one another, and overlapping reads are joined into longer contiguous sequences (contigs). The contigs are then analyzed to predict genes and assign functional annotations.
Bioinformatics workflow for whole metagenome analysis:
Quality check of raw reads:
The raw reads will be subjected to quality filtering and adapter trimming using Trimmomatic. Primer sequences, poly(A) tails and reads derived from ribosomal DNA templates will also be removed. The resulting high-quality reads will be used for downstream analysis.
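To illustrate what quality trimming does to a single read, here is a minimal sketch loosely modelled on Trimmomatic's SLIDINGWINDOW and MINLEN steps. It is a simplified illustration, not Trimmomatic's exact algorithm; it assumes Phred+33 quality encoding, and the window size, quality threshold and minimum length are illustrative values only.

```python
def phred_scores(quality_string):
    """Convert a Phred+33 quality string (Illumina 1.8+) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq, qual, window=4, min_avg=20):
    """Cut the read at the start of the first window whose mean quality
    falls below min_avg (simplified sliding-window trimming)."""
    scores = phred_scores(qual)
    for start in range(0, max(len(scores) - window + 1, 0)):
        block = scores[start:start + window]
        if sum(block) / window < min_avg:
            return seq[:start], qual[:start]
    return seq, qual


def filter_read(seq, qual, window=4, min_avg=20, min_len=36):
    """Trim a read, then discard it (return None) if it is shorter than min_len."""
    seq, qual = sliding_window_trim(seq, qual, window, min_avg)
    if len(seq) < min_len:
        return None
    return seq, qual


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))  # ('ACGTAC', 'IIIIII')
```

Scanning from the 5' end and cutting at the first low-quality window removes the degraded 3' tail typical of Illumina reads while keeping the rest of the read intact.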
De novo assembly:
De novo assembly of the high-quality reads will be carried out using the MetaVelvet assembler. The paired-end (PE) data will be assembled with a range of k-mer lengths, coverage cutoffs, insert lengths, insert-length standard deviations and expected coverage values for scaffold assembly. The best assembly will be selected based on scaffold N50 and maximum scaffold length. The final assembly will then be evaluated based on scaffold N50, assembly coverage (depth), the number of reads incorporated into the assembly, GC content, and assembly completeness and accuracy.
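The selection criteria above can be sketched as follows: compute scaffold N50, maximum scaffold length and GC content for each parameter run, then keep the run with the highest N50, breaking ties by the longest scaffold. This is a minimal sketch; the run labels (e.g. "k31") are hypothetical, and in practice the scaffold sequences would be read from each run's FASTA output.

```python
def n50(lengths):
    """Return the length at which the running total of sequence lengths,
    sorted longest first, first reaches half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def gc_content(sequences):
    """Fraction of G/C among A/C/G/T bases; N and other symbols are ignored."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return gc / (gc + at) if gc + at else 0.0


def assembly_stats(scaffolds):
    """Summary statistics for one assembly, given its scaffold sequences."""
    lengths = [len(s) for s in scaffolds]
    return {
        "n50": n50(lengths),
        "max_len": max(lengths, default=0),
        "total_len": sum(lengths),
        "gc": gc_content(scaffolds),
    }


def pick_best(assemblies):
    """Return the label of the assembly with the highest scaffold N50,
    breaking ties by maximum scaffold length."""
    stats = {label: assembly_stats(seqs) for label, seqs in assemblies.items()}
    return max(stats, key=lambda label: (stats[label]["n50"], stats[label]["max_len"]))


# Hypothetical runs with different k-mer lengths.
runs = {"k31": ["A" * 100, "C" * 50], "k41": ["A" * 80, "C" * 80]}
print(pick_best(runs))  # k31
```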
Gene prediction:
Ab initio gene predictors are statistical models trained to find features of genes, such as start and stop codons and the coding sequences (CDS) of genes. The assembled contigs will be used as input to the Prodigal (prokaryotic) or AUGUSTUS (eukaryotic) gene prediction program to predict the coding regions in the sample.
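To make the start/stop codon features concrete, here is a naive open reading frame (ORF) scan over all six reading frames. It is only an illustration of the features a gene predictor looks for; Prodigal and AUGUSTUS use trained statistical models and are far more accurate. The minimum ORF length is an assumed, illustrative parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based and half-open on the scanned strand; for '-'
    they refer to positions in the reverse complement. Lengths include
    the stop codon. Only the first ATG before each stop is used.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


print(find_orfs("ATGAAATTTTAA", min_codons=4))  # [('+', 0, 12)]
```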
Functional annotation:
Functional annotation will be carried out using MG-RAST or MEGAN. The predicted coding regions will be annotated against the NCBI non-redundant protein database (nr) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database using the Basic Local Alignment Search Tool (BLAST) or the BLAST-Like Alignment Tool (BLAT).
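A common first step after the similarity search is to keep the best hit per predicted gene. The sketch below assumes BLAST tabular output (`-outfmt 6`, 12 standard columns) and an illustrative e-value cutoff of 1e-5; the gene and subject IDs in the example are hypothetical. MG-RAST and MEGAN apply their own filtering, so this is not their exact procedure.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: best hit} keeping the highest bitscore per query
    among hits with e-value <= max_evalue."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {"sseqid": hit["sseqid"], "pident": float(hit["pident"]),
                           "evalue": evalue, "bitscore": bitscore}
    return best


rows = [
    ["gene1", "protA", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-30", "180.0"],
    ["gene1", "protB", "85.0", "100", "15", "0", "1", "100", "1", "100", "1e-20", "150.0"],
    ["gene2", "protC", "40.0", "50", "30", "0", "1", "50", "1", "50", "0.01", "30.0"],
]
text = "\n".join("\t".join(r) for r in rows)
print(best_hits(text)["gene1"]["sseqid"])  # protA
```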
For pathway analysis, the coding regions will be mapped to reference canonical pathways in KEGG. The coding genes will be classified under five main categories: Metabolism, Cellular processes, Genetic information processing, Environmental information processing and Organismal systems. The KEGG analysis, performed with the KEGG Automatic Annotation Server (KAAS), outputs KEGG Orthology (KO) assignments, the corresponding Enzyme Commission (EC) numbers and the metabolic pathways of the predicted coding genes.
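Once KO assignments are available, genes can be tallied per top-level category. This sketch assumes a two-column, tab-separated gene/KO listing in which unassigned genes have no second column. The KO-to-category mapping is a hypothetical toy; in practice it would be derived from the KEGG Orthology BRITE hierarchy (ko00001). Because one KO can belong to several categories, a gene is counted once in each category its KO maps to.

```python
from collections import Counter


def parse_ko_assignments(text):
    """Parse 'gene<TAB>KO' lines; genes without a KO are skipped."""
    assignments = {}
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) >= 2 and parts[1]:
            assignments[parts[0]] = parts[1]
    return assignments


def count_by_category(assignments, ko_to_categories):
    """Count genes per category; KOs missing from the mapping go to 'Unclassified'."""
    counts = Counter()
    for ko in assignments.values():
        for category in ko_to_categories.get(ko, ["Unclassified"]):
            counts[category] += 1
    return counts


# Hypothetical KO identifiers and mapping, for illustration only.
ko_map = {"K_A": ["Metabolism"],
          "K_B": ["Genetic information processing", "Metabolism"]}
assigned = parse_ko_assignments("g1\tK_A\ng2\ng3\tK_B\ng4\tK_A\n")
print(count_by_category(assigned, ko_map))
```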
Taxonomic analysis:
Taxonomic analysis will be carried out using MG-RAST or MEGAN by assigning reads to the NCBI taxonomy.
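MEGAN assigns a read to the lowest common ancestor (LCA) of the taxa hit by its best alignments. A minimal sketch of that idea, assuming each hit carries a bitscore and a root-to-leaf lineage, and using an illustrative "top percent" filter of 10% of the best bitscore; MEGAN's actual algorithm has further parameters (for example minimum support), so treat this as a simplified illustration.

```python
def lowest_common_ancestor(lineages):
    """Longest shared prefix of root-to-leaf lineages (lists of taxon names)."""
    if not lineages:
        return []
    lca = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca


def assign_read(hits, top_percent=10.0):
    """hits: list of (bitscore, lineage). Keep hits scoring within top_percent
    of the best bitscore and return the LCA of their lineages."""
    if not hits:
        return []
    best = max(score for score, _ in hits)
    cutoff = best * (1 - top_percent / 100)
    kept = [lineage for score, lineage in hits if score >= cutoff]
    return lowest_common_ancestor(kept)


eco = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
       "Enterobacterales", "Enterobacteriaceae", "Escherichia"]
sal = eco[:5] + ["Salmonella"]
bsu = ["Bacteria", "Firmicutes", "Bacilli", "Bacillales", "Bacillaceae", "Bacillus"]
# The low-scoring Bacillus hit is filtered out, so the read lands on the family.
print(assign_read([(200.0, eco), (190.0, sal), (50.0, bsu)])[-1])  # Enterobacteriaceae
```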
Comparative analysis:
Comparative analysis will be performed when more than one sample is available. The samples will be compared based on abundance counts and functionally annotated genes.
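One way to compare samples on abundance counts is to convert them to relative abundances (so that sequencing depth does not dominate) and compute a pairwise dissimilarity. The sketch below uses Bray-Curtis dissimilarity as an example metric; the source does not name a specific metric, and the sample and taxon names are hypothetical.

```python
def relative_abundance(counts):
    """Convert {taxon: count} to {taxon: fraction of the sample total}."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()} if total else {}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min) / (sum(a) + sum(b)).
    0 means identical profiles, 1 means no shared taxa."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total if total else 0.0


def distance_matrix(samples):
    """Pairwise Bray-Curtis matrix over relative abundances of all samples."""
    names = sorted(samples)
    rel = {name: relative_abundance(samples[name]) for name in names}
    return names, [[bray_curtis(rel[i], rel[j]) for j in names] for i in names]


names, matrix = distance_matrix({"s1": {"X": 10, "Y": 10}, "s2": {"X": 10, "Z": 10}})
print(names, matrix)
```

The resulting matrix is the usual input for ordination plots and clustered heat maps.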
- Quality filtering of data
- De novo assembly of high-quality reads
- Phylogenetic analysis of microbial distribution
- Heat maps, abundance estimation and identification of the microbial community
- Analysis of dominant populations
- Taxa identification
- Functional annotation using NCBI BLAST, KEGG pathways and SEED analysis
- PCA and LCA plots
- Rarefaction curves and alpha diversity
- Comparative analysis (if more than one sample)
- Comprehensive report with publication-standard methodology, graphs and tables
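The rarefaction and alpha-diversity item in the list above can be illustrated with a short sketch. It computes the Shannon index, the Gini-Simpson index, and expected richness at a subsampling depth n using Hurlbert's rarefaction formula, E[S_n] = sum over taxa of 1 - C(N - N_i, n) / C(N, n). These are standard textbook definitions chosen for illustration; the tools named earlier may compute diversity differently. It needs Python 3.8+ for `math.comb`.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p_i^2)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1 - sum((c / total) ** 2 for c in counts)


def expected_richness(counts, n):
    """Hurlbert rarefaction: expected number of taxa in a random subsample
    of n reads drawn without replacement from the observed counts."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    denom = math.comb(total, n)
    # math.comb(a, b) returns 0 when b > a, i.e. the taxon is always sampled.
    return sum(1 - math.comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step):
    """(depth, expected richness) points from step up to the full sample size."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(step, total + 1, step)]


print(rarefaction_curve([2, 2], 2))  # expected richness rises to 2 taxa at full depth
```

The exact binomial form is fine for small examples; for samples with millions of reads a log-gamma formulation would be faster.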