High-throughput profiling has generated massive amounts of data across basic, clinical and translational research fields. However, comprehensive open-source web tools for analysing data obtained from different platforms and technologies are still lacking. To fill this gap and meet the computational needs of ongoing research projects, O-miner was developed by the Barts Cancer Institute Bioinformatics Unit as a rapid, comprehensive and efficient web tool. It covers all the steps required for the analysis of transcriptomic, genomic and methylation data, starting from raw image files and continuing through in-depth bioinformatics analysis and annotation to biological knowledge extraction. O-miner was developed from a biologist end-user perspective and is freely available for use. For more information on the methods used, please see the user guide or follow the help and example links on individual pages.

O-miner Transcriptomics

Array-based transcriptomics workflow


Data Input: O-miner provides analysis workflows for raw and normalised data from Affymetrix expression platforms (GeneChip Human Genome U133 Plus 2.0, U133A, U133B, U133A 2.0, U95 version 2, U95A-E, GeneChip Mouse Genome 430 2.0 and Human Gene 1.1ST). The tool also accepts normalised and unnormalised expression data from Illumina expression platforms (Human HT-12 v3 and v4 Expression BeadChip, and MouseRef-8 V2.0 Expression BeadChip), raw data from the Affymetrix GeneChip Human Exon array 1.0ST, and raw and unnormalised data from the Affymetrix multispecies miRNA arrays GeneChip miRNA 2.0 and 3.0. If raw data are uploaded to O-miner, the following analytical steps are applied: quality control analysis, normalisation, filtering of the normalised expression matrix, and differential gene expression analysis with the R package LIMMA. Probes that are differentially expressed and meet the user-supplied cutoffs for log2 fold change and adjusted p-value are reported. Optional analyses are available to identify statistically significant Gene Ontology terms among the differentially expressed results. All of the data generated are available for download in a zipped format. Survival analyses are available for all platforms. For data from Affymetrix gene expression platforms, estimates of tumour purity are available using the ESTIMATE algorithm. Meta-analyses can be run in O-miner for datasets from the same platform, in which case the ComBat algorithm may be used to correct for batch effects between datasets.
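The reporting step above (keeping probes that pass both a log2 fold change cutoff and an adjusted p-value cutoff) can be illustrated with a minimal Python sketch. This is not O-miner's code: the Benjamini-Hochberg adjustment, the function names, the default cutoffs and the probe identifiers are all assumptions made for illustration, and the raw p-values would in practice come from a tool such as LIMMA.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is used here for illustration; the source does not say
    which multiple-testing correction O-miner applies.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_probes(results, min_abs_log2fc=1.0, max_adj_p=0.05):
    """Keep probes passing both cutoffs.

    results: list of (probe_id, log2_fold_change, raw_p_value).
    The default cutoffs are hypothetical, standing in for user-supplied values.
    """
    adj = bh_adjust([p for _, _, p in results])
    return [
        (probe, lfc, q)
        for (probe, lfc, _), q in zip(results, adj)
        if abs(lfc) >= min_abs_log2fc and q <= max_adj_p
    ]


# Hypothetical probe-level results, for illustration only.
example = [
    ("probe_a", 2.1, 0.001),
    ("probe_b", -1.6, 0.010),
    ("probe_c", 0.3, 0.020),
    ("probe_d", 1.2, 0.400),
]
selected = select_probes(example)
for probe, lfc, q in selected:
    print(probe, lfc, round(q, 4))
```

In this toy input, probe_c fails the fold change cutoff and probe_d fails the adjusted p-value cutoff, so only probe_a and probe_b are kept.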

Transcriptomics workflow Overview

Data Output: An output page is produced with separate tabs for quality control, clustering, differential expression, gene ontology, and various plots and tables.

RNA-seq workflow

Data Input: O-miner is able to analyse pre-processed data from RNA-seq experiments. Data must be provided to O-miner in a matrix format, either as raw read counts or as a matrix of normalised read counts (RPKM values). Users can choose between two separate methods for differential expression analysis: LIMMA for both normalised and unnormalised data, and edgeR for unnormalised read counts. Genes that meet the user-supplied cutoffs for fold change and adjusted p-value are reported. Optionally, users may run Gene Ontology analysis to identify statistically significant Gene Ontology terms using the R package GoSeq. All of the analyses generated are available for download in a zipped format. Meta-analyses may be performed using post-processed RNA-seq data from more than one experiment; to do so, users need to combine the data from each experiment into one large data matrix to be uploaded to O-miner.
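For readers unsure how the two accepted input types relate, the following Python sketch converts raw read counts for one sample into RPKM values using the standard definition (reads per kilobase of transcript per million mapped reads). The function name, gene names and lengths are hypothetical, and the library size is approximated here by the sum of the counts in the matrix, which is an assumption; a pipeline would normally use the total number of mapped reads.

```python
def rpkm(counts, gene_lengths_bp):
    """Convert raw counts for one sample to RPKM.

    counts: {gene: raw read count}
    gene_lengths_bp: {gene: gene length in bases}
    Assumption: the library size is taken as the sum of the supplied counts.
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("sample has no reads")
    # RPKM = count * 1e9 / (gene length in bp * total reads)
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_reads)
        for gene, count in counts.items()
    }


# Hypothetical single-sample input, for illustration only.
sample_counts = {"gene_a": 500, "gene_b": 1500, "gene_c": 0}
lengths = {"gene_a": 1000, "gene_b": 3000, "gene_c": 2000}
values = rpkm(sample_counts, lengths)
print(values)
```

Here gene_b has three times the reads of gene_a but is also three times as long, so both receive the same RPKM value.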

Transcriptomics workflow Overview

Data Output: An output page is produced with separate tabs for quality control, clustering, differential expression, gene ontology and various plots and tables.

O-miner Genomics

Circular Binary Segmentation (CBS) workflow

Data Input: The O-miner CBS workflow takes as input raw image CEL files, log2 ratios, segmented data or binary coded data for a number of Affymetrix SNP arrays. Aroma.affymetrix is applied to the raw CEL files to estimate copy numbers, normalise the data and perform quality control. Segmentation is carried out with the Circular Binary Segmentation (CBS) model using the R package CGHweb. Data are thresholded via the quartile regression framework. Regions of gain and loss are generated and annotated from multiple sources. Minimal Common Regions can be calculated using the CGHregions algorithm.
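To make the step from segmented log2 ratios to regions of gain and loss more concrete, here is a minimal Python sketch. It deliberately swaps in fixed log2 ratio cutoffs in place of the regression-based thresholding named above, which it does not reproduce; the cutoff values, function names and segment coordinates are all hypothetical.

```python
def call_segments(segments, gain_cutoff=0.2, loss_cutoff=-0.2):
    """Label each segment as gain, loss or neutral.

    segments: list of (chrom, start, end, mean_log2_ratio), sorted by position.
    Assumption: fixed cutoffs are used purely for illustration.
    """
    calls = []
    for chrom, start, end, mean_ratio in segments:
        if mean_ratio >= gain_cutoff:
            state = "gain"
        elif mean_ratio <= loss_cutoff:
            state = "loss"
        else:
            state = "neutral"
        calls.append((chrom, start, end, state))
    return calls


def merge_adjacent(calls):
    """Merge neighbouring segments on the same chromosome that share a state."""
    merged = []
    for chrom, start, end, state in calls:
        if merged and merged[-1][0] == chrom and merged[-1][3] == state:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], end, state)
        else:
            merged.append((chrom, start, end, state))
    return merged


# Hypothetical segmented output for one sample.
segments = [
    ("chr1", 1, 1000, 0.45),
    ("chr1", 1001, 2500, 0.30),
    ("chr1", 2501, 4000, -0.05),
    ("chr2", 1, 3000, -0.60),
]
regions = [r for r in merge_adjacent(call_segments(segments)) if r[3] != "neutral"]
print(regions)
```

The two neighbouring gained segments on chr1 collapse into a single region, and the neutral segment is dropped from the reported list.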

CBS workflow

Data Output: An output page is produced with separate tabs for quality control, sample view (annotated files of regions of gain and loss, log2 ratio plots) and group view (frequency plots).

Allele-specific copy number analysis of tumors (ASCAT) workflow

Data Input: The ASCAT workflow accepts only raw image CEL files from a number of Affymetrix SNP arrays as input. Log2 ratios and B-allele frequencies are calculated for each sample in the dataset using the R package CalMaTe. These are fitted to an Allele-Specific Piecewise Constant Fitting (ASPCF) model, and the ASCAT algorithm is used to estimate the aberrant cell fraction, tumour ploidy and absolute allele-specific copy number calls. Regions of gain and loss corrected for tumour ploidy, as well as regions of LOH, are annotated using a variety of optional annotation systems provided by O-miner.
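The idea of ploidy-corrected calls can be sketched in Python as follows. The rule used here (compare a segment's total copy number against the tumour ploidy rounded to the nearest integer, and flag LOH when the minor allele count is zero) is an assumption chosen for illustration; the source does not state the exact rule O-miner applies, and the function name is hypothetical.

```python
def classify_segment(n_major, n_minor, ploidy):
    """Return a set of labels for one segment.

    n_major, n_minor: absolute allele-specific copy numbers for the segment.
    ploidy: estimated tumour ploidy for the sample.
    Assumption: the baseline is the ploidy rounded to the nearest integer
    (Python's round() sends exact .5 values to the nearest even integer).
    """
    total = n_major + n_minor
    baseline = round(ploidy)
    labels = set()
    if total > baseline:
        labels.add("gain")
    elif total < baseline:
        labels.add("loss")
    # LOH: one allele is absent while at least one copy remains.
    if n_minor == 0 and total > 0:
        labels.add("LOH")
    return labels


# Hypothetical segments in a near-triploid tumour (ploidy 3.1).
print(classify_segment(2, 1, 3.1))  # total equals the baseline
print(classify_segment(2, 0, 3.1))  # below the baseline, minor allele lost
print(classify_segment(3, 2, 3.1))  # above the baseline
# The same 2+0 segment in a diploid tumour is copy-neutral LOH.
print(classify_segment(2, 0, 2.0))
```

The example shows why the correction matters: a segment with two copies counts as a loss in a near-triploid tumour but as copy-neutral in a diploid one.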

ASCAT workflow

Data Output: An output page is produced with separate tabs for quality control, sample view (annotated files of regions of gain and loss and LOH, log2ratio, B-allele Frequency plots and ASCAT profiles), and group view (frequency plots).

O-miner Methylation

Data Input: Raw data (idat files) or a normalised data matrix from the Illumina Infinium HumanMethylation27 and HumanMethylation450 BeadChips may be used as input to this workflow. Quality control is performed on the raw data using the R package ChAMP. There is a choice of three normalisation methods: BMIQ, SWAN and PBC. The normalised matrix is then filtered and differential methylation analysis is performed using LIMMA. User-defined thresholds for the delta beta value and adjusted p-value are applied to the data. Differentially methylated regions are annotated, and users can choose to find statistically significant Gene Ontology terms.
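As a small illustration of the delta beta threshold, the Python sketch below computes beta values from methylated and unmethylated intensities (using the conventional offset of 100), takes the difference in mean beta between two groups, and filters on both cutoffs. The function names, probe identifiers, group labels and default thresholds are hypothetical, and the adjusted p-values are supplied as inputs, since in the workflow above they would come from LIMMA.

```python
from statistics import mean


def beta_value(methylated, unmethylated, offset=100):
    """Beta value from probe intensities: M / (M + U + offset)."""
    return methylated / (methylated + unmethylated + offset)


def delta_beta(betas_group1, betas_group2):
    """Difference in mean beta value between two sample groups."""
    return mean(betas_group1) - mean(betas_group2)


def filter_probes(rows, min_abs_delta_beta=0.2, max_adj_p=0.05):
    """Keep probes passing both thresholds.

    rows: list of (probe_id, betas_group1, betas_group2, adjusted_p).
    The default thresholds are hypothetical stand-ins for user-defined values.
    """
    kept = []
    for probe, g1, g2, adj_p in rows:
        db = delta_beta(g1, g2)
        if abs(db) >= min_abs_delta_beta and adj_p <= max_adj_p:
            kept.append((probe, db, adj_p))
    return kept


# Hypothetical probes: group 1 = tumour samples, group 2 = normal samples.
rows = [
    ("cg_a", [0.8, 0.9, 0.7], [0.2, 0.3, 0.1], 0.001),
    ("cg_b", [0.50, 0.55], [0.45, 0.50], 0.010),
    ("cg_c", [0.1, 0.2], [0.7, 0.8], 0.200),
]
kept = filter_probes(rows)
for probe, db, adj_p in kept:
    print(probe, round(db, 3), adj_p)
```

Only cg_a is kept: cg_b has too small a difference in mean beta, and cg_c has a large difference but fails the adjusted p-value threshold.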

Methylation workflow

Data Output: An output page is produced with separate tabs for quality control, clustering, differential methylation, gene ontology and methylation plots.