Expression analysis¶

BioTK aims to provide an alternative to the standard R/Bioconductor environment for routine differential expression (DE) analyses. To that end, it supports all the standard steps of a DE analysis pipeline:

Loading raw or preprocessed data
Preprocessing and normalizing the data
Finding differentially expressed probes/genes between conditions
Analyses of DE gene lists:
  - Performing enrichment analyses against ontologies
  - Visualizing expression or DE results as heatmaps or networks

There are also features for downstream analyses, as well as methods to take large collections of expression data (from GEO, in-house data, or a combination thereof) and use them for large-scale meta-analysis.

Todo

put a simple example of a complete-ish analysis here
possibly explain important data structures?

Loading expression data¶

From Affymetrix CEL files¶

From GEO¶

From RNA-seq aligned reads¶

Normalizing expression data¶

Quantile normalization¶
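
As a rough illustration of the technique named in this heading, quantile normalization can be sketched in plain NumPy. This is not BioTK's API: the function name `quantile_normalize` and the genes-by-samples matrix layout are assumptions made for the example, and ties are broken by position rather than averaged.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize the columns of a genes x samples matrix.

    Hypothetical helper for illustration only (not part of BioTK).
    After normalization every sample (column) has the same distribution.
    """
    X = np.asarray(X, dtype=float)
    # Rank of each value within its own column (0-based).
    # Ties are broken by position; they are not averaged here.
    order = np.argsort(X, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean across samples of the k-th smallest value.
    reference = np.sort(X, axis=0).mean(axis=1)
    # Replace each value by the reference value at its rank.
    return reference[ranks]

# Small example: 4 genes (rows) measured in 3 samples (columns).
X = [[5, 4, 3],
     [2, 1, 4],
     [3, 4, 6],
     [4, 2, 8]]
Q = quantile_normalize(X)
```

After the call, sorting any column of `Q` gives the same values, which is the defining property of quantile-normalized data.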

Differential expression¶

Currently, the available differential expression algorithms are:

t-test
ANOVA
SAM

In the future, we plan to provide either a port of, or a simplified Python interface to, the R package limma, which is one of the most popular tools for finding DE genes.

T-test¶
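
The idea behind a per-gene t-test can be sketched with SciPy as follows. This is a minimal sketch, not BioTK's own interface: the function name `ttest_per_gene`, the genes-by-samples layout, and the boolean `is_case` mask are assumptions made for the example. It uses `scipy.stats.ttest_ind` with `equal_var=False`, that is, Welch's variant of the test.

```python
import numpy as np
from scipy import stats

def ttest_per_gene(expr, is_case):
    """Two-sample t-test for every gene (row) of a genes x samples matrix.

    Hypothetical helper for illustration only (not part of BioTK).
    `is_case` is a boolean mask over samples (columns): True for the
    condition of interest, False for the control condition.
    """
    expr = np.asarray(expr, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    # Welch's t-test along the sample axis, one test per gene.
    t, p = stats.ttest_ind(expr[:, is_case], expr[:, ~is_case],
                           axis=1, equal_var=False)
    return t, p

# Two genes, four case samples followed by four control samples.
expr = np.array([
    [10.0, 11.0, 10.0, 11.0, 5.0, 6.0, 5.0, 6.0],  # higher in cases
    [5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0],      # no difference
])
is_case = np.array([True] * 4 + [False] * 4)
t, p = ttest_per_gene(expr, is_case)
```

In a real analysis with thousands of genes, the resulting p-values would normally be adjusted for multiple testing before calling genes differentially expressed.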

ANOVA¶

SAM¶

Visualization¶

Heatmap¶

Enrichment analysis¶

Meta-analysis¶

BioTK can store large amounts of expression data from multiple experiments and even multiple organisms and efficiently perform meta-analyses on this data. Please see Transcript expression meta-analysis.