Expression analysis

BioTK aims to provide an alternative to the standard R/Bioconductor environment for routine differential expression (DE) analyses. To that end, it supports all the standard steps of a differential expression analysis pipeline:

  1. Loading raw or preprocessed data
  2. Preprocessing and normalizing the data
  3. Finding differentially expressed probes/genes between conditions
  4. Analyses of DE gene lists:
       • Performing enrichment analyses against ontologies
       • Visualizing expression or DE results as heatmaps or networks

There are also features for downstream analyses, as well as methods that take large collections of expression data (from GEO, in-house data, or a combination thereof) and use them for large-scale meta-analysis.

Todo

  • put a simple example of a complete-ish analysis here
  • possibly explain important data structures?

Loading expression data

From Affymetrix CEL files

From GEO

From RNA-seq aligned reads

Normalizing expression data

Quantile normalization
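
As a rough illustration of the technique named in this section, the sketch below quantile-normalizes a few samples using only the Python standard library. It is not BioTK's API: the function name quantile_normalize and the list-of-lists layout (one list of probe values per sample) are assumptions made for this example, and ties are broken by probe order rather than averaged.

```python
from statistics import mean

def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list of values per sample)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")
    # Sort each sample, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean([col[i] for col in sorted_cols]) for i in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest value
        # (ties keep their original probe order).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each value with the mean of its rank across samples.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical data: three samples measured on the same four probes.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After normalization every sample contains the same set of values, so the samples share one empirical distribution while each probe keeps its within-sample rank.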

Differential expression

Currently, the available differential expression algorithms are:

  • t-test
  • ANOVA
  • SAM

In the future, we plan to provide either a port of, or a simplified Python interface to, the R package limma, which is one of the most popular tools for finding DE genes.

T-test
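
To make the idea concrete, here is a minimal sketch of a per-gene two-sample t-test in plain Python. It uses Welch's unequal-variance form, which is an assumption of this example and not necessarily the variant BioTK implements. The function welch_t and the gene names are hypothetical. Only the t statistic and the approximate degrees of freedom are computed; a p-value would additionally require the t distribution's CDF (available, for instance, through scipy.stats.ttest_ind with equal_var=False).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for two groups."""
    na, nb = len(group_a), len(group_b)
    # Sample variance divided by group size, for each group.
    # Note: if both groups have zero variance, the division below fails.
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical data: gene -> (control values, treated values).
expression = {
    "geneA": ([5.1, 4.9, 5.0], [7.0, 7.2, 6.8]),
    "geneB": ([3.0, 3.4, 2.6], [3.1, 2.9, 3.3]),
}
results = {gene: welch_t(ctrl, trt) for gene, (ctrl, trt) in expression.items()}
# Rank genes by absolute t statistic, largest first.
ranked = sorted(results, key=lambda g: abs(results[g][0]), reverse=True)
```

In a real analysis the test is run for every probe or gene, so the resulting p-values would normally be corrected for multiple testing before calling genes differentially expressed.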

ANOVA
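
When more than two conditions are compared, a one-way ANOVA tests whether any group mean differs. The sketch below computes the F statistic for a single gene in plain Python. The function name one_way_f and the sample values are assumptions for illustration only and do not reflect BioTK's interface.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical expression values for one gene under three conditions.
f_stat = one_way_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F value indicates that between-condition variation dominates within-condition variation; converting it to a p-value requires the F distribution with (k - 1, n_total - k) degrees of freedom.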

SAM

Visualization

Heatmap

Enrichment analysis
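
One common way to test a DE gene list for enrichment against an ontology term is a one-sided hypergeometric test. Whether BioTK uses this exact test is an assumption here. The sketch below relies only on the standard library (math.comb, Python 3.8+), and the function name and gene identifiers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without
    replacement from n_universe genes, n_term of which carry the term."""
    total = comb(n_universe, n_selected)
    upper = min(n_term, n_selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical inputs: 20 measured genes, 5 annotated to the term,
# 4 DE genes, 3 of which are annotated to the term.
universe = {f"g{i}" for i in range(20)}
term_genes = {"g0", "g1", "g2", "g3", "g4"}
de_genes = {"g0", "g1", "g2", "g10"}
p_value = hypergeom_enrichment_p(
    len(universe), len(term_genes), len(de_genes), len(de_genes & term_genes)
)
```

The gene universe should be the set of genes that could have been detected as DE (for example, all genes on the array), since the choice of background strongly affects the result.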

Meta-analysis

BioTK can store large amounts of expression data from multiple experiments and even multiple organisms and efficiently perform meta-analyses on this data. Please see Transcript expression meta-analysis.