Expression analysis

BioTK aims to provide an alternative to the standard R/Bioconductor environment for routine differential expression (DE) analyses. To that end, it supports all the standard steps of a differential expression analysis pipeline:

  1. Loading raw or preprocessed data
  2. Preprocessing and normalizing the data
  3. Finding differentially expressed probes/genes between conditions
  4. Analyzing DE gene lists:
       • Performing enrichment analyses against ontologies
       • Visualizing expression or DE results as heatmaps or networks

There are also features for downstream analyses, as well as methods that take large collections of expression data (from GEO, in-house data, or a combination thereof) and use them for large-scale meta-analysis.


  • put a simple example of a complete-ish analysis here
  • possibly explain important data structures?

Loading expression data

From Affymetrix CEL files

From GEO

From RNA-seq aligned reads

Normalizing expression data

Quantile normalization
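
Quantile normalization forces every sample to share the same empirical distribution: each value is replaced by the mean, across samples, of the values holding the same rank. The following is a minimal sketch of that idea using only the Python standard library. It is not BioTK's implementation; the function name quantile_normalize and the list-of-rows layout (probes as rows, samples as columns) are assumptions made for illustration, and ties are broken by row order rather than averaged.

```python
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix given as a list of rows.

    Illustrative sketch only: ties within a sample are broken by row order
    instead of being averaged.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_columns = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_columns) for k in range(n_rows)]

    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Row indices of this sample, ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Made-up example data: 4 probes (rows) measured in 3 samples (columns).
expr = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(expr)
```

After normalization, every column contains exactly the same set of values, and only their assignment to probes differs between samples.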

Differential expression

Currently, the available differential expression algorithms are:

  • t-test
  • SAM

In the future, we plan to provide either a port or a simplified Python interface to the R package limma, which is one of the most popular tools for finding DE genes.
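
To illustrate the t-test approach listed above, the sketch below computes Welch's t statistic for each gene between two conditions and ranks genes by its absolute value. It uses only the Python standard library and is not BioTK's API; the function name welch_t, the gene names, and the expression values are all hypothetical. It also uses the unequal-variance (Welch) form, which is an assumption, since the text does not say which t-test variant BioTK applies.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard errors of the two group means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2 expression values: gene -> (control samples, treated samples)
expression = {
    "geneA": ([7.1, 6.9, 7.0, 7.2], [9.0, 9.3, 8.8, 9.1]),
    "geneB": ([5.0, 5.4, 4.8, 5.1], [5.2, 4.9, 5.3, 5.0]),
}

# Positive t means higher expression in the treated group.
results = {
    gene: welch_t(treated, control)
    for gene, (control, treated) in expression.items()
}

# Genes with the largest absolute t statistic come first.
ranked = sorted(results, key=lambda g: abs(results[g][0]), reverse=True)
```

Turning the statistic into a p-value requires the t distribution, which the standard library does not provide; scipy.stats.ttest_ind with equal_var=False performs the same test and returns a p-value. With thousands of probes, the resulting p-values would also need a multiple-testing correction.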






Enrichment analysis
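
A common way to test a DE gene list for enrichment against an ontology term is the one-sided hypergeometric test: given how many genes in the background carry the term, how surprising is the overlap with the DE list? The sketch below shows that calculation with the Python standard library only. It is an assumption that this is the kind of test meant here, and the function name hypergeom_enrichment_p and its parameters are hypothetical, not part of BioTK.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(overlap >= n_overlap).

    n_universe: number of genes in the background set
    n_term:     background genes annotated with the ontology term
    n_selected: size of the DE gene list
    n_overlap:  DE genes annotated with the term
    """
    total = comb(n_universe, n_selected)
    upper = min(n_term, n_selected)
    # Count the ways to draw k annotated and (n_selected - k) unannotated genes.
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up numbers: 20 background genes, 5 carry the term,
# and 3 of the 4 DE genes carry it.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
```

When many terms are tested, the per-term p-values would need a multiple-testing correction before interpretation.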


BioTK can store large amounts of expression data from multiple experiments, and even multiple organisms, and efficiently perform meta-analyses on these data. For details, see Transcript expression meta-analysis.