Expression analysis¶
BioTK aims to provide an alternative to the standard R/Bioconductor environment for routine differential expression (DE) analyses. To that end, it can perform all the standard steps in a differential expression analysis pipeline:
- Loading raw or preprocessed data
- Preprocessing and normalizing the data
- Finding differentially expressed probes/genes between conditions
- Analyses of DE gene lists:
  - Performing enrichment analyses against ontologies
  - Visualizing expression or DE results as heatmaps or networks
There are also features for downstream analyses, as well as methods that take large collections of expression data (from GEO, in-house data, or a combination thereof) and use these collections for large-scale meta-analysis.
Todo
- put a simple example of a complete-ish analysis here
- possibly explain important data structures?
Differential expression¶
Currently, the available differential expression algorithms are:
- t-test
- ANOVA
- SAM
In the future, we plan to provide either a port of the R package limma or a simplified Python interface to it. limma is one of the most popular tools for finding DE genes.
T-test¶
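To illustrate what a per-probe t-test computes, here is a minimal sketch in plain Python using only the standard library. It does not show BioTK's own API: the function name `welch_t`, the probe names, and the expression values are all assumptions made up for this example. The sketch computes Welch's (unequal-variance) t statistic and the Welch-Satterthwaite degrees of freedom, then ranks probes by the absolute statistic. Turning the statistic into a p-value requires the CDF of the t distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test of a versus b."""
    # Squared standard error contributed by each group.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2 expression values: probe -> (control, treated).
expression = {
    "probe_1": ([5.1, 5.3, 4.9, 5.2], [7.0, 7.4, 6.8, 7.1]),
    "probe_2": ([6.0, 6.2, 5.9, 6.1], [6.1, 5.9, 6.2, 6.0]),
}

# A positive t means higher expression in the treated group.
results = {
    probe: welch_t(treated, control)
    for probe, (control, treated) in expression.items()
}

# Probes with the largest absolute statistic come first.
ranked = sorted(results, key=lambda probe: abs(results[probe][0]), reverse=True)
```

In a real analysis the test is run once per probe across thousands of probes, so the resulting p-values normally need a multiple-testing correction before a DE list is reported.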
ANOVA¶
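When there are more than two conditions, a one-way ANOVA compares the variance between group means to the variance within groups. The following is a minimal standard-library sketch of the F statistic for a single probe. The function name `one_way_f` and the sample values are hypothetical and are not taken from BioTK.

```python
from statistics import mean


def one_way_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the samples around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical expression values for one probe under three conditions.
f_stat, df_between, df_within = one_way_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
```

For this toy input the between-group sum of squares is 26 and the within-group sum of squares is 6, so the statistic is (26 / 2) / (6 / 6) = 13 on 2 and 6 degrees of freedom.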
SAM¶
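SAM (Significance Analysis of Microarrays) scores each probe with a moderated statistic, d = (mean difference) / (s + s0), where s is the pooled standard error and s0 is a small positive constant that keeps probes with very low variance from dominating the ranking. The sketch below shows only this statistic, in plain Python. It is an illustration rather than BioTK's implementation: the function name `sam_d`, the fixed value of `s0`, and the data are assumptions, and the full method also chooses s0 from the data and uses permutations of the sample labels to estimate the false discovery rate.

```python
from math import sqrt
from statistics import mean


def sam_d(a, b, s0):
    """Return the SAM relative difference d = (mean(a) - mean(b)) / (s + s0)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    s = sqrt((1 / na + 1 / nb) * ss / (na + nb - 2))
    return (ma - mb) / (s + s0)


# Hypothetical data: probe -> (treated, control).
probes = {
    "low_variance": ([5.10, 5.11, 5.12], [5.00, 5.01, 5.02]),
    "big_change": ([7.0, 7.5, 8.0], [5.0, 5.5, 6.0]),
}

# With s0 = 0 the statistic reduces to an ordinary pooled-variance t statistic.
unmoderated = {p: sam_d(t, c, s0=0.0) for p, (t, c) in probes.items()}
# s0 = 0.1 is an arbitrary value chosen for this example.
moderated = {p: sam_d(t, c, s0=0.1) for p, (t, c) in probes.items()}

top_unmoderated = max(unmoderated, key=unmoderated.get)
top_moderated = max(moderated, key=moderated.get)
```

In this example the tiny but very consistent 0.1 shift ranks first without moderation, while the 2.0 shift ranks first once s0 is added, which is the behavior the constant is meant to produce.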
Enrichment analysis¶
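A common way to test a DE gene list for enrichment against an ontology term is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The following standard-library sketch shows the calculation. Whether BioTK uses this exact test is an assumption, and the function name `hypergeom_enrichment` and the gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_enrichment(k, n_de, n_term, n_total):
    """Return P(X >= k): the chance of drawing at least k term-annotated
    genes when n_de genes are sampled from n_total, n_term of which are
    annotated with the term."""
    upper = min(n_de, n_term)
    tail = sum(
        comb(n_term, i) * comb(n_total - n_term, n_de - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_de)


# Hypothetical gene universe, DE list, and ontology term annotation.
universe = {f"g{i}" for i in range(1, 11)}
de_genes = {"g1", "g2", "g3"}
term_genes = {"g1", "g2", "g3"}

overlap = len(de_genes & term_genes)
p_value = hypergeom_enrichment(
    overlap, len(de_genes), len(term_genes), len(universe)
)
# Only 1 of the 120 possible 3-gene lists hits all 3 annotated genes,
# so p_value is 1/120 for this toy example.
```

The choice of universe matters: it should contain only the genes that could have appeared in the DE list (for example, the genes measured on the platform), not every gene in the genome. As with the DE tests, p-values across many terms need a multiple-testing correction.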
Meta-analysis¶
BioTK can store large amounts of expression data from multiple experiments and even multiple organisms and efficiently perform meta-analyses on this data. Please see Transcript expression meta-analysis.