===================
Expression analysis
===================

BioTK aims to provide an alternative to the standard R/Bioconductor
environment to perform run-of-the-mill differential expression analyses. Thus,
BioTK has the ability to perform all the standard steps in a differential
expression analysis pipeline:

1. Loading raw or preprocessed data
2. Preprocessing and normalizing the data
3. Finding differentially expressed probes/genes between conditions
4. Analyses of DE gene lists:
   - Performing enrichment analyses against ontologies
   - Visualizing expression or DE results as heatmaps or networks

There are also features for downstream analyses of and methods to take large
collections of expression data, from GEO, in-house data, or a combination
thereof, and use these collections for large-scale meta-analysis.

.. todo::
    - put a simple example of a complete-ish analysis here
    - possibly explain important data structures?

Loading expression data
=======================

From Affymetrix CEL files
-------------------------

From GEO
--------

From RNA-seq aligned reads
--------------------------

Normalizing expression data
===========================

Quantile normalization
----------------------

Differential expression
=======================

Currently, the available differential expression algorithms are:

- t-test
- ANOVA
- SAM  
  
In the future, we plan to provide either a port or a simplified Python
interface to the R package limma, which is one of the most popular tools for
finding DE genes.

T-test
------

ANOVA
-----

SAM
---

Visualization
=============

Heatmap
-------

Enrichment analysis
===================

Meta-analysis
=============

BioTK can store large amounts of expression data from multiple experiments and
even multiple organisms and efficiently perform meta-analyses on this data.
Please see :ref:`meta_analysis`.