Introduction

A-MADMAN is an open source web application and gene chip analysis automation framework written in Python, with a focus on meta-analysis of data published in public repositories (only GEO is supported at the moment). It is based on the popular Django web framework and uses GNU R as a backend.

A-MADMAN tries to automate many of the tedious and error-prone steps an investigator has to complete to set up a working environment for conducting meta-analyses, without giving up power and flexibility.

A-MADMAN supports a collaborative working style for local or geographically dispersed teams through LAN or Internet deployment options (GNU/Linux recommended). It can also be used by a single researcher on a Windows® personal computer by installing an all-in-one package that bundles all required dependencies except R.

Motivation

Conducting meta-analyses of publicly available data can be a daunting task for many reasons:

Features

A-MADMAN aims to lower the bar for starting a meta-analysis study by offering these features:

The following image sketches the intricacies of platform integration:

Usage notes

Assumptions for the following notes are

If you plan to use the standalone all-in-one package, please follow the instructions in the tutorial for Windows users instead.

Data Retrieval

The data to retrieve must be specified in a configuration file. The syntax is pure Python: you define a simple data structure that specifies the names of the series and samples to download.

Syntax example:

	
data = {
    'GSE1004': {
        'samples': [
            'GSM15807', 'GSM15822', 'GSM15823', 'GSM15824', 'GSM15825',
            'GSM15826', 'GSM15827', 'GSM15828', 'GSM15829', 'GSM15830',
        ],
    },
    'GSE1786': {
        'samples': [
            'GSM30842', 'GSM30843', 'GSM30844',
            'GSM30836', 'GSM30837', 'GSM30838',
        ],
    },
}

List the sample names explicitly if you need only a subset of the samples in a series, or use 'all' if you need every sample.
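As a rough illustration, a structure like the one above can be sanity-checked before a long download is started. The sketch below is not part of A-MADMAN: the function name check_georc_data is hypothetical, and it assumes that 'all' is written as the plain string 'all' in place of the list of samples.

```python
import re

# GEO accession patterns: series names start with GSE, sample names with GSM.
SERIES_RE = re.compile(r"^GSE\d+$")
SAMPLE_RE = re.compile(r"^GSM\d+$")


def check_georc_data(data):
    """Return a list of human-readable problems found in `data` (empty if none)."""
    problems = []
    for series, spec in data.items():
        if not SERIES_RE.match(series):
            problems.append("bad series name: %r" % (series,))
        samples = spec.get("samples")
        if samples == "all":  # assumed spelling of the "every sample" shortcut
            continue
        if not isinstance(samples, (list, tuple)) or not samples:
            problems.append("%s: 'samples' must be a non-empty list or 'all'" % series)
            continue
        for sample in samples:
            if not SAMPLE_RE.match(str(sample)):
                problems.append("%s: bad sample name: %r" % (series, sample))
    return problems


data = {
    "GSE1004": {"samples": ["GSM15807", "GSM15822"]},
    "GSE1786": {"samples": "all"},
}
print(check_georc_data(data))  # prints []
```

A misspelled accession is then reported before any network traffic takes place.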

To start the download process, log in to the server and issue the following command from the shell (in the directory where A-MADMAN is installed):

python manage.py geoget --georc yourconfigurationfile.georc

The download process will take a while, depending on how many series you selected and how fast your network is.
If something goes wrong, reissue the command and the process will resume from where it left off.
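Because reissuing the command resumes the download, the retry can be scripted. The following is only a sketch: the helper names are hypothetical, and it assumes that geoget exits with a non-zero status when something goes wrong, which the notes above do not state.

```python
import subprocess
import time


def geoget_command(georc_path):
    """Build the geoget command line shown above for a given configuration file."""
    return ["python", "manage.py", "geoget", "--georc", georc_path]


def run_with_retries(command, attempts=3, delay=30, runner=subprocess.call):
    """Reissue `command` until it exits with status 0 or the attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if runner(command) == 0:  # assumption: status 0 means success
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # give a flaky network some time to recover
    return None


# Example, to be run from the directory where A-MADMAN is installed:
# run_with_retries(geoget_command("yourconfigurationfile.georc"))
```

The runner parameter exists only so the loop can be tried out with a stand-in function instead of a real download.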

(Meta) Data import

In the server shell (in the directory where A-MADMAN is installed) run:
python manage.py geotodb --georc georc.example --project projectname

This command creates series and sample objects in the database and imports the associated metadata into the projectname project.

The project must already exist. To be able to manipulate the data later in the web application, your user has to be a member of the group that owns the project. Define your own policies with the Administration web interface.

Assignment

To analyse the samples, A-MADMAN needs to know which samples refer to the same patient (or cell line, or other source).

An individual in A-MADMAN is represented simply as a numerical identifier.

In the trivial case each sample corresponds to a different individual.

In this case, go to the series page (assignment is done at the series level for each sample) and click the assign link of an unassigned sample to reach the Assignment interface. There you can press the auto button, and each unassigned sample of the series will be assigned to a newly created individual.

In a typical situation, several samples (corresponding to different CEL files) will refer to the same individual.

To assign a sample, click its assign link.

To reduce the probability of errors, you are not prompted to fill in the individual field. Instead, if the sample you are assigning refers to an individual not previously seen, press the new button and a new individual will be created for you.

Otherwise, if the sample refers to an individual you have seen before, click the individual number link (notice that the individual field near the sample name is filled in automatically) and press save.

To identify samples referring to the same individual more easily, you can exploit the sample title field: enter specific title words in the Filter by title field and press the filter button.

See the image below for a glimpse of how crucial assignment information is for the production of an integrated expression matrix.
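The assignment logic described in this section can be pictured as a mapping from samples to numerical individual identifiers. The sketch below is a simplified model, not A-MADMAN code: the function names, sample names and titles are all hypothetical.

```python
from itertools import count


def auto_assign(samples, assignments, next_id):
    """Give every still-unassigned sample its own newly created individual."""
    for sample in samples:
        if sample not in assignments:
            assignments[sample] = next(next_id)
    return assignments


def filter_by_title(titles, word):
    """Return the samples whose title contains `word` (case-insensitive)."""
    return [s for s, title in titles.items() if word.lower() in title.lower()]


# Hypothetical sample names and titles, for illustration only.
titles = {
    "sample_1": "patient A, diagnosis",
    "sample_2": "patient A, relapse",
    "sample_3": "patient B, diagnosis",
}
next_id = count(1)
assignments = {}

# Typical case: both "patient A" samples share one individual.
individual = next(next_id)
for sample in filter_by_title(titles, "patient A"):
    assignments[sample] = individual

# Trivial case for whatever is left: one new individual per sample.
auto_assign(titles, assignments, next_id)
print(assignments)  # {'sample_1': 1, 'sample_2': 1, 'sample_3': 2}
```

Samples that share an identifier are the ones that end up describing the same individual in the integrated expression matrix.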

Custom Workflows

A workflow is defined as a Django template that receives some Python variables as input from the web application and generates, at runtime, the R source code that conducts the analysis. We provide a basic workflow that exposes some entry points for customizing the analysis. This is the default workflow:
      
{% extends "basic.rtmpl" %}
{% load R %}

{% block cdf_flavour %}
flavour="ferrari"
{% endblock %}

{% block signal_reconstruction %}
{% for chip_name in chip_names %}
eset.{{chip_name}} <- rma(batch.{{chip_name}})
{% endfor %}
{% endblock %}

{% block additionalcode %}
ieset=metanorm(ieset)
{% endblock %}

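The standard template itself extends a more basic template (basic.rtmpl). We provide three entry points to customize things; they correspond to the blocks overridden in the default workflow (cdf_flavour, signal_reconstruction and additionalcode). For example, additionalcode can be extended to compute and save extra objects: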
  
{% block additionalcode %}
ieset=metanorm(ieset)
stuff=do_some_stuff_with_ieset(ieset)
objects_to_save=c(objects_to_save,"stuff")
{% endblock %}
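To see what kind of R code such a template produces, the loop in the signal_reconstruction block of the default workflow can be imitated in plain Python. This is only an illustration of the expansion, not the Django rendering machinery itself, and the chip names are hypothetical; the real values are passed in by the web application.

```python
def signal_reconstruction(chip_names):
    """Mimic the {% for chip_name in chip_names %} loop of the default workflow."""
    return "\n".join(
        "eset.{0} <- rma(batch.{0})".format(chip_name) for chip_name in chip_names
    )


# Hypothetical chip names, for illustration only.
print(signal_reconstruction(["chipA", "chipB"]))
# eset.chipA <- rma(batch.chipA)
# eset.chipB <- rma(batch.chipB)
```

One rma call is generated per chip type, which is why the template receives a list of chip names rather than a single one.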

Security Notes

Users are trusted by default, and the code they inject into custom workflows is not checked or filtered for malicious instructions. The code is passed as is to the job server and executed with the privileges of the user under which the job server is running.

Analyses freshness

Baskets are lazily evaluated, i.e. the set of samples that satisfy the query is updated whenever it is needed. An analysis, instead, is a still image of how its baskets looked when it was run, so it can become 'stale' if you add or remove tags that appear in the query. To check whether an analysis refers to the current definition of its baskets ('fresh' in A-MADMAN terminology), press the check freshness button on the Analyses page.
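The difference between a lazily evaluated basket and the still image kept by an analysis can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: the class and function names are hypothetical, and a basket query is reduced to "samples carrying all of these tags".

```python
def evaluate_basket(query_tags, sample_tags):
    """Lazily evaluate a basket: the samples carrying every tag in the query."""
    return frozenset(
        sample for sample, tags in sample_tags.items() if query_tags <= tags
    )


class Analysis:
    """Keeps a still image of its baskets as they were when it was run."""

    def __init__(self, baskets, sample_tags):
        self.baskets = baskets  # basket name -> set of query tags
        self.snapshot = {
            name: evaluate_basket(query, sample_tags)
            for name, query in baskets.items()
        }

    def is_fresh(self, sample_tags):
        """True if re-evaluating every basket now reproduces the snapshot."""
        return all(
            evaluate_basket(query, sample_tags) == self.snapshot[name]
            for name, query in self.baskets.items()
        )


# Hypothetical samples and tags.
sample_tags = {"s1": {"tumor"}, "s2": {"normal"}}
analysis = Analysis({"cases": {"tumor"}}, sample_tags)
print(analysis.is_fresh(sample_tags))  # True

sample_tags["s2"].add("tumor")  # tagging s2 changes what "cases" selects
print(analysis.is_fresh(sample_tags))  # False
```

In this model an analysis turns stale as soon as a tag change alters the samples selected by any of its baskets.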

Customize A-MADMAN for organisms other than Homo sapiens

While giving general instructions, we'll walk through a practical example: adding support for a pair of rat chips (Affymetrix Rat Expression Set 230 and 230 2.0).
The reference OS for the example is GNU/Linux.