HaploPool

Reference
Kirkpatrick, B., Santos-Armendariz, C., Karp, R.M., and Halperin, E. HaploPool: Improving Haplotype Frequency Estimation through DNA Pools and Phylogenetic Modeling. Bioinformatics. 23: 3048-3055. 2007. (abstract)

What is HaploPool?
HaploPool is a program for estimating haplotype frequencies, either from genotypes of individuals or from genotypes of pooled individuals. The genotypes must come from a block of bi-allelic SNPs, meaning that the SNPs should be in linkage disequilibrium with each other. The program assumes that it is given many genotypes of unrelated diploid individuals in Hardy-Weinberg equilibrium. If the genotypes are from pooled DNA, the program assumes that every pool contains the same number of individuals and that the individuals were assigned to pools at random. To keep the running time reasonable, each pool should contain between 2 and 4 individuals.

Download HaploPool
Requires gcc/g++, perl, and MATLAB.

Register and Download HaploPool

For the installation instructions:
$ unzip haplopool.zip
$ cat haplopool/README

Instructions
REQUIRED SOFTWARE

gcc/g++
perl
MATLAB

INSTALLATION

Unpack the zip file in a directory of your choice.
Call that directory [DIR].
At the command prompt, execute the following commands.

$ cd [DIR]
$ perl install.pl
$ make all

TESTING HAPLOPOOL

To see the command line options for HaploPool:
$ ./haplopool

To test the installation of HaploPool:
$ cd [DIR]/examples/
$ ../haplopool sample_pools2_37 result0 150 2 0 v
$ ../haplopool sample_pools2_37 result1 150 2 1 v
$ ../haplopool sample_pools2_37 result2 150 2 2 v

The different result files contain the haplotype frequency estimates. Each result# file was produced by a distinct variant of HaploPool:

  • result0 -- from perfect phylogeny model and regression
  • result1 -- from perfect phylogeny model (without regression)
  • result2 -- from regression (without perfect phylogeny)

These results can be compared to the expected results, which were obtained from a correctly installed version of HaploPool. Note that repeated executions of HaploPool will produce slightly different results, because several of the algorithms use random numbers.

  • expected-result0
  • expected-result1
  • expected-result2
These results can also be compared to the haplotype frequency distribution from which the pools were simulated: realFreqs_pop_37

Note: simultaneous executions of HaploPool must be run in distinct directories, because HaploPool creates several intermediate files in the working directory.
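
The three test runs can also be scripted. The following is a minimal Python sketch, not part of the HaploPool distribution: the helper names build_command and run_all are hypothetical, and the arguments 150, 2, and v are copied verbatim from the test commands above without interpretation (run ./haplopool with no arguments to see what they mean). It assumes, based only on the result file descriptions above, that the fifth argument selects the variant. The runs are executed one after another in the same directory, so they never overlap.

```python
import subprocess

# Variant descriptions as given for result0, result1, and result2 above.
VARIANTS = {
    0: "perfect phylogeny model and regression",
    1: "perfect phylogeny model (without regression)",
    2: "regression (without perfect phylogeny)",
}


def build_command(variant, binary="../haplopool"):
    """Return the argument list for one test run.

    Every argument is copied from the test commands in this document;
    only the output file name and the fifth argument change per variant.
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    return [binary, "sample_pools2_37", f"result{variant}",
            "150", "2", str(variant), "v"]


def run_all(examples_dir):
    """Run the three variants sequentially inside examples_dir."""
    for variant, description in VARIANTS.items():
        print(f"variant {variant}: {description}")
        # check=True stops at the first run that exits with an error.
        subprocess.run(build_command(variant), cwd=examples_dir, check=True)
```

Calling run_all with the path to [DIR]/examples should then produce result0, result1, and result2 in that directory.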

Pool-Genotype File Format
An input pool-genotype file should contain the allele counts at each SNP. Every genotype appears on a separate line of the file. The SNPs of the genotype appear consecutively and are not separated by spaces. Each SNP is represented by a single integer giving the observed 1-allele count at that SNP. Remember that the data must be for bi-allelic SNPs. For a given SNP, it is irrelevant which allele is labeled '1', as long as every genotype gives the count of the same allele.

For example, the following file is valid input to HaploPool:

     223000022
     444000044
     334000033
     224002222
     224001122
     004003300
     223000022

Missing allele-counts should be indicated with a question-mark, '?'. For example, the above file with missing data might be:

     223000022
     4?4000044
     3340000?3
     224002222
     22?001122
     004003300
     2230?0022
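
Before running HaploPool, it can be useful to check that an input file follows this format. The following is a minimal Python sketch, not part of the HaploPool distribution: the function name parse_pool_genotypes and the pool_size parameter are hypothetical. It assumes that, because the individuals are diploid, a pool of n individuals can have a 1-allele count of at most 2n at any SNP, and that every line must cover the same number of SNPs.

```python
from typing import List, Optional

DIGITS = "0123456789"


def parse_pool_genotypes(text: str, pool_size: int) -> List[List[Optional[int]]]:
    """Parse pool-genotype text into one list of allele counts per line.

    Missing counts ('?') are returned as None.
    Assumption: the largest valid count is 2 * pool_size (diploid individuals).
    """
    max_count = 2 * pool_size
    rows: List[List[Optional[int]]] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()  # the examples above are indented
        if not line:
            continue  # ignore blank lines
        row: List[Optional[int]] = []
        for ch in line:
            if ch == "?":
                row.append(None)
            elif ch in DIGITS and int(ch) <= max_count:
                row.append(int(ch))
            else:
                raise ValueError(f"line {line_no}: invalid allele count {ch!r}")
        if rows and len(row) != len(rows[0]):
            raise ValueError(f"line {line_no}: expected {len(rows[0])} SNPs, "
                             f"found {len(row)}")
        rows.append(row)
    return rows


# The first two lines of the missing-data example above, pools of 2 individuals.
example = "     223000022\n     4?4000044\n"
pools = parse_pool_genotypes(example, pool_size=2)
```

A ValueError here only means the file does not match the format as described in this section; HaploPool itself may accept or reject input differently.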

References
Kirkpatrick, B. UC Berkeley Masters Thesis. 2007. (abstract, pdf)
Kirkpatrick, B., Santos-Armendariz, C., Karp, R.M., and Halperin, E. HaploPool: Improving Haplotype Frequency Estimation through DNA Pools and Phylogenetic Modeling. Bioinformatics. 23: 3048-3055. 2007. (abstract)