HaploPool |
|
| What is HaploPool? | |||
|
HaploPool is a program for estimating haplotype frequencies either from
genotypes of individuals or from genotypes of pooled individuals.
The genotypes must be for a block of bi-allelic SNPs (meaning that the SNPs
should be in linkage disequilibrium with each other).
The program assumes that it is given many genotypes of unrelated diploid
individuals in Hardy-Weinberg equilibrium. If the genotypes are from
pooled DNA, the program assumes that every pool contains the same number
of individuals and that the individuals were chosen at random when placed into
the pools. To keep the running time reasonable, the number of individuals
in a pool needs to be between 2 and 4.
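The sampling assumptions above can be illustrated with a short simulation. This is only a sketch of the stated model (unrelated diploid individuals in Hardy-Weinberg equilibrium, equal-sized pools filled at random) and is not part of HaploPool; the function name, the haplotype strings, and the frequencies are made up for illustration.

```python
import random


def simulate_pool_genotypes(haplotypes, frequencies, pool_size, n_pools, seed=0):
    """Draw pooled genotypes as strings of per-SNP 1-allele counts.

    haplotypes  -- list of equal-length '0'/'1' strings (hypothetical input)
    frequencies -- population frequency of each haplotype
    pool_size   -- number of diploid individuals per pool
    """
    rng = random.Random(seed)
    n_snps = len(haplotypes[0])
    lines = []
    for _ in range(n_pools):
        # Under Hardy-Weinberg equilibrium, each diploid individual carries two
        # haplotypes drawn independently, so a pool holds 2 * pool_size of them.
        drawn = rng.choices(haplotypes, weights=frequencies, k=2 * pool_size)
        # The pooled genotype records only how many 1-alleles occur at each SNP.
        counts = [sum(int(h[s]) for h in drawn) for s in range(n_snps)]
        lines.append("".join(str(c) for c in counts))
    return lines
```

With `pool_size=1` the same function produces ordinary individual genotypes, with counts of 0, 1, or 2 at each SNP.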
| |||
| Download HaploPool | |||
|
Requires gcc/g++, perl, and MATLAB.
Register and Download HaploPool
For the installation instructions: | |||
| Instructions | |||
|
REQUIRED SOFTWARE
gcc/g++
INSTALLATION
Unpack the zip file in a directory of your choice.
$ cd [DIR]
TESTING HAPLOPOOL
To see the command line options for HaploPool:
To test the installation of HaploPool:
The different result files contain the haplotype frequency estimates. Each result# file was produced by a distinct variant of HaploPool:
These results can be compared to the expected results, which were obtained from a correctly installed version of HaploPool. Note that repeated executions of HaploPool will give slightly different results, because several of the algorithms use random numbers.
Note: simultaneous executions of HaploPool must be run in distinct directories, because HaploPool creates several intermediate files. | |||
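One way to follow the note about distinct directories is to give every concurrent run its own fresh working directory. The sketch below is an assumption about workflow, not something shipped with HaploPool: the helper name is hypothetical, and the command it launches is a placeholder that would have to be replaced with the real HaploPool invocation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_separate_dirs(commands):
    """Start each command in its own new working directory, then wait for all.

    Returns a list of (working_directory, exit_code) pairs.
    """
    runs = []
    for cmd in commands:
        # A fresh directory per run keeps intermediate files from colliding.
        workdir = Path(tempfile.mkdtemp(prefix="haplopool_run_"))
        runs.append((workdir, subprocess.Popen(cmd, cwd=workdir)))
    return [(workdir, proc.wait()) for workdir, proc in runs]


if __name__ == "__main__":
    # Placeholder command only; substitute the actual HaploPool command line.
    placeholder = [sys.executable, "-c", "print('run finished')"]
    for workdir, code in run_in_separate_dirs([placeholder, placeholder]):
        print(workdir, code)
```

Because each process starts in an empty directory, input files would need to be passed by absolute path or copied into the run directory first.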
| Pool-Genotype File Format | |||
|
An input pool-genotype file should contain the allele counts at each SNP.
Every genotype appears on a separate line of the file.
The SNPs of the genotype appear consecutively and are not separated
by spaces. For each SNP, an integer is given, which indicates the
observed 1-allele count at that SNP. Remember that the data must be for
bi-allelic SNPs. For a given SNP, it is irrelevant which allele
is labeled with a '1', as long as every genotype gives the
allele-count of the same allele.
For example, the following file is valid input to HaploPool:
223000022
444000044
334000033
224002222
224001122
004003300
223000022
Missing allele-counts should be indicated with a question-mark, '?'. For example, the above file with missing data might be:
223000022
4?4000044
3340000?3
224002222
22?001122
004003300
2230?0022
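The format described above can be checked with a small reader before the file is handed to HaploPool. This is a minimal sketch, not HaploPool's own parser: the function names are hypothetical, blank lines are skipped as a convenience, and the upper bound of 2 × pool size on each count is inferred from the statement that the pooled individuals are diploid.

```python
def parse_pool_genotypes(text, pool_size):
    """Parse pool-genotype lines into lists of counts; '?' becomes None."""
    max_count = 2 * pool_size  # diploid individuals: two alleles each per SNP
    genotypes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        row = []
        for ch in line:
            if ch == "?":
                row.append(None)  # missing allele count
            elif ch in "0123456789" and int(ch) <= max_count:
                row.append(int(ch))
            else:
                raise ValueError(f"line {line_no}: invalid allele count {ch!r}")
        if genotypes and len(row) != len(genotypes[0]):
            raise ValueError(f"line {line_no}: expected {len(genotypes[0])} SNPs")
        genotypes.append(row)
    return genotypes


def one_allele_frequencies(genotypes, pool_size):
    """Per-SNP frequency of the 1-allele, ignoring missing counts."""
    freqs = []
    for snp in range(len(genotypes[0])):
        observed = [row[snp] for row in genotypes if row[snp] is not None]
        total_alleles = 2 * pool_size * len(observed)
        freqs.append(sum(observed) / total_alleles if observed else None)
    return freqs
```

The second function computes only single-SNP allele frequencies as a sanity check on the input; it is not a haplotype frequency estimate of the kind HaploPool produces.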
| |||
| References | |||
|
Kirkpatrick, B. UC Berkeley Masters Thesis. 2007. (abstract, pdf)
Kirkpatrick, B., Santos-Armendariz, C., Karp, R.M., and Halperin, E. HaploPool: Improving Haplotype Frequency Estimation through DNA Pools and Phylogenetic Modeling. Bioinformatics. 23: 3048-3055. 2007. (abstract) | |||