GWAS and SNPs
The International HapMap Project was initiated in 2002 to identify common haplotype* variants in four population samples descended from ancestral African, Asian, and European populations (Spencer 2002). Analysis of mitochondrial DNA indicates that modern humans evolved approximately 200,000 years ago (Cann et al. 1987). The utility of haplotype theory derives from the fact that only a limited number of recombination events occur per generation. Since modern humans diverged from their ancestral population, only a finite number of recombination events have therefore accumulated, so stretches of chromosome are still inherited together as blocks (haplotypes) that can be tagged by a few SNPs (Hartwell et al. 2008).
The general assumption of GWAS is that an associated SNP does not necessarily represent a causal mutation; rather, it is a marker for a nearby causal variant with which it is in linkage disequilibrium* (Wang 2010). Current SNP microarrays genotype a set of SNPs that tag common haplotypes (minor allele frequency ≥5%), as well as a much smaller subset of SNPs, mined from the literature, with a known pathogenic role or genetic cause (Altshuler 2010; 23andMe 2011).
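Linkage disequilibrium between two SNPs is commonly summarized from haplotype frequencies by D, its normalized form D′, and r². The sketch below computes all three for two biallelic SNPs; the function name and the example frequencies are hypothetical, chosen only to contrast a tag SNP in perfect LD with one that assorts independently.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic SNPs (hypothetical helper).

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1 (strictly between 0 and 1)
    p_b:  frequency of allele B at SNP 2 (strictly between 0 and 1)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    # D: excess of the AB haplotype over what independent assortment predicts
    d = p_ab - p_a * p_b
    # D_max: largest |D| possible given the allele frequencies (depends on the sign of D)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # Signed D'; |D'| = 1 means at least one of the four possible haplotypes is absent
    d_prime = d / d_max
    # r^2: squared correlation between the alleles at the two SNPs
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical: allele A always travels with allele B (perfect LD)
print(linkage_disequilibrium(p_ab=0.3, p_a=0.3, p_b=0.3))
# Hypothetical: the two SNPs assort independently (no LD)
print(linkage_disequilibrium(p_ab=0.09, p_a=0.3, p_b=0.3))
```

A tag SNP with r² close to 1 carries nearly the same association signal as the variant it tags, which is why an associated SNP need not itself be causal.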
GWAS use SNP markers that tag the haplotypes identified by the HapMap project. In a GWAS, a cohort* of individuals is selected and divided into two groups: (1) those who have a disease or phenotypic trait and (2) those without the disease or phenotypic trait. For every SNP, allele frequencies in the experimental (case) group are compared with those in the control group using computational methods, and each SNP is tested individually against the null hypothesis of no association. SNPs whose p-value falls below 1E-5 are considered associated with the disease or trait; because of linkage disequilibrium they are candidate risk markers rather than proven causal factors (The Wellcome Trust Case Control Consortium 2007). The threshold is far stricter than the conventional 0.05 because hundreds of thousands of SNPs are tested at once, and at 0.05 many thousands of false positives would be expected by chance alone.
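Studies differ in the exact per-SNP test they use; the sketch below assumes a simple allelic chi-square test on a 2×2 table of allele counts (cases vs. controls, allele 1 vs. allele 2), which is one common choice rather than necessarily the test behind Figure 1. The function name and all counts are hypothetical.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """Allelic chi-square test (1 degree of freedom) for a single SNP.

    case_alleles, control_alleles: (count of allele 1, count of allele 2),
    counted over chromosomes, so each person contributes two alleles.
    Returns (chi2, p_value).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency is the same in cases and controls
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


THRESHOLD = 1e-5  # significance threshold quoted in this section

# Hypothetical counts: 2,000 cases (4,000 alleles) and 3,000 controls (6,000 alleles);
# allele 1 has frequency 0.35 in cases and 0.30 in controls.
chi2, p = allelic_chi_square((1400, 2600), (1800, 4200))
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, associated: {p < THRESHOLD}")
```

The same arithmetic explains the strict threshold: a Bonferroni correction at 0.05 for 500,000 independent tests gives 0.05 / 500,000 = 1E-7.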
After the GWAS identifies potential risk-factor SNPs, the relative risk* for the subset of individuals carrying a particular SNP is calculated as the ratio of the probability that a person with the SNP develops the disease to the probability that a person without the SNP develops the same disease (Stewart 2002). The odds ratio*, another statistic describing the relationship between the experimental and control groups, is the ratio of the odds of disease in individuals with the risk factor to the odds of disease in individuals without it (Goldin 2007). Because a case-control design fixes the proportion of cases by sampling, the odds ratio is the quantity a GWAS can estimate directly; when the disease is rare, it closely approximates the relative risk. Since every GWAS is performed on a sample of the population, the relative risk and odds ratio are reported with a confidence interval*, typically 95%, that gives a range of plausible values for the odds ratio in the whole population. As sample size increases toward the size of the population, the confidence interval narrows (Bluman 2009).
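The definitions above can be checked numerically. The sketch below assumes hypothetical cohort-style counts (so disease probability in each group is known) and uses Woolf's log method, one standard approximation, for the 95% confidence interval of the odds ratio; the function name is hypothetical.

```python
import math


def risk_statistics(carrier_cases, carrier_healthy, noncarrier_cases, noncarrier_healthy, z=1.96):
    """Relative risk, odds ratio, and an approximate CI for the odds ratio.

    Counts are of individuals: carriers / non-carriers of the SNP who did or
    did not develop the disease. z = 1.96 gives a 95% interval.
    """
    a, b, c, d = carrier_cases, carrier_healthy, noncarrier_cases, noncarrier_healthy
    # Relative risk: P(disease | SNP) / P(disease | no SNP)
    relative_risk = (a / (a + b)) / (c / (c + d))
    # Odds ratio: odds(disease | SNP) / odds(disease | no SNP) = (a/b) / (c/d)
    odds_ratio = (a * d) / (b * c)
    # Woolf's method: log(OR) is approximately normal with this standard error
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return relative_risk, odds_ratio, (lower, upper)


# Hypothetical cohort: 1,000 carriers (30 ill) and 2,000 non-carriers (20 ill)
rr, or_, ci = risk_statistics(30, 970, 20, 1980)
print(f"RR = {rr:.2f}, OR = {or_:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")

# Same proportions with ten times as many people: the interval narrows
rr10, or10, ci10 = risk_statistics(300, 9700, 200, 19800)
print(f"OR = {or10:.2f}, 95% CI = ({ci10[0]:.2f}, {ci10[1]:.2f})")
```

Because the disease is rare in this example (1% to 3%), the odds ratio (about 3.06) is close to the relative risk (3.0), and multiplying every count by ten leaves the odds ratio unchanged while shrinking the interval.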
An important caveat with the odds ratio is that if two separate SNPs are implicated in the same disease, their odds ratios cannot simply be multiplied to obtain the odds ratio of an individual carrying both SNPs. Instead, a separate analysis that models both SNPs jointly would need to be conducted, because genetic interactions are often complex and the combined effect of two SNPs is difficult to predict from the odds ratio of each SNP alone.
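A small numerical illustration of this caveat, using entirely hypothetical case and control counts grouped by carrier status of two SNPs: the single-SNP (marginal) odds ratios come out as 3.2 and 2.5, whose product is 8.0, while the odds ratio actually observed for carriers of both SNPs versus carriers of neither is 6.0. The helper names are hypothetical.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds ratio of an exposed group relative to a reference group."""
    return (cases_exposed * controls_ref) / (controls_exposed * cases_ref)


# Hypothetical (cases, controls) counts, keyed by (carries SNP 1, carries SNP 2)
groups = {
    (False, False): (100, 1000),
    (True, False): (200, 1000),
    (False, True): (150, 1000),
    (True, True): (600, 1000),
}


def pooled(select):
    """Sum cases and controls over the genotype groups picked by select(key)."""
    cases = sum(n_cases for key, (n_cases, _) in groups.items() if select(key))
    controls = sum(n_ctrl for key, (_, n_ctrl) in groups.items() if select(key))
    return cases, controls


# Single-SNP (marginal) odds ratios, as a one-SNP-at-a-time test would report them
or_snp1 = odds_ratio(*pooled(lambda k: k[0]), *pooled(lambda k: not k[0]))
or_snp2 = odds_ratio(*pooled(lambda k: k[1]), *pooled(lambda k: not k[1]))
# Odds ratio observed for carriers of both SNPs versus carriers of neither
or_both = odds_ratio(*groups[(True, True)], *groups[(False, False)])

print(f"SNP 1: {or_snp1:.2f}, SNP 2: {or_snp2:.2f}")
print(f"product: {or_snp1 * or_snp2:.2f}, observed joint: {or_both:.2f}")
```

One common way to carry out the joint analysis is logistic regression with both SNPs and an interaction term; that choice is an assumption here, not something the sources above prescribe.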
Figure 1: Manhattan Plot

Figure 1: This figure shows SNPs across all 22 autosomal chromosomes and the X chromosome, highlighting markers that were statistically associated with a clinically diagnosed disease. SNP markers must have a p-value below 1E-5 to be considered a potential risk factor. Once genetic markers that are clear risk factors for disease have been identified, the genomic regions immediately around them can be investigated to determine the disease’s pathogenesis (The Wellcome Trust Case Control Consortium 2007, reproduced with license from Nature Publishing Group, see addendum).
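A Manhattan plot such as Figure 1 conventionally places each SNP at its position along the concatenated chromosomes on the x-axis and plots -log10(p) on the y-axis, so a threshold of 1E-5 becomes a horizontal line at 5. The sketch below computes those coordinates only (plotting is left to any charting library); the function name, toy chromosome lengths, and p-values are hypothetical.

```python
import math


def manhattan_points(snps, chrom_lengths, threshold=1e-5):
    """Convert GWAS results into Manhattan-plot coordinates.

    snps: iterable of (chromosome, position_bp, p_value)
    chrom_lengths: dict of chromosome -> length in bp, in plotting order
    Returns a list of (x, y, is_hit): x is the position along the concatenated
    genome, y is -log10(p), and is_hit marks p below the threshold.
    """
    # Offset each chromosome by the total length of the chromosomes before it
    offsets, running = {}, 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running
        running += length
    points = []
    for chrom, pos, p in snps:
        points.append((offsets[chrom] + pos, -math.log10(p), p < threshold))
    return points


# Toy genome and hypothetical results (not real chromosome lengths or p-values)
toy_lengths = {"1": 1000, "2": 800, "X": 600}
results = [("1", 100, 0.5), ("2", 50, 1e-8), ("X", 300, 2e-3)]
for x, y, hit in manhattan_points(results, toy_lengths):
    print(x, round(y, 2), "above threshold" if hit else "")
print("threshold line at y =", -math.log10(1e-5))
```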
Works Cited:
23andMe (2011). “23andWe Research”. [Internet] 23andMe. Available from: https://www.23andme.com/research [Accessed 4 May 2011].
Altshuler DM, et al. (2010). “Integrating common and rare genetic variation in diverse human populations”. Nature. 467, 52-58.
Bluman AG (2009). “The Central Limit Theorem”. Elementary Statistics: A Step by Step Approach. 7e, 331-337.
Cann RL, Stoneking M and Wilson AC (1987). “Mitochondrial DNA and human evolution”. Nature. 325, 31-36.
Goldin R (2007). “Odds Ratios”. [Internet] Stats. Available from: http://stats.org/stories/2008/odds_ratios_april4_2008.html [Accessed 4 May 2011].
Hartwell LH, Hood L, Goldberg ML, Reynolds AE, Silver LM and Veres RC (2008). “Haplotype Association Studies for High-Resolution Mapping In Humans”. Genetics: From Genes to Genomes. 3e, 423-425.
Spencer G (2002). “International Consortium Launches Genetic Variation Mapping Project: HapMap Will Help Identify Genetic Contributions to Common Diseases”. [Internet] National Human Genome Research Institute. Available from: http://www.genome.gov/10005336 [Accessed 4 May 2011].
The Wellcome Trust Case Control Consortium (2007). “Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls”. Nature. 447, 661-678.
Wang K, Dickson SP, Stolle CA, Krantz ID, Goldstein DB and Hakonarson H (2010). “Interpretation of association signals and identification of causal variants from genome-wide association studies”. American Journal of Human Genetics. 86, 730-742.