Thinking about it today, I realized there is a “Basic Concept” that I think I should touch upon, and that is linkage disequilibrium (LD). Notice the wiki link? I do that whenever I mention LD because it is such an essential concept for some of the evolutionary ideas which I am interested in, but it is often not a transparent or clear one to the lay person.
Its lack of obviousness isn’t due to complexity (LD is pretty simple); rather, there are particular background ideas which one needs firmly in mind before one can easily grasp it. For this reason I’ve placed an image of a chromosome to the left. LD is not a purely intrachromosomal concept, but I believe a biophysical model is important in understanding it, so I will use this image for illustrative purposes in the following post. So, you know that the human genome is divided physically into chromosomes, and each replicated chromosome consists of two sister strands of DNA, chromatids. As you see to the left, diploid organisms have two copies of a gene, which may be different alleles, at each “locus.” A locus is an abstract concept: strictly it is a position on a chromosome, but for our purposes it is basically a synonym for a gene. With “gene” under our belts, we can now picture a strand of DNA packed with various genomic regions: introns, exons, intragenic and intergenic regions, and so on. The details aren’t particularly relevant to LD; just remember that locus 1 and locus 2 on the same chromosomal strand are a particular physical distance apart.
Now look at the image to the left. The numerical and letter notation refers to the locus and chromosomal arm position, respectively. Each of the four “slots” holds one of the two diploid copies of a gene, each copy inherited from one parent. The + & – script represents, for ease of conception, functional and non-functional copies of the gene. Mendel’s Laws tell us that the identity of the allele at locus 1 should not give us any information about the identity of the allele at locus 2 on the same chromatid. In other words, knowing that locus 1 on copy “A” is – should not tell us whether locus 2 on copy “A” is more likely to be + or –.
That is where linkage disequilibrium comes in: LD measures the deviation from this expectation of non-association along the genome. As I noted above, though I am using the example of one chromosome, this can apply throughout the genome (my own interest is specifically in physically contiguous genomic regions, more on this below). The mathematical calculation of the expectation is simple algebra, and I won’t reprise the explanation offered in the Wikipedia entry for “D.” In brief, D is the observed frequency of a two-locus combination minus the product of the frequencies of its two alleles, so D is zero when the loci are independent. But I will point to three cases where LD could exist.
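For readers who want the algebra anyway, here is a minimal Python sketch of D for two loci. It assumes phased data, with each chromatid written as a two-character string such as "+-" (locus 1, then locus 2), following the +/– notation of the image; the function name `linkage_disequilibrium` is hypothetical, chosen for illustration, and not taken from any library.

```python
from collections import Counter

def linkage_disequilibrium(haplotypes):
    """Return D = p(++) - p1(+) * p2(+) for phased two-locus haplotypes.

    Each haplotype is a two-character string such as "+-":
    the first character is locus 1, the second is locus 2.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_plus_plus = counts["++"] / n                   # observed joint frequency of "++"
    p1 = sum(h[0] == "+" for h in haplotypes) / n     # frequency of "+" at locus 1
    p2 = sum(h[1] == "+" for h in haplotypes) / n     # frequency of "+" at locus 2
    return p_plus_plus - p1 * p2                      # zero when the loci are independent

# Every combination equally common: locus 1 tells you nothing about locus 2.
print(linkage_disequilibrium(["++", "+-", "-+", "--"]))  # 0.0
# Only "++" and "--" chromatids: locus 1 predicts locus 2 perfectly.
print(linkage_disequilibrium(["++", "++", "--", "--"]))  # 0.25
```

Only the "++" combination is used because, with two alleles at each locus, the D computed from any of the four combinations has the same magnitude.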
1) Consider a circumstance where there is an epistatic interaction between two loci that depends on which alleles are present. Imagine an infection which is lethal to individuals carrying null copies (–) at both locus 1 and locus 2 on the same chromatid (imagine that locus 1 & locus 2 interact in cis). This is a case of LD being generated by the fitness consequences of particular genetic combinations. If the null and functional copies start at intermediate frequencies (e.g., both at around 0.5 in generation 1), then among the survivors the proportion of chromatids carrying mixed (+/–) or fully functional (+/+) combinations at the two loci would be higher than the pre-infection expectation. The presence of a null allele at one locus immediately tells you that the other locus on the same chromatid does not carry a null allele, because that combination is lethal. Of course, over time selection would purge the variation which generated this LD, as the null alleles would decrease in frequency.
2) Consider a circumstance where two populations, previously separated, come into contact (e.g., an isthmus connects two islands). If the populations are fixed for alternative alleles at several loci, and those loci are on the same chromosome, one can envisage a situation where alleles at two loci remain in LD over many generations. The issue here is that associations between syntenic alleles take time to be broken apart by recombination, so the allele combinations which had fixed in the parental populations will be passed intact into subsequent admixed generations until crossing over disrupts the physical association. To give a concrete example, imagine a locus for eye color and a locus for hair color. Imagine one population is fixed for “white” at both (100%) and the other is fixed for “black” (100%). The first hybrid generation would be heterozygous at both loci, but each chromatid with a “white” eye color copy would also carry a “white” hair color copy, and vice versa. Over the generations recombination would swap partners, and eventually one would not be able to predict whether the downstream gene was “white” or “black” based on its physical partner, but that would take time.
3) Finally, the one I am most interested in because of its evolutionary historical significance: LD generated by selective sweeps. Imagine a table top that is little used. Over time it builds up a layer of dust which disrupts its smooth symmetry. Now, consider someone sliding a towel over its surface. Across the region that the towel traversed the dust will be swept away, and a smooth, clear surface will now shine, still bordered by expanses of dust. Over time the smooth region will become obscured by dust once more and fade into the background. The analogy I am making is that the dust is genetic variation, while the sweeping towel is a selection event. If a new mutant allele confers great selective benefits, then it can rise in frequency precipitously. If this rise is faster than recombination can break down genetic associations, alleles at nearby loci can be “swept” along by hitchhiking. If one imagines a scenario where a recombination breakpoint is equally likely to occur anywhere in the genome (this is not true, but accept it for simplification), then decreasing the distance between two genes along a chromatid decreases the likelihood of a recombination event separating them in any given meiosis. Because every copy of the favored allele descends from the single chromatid on which it arose, the neighboring alleles it drags along are copies of that one background, so a selective sweep which favors a coterie of neighboring genes and alleles has a homogenizing effect, generating “long haplotypes,” genomic regions cleansed of variation. Of course, this variation eventually reemerges via mutation and recombination, but it takes time.
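The three cases above can be mimicked with a toy deterministic model. This is a minimal Python sketch, not a definitive population genetics implementation: it assumes an infinite, randomly mating population, two loci with two alleles each, and haplotype frequencies stored in the order (++, +–, –+, ––); all function names and default parameters (s, eps, the number of generations) are hypothetical. Case 1 assumes a single –– chromatid is enough to kill its carrier, so removing –– and renormalizing is sufficient. Case 2 uses the standard result that each generation of random mating shrinks D by a factor of (1 – r), where r is the recombination fraction between the loci. Case 3 reuses that recombination step after haploid selection at locus 1; note that here “+” at locus 1 stands for the new favored mutant rather than a functional copy.

```python
import math

# Haplotype frequencies are tuples ordered (++, +-, -+, --): locus 1, then locus 2.

def ld(x):
    """D = x(++) * x(--) - x(+-) * x(-+), equivalent to p(++) - p1 * p2."""
    return x[0] * x[3] - x[1] * x[2]

def recombine(x, r):
    """One generation of random mating with recombination fraction r: D -> (1 - r) * D."""
    d = ld(x)
    return (x[0] - r * d, x[1] + r * d, x[2] + r * d, x[3] - r * d)

# Case 1 (hypothetical): any "--" chromatid is lethal, so survivors carry no "--".
def kill_double_null(x):
    total = x[0] + x[1] + x[2]
    return (x[0] / total, x[1] / total, x[2] / total, 0.0)

# Case 2: populations fixed for "++" and "--" mix 50/50, then mate at random.
def admixture_history(r, generations):
    x = (0.5, 0.0, 0.0, 0.5)
    history = []
    for _ in range(generations + 1):
        history.append(ld(x))
        x = recombine(x, r)
    return history

def half_life(r):
    """Generations until D falls to half its starting value (for 0 < r < 1)."""
    return math.log(0.5) / math.log(1.0 - r)

# Case 3: "+" at locus 1 is a new favored mutant (fitness 1 + s) that arises on a
# "+" background at the neutral locus 2. Returns the final frequency of "+" at locus 2.
def sweep(r, s=0.1, eps=1e-3, generations=500):
    x = (eps, 0.0, 0.5 - eps, 0.5)
    w = (1.0 + s, 1.0 + s, 1.0, 1.0)
    for _ in range(generations):
        wbar = sum(xi * wi for xi, wi in zip(x, w))
        x = tuple(xi * wi / wbar for xi, wi in zip(x, w))  # haploid selection at locus 1
        x = recombine(x, r)
    return x[0] + x[2]

print(ld(kill_double_null((0.25, 0.25, 0.25, 0.25))))  # about -1/9
print(round(half_life(0.01)))                            # 69
print(sweep(0.0), sweep(0.01), sweep(0.5))
```

Starting from four equally common haplotypes, case 1 moves D from 0 to –1/9: mixed chromatids are over-represented relative to what the surviving allele frequencies would predict. In case 2, loci with r = 0.01 need about 69 generations to lose half their LD. In case 3, a completely linked neutral allele (r = 0) is carried essentially to fixation, while one at r = 0.5 stays close to its starting frequency of 0.5; under the uniform-breakpoint simplification above, a smaller r corresponds to a shorter physical distance.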
And so there you have linkage disequilibrium. As you might notice, LD is epiphenomenal. It is a passing fad, but since it erupts periodically it can be an excellent marker for the historically contingent events which are important in evolution.