Eric Lander—Genomics and Darwin in the 21st Century

Lander began by saying he wasn’t an evolutionist — an interestingly narrow definition of the term. He’s a fan of the research, but considers himself a biomedical geneticist, as if that was something different.

Having entire genomes of many species available for quantitative analysis is going to lead to a qualitative change in the science we can do.

He gave a pocket summary of the human genome project. The mouse genome followed, then rat and dog, and we now have sequence (to varying degrees of completeness) for 44 species, out of 4600 mammals. Within Homo, there’s the HapMap project and the 1000 Genomes Project, so at least in us we’re going for depth and breadth of coverage.

Sequencing technology is rapidly accelerating. Exponential growth in the number of nucleotides sequenced per year. Exponentially on a log scale! We’re developing a tremendous amount of data acquisition capability. We’ll be able to address mechanisms of physiology and evolution, and learn about the particulars of history.

Lander focuses on genome-wide studies. Evolutionary conservation is a guide to extracting information from the genome. Showed synteny diagrams of mouse and human, and discussed analyses that allow you to identify highly conserved pieces, bits that might have significant function.

Number of genes is low, 20,500. Early higher numbers he admitted were inflated a bit by prior expectations; when they had a good estimate of 30,000, they decided to waffle and call it 30-40,000.

If genes are counted by homology, how do we know there aren’t many more genes that don’t have homology? If that were the case, the number of genes in humans would still be close to the estimated numbers in chimp and macaque.

There are also well-conserved non-coding regions in DNA. 5% of the genome is under selection: coding 1.2%, non-coding 3.8%. Found 200 gene poor regions that contain key developmental genes, and many of the conserved non-coding regions are associated with them.
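As a quick check on those proportions, here is a minimal sketch of the arithmetic. The two percentages come from the talk; the genome size of roughly 3.1 billion base pairs is my own assumption, not a figure Lander gave.

```python
# Percentages of the human genome under selection, as reported in the talk.
coding_pct = 1.2
noncoding_pct = 3.8

# ASSUMPTION: approximate haploid human genome size (not from the talk).
GENOME_BP = 3.1e9

total_pct = coding_pct + noncoding_pct
noncoding_share = noncoding_pct / total_pct   # fraction of selected sequence that is non-coding
conserved_bp = GENOME_BP * total_pct / 100    # selected sequence in base pairs

print(f"total under selection: {total_pct:.1f}%")
print(f"non-coding share of that: {noncoding_share:.0%}")
print(f"about {conserved_bp / 1e6:.0f} million bp, given the assumed genome size")
```

In other words, about three quarters of the sequence under selection doesn’t code for protein at all.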

Long intergenic non-coding DNA: pretty much all of the genome is transcribed, but the vast majority of this is simply noise. There are about a dozen regions known where transcription of non-coding DNA seems to be conserved evolutionarily and to have some function: they may be transcriptional repressors.

Mechanisms of evolutionary innovation in coding genes: examples of whole genome duplication, divergence, and loss, all of which can be demonstrated by comparison with an outgroup.

Mechanisms of innovation in non-coding regions: about 84% of conserved DNA is shared between marsupials and placentals, suggesting that about 16% of changes are novel. About 15% of placental-specific CNEs (conserved non-coding elements) are derived from transposons.

With 29 mammalian genomes compared, they have 4 substitutions per site, a detection limit of about 10 bp, and 2.8 million features detected. We have a lot of detail that can be extracted from the data sets.

We can find evidence of positive selection. Using chicken as an outgroup, we can identify genes that have undergone major changes in humans but not chimps. Comparison across 29 mammals shows even more. What we’re finding is that these evolutionarily significant genes are enriched for developmental genes.

Analysis within the human species shows that we are a young population that expanded rapidly from a small initial population of 10,000 individuals. Can now screen for associations between single-nucleotide polymorphisms and disease. We can now screen for 2 million polymorphisms in a single pass on a chip. Have now identified 500 loci associated with common traits. Most have very modest effects and only contribute to a small part of the heritability of the trait. Where is all the missing heritability? Missing loci, missing alleles, and non-additive effects of loci.

Positive selection in human history: can use HapMap data to find 300 regions with outlier distributions that suggest they have been the target of selection. Combining statistical tests narrows the identified regions to a size roughly equal to a single gene, making it possible to identify specific genes with an interesting selective history (work in press by Pardis Sabeti). There are themes: many of these genes are involved in resisting infectious disease.

Genomics is experiencing an explosion of data that represents a huge opportunity for future discovery.

What caused the Cambrian explosion? MicroRNA!


No, not really — my title is a bit of a sensationalistic exploitation of the thesis of a paper by Peterson, Dietrich, and McPeek, but I can buy into their idea that microRNAs (miRNAs) may have contributed to the pattern of metazoan phylogenies we see now. It’s actually a thought-provoking concept, especially to someone who favors the evo-devo view of animal evolution. And actually, the question it answers is why we haven’t had thousands of Cambrian explosions.

In case you haven’t been keeping up, miRNAs are a hot topic in molecular genetics: they are short (21-23 nucleotides) pieces of single-stranded RNA that are not translated into protein, but have their effect by binding to strands of messenger RNA (mRNA) to which they are complementary, effectively down-regulating expression of that messenger. They play an important role in regulating the levels of expression of other genes.
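If it helps to see what “binding by complementarity” means in practice, here is a minimal sketch that scans a message for stretches that could pair with a miRNA. Everything in it is illustrative: both sequences are made up, the function names are hypothetical, and restricting the match to a short “seed” (nucleotides 2 to 8 from the 5' end) is a simplifying assumption of mine rather than anything taken from the paper. Real target prediction is a good deal messier.

```python
from typing import List

# Watson-Crick pairing for RNA; G:U wobble pairs are ignored in this sketch.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that would pair with `rna`, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_seed_sites(mirna: str, mrna: str,
                    seed_start: int = 1, seed_end: int = 8) -> List[int]:
    """Return 0-based mRNA positions where the miRNA seed could pair.

    ASSUMPTION: the seed is nucleotides 2-8 of the miRNA (Python slice 1:8).
    """
    seed = mirna[seed_start:seed_end]
    site = reverse_complement(seed)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]


# Hypothetical sequences, invented for this example only.
mirna = "AUGGCAUUCGAUCCGUAAGCUA"   # 22 nucleotides
mrna = "GGCUAAAUGCCAUCG"

print(find_seed_sites(mirna, mrna))
```

A message with one or more such sites is a candidate for down-regulation by that miRNA; a message with none would be left alone in this toy model.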

One role for miRNAs seems to be to act as a kind of biological buffer, working to limit the range of effective message that can be operating in the cell at any one time. Some experiments that have knocked out specific miRNAs have had a very interesting effect: the range of expressed phenotypes for the targeted message gene increases. The presence or absence of miRNA doesn’t actually generate a novel phenotype, it simply fine-tunes what other genes do — and without miRNA, some genes become sloppy in their expression.

This talk of buffering expression immediately swivels a developmental biologist’s mind to another term: canalization. Canalization is a process that leads organisms to produce similar phenotypes despite variations in genotype or the environment (within limits, of course). Development is a fairly robust process that overcomes genetic variations and external events to yield a moderately consistent outcome — I can raise fish embryos at 20°C or at 30°C, and despite differences in the overall rate of growth, the resultant adult fish are indistinguishable. This is also true of populations in evolution: stasis is the norm, morphologies don’t swing too widely generation after generation, but still, we can get some rapid (geologically speaking) shifts, as if forms are switching between a couple of stable nodes of attraction.

Where the Cambrian comes into this is that it is the greatest example of a flowering of new forms, which then all began diverging down different evolutionary tracks. The curious thing isn’t their appearance — there is evidence of a diversity of forms before the Cambrian, bacteria had been flourishing for a few billion years, etc., and what happened 500 million years ago is that the forms became visible in the fossil record with the evolution of hard body parts — but that these phyla established body plans that they were then locked into, to varying degrees, right up to the modern day. What the authors are proposing is that miRNAs might be part of the explanation for why these lineages were subsequently channeled into discrete morphological pathways, each distinct from the other as chordates and arthropods and echinoderms and molluscs.

[Read more…]

Basics: Imprinting

I’ve been busy. I’m teaching genetics this term, and usually the first two thirds of the course is trivial to prepare for: we’re covering Mendelian genetics, the early material is something the students have seen before, so they are at least generally familiar with the concepts, and all I have to do is cover them a little deeper and with a stronger quantitative component. That’s relatively easy.

The last part of the course, though, is where we start moving into uncharted waters for them, and every year I have to rethink how I’m going to cover the non-Mendelian concepts, and sometimes my ideas work well, and sometimes they don’t. If I teach it for another 20 years, I’ll eventually reach the point where every lecture has been honed into a comprehensible ideal. At least that’s my dream.

Anyway, one of the subjects we’re covering in the next lecture or two is imprinting, and I know from past experience that this can cause mental meltdowns in my students. This makes no sense if you’re used to thinking in Punnett squares! So I’ve been reworking this little corner of the class, and as long as I’m putting together a ground-up tutorial on the subject, I thought I might as well put it on the web. So here you are, a basic introduction to imprinting.

[Read more…]

Soon, we’ll all have Steve Pinker’s genome to play with

Genome sequencing is getting cheaper and faster, and more and more people are having it done. A new addition to the ranks is Steve Pinker, who contemplates the details of his personal genome in an interesting essay. It’s got to be fascinating, in a terribly self-centered way — I’d love to have a copy of mine someday. It’s an opportunity to see a manifestation of one’s own lineage, your biological history all laid out for you. There’s the ability to compare with others, and see hints of statistical correlations and associations with specific traits and even, unpleasantly, diseases. Pinker also makes the point that you are not determined by your genome — the man famously has a wild head of hair, and as it turns out, he’s also carrying a bit of sequence that seems to predispose carriers to baldness.

At the same time, there is nothing like perusing your genetic data to drive home its limitations as a source of insight into yourself. What should I make of the nonsensical news that I am “probably light-skinned” but have a “twofold risk of baldness”? These diagnoses, of course, are simply peeled off the data in a study: 40 percent of men with the C version of the rs2180439 SNP are bald, compared with 80 percent of men with the T version, and I have the T. But something strange happens when you take a number representing the proportion of people in a sample and apply it to a single individual. The first use of the number is perfectly respectable as an input into a policy that will optimize the costs and benefits of treating a large similar group in a particular way. But the second use of the number is just plain weird. Anyone who knows me can confirm that I’m not 80 percent bald, or even 80 percent likely to be bald; I’m 100 percent likely not to be bald. The most charitable interpretation of the number when applied to me is, “If you knew nothing else about me, your subjective confidence that I am bald, on a scale of 0 to 10, should be 8.” But that is a statement about your mental state, not my physical one. If you learned more clues about me (like seeing photographs of my father and grandfathers), that number would change, while not a hair on my head would be different. Some mathematicians say that “the probability of a single event” is a meaningless concept.
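Pinker’s remark that the number would change as you learned more about him is ordinary Bayesian updating, and a small sketch makes it concrete. The 0.8 starting point is the figure from his essay; the two likelihoods for the new evidence are numbers I invented purely for illustration, and the function name is hypothetical.

```python
def update(prior: float, p_evidence_if_bald: float,
           p_evidence_if_not_bald: float) -> float:
    """Bayes' rule: belief that someone is bald after seeing one piece of evidence."""
    numerator = prior * p_evidence_if_bald
    return numerator / (numerator + (1 - prior) * p_evidence_if_not_bald)


# Belief based on the SNP alone (the 80 percent figure quoted above).
prior = 0.8

# HYPOTHETICAL likelihoods: how probable is a photo showing a full head of hair
# if the man is bald, versus if he is not?
posterior = update(prior, p_evidence_if_bald=0.01, p_evidence_if_not_bald=0.95)

print(f"belief before the photo: {prior:.2f}")
print(f"belief after the photo:  {posterior:.2f}")
```

The observer’s confidence collapses once the photo is in hand, and, as Pinker says, not a hair on his head has changed.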

Another thing that having a copy of your genome should drive home is how much of it is incomprehensible; we simply don’t know what most of it does, and even in the example mentioned above, we don’t have a causal relationship between one variant of the rs2180439 SNP and head hair, only a rough correlation. That’s the promise of the future, that we can now get copies of this book of our genome…we just have to get to work learning how to read it.

I’ve got my eye on the progress in genome sequencing. When the price hits $1000 (which isn’t at all unlikely to occur in my lifetime), I know I’m going to have it done, just because it’s a book I’ve been waiting most of my life to read.

Copy Number Variants are not evidence of design

i-e88a953e59c2ce6c5e2ac4568c7f0c36-rb.png

The Institute for Creation Research has a charming little magazine called “Acts & Facts” that prints examples of their “research” — which usually means misreading some scientific paper and distorting it to make a fallacious case for a literal interpretation of the bible. Here’s a classic example: Chimps and People Show ‘Architectural’ Genetic Design, by Brian Thomas, M.S. (Note: this is not the peer-reviewed research paper implied by the logo to the left — that comes later.) The paper is a weird gloss on recent work on CNVs, or copy number variants. Mr Thomas makes a standard creationist inference that I have to hold up for public ridicule.

[Read more…]

My human lineage

This is a very simple, lucid video of Spencer Wells talking about his work on the Genographic Project, the effort to accumulate lots of individual genetic data to map out where we all came from.

I’ve also submitted a test tube full of cheek epithelial cells to this project, and Lynn Fellman is going to be doing a DNA portrait of me. I had my Y chromosome analyzed just because my paternal ancestry was a bit murky and messy and potentially more surprising, and my mother’s family was many generations of stay-at-home Scandinavian peasantry, so I knew what to expect there. Dad turned out to be not such a great surprise, either. I have the single nucleotide polymorphism M343, which puts me in the R1b haplogroup, which is just the most common Y haplogroup in western Europe. I share a Y chromosome with a great many other fellows from England, France, the Netherlands, etc., which is where the anecdotal family history suggested we were from (family legend has it that the first American Myers in my line was a 17th or 18th century immigrant from the Netherlands). Here’s a map of where the older members of my lineage have been from: Africa (of course!) by way of a long detour through central Asia.

i-9897d9b90311be17c7a9406d91fcf72f-M343.jpg

Hello, many-times-great-grandpa! That’s quite the long walk your family has taken. Howdy, great big extended family! We’ll have to get together sometime and keep in touch.

If you’re interested in finding out what clump of humanity you belong to, it’s easy: you can order a $100 kit, swab out a few cheek cells (just like they do on CSI or Law & Order!), mail it back, and a few weeks later, they send you your results. It’s not very detailed — they only analyze a small number of markers — but it’s enough to get a rough picture of where your branch of the family tree lies. And for a bit more, Lynn can turn it into something lovely for your wall.

By the way, Lynn and I will be talking about the science and art of human genetics in a Cafe Scientifique session in Minneapolis in February.

Epigenetics

Blogging on Peer-Reviewed Research

Epigenetics is the study of heritable traits that are not dependent on the primary sequence of DNA. That’s a short, simple definition, and it’s also largely unsatisfactory. For one, the inclusion of the word “heritable” excludes some significant players (the differentiation of neurons requires major epigenetic shaping, but these cells have undergone a terminal division and will never divide again), but at the same time, the heritability of traits that aren’t defined by the primary sequence is probably the first thing that comes to mind in any discussion of epigenetics. Another problem is the vague open-endedness of the definition: it basically includes everything. Gene regulation, physiological adaptation, disease responses…they all fall into the catch-all of epigenetics.

[Read more…]

Evolgen disputes my explanation!

RPM of Evolgen disagrees with my definition of synteny! This is terribly distressing. Especially since, strictly speaking, he is precisely correct. The word has evolved in its usage from the pure form that RPM is describing to a more colloquial, pragmatic, somewhat sloppier sense as used by people looking at comparative genomics rather than classical Drosophila genetics.

If you read contemporary evo-devo papers, my definition is more useful in comprehending what they’re saying. If you want to read Drosophila genetics papers, you better know what RPM is talking about, or god help you (and there is no god).

Amphioxus and the evolution of the chordate genome

Blogging on Peer-Reviewed Research

This is an amphioxus, a cephalochordate or lancelet. It’s been stained to increase contrast; in life, they are pale, almost transparent.

i-56ee51e328b10451feb168cd9bab0ea5-amphioxus.jpg

It looks rather fish-like, or rather, much like a larval fish, with its repeated blocks of muscle arranged along a streamlined form, and a notochord, or elastic rod that forms a central axis for efficient lateral motion of the tail…and it has a true tail that extends beyond the anus. Look closely at the front end, though: this is no vertebrate.

i-e961343b9bf5b6dfd74823fba6deeafb-amphioxus_closeup.jpg

It’s not much of a head. The notochord extends all the way to the front of the animal (in us vertebrates, it only reaches up as far as the base of the hindbrain); there’s no obvious brain, only the continuation of the spinal cord; there isn’t even a face, just an open hole fringed with tentacles. This animal collects small microorganisms in coastal waters, gulping them down and passing them back to the gill slits, which aren’t actually part of gills, but are components of a branchial net that allows water to filter through while trapping food particles. It’s a good living — they lounge about in large numbers on tropical beaches, sucking down liquids and any passing food, much like American tourists.

These animals have fascinated biologists for well over a century. They seem so primitive, with a mixture of features that are clearly similar to those of modern vertebrates, yet at the same time lacking significant elements. Could they be relics of the ancestral chordate condition? A new paper is out that discusses in detail the structure of the amphioxus genome, which reveals unifying elements that tell us much about the last common ancestor of all chordates.

[Read more…]