The junk just grows

I keep telling people there’s junk in them thar genomes, and lots of it. ERV has a nice summary of some recent work that made a detailed comparison of the distribution of LINE retrotransposons (selfish DNA that does nothing but insert copies of itself into your DNA; LINEs make up about 20% of your genome) — and what they found was that variation is common, new insertions are relatively frequent, and LINEs can be used to map out human relatedness (although we’ve known that last one for a while).

  • LINEs are super useful for determining/tracing human ancestry. As I’m sure you all have heard me mention before, new bits of junk that aren’t ‘supposed’ to be there make excellent connect-the-dots pictures for ancestry (this isn’t new, but it’s still neat).

  • Any two people differed by ~285 human-specific LINEs (148 minimum, 422 maximum). I have some you don’t have; you have some I don’t have.

  • They did not check whether these LINEs were homozygous (present in both copies of your genome). There is probably even more diversity here: I have some LINEs you don’t have, but some of them are on only one of my two copies of DNA in each cell. Since we pass on only one copy of our DNA to our offspring, this affects how fast (and whether) new LINEs are fixed in a population. Genetic drift, w00t!

  • The older the LINE, the more likely we all have it in common (we are all related!).

  • Brand spanking new LINEs pop up in ~1/140 births. So, of the ~6 billion people alive today, there are ~40 million new LINEs between us all. If all of these junk LINEs are precious and specially created by Teh Designer, all of us without the new sacred LINEs should be dead. We ain’t. So…
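That estimate is easy to check on the back of an envelope (a sketch of my own, using the quoted 1-in-140 rate and a round 6 billion people; strictly those inputs give a bit over 40 million, and the total shifts with whichever per-birth rate estimate you adopt):

```python
# Back-of-the-envelope: new LINE insertions among everyone alive,
# using the rate quoted above. Order-of-magnitude only; the total
# moves with the per-birth rate estimate you choose.

population = 6_000_000_000     # people alive (~2010)
rate_per_birth = 1 / 140       # new LINE insertions per birth (quoted)

new_insertions = population * rate_per_birth
print(f"~{new_insertions / 1e6:.0f} million brand-new LINE insertions")
```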

Cool stuff. There’s a lot of churn in these things, which is one indication that they’re non-essential junk.
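That churn is genetic drift doing its thing. Here’s a toy Wright-Fisher simulation (my own illustrative sketch with an arbitrarily small population, not an analysis from the paper) showing the usual fate of a brand-new neutral insertion:

```python
import random

# Toy Wright-Fisher model: a single brand-new, selectively neutral
# insertion (think: a fresh LINE copy) in a diploid population of N
# individuals. Numbers are arbitrary and purely illustrative.

def fate_of_new_insertion(n_individuals, seed):
    rng = random.Random(seed)
    copies = 2 * n_individuals      # gene copies in the population
    freq = 1 / copies               # exactly one new insertion to start
    while 0 < freq < 1:
        # Next generation: each copy slot inherits the insertion
        # independently with probability equal to its current frequency.
        k = sum(rng.random() < freq for _ in range(copies))
        freq = k / copies
    return freq == 1.0              # True = fixed, False = lost

runs = 2000
fixed = sum(fate_of_new_insertion(100, seed) for seed in range(runs))
# Neutral theory: fixation probability is 1/(2N) = 0.5% here, so almost
# every new insertion is simply lost; a rare few drift to fixation.
print(f"fixed in {fixed} of {runs} runs (theory: ~{runs // 200})")
```

Almost every new insertion vanishes within a few generations; the occasional one takes over the population, at the textbook neutral rate of about 1/(2N).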

I ain’t afraid of no Frankenstein

They’re discussing Venter’s nifty new toy on Edge, and I’ve tossed my own contribution into the mix. It’s a response to the doomsday fears I keep seeing expressed in response to the success of this project.

I have to address one narrow point that is being discussed in the popular press and here on Edge: is Venter’s technological tour de force a threat to humanity, another atom bomb in the hands of children?

No.

There is a threat, but this isn’t it. If you want to worry, think about the teeming swarms of viruses, bacteria, fungi, and parasites that all want to eat you, aided (as we are defended) by the powers of natural selection — we are a delectable feast, and nature will inevitably lead to opportunistic dining. That is a far, far bigger threat to Homo sapiens, since those organisms are the product of a few billion years of evolutionary refinement, not a brief tinkering probe into creation.

Nature’s constant attempts to kill us are often neglected in these kinds of discussions as a kind of omnipresent background noise. Technology sometimes seems more dangerous because it moves fast and creates novelty at an amazing pace, but again, Venter’s technology isn’t the big worry. It’s much easier and much cheaper to take an existing, ecologically successful bug and splice in a few new genes than to create a whole new creature from scratch…and unlike the de novo synthesis of life, that’s a technology that’s almost within the reach of garage-bound bio-hackers, and is definitely within the capacity of many foreign and domestic institutions. Frankenstein bacteria are harmless compared to the possibilities of hijacking E. coli or a flu virus to nefarious ends.

The promise and the long-term peril of the ability to synthesize new life is that it will lead to deeper understanding of basic biology. That, to me, is the real potential here: the ability to experimentally reduce the chemistry of life to a minimum, and use it as a reductionist platform to tease apart the poorly understood substrates of life. It’s a poor strategy for building a bioweapon, but a great one for understanding how biochemistry and biology work. That is the grand hope that we believe will give humanity an edge in its ongoing struggle with a dangerous nature: that we can bring forethought and deliberate, directed opposition to our fellow organisms that bring harm to us, and assistance to those that benefit us. And we need greater knowledge to do that.

Of course more knowledge brings more power, and more possibility of catastrophe. But to worry over a development that is far less immediately dangerous than, say, site-directed mutagenesis, is to have misplaced priorities and to be basically recoiling from the progress of science. We either embrace the forward rush to greater knowledge, or we stand still and die. Alea iacta est; I look forward to decades of revolutionary new ideas and discoveries and technologies. May we have many more refinements of Venter’s innovation, a flowering of novel life forms, and deeper analyses of the genome.

There’s more at the link, with contributions from Richard Dawkins, George Church, Nassim N. Taleb, Daniel C. Dennett, Dimitar Sasselov, Antony Hegarty, George Dyson, Kevin Kelly, and Freeman Dyson so far. I have to say I like Church’s response best, since he tries to put it into an appropriate perspective.

Junk is what junk does

Randy Stimpson is someone a few may recall here: he was a particularly repetitious and dishonest creationist who earned himself a spot in the dungeon. One of the hallmarks of his obtuse way of ‘thinking’ is that he is a computer programmer, and so he was constantly making the category error of assuming the genome was a computer program, and therefore the product of intelligent design (never noticing that he himself is an example of how programming a computer requires relatively little intelligence). He objects to the notion of junk DNA on the Panda’s Thumb, and I just have to tear apart his nonsensical assertions there.

I don’t think we should rush to conclude that highly repetitive DNA is junk. I know it would be a mistake to think that about software. If you look at software executables (like .exe and .dll files on Windows computers) they are full of repeated sequences. You may have written a program yourself. If so, you would certainly be familiar with the concept of a subroutine or a method. At the assembly level, whenever a subroutine is called registers are pushed on the stack; when one returns they are popped off the stack. The code to push and pop registers is automatically generated by the compiler and is therefore not apparent at the source code level. This translates into a massive amount of simplistic repetition at the binary level. These kinds of repetitive sequences would probably be classified as SINES by geneticists trying to understand the binary code. While this kind of code doesn’t map to any kind of a program function it is essential.

You may also know that most software developers these days work with object oriented languages where inheritance and polymorphism are used to develop hierarchies of classes. At the source code level inheritance enables developers to reuse source code without retyping it. However, when source code is compiled into binary form the result is a massive amount of repetition, but of a more sophisticated nature than that of just pushing and popping registers. These kinds of repetitive sequences would probably be classified as LINES.

I am familiar with software on far more intimate terms than most: I used to write code in assembly language, and could even read simple machine code on old 8-bit processors. I hacked together a p-code disassembler once upon a time. So yeah, I know what raw code looks like, as I’ll assume Stimpson does, as well.

I also know what DNA sequences look like. I can tell that Stimpson doesn’t have the slightest clue. No, the code to push and pop registers in a routine looks nothing like SINEs, not in its distribution or in its pattern. No, standard library link code looks nothing like LINEs in distribution or pattern, either. Since he mentions them, I can also explain that we know exactly what LINEs and SINEs do — he seems to assume that biologists must be idiots who haven’t bothered to look at the function of sequences. It’s a lovely example of projection, since it is obvious that Stimpson has never bothered to look at what these sequences are.

A LINE is a Long Interspersed Nuclear Element. Some LINEs are actually a sort of functional gene that can be transcribed and translated; they are about 6500 base pairs long and encode a couple of proteins that do something very specific: they assemble into a complex that includes a strand of their own RNA (usually), migrate into the nucleus, where they nick the DNA and insert a copy of the RNA sequence into the genome. That’s all they do, over and over. They’re a kind of self-contained Xerox machine that spews more copies of themselves, which can make more copies of themselves, which can make more copies of themselves. They are not typically associated with any of your useful genes.

How many copies do they make? Your genome contains approximately 868,000 copies of various LINE genes. Over 20% of your genome is nothing but this parasitic self-copier — it’s like spam all over the place. Don’t panic, though: this is another indicator of its status as useless junk, in that almost all of the copies are nonfunctional, either because they were sloppily inserted and are broken, or because they’ve accumulated destructive mutations (a LINE that acquires a stop codon does no harm to the reproductive capacity of the organism bearing it); in addition, cells actively repress these parasites by, for instance, methylating and inactivating stretches of DNA saturated with LINEs. Out of that huge number of copies, only 20-50 are estimated to retain any activity.

If Mr Stimpson wants to consider computer analogies, I ask: what do we call a code sequence that has only one function, the repeated duplication of copies of itself in the operating system? Do we consider that a functional and useful part of the computer, or do we try to get rid of it?

SINEs, or Short Interspersed Nuclear Elements, are even more common — your genome contains 1.6 million copies of various SINEs, taking up 13% of the genome (a lower percentage because even though there are more of them, they are shorter than LINEs). And remember, you only contain about 20,000 genes total, or about 1% of the number of SINEs. A SINE is basically a truncated LINE, or any short sequence that contains regions preferentially recognized by the LINE transcriptase, so that it is carried into the nucleus and repeatedly inserted.

That’s right. A SINE is a parasite of a parasite.
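Those numbers carry a nice internal consistency check. If LINEs were mostly intact 6,500-bp elements, 868,000 copies would fill far more than 20% of a ~3.2-gigabase genome; instead the average copy works out to only ~700 bp, exactly what you’d expect of a pile of truncated, broken duplicates. A quick sketch using the figures quoted above (the genome size is my assumed round number):

```python
# Sanity-check the quoted LINE/SINE figures against an assumed
# ~3.2-gigabase haploid human genome.

genome_bp = 3_200_000_000

line_copies, line_frac = 868_000, 0.20     # quoted above
sine_copies, sine_frac = 1_600_000, 0.13   # quoted above

avg_line_bp = genome_bp * line_frac / line_copies
avg_sine_bp = genome_bp * sine_frac / sine_copies

# Full-length LINEs run ~6500 bp, so a ~700+ bp average means the
# typical copy is a stunted fragment -- junk by construction.
print(f"average LINE copy: ~{avg_line_bp:.0f} bp (intact element: ~6500 bp)")
print(f"average SINE copy: ~{avg_sine_bp:.0f} bp")
print(f"LINE copies still active: 20-50 of {line_copies:,}")
```

The SINE average of ~260 bp also squares nicely with the actual size of Alu elements, which run around 300 bp.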

Other repetitive elements are, for example, endogenous retroviruses: relics of past viral infections. These viruses make copies of themselves into the host DNA, and in ERVs we don’t just find transcriptase enzymes — we find viral coat proteins. These are sequences that also have a known function, as sites for the synthesis of infectious disease particles. So, sure, you could say they do something — it’s just not for our benefit.

Could these repetitive sequences do anything useful? Yes, to a small degree, and we even have examples of it…unfortunately, every time someone finds a rare example of a functional piece of repetitive DNA, the ignoramuses rhapsodize about how this demonstrates it could all be useful. No, it doesn’t.

For example, one role of some junk could be in position effects. We know that if a useful gene is located next to a chunk of inactivated DNA, its expression may be downregulated to some degree — it’s a kind of spillover of a passive effect of living next to a junkyard.

Since some of these junk DNA sequences are retrotransposons that insert themselves arbitrarily into the genome, they can also be a source of mutations; some may even find portions of their sequence incorporated into the product of a functional gene. An evolutionary biologist can see this as a possible, if rare, fruitful contribution to genetic diversity, but it should give no comfort to creationists, who don’t much care for chance insertions and random variation.

There are other uses for some junk. There are structural regions of the chromosome, such as the area around the centromere, that are devoid of genes but contain many repeats of short, untranscribed sequences. These are a kind of generic handle for proteins to glom onto, and contribute in a general way to how the chromosome works in the cell. There is also a general property of cell growth: one of the triggers for cell division is the ratio of nuclear to cytoplasmic volume, so puffing up the genome with lots of extraneous nucleotides can lead to larger cells. Neither of these functions, though, is very sequence-dependent — so sure, you could say they have a rough, general role: they are the plastic boxes and styrofoam packing peanuts of the functional elements of the genome. They may do something, but it’s not specific, and it’s not particularly dependent on the code.

Junk DNA isn’t merely stuff that we don’t understand. It’s stuff that we know something about, and know how it fits into the ecosystem of the cell, and that we call junk because we know what it does — it mainly sits up in the attic, garage, and basement, gathering dust and taking up space.

Mr Stimpson: go read a decent molecular biology and genetics book, and stop relying on your irrelevant software manuals and the dishonest and ignorant pratings of your fellow creationists.

First round of ill-informed objections to the first synthetic bacterium

I’ve been following the reaction to the synthesis of a new life form by the Venter lab with some interest and amusement. There have been a couple of common directions taken, and they’re generally all wrong. This is not to say that there couldn’t be valid concerns, but that the loudest complaining voices right now are the most ignorant.

Hysteria and fear-mongering

Pearl-clutching and fretting over the consequences are fairly common, with a representative example from The Daily Mail (a stridently stupid ‘journalistic’ outlet).

But there are fears that the research, detailed in the journal Science, could be abused to create the ultimate biological weapon, or that one mistake in a lab could lead to millions being wiped out by a plague, in scenes reminiscent of the Will Smith film I Am Legend.

The article refers to that awful movie a couple of times. It’s a little baffling; were they getting kickbacks from the movie producers or something?

The complaint is misplaced. What they’ve accomplished is to synthesize a copy of an existing organism, with a few non-adaptive markers added. It’s no threat at all. We do have the potential to now modify that genome more extensively; the interesting scientific work will be to pare away genes and reduce it to a truly minimalist version, just to see how much is really essential, and the useful industrial work will be to engineer organisms with additional genes that produce proteins useful for us, but not necessarily for the mycoplasma. That’s going to compromise the competitiveness of the organism in the natural environment. I’m not worried.

Maybe someday when organisms can be built in some psychopath’s garage, then we should worry. But for now, this is an experiment that takes a lot of teamwork and money and experience to pull off.

Playing GOD!

That same Daily Mail article goes on and on about that cliche.

Pat Mooney, of the ETC group, a technology watchdog with a special interest in synthetic biology, said: ‘This is a Pandora’s box moment – like the splitting of the atom or the cloning of Dolly the sheep, we will all have to deal with the fall-out from this alarming experiment.’

Dr David King, of the Human Genetics Alert watchdog, said: ‘What is really dangerous is these scientists’ ambitions for total and unrestrained control over nature, which many people describe as ‘playing God’.

‘Scientists’ understanding of biology falls far short of their technical capabilities. We have learned to our cost the risks that gap brings, for the environment, animal welfare and human health.’

Professor Julian Savulescu, an Oxford University ethicist, said: ‘Venter is creaking open the most profound door in humanity’s history, potentially peeking into its destiny.

‘He is not merely copying life artificially or modifying it by genetic engineering. He is going towards the role of God: Creating artificial life that could never have existed.’

The Catholic church, perhaps unsurprisingly since they’ve been burned in the past by the conflict between science and religion, is taking a very cautious stance on the issues. They clearly don’t quite know what to make of it, but are prepared to offer their services if any ethical concerns arise.

Vatican and Italian church officials were mostly cautious in their first reaction to the announcement from the United States that researchers had produced a living cell containing manmade DNA. They warned scientists of the ethical responsibility of scientific progress and said that the manner in which the innovation is applied in the future will be crucial.

Since it will be a long, long time before we can synthesize lubricious altar boys, however, I don’t think there will be much call for Catholic advice on the ethics of synthetic biology. Just say no to irrelevant old perverts offering science advice. Besides, the church is also full of conservative fusspots who will spout tired stereotypes.

Another official with the Italian bishops’ conference, Bishop Domenico Mogavero, expressed concern that scientists might be tempted to play God.

“Pretending to be God and parroting his power of creation is an enormous risk that can plunge men into a barbarity,” Mogavero told newspaper La Stampa in an interview. Scientists “should never forget that there is only one creator: God.”

“In the wrong hands, today’s development can lead tomorrow to a devastating leap in the dark,” said Mogavero, who heads the conference’s legal affairs department.

There is no god. The only creators are chance and selection, and now Craig Venter.

The “playing God” noise is going to get even more tiresome, I’m sure. It’s nonsense. If what they’ve done is playing God, then god is biochemistry and molecular biology and the natural processes of physics. We’ve all been playing god every time we cook, or paint, or knit, or write, or create. It’s not a violation of the natural order, and it’s simply doing what humans always do. Apparently, being human is the same thing as being god.

Total confusion

I’m extremely disappointed by the reaction of Andrew Brown, sometimes smart guy, all too often weird apologist for religion. I have no idea what he’s trying to say, but he tries harder than any atheist I know to tie this work to atheism. “Craig Venter’s production of an entirely artificial bacterium marks another triumph of the only major scientific programme driven from the beginning by explicit atheism”, he says, and “Atheists of the Dawkins type will take it as practical proof that there is no need to hypothesise God at all: we can make life without any miracles, and there’s no need to imagine a creator”. Say what? Venter’s program was driven by scientific curiosity, not atheism; but if Brown wants to equate science with atheism, that’s fine by me. We’ve also known all along that there is no need to hypothesize an intelligent creator, and this is only one more piece of evidence. It isn’t proof. We don’t deal with proof in science.

And then there’s this baffling statement.

But at this moment of complete victory for materialism something odd has happened: the chemical and material world turns out to be entirely shaped by something called “information”.

“Life is basically the result of an information process – a software process” says Venter, and “Starting with the information in a computer, we put it into a recipient cell, and convert it into a new species”. But though this information clearly exists in some sense, it’s impossible to say what kind of thing it is, because it isn’t a thing at all. Whatever this may be, it isn’t material, and it isn’t bound by physical laws. Information turns out to be as elusive and as omnipresent as God once was.

I don’t think so. We have tools to measure information, we can generate information, we can study information…we can’t measure, generate, or study gods. There’s nothing supernatural about information. Information is part of that chemical and material world, and we godless materialists aren’t at all distressed by its existence.
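And measuring it is routine. Shannon entropy, for one, puts a number (in bits per symbol) on the information content of any string of symbols, DNA included. A minimal sketch, with made-up toy sequences:

```python
import math
from collections import Counter

# Shannon entropy in bits per symbol: a perfectly ordinary, material
# measure of information that applies to DNA like any other string.
def entropy_bits(seq):
    n = len(seq)
    return sum((c / n) * math.log2(n / c) for c in Counter(seq).values())

# Toy sequences, invented for illustration:
print(entropy_bits("AAAAAAAAAA"))        # prints 0.0 (no information)
print(entropy_bits("ACGTACGTACGTACGT"))  # prints 2.0 (max for 4 letters)
```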

Denial

When you look at what the creationists are saying, it’s simple: they’re scrambling to find excuses to reject the significance of the experiment. Expect to see variations of these same arguments repeated endlessly by every creationist you ever talk to!

There’s the “it isn’t really a synthetic organism” of Billy Dembski (Intelligent Design wackaloon and fundamentalist Christian), which is what you’d expect of someone who doesn’t understand biology.

The rhetoric is interesting. What they’ve done is stuck a synthetic genome inside a nonsynthetic cell. Nonetheless, they’ve slipped into talking of a “synthetic bacterial cell.” Indeed, one headline reads “The First Self-Replicating Synthetic Bacterial Cell.” This is hype.

If something is going to be called “synthetic,” shouldn’t the whole of it be synthesized and not merely a minuscule portion of it? Also, does such a cell knowably signal design and, if so, why wouldn’t cells untouched by Synthetic Genomics do the same, i.e., implicate design?

It is entirely true that the synthesized genome was inserted into an existing bacterial cell, with its extant suite of proteins and other molecules. Venter and colleagues relied on the transcriptional enzymes and ribosomes and so forth already present in the cell to kick-start the activities of the DNA. However, this was only to bootstrap the genome into functionality; within 30 generations of this novel line, Venter estimates, every one of those proteins and every molecule of the cell will have been replaced with the products of the artificial genome.

So, if after a period of time, you’ve got a cell whose DNA was produced by a machine, and whose membranes, enzymes, structural proteins, and metabolic by-products were all produced by that machine-generated DNA or the protein products of that DNA, what makes it a non-synthetic cell?

The response from Answers in Genesis (Young Earth creationist clowns) is a variant of that objection. It’s the “it isn’t anything new” excuse.

Regardless of some hyped press reports, this research (brilliantly executed as it was) has nothing to do with evolution in the molecules-to-man sense. Dr. Georgia Purdom, a molecular geneticist on our Answers in Genesis (AiG) staff, notes that there has merely been an alteration within a kind (at the family, genus, or species level). Even the researchers have acknowledged that this first synthetic cell is more a re-creation of existing life — changing one simple type of bacterium into another. While Venter claimed, “We have passed through a critical psychological barrier. It has changed my own thinking, both scientifically and philosophically, about life, and how it works,” he was also quite clear that [his team] “didn’t create life from scratch.”

I can’t believe they actually weaseled in that nonsense about “kinds” again, as if their fantasyland boundaries are actually relevant.

No, it’s not something brand new; it’s a conservative starting point for generating novelties. This is an argument that will not survive for long, since as work proceeds and genes are removed from and added to the artificial genome, the results will no longer be something that can be called simply another mycoplasma. Well, rational people will realize that this is a dead argument, but the kind of people who still insist that there are no transitional fossils will continue to parrot it, looking dumber and dumber year by year.

The argument that this says nothing about evolution is wrong. The bacterium synthesized is not a version of the very first life form to exist, so it’s saying nothing about earlier forms (but that may change as we work towards reducing the synthetic bacterium to a bare-bones suite of genes), but it does say that bacteria are products of chemistry. If you honestly want to learn where the first cells come from, this work says you’d better look to biochemistry, not theology, or the pullitoutofmybuttology of AiG.

It defines a point in the middle of the evolutionary process, and says we arrived there by chemistry. Subsequent evolution, we already know, proceeded by processes we understand (evolution!), processes that AiG also denies.

Here’s another argument from Reasons to Believe (Old Earth Creationist goofballs): the “it’s too complicated to have evolved” chestnut that they’ve been chewing on for decades.

For example, Venter’s team must identify the minimal gene set required for life’s existence to re-engineer an artificial life-form from the top down. As they continue to hone in on life’s essential genes and biochemical systems, what’s most striking is the remarkable complexity of life even in its minimal form. And this basic complexity is the first clue that life requires a Creator.

This isn’t life in its most minimal form. It’s a copy of a modern prokaryotic bacterium. As I said above, this is representative of a midpoint in evolution, not its beginning. The complaint does not apply.

Furthermore, complexity does not imply design. Natural processes are quite good at generating complexity, even better than design, so pointing out that something is complex does not distinguish between the two hypotheses. If I had one magic wish and could wake up these idiots to one thing, it’s the simple fact that complexity and design are not equivalent states.

Finally, the one argument we’re probably going to hear most from creationists in the coming years is the “the synthetic bacterium was built by design, therefore all life was designed” argument. (Notice that Dembski makes the same illogical claim in his quote above.)

Given the effort that went into the synthesis of the total M. genitalium genome, it’s hard to envision how unintelligent, undirected processes could have generated life from a prebiotic soup. Though not their intention, Venter’s team unwittingly provided empirical evidence that life’s components, and consequently, life itself must stem from the work of an Intelligent Designer.

Let’s play a game. I just grabbed a deck of cards and dealt myself this hand:

J K♠ 2 6 6♠

Now you grab a deck of cards and replicate my hand precisely. You had to go through the deck, card by card, searching for those 5 specific cards, and then order them and lay them out in front of you, didn’t you? I just dealt out the top five cards of a shuffled deck.

Which of us had to put more effort into getting their five cards? Does this imply that in every game of poker the dealer has to go through the deck and hand-pick which card is given to each person? The huge amount of effort that Venter’s team put into this project does not imply that a focused team built the first mycoplasma genome by the same process; it says that making a copy with our current technology requires that much effort. The “it was hard to make” excuse simply doesn’t apply.

This does not imply, however, that the original successful bacterium was generated by chance, as trivially as dealing a random hand of nucleotides from a deck. Venter and his team cheated: they copied a known winner, a genome that had been honed by a few billion years of evolution into a successful organism. They sought out a winning hand like this one and copied it.

A A♠ K K K♠

Again, you could give yourself that hand in a couple of different ways. You could go through the deck by design and pick out those cards. Or you could deal out hands repeatedly, throwing out any that don’t match the target — that would work, too, given that you’ve got billions of years to play the game. Or you could do it as evolution does, just play poker with your buddies and know that there are lots of different ways to generate winning and losing hands, and the process will result in a winner emerging with every deal.
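The asymmetry is easy to put numbers on: a specific five-card hand named in advance shows up in about 1 of every 2.6 million deals, while some hand shows up in every single deal, for free. A quick sketch of the combinatorics:

```python
from math import comb

# Combinatorics of the card analogy: matching a pre-specified 5-card
# hand versus simply being dealt some hand from a 52-card deck.
possible_hands = comb(52, 5)                 # order ignored
print(f"distinct 5-card hands: {possible_hands:,}")
print(f"odds of hitting a named target hand: 1 in {possible_hands:,}")
print("odds of being dealt a hand at all: 1 in 1")
```

Copying a known winner, as Venter’s team did, is the search-the-deck kind of task; it says nothing about whether the original hand had to be hand-picked.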

All of the denialist arguments are basically errors rooted in a misunderstanding of Venter’s experiment and of evolution in general. Be prepared: they will be recycled heavily.

It’s ALIVE!

Get in the mood for this bit of news, the synthesis of an artificial organism by Craig Venter’s research team.

Here’s the equivalent of that twitching hand of Frankenstein’s monster:

[Image: two blue colonies of the synthetic M. mycoides]

Those are two colonies of Mycoplasma mycoides, their nucleoids containing entirely synthesized DNA. You can tell because the synthesized DNA contained a lacZ gene for beta-galactosidase, making the pretty blue product. That’s one of the indicators that the artificial chromosome is functioning inside the cell; the DNA was also encoded with recognizable watermarks, and they also used a cell of a different species, M. capricolum, as the host for the DNA.

The experiment involved creating a strand of DNA, as specified by a computer, in a DNA synthesizer, inserting it into a dead cell of M. capricolum, and then watching it revivify and express the artificial markers and the M. mycoides proteins. It really is like bringing the dead back to life.

It was also a lot more difficult than stitching together corpses and zapping them with lightning bolts. The DNA in this cell is over one million bases long, and it all had to be assembled appropriately from synthesized pieces. That was the first tricky part; current DNA synthesizers can’t build strands that long. They could coax sequences about a thousand nucleotides long out of the machines.

Then what they had to do was splice over a thousand of these short pieces into a complete bacterial chromosome. This was done with a combination of enzymatic reactions in a test tube, and in vivo assembly by recombination inside yeast cells. The end result is a circular bacterial chromosome that is, in its sequence, almost entirely the M. mycoides genome…but made from a sequence stored in a computer rather than a parental bacterium.
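The overall strategy is hierarchical assembly of overlapping fragments. Here’s a toy version (hypothetical sequences, with simple string matching standing in for the actual enzymatic and yeast-recombination chemistry):

```python
# Toy model of hierarchical assembly: short synthesized fragments share
# overlapping ends, and each stage stitches neighbors into longer pieces.
# Illustrative only: the real assembly was done enzymatically and by
# recombination in yeast, not by string matching.

def join(left, right, overlap):
    assert left[-overlap:] == right[:overlap], "ends must match"
    return left + right[overlap:]

def assemble(fragments, overlap):
    stage = fragments
    while len(stage) > 1:
        # pairwise merge: e.g. 1000 fragments -> 500 -> 250 -> ... -> 1
        nxt = [join(stage[i], stage[i + 1], overlap)
               for i in range(0, len(stage) - 1, 2)]
        if len(stage) % 2:          # odd one out rides along to next stage
            nxt.append(stage[-1])
        stage = nxt
    return stage[0]

# Hypothetical 'genome' cut into overlapping 8-base fragments:
target = "ATGCGTACCTTAGGACCGTA"
k, ov = 8, 4
frags = [target[i:i + k] for i in range(0, len(target) - ov, k - ov)]
assert assemble(frags, ov) == target
print(f"{len(frags)} fragments reassembled into {len(target)} bases")
```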

[Image: assembly scheme for the synthetic M. mycoides chromosome]

Finally, there was one more hurdle to overcome, getting this large loop of DNA into the husk of a cell. These techniques, at least, had been worked out last year in experiments in which they had transplanted natural M. mycoides chromosomes into bacteria.

The end result is a new, functioning, replicating cell. One could argue that it isn’t entirely artificial yet, since the artificial DNA is being placed in a cell of natural origin…but give it time. The turnover of lipids and proteins and such in the cytoplasm and the membrane means that within 30 generations all of the organism will have been effectively replaced, anyway.
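The “give it time” point is just dilution arithmetic: if pre-existing molecules are split between daughters at each division while replacements are built from the synthetic genome, the original cell’s contribution halves every generation. A rough sketch of that reasoning (ignoring active turnover, which only speeds things up):

```python
# Dilution arithmetic: pre-existing (non-synthetic) molecules are split
# between daughter cells at each division, while new material is built
# from the synthetic genome. The original contribution halves each time.

original_fraction = 1.0
for _ in range(30):
    original_fraction /= 2

print(f"original material left after 30 generations: ~{original_fraction:.1e}")
# ~9.3e-10: effectively none of the starting cell remains.
```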

It’s a very small cell that has been created — the mycoplasmas have the smallest genomes of any extant cells. It’s not much, but this is a breakthrough comparable to Wöhler’s synthesis of urea. That event was a revelation, because it broke the idea that organic chemicals were somehow special and incapable of synthesis from inorganic molecules. And that led to the establishment of the whole field of organic chemistry, and we all know how big and important that has become to our culture.

Venter’s synthesis of a simple life form is like the synthesis of urea in that it has the potential to lead to some huge new possibilities. Get ready for it.

If the methods described here can be generalized, design, synthesis, assembly, and transplantation of synthetic chromosomes will no longer be a barrier to the progress of synthetic biology. We expect that the cost of DNA synthesis will follow what has happened with DNA sequencing and continue to exponentially decrease. Lower synthesis costs combined with automation will enable broad applications for synthetic genomics.

We should be aware of the limitations right now, though. It was a large undertaking to assemble the 1 million base pair synthetic chromosome for a mycoplasma. If you’re dreaming of using the draft Neandertal sequence to make your own resynthesized caveman, you’re going to have to appreciate that it is a job more than three orders of magnitude greater than building a bacterium. Also keep in mind that the sequence introduced into the bacterium was not exactly as intended, but contained expected small errors that had accumulated during the extended synthesis process.
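To put that scale difference in numbers, here’s a hedged back-of-the-envelope sketch, assuming ~1.08 Mbp for the synthetic mycoplasma genome and ~3.2 Gbp for a human-scale genome (round figures for illustration, not values taken from the paper):

```python
import math

mycoplasma_bp = 1.08e6   # synthetic M. mycoides genome, roughly 1.08 million base pairs
neandertal_bp = 3.2e9    # a human-scale genome, roughly 3.2 billion base pairs

ratio = neandertal_bp / mycoplasma_bp
orders_of_magnitude = math.log10(ratio)

print(f"~{ratio:.0f}x larger, ~{orders_of_magnitude:.1f} orders of magnitude")
```

With these round inputs the ratio comes out near 3,000-fold, which is where the "more than three orders of magnitude" figure comes from.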

A single transplant originating from the sMmYCp235 synthetic genome was sequenced. We refer to this strain as M. mycoides JCVI-syn1.0. The sequence matched the intended design with the exception of the known polymorphisms, 8 new single nucleotide polymorphisms, an E. coli transposon insertion, and an 85-bp duplication. The transposon insertion exactly matches the size and sequence of IS1, a transposon in E. coli. It is likely that IS1 infected the 10-kb sub-assembly following its transfer to E. coli. The IS1 insert is flanked by direct repeats of M. mycoides sequence suggesting that it was inserted by a transposition mechanism. The 85-bp duplication is a result of a non-homologous end joining event, which was not detected in our sequence analysis at the 10-kb stage. These two insertions disrupt two genes that are evidently non-essential.

So we aren’t quite at the stage of building novel multicellular plants or animals — that’s going to be a long way down the road. But it does mean we can expect to be able to build custom bacteria within another generation, I would think, and that they will provide some major new industrial potential.

I know that there are some ethical concerns — Venter also mentions them in the paper — but I’m not personally too worried about them just yet. The cell created here is not a monster with ten times the strength of an ordinary cell and the brain of a madman — it’s actually more fragile and contains only genes found in naturally occurring species (and a few harmless markers). When the techniques become economically practical, everyone will be building specialized bacteria to carry out very specific biochemical reactions, and again, they’re going to be poor generalists and aren’t going to be able to compete in survival with natural species that have been honed by a few billion years of selection for fecundity and survivability.

Give it a decade or two, though, and we’ll have all kinds of new capabilities in our hands. The ethical concerns now are a little premature, though, because we have no idea what our children and grandchildren will be able to do with this power. I don’t think Wöhler could have predicted plastics from his discovery, after all: we’re going to have to sit back, enjoy the ride, and watch carefully for new promises and perils as they emerge.


Gibson et al. (2010) Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome. Science Express.

Lartigue et al. (2009) Creating Bacterial Strains from Genomes That Have Been Cloned and Engineered in Yeast. Science 325:1693-1696.

Venter has done it

We’re hearing the first stirrings of a big story: Craig Venter may have created the first organism with an artificially synthesized genome. Conceptually, building a strand of DNA and inserting it into a cell stripped of its genome is completely unsurprising — of course it will work, a cell is just chemistry — but it is a huge technical accomplishment.

Carl Zimmer has more background. I want to see the paper.

Junk DNA is still junk


The ENCODE project made a big splash a couple of years ago — it is a huge project to not only ask what the sequence of a strand of human DNA was, but to analyze and annotate it and try to figure out what it was doing. One of the very surprising results was that in the sections of DNA analyzed, almost all of the DNA was transcribed into RNA, which sent the creationists and the popular press into unwarranted flutters of excitement that maybe all that junk DNA wasn’t junk at all, if enzymes were busy copying it into RNA. This was an erroneous assumption; as John Timmer pointed out, the genome is a noisy place, and coupled with the observations that the transcripts were not evolutionarily conserved, it suggested that these were non-functional transcripts.

Personally, I fall into the “it’s all junk” end of the spectrum. If almost all of these sequences are not conserved by evolution, and we haven’t found a function for any of them yet, it’s hard to see how the “none of it’s junk” view can be maintained. There’s also an absence of support for the intervening view, again because of a lack of evidence for actual utility. The genomes of closely related species have revealed very few genes added from non-coding DNA, and all of the structural RNA we’ve found has very specific sequence requirements. The all-junk view, in contrast, is consistent with current data.

Larry Moran was dubious, too — the transcripts could easily be artifactual.

The most widely publicized result is that most of the human genome is transcribed. It might be more correct to say that the ENCODE Project detected RNAs that are either complementary to much of the human genome or lead to the inference that much of it is transcribed.

This is not news. We’ve known about this kind of data for 15 years and it’s one of the reasons why many scientists over-estimated the number of human genes in the decade leading up to the publication of the human genome sequence. The importance of the ENCODE project is that a significant fraction of the human genome has been analyzed in detail (1%) and that the group made some serious attempts to find out whether the transcripts really represent functional RNAs.

My initial impression is that they have failed to demonstrate that the rare transcripts of junk DNA are anything other than artifacts or accidents. It’s still an open question as far as I’m concerned.

I felt the same way. ENCODE was spitting up an anomalous result, one that didn’t fit with any of the other data about junk DNA. I suspected a technical artifact, or an inability of the methods used to properly categorize low frequency accidental transcription in the genome.

Creationists thought it was wonderful. They detest the idea of junk DNA — that the gods would scatter wasteful garbage throughout our precious genome by intent was unthinkable, so any hint that it might actually do something useful is enthusiastically seized upon as evidence of purposeful design.

Well, score one for the more cautious scientists, and give the creationists another big fat zero (I think the score is somewhere in the neighborhood of a big number requiring scientific notation to be expressed for the scientists, against a nice, clean, simple zero for the creationists). A new paper has come out that analyzes transcripts from the human genome using a new technique, and, uh-oh, it looks like most of the early reports of ubiquitous transcription were wrong.

Here’s the authors’ summary:

The human genome was sequenced a decade ago, but its exact gene composition remains a subject of debate. The number of protein-coding genes is much lower than initially expected, and the number of distinct transcripts is much larger than the number of protein-coding genes. Moreover, the proportion of the genome that is transcribed in any given cell type remains an open question: results from “tiling” microarray analyses suggest that transcription is pervasive and that most of the genome is transcribed, whereas new deep sequencing-based methods suggest that most transcripts originate from known genes. We have addressed this discrepancy by comparing samples from the same tissues using both technologies. Our analyses indicate that RNA sequencing appears more reliable for transcripts with low expression levels, that most transcripts correspond to known genes or are near known genes, and that many transcripts may represent new exons or aberrant products of the transcription process. We also identify several thousand small transcripts that map outside known genes; their sequences are often conserved and are often encoded in regions of open chromatin. We propose that most of these transcripts may be by-products of the activity of enhancers, which associate with promoters as part of their role as long-range gene regulatory sites. Overall, however, we find that most of the genome is not appreciably transcribed.

So, basically, they directly compared the technique used in the ENCODE analysis (the “tiling” microarray analysis) to more modern deep sequencing methods, and found that the old results were mostly artifacts of the protocol. They also directly examined the pool of transcripts produced in specific tissues, and asked what proportion of them came from known genes, and what part came from what has been called the “dark matter” of the genome, or what has usually been called junk DNA. The cell’s machinery to transcribe genes turns out to be reasonably precise!

To assess the proportion of unique sequence-mapping reads accounted for by dark matter transcripts in RNA-Seq data, we compared the mapped sequencing data to the combined set of known gene annotations from the three major genome databases (UCSC, NCBI, and ENSEMBL, together referred to here as “annotated” or “known” genes). When considering uniquely mapped reads in all human and mouse samples, the vast majority of reads (88%) originate from exonic regions of known genes. These figures are consistent with previously reported fractions of exonic reads of between 75% and 96% for unique reads, including those of the original studies from which some of the RNA-Seq data in this study were derived. When including introns, as much as 92%-93% of all reads can be accounted for by annotated gene regions. A further 4%-5% of reads map to unannotated genomic regions that can be aligned to spliced ESTs and mRNAs from high-throughput cDNA sequencing efforts, and only 2.2%-2.5% of reads cannot be explained by any of the aforementioned categories.
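The bookkeeping behind those percentages is simple proportion-taking; here’s a toy sketch in Python with invented read counts (only the category logic follows the paper’s description — the numbers are made up):

```python
# Hypothetical tallies of uniquely mapped reads, binned into the
# annotation categories described in the paper (counts invented).
reads = {
    "exonic": 880_000,              # reads in exons of known genes
    "intronic": 45_000,             # reads in introns of known genes
    "est_mrna_supported": 50_000,   # unannotated regions aligned to spliced ESTs/mRNAs
    "unexplained": 25_000,          # reads in none of the above categories
}

total = sum(reads.values())
for category, count in reads.items():
    print(f"{category}: {count / total:.1%}")

# Cumulative fraction explained by annotated gene regions (exons + introns):
annotated = (reads["exonic"] + reads["intronic"]) / total
print(f"annotated genes account for {annotated:.1%} of reads")
```

With these illustrative counts, exons alone account for 88% of reads and annotated gene regions for 92.5% — the same ballpark the paper reports, which is the point: only a small residue is left for "dark matter".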

Furthermore, when they looked at where the mysterious transcripts are coming from, they are most frequently from regions of DNA near known genes, not just out of deep intergenic regions. This also suggests that they’re an artifact, like an extended transcription of a gene, or from other possibly regulatory bits, like pasRNA (promoter-associated small RNAs — there’s a growing cloud of xxxRNA acronyms out there, but while they may be extremely useful, like siRNA, they’re still tiny as a fraction of the total genome. Don’t look for demolition of the concept of junk DNA here).

There clearly are still mysteries in there — they do identify a few novel transcripts that come up out of the intergenic regions — but they are small and rare, and the fact of their existence does not imply a functional role, since they could simply be byproducts of other processes. The only way to demonstrate that they actually do something will require experiments in genetic perturbation.

The bottom line, though, is the genome is mostly dead, transcriptionally. The junk is still junk.


van Bakel H, Nislow C, Blencowe BJ, Hughes TR (2010) Most “Dark Matter” Transcripts Are Associated With Known Genes. PLoS Biology 8(5):1-21.

Now we’ve got some big numbers to throw around, too


Only ours are methodologically valid. It’s a common creationist tactic to fling around big numbers to ‘disprove’ evolution: for instance, I’ve had this mysterious Borel’s Law (that anything with odds worse than 1 in 10^50 can never happen) thrown in my face many times, followed by the declaration that the odds of the simplest organism forming by chance are 1 in 10^340,000,000. It’s complete nonsense, of course — their calculations all ignore the reality of the actual events, assuming that everything must form spontaneously and all at once, which is exactly the opposite of how probability plays a role in evolution. It’s annoying and inane, and the creationists never seem to learn…perhaps because the rubes they pander to are easily dazzled by even bogus mathematics, so they keep doing it.

We’re going to have to start firing back. Doug Theobald, a long-time contributor to Talk.Origins and the Panda’s Thumb, has written a very nice paper testing the likelihood that all life on earth is not related by common descent, and he comes up with some numbers of many digits to support evolutionary theory. Nick Matzke has a summary, and the story has been written up for National Geographic.

Basically, the idea is this: take a small set of known, conserved proteins that are shared in all organisms, not restricting ourselves to one kingdom or one phylum, but grabbing them all. In this paper, that data set consists of 23 proteins from 12 taxa in the Big Three domains: Bacteria, Archaea, and Eukarya. Then set up many different models to explain the relationships of these species. For instance, you could organize them into the classic single tree, where all are related, or you could model them as three independent origins, for each of Bacteria, Archaea, or Eukarya, or you could postulate other combinations, such as that Bacteria arose independently of Archaea and Eukarya, which share a common ancestor. Finally, you tell your computer to do a lot of statistics on the models, asking how likely it is that two independent groups would each arrive at similar sequences, rating each of the models for parsimony and accuracy against the evidence.
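The comparison step is standard statistical model selection; here’s a minimal sketch of the idea using the Akaike information criterion, with invented log-likelihoods and parameter counts (the real numbers come from fitting each ancestry model to the 23-protein data set — nothing below is from the paper itself):

```python
# Hypothetical fitted log-likelihoods and parameter counts for two
# competing ancestry models (values invented for illustration).
models = {
    "single common ancestry": {"log_lik": -10000.0, "n_params": 50},
    "independent bacterial origin": {"log_lik": -11500.0, "n_params": 60},
}

def aic(log_lik, n_params):
    # Akaike information criterion: penalizes extra parameters; lower is better.
    return 2 * n_params - 2 * log_lik

scores = {name: aic(m["log_lik"], m["n_params"]) for name, m in models.items()}
best = min(scores, key=scores.get)
print(f"preferred model: {best}")
```

The design choice here mirrors the logic in the paper: a model with more free parameters (separate origins) only wins if its improved fit outweighs the complexity penalty — and in Theobald’s analysis, it doesn’t come close.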

And the winner is…common ancestry, with one branching tree! This is what we expected, of course, and what Theobald has done is to test our assumptions, always a good thing to do.

More complicated permutations of these models were also tried. What if there were a significant amount of horizontal gene transfer? Would that make multiple origins of modern life more likely? He was testing models like the ones below, where the dotted lines represent genes that leap across taxa to confuse the issue.

[Figure: competing origin models; dotted lines represent horizontal gene transfer across taxa]

The answer here is that they don’t. These models can also be evaluated by statistical methods, and the best fit is again the one on the right, with a single ancestral root. People might recall the infamous “Darwin was wrong” cover from New Scientist—well, these results say that New Scientist was wrong, the existence of extensive horizontal gene transfer does not negate the fact of common descent.

So what’s the big number? There are lots of them in the paper, since it involves many comparisons, but Theobald distills it down to just the odds that bacteria have an independent origin from Archaea and eukaryotes:

But, based on the new analysis, the odds of that are “just astronomically enormous,” he said. “The number’s so big, it’s kind of silly to say it”–1 in 10 to the 2,680th power, or 10 followed by 2,680 zeros.

One in 10^2,680? Hey, aren’t those odds a little worse than Borel’s criterion of one in 10^50?
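Since 10^2,680 is far too large for ordinary floating-point arithmetic, the comparison with Borel’s threshold is easiest done on the exponents; a trivial sketch:

```python
theobald_exponent = 2680  # odds of 1 in 10**2680, per Theobald's analysis
borel_exponent = 50       # Borel's "can never happen" threshold, 1 in 10**50

# Working with exponents avoids overflow: 10**2680 won't fit in a float.
excess = theobald_exponent - borel_exponent
print(f"Theobald's odds exceed Borel's criterion by a factor of 10**{excess}")
```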

Stay tuned to the Panda’s Thumb. Apparently, once he finishes up the trifling business of wrapping up a semester’s teaching, Theobald will be putting up a synopsis of his own and answering questions online.


Theobald D (2010) A formal test of the theory of universal common ancestry. Nature 465(13):219-222.

How to make a snake


First, you start with a lizard.

Really, I’m not joking. Snakes didn’t just appear out of nowhere, nor was there simply some massive cosmic zot of a mutation in some primordial legged ancestor that turned their progeny into slithery limbless serpents. One of the tougher lessons to get across to people is that evolution is not about abrupt transmutations of one form into another, but the gradual accumulation of many changes at the genetic level which are typically buffered and have minimal effects on the phenotype, only rarely expanding into a lineage with a marked difference in morphology.

What this means in a practical sense is that if you take a distinct form of a modern clade, such as the snakes, and you look at a distinctly different form in a related clade, such as the lizards, what you may find is that the differences are resting atop a common suite of genetic changes; that snakes, for instance, are extremes in a range of genetic possibilities that are defined by novel attributes shared by all squamates (squamates being the lizards and snakes together). Lizards are not snakes, but they will have inherited some of the shared genetic differences that enabled snakes to arise from the squamate last common ancestor.

So if you want to know where snakes came from, the right place to start is to look at their nearest cousins, the lizards, and ask what snakes and lizards have in common, that is at the same time different from more distant relatives, like mice, turtles, and people…and then you’ll have an idea of the shared genetic substrate that can make a snake out of a lizard-like early squamate.

Furthermore, one obvious place to look is at the pattern of the Hox genes. Hox genes are primary regulators of the body plan along the length of the animal; they are expressed in overlapping zones that specify morphological regions of the body, such as cervical, thoracic, lumbar, sacral/pelvic, and caudal mesodermal tissues, where, for instance, a thoracic vertebra would have one kind of shape with associated ribs, while a lumbar vertebra would have a different shape and no ribs. These identities are set up by which Hox genes are active in the tissue forming the bone. And that’s what makes the Hox genes interesting in this case: where the lizard body plan has a little ribless interruption to form pelvis and hindlimbs, the snake has vertebrae and ribs that just keep going and going. There must have been some change in the Hox genes (or their downstream targets) to turn a lizard into a snake.

There are four overlapping sets of Hox genes in tetrapods, named a, b, c, and d. Each set has up to 13 individual genes, where 1 is switched on at the front of the animal and 13 is active way back in the tail. This particular study looked at just the caudal members, 10-13, since those are the genes whose expression patterns straddle the pelvis and so are likely candidates for changes in the evolution of snakes.

Here’s a summary diagram of the morphology and patterns of Hox gene expression in the lizard (left) and snake (right). Let’s see what we can determine about the differences.

[Figure: posterior Hox gene expression in the whiptail lizard and corn snake]

Evolutionary modifications of the posterior Hox system in the whiptail lizard and corn snake. The positions of Hox expression domains along the paraxial mesoderm of whiptail lizard (32-40 somites, left) and corn snake (255-270 somites, right) are represented by black (Hox13), dark grey (Hox12), light grey (Hox11) and white (Hox10) bars, aligned with coloured schemes of the future vertebral column. Colours indicate the different vertebral regions: yellow, cervical; dark blue, thoracic; light blue, lumbar; green, sacral (in lizard) or cloacal (in snake); red, caudal. Hoxc11 and Hoxc12 were not analysed in the whiptail lizard. Note the absence of Hoxa13 and Hoxd13 from the corn snake mesoderm and the absence of Hoxd12 from the snake genome.

The morphology is revealing: snakes and lizards have the same regions, cervical (yellow), thoracic (blue), sacral (or cloacal in the snake, which lacks pelvic structures in most species) in green, and caudal or tail segments (red). The differences are in quantity — snakes make a lot of ribbed thoracic segments — and detail — snakes don’t make a pelvis, usually, but do have specializations in that corresponding area for excretion and reproduction.

Where it really gets interesting is in the expression patterns of the Hox genes, shown with the bars that illustrate the regions where each Hox gene listed is expressed. They are largely similar in snake and lizard, with boundaries of Hox expression that correspond to transitions in the morphology of vertebrae. But there are revealing exceptions.

Compare a10/c10 in the snake and lizard. In the snake, these two genes have broader expression patterns, reaching up into the thoracic region; in the lizard, they are cut off sharply at the sacral boundary. This is interesting because in other vertebrates, the Hox 10 group is known to have the function of suppressing rib formation. Yet there they are, turned on in the posterior portion of the thorax in the snake, where there are ribs all over the place.

In the snake, then, Hox a10 and c10 have lost a portion of their function — they no longer shut down ribs. What is the purpose of the extended domain of a10/c10 expression? It may not have one. A comparison of the sequences of these genes between various species reveals no detectable signs of selection — these genes happen to be active so far anteriorly because selection has been relaxed, probably because they’ve lost that morphological effect of shutting down ribs. Those big bars are a consequence of simple sloppiness in a system that can afford a little slack.

The next group of Hox genes, the 11 group, are very similar in their expression patterns in the lizard and the snake, and that reflects their specific roles. The 10 group is largely involved in repression of rib formation, but the 11 group is involved in the development of sacrum-specific structures. In birds, for instance, the Hox 11 genes are known to be involved in the development of the cloaca, a structure shared between birds, snakes, and lizards, so perhaps it isn’t surprising that they aren’t subject to quite as much change.

The 13 group has some notable differences: Hox a13 and d13 are mostly shut off in the snake. This is suggestive. The 13 group of Hox genes are the last genes, at the very end of the animal, and one of their proposed functions is to act as a terminator of patterning — turning on the Hox 13 genes starts the process of shutting down the mesoderm, shrinking the pool of tissue available for making body parts, so removing a repressor of mesoderm may promote longer periods of growth, allowing the snake to extend its length further during embryonic development.

So we see a couple of clear correlates at the molecular level for differences in snake and lizard morphology: rib suppression has been lost in the snake Hox 10 group, and the activity of the snake Hox 13 group has been greatly curtailed, which may be part of the process of enabling greater elongation. What are the similarities between snakes and lizards that are also different from other animals?

This was an interesting surprise. There are some differences in Hox gene organization in the squamates as a whole, shared with both snakes and lizards.

[Figure: genomic organization of the posterior HoxD cluster across vertebrates]

Genomic organization of the posterior HoxD cluster. Schematic representation of the posterior HoxD cluster (from Evx2 to Hoxd10) in various vertebrate species. A currently accepted phylogenetic tree is shown on the left. The correct relative sizes of predicted exons (black boxes), introns (white or coloured boxes) and intergenic regions (horizontal thick lines) permit direct comparisons (right). Gene names are shown above each box. Colours indicate either a 1.5-fold to 2.0-fold (blue) or a more than 2.0-fold (red) increase in the size of intronic (coloured boxes) or intergenic (coloured lines) regions, in comparison with the chicken reference. Major CNEs are represented by green vertical lines: light green, CNEs conserved in both mammals and sauropsids; dark green, CNEs lost in the corn snake. Gaps in the genomic sequences are indicated by dotted lines. Transposable elements are indicated with asterisks of different colours (blue for DNA transposons; red for retrotransposons).

That’s a diagram of the structure of the chromosome in the neighborhood of the Hox d10-13 genes in various vertebrates. For instance, look at the human and the turtle: the layout of our Hox d genes is very similar, with 13-12-11-10 laid out with approximately the same distances between them, and furthermore, there are conserved non-coding elements, most likely important pieces of regulatory DNA, that are illustrated in light green and dark green vertical bars, and they are the same, too.

In other words, the genes that stake out the locations of pelvic and tail structures in turtles and people are pretty much the same, using the same regulatory apparatus. It must be why they both have such pretty butts.

But now compare those same genes with the squamates, geckos, anoles, slow-worms, and corn snakes. The differences are huge: something happened in the ancestor of the squamates that released this region of the genome from some otherwise highly conserved constraints. We don’t know what, but in general regulation of the Hox genes is complex and tightly interknit, and this order of animals acquired some other as yet unidentified patterning mechanism that opened up this region of genome for wider experimentation.

When these regions are compared in animals like turtles and people and chickens, the genomes reveal signs of purifying selection — that is, mutations here tend to be unsuccessful, and lead to death, failure to propagate, and other horrible fates that mean tinkering here is largely unfavorable to fecundity (which makes sense: who wants a mutation expressed in their groinal bits?). In the squamates, the evidence in the genome does not point to intense selection for their particular arrangement, but instead to relaxed selection — they are generally more tolerant of variations in the Hox gene complex in this area. What was found in those enlarged intergenic regions is a greater invasion of degenerate DNA sequences: lots of additional retrotransposons, like LINEs and SINEs, which are all junk DNA.

So squamates have more junk in the genomic trunk, which is not necessarily expressed as an obvious phenotypic difference, but still means that they can more flexibly accommodate genetic variations in this particular area. Which means, in turn, that they have the potential to produce more radical experiments in morphology, like making a snake. The change in Hox gene regulation in the squamate ancestor did not immediately produce a limbless snake, instead it was an enabling mutation that opened the door to novel variations that did not compromise viability.


Di-Poï N, Montoya-Burgos JI, Miller H, Pourquié O, Milinkovitch MC, Duboule D (2010) Changes in Hox genes’ structure and function during the evolution of the squamate body plan. Nature 464:99-103.

α-actinin evolution in humans


Perhaps your idea of the traditional holiday week involves lounging about with a full belly watching football — not me, though. I think if I did, I’d be eyeing those muscular fellows with thoughts of muscle biopsies and analyses of the frequency of α-actinin variants in their population vs. the population of national recliner inhabitants. I’m sure there’s an interesting story there.

In case you’re wondering what α-actinin is, it’s a cytoskeletal protein that’s important in anchoring and coordinating the thin filaments of actin that criss-cross throughout your cells. It’s very important in muscle, where it’s localized in the Z-disk at the boundaries of sarcomeres, the repeated contractile units of the muscle. This diagram might help you visualize it:

[Figure: the muscle sarcomere]
Actin (green), myosin (red). Rod-like tropomyosin molecules (black lines). Thin filaments in muscle sarcomeres are anchored at the Z-disk by the cross-linking protein α-actinin (gold) and are capped by CapZ (pink squares). The thin-filament pointed ends terminate within the A band and are capped by tropomodulin (bright red). Myosin-binding-protein C (MyBP-C; yellow transverse lines).

The most prominent elements in the picture are the thin filaments (made of actin) and thick filaments (made of myosin) which slide past each other, driven by motor proteins, to cause contraction and relaxation of the muscle. The α-actinin proteins are the subtle orange lines in the Z disks on the left and right.

The α-actinin proteins are evolutionarily interesting. In vertebrates, there are usually four different kinds: α-actinin 1, 2, 3, and 4. 1 and 4 are ubiquitous in all cells, since all cells have a cytoskeleton, and the α-actinins are important in anchoring the cytoskeleton. α-actinin-2 and -3 are the ones of interest here, because they are specifically muscle actinins. α-actinin-2 is found in all skeletal muscle fibers, cardiac muscle, and also in the brain (no, not muscle in the brain, there isn’t any: in the cytoskeleton of neurons). Just to complicate matters a bit, α-actinin-2 is also differently spliced in different tissues, producing a couple of isoforms from a single gene. α-actinin-3 is not found in the brain or heart, but only in skeletal muscle and specifically in type II fast glycolytic muscle fibers.

Muscle fibers are specialized. Some are small diameter, well vascularized, relatively slow fibers that are optimized for endurance; they can keep contracting over and over again for long periods of time. These are the fibers that make up the dark meat in your Christmas turkey or duck. Other fibers are large diameter, operate effectively anaerobically, and are optimized for generating lots of force rapidly, but they tend to fatigue quickly — and there are more of these in the white meat of your Christmas bird. (There are also intermediate fiber types that we won’t consider here.) Just keep these straight in your head to follow along: the fast type II muscle fibers are the ones that you use to generate explosive bursts of force, and may be enriched in α-actinin-3; the slower fibers are the ones you use to keep going when you run marathons, and contain α-actinin-2. (There are other even more important differences between fast and slow fibers, especially in myosin variants, so differences in α-actinins are not major determinants of muscle type.)

Wait, what about evolution? It turns out that invertebrates only have one kind of α-actinin, and vertebrates made their suite of four in the process of a pair of whole genome duplications. We made α-actinin-2 and -3 in a duplication event roughly 250-300 million years ago, at which time they would have been simple duplicates of each other, but they have diverged since then, producing subtle (and not entirely understood) functional differences from one another, in addition to acquiring different sites of expression. α-actinin-2 and -3 in humans are now about 80% identical in amino acid sequence. What has happened in these two genes is consistent with what we know about patterns of duplication and divergence.
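Percent identity between two aligned proteins is just the fraction of matching positions; here’s a toy sketch with invented 10-residue fragments (the real 80% figure comes from comparing the full-length aligned α-actinin sequences, not these made-up peptides):

```python
def percent_identity(seq_a, seq_b):
    """Fraction of identical residues between two aligned, equal-length sequences."""
    assert len(seq_a) == len(seq_b), "sequences must be aligned to equal length"
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)

# Invented 10-residue fragments, chosen so 8 of 10 positions match.
actn2_fragment = "MDHYDSQQTN"
actn3_fragment = "MDHYNSQQSN"
print(f"{percent_identity(actn2_fragment, actn3_fragment):.0%} identical")
```

Real comparisons also have to handle insertions and deletions via alignment first (e.g. with an alignment tool), which this sketch skips.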

[Figure: fates of a duplicated gene]
Using sarcomeric α-actinin as an example, after duplication of a gene capable of multiple interactions/functions, there are two possible distinct scenarios besides gene loss. A: Sub-functionalisation, where one interaction site is optimised in each of the copies. B: Neo-functionalisation, where one copy retains the ancestral interaction sites while the other is free to evolve new interaction sites.

So what we’re seeing in the vertebrate lineage is a conserved pattern of specialization of α-actinin-3 to work with fast muscle fibers — it’s a factor in enhancing performance in the specific task of generating force. The α-actinin-3 gene is an example of a duplicated gene becoming increasingly specialized for a particular role, with both changes in the amino acid sequence that promoted a more specialized activity, and changes in the regulatory region of the gene so that it was only switched on in appropriate muscle fibers.

[Figure: duplication and divergence history of ACTN2 and ACTN3]
Duplication and divergence model proposed by this paper. Before duplication the ancestral sarcomeric α-actinin had the functions of both ACTN2 and ACTN3 in terms of tissue expression and functional isoforms. After duplication, ACTN2 has conserved most of the functions of the preduplicated gene, while ACTN3 has lost many of these functions, which may have allowed it to optimise function in fast fibres.

That’s cool, but what we need is an experiment: we need to knock out the gene and see what happens. Mutations in α-actinin-2 are bad—they cause a cardiomyopathy. Losing α-actinin-4 leads to serious kidney defects (that gene is expressed in kidney tissue). What happens if we lose α-actinin-3?

It turns out you may be a guinea pig in that great experiment. Humans acquired a nonsense mutation in the α-actinin-3 gene, called R577X, approximately 40-60,000 years ago, and this mutation is incredibly common: the mutant allele occurs at a frequency of about 50% in populations of European and Asian descent, and about 10% in African populations. Furthermore, an analysis of the flanking DNA shows relatively little recombination or polymorphism, which implies that the allele has reached this high frequency relatively recently and rapidly, which in turn implies that there has been positive selection for a mutation that destroys α-actinin-3 in us. The data suggest that a selective sweep for this variant began in Asia about 33,000 years ago, and in Europe about 15,000 years ago.
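To get a feel for what those numbers mean at the level of whole genotypes, here's a quick back-of-the-envelope Hardy-Weinberg calculation. This is just a sketch: it treats the quoted ~50% and ~10% figures as allele frequencies (an assumption on my part) and assumes random mating, so the exact outputs are illustrative rather than measured values.

```python
# Back-of-the-envelope Hardy-Weinberg sketch for the R577X null allele.
# q = assumed frequency of the R577X (non-functional) allele.
def genotype_frequencies(q):
    p = 1.0 - q  # frequency of the functional 577R allele
    return {
        "RR (two working copies)": p * p,
        "RX (carrier)":            2 * p * q,
        "XX (no alpha-actinin-3)": q * q,
    }

for label, q in [("European/Asian, q~0.5", 0.5), ("African, q~0.1", 0.1)]:
    freqs = genotype_frequencies(q)
    print(label, {k: round(v, 3) for k, v in freqs.items()})
```

Under these assumptions, roughly a quarter of people in the high-frequency populations would be XX homozygotes with no α-actinin-3 at all, which makes the absence of any associated disease all the more striking.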

There is no disease associated with the loss of α-actinin-3. It seems that α-actinin-2 steps up to the plate and fills the role in type II fast muscle fibers, so everything functions just fine. Except…well, there is an interesting statistical effect.

The presence of a functional α-actinin-3 gene is correlated with athletic performance. A study of the frequency of the R577X mutation in athletes and controls found that there is a significant reduction in the frequency of the mutation among sprinters and power-lifters. At the Olympic level, none of the sprinters in the sample (32 individuals) carried the α-actinin-3 deficiency. Among Olympic power lifters, all had at least one functional copy of α-actinin-3.
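To see why "zero deficient individuals among 32 Olympic sprinters" is statistically striking rather than a fluke, here's a rough binomial sanity check. The ~18% background rate of XX homozygotes is an illustrative figure I'm assuming for the control population, not a number stated in the text, so treat the result as order-of-magnitude only.

```python
# Rough sanity check: if ~18% of the general population were XX homozygotes
# (completely lacking alpha-actinin-3 -- an assumed illustrative figure),
# how likely is it that none of 32 randomly chosen people would be XX?
xx_rate = 0.18
n_sprinters = 32

p_none = (1 - xx_rate) ** n_sprinters
print(f"P(no XX among {n_sprinters} random people) = {p_none:.4f}")
```

With these numbers the chance comes out well under 1%, so the complete absence of α-actinin-3-deficient Olympic sprinters is very unlikely to be sampling noise.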

Awesome. Now I’m wondering about my α-actinin-3 genotype, and whether I have a good biological excuse for why I always got picked last for team sports in high school gym class. This is also why I’m interested in taking biopsies of football players…both for satisfying a scientific curiosity, and for revenge.

At this point you may be wondering about something: α-actinin-3 has a clear beneficial effect in enhancing athletic performance, and its conservation in other animal species suggests that it's almost certainly a good and useful protein. So why has there (probably) been positive selection for a knock-out mutation in the human lineage?

That same study of athletic performance shows a weak association in the other direction: high-ranking athletes in endurance sports have an increased frequency of the R577X genotype, though the effect was significant only in female long-distance runners. More persuasive is the observation that α-actinin-3 knockouts in mice also produce a shift in metabolic enzyme markers indicative of increased endurance capacity. The selective advantage of losing α-actinin-3 may be more efficient aerobic metabolism in muscle, at the expense of some strength at the high end of athletic performance.

This is yet another example of human evolution in progress—we’re seeing a shift in human muscle function over the course of a few tens of thousands of years.


Lek M, Quinlan KG, North KN (2009) The evolution of skeletal muscle performance: gene duplication and divergence of human sarcomeric alpha-actinins. Bioessays 32(1):17-25. [Epub ahead of print]

MacArthur DG, Seto JT, Raftery JM, Quinlan KG, Huttley GA, Hook JW, Lemckert FA, Kee AJ, Edwards MR, Berman Y, Hardeman EC, Gunning PW, Easteal S, Yang N, North KN (2007) Loss of ACTN3 gene function alters mouse muscle metabolism and shows evidence of positive selection in humans. Nat Genet.39(10):1261-5.

Yang N, MacArthur DG, Gulbin JP, Hahn AG, Beggs AH, Easteal S, North K (2003) ACTN3 genotype is associated with human elite athletic performance. Am J Hum Genet 73(3):627-31.