We need a sociologist of science…or a philosopher

There’s another paper out debunking the ENCODE consortium’s absurd interpretation of their data. ENCODE, you may recall, published a rather controversial paper in which they claimed to have found that 80% of the human genome was ‘functional’ — for an extraordinarily loose definition of function — and further revealed that several of the project leaders were working with the peculiar assumption that 100% must be functional. It was a godawful mess, and compromised the value of a huge investment in big science.

Now W. Ford Doolittle has joined the many scientists who immediately leapt into the argument. He has published “Is junk DNA bunk? A critique of ENCODE” in PNAS.

Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete? Here, I review older arguments for junk grounded in the C-value paradox and propose a thought experiment to challenge ENCODE’s ontology. Specifically, what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own genome? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational). If, however, the number of functional elements were to rise significantly with C-value then, (i) organisms with genomes larger than our genome are more complex phenotypically than we are, (ii) ENCODE’s definition of functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology, or (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms. Good cases can be made for propositions ii and iii. A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed.

In the paper, he makes an argument similar to one T. Ryan Gregory has made many times before. There are organisms that have much larger genomes than humans; lungfish, for example, have 130 billion base pairs, compared to the 3 billion humans have. If the ENCODE consortium had studied lungfish instead, would they still be arguing that the organism had function for 104 billion bases (80% of 130 billion)? Or would they be suggesting that yes, lungfish were full of junk DNA?

If they claim that lungfish have 44 times as much functional sequence as we do, well, what is it doing? Does that imply that lungfish are far more phenotypically complex than we are? And if they grant that junk DNA exists in great abundance in some species, just not in ours, does that imply that we’re somehow sitting in the perfect sweet spot of genetic optimality? If that’s the case, what about species like fugu, which have genomes one eighth the size of ours?
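The arithmetic behind Doolittle’s thought experiment is easy to make explicit. Here’s a back-of-the-envelope sketch; the genome sizes are the rough round numbers used above, and the species dictionary and function names are just illustrative:

```python
# Back-of-the-envelope arithmetic for Doolittle's thought experiment.
# Genome sizes are approximate haploid values; 0.8 is ENCODE's claimed
# "functional" fraction applied uniformly for the sake of argument.
GENOME_BP = {
    "human": 3.0e9,      # ~3 billion base pairs
    "lungfish": 130e9,   # ~130 billion base pairs
    "fugu": 0.39e9,      # roughly one eighth of the human genome
}

ENCODE_FRACTION = 0.8

def implied_functional_bp(species):
    """DNA that ENCODE's criterion would label 'functional' in this genome."""
    return ENCODE_FRACTION * GENOME_BP[species]

for sp in GENOME_BP:
    print(f"{sp}: {implied_functional_bp(sp):.2e} 'functional' bp")

# Under ENCODE's logic, lungfish would carry ~43x as much
# 'functional' sequence as humans do:
ratio = implied_functional_bp("lungfish") / implied_functional_bp("human")
print(f"lungfish/human ratio: {ratio:.1f}")
```

That ratio is the whole point: either lungfish really need 40-odd times our functional DNA, or the 80% criterion is measuring something other than function.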

It’s really a devastating argument, but then, all of the arguments against ENCODE’s interpretations have been solid, knocking the whole thing out of the park. It’s been solidly demonstrated that the conclusions of the ENCODE program were shit.

[Cover of the Winter issue of Yale Medicine]

So why, Yale, why? The Winter edition of the Yale Medicine magazine features as a cover article Junk No More, an awful piece of PR fluff that announces in the first line “R.I.P., junk DNA” and goes on to tout the same nonsense that every paper published since the ENCODE announcement has refuted.

The consortium found biological activity in 80 percent of the genome and identified about 4 million sites that play a role in regulating genes. Some noncoding sections, as had long been known, regulate genes. Some noncoding regions bind regulatory proteins, while others code for strands of RNA that regulate gene expression. Yale scientists, who played a key role in this project, also found “fossils,” genes that date to our nonhuman ancestors and may still have a function. Mark B. Gerstein, Ph.D., the Albert L. Williams Professor of Biomedical Informatics and professor of molecular biophysics and biochemistry, and computer science, led a team that unraveled the network of connections between coding and noncoding sections of the genome.

Arguably the project’s greatest achievement is the repository of new information that will give scientists a stronger grasp of human biology and disease, and pave the way for novel medical treatments. Once verified for accuracy, the data sets generated by the project are posted on the Internet, available to anyone. Even before the project’s September announcement, more than 150 scientists not connected to ENCODE had used its data in their research.

“We’ve come a long way,” said Ewan Birney, Ph.D., of the European Bioinformatics Institute (EBI) in the United Kingdom, lead analysis coordinator for ENCODE. “By carefully piecing together a simply staggering variety of data, we’ve shown that the human genome is simply alive with switches, turning our genes on and off and controlling when and where proteins are produced. ENCODE has taken our knowledge of the genome to the next level, and all of that knowledge is being shared openly.”

Oh, Christ. Not only is it claiming that the 80% figure is for biological activity (it isn’t), but it trots out the usual university press relations crap about how the study is all about medicine. It wasn’t and isn’t. It’s just that dumbasses can only think of one way to explain biological research to the public, and that is to suggest that it will cure cancer.

As for Birney’s remarks, they are offensively ignorant. No, the ENCODE research did not show that the human genome is actively regulated. We’ve known that for fifty years.

That’s not the only ahistorical part of the article. They also claim that the idea of junk DNA has been discredited for years.

Some early press coverage credited ENCODE with discovering that so-called junk DNA has a function, but that was old news. The term had been floating around since the 1990s and suggested that the bulk of noncoding DNA serves no purpose; however, articles in scholarly journals had reported for decades that DNA in these “junk” regions does play a regulatory role. In a 2007 issue of Genome Research, Gerstein had suggested that the ENCODE project might prompt a new definition of what a gene is, based on “the discrepancy between our previous protein-centric view of the gene and one that is revealed by the extensive transcriptional activity of the genome.” Researchers had known for some time that the noncoding regions are alive with activity. ENCODE demonstrated just how much action there is and defined what is happening in 80 percent of the genome. That is not to say that 80 percent was found to have a regulatory function, only that some biochemical activity is going on. The space between genes was also found to contain sites where DNA transcription into RNA begins and areas that encode RNA transcripts that might have regulatory roles even though they are not translated into proteins.

I swear, I’m reading this article and finding it indistinguishable from the kind of bad science I’d see from ICR or Answers in Genesis.

I have to mention one other revelation from the article. There has been a tendency to throw a lot of the blame for the inane 80% number on Ewan Birney alone…he threw in that interpretation in the lead paper, but it wasn’t endorsed by every participant in the project. But look at this:

The day in September that the news embargo on the ENCODE project’s findings was lifted, Gerstein saw an article about the project in The New York Times on his smartphone. There was a problem. A graphic hadn’t been reproduced accurately. “I was just so panicked,” he recalled. “I was literally walking around Sterling Hall of Medicine between meetings talking with The Times on the phone.” He finally reached a graphics editor who fixed it.

So Gerstein was so concerned about accuracy that he panicked over a graphic in the popular press, yet the big claim in the Birney paper, the one that would utterly undermine confidence in the whole body of work, did not perturb him? And now, months later, he’s collaborating with the Yale PR department on a puff piece that blithely sails past all the objections people have raised? Remarkable.

This is what boggles my mind, and why I hope some sociologist of science is studying this whole process right now. It’s a revealing peek at the politics and culture of science. We have a body of very well funded, high ranking scientists working at prestigious institutions who are actively and obviously fitting the data to a set of unworkable theoretical presuppositions, and completely ignoring the rebuttals that are appearing at a rapid clip. The idea that the entirety of the genome is both functional and adaptive is untenable and unsupportable; we instead have hundreds of scientists who have been bamboozled into treating noise as evidence of function. It’s looking like N rays or polywater on a large and extremely richly budgeted level. And it’s going on right now.

If we can’t have a sociologist making an academic study of it all, can we at least have a science journalist writing a book about it? This stuff is fascinating.

I have my own explanation for what is going on. What I think we’re seeing is an emerging clash between scientists and technicians. I’ve seen a lot of biomedical grad students going through training in pushing buttons and running gels and sucking numerical data out of machines, and we’ve got the tools to generate so much data right now that we need people who can manage that. But it’s not science. It’s technology. There’s a difference.

A scientist has to be able to think about the data they’re generating, put it into a larger context, and ask the kinds of questions that probe deeper than a superficial analysis can deliver. A scientist has to be more broadly trained than the person who runs the gadgetry.

This might get me burned at the stake worse than sneering at ENCODE, but a good scientist has to be…a philosopher. They may not have formal training in philosophy, but the good ones have to be at least roughly intuitive natural philosophers (ooh, I’ve heard that phrase somewhere before). If I were designing a biology curriculum today, I’d want to make at least some basic introduction to the philosophy of science an essential and early part of the training.

I know, I’m going against the grain — there have been a lot of big name scientists who openly dismiss philosophy. Richard Feynman, for instance, said “Philosophy of science is about as useful to scientists as ornithology is to birds.” But Feynman was wrong, and ironically so. Reading Feynman is actually like reading philosophy — a strange kind of philosophy that squirms and wiggles trying to avoid the hated label, but it’s still philosophy.

I think the conflict arises because, like everything, 90% of philosophy is garbage, and scientists don’t want to be associated with a lot of the masturbatory nonsense some philosophers pump out. But let’s not lose sight of the fact that some science, like ENCODE, is nonsense, too — and the quantity of garbage is only going to rise if we don’t pay attention to understanding as much as we do accumulating data. We need the input of philosophy.

It’s nice to see someone willing to live by their own advice

All the scientists and naturalists out there crying foul on behalf of the desert need to hang their intellects up for a moment and spend some time in their hearts for a while.

I get the best rebuttal yet to my piece taking down Allan Savory’s “green the deserts by filling them with cows” pseudoscience.

(And remember, when you hang your intellects up for a moment, to heed Joan Crawford’s timeless counsel.)

So what else is new?

In a classic bit of strange understatement, Gizmodo reports that HeLa cells are weird.

Recent genomic sequencing on the popular "Kyoto" HeLa line reveals known errors common to cancer cells like extra copies of certain chromosomes, but also shows unexpected mutations like strong expression of certain genes and segment reshuffling on many chromosomes.

Uh, “recent”? HeLa cells were isolated from a cancer. Cancer cells have these common features, like genomic instability, aneuploidies, and loss of cell cycle control that we all know about. These particular cells were selected for properties that differ from healthy undisrupted human cells.

I also don’t know anyone studying them as models for humans (although I have heard animal rights people claim they’re adequate substitutes for mice, which is just as ridiculous).

So no surprises, and no understanding of cell culture research. We’re done!

How about if we just retire Dollo’s Law altogether?

Earlier this month, there was a flurry of headlines in the pop-sci press that exasperated me. “Have scientists discovered reversible evolution?” was one; “Evidence of Reverse Evolution Seen in Dust Mites” was another. They failed because, in trying to express a subtle idea in a fluffy way, they mangled a more fundamental concept in evolution — it was one step forward in trying to explain a legitimate science paper, and ten steps back in undermining understanding of evolution. This was just awful:

Researchers who deny the idea that evolutionary traffic can only move forward saw their arguments bolstered this week with the publication of a study suggesting that house dust mites may have evolved from free-living creatures into full-time parasites, only to abandon that evolutionary track and go back the way they came, reverting to the free-living creatures that live invisibly in your carpet, bed, and other places in your home that it’s probably best not to think about them living.

“Evolutionary traffic can only move forward”? Please, define “forward” in this context for me. Evolution doesn’t have a direction. You can talk about a temporal sequence of historical changes in a gene, for instance, but from the point of view of the process, there’s no “forward” or “backwards”, only change over time. Is a genetic deletion a backwards step? Is a duplication a forward step? If a mutation changes a cytosine to an adenine, is that going forward, and if there is a revertant, a mutation that changes that adenine back to a cytosine, is that going backwards? I keep hearing this talk about directions, and it doesn’t even fit into my understanding of the process of evolution. Direction is always something people infer retrospectively.

The paper all this comes from, Is Permanent Parasitism Reversible?–Critical Evidence from Early Evolution of House Dust Mites, by Klimov and O’Connor, isn’t that bad, but it still has some bits that annoy me.

Long-term specialization may limit the ability of a species to respond to new environmental conditions and lead to a higher likelihood of extinction. For permanent parasites and other symbionts, the most intriguing question is whether these organisms can return to a free-living lifestyle and, thus, escape an evolutionary “dead end.” This question is directly related to Dollo’s law, which stipulates that a complex trait (such as being free living vs. parasitic) cannot re-evolve again in the same form. Here, we present conclusive evidence that house dust mites, a group of medically important free-living organisms, evolved from permanent parasites of warm-blooded vertebrates. A robust, multigene topology (315 taxa, 8942 nt), ancestral character state reconstruction, and a test for irreversible evolution (Dollo’s law) demonstrate that house dust mites have abandoned a parasitic lifestyle, secondarily becoming free living, and then speciated in several habitats. Hence, as exemplified by this model system, highly specialized permanent parasites may drastically de-specialize to the extent of becoming free living and, thus escape from dead-end evolution. Our phylogenetic and historical ecological framework explains the limited cross-reactivity between allergens from the house dust mites and “storage” mites and the ability of the dust mites to inhibit host immune responses. It also provides insights into how ancestral features related to parasitism (frequent ancestral shifts to unrelated hosts, tolerance to lower humidity, and pre-existing enzymes targeting skin and keratinous materials) played a major role in reversal to the free-living state. We propose that parasitic ancestors of pyroglyphids shifted to nests of vertebrates. Later the nest-inhabiting pyroglyphids expanded into human dwellings to become a major source of allergens.

It’s actually rather interesting that these mites have a phylogenetic history that shows some dramatic changes in lifestyle. Parasitism is a specialized pattern that typically involves a loss or shedding of generalized abilities that allow for autonomous living; parasites can get rid of functions that won’t be needed in the conditions they’ll be living in. A mammalian parasite is swimming in a sea of nutrients provided by the host; it can lose genes for the synthesis of many amino acids, for instance, and still survive because it’s immersed in those amino acids, provided by the mammalian bloodstream. But that makes it difficult to leave the parasitic life — if it moves out to the more limited diet available in the external world, it may find itself starving to death, unable to synthesize essential building blocks. Yet here they have evidence that mites shifted from parasitism to free-living.

But I have two complaints. One is this framing as a refutation of Dollo’s Law — I really don’t give a damn about Dollo’s “Law” at all. The second is that they haven’t really shown any evidence of molecular/genetic reversibility.

I just roll my eyes at papers that talk about Dollo’s Law anymore. Do people realize that it was a macroevolutionary hypothesis formulated in the 1890s, before anyone had a clue about how genetics worked, much less how genetics and evolution worked together? It was a reasonable prediction about how traits would distribute over time. A horse, for instance, runs on a single robust toe on each leg, the other digits reduced to vestigial splints; Dollo’s law says that those splints won’t re-expand to reform toes identical to those found in horse ancestors. Why, he didn’t know.

A modern understanding of the principle, informed by the underlying genetics, would instead say that a complex character involving multiple genetic changes and relying on a particular background for its expression is statistically unlikely to be reconstituted by stochastic changes in a different genetic background, in exactly the same way. It’s not a ‘law’, it’s a consequence of probability.
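That “consequence of probability” can be made concrete with a toy model. This is not from the Klimov and O’Connor paper; the per-site mutation rate is a hypothetical round number, and the assumption that each site mutates to one of three alternative bases with equal probability is a deliberate simplification:

```python
# Toy probability sketch (illustrative, not from the paper): suppose a
# derived trait required n specific substitutions. For an exact reversal,
# each of those n sites must be hit by a new mutation AND that mutation
# must be the one back-mutation out of three possible base changes.
def exact_reversal_probability(n_sites, per_site_hit_prob=1e-8):
    """Probability that all n_sites independently receive their exact
    back-mutation in one generation. per_site_hit_prob is an assumed
    (hypothetical) per-site, per-generation mutation rate."""
    per_site_exact = per_site_hit_prob * (1.0 / 3.0)
    return per_site_exact ** n_sites

print(exact_reversal_probability(1))
print(exact_reversal_probability(10))
```

Even for a modest ten-site trait the number collapses toward zero, which is all Dollo’s “Law” amounts to once you restate it genetically: not a prohibition, just vanishing odds of retracing the same path.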

The authors have only found reversion to an ancestral pattern on a very coarse scale: there are a great many ways to be a free-living organism, and there are a great many ways to be a parasite. They can say on a very gross level that mites have changed their niches in their evolutionary history, but they can’t claim there has been an evolutionary reversal: if we compared the ancestral free-living form (pre-parasite phase) to the modern free-living form (post-parasite phase), I have no doubt, and there’s nothing in the paper to contradict me, that there would be significant differences in form, physiology, biochemistry and genome, and further, that the parasitic phase would have left evolutionary scars in that genome.

Dollo’s Law is archaic and superficial, and I have no problem agreeing that Klimov and O’Connor have refuted it. But the more interesting principle, founded in a modern understanding of microevolutionary and genetic events, has not been refuted at all — it’s just confusing that we’re still calling that Dollo’s Law, and that we mislead further by talking about a direction for evolution and ‘reversibility’ and all that nonsense. The only source of direction in this process is time’s arrow, and that doesn’t go backwards.

How not to read a graph

This ought to be on Skepchick’s Bad Chart Thursday. The Daily Mail — hey, why are you already groaning? — put up a graph to prove that global warming forecasts are WRONG. They say:

The graph on this page blows apart the ‘scientific basis’ for Britain reshaping its entire economy and spending billions in taxes and subsidies in order to cut emissions of greenhouse gases. These moves have already added £100 a year to household energy bills.

The estimates – given with 75 per cent and 95 per cent certainty – suggest only a five per cent chance of the real temperature falling outside both bands.

But when the latest official global temperature figures from the Met Office are placed over the predictions, they show how wrong the estimates have been, to the point of falling out of the ‘95 per cent’ band completely.

Now here’s the graph. Let’s see if you can detect where they mangled the interpretation.

[The Daily Mail’s graph of Met Office temperature figures overlaid on model predictions]

(Note: I haven’t looked to see whether the underlying data is correctly presented. I’m only examining the Mail’s ability to read their own chart.)

One error of interpretation is the claim that the ‘predictions’ were plotted in retrospect…as if the scientists had just made up the data. That’s not true — what they did was take the same kinds of measurements that were available in the past, plug them into the model as inputs, and let it generate predictions. This is an important part of testing the validity of the model — if it gave a poor fit to past data, we’d know not to trust it. That it worked well when given the past 50 years’ worth of data is a positive result.
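The validation logic described above — fit on earlier data, then check whether later observations stay inside the prediction band — is easy to sketch. Everything below is synthetic and hypothetical (a linear trend plus noise, a hand-rolled least-squares fit, a crude two-standard-deviation band); it illustrates the hindcasting idea, not any actual climate model:

```python
# Minimal sketch of hindcast validation on synthetic data.
import random
random.seed(0)

# Synthetic "temperature anomaly": a linear trend plus Gaussian noise.
years = list(range(1960, 2013))
truth = [0.015 * (y - 1960) + random.gauss(0, 0.1) for y in years]

# "Train" on the first 40 years with ordinary least squares, by hand.
train_x = [y - 1960 for y in years[:40]]
train_y = truth[:40]
n = len(train_x)
mean_x = sum(train_x) / n
mean_y = sum(train_y) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x

# Residual spread defines a crude ~95% band (two standard deviations).
resid = [y - (intercept + slope * x) for x, y in zip(train_x, train_y)]
sd = (sum(r * r for r in resid) / (n - 1)) ** 0.5

# Hindcast check: how many held-out years fall inside the band?
inside = sum(
    1 for y, obs in zip(years[40:], truth[40:])
    if abs(obs - (intercept + slope * (y - 1960))) <= 2 * sd
)
print(f"{inside}/{len(years) - 40} held-out years inside the 95% band")
```

A model whose band captures roughly 95% of the held-out observations is doing exactly what it advertised — which is why observations sitting inside the band, as in the Mail’s own graph, are evidence the predictions worked, not that they failed.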

The big error of interpretation is to look at that graph and claim it demonstrates a “spectacular miscalculation.” To the contrary, it shows that the predictions so far have been right. As Lance Parkin says,

It’s an argument presented entirely in their own terms, using only data they presented, framed in language of their choosing. It’s been spun and distorted and shaped as much as they possibly can to get the result they want to get and it still says that the scientists who have consistently and accurately predicted that the world is warming were right. That’s their best shot? It’s rubbish.

Need a cleanser after seeing that? Here are ten charts interpreted correctly and demonstrating the reality of climate change.

People actually read the Daily Mail in the UK, huh? I guess it’s like the US’s Fox News…unaccountably popular.

Nightmare fuel

It’s morning here, so it’s probably safe to post this now. I read this article just before bed last night, and then I had a nightmare.

I dreamt that I walked into my classroom, and 50 pairs of eyes all turned to me, and they were all wearing Google Glass, and there were all these little red cyborg lights blinking at me. And there I was torn between the horror of my every word and expression being uploaded to Google’s servers, and…wanting one myself.

Don’t worry, though, I knew it was a dream, so I just flooded the whole room with salt water and shorted out their gadgets, and then I turned them all into mermaids and we…well, you don’t need to know.

But still! After the conversation about privacy yesterday, it was a bit worrisome.