#PLOSGenetics: The Case for Junk DNA


This is the paper to read: Palazzo & Gregory’s The Case for Junk DNA. It clearly and logically lays out the complete argument from evidence and theory for the thesis that most of the genome is junk. It’s not revolutionary or radical, though: the whole story is based on very fundamental population genetics and molecular biology, and many decades of accumulated observations. And once you know a little bit of those disciplines — you don’t need to be a genius with a great depth of understanding — the conclusion is both obvious and, in some ways, rather trivial.

Here’s that conclusion:

For decades, there has been considerable interest in determining what role, if any, the majority of the DNA in eukaryotic genomes plays in organismal development and physiology. The ENCODE data are only the most recent contribution to a long-standing research program that has sought to address this issue. However, evidence casting doubt that most of the human genome possesses a functional role has existed for some time. This is not to say that none of the nonprotein-coding majority of the genome is functional—examples of functional noncoding sequences have been known for more than half a century, and even the earliest proponents of “junk DNA” and “selfish DNA” predicted that further examples would be found. Nevertheless, they also pointed out that evolutionary considerations, information regarding genome size diversity, and knowledge about the origins and features of genomic components do not support the notion that all of the DNA must have a function by virtue of its mere existence. Nothing in the recent research or commentary on the subject has challenged these observations.

The whole ENCODE debacle, in which hundreds of millions of dollars was sunk into an effort to identify the function of every bit of the genome, was a PR disaster. Larry Moran asks how Nature magazine dealt with the errors; the answer seems to be with denial. Authors of the ENCODE report are claiming they were “misunderstood & misreported” and that they aren’t “backing away from anything”.

I’m not too dismayed that science journalists didn’t understand how the claims of ENCODE conflicted with evolutionary biology, since I don’t expect journalists to have the same focus on the science (this is not a knock on science journalism; I have a lot of respect for the good practitioners of the art; it’s just that they have different priorities than the working scientists who have to deal with the background details). But what really shocks me is that big-name genomics researchers, people who get awarded lots of money to study the structure of the genome, don’t understand the fundamentals laid out for them in the Palazzo & Gregory paper. It’s not that I expect every scientist to know the entirety of a gigantic field — heck, I get confused and lost every time I read a bioinformatics paper — but these are scientists paid in big money and prestige to study genome function who don’t have a grasp on the evolutionary constraints on genome function, which seems to be a rather critical omission. And these scientists without a clue get elected to the Fellowship of the Royal Society.

How does that happen? I had this fantasy that science was a meritocracy and that great scientists advanced by having deep knowledge and doing great work, but it seems another way to succeed is to leap into a new field and bamboozle everyone with technology.

I am so disillusioned.

Comments

  1. birgerjohansson says

    “these are scientists paid in big money and prestige to study genome function who don’t have a grasp on the evolutionary constraints on genome function, which seems to be a rather critical omission.”

    In large organisations (business, government) these kinds of mistakes happen all the time. And the careers of those who enabled these mistakes rarely suffer.

  2. David Marjanović says

    It’s not uncommon for molecular biologists to have little clue about evolution. In the early 2000s, the Alberts textbook (Molecular Biology of the Cell) happily talked about “higher eukaryotes”, meaning everything except yeast!

  3. Marc Abian says

    Alternative hypothesis: People are happy to publish and publicly defend eye catching stuff even if they don’t believe it.

  4. says

    Maybe this can start spawning a “You only use 10% of your DNA” meme rather than the “10% of your brain” meme.

  5. ambassadorfromverdammt says

    If all that inactive DNA were snipped away, what kind of organism would result?

  6. says

    If all that inactive DNA were snipped away, what kind of organism would result?

    Most likely an organism pretty much indistinguishable from the original type. Something like this was done with mice once and they turned out fine.
    http://www.nature.com/news/2004/041018/full/news041018-7.html

    Of course, there’s always the possibility that some sequence that appears to be junk really serves some obscure or very subtle function. However, it seems pretty damn likely that most of it really is just junk.

  7. Donnie says

    could that junk DNA have served a purpose during fetal development? “things” (the scientific limit of my knowledge) get turned on and off during development. Maybe the DNA was needed at one point in order to get through a development phase. However, per #7, that may be just me mixing my “things” up.

  8. Nentuaby says

    Donnie, genes that only play a role during development are already 100% accounted for in the existing catalogue of functional genes. “Functional” just means playing some role at some time.

  9. ambassadorfromverdammt says

    @7 LykeX

    Thanks, I didn’t know if experiments had already been conducted. Hypotheses are ok, but verification rules.

  10. lochaber says

    Something I’ve briefly thought about before (but never bothered to look up…), and that popped up on the latest creationist troll thread, is the idea that junk DNA could serve as a buffer against deleterious mutations.

    If, say, radiation or something alters a flat percentage of nucleotides, then having a large amount of useless (junk) DNA would mean that more of those mutations would happen in the junk DNA and be less likely to alter something vital to the critter’s survival.

    I’m sure I’m missing something obvious, but even so, why is it that so many people seem to find the idea of junk DNA offensive? I don’t quite get it…

  11. David Marjanović says

    could that junk DNA have served a purpose during fetal development?

    Uh, no – then it’d consist of ordinary genes that just aren’t turned on in the adult, which you couldn’t even tell from their sequences!

  12. David Marjanović says

    If, say, radiation or something alters a flat percentage of nucleotides

    Of course it doesn’t. Mutations happen per number of nucleotides, not per genome.

    Also, google for “onion test”. :-)

  13. Robert B. says

    In other words, lochaber, your important genes are just as likely to catch a replication error or a gamma photon whether they’re surrounded by junk DNA or not. Having junk DNA does mean that more mutations happen in the junk, but it does not mean that fewer mutations happen in the functional genome. (A smaller percentage of all mutations, maybe, but no fewer in absolute number.)
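
    (To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python, with made-up illustrative numbers: the expected number of mutations landing in functional sequence depends only on the per-nucleotide mutation rate and the amount of functional sequence, not on how much junk surrounds it.)

    ```python
    # Toy illustration (illustrative numbers only): mutations strike per nucleotide,
    # so junk DNA dilutes the *fraction* of mutations that hit functional sequence,
    # but not their absolute number.
    MU = 1e-8          # assumed per-nucleotide mutation rate per generation
    FUNCTIONAL = 1e8   # hypothetical amount of functional sequence (bp)

    for junk in (0.0, 1e9, 4e9):           # genomes with no, some, and lots of junk
        genome = FUNCTIONAL + junk
        total_hits = MU * genome           # expected mutations per genome copy
        functional_hits = MU * FUNCTIONAL  # expected mutations in functional DNA
        print(f"genome {genome:.1e} bp: {total_hits:.2f} mutations total, "
              f"{functional_hits:.2f} in functional DNA "
              f"({functional_hits / total_hits:.0%} of all mutations)")
    ```

    The absolute number of hits in the functional portion stays the same in every case; only the proportion of all mutations changes, which is exactly the distinction being drawn here.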

  14. lochaber says

    David, Robert>

    hmm… yeah, I didn’t think that through completely. :/ I was thinking any given mutation would be less likely to affect functional DNA with a sufficiently large proportion of junk DNA, but with a large enough sample size (or time, or whatever), it should end up with equal amounts of mutation to the functional DNA, regardless of how much junk is present.

    I haven’t had coffee yet, does that count as an acceptable excuse?

  15. David Marjanović says

    I was thinking any given mutation would be less likely to affect functional DNA with a sufficiently large proportion of junk DNA

    …given genomes of identical sizes. But the sizes are not identical. Bigger genome, more mutations.

  16. says

    David Marjanovic, # 13

    Mutations happen per number of nucleotides

    Robert, # 14

    Having junk DNA … does not mean that fewer mutations happen in the functional genome.

    This isn’t exactly correct. If ionizing radiation is a cause of mutations, and a cell is exposed to a fixed flux of such radiation, then increasing the number of nucleotides in the cell is going to reduce the number of radiation-induced mutations per nucleotide – a photon can only be absorbed once.

    But we need to consider the amount of other potentially absorbing material that a photon needs to get past to reach the genome. In my estimation, this is vast compared to the mass of the genome, so increasing the mass of the genome by a few hundred percent doesn’t seem likely to make much difference to the photon flux – your fixed mutation rate per nucleotide will be approximately correct.

    We could stretch our imaginations and try to come up with some protective mechanisms, though. For example, if adjacent ionizing events on one molecule are disproportionately harmful, and high-energy impacts produce secondary particles with a short mean free path (I don’t expect there to be very many high-energy events, but there will be some – iron nuclei from the Crab Nebula and so forth), then padding between the sensitive regions with junk might be useful. Wild speculation – I’m just trying to illustrate that there can be complex mechanisms we haven’t thought of yet.

  17. rinn says

    While we are on the topic of junk DNA – is there an explanation why fugu fish have so little of it?

  18. chris61 says

    I think referring to ENCODE as a debacle is a little harsh. There’s a lot of really interesting science being done using that data. How much of the genome is functional will almost certainly depend on how you define function.

  19. gillt says

    fugu appears to have eliminated the whole genome duplication that occurred in early ray-finned fishes. So maybe there was selection in its life history for deletion mutations that led to the loss of weakly selected genes and of DNA.

    You could also ask why salamanders have such consistently large genomes as a group. It’s mostly intronic, suggesting some possible function (i.e., not the result of a duplicated genome or multiplication of repetitive sequences). What we need are Salamander and Fugu ENCODE consortiums.

  20. Amphiox says

    I think referring to ENCODE as a debacle is a little harsh. There’s a lot of really interesting science being done using that data.

    The “ENCODE debacle” does not refer to the scientific work done in that project, for the most part. It refers to the reporting of the alleged results of that scientific work.

  21. Amphiox says

    I mean, take a look at this PLOS link, and notice that it directly references quite a lot of ENCODE data as supportive of its key arguments.

  22. johnharshman says

    Amphiox: your link isn’t there.

    GillT: what makes you say that fugu have eliminated the whole-genome duplication? Do they, unlike other teleosts, have only four Hox clusters? I’m going to hazard a guess that this is not true, and that their small genomes result from having way less junk DNA than most other vertebrates. Do you have evidence against that hypothesis?

  23. johnharshman says

    If ionizing radiation is a cause of mutations, and a cell is exposed to a fixed flux of such radiation, then increasing the number of nucleotides in the cell is going to reduce the number of radiation-induced mutations per nucleotide – a photon can only be absorbed once.

    Ah, but a larger genome has a larger volume, and so will encounter a greater number of said photons than a smaller genome. Anyway, ionizing radiation isn’t the primary cause of mutation.

  24. gillt says

    @21: The usual suspects are plenty critical of ENCODE science.

    T. Ryan Gregory

    Well, the ENCODE papers go wrong in the abstract and the first item in the list of findings by using either an inflated figure of 80% and/or a misleading definition of “function”. That formed the basis of the (mis)communication with the media.

    Larry Moran:

    The ENCODE leaders are taking their results and using them to draw conclusions about the evolution of genomes. In doing so, they pretty much ignored thirty years of data on the subject of junk DNA. Most of the ENCODE leaders are biochemists and bioinformaticians.

    Dan Graur:

    The entire edifice of ENCODE was based on the assumption that whatever happens in cancer cell lines (some very very old) is relevant to human cells. Well, guess what? They aren’t and with the exception of the hype, nothing will be left of ENCODE.

  25. says

    Large amounts of junk DNA may provide an evolutionary advantage. More DNA to duplicate during cell division means more time and energy consumed. The extended time it would then take a cancerous cell to divide enough times for the tumor to become fatal would allow time for the organism to reproduce, the sine qua non of evolutionary success.

  26. ChasCPeterson says

    fugu appears to have eliminated the whole genome duplication that occurred in early ray-finned fishes.

    ? ze citation, please?
    I guess it could happen, some sort of autodiploid parthenogenesis event? A lucky one?

  27. David Marjanović says

    How much of the genome is functional will almost certainly depend on how you define function.

    The ENCODE definition of “functional DNA” is “anything ever binds to it”.

    You could also ask why salamanders have such consistently large genomes as a group.

    Lungfish, too; and don’t get started on amoebozoans.

    Ah, but a larger genome has a larger volume, and so will encounter a greater number of said photons than a smaller genome.

    One might think that this causes a greater total number of mutations in a multicellular organism – but it doesn’t, because increasing the number of nucleotides in the cell increases the size of the cell, all else being equal. The DNA density of the cell does not increase. It’s not quite clear how this works, but the correlation between genome size and the size of a particular cell type is so good that the genome sizes of extinct vertebrates can be calculated from the sizes for the holes for bone cells in fossils.

    But, as you say, the main cause of mutations isn’t ionizing radiation anyway! It’s mistakes by the replication and repair apparatuses.
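
    (As an aside on how such fossil estimates work in general: the sketch below, with invented numbers, shows the kind of calculation involved, i.e. fitting a power law between bone-cell lacuna volume and genome size across living species and then reading a fossil’s genome size off the fitted line. Nothing here reproduces the actual published regressions.)

    ```python
    # Minimal sketch of genome-size inference from cell size (hypothetical numbers).
    # Fit log(genome size) vs log(cell volume) for living species, then predict a
    # fossil's genome size from the volume of its bone-cell lacunae.
    import numpy as np

    # Hypothetical training data: lacuna volume (um^3) and genome size (Gb)
    cell_volume = np.array([200.0, 350.0, 600.0, 1500.0, 4000.0])
    genome_size = np.array([1.2, 2.0, 3.4, 8.0, 20.0])

    slope, intercept = np.polyfit(np.log(cell_volume), np.log(genome_size), 1)

    def predict_genome_size(lacuna_volume_um3: float) -> float:
        """Predict genome size (Gb) from lacuna volume via the fitted power law."""
        return float(np.exp(intercept + slope * np.log(lacuna_volume_um3)))

    print(predict_genome_size(800.0))  # fossil lacuna volume, purely illustrative
    ```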

  28. gillt says

    John Harshman:

    It doesn’t really matter what your hunch is, we’ve known for 20 years that fugu have four hox clusters and diverge greatly from other teleosts, which have 7.

    Your hypothesis is not much of an explanation of anything so it’s kinda hard to leverage evidence against.

  29. David Marjanović says

    More DNA to duplicate during cell division means more time and energy consumed. The extended time it would then take a cancerous cell to divide enough times for the tumor to become fatal would allow time for the organism to reproduce

    That’s a novel idea, but I can’t see it passing the onion test.

  30. gillt says

    Chas:

    speculative, of course.

    Fugu: a compact vertebrate reference genome

    If the compactness of the pufferfish genome is a derived, rather than a primitive trait, it has to be explained by accumulated deletions. Like Fugu, Drosophila has small introns and intergenic regions, and scarce pseudogenes and repeats. It was recently shown that the Hawaiian cricket, which has a genome 11 times larger than Drosophila, loses DNA 40 times more slowly [15]. Thus, a strong bias towards deletion mutations would lead to the loss of DNA that was too weakly selected to overcome the bias, or was extraneous. Such a bias would account for the small genomes of pufferfish. Investigation of insertion/deletion frequencies in the close relatives of Fugu which have larger genomes should confirm if there is indeed a deletion bias in the Fugu.

  31. says

    johnharshman, #24

    Ah, but a larger genome has a larger volume, and so will encounter a greater number of said photons than a smaller genome.

    If its volume is proportional to the number of nucleotides, then its cross-sectional area is not. This will grow more slowly, so the flux per nucleotide will be smaller for a larger genome. But as you suggest, I think it’s somewhat academic.
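
    (A toy geometric version of that scaling, assuming the genome behaves like a compact blob of constant density, which real chromatin certainly is not: volume grows linearly with the number of nucleotides N, the cross-section only as N to the 2/3 power, so photon interceptions per nucleotide fall off only as N to the −1/3.)

    ```python
    # Toy scaling model: a compact blob of DNA with volume proportional to the
    # number of nucleotides N intercepts photons in proportion to its cross-section,
    # which scales as N**(2/3).  Hits per nucleotide therefore scale as N**(-1/3).
    def relative_hits_per_nucleotide(n_ratio: float) -> float:
        """Per-nucleotide interception rate of a genome n_ratio times larger,
        relative to the original (original = 1.0)."""
        return n_ratio ** (2.0 / 3.0) / n_ratio   # area / nucleotides = N**(-1/3)

    for ratio in (1, 2, 5, 10):
        print(f"{ratio}x genome: {relative_hits_per_nucleotide(ratio):.2f}x "
              "photon hits per nucleotide")
    ```

    Even a tenfold larger genome only roughly halves the per-nucleotide interception rate in this toy model, which fits the “somewhat academic” verdict above.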

  32. Amphiox says

    Amphiox: your link isn’t there.

    By “this link”, I meant the link in the OP.

  33. Amphiox says

    More DNA to duplicate during cell division means more time and energy consumed. The extended time it would then take a cancerous cell to divide enough times for the tumor to become fatal would allow time for the organism to reproduce

    Considering that we already have empirical evidence that an average tumor cell from a modern organism, with all the junk DNA and everything, can complete a cell cycle in around 24h in ideal conditions, this seems unlikely.

    I.e., you’re never going to be able to extend the time a tumor takes to grow by expanding the amount of DNA that needs to be duplicated, because that just isn’t the critical rate-limiting step in tumor growth.

  34. Amphiox says

    Ah, but a larger genome has a larger volume, and so will encounter a greater number of said photons than a smaller genome.

    Pertaining to the putative advantage of having more junk DNA from the perspective of mutation protection, we must keep in mind that ionizing radiation isn’t the only or the most abundant source of germ-line mutations.

    The most abundant source of germ-line mutations is probably DNA replication errors, and those occur at a per-base-pair rate, and thus can not be ameliorated with additional non-coding DNA, no matter how vast.

  35. Christopher says

    Excess, non-functional cruft is what you’d expect from unguided evolution.

    An article from 1998 about using evolutionary algorithms to evolve an FPGA design that could distinguish between a 1 kHz and a 10 kHz signal:

    http://discovermagazine.com/1998/jun/evolvingaconscio1453

    Strangely, Thompson has been unable to pin down how the chip was accomplishing the task. When he checked to see how many of the 100 cells evolution had recruited for the task, he found no more than 32 in use. The voltage on the other 68 could be held constant without affecting the chip’s performance. A chip designed by a human, says Thompson, would have required 10 to 100 times as many logic elements—or at least access to a clock—to perform the same task. This is why Thompson describes the chip’s configuration as flabbergastingly efficient.

    It wasn’t just efficient, the chip’s performance was downright weird. The current through the chip was feeding back and forth through the gates, swirling around, says Thompson, and then moving on. Nothing at all like the ordered path that current might take in a human-designed chip. And of the 32 cells being used, some seemed to be out of the loop. Although they weren’t directly tied to the main circuit, they were affecting the performance of the chip. This is what Thompson calls the crazy thing about it.

  36. Marc Abian says

    Alright, what about transposable genetic elements? The more junk, the less likely they are to insert into something useful.

  37. neuroguy says

    I had this fantasy that science was a meritocracy and that great scientists advanced by having deep knowledge and doing great work…

    BWAHAHAHAHA…….. Rosalind Franklin. I rest my case, your honor.

    Back on topic, I defer to the experts, but wouldn’t there be evolutionary pressure to get rid of “junk” DNA if in fact its existence were deleterious to survival or reproduction? If its existence were beneficial, it wouldn’t be “junk” by definition. If its existence were neutral, it could conceivably disappear via genetic drift, correct? This is not my field, but naively it seems there was more alleged “junk” than evolutionary theory would actually predict, so it’s not surprising that some functions have been found for some of it.

  38. Bernard Bumner says

    The rate of DNA insults caused by e.g. oxidative damage by free radicals, metabolites, cosmic rays, background radiation, UV, is much greater than the replication error rate, isn’t it? The estimates for rates of oxidative modifications of DNA vary massively (I’ve seen figures ranging from 100s to millions per day per cell, and that commonly quoted number of 10,000, which I admit I’ve never bothered to read the reference for). In that case, smaller functional content by proportion of genome could be advantageous, I think?

    Hmm. I used to share a lab with a DNA repair group, but I’ve essentially forgotten almost everything I knew on the subject, and any small crumb of information is hopelessly out of date.

  39. gillt says

    The ENCODE definition of “functional DNA” is “anything ever binds to it”.

    The ENCODE paper:

    Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure)

    I see nothing wrong with that, or with the more recent framing:

    The major contribution of ENCODE to date has been high-resolution, highly-reproducible maps of DNA segments with biochemical signatures associated with diverse molecular functions. We believe that this public resource is far more important than any interim estimate of the fraction of the human genome that is functional.

  40. johnharshman says

    gillt:

    It doesn’t really matter what your hunch is, we’ve known for 20 years that fugu have four hox clusters and diverge greatly from other teleosts, which have 7.

    If we’ve known that for 20 years, it seems odd that I can’t find it anywhere in the literature and can in fact find statements that fugu have 7 Hox clusters. Can you cite something?

    Your hypothesis is not much of an explanation of anything so it’s kinda hard to leverage evidence against.

    It seems that the source you’ve quoted supports this claim at least partially, so I’m surprised you say that. “Like Fugu, Drosophila has small introns and intergenic regions, and scarce pseudogenes and repeats.” I would certainly count introns (barring the splice signals and the occasional regulatory element) as junk; pseudogenes and repeats too.

  41. David Marjanović says

    BWAHAHAHAHA…….. Rosalind Franklin. I rest my case, your honor.

    To be fair, that was sixty years ago.

    Back on topic, I defer to the experts, but wouldn’t there be evolutionary pressure to get rid of “junk” DNA if in fact its existence were deleterious to survival or reproduction? If its existence were beneficial, it wouldn’t be “junk” by definition. If its existence were neutral, it could conceivably disappear via genetic drift, correct? This is not my field, but naively it seems there was more alleged “junk” than evolutionary theory would actually predict, so it’s not surprising that some functions have been found for some of it.

    There is some degree of selection pressure for losing junk DNA, because smaller genomes can be replicated faster. But there’s no mechanism to selectively lose junk DNA! You have to wait for random deletion mutations that happen not to take out anything useful.

    Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure)

    I see nothing wrong with that

    Emphasis added.

    Of the many proteins that bind to DNA, none binds truly exclusively to a single sequence. All bind, at varying strength, to DNA in general. This is chemistry we’re talking about, not macroscopic mechanics.

    That bit about “a specific chromatin structure” is also strange. What if certain sequences are folded up tightly precisely because they’re not in use?

  42. johnharshman says

    Alright, what about transposable genetic elements? The more junk, the less likely they are to insert into something useful.

    The problems with that are at least twofold. First, it doesn’t pass the onion test. Why should onions need four times the defense against transposable elements as humans? Second, a high proportion of junk DNA is transposable elements. Is junk DNA then a defense against junk DNA?

  43. Marc Abian says

    First, it doesn’t pass the onion test. Why should onions need four times the defense against transposable elements as humans? Second, a high proportion of junk DNA is transposable elements. Is junk DNA then a defense against junk DNA?

    I’m thinking in terms of incidental benefits only. With such a view, the answer to your second question is yes, provided junk DNA makes detrimental insertions less likely.

    An onion might appreciate 4 times the protection. Who knows? As for the onion test itself, it remains to be demonstrated it’s reliable. Consider bacteria. They have small genomes, and very little junk. If you look at the bacteria that have evolved to become intracellular symbionts, their genome size is markedly reduced. Clearly there’s a higher degree of selective pressure to get rid of DNA, especially junk DNA, in bacteria than in higher eukaryotes (eukaryotes which are not yeast). If we accept that ya gots different strokes for different folks, then why is the assumption that onions and humans have the same selective pressures (or even requirements, though I doubt much of what is considered junk now is required) with respect to junk DNA?

    Anyway, just something to think about. I’d be surprised if we differ in our opinions about junk DNA.

  44. yubal says

    Let me start off with the notion that this:

    Nevertheless, they also pointed out that evolutionary considerations, information regarding genome size diversity, and knowledge about the origins and features of genomic components do not support the notion that all of the DNA must have a function by virtue of its mere existence.

    Doesn’t actually make any sense whatsoever on a molecular level.

    Although I am getting tired of pointing this out, the entire debate is flawed by the views of scientists who work on genetics. Yes. I said it. True “Junk DNA” is by definition not genetic, and by simple consideration it is harmful (just estimate how much ATP a cell has to pay to maintain/replicate that useless extra DNA for one cycle with respect to the net available ATP and you are in the game; the lower numbers are still ridiculously high).

    What geneticists generally don’t get is that their neat (and pretty cool) cause-and-effect relationships over myriads of regulatory cascades of gene expression and response, whether factoring in environmental factors or not, actually have nothing to do with the C-value paradox.

    By focusing on the coding and regulatory sequences of the genome, people came up with “function” relationships. These are good to know and oftentimes really cool, but the underlying problem with “Junk DNA” is that people still stick to a primitive concept of what DNA is supposed to do. Yes, DNA carries sequence information, but it does not function like that by itself; it functions like that only when assembled and managed by the proteins and RNA that derive from its own coding function.

    You know what DNA really does by itself? It excludes volume (space). It forms gel-like structures, following a complicated law, that are intrinsically dynamic and extremely hard to visualize. The result is an ultra-complex landscape of excluded (or enclosed!) solvent volumes that interact with the protein matrix of a cell, or, in the eukaryotic case, of the nucleus. Every organism maintains and organizes its own genome in similar but different manners. Our cells pack the DNA in protein filaments and pack them even more to save space. But if they open it up to read it out they require even more space, let alone the maintenance machinery that needs to interact with the DNA. If you pack all genetic information as tightly as you could (6 possible open reading frames and creative usage can minimize a genome even more, but that would require some SERIOUS intelligence to design) you will run into serious problems. Not genetically, structurally.

    Take a yeast or something like that and delete all Junk DNA. What do you see? Are there periodic patterns if you delete increasing fractions in the same area of “junk”? What is the average persistence length of dsDNA in conserved “junk” sequences compared to the whole genome/random sequences?

    Is anybody testing for the obvious questions here?

  45. chris61 says

    The “onion” test presupposes that 80% of the onion genome also demonstrates evidence of biochemical/biological function. As far as I know that hasn’t been shown. The issue of whether cancer cell lines are relevant to what goes on in normal cells I agree is valid but presumably further experimentation will resolve that.

  46. Amphiox says

    I had this fantasy that science was a meritocracy and that great scientists advanced by having deep knowledge and doing great work…

    The scientific method is designed so that the IDEAS with the most merit eventually rise to the top, while those with less merit are systematically eliminated.

    But it has jack-all to say or do about crediting the people who propose those ideas.

    The “onion” test presupposes that 80% of the onion genome also demonstrates evidence of biochemical/biological function. As far as I know that hasn’t been shown.

    Are you proposing then that the onion has much less than 80% of its genome with biochemical/biological function, to account for the much larger genome compared to humans? If so, the non-functional DNA must still constitute the majority of the DNA in many species, including the onion, just not humans, in which case, you are presupposing that humans are somehow special with a special genome.

    Then of course the onion test doesn’t just refer to onions. It refers to the WHOLE range of diversity of genome sizes among eukaryotes. What then of the species with tiny genomes, many times smaller than that of humans? If the 80% of the human genome that ENCODE finds biochemical function for were important, then how are all these species with much, much, much less than 80% of the DNA that humans have managing with so little DNA? Are you suggesting that they are ALL several TIMES less complex than humans? That too is a presupposition.

  47. Amphiox says

    There is a bigger implication to this whole Junk DNA issue.

    If the Nearly Neutral Theory is true, then it almost follows tautologically, based on the known mechanisms of mutation, that the majority of DNA in eukaryotic species MUST be non-functional.

    If a majority of the DNA of most eukaryotic species is actually functional, then the Nearly Neutral Theory must be wrong.

    But of course how then do we explain all the other genomic and evolutionary phenomena that the Nearly Neutral Theory explains, and that no other sub-theory of evolution explains as well?
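
    (To make the “nearly neutral” logic concrete, here is a minimal sketch using Kimura’s standard diffusion formula for the fixation probability of a new mutation; the population size and selection coefficients are illustrative only. Once |s| is much smaller than roughly 1/(4N), a slightly deleterious insertion fixes at essentially the neutral rate, which is how mildly costly junk can accumulate by drift.)

    ```python
    import math

    def fixation_probability(N: float, s: float) -> float:
        """Kimura's fixation probability for a new mutant at frequency 1/(2N)
        with selection coefficient s (diffusion approximation)."""
        if abs(s) < 1e-12:                 # neutral limit
            return 1.0 / (2.0 * N)
        return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * N * s))

    N = 10_000                             # illustrative effective population size
    neutral = 1.0 / (2.0 * N)
    for s in (0.0, -1e-6, -1e-5, -1e-4, -1e-3):
        p = fixation_probability(N, s)
        print(f"s = {s:>8}: P_fix = {p:.2e}  ({p / neutral:.2f}x neutral)")
    ```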

  48. Amphiox says

    I had this fantasy that science was a meritocracy and that great scientists advanced by having deep knowledge and doing great work…

    BWAHAHAHAHA…….. Rosalind Franklin. I rest my case, your honor.

    It should be noted for the record, however, that while Rosalind Franklin did not get the full credit for the amount of work she did on the structure of DNA, she DID get at least some credit and accolades and rewards and career advancement for a lot of the other work she did during the course of her career. Not every institution she worked for in her career had the same degree of ingrained misogyny as King’s College.

  49. yubal says

    @ David Marjanović #42

    There is some degree of selection pressure for losing junk DNA, because smaller genomes can be replicated faster.

    Time is not as much of an issue considering the energetic cost.

    But there’s no mechanism to selectively lose junk DNA! You have to wait for random deletion mutations that happen not to take out anything useful.

    Again, time is not of the essence, and if junk were neutral the build-up and deletion of junk should converge to a low equilibrium over time, not to those high ratios we observe. Do I need to factor in something else here?

    Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure)

    That is the point I would always dispute anyone about. Why do you define it like that? Is there any reason to do so?

    Of the many proteins that bind to DNA, none binds truly exclusively to a single sequence.

    Endonucleases lacking star activity? At least they act on specific sequences, not sure about binding.

    All bind, at varying strength, to DNA in general. This is chemistry we’re talking about, not macroscopic mechanics.

    First order of attraction is the charge. First-wave DNA-binding proteins tend to have a positive charge that attracts negatively charged DNA. Later ones require protein-protein interaction to land on a specific site marked by another protein. (I know you knew that)

    That bit about “a specific chromatin structure” is also strange. What if certain sequences are folded up tightly precisely because they’re not in use?

    Or loosely? Exactly my point. Follow the white rabbit.

  50. Bernard Bumner says

    What do the binding constants look like? More to the point, what is the effective concentration of DNA in any given cell?

  51. Ichthyic says

    I had this fantasy that science was a meritocracy

    wait… surely that died even back when you were a grad student? If not, your university did not do its job.

    Berkeley was fantastic at teaching the lesson that science is often anything but a meritocracy, especially within academia.

  52. chris61 says

    Amphiox @47

    “Are you proposing then that the onion has much less than 80% of its genome with biochemical/biological function, to account for the much larger genome compared to humans? If so, the non-functional DNA must still constitute the majority of the DNA in many species, including the onion, just not humans, in which case, you are presupposing that humans are somehow special with a special genome.”

    I’m just saying that ENCODE is one project to annotate the human genome. There are other projects to annotate other genomes. What do they say? The mouse project for example. Is any part of all this biochemical activity associated with non coding sequence conserved between human and mouse? If it isn’t then that to me would seem to support the idea that most of that activity is non functional. If it is, then maybe evolutionary theory will have to rethink its assumptions.

  53. David Chapman says

    28
    David Marjanović

    It’s not quite clear how this works, but the correlation between genome size and the size of a particular cell type is so good that the genome sizes of extinct vertebrates can be calculated from the sizes for the holes for bone cells in fossils.

    Wow! That’s kinda cool. :)

    If the relationship is that rigid, doesn’t that suggest a possible reason for all the junk DNA? I mean if there were some other, unrelated reason why our human cells worked well, functioned efficiently or whatever if they were relatively large, then all that DNA could be there simply to induce large cells.
    I’m presuming of course that as a rule, the larger the genome, the larger the particular cell type. But it’s also possible that all that junk everywhere might be so that just a few cell types are large, if this was of benefit. Say for example brain cells worked better if they were larger. And of course all the junk would have to be reproduced all over the body just to ensure neural largeness. So long as cell-largeness is not too detrimental to something else, elsewhere, this would work evolutionarily.

    So junk DNA would ‘code’ after all — pardon me pissing around with the technical terms — it would code for large cells.

  54. gillt says

    That is the point I would always dispute anyone about. Why do you define it like that? Is there any reason to do so?

    Without making excuses for overhyping, I would say large consortiums such as ENCODE aren’t interested in or equipped for functionally characterizing or validating their results, which is why they define functional more as just reproducible. They’re creating a database and making it publicly available. It’s hubris for people who’ve never used ENCODE and never will use ENCODE to say it’s a boondoggle. How could they know?

    Can you cite something?

    http://www.cell.com/current-biology/pdf/S0960-9822(06)00284-3.pdf

    Why should onions need four times the defense against transposable elements as humans?

    I see little value in armchair philosophizing like this. If the hypothesis is that they do, you go looking for evidence for or against it.

    It seems that the source you’ve quoted supports this claim at least partially…

    Your claim was that fugu have less junk which you seem to think is contra to the claim that fugu no longer has a duplicated genome. I don’t know how I’m supposed to respond to that. Again, here’s the paper I cited.

    fugu: a compact vertebrate reference genome
    http://www.sciencedirect.com/science/article/pii/S0014579300016598#BIB8

  55. gillt says

    It’s not quite clear how this works, but the correlation between genome size and the size of a particular cell type is so good that the genome sizes of extinct vertebrates can be calculated from the sizes for the holes for bone cells in fossils.

    Is it cell size or nucleus size? I thought that’s the reason why some salamanders (with enormous genomes) have enucleated red blood cells as a prevention against blood clotting.

  56. Paul Brown says

    I spend a fair bit of my time hanging out with folk who loosely classify themselves as bio-informatics people. The big idea is to use computers to wring “the truth” out of the genes. I’m a pure CS / Math type myself so my Biology is far from top shelf. But I acknowledge this, and don’t stretch myself in that direction.

    Yet what I do notice is a bias in the way CS / programming types in Bio-IT frame the problems they study. There is assumed to be “an answer” in there somewhere, and that answer can be figured out from the de-contextualized data. As Shakira might put it, “the SNIPs don’t lie”. Whatever the computer tells you must be right / significant / interesting. But of course, if you torture data enough, it will tell you anything you want to hear.

    From the fact that you can *find* correlations, it doesn’t follow that what you’ve found reveals anything. We can bring all kinds of mathematical gee-whizzery to bear on the data. But quite what, for example, the eigenvalues of adjacency matrices *mean* in a Bio-IT context (other than that you can write a paper with lots of Greek letters in it and, by making your data and methods public, tick your reproducibility requirement) is beyond me.

    Someone once said that the greatest mistake a crafter can make is to fetishize their tools. Based on my (limited and probably distorted) exposure to Bio-IT and Bio-IT types, they’re at serious risk of committing this error.

  57. johnharshman says

    Can you cite something?

    http://www.cell.com/current-biology/pdf/S0960-9822(06)00284-3.pdf

    Far out of date, I’m afraid. Try this: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC314266/

    Your claim was that fugu have less junk which you seem to think is contra to the claim that fugu no longer has a duplicated genome.

    Ah, I see. Yes, they are compatible, but one should easily be able to distinguish a simple reduction in junk from an unduplication of the genome. For example, according to my hypothesis, deletions should be concentrated in the junk bits, while according to yours, not. And according to mine, fugu should have 7 Hox clusters (or maybe even 8), while according to yours, they should have 4. I think the data fit mine better than yours.

    Also, what mechanism would result in de-polyploidization? Are you just referring to random loss and/or divergence of extra gene copies here or to something more strange?

    Is it cell size or nucleus size?

    Cell size. It works for plants too: http://www.sciencemag.org/content/264/5157/421

  58. David Marjanović says

    The “onion” test presupposes that 80% of the onion genome also demonstrates evidence of biochemical/biological function.

    Scroll back up to the ENCODE definition of function.

    There is some degree of selection pressure for losing junk DNA, because smaller genomes can be replicated faster.

    Time is not as much of an issue considering the energetic cost.

    The time required to replicate the genome is commonly a limiting factor for the speed of reproduction in bacteria. It isn’t in eukaryotes, AFAIK – but energy may not be either; we have mitochondria, we have energy to burn.

    (…Obvious test: there are a few eukaryotes without mitochondria. I have no idea about their genome sizes.)

    if junk were neutral the build-up and deletion of junk should converge to a low equilibrium over time, not to those high ratios we observe

    I don’t understand why.

    Endonucleases lacking star activity? At least they act on specific sequences, not sure about binding.

    Binding is of course less specific. They may bind more strongly to their target sequences than to others, but ENCODE counted any binding.

    That bit about “a specific chromatin structure” is also strange. What if certain sequences are folded up tightly precisely because they’re not in use?

    Or loosely? Exactly my point. Follow the white rabbit.

    I don’t understand what that means.

    If the relationship is that rigid, doesn’t that suggest a possible reason for all the junk DNA? I mean if there were some other, unrelated reason why our human cells worked well, functioned efficiently or whatever if they were relatively large, then all that DNA could be there simply to induce large cells.

    The size of cells does have some degree of correlation to metabolism (because of the ratio of surface to volume). This may be why dinosaurs and bats have small genomes for tetrapod measures – but it completely fails to explain the fugu.

    Importantly, however, this effect is completely independent of the sequence of the DNA. It can’t cause any selection pressure to conserve any particular part of junk DNA.

    Say for example brain cells worked better if they were larger. And of course all the junk would have to be reproduced all over the body just to ensure neural largeness.

    Many plants have extra-large cells – epidermal hairs come to mind – that are huge single cells. These cells (and not the rest of the organism) are massively polyploid; Arabidopsis hairs get to 32 haploid genomes per cell.

    It’s hubris for people who’ve never used ENCODE and never will use ENCODE to say it’s a boondoggle. How could they know?

    …We’re talking about the way the ENCODE team interprets its results. The word game they play with functional is a boondoggle; there’s no way around that.

  59. David Marjanović says

    Many plants have extra-large cells […] that are huge single cells.

    …eh… yeah.

  60. gillt says

    The 2004 Amores paper says

    Results showed that the organization of pufferfish Hox clusters is similar to that of other teleosts (Amores et al. 1998; Naruse et al. 2000), refuting the hypothesis that morphological simplification is a direct result of the reduction in number of Hox clusters….Both species have at least seven Hox clusters, including two copies of Hoxb and Hoxd clusters, a single Hoxc cluster, and at least two Hoxa clusters, with a portion of a third Hoxa cluster in fugu.

    But the earlier Amores 1998 paper says

    The divergent Fugu Hoxd cluster (5) branches with high bootstrap value (965) with the HOXA clusters of other vertebrates (Fig. 2D). We conclude that Fugu has two orthologs of the tetrapod HOXA cluster and no described Hoxd cluster.

  61. johnharshman says

    Yes. The earlier paper is incorrect; science advances. At least, we now know that fugu has both Hoxda and Hoxdb. Interestingly, though it has 7 Hox clusters, it lacks one of the zebrafish clusters and has the one zebrafish is missing, which demonstrates nicely that there were originally 8 Hox clusters in teleosts.

    And incidentally shows that de-polyploidization, by whatever mechanism you suppose, is at least not complete.

  62. David Chapman says

    59
    David Marjanović

    The size of cells does have some degree of correlation to metabolism (because of the ratio of surface to volume). This may be why dinosaurs and bats have small genomes for tetrapod measures

    Interesting! :)

    …….( What are tetrapod measures? )

    – but it completely fails to explain the fugu.

    Fuck the fugu!

    (Ohhohhh those neurotoxins! )

    Importantly, however, this effect is completely independent of the sequence of the DNA. It can’t cause any selection pressure to conserve any particular part of junk DNA.

    But the selection pressure is there to conserve junk DNA in general, if there is any benefit to having large cells in various tissues.

    I suppose some types of white blood cells would benefit from being fairly butch, the better to deal with the heftier weight divisions of microbe….

  63. Amphiox says

    Is any part of all this biochemical activity associated with non coding sequence conserved between human and mouse? If it isn’t then that to me would seem to support the idea that most of that activity is non functional.

    If they use ENCODE’s definition of biochemical activity, then no matter what they find, it will support no such idea, as ENCODE’s definition of biochemical activity is simply too broad. So broad that even conservation of such activity in significant proportion would not be distinguishable from chance.

  64. Amphiox says

    Simply put, because they screwed up the definition, that part of ENCODE simply says nothing and cannot be used or extrapolated or compared or anything whatsoever with respect to the question of whether or not non-coding DNA has biologically significant function.

  65. chris61 says

    Amphiox @64

    “If they use ENCODE’s definition of biochemical activity, then no matter what they find, it will support no such idea, as ENCODE’s definition of biochemical activity is simply too broad. So broad that even conservation of such activity in significant proportion would not be distinguishable from chance.”

    What I meant was whether specific biochemical activities are conserved between human and mouse, as in specific patterns of histone post-translational modifications or specific transcription factor binding, etc. DNA sequence analysis does a fairly good job of identifying coding regions, but regulatory regions of DNA, not so much. Histone post-translational modifications and specific transcription factor binding do a better job of identifying regulatory regions, so it seems to me that if one wants to look for evolutionary conservation in non-coding sequence, that is where one should look rather than at DNA sequence.

  66. johnharshman says

    How can you say the specific transcription factor binding sites are conserved (in location, not sequence) if you can’t even align the sequences to determine homologous sites?

  67. chris61 says

    johnharshman @67 Sequences are aligned relative to coding regions. Non coding regions are examined for common patterns of transcription factor binding sites and histone modifications.

  68. Nerd of Redhead, Dances OM Trolls says

    I’m on and off reading Koonin’s The Logic of Chance: The Nature and Origin of Biological Evolution, and in Chapter 8, where I am, he discusses this topic of why, for plants and animals, the genome isn’t constantly purified of duplications and junk DNA. What I’ve read so far indicates that since evolution for anything above bacteria works more on development than on individual proteins, having extra genes floating around that might become useful with a little tweaking is more important than a tight genome. I could be wrong, of course.

  69. gillt says

    The point of contention revolves around the employment of the word “function”, and people opposed to ENCODE’s definition decided that, because of that disagreement, the entire ENCODE project is a boondoggle.

    Also, what mechanism would result in de-polyploidization? Are you just referring to random loss and/or divergence of extra gene copies here or to something more strange?

    Not sure what the case is with fugu, but there’s a negative correlation between recombination rates and genome size.

    In addition, salamander genomes are largely the result of increased intronic space (most of which is junk) and may or may not correspond to slower overall metabolic rate. On a tangential note, don’t indels become less deleterious as coding density decreases according to mutational equilibrium models?

  70. johnharshman says

    chris61:

    Sequences are aligned relative to coding regions. Non coding regions are examined for common patterns of transcription factor binding sites and histone modifications.

    So you’re saying that any transcription factor binding site a similar distance from an exon should be considered homologous? How similar? Remember that indels are constantly changing the lengths of sequences in most parts of the genome. Your idea of alignment seems so vague as to be untestable. Now, I suppose if you got a whole lot of data on binding of many particular transcription factors, and if the rough ordering of binding sites were constant for some region roughly the same distance from an exon, I might allow that there might be some sort of conservation. Is that what you mean? My expectation would be that there would be no more conservation of that than of sequence, but who knows? Whatever the function of such things in human and mouse, though, it seems odd that fugu do without them.

    gillt:

    Not sure what the case is with fugu, but there’s a negative correlation between recombination rates and genome size.

    Was that a response to the bit you quoted from me? Because I can’t see any connection. Let me ask again: what does “no longer has a duplicated genome” mean? What process would unduplicate a genome?

  71. chris61 says

    johnharshman @71

    “Now, I suppose if you got a whole lot of data on binding of many particular transcription factors, and if the rough ordering of binding sites were constant for some region roughly the same distance from an exon, I might allow that there might be some sort of conservation. Is that what you mean?”

    Yes, that’s what I mean. Not necessarily order but binding sites that co-localize. I assume if such regions show up that some percentage of them would be tested empirically and would either identify regulatory regions or fail to do so. Either way it tests the hypothesis.

    I’m not saying that all this biochemical activity identified by ENCODE translates into biologically significant function but I don’t think one can dismiss it on the grounds that current evolutionary theory predicts that it shouldn’t have function. Function is testable. So let’s test some of it and see what kind of function it has.

  72. johnharshman says

    I think one can dismiss it on grounds that we have very good evidence for the lack of function in most of the human genome. And non-conservation of sequence is part of that evidence. Further, it does appear that random DNA shows about as much “biochemical activity” as ENCODE found; shouldn’t that make you suspicious?

  73. says

    Yeah, OK, but given that 1% of the world owns something like 60% of the wealth, that not one of them is a scientist, that half of them are in the energy industry (total guess), and that those guys pay millions a year to a denialist machine to tell the masses that being a scientist makes you rich etc. etc. etc. and that that’s why they “believe in global warming”: please don’t imply (twice!) that any particular kind of scientist is rolling in dough handed over to them because they do science!

    Thank you that is all.

  74. gillt says

    Was that a response to the bit you quoted from me? Because I can’t see any connection. Let me ask again: what does “no longer has a duplicated genome” mean? What process would unduplicate a genome?

    Simple. The process I mentioned is correlated with an incredible shrinking genome. Again, nobody knows for sure how the fugu genome contracted. One explanation is a reduction in large insertions and a higher rate of deletions, which may include a reduction in redundant duplicated genes.

  75. Amphiox says

    If during the process of meiosis in an organism that previously had a genome duplication there was an error that resulted in one additional round of cell division (without a matched DNA duplication), and the resultant gamete was viable, then the offspring that results from that gamete would “no longer have a duplicated genome”.

    If, in one of those organisms whose life cycle includes both free-living diploid and haploid phases, a mutational event occurs that allows the haploid phase to continue propagating without going into the diploid phase, and if the original organism had a genome duplication, the newly independent “haploid” organism would “no longer have a duplicated genome.”

    I’m not aware of whether any actual examples of the above processes are known, but it seems to me that both are theoretically possible.

    It also seems possible for an organism, after a genome duplication, to sequentially lose all or at least most of the redundant copies to individual deletion mutations, large and small, and if enough are lost, that too would count as a genome that is “no longer duplicated”.

  76. chris61 says

    johnharshman @73

    “I think one can dismiss it on grounds that we have very good evidence for the lack of function in most of the human genome. And non-conservation of sequence is part of that evidence. Further, it does appear that random DNA shows about as much “biochemical activity” as ENCODE found; shouldn’t that make you suspicious?”

    There is plenty of evidence and theoretical grounds as well for believing conservation of sequence implies function but I’m not convinced that the converse is true. Also what other evidence is there that most of the human genome lacks function? Also do you have a link to random DNA showing as much “biochemical activity” as ENCODE found? Whether or not that makes me suspicious depends on the details of the experiments.

  77. David Marjanović says

    What are tetrapod measures?

    I mean in comparison to other tetrapods (limbed vertebrates).

    But the selection pressure is there to conserve junk DNA in general, if there is any benefit to having large cells in various tissues.

    That would do nothing against huge turnover rates in junk DNA.

    In addition, salamander genomes are largely the result of increased intronic space (most of which is junk) and may or may not correspond to slower overall metabolic rate.

    I’m not sure exactly how low the metabolic rates of salamanders are, but they do have huge cells, which are, all else being equal, expected to cause lower metabolic rates because they have less surface per volume.

    On a tangential note, don’t indels become less deleterious as coding density decreases according to mutational equilibrium models?

    …Are you still assuming that mutations (indels in this case) happen a certain number of times per genome instead of per number of nucleotides?!?

    It also seems possible for an organism, after a genome duplication, to sequentially lose all or at least most of the redundant copies to individual deletion mutations, large and small, and if enough are lost, that too would count as a genome that is “no longer duplicated”.

    Reference, please! :-)

    Also what other evidence is there that most of the human genome lacks function?

    Well, for one thing, over half of it consists of retrovirus corpses in all stages of decay, and most of the rest consists of repeats.

  78. David Marjanović says

    chris61: did you read Palazzo & Gregory’s review?

    *lightbulb moment* Gregory! T. Ryan “onion test” Gregory!

    I’m reading it now. It’s in open access and mentions the onion test. :-)

  79. David Marjanović says

    …They forgot the unit on the x-axis of figure 1. o_O It must be a million basepairs, given that the human genome is a bit over 3 billion.

  80. gillt says

    I’m not sure exactly how low the metabolic rates of salamanders are, but they do have huge cells, which are, all else being equal, expected to cause lower metabolic rates because they have less surface per volume.

    Simply put, smaller overall cell size means lower metabolic demands and a faster-growing organism. This is why I originally asked if it’s more nucleus rather than overall cell volume that matters.

    Genome size, cell size, and the evolution of enucleated erythrocytes in attenuate salamanders.

    …Are you still assuming that mutations (indels in this case) happen a certain number of times per genome instead of per number of nucleotides?!?

    Since this is the first time I’ve brought it up, you’re confusing me with someone else. But now I’m confused: are you ignorant of the idea, or do you just not like it?

    Deletion Rate Evolution and Its Effect on Genome Size and Coding Density

  81. David Marjanović says

    It also seems possible for an organism, after a genome duplication, to sequentially lose all or at least most of the redundant copies to individual deletion mutations, large and small, and if enough are lost, that too would count as a genome that is “no longer duplicated”.

    Reference, please! :-)

    *facepalm* That was a deletion mutation in my comment, specifically the deletion of a sequence overlapping with the one I actually wanted to delete.

    What I wanted to say is that this process is thought to have happened to yeast. The reason we can tell, however, is that the reversal of the duplication wasn’t perfect, so that some genes are still duplicated.

    Genome size, cell size, and the evolution of enucleated erythrocytes in attenuate salamanders.

    Oh, specifically miniaturized plethodontid salamanders. Anything that makes blood cells pass through miniaturized vessels will be selected for in those, so it’s not surprising they’ve managed to evolve enucleated red blood cells.

    Deletion Rate Evolution and Its Effect on Genome Size and Coding Density

    Interesting. However, that paper talks about “deletion rates per nucleotide” (p. 1422), so you weren’t assuming that deletions happen per genome instead; I misinterpreted you.

    I don’t agree with part of the methods of the paper, however. My biggest problem with the paper is that it assumes deletion and insertion rates can be selected for. Deletions and insertions happen only, as far as I knew, by chromosome-scale events and by slippage of DNA polymerase, except for the activity of retroviruses and (retro)transposons; there are no DNases on the loose in a nucleus, the whole genome of a bacterium is protected against its restriction endonucleases, I can’t think of a way that double-strand break repair could improve, and I can’t imagine how the spindle apparatus could change in such a way as to increase only the deletion rate. (Admittedly, I have next to no idea about how plasmids are distributed to daughter cells, except that they’re very easily lost in general. But then, plasmids are tiny and irrelevant to eukaryotic junk DNA.) If I’m not grossly off here, only individual deletions and insertions can be selected for or against. If the whole genome is under strong selection for small size, as it is in (say) Escherichia coli, any individual deletion of anything that doesn’t provide an even stronger selective advantage will be selected for; in your average eukaryote, where the mitochondria provide energy to burn and where generation time doesn’t depend on genome replication time, that won’t generally be the case.

    The paper does briefly mention this problem and mentions, on p. 1429, one additional mechanism for deletion (and not insertion) that probably can mutate in ways that can be selected for. But it’s only known in E. coli, and I don’t think anything similar occurs in eukaryotes. That’s too bad, because, as far as I can see, this is the only way that the deletion rate and the insertion rate could vary independently (as assumed by equations 1 and 2). Otherwise, the same enzyme does both.

  82. chris61 says

    johnharshman @79

    Yes I did read Palazzo & Gregory’s review. I started to make a long reply but instead I will suggest you read Kellis et al. PNAS 111, 6131 (no idea how to post a link, sorry). To quote from the abstract, these authors “review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies.”

  83. gillt says

    so it’s not surprising they’ve managed to evolve enucleated red blood cells.

    I’m surprised you’re not surprised at the evolved novelty of actively enucleating RBCs.

    Yeah, the paper’s explanation wasn’t very convincing.

    Here, deletion rate mutations are selected on the fitness effects resulting from the random mutations that they cause throughout the genome sequences. This will influence the system in ways that depend on the frequency of intergenomic recombination.

    p.1421

    However, we know DNA polymerase creates more deletions than insertions, so selection for certain DNA polymerases seems reasonable.

    Here’s another way of looking at it.

    Our results are compatible with that recombination by some mechanism introduce deletion mutations. While the often seen (e.g. humans, Drosophila) positive correlation between recombination rate and levels of within-species genetic diversity [44]–[47] could potentially be interpreted to reflect that recombination is mutagenic also for point mutations, recombination reduces the effect of selection at linked loci thereby acting towards maintenance of genetic variation. On the other hand, support for a neutral link between recombination and nucleotide substitution has been provided by the observation in humans and Drosophila that regions of the genome with low recombination rate also show reduced rates of between-species divergence [45], [48], [49]. However, this remains a contentious issue because several contradictory conclusions have been claimed [50]–[54].

    Recombination Drives Vertebrate Genome Contraction