In one sense, I am happy that there is enough interest in the concept of “junk DNA” (and by extension, my area of research in genome size evolution) that the subject gets regular media attention. A few months ago, it was all about the ENCODE project and its “finding” of “function” for 80% of the human genome. This week, it’s a story that has the exact opposite message: that large amounts of so-called “junk DNA” can be deleted without apparent consequence. This most recent story was prompted by the publication of the genome sequence of the carnivorous plant known as the floating bladderwort. This plant is of interest because it has a very small genome that is nearly devoid of transposable elements and other non-coding DNA, while also containing more protein-coding genes than the human genome and exhibiting signs of past genome duplication events. We’ve known that the genome was small for several years, but having the genome sequence provides some important insights into what a genome this size contains, and (most interestingly) what it doesn’t.
In typical style, Ed Yong has written up a very nice summary of the paper and the potential implications for the junk DNA debate. Following the lead of the original paper and the associated press release, many media reports similarly took the “this plant can get rid of junk DNA, so maybe it isn’t functional after all” line (a few examples: here, here, and here).
I was quoted in Ed Yong’s article as follows:
“The study further challenges simplistic accounts of genome biology that assume functions for most or all DNA sequences, without addressing the enormous variability in genome size among plants and animals,” says T. Ryan Gregory, who studies the evolution of genome sizes at the University of Guelph.
In 2007, Gregory coined the “Onion Test” to challenge anyone who thinks that non-coding DNA isn’t junk. If that DNA is important, why is it that the onion needs so much more of it than a human, or even other closely related plants? “The Onion Test could just as easily have been called the Bladderwort Test,” he says. “If non-coding DNA is vital for gene regulation or some similar function, then how can a plant such as the bladderwort get by with so little of it?”
For me, the logic of the authors of the paper is straightforward. Here we have a complex plant with a lot of genes but very little non-coding DNA, and this calls into question the idea that you need a lot of non-coding DNA to regulate genes in a complex organism. Jonathan Eisen, on the other hand, has objected in his usual snarky way, awarding MSNBC and the authors of the bladderwort genome paper one of his “Twisted Tree of Life Awards”. As he summarizes the claim,
So – basically – if ONE FUCKING ORGANISM DELETES SOME OF IT’S [sic] NON PROTEIN CODING PORTIONS OF ITS GENOME THEN THIS MEANS THAT ALL NON CODING DNA IS USELESS.
In the comments thread on his blog post, he expanded on what he sees as the problem with this argument:
The fact that a plant can function without much non coding DNA really says nothing about the function or role of such non coding DNA in other species. All it says it that such non coding DNA is not absolutely essential for a plant to function. But this plant lineage could have evolved new means of regulation or other functions that were found in the non coding DNA of its ancestors. Or, in other words, a plant with a small genome says as much about non coding DNA in other plants and in humans as a fish with no eyes says about the role of eyes in vertebrates that see. Or should I try another? This says as much about the role of non coding DNA in other plants as the existence of snakes say about the role of legs. And so on.
And – there is no doubt that eyeless fish and limbless reptiles tell us an enormous amount. They tell us, for example, that eyes are non absolutely necessary for fish to function. And their adaptations to being eyeless tell us all sorts of great things about senses. But the existence of eyeless fish does not tell us that eyes are useless in fish.
Here is how I see the logic:
Most plants have junk DNA
One lineage doesn’t and the plants seem pretty OK.
Therefore junk DNA is useless
Most reptiles have legs
One lineage doesn’t have legs and these seem pretty OK.
Therefore legs are useless.
Isn’t that the logic here?
No, that isn’t the logic, and the legless snakes or eyeless cave fishes analogy is flawed. Why?
1. We know that legs and eyes are functional, and we know what they are functional for (walking and seeing, respectively). By contrast, we do not have strong evidence that non-coding DNA is functional or what it may be functional for. Worse, the very existence of so much non-coding DNA itself is taken as “evidence” that it must be doing something. Therefore, the observation of a plant that lacks a substantial amount of non-coding DNA but gets by just fine suggests that this kind of DNA isn’t strictly necessary in order to make a complex plant.
2. If most of the non-coding DNA in a larger genome does serve an important regulatory function, then it means this plant with a tiny genome must have evolved a totally different system for regulating its genes. This strikes me as a rather large assumption, and in any case it’s one for which we have no evidence. As such, I would argue that it is at least as parsimonious to take this small genome as evidence that most non-coding DNA does not serve a key regulatory function.
3. When snakes lost their legs or cave fishes lost their eyes, they also lost the specific ability that legs or eyes provided. Legless snakes can’t walk, because the function of legs is walking. Eyeless fishes can’t see, because the function of eyes is seeing. The proposed function for non-coding DNA is gene regulation. Unlike the snake or fish example, the bladderwort has lost most of its non-coding DNA but it can still regulate all of its genes just fine.
I think Jonathan raises a valid point about the dangers of overzealous extrapolation, but I think his criticism of the authors (and especially its tone) is unwarranted in this case.
I’ve already mentioned the nonsensical paper “published” in (surprise, surprise) arXiv in which the authors claim that the origin of life occurred long before the origin of the Earth based on the application of Moore’s Law to DNA. I won’t go into all the reasons that this is silly — for that, you can see critiques by PZ Myers and Massimo Pigliucci. Suffice it to say that the data, the analysis, and the interpretation are all problematic.
Notably, the authors present this figure, which more or less sums up what is wrong with the entire paper.
When I saw this, I couldn’t help but feel that it reminded me of another extrapolation I had seen years ago. And today it came to me: cooking a turkey by dropping it off a roof! Or rather, by converting potential energy into kinetic energy. Here’s the figure from the very funny article, which was published in the Journal of Irreproducible Results.
Here’s a short compilation I made for use in a recent presentation on ENCODE and the claim that 80% of the human genome is functional. These are quotes from ENCODE project leaders and the senior editor of Nature. It is not surprising that the story presented by the media was that ENCODE had destroyed the concept of “junk DNA”, given that this is what the researchers themselves said.
There has been lots of talk (including some in the media; see here and here and here) about the Graur et al. (2013) paper in GBE which was critical of ENCODE, much of it focusing on the tone of the paper. While the Graur et al. (2013) paper certainly doesn’t pull any punches in terms of ENCODE’s outrageous claims and incredible media hype, it also contains a number of important criticisms of the science underlying the project. Graur et al. (2013) were not the only ones to publish peer-reviewed critiques, and I expect that the list will continue to expand.
Here is a list of the papers that have appeared to date:
Say what you want about the tone of the Graur et al. (2013) paper in Genome Biology and Evolution, but it has people talking. Including Ewan Birney, the lead scientist of the ENCODE project and the primary spokesperson for ENCODE in the media fiasco describing the “death of junk DNA”. Most recently, Birney was interviewed by Quentin Cooper on the BBC Radio 4 program Material World, along with Oxford biologist Chris Ponting. You can listen to the show here.
I have also transcribed the parts in which Birney discusses the ENCODE findings and the flap around the claims of “80% function” of the human genome. Again, remember that Birney himself said several times in prior interviews that the ENCODE results undermine the idea of junk DNA and that the genome is jam packed with “switches” (see here, here, here, and here).
Quentin Cooper (host): Ok, well, Ewan, we’ll get on to what this paper says and doesn’t say, but can you just give us a quick precis of why ENCODE’s findings are in such contrast to a lot of conventional thinking and clearly a lot of current thinking about junk DNA?
Ewan Birney (Lead scientist, ENCODE): I don’t think the findings are in such stark contrast. It’s more about the interpretation of the words, in particular when the words get used in, um, out of context, out of the scientific paper context and propagated. So, what exactly do we mean by this word “junk”?
Quentin Cooper: But are you sure it’s all about the context. Because I mean, one of the reasons why these findings attracted so much interest was because it didn’t seem to fit the conventional thinking. It can’t just be down to a bit of phraseology and saying actually confirms what we already knew, can it?
Ewan Birney: Ah, so, I don’t — It’s interesting to reflect back on this. For me, the big important thing of ENCODE is that we found that a lot of the genome had some kind of biochemical activity. And we do describe that as “biochemical function”, but that word “function” in the phrase “biochemical function” is the thing which gets confusing. If we use the phrase “biochemical activity”, that’s precisely what we did, we find that the different parts of the genome, [??] 80% have some specific biochemical event we can attach to it. I was often asked whether that 80% goes to 100%, and that’s what I believe it will do. So, in other words, that number is much more about the coverage of what we’ve assayed over the entire genome. In the paper, we say quite clearly that the majority of the genome is not under negative selection, and we say that most of the elements are not under pan-mammalian selection. So that’s negative selection we can detect between lots of different mammals. [??] really interesting question about what is precisely going on in the human population, but that’s — you know, I’m much closer to the instincts of this kind of 10% to 20% sort of range about what is under, sort of what evolution cares about under selection.
Quentin Cooper: But this paper that’s appeared in the journal Genome Biology and Evolution, Dan Graur from the University of Texas, professor there, all these lines, do you think they’re reacting more to the press coverage around the story rather than what’s actually in your paper itself?
Ewan Birney: That’s my belief. And of course in some sense our responsibility to help that press coverage do the right thing and that’s what I worked very hard in September, in particular in the UK context where I’m based, to try and make that press coverage work. It is quite complicated, because you want to be excited about what you’re doing, but you don’t want people to get the wrong — draw the wrong interpretation of it. So, it’s not an easy job to do, but I believe that most of the heat in this debate is about the definitions of the words, and not the data or the interpretation of the data.
Chris Ponting (Oxford University): Ewan, my question to you is, how much of the genome do you think is vital for life?
Ewan Birney: Yeah, I don’t, I would do that on a, as you know I’ve blogged about this. You know, certainly everything that’s under negative selection in the human population, that evolution cares about right now in humans, that if it changes then the person has less reproductive fitness, that’s clearly vital for life. I think there is a chance of there being a small amount of additional things that we find interesting about differences between people that are not under selection. So, those may be phenotypes that are late-onset such as neurodegenerative diseases or things like that. So, I think there’s a little addition to that but those are the kind of boundary components to that.
Chris Ponting: So I think we can probably agree between us that between 10% and say 20% is vital for life.
Ewan Birney: I mean, I think we would agree with that. I think, you know, refining that percentage down is quite interesting. I think also the other components that we — biochemical events that we see in the genome, sort of, each one of them are equally likely to be part of that 10% to 20% that we’re looking for. It’s important to realize that it’s not the case that we can spot the 10% to 20% just by looking harder. Each of these different places in the genome that have some biochemical activity associated with it, when there’s some phenotype screen that’s directed there or some evolutionary screen that’s directed to that point, ENCODE can now say “Ah ha! Here is a biochemical thing that this piece of DNA looks like it could be doing”.
Quentin Cooper: Ewan, just briefly, what about another aspect of the criticism, the idea that the ENCODE project lacked anyone with any real knowledge of evolutionary biology?
Ewan Birney: Well, I’m sure we could have, um, ahhh, had, um, many other people join us. There were a number of us who have worked — I do know quite a bit about evolutionary biology. Whether everybody considers me to be an expert, I don’t, you know, that’s for other people to say. If you read the paper, and not the press reports, there is a lot of detail spent on what is under different aspects of evolutionary biology, ah, selection. So, I think again, a lot of this is not about the paper, but is more about the words used to describe the paper.
Quentin Cooper: Finally though, Ewan, is there any of the criticism you do take on the chin, think well perhaps we did let the story get a bit away from us at times?
Ewan Birney: Well, hindsight is a fairly cruel thing. And one of the things which I regret about being in this situation, arguing about words in the press, is I just wish there was a way of us not talking about this. I think ENCODE already is being used by many, many different groups, in particular disease biology groups having their phenotypes screened against it, and other people worldwide. And the whole point of this for the data by the project to be used by many, many other groups. I’m really happy about that, um, and hindsight being such a cruel thing, makes me think about what I could have done to minimize this kind of rather heated debate.
I expect that we will be seeing several harsh critiques of ENCODE’s extraordinary claims about function in the human genome and the equally incredible mega-hype associated with the project. I know of at least one more that is forthcoming from a heavy-hitter in the field, but as a snarky smackdown, it will be very tough to beat the recent paper by Dan Graur and colleagues published in Genome Biology and Evolution. The paper is open-access, so go ahead and read it yourself here. Meantime, enjoy the following zingers:
ENCODE adopted a strong version of the causal role definition of function, according to which a functional element is a discrete genome segment that produces a protein or an RNA or displays a reproducible biochemical signature (for example, protein binding).
Oddly, ENCODE not only uses the wrong concept of functionality, it uses it wrongly and inconsistently.
…the ENCODE authors singled out transcription as a function, as if the passage of RNA polymerase through a DNA sequence is in some way more meaningful than other functions. But, what about DNA polymerase and DNA replication? Why make a big fuss about 74.7% of the genome that is transcribed, and yet ignore the fact that 100% of the genome takes part in a strikingly “reproducible biochemical signature”—it replicates!
Ward and Kellis (2012) confirmed that ~5% of the genome is interspecifically conserved, and by using intraspecific variation, found evidence of lineage-specific constraint suggesting that an additional 4% of the human genome is under selection (i.e., functional), bringing the total fraction of the genome that is certain to be functional to approximately 9%. The journal Science used this value to proclaim “No More Junk DNA” (Hurtley 2012), thus, in effect rounding up 9% to 100%.
ENCODE chose to bias its results by excessively favoring sensitivity over specificity. In fact, they could have saved millions of dollars and many thousands of research hours by ignoring selectivity altogether, and proclaiming a priori that 100% of the genome is functional. Not one functional element would have been missed by using this procedure.
Interestingly, ENCODE, which is otherwise quite miserly in spelling out the exact function of its “functional” elements, provides putative functions for each of its 12 histone modifications. For example, according to ENCODE, the putative function of the H4K20me1 modification is “preference for 5’ end of genes.” This is akin to asserting that the function of the White House is to occupy the lot of land at the 1600 block of Pennsylvania Avenue in Washington, D.C.
In a miraculous feat of “next generation” science, the ENCODE authors were able to determine the frequencies of nonexistent derived alleles.
… a surprisingly large number of scientists have had their knickers in a twist over “junk DNA” ever since the term was coined by Susumu Ohno (1972).
In dissecting common objections to “junk DNA,” we identified several misconceptions, chief among them (1) a lack of knowledge of the original and correct sense of the term, (2) the belief that evolution can always get rid of nonfunctional DNA, and (3) the belief that “future potential” constitutes “a function.”
We urge biologists not be afraid of junk DNA. The only people that should be afraid are those claiming that natural processes are insufficient to explain life and that evolutionary theory should be supplemented or supplanted by an intelligent designer (e.g., Dembski 1998; Wells 2004). ENCODE’s take-home message that everything has a function implies purpose, and purpose is the only thing that evolution cannot provide. Needless to say, in light of our investigation of the ENCODE publication, it is safe to state that the news concerning the death of “junk DNA” have been greatly exaggerated.
ENCODE’s biggest scientific sin was not being satisfied with its role as data provider; it assumed the small-science role of interpreter of the data, thereby performing a kind of textual hermeneutics on a 3.5-billion-long DNA text. Unfortunately, ENCODE disregarded the rules of scientific interpretation and adopted a position common to many types of theological hermeneutics, whereby every letter in a text is assumed a priori to have a meaning.
So, what have we learned from the efforts of 442 researchers consuming 288 million dollars? According to Eric Lander, a Human Genome Project luminary, ENCODE is the “Google Maps of the human genome” (Durbin et al. 2010). We beg to differ, ENCODE is considerably worse than even Apple Maps.
We conclude that the ENCODE Consortium has, so far, failed to provide a compelling reason to abandon the prevailing understanding among evolutionary biologists according to which most of the human genome is devoid of function. The ENCODE results were predicted by one of its lead authors to necessitate the rewriting of textbooks (Pennisi 2012). We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
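For readers who want the sensitivity-versus-specificity zinger quoted above spelled out, here is a minimal sketch in Python. It is my own toy illustration, not anything from Graur et al. or from ENCODE: the 100-site "genome", the assumption that 10 of those sites are truly functional, and the function name are all made up for the example. It simply shows that labelling every site "functional" guarantees that no functional element is missed (sensitivity of 1) while getting every nonfunctional site wrong (specificity of 0).

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for two equal-length lists of booleans.

    truth[i] is True if site i really is functional;
    predicted[i] is True if the classifier calls site i functional.
    Assumes at least one truly functional and one truly nonfunctional site.
    """
    pairs = list(zip(truth, predicted))
    true_pos = sum(t and p for t, p in pairs)
    false_neg = sum(t and not p for t, p in pairs)
    true_neg = sum((not t) and (not p) for t, p in pairs)
    false_pos = sum((not t) and p for t, p in pairs)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical toy genome: 100 sites, the first 10 assumed to be functional.
truth = [i < 10 for i in range(100)]

# The "procedure" from the quote: declare a priori that everything is functional.
call_everything = [True] * 100

sens, spec = sensitivity_specificity(truth, call_everything)
print(sens, spec)  # 1.0 0.0
```

The same result holds whatever fraction of sites you assume to be functional, which is the point of the joke: perfect sensitivity is trivially easy to achieve if specificity is ignored.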
I’ll just let this soup sandwich of an abstract speak for itself:
We find that the global relationships among species should be of circular phylogeny, which is quite different from common sense based on phylogenetic trees. A domain can be defined by a distinct phylogenetic circle, which is a global and stable characteristic of the living system. The mechanism in genome size evolution has been clarified; hence the main component questions on C-value enigma can be explained. We find the intrinsic relationship between genome size evolution and protein length evolution; that is the genome size and non-coding DNA ratio can be calculated based on protein length distributions.
(These are the same authors who brought us this turd gem).