I’ve already mentioned the nonsensical paper “published” in (surprise, surprise) arXiv in which the authors apply Moore’s Law to DNA and conclude that life originated long before the Earth did. I won’t go into all the reasons this is silly; for that, see the critiques by PZ Myers and Massimo Pigliucci. Suffice it to say that the data, the analysis, and the interpretation are all problematic.
Notably, the authors present this figure, which more or less sums up what is wrong with the entire paper.
When I saw this, it reminded me of another extrapolation I had seen years ago. Today it came to me: cooking a turkey by dropping it off a roof! Or rather, by converting potential energy into kinetic energy. Here’s the figure from that very funny article, which was published in the Journal of Irreproducible Results.
Here’s a short compilation I made for use in a recent presentation on ENCODE and the claim that 80% of the human genome is functional. These are quotes from ENCODE project leaders and the senior editor of Nature. It is not surprising that the story presented by the media was that ENCODE had destroyed the concept of “junk DNA”, given that this is what the researchers themselves said.
There has been a lot of talk (including some in the media; see here and here and here) about the Graur et al. (2013) paper in GBE criticizing ENCODE, much of it focused on the paper’s tone. While Graur et al. (2013) certainly don’t pull any punches about ENCODE’s outrageous claims and incredible media hype, their paper also contains a number of important criticisms of the science underlying the project. Graur et al. (2013) were not the only ones to publish peer-reviewed critiques, and I expect the list to keep growing.
Here is a list of the papers that have appeared to date:
Say what you want about the tone of the Graur et al. (2013) paper in Genome Biology and Evolution, but it has people talking. Including Ewan Birney, the lead scientist of the ENCODE project and the primary spokesperson for ENCODE in the media fiasco describing the “death of junk DNA”. Most recently, Birney was interviewed by Quentin Cooper on the BBC Radio 4 program Material World, along with Oxford biologist Chris Ponting. You can listen to the show here.
I have also transcribed the parts in which Birney discusses the ENCODE findings and the flap around the claims of “80% function” of the human genome. Again, remember that Birney himself said several times in prior interviews that the ENCODE results undermine the idea of junk DNA and that the genome is jam-packed with “switches” (see here, here, here, and here).
Quentin Cooper (host): Ok, well, Ewan, we’ll get on to what this paper says and doesn’t say, but can you just give us a quick precis of why ENCODE’s findings are in such contrast to a lot of conventional thinking and clearly a lot of current thinking about junk DNA?
Ewan Birney (Lead scientist, ENCODE): I don’t think the findings are in such stark contrast. It’s more about the interpretation of the words, in particular when the words get used in, um, out of context, out of the scientific paper context and propagated. So, what exactly do we mean by this word “junk”?
Quentin Cooper: But are you sure it’s all about the context? Because, I mean, one of the reasons why these findings attracted so much interest was because it didn’t seem to fit the conventional thinking. It can’t just be down to a bit of phraseology and saying it actually confirms what we already knew, can it?
Ewan Birney: Ah, so, I don’t... It’s interesting to reflect back on this. For me, the big important thing of ENCODE is that we found that a lot of the genome had some kind of biochemical activity. And we do describe that as “biochemical function”, but that word “function” in the phrase “biochemical function” is the thing which gets confusing. If we use the phrase “biochemical activity”, that’s precisely what we did, we find that the different parts of the genome, [??] 80% have some specific biochemical event we can attach to it. I was often asked whether that 80% goes to 100%, and that’s what I believe it will do. So, in other words, that number is much more about the coverage of what we’ve assayed over the entire genome. In the paper, we say quite clearly that the majority of the genome is not under negative selection, and we say that most of the elements are not under pan-mammalian selection. So that’s negative selection we can detect between lots of different mammals. [??] really interesting question about what is precisely going on in the human population, but that... you know, I’m much closer to the instincts of this kind of 10% to 20% sort of range about what is under, sort of what evolution cares about under selection.
Quentin Cooper: But this paper that’s appeared in the journal Genome Biology and Evolution, Dan Graur from the University of Texas, professor there, all these lines, do you think they’re reacting more to the press coverage around the story rather than what’s actually in your paper itself?
Ewan Birney: That’s my belief. And of course in some sense our responsibility to help that press coverage do the right thing and that’s what I worked very hard in September, in particular in the UK context where I’m based, to try and make that press coverage work. It is quite complicated, because you want to be excited about what you’re doing, but you don’t want people to get the wrong — draw the wrong interpretation of it. So, it’s not an easy job to do, but I believe that most of the heat in this debate is about the definitions of the words, and not the data or the interpretation of the data.
Chris Ponting (Oxford University): Ewan, my question to you is, how much of the genome do you think is vital for life?
Ewan Birney: Yeah, I don’t, I would do that on a, as you know I’ve blogged about this. You know, certainly everything that’s under negative selection in the human population, that evolution cares about right now in humans, that if it changes then the person has less reproductive fitness, that’s clearly vital for life. I think there is a chance of there being a small amount of additional things that we find interesting about differences between people that are not under selection. So, those may be phenotypes that are late-onset such as neurodegenerative diseases or things like that. So, I think there’s a little addition to that but those are the kind of boundary components to that.
Chris Ponting: So I think we can probably agree between us that between 10% and say 20% is vital for life.
Ewan Birney: I mean, I think we would agree with that. I think, you know, refining that percentage down is quite interesting. I think also the other components that we — biochemical events that we see in the genome, sort of, each one of them are equally likely to be part of that 10% to 20% that we’re looking for. It’s important to realize that it’s not the case that we can spot the 10% to 20% just by looking harder. Each of these different places in the genome that have some biochemical activity associated with it, when there’s some phenotype screen that’s directed there or some evolutionary screen that’s directed to that point, ENCODE can now say “Ah ha! Here is a biochemical thing that this piece of DNA looks like it could be doing”.
Quentin Cooper: Ewan, just briefly, what about another aspect of the criticism, the idea that the ENCODE project lacked anyone with any real knowledge of evolutionary biology?
Ewan Birney: Well, I’m sure we could have, um, ahhh, had, um, many other people join us. There were a number of us who have worked... I do know quite a bit about evolutionary biology. Whether everybody considers me to be an expert, I don’t, you know, that’s for other people to say. If you read the paper, and not the press reports, there is a lot of detail spent on what is under different aspects of evolutionary biology, ah, selection. So, I think again, a lot of this is not about the paper, but is more about the words used to describe the paper.
Quentin Cooper: Finally though, Ewan, is there any of the criticism you do take on the chin, think well perhaps we did let the story get a bit away from us at times?
Ewan Birney: Well, hindsight is a fairly cruel thing. And one of the things which I regret about being in this situation, arguing about words in the press, is I just wish there was a way of us not talking about this. I think ENCODE already is being used by many, many different groups, in particular disease biology groups having their phenotypes screened against it, and other people worldwide. And the whole point of this for the data by the project to be used by many, many other groups. I’m really happy about that, um, and hindsight being such a cruel thing, makes me think about what I could have done to minimize this kind of rather heated debate.
I expect that we will be seeing several harsh critiques of ENCODE’s extraordinary claims about function in the human genome and the equally incredible mega-hype associated with the project. I know of at least one more that is forthcoming from a heavy-hitter in the field, but as a snarky smackdown, it will be very tough to beat the recent paper by Dan Graur and colleagues published in Genome Biology and Evolution. The paper is open-access, so go ahead and read it yourself here. Meantime, enjoy the following zingers:
ENCODE adopted a strong version of the causal role definition of function, according to which a functional element is a discrete genome segment that produces a protein or an RNA or displays a reproducible biochemical signature (for example, protein binding).
Oddly, ENCODE not only uses the wrong concept of functionality, it uses it wrongly and inconsistently.
…the ENCODE authors singled out transcription as a function, as if the passage of RNA polymerase through a DNA sequence is in some way more meaningful than other functions. But, what about DNA polymerase and DNA replication? Why make a big fuss about 74.7% of the genome that is transcribed, and yet ignore the fact that 100% of the genome takes part in a strikingly “reproducible biochemical signature”—it replicates!
Ward and Kellis (2012) confirmed that ~5% of the genome is interspecifically conserved, and by using intraspecific variation, found evidence of lineage-specific constraint suggesting that an additional 4% of the human genome is under selection (i.e., functional), bringing the total fraction of the genome that is certain to be functional to approximately 9%. The journal Science used this value to proclaim “No More Junk DNA” (Hurtley 2012), thus, in effect rounding up 9% to 100%.
ENCODE chose to bias its results by excessively favoring sensitivity over specificity. In fact, they could have saved millions of dollars and many thousands of research hours by ignoring selectivity altogether, and proclaiming a priori that 100% of the genome is functional. Not one functional element would have been missed by using this procedure.
Interestingly, ENCODE, which is otherwise quite miserly in spelling out the exact function of its “functional” elements, provides putative functions for each of its 12 histone modifications. For example, according to ENCODE, the putative function of the H4K20me1 modification is “preference for 5’ end of genes.” This is akin to asserting that the function of the White House is to occupy the lot of land at the 1600 block of Pennsylvania Avenue in Washington, D.C.
In a miraculous feat of “next generation” science, the ENCODE authors were able to determine the frequencies of nonexistent derived alleles.
… a surprisingly large number of scientists have had their knickers in a twist over “junk DNA” ever since the term was coined by Susumu Ohno (1972).
In dissecting common objections to “junk DNA,” we identified several misconceptions, chief among them (1) a lack of knowledge of the original and correct sense of the term, (2) the belief that evolution can always get rid of nonfunctional DNA, and (3) the belief that “future potential” constitutes “a function.”
We urge biologists not be afraid of junk DNA. The only people that should be afraid are those claiming that natural processes are insufficient to explain life and that evolutionary theory should be supplemented or supplanted by an intelligent designer (e.g., Dembski 1998; Wells 2004). ENCODE’s take-home message that everything has a function implies purpose, and purpose is the only thing that evolution cannot provide. Needless to say, in light of our investigation of the ENCODE publication, it is safe to state that the news concerning the death of “junk DNA” have been greatly exaggerated.
ENCODE’s biggest scientific sin was not being satisfied with its role as data provider; it assumed the small-science role of interpreter of the data, thereby performing a kind of textual hermeneutics on a 3.5-billion-long DNA text. Unfortunately, ENCODE disregarded the rules of scientific interpretation and adopted a position common to many types of theological hermeneutics, whereby every letter in a text is assumed a priori to have a meaning.
So, what have we learned from the efforts of 442 researchers consuming 288 million dollars? According to Eric Lander, a Human Genome Project luminary, ENCODE is the “Google Maps of the human genome” (Durbin et al. 2010). We beg to differ, ENCODE is considerably worse than even Apple Maps.
We conclude that the ENCODE Consortium has, so far, failed to provide a compelling reason to abandon the prevailing understanding among evolutionary biologists according to which most of the human genome is devoid of function. The ENCODE results were predicted by one of its lead authors to necessitate the rewriting of textbooks (Pennisi 2012). We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
I’ll just let this soup sandwich of an abstract speak for itself:
We find that the global relationships among species should be of circular phylogeny, which is quite different from common sense based on phylogenetic trees. A domain can be defined by a distinct phylogenetic circle, which is a global and stable characteristic of the living system. The mechanism in genome size evolution has been clarified; hence the main component questions on C-value enigma can be explained. We find the intrinsic relationship between genome size evolution and protein length evolution; that is the genome size and non-coding DNA ratio can be calculated based on protein length distributions.
(These are the same authors who brought us this turd (er, gem).)