Were introns immediately dismissed as useless junk?

In a recent paper, Morris and Mattick (2014) claim that:

“The discovery of introns in 1977 was perhaps the biggest surprise in the history of molecular biology, as no one expected that the genes of higher organisms would be mosaics of coding and non-coding sequences, all of which are transcribed. However, the prevailing concept of the flow of genetic information was not overly disturbed, as the removal of the intervening sequences (that is, introns) and the reconstruction of a mature mRNA by splicing preserved the conceptual status quo; that is, genes still made proteins. In parallel, it was assumed that the excised intronic RNAs were simply degraded, although the technology of the time was too primitive to confirm this. In any case, introns were immediately and universally dismissed as genomic debris, and their presence was rationalized as evolutionary remnants involved in the prebiotic modular assembly of protein-coding RNAs that have remained (and been expanded by transposition) in complex organisms. This notion was consistent, at least superficially, with the implication of the C-value enigma that eukaryotes contained varying amounts of DNA ‘baggage’. It is also in agreement with the accompanying conclusion that retrotransposon sequences are mainly ‘selfish’, parasitic DNA.”

Morris, K.V. and J.S. Mattick (2014). The rise of regulatory RNA. Nature Reviews Genetics, in press.

So, let’s check the actual literature of the time and see if their version is correct:

“Perhaps the most surprising discovery in the initial studies of eukaryotic gene structure has been that many genes contain interruptions in the coding sequences. The origin and the function of these intervening sequences (IVS or introns) are not yet well understood but are the subject of intense investigation.”

Wallace, R.B., P.F. Johnson, S. Tanaka, M. Schöld, K. Itakura, and J. Abelson. 1980. Directed deletion of a yeast transfer RNA intervening sequence. Science 209: 1396-1400.

“Since the discovery that many eukaryotic genes are discontinuous, a number of studies have been directed towards identifying a function for intervening sequences (IVSs).”

Johnson, P.F. and J. Abelson. 1983. The yeast tRNA(tyr) gene intron is essential for correct modification of its tRNA product. Nature 302: 681-687.

“It is possible that the relationship between the location of the splice junction in the gene at the surface of the protein confers a biological advantage and hence is a result of natural selection. Introns and their associated splicing systems could be exploited in many ways during the evolution of a protein.”

Craik, C.S., S. Sprang, R. Fletterick, and W.J. Rutter. 1982. Intron-exon splice junctions map at protein surfaces. Nature 299: 180-182.

“We conclude from this experiment that the intron in the yeast actin gene does not have an observable function. It is possible that the role of the intron is too subtle to be observed in laboratory conditions of growth or that the intron, while having evolutionary significance, has no present role. To conclude that this is true for all yeast genes that contain introns would of course be premature, but there exist strains in which mitochondrial introns have been removed with no observable effect.”

Ng, R., H. Domdey, G. Larson, J.J. Rossi, and J. Abelson. 1985. A test for intron function in the yeast actin gene. Nature 314: 183-184.

“Solutions to problems of how introns are dealt with by cells do not address the question of why introns are there at all, questions about intron function. Some introns in some genes perform clearly regulatory roles, since splicing factors specific to the tissue or developmental stage decide when and where splicing should occur (Breitbart et al. 1985). In addition, some introns in some genes contain enhancers or modulators of the expression of those genes (Slater et al. 1985). However, the great majority of introns in protein-coding genes have no such “functions.” Direct experimental as well as indirect comparative data show that most introns can be removed from genes without phenotypic effect (Blake 1985). Thus, in terms of beneficial effects on the fitnesses of organisms, we almost certainly cannot account for the presence of the majority of individual introns, nor for the propensity to have introns at all, even though introns may on the average represent as much as 90% of the length of a gene and perhaps as much as half of the total DNA in some complex eukaryotes such as humans.”

Doolittle, W.F. 1987. The origin and function of intervening sequences in DNA: a review. American Naturalist 130: 915-928.

“Ever since the discovery of split genes, there has been a debate about why they are split. This can be resolved into three separate problems: the origin of the introns that split the genes (separating exons from each other), the role of introns in evolution, and their present function, if any.”

Rogers, J. 1985. Exon shuffling and intron insertion in serine protease genes. Nature 315: 458-459.

“These conserved sequences, especially those found in the introns, suggest a role for internal sequences in the regulation of β-actin gene expression.”

Ng, S.-Y., P. Gunning, R. Eddy, P. Ponte, J. Leavitt, T. Shows, and L. Kedes. 1985. Evolution of the functional human β-actin gene and its multi-pseudogene family: conservation of noncoding regions and chromosomal dispersion of pseudogenes. Molecular and Cellular Biology 5: 2720-2732.

“The advantage to the organism to remove intron 1 last is unclear but could point to some as yet undetermined function for this intron. In support of this, we have found that a DNA probe derived from intron 1 hybridizes to a single fragment in a Southern blot of total mouse genomic DNA indicating that the sequences in this intron may be conserved, whereas a DNA probe derived from intron 2 does not hybridize.”

Wells, D., D. Hoffman, and L. Kedes. 1987. Unusual structure, evolutionary conservation of non-coding sequences and numerous pseudogenes characterize the human H3.3 histone multigene family. Nucleic Acids Research 15: 2871-2889.

1993 Nobel Prize in Physiology or Medicine
to Richard J. Roberts and Phillip A. Sharp
For their discovery of split genes

“Roberts’ and Sharp’s discovery has changed our view on how genes in higher organisms develop during evolution. The discovery also led to the prediction of a new genetic process, namely that of splicing, which is essential for expressing the genetic information. The discovery of split genes has been of fundamental importance for today’s basic research in biology, as well as for more medically oriented research concerning the development of cancer and other diseases.”

“As a consequence of the discovery that genes are often split, it seems likely that higher organisms in addition to undergoing mutations may utilize another mechanism to speed up evolution: rearrangement (or shuffling) of gene segments to new functional units. This can take place in the germ cells through crossing-over during pairing of chromosomes. This hypothesis seems even more attractive following the discovery that individual exons in several cases correspond to building modules in proteins, so-called domains, to which specific functions can be attributed. An exon in the genome would thus correspond to a particular subfunction in the protein and the rearrangement of exons could result in a new combination of subfunctions in a protein. This kind of process could drive evolution considerably by rearranging modules with specific functions.”


Susumu Ohno did not coin the term “junk DNA” — a must-read by Dan Graur.


Anyone interested in the topic of “junk DNA” should go and read this fine piece of detective work by Dan Graur immediately!



Some big news about Evolution: Education and Outreach.

As many of my academic friends know, I started out as a member of the Editorial Board when the journal was launched, and then became an Associate Editor as well as a guest Editor for a special issue on eye evolution. When the journal ceased to be open-access, I resigned from the Editorial Board in protest, but have since returned now that free access has been restored. Since then, I have been Senior Handling Editor (which is more or less the same as Associate Editor in practice).

Well, after some discussion with Niles Eldredge, who has been Co-Editor-in-Chief along with his son Greg since the journal’s inception, it looks like my position with the journal is changing again.

You see, Niles feels that he’s ready to cut back on his involvement now that he is of retirement age (and, I must say, well deserving of a rest!).

And so, effective fairly soon, I will be the new Co-Editor-in-Chief of the journal, with a special emphasis on the science content (Greg will continue with his focus on educational topics).

Fellow biologist friends: be forewarned that I will be pestering you soon to contribute a manuscript to the journal…

Genome reduction in bladderworts vs. leg loss in snakes.

In one sense, I am happy that there is enough interest in the concept of “junk DNA” (and by extension, my area of research in genome size evolution) that the subject gets regular media attention. A few months ago, it was all about the ENCODE project and its “finding” of “function” for 80% of the human genome. This week, it’s a story that has the exact opposite message: that large amounts of so-called “junk DNA” can be deleted without apparent consequence. This most recent story was prompted by the publication of the genome sequence of the carnivorous plant known as the floating bladderwort. This plant is of interest because it has a very small genome that is nearly devoid of transposable elements and other non-coding DNA, while also containing more protein-coding genes than the human genome and exhibiting signs of past genome duplication events. We’ve known that the genome was small for several years, but having the genome sequence provides some important insights into what a genome this size contains, and (most interestingly) what it doesn’t.

In typical style, Ed Yong has written up a very nice summary of the paper and the potential implications for the junk DNA debate. Following the lead of the original paper and the associated press release, many media reports similarly took the “this plant can get rid of junk DNA, so maybe it isn’t functional after all” line (a few examples: here, here, and here).

I was quoted in Ed Yong’s article as follows:

“The study further challenges simplistic accounts of genome biology that assume functions for most or all DNA sequences, without addressing the enormous variability in genome size among plants and animals,” says T. Ryan Gregory, who studies the evolution of genome sizes at the University of Guelph.

In 2007, Gregory coined the “Onion Test” to challenge anyone who thinks that non-coding DNA isn’t junk. If that DNA is important, why is it that the onion needs so much more of it than a human, or even other closely related plants? “The Onion Test could just as easily have been called the Bladderwort Test,” he says. “If non-coding DNA is vital for gene regulation or some similar function, then how can a plant such as the bladderwort get by with so little of it?”

For me, the logic of the authors of the paper is straightforward. Here we have a complex plant with a lot of genes but very little non-coding DNA, and this calls into question the idea that you need a lot of non-coding DNA to regulate genes in a complex organism. Jonathan Eisen, on the other hand, has objected in his usual snarky way, awarding MSNBC and the authors of the bladderwort genome paper one of his “Twisted Tree of Life Awards”. As he summarizes the claim,


In the comments thread on his blog post, he expanded on what he sees as the problem with this argument:

The fact that a plant can function without much non coding DNA really says nothing about the function or role of such non coding DNA in other species. All it says it that such non coding DNA is not absolutely essential for a plant to function. But this plant lineage could have evolved new means of regulation or other functions that were found in the non coding DNA of its ancestors. Or, in other words, a plant with a small genome says as much about non coding DNA in other plants and in humans as a fish with no eyes says about the role of eyes in vertebrates that see. Or should I try another? This says as much about the role of non coding DNA in other plants as the existence of snakes say about the role of legs. And so on.

And – there is no doubt that eyeless fish and limbless reptiles tell us an enormous amount. They tell us, for example, that eyes are non absolutely necessary for fish to function. And their adaptations to being eyeless tell us all sorts of great things about senses. But the existence of eyeless fish does not tell us that eyes are useless in fish.

Here is how I see the logic:

Most plants have junk DNA
One lineage doesn’t and the plants seem pretty OK.
Therefore junk DNA is useless

Most reptiles have legs
One lineage doesn’t have legs and these seem pretty OK.
Therefore legs are useless.

Isn’t that the logic here?

No, that isn’t the logic, and the legless snakes or eyeless cave fishes analogy is flawed. Why?

1. We know that legs and eyes are functional, and we know what they are functional for (walking and seeing, respectively). By contrast, we do not have strong evidence that non-coding DNA is functional or what it may be functional for. Worse, the very existence of so much non-coding DNA itself is taken as “evidence” that it must be doing something. Therefore, the observation of a plant that lacks a substantial amount of non-coding DNA but gets by just fine suggests that this kind of DNA isn’t strictly necessary in order to make a complex plant.

2. If most of the non-coding DNA in a larger genome does serve an important regulatory function, then it means this plant with a tiny genome must have evolved a totally different system for regulating its genes. This strikes me as a rather large assumption — and in any case, it’s one for which we have no evidence. As such, I would argue that it is at least as parsimonious to take this small genome as evidence that non-coding DNA in general does not serve a key regulatory function for the most part.

3. When snakes lost their legs or cave fishes lost their eyes, they also lost the specific ability that legs or eyes provided. Legless snakes can’t walk, because the function of legs is walking. Eyeless fishes can’t see, because the function of eyes is seeing. The proposed function for non-coding DNA is gene regulation. Unlike the snake or fish example, the bladderwort has lost most of its non-coding DNA but it can still regulate all of its genes just fine.

I think Jonathan raises a valid point about the dangers of overzealous extrapolation, but I think his criticism of the authors (much less its tone) in this case is unwarranted.

Moore’s Law, the origin of life, and dropping turkeys off a building.

I’ve already mentioned the nonsensical paper “published” in (surprise, surprise) arXiv in which the authors claim that the origin of life occurred long before the origin of the Earth based on the application of Moore’s Law to DNA. I won’t go into all the reasons that this is silly — for that, you can see critiques by PZ Myers and Massimo Pigliucci. Suffice it to say that the data, the analysis, and the interpretation are all problematic.

Notably, the authors present this figure, which more or less sums up what is wrong with the entire paper.

[Figure from the paper: image not available]

As I saw this, I couldn’t help but feel like it reminded me of some other extrapolation I had seen years ago. And today it came to me — cooking a turkey by dropping it off a roof! Or rather, by converting potential energy into kinetic energy. Here’s the figure from the very funny article, which was published in the Journal of Irreproducible Results.

[Figures: images not available]

ENCODE quote compilation.

Here’s a short compilation I made for use in a recent presentation on ENCODE and the claim that 80% of the human genome is functional. These are quotes from ENCODE project leaders and the senior editor of Nature. It is not surprising that the story presented by the media was that ENCODE had destroyed the concept of “junk DNA”, given that this is what the researchers themselves said.






Critiques of ENCODE in peer-reviewed journals.

There has been lots of talk (including some in the media; see here and here and here) about the Graur et al. (2013) paper in GBE which was critical of ENCODE, much of it focusing on the tone of the paper. While the Graur et al. (2013) paper certainly doesn’t pull any punches in terms of ENCODE’s outrageous claims and incredible media hype, it also contains a number of important criticisms of the science underlying the project. Graur et al. (2013) were not the only ones to publish peer-reviewed critiques, and I expect that the list will continue to expand.

Here is a list of the papers that have appeared to date:

Doolittle, W.F. (2013). Is junk DNA bunk? A critique of ENCODE. Proceedings of the National Academy of Sciences USA 110: 5294–5300.

Eddy, S.R. (2012). The C-value paradox, junk DNA and ENCODE. Current Biology 22: R898–R899.

Eddy, S.R. (2013). The ENCODE project: Missteps overshadowing a success. Current Biology 23: R259–R261.

Graur, D., Y. Zheng, N. Price, R.B.R. Azevedo, R.A. Zufall, and E. Elhaik. (2013). On the immortality of television sets: “Function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution 5: 578-590.

Niu, D.-K. and L. Jiang. (2013). Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and Biophysical Research Communications 430: 1340-1343.

Hurst, L.D. (2013). Open questions: A logic (or lack thereof) of genome organization. BMC Biology 11: 58.

Some of these authors have written blogs, in some cases to go after each other (stay tuned — there may be more debate coming).

Sean Eddy: ENCODE says what?

Dan Graur: Sean Eddy knows on which side the bread is buttered: Better be on the side of “good-science funding” than on the side of “good science”.

Dan Graur: Laurence Hurst’s error: the inability to distinguish between a stupid animal, a dead animal, and the elephant in the room

My new favourite student complaint.

An actual complaint from one of the students in my evolution course:

“He insists on using big, scientific words which are just annoying.”


BBC interview with Ewan Birney.

Say what you want about the tone of the Graur et al. (2013) paper in Genome Biology and Evolution, but it has people talking. Including Ewan Birney, the lead scientist of the ENCODE project and the primary spokesperson for ENCODE in the media fiasco describing the “death of junk DNA”. Most recently, Birney was interviewed by Quentin Cooper on the BBC Radio 4 program Material World, along with Oxford biologist Chris Ponting. You can listen to the show here.

I have also transcribed the parts in which Birney discusses the ENCODE findings and the flap around the claims of “80% function” of the human genome. Again, remember that Birney himself said several times in prior interviews that the ENCODE results undermine the idea of junk DNA and that the genome is jam packed with “switches” (see here, here, here, and here).

Quentin Cooper (host): Ok, well, Ewan, we’ll get on to what this paper says and doesn’t say, but can you just give us a quick precis of why ENCODE’s findings are in such contrast to a lot of conventional thinking and clearly a lot of current thinking about junk DNA?

Ewan Birney (Lead scientist, ENCODE): I don’t think the findings are in such stark contrast. It’s more about the interpretation of the words, in particular when the words get used in, um, out of context, out of the scientific paper context and propagated. So, what exactly do we mean by this word “junk”?.

Quentin Cooper: But are you sure it’s all about the context. Because I mean, one of the reasons why these findings attracted so much interest was because it didn’t seem to fit the conventional thinking. It can’t just be down to a bit of phraseology and saying actually confirms what we already knew, can it?

Ewan Birney: Ah, so, I don’t — It’s interesting to reflect back on this. For me, the big important thing of ENCODE is that we found that a lot of the genome had some kind of biochemical activity. And we do describe that as “biochemical function”, but that word “function” in the phrase “biochemical function” is the thing which gets confusing. If we use the phrase “biochemical activity”, that’s precisely what we did, we find that the different parts of the genome, [??] 80% have some specific biochemical event we can attach to it. I was often asked whether that 80% goes to 100%, and that’s what I believe it will do. So, in other words, that number is much more about the coverage of what we’ve assayed over the entire genome. In the paper, we say quite clearly that the majority of the genome is not under negative selection, and we say that most of the elements are not under pan-mammalian selection. So that’s negative selection we can detect between lots of different mammals. [??] really interesting question about what is precisely going on in the human population, but that’s — you know, I’m much closer to the instincts of this kind of 10% to 20% sort of range about what is under, sort of what evolution cares about under selection.

Quentin Cooper: But this paper that’s appeared in the journal Genome Biology and Evolution, Dan Graur from the University of Texas, professor there, all these lines, do you think they’re reacting more to the press coverage around the story rather than what’s actually in your paper itself?

Ewan Birney: That’s my belief. And of course in some sense our responsibility to help that press coverage do the right thing and that’s what I worked very hard in September, in particular in the UK context where I’m based, to try and make that press coverage work. It is quite complicated, because you want to be excited about what you’re doing, but you don’t want people to get the wrong — draw the wrong interpretation of it. So, it’s not an easy job to do, but I believe that most of the heat in this debate is about the definitions of the words, and not the data or the interpretation of the data.

Chris Ponting (Oxford University): Ewan, my question to you is, how much of the genome do you think is vital for life?

Ewan Birney: Yeah, I don’t, I would do that on a, as you know I’ve blogged about this. You know, certainly everything that’s under negative selection in the human population, that evolution cares about right now in humans, that if it changes then the person has less reproductive fitness, that’s clearly vital for life. I think there is a chance of there being a small amount of additional things that we find interesting about differences between people that are not under selection. So, those may be phenotypes that are late-onset such as neurodegenerative diseases or things like that. So, I think there’s a little addition to that but those are the kind of boundary components to that.

Chris Ponting: So I think we can probably agree between us that between 10% and say 20% is vital for life.

Ewan Birney: I mean, I think we would agree with that. I think, you know, refining that percentage down is quite interesting. I think also the other components that we — biochemical events that we see in the genome, sort of, each one of them are equally likely to be part of that 10% to 20% that we’re looking for. It’s important to realize that it’s not the case that we can spot the 10% to 20% just by looking harder. Each of these different places in the genome that have some biochemical activity associated with it, when there’s some phenotype screen that’s directed there or some evolutionary screen that’s directed to that point, ENCODE can now say “Ah ha! Here is a biochemical thing that this piece of DNA looks like it could be doing”.

Quentin Cooper: Ewan, just briefly, what about another aspect of the criticism, the idea that the ENCODE project lacked anyone with any real knowledge of evolutionary biology?

Ewan Birney: Well, I’m sure we could have, um, ahhh, had, um, many other people join us. There were a number of us who have worked — I do know quite a bit about evolutionary biology. Whether everybody considers me to be an expert, I don’t, you know, that’s for other people to say. If you read the paper, and not the press reports, there is a lot detail spent on what is under different aspects of evolutionary biology, ah, selection. So, I think again, a lot of this is not about the paper, but is more about the words used to describe the paper.

Quentin Cooper: Finally though, Ewan, is there any of the criticism you do take on the chin, think well perhaps we did let the story get a bit away from us at times?

Ewan Birney: Well, hindsight is a fairly cruel thing. And one of the things which I regret about being in this situation, arguing about words in the press, is I just wish there was a way of us not talking about this. I think ENCODE already is being used by many, many different groups, in particular disease biology groups having their phenotypes screened against it, and other people worldwide. And the whole point of this for the data by the project to be used by many, many other groups. I’m really happy about that, um, and hindsight being such a cruel thing, makes me think about what I could have done to minimize this kind of rather heated debate.

Graur et al. to ENCODE: Zing!

I expect that we will be seeing several harsh critiques of ENCODE’s extraordinary claims about function in the human genome and the equally incredible mega-hype associated with the project. I know of at least one more that is forthcoming from a heavy-hitter in the field, but as a snarky smackdown, it will be very tough to beat the recent paper by Dan Graur and colleagues published in Genome Biology and Evolution. The paper is open-access, so go ahead and read it yourself here. Meantime, enjoy the following zingers:

ENCODE adopted a strong version of the causal role definition of function, according to which a functional element is a discrete genome segment that produces a protein or an RNA or displays a reproducible biochemical signature (for example, protein binding).
Oddly, ENCODE not only uses the wrong concept of functionality, it uses it wrongly and inconsistently.

…the ENCODE authors singled out transcription as a function, as if the passage of RNA polymerase through a DNA sequence is in some way more meaningful than other functions. But, what about DNA polymerase and DNA replication? Why make a big fuss about 74.7% of the genome that is transcribed, and yet ignore the fact that 100% of the genome takes part in a strikingly “reproducible biochemical signature”—it replicates!

Ward and Kellis (2012) confirmed that ~5% of the genome is interspecifically conserved, and by using intraspecific variation, found evidence of lineage-specific constraint suggesting that an additional 4% of the human genome is under selection (i.e., functional), bringing the total fraction of the genome that is certain to be functional to approximately 9%. The journal Science used this value to proclaim “No More Junk DNA” (Hurtley 2012), thus, in effect rounding up 9% to 100%.

ENCODE chose to bias its results by excessively favoring sensitivity over specificity. In fact, they could have saved millions of dollars and many thousands of research hours by ignoring selectivity altogether, and proclaiming a priori that 100% of the genome is functional. Not one functional element would have been missed by using this procedure.

Interestingly, ENCODE, which is otherwise quite miserly in spelling out the exact function of its “functional” elements, provides putative functions for each of its 12 histone modifications. For example, according to ENCODE, the putative function of the H4K20me1 modification is “preference for 5’ end of genes.” This is akin to asserting that the function of the White House is to occupy the lot of land at the 1600 block of Pennsylvania Avenue in Washington, D.C.

In a miraculous feat of “next generation” science, the ENCODE authors were able to determine the frequencies of nonexistent derived alleles.

… a surprisingly large number of scientists have had their knickers in a twist over “junk DNA” ever since the term was coined by Susumu Ohno (1972).

In dissecting common objections to “junk DNA,” we identified several misconceptions, chief among them (1) a lack of knowledge of the original and correct sense of the term, (2) the belief that evolution can always get rid of nonfunctional DNA, and (3) the belief that “future potential” constitutes “a function.”

We urge biologists not be afraid of junk DNA. The only people that should be afraid are those claiming that natural processes are insufficient to explain life and that evolutionary theory should be supplemented or supplanted by an intelligent designer (e.g., Dembski 1998; Wells 2004). ENCODE’s take-home message that everything has a function implies purpose, and purpose is the only thing that evolution cannot provide. Needless to say, in light of our investigation of the ENCODE publication, it is safe to state that the news concerning the death of “junk DNA” have been greatly exaggerated.

ENCODE’s biggest scientific sin was not being satisfied with its role as data provider; it assumed the small-science role of interpreter of the data, thereby performing a kind of textual hermeneutics on a 3.5-billion-long DNA text. Unfortunately, ENCODE disregarded the rules of scientific interpretation and adopted a position common to many types of theological hermeneutics, whereby every letter in a text is assumed a priori to have a meaning.

So, what have we learned from the efforts of 442 researchers consuming 288 million dollars? According to Eric Lander, a Human Genome Project luminary, ENCODE is the “Google Maps of the human genome” (Durbin et al. 2010). We beg to differ, ENCODE is considerably worse than even Apple Maps.

We conclude that the ENCODE Consortium has, so far, failed to provide a compelling reason to abandon the prevailing understanding among evolutionary biologists according to which most of the human genome is devoid of function. The ENCODE results were predicted by one of its lead authors to necessitate the rewriting of textbooks (Pennisi 2012). We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.

UPDATE: Listen to a discussion of this topic by Dan Graur and Michael Eisen