A slightly different response to today’s ENCODE hype.

If you read many of the media reports that came out today, the ENCODE project has demonstrated that 80% of the DNA in our genome has a biological function. This runs counter to traditional views of the genome, the story goes, because most of the genome had been dismissed as useless junk. I have blogged about why this cliché is historically inaccurate so many times that I just can’t bring myself to rehash it again right now. Instead, I’ll just direct you to the Junk DNA: Quotes of Interest series and you can see for yourself what was written in the scientific literature.

Some reports have been better than others. New Scientist ran a story today that presents a rather balanced treatment of the debate regarding how much of the genome is functional. (It so happens that I was quoted in that story, so that’s a bonus!) Ed Yong, one of the most reliable science writers around, wrote about it on his blog Not Exactly Rocket Science as well. Larry Moran didn’t like it, but to be fair I think Ed was trying to report what the ENCODE authors were claiming. No doubt he’s open to further discussion on the topic; in fact, he has invited comments from the other side. Maybe we’ll see a second post in the coming days.

The claim that “lots of the genome isn’t junk after all!” is not new; people have been using this straw man for nearly 20 years. What’s novel is that the ENCODE authors are claiming that there is now evidence that 80% of the genome shows signs of function, or at least of “specific biological activity”. Many people are not convinced by this, me among them. I am especially unimpressed by this figure when I read the ENCODE project lead’s own words on the subject of “function” and the 80% figure.

Here’s Ewan Birney:

Q. Hmmm. Let’s move onto the science. I don’t buy that 80% of the genome is functional.
A. It’s clear that 80% of the genome has a specific biochemical activity – whatever that might be. This question hinges on the word “functional” so let’s try to tackle this first. Like many English language words, “functional” is a very useful but context-dependent word. Does a “functional element” in the genome mean something that changes a biochemical property of the cell (i.e., if the sequence was not here, the biochemistry would be different) or is it something that changes a phenotypically observable trait that affects the whole organism? At their limits (considering all the biochemical activities being a phenotype), these two definitions merge. Having spent a long time thinking about and discussing this, not a single definition of “functional” works for all conversations. We have to be precise about the context. Pragmatically, in ENCODE we define our criteria as “specific biochemical activity” – for example, an assay that identifies a series of bases. This is not the entire genome (so, for example, things like “having a phosphodiester bond” would not qualify). We then subset this into different classes of assay; in decreasing order of coverage these are: RNA, “broad” histone modifications, “narrow” histone modifications, DNaseI hypersensitive sites, Transcription Factor ChIP-seq peaks, DNaseI Footprints, Transcription Factor bound motifs, and finally Exons.

And again:

Q. Ok, fair enough. But are you most comfortable with the 10% to 20% figure for the hard-core functional bases? Why emphasize the 80% figure in the abstract and press release?
A. (Sigh.) Indeed. Originally I pushed for using an “80% overall” figure and a “20% conservative floor” figure, since the 20% was extrapolated from the sampling. But putting two percentage-based numbers in the same breath/paragraph is asking a lot of your listener/reader – they need to understand why there is such a big difference between the two numbers, and that takes perhaps more explaining than most people have the patience for. We had to decide on a percentage, because that is easier to visualize, and we choose 80% because (a) it is inclusive of all the ENCODE experiments (and we did not want to leave any of the sub-projects out) and (b) 80% best coveys the difference between a genome made mostly of dead wood and one that is alive with activity. We refer also to “4 million switches”, and that represents the bound motifs and footprints.

We use the bigger number because it brings home the impact of this work to a much wider audience. But we are in fact using an accurate, well-defined figure when we say that 80% of the genome has specific biological activity.

So, “functional” is a pretty big stretch here, and 80% rather than 20% was used because it generates more interest. Not surprisingly, this has irritated many biologists and thrilled anti-evolutionists.

But here’s my slightly different take on the kerfuffle, and why people who deny the existence of non-functional DNA have little reason to rejoice. Consider the following:

1) Even after 5 years, $185 million, and a massive study by hundreds of researchers, there still is only evidence of function for 80% of the human genome under the most extremely generous interpretation. That leaves 20% without any signs of function whatsoever. That’s more than 600 million base pairs, or about 200 million more than the entire pufferfish genome.

That said, people like Ewan Birney and John Mattick think the figure will actually go to 100% functional once additional cell types are analyzed. Here’s a quote from Ed Yong’s piece:

And what’s in the remaining 20 percent? Possibly not junk either, according to Ewan Birney, the project’s Lead Analysis Coordinator and self-described “cat-herder-in-chief”. He explains that ENCODE only (!) looked at 147 types of cells, and the human body has a few thousand. A given part of the genome might control a gene in one cell type, but not others. If every cell is included, functions may emerge for the phantom proportion. “It’s likely that 80 percent will go to 100 percent,” says Birney. “We don’t really have any large chunks of redundant DNA. This metaphor of junk isn’t that useful.”

2) To get that 80% figure, you have to have a very loose definition of “function” indeed. Actual evidence (which itself may not convince many experts) suggests 20% is functional in the sense of, well, having a biological function. The 80% value refers only to “specific biological activity”. Some comments from the interwebs sum up the critique of this criterion rather nicely:

Michael Eisen: “Measurable biochemical activity is a meaningless measure of functional significance.”

Leonid Kruglyak: “80% includes definitions of “activity” barely more interesting than “replicated” (e.g. transcribed).”

Sandwalk reader named “Argon”: “Basically his trigger for ‘functionality’ being ‘specific biochemical activity’ sets a pretty low bar. It’s about the lowest set-point I think you can have short of ‘having a sequence that can be digested with a DNAse’.”

Also, I haven’t read the primary papers in detail yet, but the immediate question that comes to mind is how one distinguishes “specific biological activity” that is functional for the organism from “specific biological activity” of parasitic transposable elements or their remnants. Simply having a site to which a protein can bind or being transcribed into RNA does not seem like very good evidence of an important biological role to me.

3) The onion test. Maybe 80% of the human genome is “functional”, even in a biologically meaningful sense of that word. Even so, we’d still be left with the question of why onions need so much more non-coding DNA than humans, or how pufferfishes can get by just fine with only 1/10 as much.

4) Common sense. Probably 2/3 of the human genome is made up of transposable elements and the defunct remains thereof. Some of these elements exist in millions of copies in the genome, and many are known to cause disease by their insertion activity. Some are undoubtedly functional, but it is quite a stretch to suggest that millions of these elements are needed to regulate our 20,000 genes (but not the 30,000 genes of a pufferfish). As Carl Sagan said, “extraordinary claims require extraordinary evidence”, and so far we simply do not have it when it comes to claiming that the majority of the elements in the genome have a biological function.

So, even the most rigorous efforts to find function for non-coding DNA in the human genome have come up with a figure of 80% at best, and only when they use a very flexible definition of “function”. 20% remains a more realistic number, and that would leave a heck of a lot of non-functional DNA in the human genome. If anyone should be pleased by these results, it’s those who maintain that a sizeable portion of the human genome is without a biological function at the level of the organism.
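For a rough sense of scale, the arithmetic behind these percentages can be sketched in a few lines of Python. The genome sizes here are assumptions chosen as round numbers (about 3.2 billion base pairs for the human genome and about 400 million for the pufferfish, the latter implied by the comparison in point 1), not values taken from the ENCODE papers, and the function name is purely illustrative.

```python
# Back-of-the-envelope arithmetic only; both genome sizes are assumed round figures.
HUMAN_GENOME_BP = 3_200_000_000      # assumption: ~3.2 billion base pairs
PUFFERFISH_GENOME_BP = 400_000_000   # assumption: ~400 million base pairs


def unaccounted_bp(functional_percent, genome_bp=HUMAN_GENOME_BP):
    """Base pairs left without evidence of function at a given functional percentage.

    Integer arithmetic is used so the result is exact for whole percentages.
    """
    return genome_bp * (100 - functional_percent) // 100


if __name__ == "__main__":
    for label, percent in [("80% 'activity' figure", 80), ("20% conservative figure", 20)]:
        leftover = unaccounted_bp(percent)
        ratio = leftover / PUFFERFISH_GENOME_BP
        print(f"{label}: {leftover:,} bp without evidence of function "
              f"(about {ratio:.1f} pufferfish genomes)")
```

With these assumed sizes, the 80% figure leaves about 640 million base pairs without evidence of function, and the 20% figure leaves about 2.56 billion. A somewhat smaller assumed human genome size brings the first number closer to the "more than 600 million" quoted above.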

Also, there’s this:

These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA.

That was written by D.E. Comings in 1972, in the very first detailed discussion of “junk DNA”.

15 comments to A slightly different response to today’s ENCODE hype.

  • Steven Sullivan

    “What’s novel is that the ENCODE authors are claiming that there is no evidence that 80% of the genome shows signs function, or at least of “specific biological activity”.”

    This sentence looks like it got mangled a bit in the editing process….


  • Jorge

    “But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA.”
    To me this sounds like double-speak incredulity from someone refusing to accept the obvious.
    If it has function(s) then it’s not “junk”.   The quote you cite does nothing but support that.  
    Give it up – the “junk DNA” myth is all but defunct except, of course, for the Evo-Faithful diehards.  Nothing will ever get them off of Evo-top-dead-center … where would they go from there?


  • Bobo

    It’s always sad to see when scientists can’t admit that they’re wrong.  Most of the genome has a function, and most of what you’ve believed (and published) for most of your scientific career has now proved to be wrong.  The only things that are junk here are your publication record and your belief system.
    Man up.


  • And how’s your publication record, “Bobo”?


  • John Harshman

    It seems to me that only YECs have any real stake in the “no junk DNA” hypothesis. If the biota was designed by YHWH only 6000 years ago, it clearly would not have contained anything useless at that point, and would not have had the time to accumulate much junk. I don’t see how even OECs, much less any milder IDiots, would have any prediction at all, at least not related to their design hypotheses, such as they are.

    So why are the non-YEC IDiots so excited about all this? 


  • G. Spring

    @John Harshman,

    I doubt there really are milder ID people, they are just OEC trying, poorly, to cloak themselves.   In any case, both ID and OEC want to deny that evolution occurs by natural selection acting on random variation.  So they harbor a desire to find that genomes evolve by some non-random process.  The Central Dogma version of genome biology didn’t leave much room for that, so they are quick to latch onto any kind of possible supervisory mechanism built into the genome that could direct evolution in some way.  Even reverse transcriptase and horizontal gene transfer makes them a little excited.  The vast “dark matter” of the genome, though, was the stuff of their fantasies.  Maybe, they dreamed,  there is some uber-program put in all that junk by God to direct the evolution toward man, to direct the “micro-evolution” within Kinds that some of them have been forced, grudgingly, to accept.  They make these unfounded claims all the time.  Now this reckless ENCODE abstract gives them a citation.  They are going to go ape.   


  • John Harshman

    G. Spring:

    That just isn’t true. A great many IDiots, even YECs, are willing to admit that evolution can occur by natural selection acting on random variation. They’re just careful to distinguish microevolution (which does happen) from macroevolution (which doesn’t, because that would mean you’re an ape). That is, they give selection a very limited, mostly conservative role. And of course this has nothing to do with junk DNA, which isn’t subject to selection anyway. (That’s why it’s junk.) The claim that there is no junk is in fact a refusal to admit drift, not selection. I can’t see this as anything other than a claim that god would keep everything tidy.


  • G. Spring

    Should have said “deny that macro-evolution…”.  Anyway, I don’t think they can be understood by analyzing the logic of their stated positions.  Sure, the logic of OEC/ID stated positions are compatible with any status for “junk” DNA.  They aren’t in the business of following the logic of their stated positions, they are in the business of rationalizing their prior beliefs about God and propagandizing those beliefs.   Some concede more science than others, but my feeling is that they do so only grudgingly and that their heart isn’t in it.  So while their stated views do not entail any conflict with there being junk DNA, emotionally, and as a matter of propaganda, junk DNA is disturbing to their worldview and to their followers in exactly the “untidy” sense that you mention.   Maybe you mean to highlight this disconnect between their stated logic and how they react? 


  • mmammen

    Erik Lander’s review in Nature last year is outstanding background reading for this discussion.  I don’t believe we are in a position yet to determine percentage “functionality”.  We don’t know how to define that term appropriately for this context, and we are still learning.  What’s clear now is that there are approximately 20,000 protein-coding elements (lots of versions of each given alternative splicing), then a large number of control elements.  The evolution appears to mainly involve the latter, at least over the past million years or so. 


  • RexTugwell

    Now we’re arguing about the definition of function. No Darwinist bothered to take the time to define the term before  the ENCODE project. Curious. 
    “It depends on what the definition of ‘is’ is.” LMAO!
    The emperor has no clothes! 


  • N.Wells

    Rex Tugwell: you say that biologists hadn’t previously bothered to define “function”.  How about reading to the end of Ryan’s post?  (It’s well written and not that long, so I’m sure you can do it).  The first detailed discussion of “junk DNA” included discussion of possible and probable functions of “junk DNA”, by D.E. Comings in 1972.   How about checking out the quotes Ryan collects in the series that he links to early in his post? ( http://www.genomicron.evolverzone.com/2008/02/junk-dna-quotes-of-interest-series/ ).  How about getting a clue before you post?
    Also note that in common parlance “junk” does not mean absolutely and completely useless, or else junk stores and junk sales could not exist, as no one would ever purchase completely useless junk.  “Junk” is merely useless most of the time to most people. 


  • RexTugwell

    Don’t get mad at me, Ms. Wells. Take it up with The ENCODE Project Consortium and the editorial staff at Nature. They’re the ones you want to split hairs with over what  biochemical function means. I’ll tell you what. You guys come up with a natural explanation for the origin of life and we’ll call it even. 


  • Galt

    T. Ryan Gregory : You are so right! Thanks!


    If you can explain what those 80 % functional regions are for the public, that will be great. I feel as a reader I am a bit confused about what you are trying to say. Are you trying to persuade people that it is great to discover 80 % of the functional genome or you dont believe it?

