80%* of the genome is functional*!

You’re going to be hearing a lot about the ENCODE project for the next little while. 30 papers were released today, and there is plenty of media attention already. Lots of it is of the standard “it’s not junk after all!” variety. Of particular note in most reports is the claim by the ENCODE authors that 80% of the genome is “functional”.

There’s a reason for the asterisks in the title of this post, though. Here’s what ENCODE project leader Ewan Birney has to say on his own blog, first about the meaning of the term “functional”:

Q. Hmmm. Let’s move onto the science. I don’t buy that 80% of the genome is functional.
A. It’s clear that 80% of the genome has a specific biochemical activity – whatever that might be. This question hinges on the word “functional” so let’s try to tackle this first. Like many English language words, “functional” is a very useful but context-dependent word. Does a “functional element” in the genome mean something that changes a biochemical property of the cell (i.e., if the sequence was not here, the biochemistry would be different) or is it something that changes a phenotypically observable trait that affects the whole organism? At their limits (considering all the biochemical activities being a phenotype), these two definitions merge. Having spent a long time thinking about and discussing this, not a single definition of “functional” works for all conversations. We have to be precise about the context. Pragmatically, in ENCODE we define our criteria as “specific biochemical activity” – for example, an assay that identifies a series of bases. This is not the entire genome (so, for example, things like “having a phosphodiester bond” would not qualify). We then subset this into different classes of assay; in decreasing order of coverage these are: RNA, “broad” histone modifications, “narrow” histone modifications, DNaseI hypersensitive sites, Transcription Factor ChIP-seq peaks, DNaseI Footprints, Transcription Factor bound motifs, and finally Exons.

In other words, the validity of the claim depends on how one defines “functional”, and ENCODE takes a very liberal view on this question.

And on that 80% figure:

Q. Ok, fair enough. But are you most comfortable with the 10% to 20% figure for the hard-core functional bases? Why emphasize the 80% figure in the abstract and press release?
A. (Sigh.) Indeed. Originally I pushed for using an “80% overall” figure and a “20% conservative floor” figure, since the 20% was extrapolated from the sampling. But putting two percentage-based numbers in the same breath/paragraph is asking a lot of your listener/reader – they need to understand why there is such a big difference between the two numbers, and that takes perhaps more explaining than most people have the patience for. We had to decide on a percentage, because that is easier to visualize, and we choose 80% because (a) it is inclusive of all the ENCODE experiments (and we did not want to leave any of the sub-projects out) and (b) 80% best conveys the difference between a genome made mostly of dead wood and one that is alive with activity. We refer also to “4 million switches”, and that represents the bound motifs and footprints.

We use the bigger number because it brings home the impact of this work to a much wider audience. But we are in fact using an accurate, well-defined figure when we say that 80% of the genome has specific biological activity.

This seems to suggest that the 80% figure is actually for sequences with “biological activity” — a term even more loosely defined than “function”. And they went with the 80% figure because a) it generates attention, and b) people are too busy to grasp a nuanced discussion of “20% potentially functional given present evidence, but up to 80% has some kind of activity that might also imply function”.

1 comment to 80%* of the genome is functional*!

  • Daniël P Melters

    A nice article on ENCODE. A bit late, but still worth a read:
    Graur et al. (2013), “On the immortality of television sets: ‘function’ in the human genome
    according to the evolution-free gospel of ENCODE”, Genome Biology and Evolution (GBE).
    Here is the abstract:
    “A recent slew of ENCODE Consortium publications, specifically the article signed 
    by all Consortium members, put forward the idea that more than 80% of the 
    human genome is functional. This claim flies in the face of current estimates 
    according to which the fraction of the genome that is evolutionarily conserved 
    through purifying selection is under 10%. Thus, according to the ENCODE 
    Consortium, a biological function can be maintained indefinitely without selection, 
    which implies that at least 80 – 10 = 70% of the genome is perfectly invulnerable to 
    deleterious mutations, either because no mutation can ever occur in these 
    “functional” regions, or because no mutation in these regions can ever be 
    deleterious. This absurd conclusion was reached through various means, chiefly (1) 
    by employing the seldom used “causal role” definition of biological function and 
    then applying it inconsistently to different biochemical properties, (2) by committing 
    a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate 
    the crucial difference between “junk DNA” and “garbage DNA,” (4) by using 
    analytical methods that yield biased errors and inflate estimates of functionality, (5) 
    by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical 
    significance rather than the magnitude of the effect. Here, we detail the many logical 
    and methodological transgressions involved in assigning functionality to almost 
    every nucleotide in the human genome. The ENCODE results were predicted by one 
    of its authors to necessitate the rewriting of textbooks. We agree, many textbooks 
    dealing with marketing, mass-media hype, and public relations may well have to be 

