Some of the more prominent figures in the ENCODE project, such as Ewan Birney and John Stamatoyannopoulos, have been making statements in the media that exacerbate the hype surrounding the ENCODE results and the infamous claim that “80% of the DNA in the genome is functional”. For the most part, I have been criticizing “ENCODE” as a collective for this, but it bears noting that this initiative involved more than 400 researchers, only a few of whom have been interviewed by the media. Thankfully, some of the others are beginning to make it clear that not everyone involved with the project agrees with the “death of junk DNA” spin that has dominated the discussion.
For example, Max Libbrecht, a PhD student who worked on the ENCODE project, wrote this in the comments thread of Michael White’s piece at the Huffington Post:
Max Libbrecht from ENCODE and the ENCODE AMA on reddit here. Since I’m mentioned in the comments, I thought I’d put in that I essentially agree with this article: ENCODE did not debunk the idea of “junk” DNA, contrary to many news outlets. Here is one summary of the true results and their misinterpretation — there are many others:
Max is referring to a discussion that was held on Reddit about the project. Also taking part was “rule_30”, who is “a biology grad student who’s contributed experimental and analytical methodologies”. Rule_30 engaged in a lengthy discussion with Larry Moran and Diogenes on the topic of ENCODE’s findings and the surrounding media fiasco. You should go and read the whole thing, but here are some key statements by rule_30:
I do NOT think ANYONE has demonstrated function for most of our genome. In fact, ENCODE has not demonstrated function for ANYTHING because we published no functional studies. The only thing ENCODE has done is to find new regions on the genome that are correlated, in terms of their chemical signature (i.e. chromatin state of “openness”, transcription factor occupancy, etc.), with other regions that have been proven functional by site-directed experiments. Correlated, no more and no less. And furthermore, it is even impossible to properly set thresholds for what is a real chemical signal and what is an artifact in these assays, as MH and I have discussed elsewhere in this thread. The 80% figure is almost certainly not even real chemical signatures. If you notice, 80% of the genome is the percent of the genome that is mappable so right now, I think the 80% figure simply means that if you sequence any complex genome-wide dataset deeply enough, you will eventually return the entire genome. It’s just a signal-to-noise issue: if you keep looking, you’ll eventually get all the noise possible: the entire mappable genome. Ewan knows this: in his blog, he says that he could either have said the 80% (low-confidence) figure or the more conservative 20% figure that we are more certain is actually telling us something that’s more signal and minimal noise. But he chose the 80% figure in the end and the rest is history.
Heck, all of it could still be “junk” by ENCODE results alone (and NOW when I say “junk”, what I mean is that they don’t have a direct effect on gene expression). First of all, the 80% figure could easily include more noise than signal because it was the informatically low-confidence set of called regions, so it’s not even clear that what’s in those 80% of regions are even what’s in the cell. Second of all, it’s unclear what many of these assays mean in terms of physical reality. For example, ChIP-Seq signal size is uncorrelated with factor occupancy or “function” as we currently understand it. Yes, we see signals where we know from other experiments that there is binding, but the seemingly most biologically important sites are not the largest signals. Therefore, the informatics thresholds are probably uncorrelated with degree of occupancy; they are only correlated with how certain we are that they are not simply an informatics artifact. Third, EVEN IF we believed that most of the regions we identify are real (i.e. there is occupancy there), as is likely the case for the more conservative 20% of the genome, that only means that that chemical signature is there — it DOESN’T mean that this has anything to do with function. It is ENTIRELY possible, for example, that wherever you have open chromatin and a visible DNA motif, that a transcription factor in excess will bind to it. As long as this doesn’t mess up function, there is no reason it would be selected against.
I would’ve never made that statement [“80% of the genome is functional”], but it’s been made now and nobody can take it back. I understand that I’m in the consortium and have to own up to anything it says. But please at least understand that it was not a malicious or intentional lie. Yes, the sound bite the public got is incorrect (well, we don’t know if it’s correct or not yet), and even though we didn’t mean it that way, that’s how it was characterized, through ENCODE’s fault and others. But it’s NOT a “lie” in the sense that we intentionally misled anyone. I think Ewan made a poor choice of words when he said “functional” instead of “detectable chemical signal that is elsewhere associated with gene activity” or something more agnostic. And the worst part is that this whole % of the genome that is functional has absolutely jack-squat to do with ENCODE’s actual results, as far as I’m concerned.
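Rule_30’s signal-to-noise point (sequence deeply enough and you eventually “return” the whole mappable genome) can be illustrated with a toy simulation. To be clear, this is only a sketch and has nothing to do with ENCODE’s actual pipeline: the genome is reduced to a hypothetical set of bins, 80% of which are treated as mappable, and every read is pure noise that lands uniformly at random on a mappable bin. The function name, the bin count, and the depths are all made up for illustration.

```python
import random


def covered_fraction(genome_bins, mappable_fraction, n_reads, seed=0):
    """Fraction of ALL bins hit by at least one randomly placed noise read.

    Toy model (assumption): reads carry no biological signal and land
    uniformly at random on the mappable part of the genome only.
    """
    rng = random.Random(seed)
    n_mappable = int(genome_bins * mappable_fraction)
    hit = set()
    for _ in range(n_reads):
        hit.add(rng.randrange(n_mappable))  # bins >= n_mappable are never hit
    return len(hit) / genome_bins


if __name__ == "__main__":
    GENOME_BINS = 10_000  # hypothetical genome size, in bins
    MAPPABLE = 0.8        # assumed mappable fraction, echoing the 80% figure
    for depth in (0.1, 1, 5, 25):  # average noise reads per bin
        n_reads = int(depth * GENOME_BINS)
        frac = covered_fraction(GENOME_BINS, MAPPABLE, n_reads)
        print(f"depth {depth:>4}: {frac:.3f} of the genome shows a 'signal'")
```

In this toy model the covered fraction climbs with depth and levels off at the mappable fraction, even though not a single read reflects anything biological. That is the gist of the argument: with a permissive enough threshold, coverage alone tells you about sequencing depth and mappability, not about function.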
I don’t know if either of these two graduate students has a blog, but if not, they are more than welcome to write a guest post here on Genomicron to clarify their own thoughts on the ENCODE project’s findings!