Big news about Evolution: Education and Outreach.

You may recall that I was an Associate Editor of the journal Evolution: Education and Outreach from 2007 to 2009. I also edited the first “special issue” of the journal, on the subject of eye evolution, and I wrote a number of papers for its early issues.

You may also remember that I resigned from the editorial board of the journal when the publisher, Springer, stopped making the journal available free online. I felt that this went against the intent of the journal, which from the outset was to make high-quality but accessible articles available to scientists, educators, and interested members of the public.

Well, I am very pleased to announce that Springer is planning to return to an open-access model for the journal in January 2013!

This means:

• All articles published in a SpringerOpen journal are open access and immediately accessible to anyone, anywhere, without a subscription or other paywall. In addition, all articles will be deposited in PubMed Central.

• Authors will retain copyright, licensing the article under a Creative Commons license. This means that an article can be freely redistributed and reused as long as the material is correctly attributed.

• Because open-access journals follow an online-only, continuous publishing model, with one volume and one issue published per year, there will no longer be printed issues of the journal.

For me, this also means that I will begin writing articles for the journal again in 2013.

I have also agreed to re-join the editorial board, this time as Senior Handling Editor. So, friends and colleagues, you can expect me to begin soliciting papers from you in the new year.

All in all, this is great news for evolution educators.

Stephen Jay Gould conference in Italy — full series of talks.

The complete series of talks from the conference Stephen J. Gould’s Legacy: Nature, History, Society, held May 10-12, 2012 at the Istituto Veneto di Scienze, Lettere ed Arti in Venice, Italy, is now posted.

Telmo Pievani – Ten years without Stephen J. Gould: the scientific heritage

Niles Eldredge – Stephen Jay Gould in the 1960s and 1970s, and the Origin of “Punctuated Equilibria”

Alessandro Minelli – Individuals, hierarchies, and the levels of selection

Elisabeth Lloyd – Gould and adaptation: San Marco 33 years later

Gerd Müller – Beyond Spandrels: S.J. Gould, EvoDevo, and the Extended Synthesis

T. Ryan Gregory – A Gouldian view of the genome

Giuseppe Longo – Randomness increases biological organization

Marcello Buiatti – Biological complexity and punctuated equilibria

Ian Tattersall – Steve Gould’s intellectual legacy to anthropology

Guido Barbujani – Mismeasuring man thirty years later

Excellent opportunity for Canadian PhD students (please pass it on).

Are you an excellent Canadian student interested in pursuing a PhD? Do you know someone who fits that description? If so, please take note (and/or pass it on). I will be happy to discuss possible projects with qualified candidates who would like to work in my lab.


Again, this is for Canadian students only.

Dear Genome: Say what?

Here’s the first sentence from a paper published recently in Genome by Vibhu Ranjan Prasad and Karin Isler:

Gene content, the number of genes coding for proteins, is correlated with genome size in both noneukaryotes and eukaryotes (Lynch and Conery 2003; Konstantinidis and Tiedje 2004; Gregory 2002, 2005).

Say what?

The whole C-value enigma is based on the well-known discrepancy between genome size and gene number, which should be very clear from my papers (including the two that they cite).
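The discrepancy is easy to illustrate with a few rounded, commonly cited figures (illustrative values from general knowledge, not data from either paper):

```python
# Approximate genome sizes and protein-coding gene counts (rounded,
# commonly cited figures; illustrative only).
genomes = {
    # name: (genome size in Mb, protein-coding gene count)
    "E. coli":       (4.6,     4_400),
    "S. cerevisiae": (12.0,    6_000),
    "Fugu rubripes": (390.0,  20_000),
    "Homo sapiens":  (3_200.0, 20_000),
}

for name, (size_mb, genes) in genomes.items():
    print(f"{name:14s} {size_mb:8.1f} Mb  {genes:6d} genes  "
          f"{genes / size_mb:7.1f} genes/Mb")
```

Gene density drops by roughly two orders of magnitude from bacteria to humans, while gene number barely doubles. That is precisely why genome size cannot be read as a proxy for gene content in eukaryotes.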

From the first page of Gregory (2002):

Genome size bears no relationship to organismal complexity. If C-values are constant because DNA is the stuff of genes, then how could they be unrelated to gene number?

From the first page of Gregory (2005):

The discordance between genome size and organism complexity or gene number did not remain a paradox — that is, a pair of mutually exclusive truths — for very long. The discovery of non-coding DNA in the early 1970s explained the failure of DNA content to reflect the number of genes, and in so doing resolved the paradox. However, as with most significant advances in genetic knowledge, this finding raised more questions than it answered.

Not to mention this figure from Gregory (2005):

As is often the case, these authors focused only on sequenced genomes. For prokaryotes this is not such a problem, because prokaryotic genome sizes span a much narrower range; the main bias in prokaryote genome sequencing has more to do with which species can be cultured in the lab. Among eukaryotes, however, only species with small- to medium-sized genomes have been sequenced, owing to the difficulty and expense of working with large genomes.

The authors make the valid point that comparisons across species, such as assessments of a potential relationship between genome size and gene number, should incorporate phylogenetic information where possible. However, they miss the much more important bias in the data: it includes only eukaryotes with smallish genomes.
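The range-restriction problem can be sketched with a toy simulation (entirely hypothetical numbers: gene number is modeled as rising with genome size and then plateauing near 20,000, mimicking the saturation seen in eukaryotes):

```python
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / var

random.seed(42)

# Toy model: genome sizes (Mb) log-uniform from 10 Mb to 100 Gb; gene
# number rises with size but plateaus near 20,000 in large genomes,
# where the extra DNA is non-coding.
sizes = [10 ** random.uniform(1, 5) for _ in range(1000)]
genes = [min(20_000, 40 * s) + random.gauss(0, 2_000) for s in sizes]

# "Sequenced" subset: only the small-to-medium genomes (< 500 Mb).
small = [(s, g) for s, g in zip(sizes, genes) if s < 500]
r_small = pearson([s for s, _ in small], [g for _, g in small])
r_all = pearson(sizes, genes)
```

In the size-restricted subset the correlation is strong; across the full range it is much weaker, because beyond the plateau genome size says nothing about gene number. No amount of phylogenetic correction fixes a truncated sample.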

And for crying out loud, how did such a blatant miscitation make it into the journal?

Prasad, V.R. and K. Isler (2012). Assessment of phylogenetic structure in genome size–gene content correlations. Genome 55: 391-395.

Gregory, T.R. (2005). Synergy between sequence and size in large-scale genomics. Nature Reviews Genetics 6: 699-708.

Gregory, T.R. (2002). A bird’s-eye view of the C-value enigma: genome size, cell size, and metabolic rate in the class Aves. Evolution 56: 121-130.

I didn’t like Prometheus.

I didn’t like the movie Prometheus. I thought the writing was incredibly lazy, with no real attempt to construct a coherent plot or to introduce worthwhile characters. This was a real disappointment, because Alien and Aliens are great movies (Alien 3 sucked, and I couldn’t even bring myself to watch Alien: Resurrection). It seems that I am not alone in viewing this film as a confusing let-down with little going for it besides some decent special effects. Seeing these made me feel better.

Max Libbrecht on ENCODE’s results regarding junk DNA.

Max Libbrecht is a PhD student at the University of Washington and was one of the several hundred researchers involved in the ENCODE project. I have already mentioned Max because of his important comments on ENCODE and junk DNA elsewhere on the interwebs. I noted that he would be welcome to write a guest post on my blog, but he has gone one better and started his own blog. Max’s inaugural post is entitled “On ENCODE’s results regarding junk DNA”. It’s an important, unambiguous statement coming from an ENCODE team member, so I am reproducing it in its entirety here. That’s it, though: I’m not re-posting all his future blog posts, so you’ll just have to head over there and subscribe yourself.

After I took part in an AMA (“Ask Me Anything”) on reddit, there has been some discussion elsewhere (such as by Ryan Gregory and in the comments of Ewan Birney’s blog) of what I and the other ENCODE scientists meant.  In response, I’d like to echo what many others have said regarding the significance of ENCODE on the fraction of the genome that is “junk” (or nonfunctional, or unimportant to phenotype, or evolutionarily unconserved).

In its press releases, ENCODE reported finding 80% of the genome with “specific biochemical activity”, which turned into (through some combination of poor presentation on the part of ENCODE and poor interpretation on the part of the media) reports that 80% of the genome is functional.  This claim is unlikely given what we know about the genome (here is a good explanation of why), so this created some amount of controversy.

I think very few members of ENCODE believe that the consortium proved that 80% of the genome is functional; no one claimed as much on the reddit AMA, and Ewan Birney has made it clear on his blog that he would not make this claim either.  In fact, I think the importance of ENCODE’s results on the question of what fraction of DNA is functional is very small, and that question is much better answered with other analyses, like those of evolutionary conservation.  Lacking proof either way from ENCODE, there was some disagreement on the AMA regarding what the most likely true fraction is, but I think this stemmed from disagreements about definitions and willingness to hypothesize about undiscovered function, not misinterpretation of the significance of ENCODE’s results.

I think many members of the consortium (including Ewan Birney) regret the choice of terminology that led to the misinterpretations of the 80% number.  Unfortunately, such misinterpretations are always a danger in scientific communication (both among the scientific community and to the public).  Whether the consortium could have done a better job explaining the results, and whether we should expect the media to more accurately represent scientific results, is hard to say.

I think the contribution of ENCODE lies not in determining what DNA is functional but rather in determining what the functional DNA actually does.  This was the focus of the integration paper and the companion papers, and I would have preferred for this to be the focus of the media coverage.

ENCODE spokesperson: 40%, not 80%.

Here’s a very interesting quote, provided by Faye Flam at Knight Science Journalism.
(Emphasis mine)

I decided … I would go back to ENCODE biologist John Stamatoyannopoulos, who was quoted in the first wave of news. He said he thought the skeptics hadn’t fully understood the papers, and that some of the activity measured in their tests does involve human genes and contributes something to our human physiology. He did admit that the press conference misled people by claiming that 80% of our genome was essential and useful. He puts that number at 40%. Otherwise he stands by all the ENCODE claims:

“What the ENCODE papers (not the main paper in Nature, but the other length papers that accompanied it) have to say about transposons is incredibly interesting. Essentially, large numbers of these elements come alive in an incredibly cell-specific fashion, and this activity is closely synchronized with cohorts of nearby regulatory DNA regions that are not in transposons, and with the activity of the genes that those regulatory elements control. All of which points squarely to the conclusion that such transposons have been co-opted for the regulation of human genes — that they have become regulatory DNA. This is the rule, not the exception….”

Some co-option of TEs has been known for a long time, and was fully expected by the authors of the “selfish DNA” papers in 1980. 40% of the genome functional in this manner, though? We shall see. Correlation is not causation, and TEs being active under different circumstances can be explained in various ways. In any case, even John Stamatoyannopoulos is now backtracking on the 80% claim, stating that the evidence is consistent with the view that the majority of the genome does not, in fact, have any functional role in the way that word is usually understood.

Student ENCODE authors show the way.

Some of the more prominent figures in the ENCODE project, such as Ewan Birney and John Stamatoyannopoulos, have been making statements in the media that exacerbate the hype surrounding the ENCODE results and the infamous claim that “80% of the DNA in the genome is functional”.  For the most part, I have been criticizing “ENCODE” as a collective for this, but it bears noting that this initiative involved more than 400 researchers, only a few of whom have been interviewed by the media.  Thankfully, some of the others are beginning to make it clear that not everyone involved with the project agrees with the “death of junk DNA” spin that has dominated the discussion.

For example, Max Libbrecht, a PhD student who worked on the ENCODE project, wrote this in the comments thread of Michael White’s piece at Huffington Post:

Max Libbrecht from ENCODE and the ENCODE AMA on reddit here. Since I’m mentioned in the comments, I thought I’d put in that I essentially agree with this article: ENCODE did not debunk the idea of “junk” DNA, contrary to many news outlets. Here is one summary of the true results and their misinterpretation — there are many others:

Max is referring to a discussion that was held on Reddit about the project. Also taking part was “rule_30”, who is “a biology grad student who’s contributed experimental and analytical methodologies”. Rule_30 engaged in a lengthy discussion with Larry Moran and Diogenes on the topic of ENCODE’s findings and the surrounding media fiasco. You should go and read the whole thing, but here are some key statements by rule_30:

I do NOT think ANYONE has demonstrated function for most of our genome. In fact, ENCODE has not demonstrated function for ANYTHING because we published no functional studies. The only thing ENCODE has done is to find new regions on the genome that are correlated, in terms of their chemical signature (i.e. chromatin state of “openness”, transcription factor occupancy, etc.), with other regions that have been proven functional by site-directed experiments. Correlated, no more and no less. And furthermore, it is even impossible to properly set thresholds for what is a real chemical signal and what is an artifact in these assays, as MH and I have discussed elsewhere in this thread. The 80% figure is almost certainly not even real chemical signatures. If you notice, 80% of the genome is the percent of the genome that is mappable so right now, I think the 80% figure simply means that if you sequence any complex genome-wide dataset deeply enough, you will eventually return the entire genome. It’s just a signal-to-noise issue: if you keep looking, you’ll eventually get all the noise possible: the entire mappable genome. Ewan knows this: in his blog, he says that he could either have said the 80% (low-confidence) figure or the more conservative 20% figure that we are more certain is actually telling us something that’s more signal and minimal noise. But he chose the 80% figure in the end and the rest is history.

Heck, all of it could still be “junk” by ENCODE results alone (and NOW when I say “junk”, what I mean is that they don’t have a direct effect on gene expression). First of all, the 80% figure could easily include more noise than signal because it was the informatically low-confidence set of called regions, so it’s not even clear that what’s in those 80% of regions are even what’s in the cell. Second of all, it’s unclear what many of these assays mean in terms of physical reality. For example, ChIP-Seq signal size is uncorrelated with factor occupancy or “function” as we currently understand it. Yes, we see signals where we know from other experiments that there is binding, but the seemingly most biologically important sites are not the largest signals. Therefore, the informatics thresholds are probably uncorrelated with degree of occupancy; they are only correlated with how certain we are that they are not simply an informatics artifact. Third, EVEN IF we believed that most of the regions we identify are real (i.e. there is occupancy there), as is likely the case for the more conservative 20% of the genome, that only means that that chemical signature is there — it DOESN’T mean that this has anything to do with function. It is ENTIRELY possible, for example, that wherever you have open chromatin and a visible DNA motif, that a transcription factor in excess will bind to it. As long as this doesn’t mess up function, there is no reason it would be selected against.

I would’ve never made that statement [“80% of the genome is functional”], but it’s been made now and nobody can take it back. I understand that I’m in the consortium and have to own up to anything it says. But please at least understand that it was not a malicious or intentional lie. Yes, the sound bite the public got is incorrect (well, we don’t know if it’s correct or not yet), and even though we didn’t mean it that way, that’s how it was characterized, through ENCODE’s fault and others. But it’s NOT a “lie” in the sense that we intentionally misled anyone. I think Ewan made a poor choice of words when he said “functional” instead of “detectable chemical signal that is elsewhere associated with gene activity” or something more agnostic. And the worst part is that this whole % of the genome that is functional has absolutely jack-squat to do with ENCODE’s actual results, as far as I’m concerned.
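Rule_30’s signal-to-noise point, that deep enough sequencing of anything will eventually tile the entire mappable genome, is essentially a coupon-collector effect. A toy simulation (hypothetical numbers, nothing to do with the actual ENCODE pipelines) makes it concrete:

```python
import random

random.seed(0)
GENOME = 1_000_000   # toy "mappable genome" of 1 Mb
READ_LEN = 100       # reads of pure noise, placed uniformly at random

covered = set()
reads = 0
# Keep adding random reads until 80% of positions are hit at least once.
while len(covered) < 0.8 * GENOME:
    start = random.randrange(GENOME - READ_LEN + 1)
    covered.update(range(start, start + READ_LEN))
    reads += 1

fold = reads * READ_LEN / GENOME   # average depth needed to "find" 80%
```

Only about 1.6-fold coverage of pure noise suffices to touch 80% of this toy genome at least once. Reaching 80% of positions therefore says nothing, by itself, about how much of the signal is real.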

I don’t know if either of these two graduate students has a blog, but if not, they are more than welcome to write a guest post here on Genomicron to clarify their own thoughts on the ENCODE project’s findings!

More ENCODE videos.

In addition to the cringe-inducing cartoon sponsored by Nature, here are some other videos that have been put out about the ENCODE results. You tell me who you think is responsible for propagating the claim that ENCODE found that “80% of the genome is functional” and has overturned the notion of “junk DNA”: the media, the journal publishers, or the project leaders?

New Scientist on junk DNA.

In its coverage of junk DNA, New Scientist has consistently been among the most even-handed and accurate outlets. Here is a list of examples from the magazine:

The ever deepening mystery of the human genome

Don’t junk the ‘junk DNA’ just yet

Why ‘junk DNA’ may be useful after all

Unknown genome: What we still don’t know about our DNA

Evolution: 24 myths and misconceptions

(New Scientist is not perfect, of course. There have been some real clunkers there as well.)