Press offices may be the problem.

In their recent book A Scientist’s Guide to Talking with the Media (which I enthusiastically recommend, by the way), Hayes and Grossman describe the role that university press offices play in disseminating new findings by researchers at their institutions. I agree that this is an important job and that good press releases can have a very positive effect. The corollary, of course, is that poorly crafted ones can sow confusion. I have been critical of science blogs and science news services in the past, but in some cases they are simply re-posting (albeit uncritically) stories from press offices, which may be where the real problem lies.

Witness two frustrating examples from genome biology. The first was by the press office at Johns Hopkins [How Neutral Genetic Drift Shaped Our Genome], and was re-posted by some science blogs. The second is by the University of California, San Diego, and is re-posted at ScienceDaily and Scientific Blogging [One Man’s Junk May Be A Genomic Treasure].

Both stories are guilty of over-hyping the significance of the research (which perhaps is not surprising) and of including significant factual errors (which is not acceptable). Notably, the Johns Hopkins release mangles basic evolutionary theory, and now we have this from UC San Diego:

Scientists have only recently begun to speculate that what’s referred to as “junk” DNA — the 96 percent of the human genome that doesn’t encode for proteins and previously seemed to have no useful purpose — is present in the genome for an important reason. But it wasn’t clear what the reason was. Now, researchers at the University of California, San Diego (UCSD) School of Medicine have discovered one important function of so-called junk DNA.

The first line is patently false. In my area of study, I encounter far too many speculations about possible functions for non-coding DNA, and this has been the situation for decades. It also bears noting that the study in question examined a single transposable element, SINE B2, which makes up around 2.4% of the mouse genome and appears to contribute to the regulation of a growth hormone gene. This is not very surprising; recall that McClintock first characterized transposable elements as “controlling elements”, and even the earliest and most vocal proponents of the “selfish DNA” hypothesis surmised that some TEs would have regulatory functions. SINE B2 itself has been implicated in regulation at least since 1984 (see also here from 2001). This is not in any way a critical comment about the work; it is sure to be an interesting study, and I look forward to reading the article when it appears in Science this week. But this press release (which I suspect had little to do with the authors of the study) is vastly overstated to the point of twisting the history of the discipline. (It also suggests that protein-coding genes make up 4% of the human genome, whereas the real total is less than 2%, but that is a comparatively minor issue.)

I am very interested in working with the media to provide accessible, interesting, and (not or) accurate information to the public. I also realize that writing about scientific research is difficult, and that there are many individuals out there who are very good at it. I don’t like to be critical all the time, but we really must clean up reports on “junk DNA”. I am open to any suggestions on how scientists can help to make this a reality.


Should scientists nit-pick?

I have some quick questions for the scientists, journalists, and neither-nors who read Genomicron. Should scientists nit-pick inaccuracies in news reports? Does doing so help keep journalism on track, or is it a waste of time (or maybe even irritating)? What else can we do to improve the accuracy of science reporting, particularly through cooperation with, rather than criticism of, science writers? Since I don’t have the answers yet, I will just go ahead with a snarky post in the meantime.

Here is what triggered this query. I was reading this story, How Neutral Genetic Drift Shaped Our Genome, at Scientific Blogging (based on a Johns Hopkins press office release), when I came upon this gem:

When they expanded their study across the whole human genome, they found more than 1200 such pieces of mitochondrial DNA of various lengths [nuclear pseudogenes of mitochondrial origin, or numts] embedded into chromosomes. While chimps have a comparable number, mice and rats only have around 600 numts. Since they increase in frequency as species advance, it suggested there was some evolutionary purpose to keeping them around.

Strikingly, however, none of these numts contained the blueprint (an actual gene) to make a protein that does anything, nor did they seem to control the function of any nearby genes. “At best, it seems numts are a neutral part of our genome,” says Katsanis. “If anything, they may be mildly negative since long repeat sequences can be unstable or get inserted inside genes and disrupt them.”

OK, the nit-picker in me really wants to point out that evolution is not a process under which “species advance”, that there is no such thing as an “evolutionary purpose”, that genes are not blueprints, that this study deals with a tiny fraction of the genome, that the idea that “they accumulate steadily” makes no sense in terms of either function or neutrality given what they just said about rodents versus primates, and that of course these pseudogenes (of mitochondrial descent, no less) do not encode a functional protein, by definition. There is absolutely nothing surprising in the observation that they evolve neutrally in the nuclear genome.

To Larry Moran’s recent post Stop the Press!!! … Genes Have Regulatory Sequences!, we can add Pseudogenes do not encode functional proteins! Next: DNA includes both protein-coding genes and non-coding sequence!

(Update: Scientific Blogging redeems itself with a good post about the ENCODE hype here)
(Another update: Evolution Diary uncritically posts the same nonsensical press release here.)
(Another update: The paper itself is actually far more interesting than this poorly crafted press release would make it appear. See the open access pdf here.)



Junk and genomes in The Scientist.

I used to subscribe to (and quite enjoy) The Scientist when it was a free publication, but have mostly stopped reading it now. Two stories in the latest issue were pointed out to me by readers of Genomicron, so I thought I should provide some brief comments.

The first relates to an editorial by Richard Gallagher entitled “Junk Worth Keeping”, which asks “Is it time to retire provocative descriptors such as ‘junk DNA’?”. The answer, says Gallagher, is “no”. Why? Because “junk DNA” is useful for… framing. He cites Nisbet, although as I have come to learn, framing, sensu Mooney and Nisbet, applies to conveying science in a particular way to help win an election. I’m not sure what “junk DNA” has to do with defeating the Republicans as per Mooney’s formulation of framing, and in any case, Greg Laden has already provided a very useful discussion of the fact that “junk DNA” is a frame that anti-evolutionists exploit because it allows them to obfuscate genetic knowledge to their own ends. They do, of course, get both the history and the science of the term totally wrong, but that’s not the point when it comes to spin. In other words, the term “junk DNA” is a horrible way to communicate the complexity of non-coding DNA to an audience with no background in evolution or genome biology. Granted, the term does get people talking about the issue, but I believe there must be better ways to do this than to set up the discussion to be oversimplified and confused.
[Hat tip: junkdna.com]

The second comes from a story by Melissa Lee Phillips reporting “Surprises in the Sea Anemone Genome”. Here are some statements of interest:

  1. “…the sea anemone, one of the oldest living animal species on Earth…”
  2. “The study also found that these similarities were absent from fruit fly and nematode genomes, contradicting the widely held belief that organisms become more complex through evolution.”
  3. “It’s surprising to find such a ‘high level of genomic complexity in a supposedly primitive animal such as the sea anemone,’ Koonin told The Scientist.”

I will probably do an entire post on the common fallacy of describing one extant lineage as older than another (common descent dictates that all contemporary lineages are of exactly the same age, although some have undergone more branching and/or morphological change than others). But statement #1 is especially inaccurate because the age of a species is not the same thing as the age of a lineage (which is what the author meant to describe). This species (Nematostella vectensis) could, for all we know, be very young, and certainly there is no basis in evidence for calling it one of the oldest species on Earth. (The original paper in Science is at least clearer on this, describing the Cnidaria as “the oldest eumetazoan phylum”, though this is difficult to substantiate given that no unambiguous fossils exist for eumetazoa prior to the Cambrian, at which time early representatives of modern phyla were already present; they cite two papers that can, at best, only suggest that Cnidarians [and perhaps molluscs] were present among the Ediacaran biota.) And, of course, “the” sea anemone is a misnomer, as this name applies to the entire order Actiniaria, within which there are several dozen families.

Statement #2 may refer to a widely held belief among the public (and perhaps among some genome sequencers), but the expectation that complexity always increases in evolution is a fallacy that was abandoned long ago by evolutionary biologists. (It also bears noting that organisms do not evolve; populations do.) A huge amount of the diversity on this planet is composed of parasites, and their evolution often involves simplification. In this case, it is not morphological but genome evolution that is under discussion, and there is really no reason to expect an increase in complexity here either. Gene loss is a well-known phenomenon, and in fact one could make a case that streamlining of the genome or having one gene do multiple things is related to increased morphological complexity whereas gene number is not. Take, for example, the sea urchin, whose immune system seems to involve a large number of genes, as compared to a vertebrate, whose immune system generates nearly endless variation by recombining a comparatively limited number of genes. (Incidentally, the anemone genome appears to contain 18,000 genes.)

Statement #3 is not necessarily inaccurate, but I caution readers that it must be interpreted in a certain way. “Primitive” does not mean “less complex” or “less advanced”, it means “more like the last common ancestor of the groups being compared”. In this sense, a sea anemone is probably more like the last common ancestor of all animals than is, say, a fly. But this is also a modern species whose overall lineage has been around for exactly the same amount of time as that of the fly. The anemone lineage may have undergone fewer morphological changes (though still probably a lot) than the lineage that led to insects, but the anemone itself is not the common ancestor and in fact may bear only a modest resemblance to the ancestor. This is exactly the same confusion people face when thinking about humans and chimpanzees. Chimps were not the ancestor of humans, and may bear only a modest resemblance to the common ancestor of the two lineages — the two split millions of years ago, and both lineages have undergone considerable change since, with many species arising and disappearing in the meantime.

I like to see these interesting topics covered in The Scientist, but I must admit that these two pieces in particular do not inspire me to re-subscribe to the magazine.

While I’m at it, let me register a small complaint about the story by Elizabeth Pennisi in Science [Sea Anemone Provides a New View of Animal Evolution]. I have become mostly resigned to the fact that science writers will insist on describing genome sequencing as “decoding” a genome, and this story is no exception. What bothers me more is that Pennisi is so sloppy in characterizing evolution as a “progressive” process. Thus, she argues that “genome sequencers have just jumped down to a lower branch on the tree of life”, and that “until now, researchers have relied heavily on the sequenced genomes of the fruit fly, nematode, and that of a few other invertebrates to understand genome evolution leading up to the vertebrates”. (Update: Apparently the authors of the paper cautioned Pennisi about this, but to no avail!)

Evidently, we have a lot of work to do in clearing up the basic details of how evolution occurs.


On framing.

I finally checked out the “framing” presentation by Chris Mooney and Matthew Nisbet, which is available with PowerPoint slides here. I am not particularly interested in the debate over this issue, but I thought I would give it a try in light of my hope of improving media coverage and public comprehension of science. This is not my entry into the debate, as I think it has already garnered more attention than it warrants; this is simply a set of thoughts on the issue after having spent the time watching the talk.

I will say that I found much to agree with as far as the descriptive components were concerned. That is, I think Mooney and Nisbet make some good arguments with regard to what is and is not working in scientific communication. This is Nisbet’s subject of research, and it was useful to see actual data applied to the question. My sense was that “framing” is likely something that nonspecialists do use when evaluating complex issues, and that this is a problem for scientists who want to convey complicated ideas with societal ramifications to them. However, I think the discussion runs aground in three major areas: 1) in how it is presented to scientists, 2) in its failure to distinguish framing from “spin” or “marketing”, and 3) in its shift from description to prescription.

As to the first, Mooney and Nisbet seem to use an only partially appropriate “framing” when speaking to scientists who, both as individual people and as part of a collective, exhibit inherent preferences, biases, and other filters. To wit, scientists in general will be unwilling to compromise certain principles, and there appears to be insufficient appreciation of this fact by framing advocates. For example, scientists will not simplify to the point of eroding accuracy, they will not do anything that could be perceived as lying to the public, and they will never give up on the notion that getting the public to understand science is the primary long-term goal. From what I can gather, Mooney and Nisbet are not asking scientists to compromise on these principles, but this is not stated clearly — following their own advice, this should be presented clearly and repeatedly so as to reassure scientists that they are not being told to betray their scientific ideals. (And if they are asking scientists to do so, then this should be made clear also so that the debate can be put to a swift end).

The question of motives also comes into play as part of the mis-framing of framing. No one can be totally objective, so what scientists are trained to do is to look for biases and associated violations of objectivity so that these can be factored into the evaluation of scientific arguments. Personally, I found myself asking “why do they care what scientists do?”. One obvious explanation is that they are concerned citizens with a particular interest in science and its impacts on society. This is not stated upfront, however, and so questions arise about whether this is as much an exercise in attention-getting (and possibly book promotion) as a sincere call to action.

Finally, while I do not read their blogs, I have seen a few links to statements that I have found offensive to my scientific sensibilities. As a case in point, Mooney argues on his blog that science journalists are not the problem (this is also stated in the presentation). It would seem to follow, therefore, that if science is reported inaccurately, sensationalized, overstated in its implications, or otherwise distorted, that is the fault of scientists. Worse, Mooney goes so far as to argue that scientists should just shrug it off and move on if they are misquoted in the media. Again, this ignores the frame that scientists use, in which accuracy is of paramount significance. He also seems to think that simply telling scientists about the difference between a science journalist (well-trained and comprehensive) and a non-science journalist reporting on science (no expertise or experience in dealing with such issues) will make the resentment of the media’s handling of research disappear. It will not.

The second point is the one that has been the primary subject of discussion by some prominent scientist-bloggers, namely that “framing” bears a striking resemblance to “spin”. We all know that “spin” plays a substantial role in politics. To scientists, this is not something to be emulated. I won’t go so far as to say that framing is mere spin, but throughout the presentation I had the strong impression that it was largely indistinguishable from “marketing”. Scientists should care about how their work is presented to and received by the public, and therefore marketing is a legitimate consideration. Indeed, scientists market their work often: to granting agencies, students, journals, and colleagues. Adding some audience-specific adjustments when dealing with the public is perfectly reasonable, but if that’s all “framing” is, then it’s really just repackaged marketing truisms.

The third point, in which Mooney and Nisbet transition from describing the issue to prescribing what scientists should do, was by far the weakest part of the talk. In fact, I found almost nothing in their presentation that actually applied to me as an individual researcher. Almost everything they suggested fell under the purview of science writers, press offices, lobby groups, professional societies, or educational organizations. Even with their information about how the public frames important topics in mind, I still do not know what they expect me to do. As a result, much of the talk seems to be about telling scientists what they are doing wrong, with no real solutions that individual scientists can or will implement.

If I may, I would also add that Mooney and Nisbet’s discussion is, at heart, not about science or communication, but about American politics. In many other countries, scientific literacy is much higher, issues occupy centre stage in election campaigns, and religion and partisanship play a much smaller role in influencing decisions about science. Once again, this suggests that early science education is an effective strategy and a viable objective. The question of framing is more geographically and temporally localized than this, and so some scientists, trained as they are to look past such limitations to the larger picture, may find it difficult to make framing a primary tool.

In stark contrast to all of this ambiguity and apparent misreading of scientific audiences, I point to the recent book A Scientist’s Guide to Talking with the Media by journalists Richard Hayes and Daniel Grossman, published by the Union of Concerned Scientists. I am only part way through the book, but already I can note that it does a fine job of framing the topic in a manner acceptable to scientists. Hayes and Grossman are very clear that they have the utmost respect for science and scientists, and that they absolutely do not wish to see spin implemented at the expense of accuracy. Theirs is a well articulated set of practical suggestions for dealing with the media. They do not appear to blame scientists but instead point to examples where different strategies could have forestalled problems. They do not let science reporters off the hook, but do try to promote a better understanding among scientists of the challenges of writing for a nonspecialist audience. They do not point out the challenge and leave the solutions unclear, but give point by point suggestions on how to improve the important relationship between scientists and those who report science. As a scientist with some experience with the media, I find a great deal of use in this volume. And I do not hesitate to recommend it as an alternative to the far less helpful argument about framing.


Evolution for Everyone — David Sloan Wilson on CBC.

David Sloan Wilson is Distinguished Professor of Biology and Anthropology at Binghamton University in New York and author of Unto Others, Darwin’s Cathedral, and most recently Evolution for Everyone. I confess that I have not yet read the book, though it is near the top of my pile. (Dr. Wilson and I are on the editorial board of Evolution: Education and Outreach, and he was kind enough to have copies sent to all of us).

On Saturday I happened to be listening to the radio and caught an interview with Dr. Wilson on CBC’s Quirks and Quarks program. You can listen here to a discussion of the book and his ideas about evolution, morality, religion, and other subjects. (Another connection: the host, Bob McDonald, received an honorary Doctor of Letters degree from the University of Guelph at the same convocation at which I received my degree.)

Enjoy.


Nonsense from home.

I grew up in and around the small city of Orillia, Ontario. It is a charming place, and was both the hometown of Gordon Lightfoot and the summer home of Stephen Leacock, in the latter case serving as the inspiration for his Sunshine Sketches of a Little Town.

My parents (both since re-married) still live in the area, and I try to get home when I can as a good son should. They also make sure that I am kept up to date with local news of interest, which mostly means stories about the hospital administration’s shenanigans (my mother is a nurse and her husband is an MD, both recently retired and glad of it) and the amazing community project for Zambia that my father and stepmother are hard at work planning.

In addition, my mother enjoys sending me things like the following letter, which appeared in one of the local newspapers. This is, I think, the third rant by a creationist that I have seen in print from this or the smaller paper. A previous one claimed that no one had ever considered the evolution of plants, and thus that creationism must be accurate. I have many botanist colleagues who would be surprised to learn of this omission. In light of the recent poll results that show only 51% of Ontarians accept evolution, I think it is informative. This, by the way, is a lower total than for the USA as a whole, which I find disconcerting. To be fair, the results would probably depend heavily on which part of the province they sampled. Rural areas and small towns differ considerably from the larger cities in various socio-political attitudes. Frankly, I don’t have the time or energy to correct the factual errors and logical fallacies in this latest letter, so I will just post it for your enjoyment (original source).

Evolutionists have their heads in the sand
Orillia Packet & Times
Editorial – Wednesday, May 23, 2007 @ 09:00

Letter to the editor:

Re: M. Brown’s letter “Atheism; a sensible alternative to some”

Atheism: the belief that there is no God. Mr. Albert Einstein, the great theoretical physicist, upon having been asked how much he knew about what there is to be known, said he thought he might know about one hundredth of one per cent without doubt. Most of us know less than he did. All of which leads one to wonder how a person can come to conclude that “there is no God, no creator, no higher intelligence,” when we realize how little we know about anything in general, and origins and beginnings in particular. Atheism does not seem very sensible.

To deny evolution, Mr. Brown writes, is like putting one’s head in the sand, and to dismiss it because we still have apes, is ignorant. Well, where is the evidence for the theory? Where are all the in-between transitional life forms that Mr. Darwin and his followers were sure to be found in the fossil record? After all the digging and searching of the last 150 years, out of hundreds of thousands of fossil discoveries, there is not one clear-cut sample, when there should have been thousands. Dr. Colin Patterson, an evolutionist who was senior paleontologist at the prestigious British Museum of National History and a world-renowned fossil expert, wrote the following about his book entitled “Evolution:” “I fully agree with comments on the lack of direct illustration of evolutionary transitions. If I knew of any, fossil or living, I would certainly have included them. I will lay it on the line; there is not one such fossil for which one could make a watertight argument.”

Surely, this leads open, honest minds to conclude that the theory of evolution and all the resulting evo-babble is unproven, unsubstantiated, without evidence and just plain wrong. As a matter of fact, the earliest fossil record shows that plants and animals appeared suddenly and fully formed, much as we see them today, in complete agreement with the Biblical record. So who, in reality, is putting their head in the sand and ignoring the evidence?

All of which makes one wonder, why evolution is still being taught in our schools, and creation ignored. Why do we so readily allow our children to be misled?

P. Visser

The old saying is not quite accurate: you can go home again; just try not to read the letters to the editor in the local newspaper.


ENCODE links.

The ENCODE paper and related commentaries in Nature (June 14):

List of stories about the ENCODE study:

I think the project is very interesting and important, but as I have said before, one study by itself is rarely revolutionary. ENCODE is adding evidence in favour of a revised understanding of genome function. It, along with many other studies, may require us to re-think a few concepts like “regulatory sequences” or “gene”, but this one paper alone is not engaged in battle against some stubborn establishment that steadfastly refuses to consider new possibilities.


More about ENCODE from Scientific American.

It is probably just coincidence, but two articles for which I gave interviews appeared online today. The first, which I discussed in an earlier post, was online in Wired, One Scientist’s Junk Is a Creationist’s Treasure by Catherine Shaffer. The second appeared in the online edition of Scientific American, The 1 Percent Genome Solution by JR Minkel. Both deal with non-coding DNA, though from rather different perspectives. The first is about creationists invoking the discovery (by evolutionary biologists and other scientists) of (indirect indication of) function in (small sections of) non-coding DNA. The second is about the search for those functions through detailed, rigorous scientific analysis.

I know that science writers have a tough job. And I know that we scientists grumble about a lot of what they generate. But this time I want to do something a little different. I want to give readers some idea of what science writers are faced with when they interview a scientist. This is possible because the interview for Scientific American was conducted by email (which I actually prefer) rather than by phone. Have a look at the article, and then see how the interview actually proceeded, and think about the challenge of summarizing my answers, which were admittedly somewhat long-winded (some might say carefully worded so as to avoid confusion and to not overlook important points). Note also the kinds of questions that a writer has to develop.

Here are the pertinent sections from the article:

The consortium found that 5 percent of the studied sequence has been conserved among 23 mammals, suggesting that it plays an important enough role for evolution to preserve while species have evolved. But of all the new ENCODE sequences identified as potentially important, only half fall into the conserved group.

These unconserved sequences may be “bystanders,” Birney says—consequences of the genome’s other functions—that neither help nor hurt cells and may have provided fodder for past evolution.

They could also simply maintain a useful DNA structure or spacing between pieces of DNA regardless of their particular sequence, says genomics researcher T. Ryan Gregory of the University of Guelph in Ontario, who was not part of the consortium.

“The biological insights are mainly incremental at this point,” says genome biologist George Weinstock of the Baylor College of Medicine in Houston, which he says is to be expected of such a pilot study. “This is a ‘community resource’ project, like a genome project, that makes lots of new data available to the community, who then dig into it and mine it for discoveries.”

Gregory says the results, although still cryptic, do hint at new functions and a more complicated genome. “This study shows us how far we are from a comprehensive understanding of the human genome.”

And here are my answers to Minkel’s questions reproduced in full:

How much of what the consortium found is new?

– What is new about this study is the fine focus being applied to the search for functional elements. By way of analogy, this study is like a group of 35 treasure hunters with metal detectors and sifters combing the same 35 m of a 3.5 km long beach. (In fact, the 35 m are broken up into 44 discrete stretches of beach, half of them chosen because they are known to contain lots of interesting objects and the other half selected to include areas with varying properties. The plan is eventually to comb the entire beach this way, but this first pass should be taken more as a proof of principle than as a conclusive assessment.)

– Some of the conclusions reinforce ideas that have already been in the literature for several years, for example that the majority of the human genome is transcribed (see, e.g., Wong et al. 2000; Wong et al. 2001). The identification of non-protein-coding transcripts, particularly in areas where this was not thought to occur, is novel. But, again, this particular study is based on only 1% of the genome and one should exercise caution in extrapolating it to the entire human genome.

– Other ideas, such as the idea that chromatin structure is important in regulation, are also not entirely new, but these data provide interesting new evidence for them.
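As an editorial aside (not part of the original email exchange), the proportions in the treasure-hunter analogy can be checked with a few lines of arithmetic. This minimal Python sketch uses only the figures stated in the answer (a 3.5 km beach, 35 m combed, 44 stretches); the variable names are illustrative, and the only point is that the combed fraction lines up with the roughly 1% of the genome that the pilot study covered.

```python
# Figures from the beach analogy above; names are illustrative, not from any library.
BEACH_LENGTH_M = 3_500.0   # whole beach, 3.5 km
COMBED_LENGTH_M = 35.0     # total length actually searched
N_STRETCHES = 44           # discrete stretches making up the 35 m

combed_fraction = COMBED_LENGTH_M / BEACH_LENGTH_M   # share of the beach searched
mean_stretch_m = COMBED_LENGTH_M / N_STRETCHES        # average length of one stretch

print(f"Fraction of beach combed: {combed_fraction:.0%}")
print(f"Average stretch length: {mean_stretch_m:.2f} m")
```

Running it prints a combed fraction of 1%, matching the 1% of the genome mentioned in the answers, with the individual stretches averaging under a metre each.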

How much of what was identified is likely to be functional?

– 5% of the genome sequence is conserved across mammals, and for about 60% of this (i.e., 3% of the genome) there is additional evidence of function. This includes the protein-coding exons as well as regulatory elements and other functional sequences. So, at this stage, we have increasingly convincing evidence of function for about 3% of the genome, with another 2% likely to fall into this category as it becomes more thoroughly characterized.

– The authors report the presence of sequences that are not conserved but show experimental (in the genomics sense) evidence of function. There need not be constraint on base pair sequences if merely the presence of non-coding DNA would fill the role independent of what that DNA is. For example, if it is simply a matter of physical spacing or structural arrangement, then the actual sequence of bases may not matter. On the other hand, the authors argue that these elements “may serve as a ‘warehouse’ for natural selection, potentially acting as the source of lineage-specific elements and functionally conserved but non-orthologous elements between species”. Of course, this would be an effect, not a function, because natural selection does not have foresight and cannot maintain elements because they may someday be useful. Also, they suggest that these regions are “neutral”, meaning that they are “biochemically active” but “do not confer a selective advantage or disadvantage to the organism”. If they have no fitness effects then they cannot have a function in the usual sense of the term; however, it could be that their absence would be detrimental, in which case there would be convincing evidence of function of some sort.

– A large fraction of the sequences analyzed, both in introns and intergenic regions, appears to be transcribed. However, most of this DNA is not conserved and there is no clear indication of function. It could be that the transcripts themselves play a functional role, or that the process of transcription, rather than the transcripts per se, contributes an important effect. It could be that the regions they examined, which were typically gene-dense, included transcribed introns (no surprise) plus longer-than-expected regulatory regions such as promoters near but outside of genes (e.g., Cooper et al. 2007), but that on the whole the long stretches of non-coding DNA in between genes are not actually transcribed. Or, it could be that transcription in the human genome simply is very inefficient. For example, the data in this study suggest that 19% of pseudogenes in their sample are transcribed, even though by definition they cannot encode a protein and are unlikely to play a regulatory role. It also appears that in other groups, e.g., plants (Wong et al. 2000), there is lots of intergenic DNA that is not transcribed, which may indicate that such widespread transcription is unique to mammals and is not typical of eukaryotic genomes.

– Looking at a broader scale, we must bear in mind that about half the human genome consists of transposable elements. Some of these clearly do have functions (e.g., in gene regulation), but others persist as disease-causing mutagens. It could be that a large portion of these have taken on functions, but this remains to be shown. We are also left with the question of why a pufferfish would require only 10% as much non-coding DNA as a human whereas an average salamander needs 10 times more than we do. The well known patterns of genome size diversity make it difficult to explain the presence of all non-coding DNA in functional terms, even as there is growing evidence that a significant portion of non-coding DNA is indeed functionally important.

What does this tell us about the genome’s organization and evolution?

– This work follows the growing trend in which simplistic assumptions about genome form and function are being overturned. Previous examples include the assumption that each gene encodes one protein product and the associated expectation that there would be a relatively large number of genes in our genome. This study deals a blow to the notion that the human genome is organized and regulated in a simple way, and further suggests that our definition of “gene” may need to be expanded.

– This study shows us how far we are from a comprehensive understanding of the human genome, but it also provides some of the tools that will be needed to achieve this goal.

– The authors begin their paper with a conclusion (p. 799): “The human genome is an elegant but cryptic store of information.” Elegant, in the scientific sense, means “concise, simple, succinct”. This does not strike me as an accurate descriptor for such a complex, redundant evolutionary patchwork.

– This study reinforces the notion that the genome is a legitimate level of biological organization with its own complex evolutionary history.

You advise caution in extrapolating the results. Do you think it more likely that the study over- or under-represents the amount of complexity or underappreciated function in the genome? Why, or what other biases would you expect?

The concern is that it may over-estimate the level of function in the genome, given that they specifically selected regions rich in well-characterized genes for at least half the dataset. Of course, the objective of the study is to identify functional elements, so an aggressive approach to the question is warranted in that context. However, they probably considered few sequences that were not associated with genes in some way, such as long stretches of short repeats or transposable elements. The study does suggest that regulation is more complex than we thought, shows some evidence of function for some noncoding DNA, and indicates that lots of noncoding DNA is transcribed, but beyond that it hasn’t really clarified these issues — nor should it be expected to, as this was a pilot project only.

You say the study “deals a blow to the notion that the human genome is organized and regulated in a simple way, and further suggests that our definition of ‘gene’ may need to be expanded.” Do most biologists believe the genome is simply organized and regulated? What’s the dominant view?

I would say that, for obvious pragmatic reasons, people assume that a system is simple until it is shown to be otherwise. At first, it was surprising that genome size is decoupled from organismal complexity. Then it was surprising that genes are split into coding exons and noncoding introns. Then it was surprising that half the human genome is transposable elements. Then it was surprising that there are only 25,000 genes. Now it is surprising that a significant portion of the noncoding DNA is transcribed and that gene regulation is not a simple on-off system but involves interactions – perhaps even networks – of coding and noncoding segments. I wouldn’t want to speak for “most biologists”, but I think overall we are coming to appreciate that less has been figured out about genome function than we first thought. And that is what makes the future of genomic science exciting.

And what further evidence would tell us whether we should redefine “gene”? (E.g., would we need to find disease mutations associated with these chimeric transcripts?)

It depends on what you want the term “gene” to represent. In its original definition, it did not specify “protein-coding exons” (because these were not discovered until decades later), and instead referred to a generalized notion of a genetic “determiner” (according to Johannsen 1909, “The word gene is completely free from any hypothesis; it expresses only the evident fact that, in any case, many characteristics of the organism are specified in the germ cells by means of special conditions, foundations, and determiners which are present in unique, separate, and thereby independent ways”). After the rise of molecular genetics in the ’50s, the focus shifted to individual protein-coding sequences (hence, “one gene, one protein”), though this was expanded to include the intron-exon arrangement after it was described by Gilbert in 1978. Now we see that “units of genetic specification”, or what we might want the term “gene” to describe, can include exons, introns (especially as they play a role in alternative splicing to generate several proteins from one “gene”), regulatory regions, promoters, noncoding RNAs, and other elements. Maybe we need one word to mean “an associated unit of protein-coding exons that specifies a particular set of protein products” and another for “all sequences that are involved in generating a particular set of protein products, including coding, regulation, and associated processes”. One of these could be “gene”, but we’d need another term to refer to the other. The picture may be rendered more complex still if some regulatory elements affect multiple coding regions (hence the discussion regarding the relative contributions of cis vs. trans mechanisms). I think it is becoming clear enough, even without linking changes in non-exonic elements to deleterious effects, that there is more to it than simply transcribing a stretch of DNA and splicing out the introns.

So, it’s not so much a requirement of more experimental work to identify disease mutations as a conceptual decision about what we want the word to mean, based on the more fundamental discoveries about regulation, protein-coding, and non-protein-coding function.

Overall, I think Minkel did a very good job with this piece — especially given the complex issues being discussed and the input offered by several scientists.


"Because" versus "so that".

I want to make a quick point about how evolution works and how it does not. The reason is that two stories about non-coding DNA posted today include a major misconception about evolution. Unfortunately, this is a misconception attributed in the articles to biologists, so I can only imagine what the state of comprehension is among non-scientists.

The distinction is between “because” and “so that”. In evolution, things evolve “because,” meaning that there are causes and effects that can be identified. Why are some strains of bacteria resistant to antibiotics? Because a mutation occurred that happened to be beneficial under the conditions of antibiotic treatment, and it became common in the population over the course of several generations. By contrast, things do not evolve “so that”. Bacteria do not experience mutations so that they will become resistant to antibiotic agents.

Why is there so much non-coding DNA? Because transposable elements spread, or because there are accidental duplications that are not eliminated by selection, or because of the interaction of some other mutational processes and their consequences (or lack thereof). So much non-coding DNA did not evolve so that it might someday be useful, or so that it could be coopted when needed, or so that evolution would have more potential in the form of genetic raw materials.

So why, then, do we see quotes like these?

Wired [One Scientist’s Junk Is a Creationist’s Treasure]:

“I’ve stopped using the term [‘junk’],” Collins said. “Think about it the way you think about stuff you keep in your basement. Stuff you might need some time. Go down, rummage around, pull it out if you might need it.”

Reuters [Human instruction book not so simple: studies]:

“It is not the sort of clutter that you get rid of without consequences because you might need it. Evolution may need it,” [Collins] said.

That little extra padding might be just what an animal needs to adapt to some unforeseen circumstance, the researchers said. “They may become useful in the future,” Birney said.

The latter quote by Ewan Birney illustrates the problem that can arise when a detailed, nuanced discussion is summarized into a short soundbite. I know this from experience, and I suspect that this is what has happened here, given how his very reasonable interpretation is paraphrased in New Scientist [‘Junk’ DNA makes compulsive reading]:

Birney says that the additional switches may be mutations that appear by accident and then generate new slugs of RNA, but because they are produced randomly, most are evolutionarily neutral ‘passengers’ in the genome. There might be rare occasions, however, when a new RNA does confer an advantage.

Collins, on the other hand, seems to have said his bit to two different reporters, so I strain to give him the benefit of the doubt on this one. When I began this blog, I did not think I would be pointing out obvious misconceptions about evolution, genomes, and DNA as propagated by the likes of Collins or Nature. But here we are.