Junk DNA gets Wired.

Posted on June 13, 2007 by T. Ryan Gregory

There is a new article on the Wired website about junk DNA [One Scientist’s Junk Is a Creationist’s Treasure]. I make a very brief appearance in it, and I just want to clarify what I meant by the statement cited (I’m still learning that even an hour-long interview might result in only a short blurb).

My quote is “Function at the organism level is something that requires evidence”. I make this statement because there are several different sorts of DNA sequences in the genome whose presence can be explained even if they do not benefit (and indeed, even if they slightly harm) the organism carrying them. Pseudogenes, satellite DNA, transposable elements (45% of our genome), and other non-coding sequences may or may not be functional — that requires evidence — and some may exist as a result of accidental duplication or even due to selection at the level of the elements themselves (by “intragenomic selection”). The old assumption that all non-coding DNA must be beneficial to the organism or it would have been deleted by now ignores genome-specific processes by which non-coding DNA evolves.

As I have discussed previously, both hardcore adaptationists (if any exist anymore) and creationists have a vested interest in having all non-coding DNA be functional. I believe that real-world variability in genome size argues strongly against such a prospect, but of course it is possible, and this is the point that people like Ohno, Doolittle, Orgel, and Crick made in the 1980s. The important point is that yes, some non-coding DNA is functional at the organism level (as opposed to existing for its own sake or because there is no strong selection against it). And certainly, non-coding DNA has effects at the organism level. But current evidence suggests that about 5% of the human genome is functional, and even the least conservative ENCODE participants (whose primary, and important, objective is to identify the functional elements and their features) are betting that 20% is functional.

In the end, it is obvious that non-coding DNA is the product of evolution whether it all turns out to be functional or not. The cases in which former parasites (transposons) have taken on function at the organism level are a perfect illustration of cooption, which is the same basic process that allows explanations for the evolution of complex structures like eyes or flagella. The research into function of non-coding DNA, which the creationists are eager to cite, can be carried out only under an evolutionary framework — it is meaningless to talk about “conserved non-coding DNA sequences” otherwise.

Finally, let me say one thing about Francis Collins’s quote: “Think about it the way you think about stuff you keep in your basement. Stuff you might need some time. Go down, rummage around, pull it out if you might need it.” With all due respect (which is considerable, given his contribution to the Human Genome Project), it makes no sense to explain the existence of non-coding DNA because it might someday prove useful. Evolution does not work that way. Elements might be coopted, but maintaining this option explains neither the origin nor the persistence of non-coding sequences.

As to what the creationists have to say, well, I leave that to others with more (or less?) patience to attend to.

____________

Updates:

Larry Moran takes the article apart in his inimitable way.
PZ Myers is likewise unimpressed.
Catherine Shaffer holds her own in the comments at Wired.
Larry posts his reply.
Steve Reuland on Panda’s Thumb clarifies some important issues. See also here.

Decoding the blueprint. Sigh.

Posted on June 13, 2007 by T. Ryan Gregory

The results of the proof-of-principle phase of ENCODE, the Encyclopedia of DNA Elements Project, appear in the June 14 issue of Nature. It’s a very interesting project, and it has revealed a few more surprises (or at least, added evidence in favour of previously surprising observations). I will probably post more about it soon, but for the time being let me just offer a brief apology to the science writers out there whom I have given a hard time about invoking sloppy language to describe non-coding DNA, sequencing, and genomes (recent example, but one I will leave alone, ‘Junk’ DNA makes compulsive reading online at New Scientist).

The reason I am sorry is that I simply cannot hold you to a higher standard than is maintained by one of the most prestigious journals on planet Earth. You see, Nature has decided to depict the ENCODE project on the cover as “Decoding the Blueprint”. Needless to say (again), genomes are not blueprints (as the ENCODE project shows!) and no one is decoding anything at this point.

I have said all this before, and even I am getting tired of my complaints about it. Thus, I will focus only on the interesting science in a later post.

Sigh.

Two-for-one misconceptions about genomes from the New York Times.

Posted on June 1, 2007 by T. Ryan Gregory

To date, two identified human beings have had their genomes sequenced: J. Craig Venter and James D. Watson. Venter’s was completed in draft form in 2001 and the final version was completed recently. Watson received his genome sequence on disk (a hard drive, not a DVD as reported) from Jonathan Rothberg, founder of 454 Life Sciences, at Baylor College of Medicine yesterday. You can watch the presentation here.

The notion that individual people can have their genomes sequenced (still for about $2 million, but the cost will fall precipitously in the future) is sure to elicit some interesting discussions about medical applications, ethical implications, and intriguing research into human variation. Certainly, the completion of Watson’s genome sequence has already gained media attention. Unfortunately, the same old catchphrases and errors abound. Apparently, even the mighty combined forces of Genomicron, Evolgen, and Sandwalk are insufficient to stop this.

Today, both RPM of Evolgen and Jonathan Badger at T. taxus take aim at the New York Times, who not only confuse sequencing with “deciphering”, but think that Watson discovered DNA in 1953 (Genome of DNA Discoverer Is Deciphered by Nicholas Wade).

To clarify, DNA (“nuclein”) was discovered by Friedrich Miescher in 1869. Watson and Crick elucidated the double helix structure of DNA in the 1950s, based on the results of decades of work on the chemical properties of the molecule by a large number of researchers.

I give full credit to Watson and Crick for their monumental contribution, which rightly garnered them the 1962 Nobel Prize. But credit is also due to Miescher and the countless others whose work was integral to the subsequent rise of molecular genetics and genome sequencing.

Here are two headlines announcing the same story, one inaccurate and the other fine:

Genome of DNA Discoverer is Deciphered (New York Times)

Nobel Laureate James Watson Receives Personal Genome (ScienceDaily)

Is one less catchy than the other? It seems to me that getting the history and the science right would be relatively simple and would only add to the strength of a story.

____________

Updates:

The Genetic Genealogist mentions the story and argues that Nicholas Wade may not be responsible for the headline. Fair enough — my criticism is about the entire presentation, whether that be the fault of the author, editor, or other. It does bear noting, however, that Wade has used this terminology several times previously, including describing it in the main text as the “project to sequence, or decode, the genome.”

Sandwalk has opened a discussion about whether readers would (or, like Larry, would not) want to have their genomes sequenced.

DNADirectTalk repeats the standard inaccuracies.

I don’t think we’re going to be rid of the “decoding” analogy any time soon, especially since sequencers themselves use it. Venter has a book coming out in October, with the unfortunate title A Life Decoded: My Genome: My Life. (Wouldn’t The Sequence of My Life or My Life’s Sequence have been catchier anyway?) The US Department of Energy (which financed much of the Human Genome Project) still uses it on its website as well (Human Genome Research: Decoding DNA). To be fair to science writers, we can’t hold them to a higher standard of terminological accuracy than applies to scientists. In other words, we need to clean it up on our side first and then, hopefully, reporters will follow our lead.

Science’s "big questions"

Posted on May 28, 2007 by T. Ryan Gregory

The July 1, 2005, issue of Science included a list of 25 “hard questions” in celebration of the 125th anniversary of the journal, one of the most prestigious on the planet. (A more detailed discussion is currently underway at Sandwalk; I thought I would throw in my 2 cents.)

Here is how they came up with the list, as described by Donald Kennedy and Colin Norman in the introductory article “What Don’t We Know?”:

We began by asking Science’s Senior Editorial Board, our Board of Reviewing Editors, and our own editors and writers to suggest questions that point to critical knowledge gaps. The ground rules: Scientists should have a good shot at answering the questions over the next 25 years, or they should at least know how to go about answering them. We intended simply to choose 25 of these suggestions and turn them into a survey of the big questions facing science. But when a group of editors and writers sat down to select those big questions, we quickly realized that 25 simply wouldn’t convey the grand sweep of cutting-edge research that lies behind the responses we received. So we have ended up with 125 questions, a fitting number for Science’s 125th anniversary.

[…]

We selected 25 of the 125 questions to highlight based on several criteria: how fundamental they are, how broad-ranging, and whether their solutions will impact other scientific disciplines. Some have few immediate practical implications–the composition of the universe, for example. Others we chose because the answers will have enormous societal impact–whether an effective HIV vaccine is feasible, or how much the carbon dioxide we are pumping into the atmosphere will warm our planet, for example. Some, such as the nature of dark energy, have come to prominence only recently; others, such as the mechanism behind limb regeneration in amphibians, have intrigued scientists for more than a century. We listed the 25 highlighted questions in no special order, but we did group the 100 additional questions roughly by discipline.

The questions are:

What Is the Universe Made Of?
Charles Seife

What Is the Biological Basis of Consciousness?
Greg Miller

Why Do Humans Have So Few Genes?
Elizabeth Pennisi

To What Extent Are Genetic Variation and Personal Health Linked?
Jennifer Couzin

Can the Laws of Physics Be Unified?
Charles Seife

How Much Can Human Life Span Be Extended?
Jennifer Couzin

What Controls Organ Regeneration?
R. John Davenport

How Can a Skin Cell Become a Nerve Cell?
Gretchen Vogel

How Does a Single Somatic Cell Become a Whole Plant?
Gretchen Vogel

How Does Earth’s Interior Work?
Richard A. Kerr

Are We Alone in the Universe?
Richard A. Kerr

How and Where Did Life on Earth Arise?
Carl Zimmer

What Determines Species Diversity?
Elizabeth Pennisi

What Genetic Changes Made Us Uniquely Human?
Elizabeth Culotta

How Are Memories Stored and Retrieved?
Greg Miller

How Did Cooperative Behavior Evolve?
Elizabeth Pennisi

How Will Big Pictures Emerge From a Sea of Biological Data?
Elizabeth Pennisi

How Far Can We Push Chemical Self-Assembly?
Robert F. Service

What Are the Limits of Conventional Computing?
Charles Seife

Can We Selectively Shut Off Immune Responses?
Jon Cohen

Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?
Charles Seife

Is an Effective HIV Vaccine Feasible?
Jon Cohen

How Hot Will the Greenhouse World Be?
Richard A. Kerr

What Can Replace Cheap Oil–and When?
Richard A. Kerr and Robert F. Service

Will Malthus Continue to Be Wrong?
Erik Stokstad

(The other 100 questions are listed here).

Overall, I think there are some good questions in there. I am also happy to see so many of them being about biology, given my interests. However, in some ways the list is rather disappointing. Many of the questions are simply about technology and not science as I understand the term. Others are simply “wait and see” types of questions that involve only continued measurements and no real innovations. Some are “wait and see” questions about technology, in fact.

A more significant point relates to the packaging. Several of the questions are indeed intriguing, but they are not big enough because they focus on one species. Here’s what I mean:

“Why Do Humans Have So Few Genes?”
Humans have what will have to be recognized as the standard number of genes for a mammal. The real questions are how those genes relate to proteomes (e.g., do most genes encode multiple proteins, for example via alternative splicing?), how genes interact, how genes are regulated, and how the regulatory systems evolved. All very important questions. The fact that humans have what was a somewhat surprisingly low gene number is very much secondary to all of this, and there is really nothing particularly relevant about humans having this number rather than, say, a cow having it.

“How Much Can Human Life Span Be Extended?”
The real issue is, what are the inherent limitations on longevity? This applies to all animals, not just humans, and can be studied from a variety of perspectives, including evolutionary theories of senescence, physiological work involving DNA damage (e.g., by oxygen radicals), links with diet, and genetic input. As phrased, it is a technology question, but viewed more broadly it is a very interesting and active area of research touching on multiple fields.

“How Can a Skin Cell Become a Nerve Cell?”
The real question is about how stem cells become specialized somatic cells in general, which is a fundamental question in developmental biology that happens to have major medical implications. The “biological alchemy” described in the article comes after the key processes are understood, and of course these insights extend well outside the boundaries of our own species and are relevant to the evolution of morphology in multicellular organisms as a whole.

In general, I think this is a useful exercise and it is always interesting to get people to think about the big picture in science. I just wish the picture were a little more scientific and a little bigger in this particular case.

________________

Incidentally, I am glad to see that two questions that relate to my work made it onto the secondary list of 100 questions. So glad, in fact, that I won’t even complain about anything they say, though readers can likely guess what my comments might be…

Why are some genomes really big and others quite compact?
The puffer fish genome is 400 million bases; one lungfish’s is 133 billion bases long. Repetitive and duplicated DNA don’t explain why this and other size differences exist.

What is all that “junk” doing in our genomes?
DNA between genes is proving important for genome function and the evolution of new species. Comparative sequencing, microarray studies, and lab work are helping genomicists find a multitude of genetic gems amid the junk.

Give us the title!

Posted on May 27, 2007 by T. Ryan Gregory

I have made this point in passing before, but I will reiterate it in its own post in the vague hope that science writers will get the message (or perhaps that other bloggers will pick up the issue).

When you write a story about a recent discovery, whether for a magazine, an online news service, or a blog, please give us the title and as much other information as you can about the article so that we can look up the original. A footnote at the end would go a long way. Online, there are no constraints on page space, so this should be straightforward to implement.

I am getting frustrated with the usual “… which will appear this week in Nature” or “… to be published online in the next issue of PNAS”. Don’t you know that this almost instantly becomes dated and uninformative? Are you unaware that readers may come across your story even years from now, and that it is a substantial pain to go from the date of your entry and try to find which paper came out in the online pre-publication version of the journal shortly thereafter? Yes, we can search author names, but we shouldn’t have to.

Maybe this is not the writers’ fault. It could be that journals provide only summaries to writers without any information on the actual reference. However, many stories in the larger media include interviews with the author(s). Maybe the last question could be “Hey, what’s your paper called?”. And to press offices: you should give the title too.

I am sure that many, maybe most, readers of science news stories do not look up the original article. But some of us do. In this sense, having the summary stories serving as a gateway to the actual paper would be helpful.

Cracking the code?

Posted on May 27, 2007 by T. Ryan Gregory

I was at a scientific conference last week, and am only now catching up on email, journal publications, and science news. In the case of the latter, I am noticing a striking resurgence (or maybe just persistence) of the description of genome sequencing and analysis as “cracking the genetic code”. Sigh.

I can understand the attraction to the analogy, in that a great deal remains to be done before we will have anything approaching a comprehensive understanding of how complex phenotypes are generated from the combined influence of genes, their regulation and interaction, and the environment. However, the “genetic code” has a specific meaning in science, and it was “cracked” in the 1960s.

[Based on an astute comment by RPM of Evolgen, it bears updating this post to include “mapping the genome” as an equally inappropriate description of modern comparative genomics. Genetic mapping, in proper terms, refers to identifying the relative proximities of genes on chromosomes as first accomplished in 1913 by Alfred Sturtevant, a student of Nobel Prize-winner Thomas Hunt Morgan in the famous Fly Room at Columbia University. Physical mapping, which is the identification of physical locations of genes on chromosomes, remains an important component in genomics, but this generally is not what journalists are referring to when they use the term; what they mean is sequencing.]

I think this is another symptom of too much journalism and not enough science in science journalism. Instead of resorting to the standard catchphrases and clichés, why not introduce your readers to some accurate terms and concepts with which they may not be familiar? You can catch the interest of readers and educate them on the basics rather than appealing to their misconceptions or lack of prior knowledge.

DNA Barcoding on Canada AM.

Posted on May 16, 2007 by T. Ryan Gregory

Prof. Paul Hebert of the University of Guelph was featured on a recent segment of CTV’s Canada AM in their Canadians on the Cutting Edge series. Click here to view the segment. There has been a massive amount of press coverage of DNA barcoding over the past few years, and you can see more at the Canadian Centre for DNA Barcoding.

Suggestions for science writers.

Posted on May 10, 2007 by T. Ryan Gregory

In an earlier post, I expressed some frustration at the way discoveries about non-coding DNA are reported. I noted in particular ScienceDaily’s description of the recent publication of the opossum genome sequence. In case you missed it, here it is again:

Opossum Genome Shows ‘Junk’ DNA Source Of Genetic Innovation

(…)

The research, released Wednesday (May 9) also illustrated a mechanism for those regulatory changes. It showed that an important source of genetic innovation comes from bits of DNA, called transposons, that make up roughly half of our genome and that were previously thought to be genetic “junk.”

The research shows that this so-called junk DNA is anything but, and that it instead can help drive evolution by moving between chromosomes, turning genes on and off in new ways.

(…)

It had been initially thought that most of a creature’s DNA was made up of protein-coding genes and that a relatively small part of the DNA was made up of regulatory portions that tell the rest when to turn on and off.

As studies of mammalian genomes advanced, however, it became apparent that that view was incorrect. The regulatory part of the genome was two to three times larger than the portion that actually held the instructions for individual proteins.

Since my post, National Geographic has gotten in on the act as well:

First Decoded Marsupial Genome Reveals “Junk DNA” Surprise

(…)

The study reveals a surprising role in human evolution for “jumping genes”—parasitic bits of “junk DNA” that until now were thought to be nothing more than a nuisance—and may also lead to a number of medical breakthroughs.

(…)

The scientists were also surprised to find that these regulatory sequences have in large part been distributed across the human genome by so-called jumping genes.

These genes have hopped through chromosomes for more than a billion years, leaving behind many copies of themselves. So until now the genes had been widely regarded by scientists as parasites, or “junk DNA,” that played no creative role in evolution.

You can consult my earlier posts for specific complaints on this. More generally, I have the following suggestions for science writers who are reporting on interesting findings about non-coding DNA.

1) Don’t assume that every new discovery is overthrowing some recalcitrant conventional wisdom.

If you want to claim that all scientists have long believed that all non-coding DNA is totally functionless, kindly point to a few examples. Here are a few cases that suggest that you may have a bit of trouble with this.

When Barbara McClintock first characterized transposable elements in 1950, she called them “controlling elements”. Comings (1972), who gave the first detailed discussion of “junk DNA” (his paper, unlike Ohno’s, was an explicit discussion of the topic of “junk DNA” and appeared in print before Ohno [1972], which he cites as “in press”), stated that “being junk doesn’t mean it is entirely useless.”

Orgel and Crick (1980), in their paper introducing the concept of “selfish DNA”, noted very clearly that:

It would be surprising if the host organism did not occasionally find some use for particular selfish DNA sequences, especially if there were many different sequences widely distributed over the chromosomes. One obvious use … would be for control purposes at one level or another. This seems more than plausible.

Doolittle and Sapienza (1980), whose paper appeared along with that of Orgel and Crick (1980), were equally unambiguous on the issue:

We do not deny that prokaryotic transposable elements or repetitive and unique-sequence DNAs not coding for proteins in eukaryotes may have roles of immediate phenotypic benefit to the organism. Nor do we deny roles for these elements in the evolutionary process. We do question the almost automatic invocation of such roles for DNAs whose function is not obvious, when another and perhaps simpler explanation for their origin and maintenance is possible.

2) Don’t imply, intimate, or suggest, directly or indirectly, that the discovery of function in some non-coding DNA sequences means that all non-coding DNA is functional.

Remember to implement the onion test if you are tempted to argue otherwise. Note that simply having “junk DNA found to be functional” as a headline with no qualification or clarification commits the fallacy as well. The last two examples of poor reporting (see here and here) have neglected to mention that the amount of non-coding DNA that was shown to be conserved and presumably functional is less than 5% of the genome. I imagine that a reader’s interpretation may change somewhat when this important detail is made clear.

I am as excited as anyone about new discoveries in genome biology. I have also been critical of the tendency to focus too much on protein-coding genes or simple allele frequency changes in evolutionary science (Gregory 2005). But it does not follow that every new finding is revolutionary in and of itself, nor is it the case that non-coding DNA has been dismissed as unimportant for decades and that its relevance is only now being admitted by stubborn academics. The commentaries of people like Comings, Ohno, Orgel and Crick, and Doolittle and Sapienza were made in response to an overemphasis on functional explanations for all non-coding DNA, but even they did not reject the potential importance of some non-coding elements.

There is a growing frustration among scientists relating to the unnecessary search for “balance” in journalists’ reporting. What I see happening with non-coding DNA is the opposite of this, though equally problematic. To wit, many writers are painting a monochromatic picture of genome biologists when in fact there has always been a full spectrum of opinions regarding the importance of non-coding sequences. The material is exciting; it doesn’t need to be embellished with exaggerated controversy to be worth reading about.

___________

References

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Gregory, T.R. 2005. Macroevolution and the genome. In The Evolution of the Genome (ed. T.R. Gregory), pp. 679-729. Elsevier, San Diego.

McClintock, B. 1950. The origin and behavior of mutable loci in maize. Proceedings of the National Academy of Sciences of the USA 36: 344-355.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Non-coding DNA and the opossum genome.

Posted on May 9, 2007 by T. Ryan Gregory

The genome sequence of the gray short-tailed opossum, Monodelphis domestica, was published in today’s issue of Nature (Mikkelsen et al. 2007). It is interesting for many reasons, including its status as the first marsupial genome to be sequenced, its relatively large genome size, and low chromosome number (2n = 18). It is also interesting because it contains a similar number of genes (18,000 – 20,000) to humans, the vast majority of which exhibit close associations with the genes of placental mammals. Also, in keeping with the hypothesis that transposable elements are the dominant type of DNA in most eukaryotic genomes, the comparatively large opossum genome is composed of 52% transposable elements, the most for any amniote sequenced so far.

One of the most intriguing discoveries about the opossum genome is that changes to protein-coding genes seem not to have been the driving force behind mammalian diversification. Instead, non-coding elements with regulatory functions (mostly derived from formerly parasitic transposable elements) appear to underlie much of the difference.

Now, I would prefer to just talk about the science here, noting that this is yet another great example of the complex nature of genome evolution, the key role played by “non-standard” genetic processes (Gregory 2005), and the ever-increasing relevance of non-coding DNA in genomics. But, inevitably, I must comment on how this discovery has been reported. Here is what ScienceDaily (which I otherwise like a great deal) said about it:

Opossum Genome Shows ‘Junk’ DNA Source Of Genetic Innovation

(…)

The research, released Wednesday (May 9) also illustrated a mechanism for those regulatory changes. It showed that an important source of genetic innovation comes from bits of DNA, called transposons, that make up roughly half of our genome and that were previously thought to be genetic “junk.”

The research shows that this so-called junk DNA is anything but, and that it instead can help drive evolution by moving between chromosomes, turning genes on and off in new ways.

(…)

It had been initially thought that most of a creature’s DNA was made up of protein-coding genes and that a relatively small part of the DNA was made up of regulatory portions that tell the rest when to turn on and off.

As studies of mammalian genomes advanced, however, it became apparent that that view was incorrect. The regulatory part of the genome was two to three times larger than the portion that actually held the instructions for individual proteins.

I will just reiterate two brief points, as I have already dealt with some of these topics in earlier posts (and will undoubtedly have to do so again in the future). One, very few people have actually argued that all non-coding DNA is 100% functionless “junk”, and no one is surprised anymore when a regulatory or other function is observed for some non-coding DNA sequences. Moreover, transposable elements are more commonly labeled as “selfish DNA”, and it has been noted in countless articles that they can and do take on functions at the organism level even if they begin as parasites at the genome level. Two, yet again we are talking about a small portion of the genome such that this should not be considered a demonstration that all non-coding DNA is functional. In particular, the authors identified about 104 million base pairs of DNA that is conserved (i.e., shared and mostly invariant) among mammals, about 29% of which overlapped with protein-coding genes. In other words, about 74 million base pairs of non-coding DNA, much of it derived from former transposable elements, is found to be conserved among mammals and shows signs of being functional in regulation. The genome size of the opossum is probably around 3,500 million bases, which means that this functional non-coding DNA makes up about 2% of the genome.
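For readers who want to check the arithmetic, here is a minimal sketch in Python. The three input figures are the ones quoted in the paragraph above (the genome size is only a rough estimate); the function and variable names are made up for illustration and do not come from the paper.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# All names below are hypothetical; only the three numbers come from the text.
CONSERVED_BP = 104e6     # DNA conserved among mammals (about 104 million bp)
CODING_OVERLAP = 0.29    # fraction of that overlapping protein-coding genes
GENOME_BP = 3500e6       # rough opossum genome size (about 3,500 million bp)


def conserved_noncoding_bp(conserved_bp, coding_overlap):
    """Conserved DNA that does not overlap protein-coding genes."""
    return conserved_bp * (1 - coding_overlap)


def percent_of_genome(bp, genome_bp):
    """Express a number of base pairs as a percentage of the whole genome."""
    return 100 * bp / genome_bp


noncoding = conserved_noncoding_bp(CONSERVED_BP, CODING_OVERLAP)
share = percent_of_genome(noncoding, GENOME_BP)

print(f"{noncoding / 1e6:.0f} million bp of conserved non-coding DNA")
print(f"{share:.1f}% of the genome")
```

Changing the genome size estimate by a few hundred million bases moves the result only slightly, so the conclusion that this is a small fraction of the genome does not hinge on the exact figure.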

A note to science writers. There is nothing surprising about some sequences of non-coding DNA having an important function. The notion that all non-coding DNA has long been assumed to be completely functionless junk is a straw man. And to avoid misleading readers, you really need to specify that most examples of non-coding DNA with a function represent a very small portion of the total genome.

___________

References

Gregory, T.R. 2005. Macroevolution and the genome. In The Evolution of the Genome (ed. T.R. Gregory), pp. 679-729. Elsevier, San Diego.

Mikkelsen, T.S., M.J. Wakefield, B. Aken, C.T. Amemiya, J.L. Chang, S. Duke, M. Garber, A.J. Gentles, L. Goodstadt, A. Heger, J. Jurka, M. Kamal, E. Mauceli, S.M.J. Searle, T. Sharpe, M.L. Baker, M.A. Batzer, P.V. Benos, K. Belov, M. Clamp, A. Cook, J. Cuff, R. Das, L. Davidow, J.E. Deakin, M.J. Fazzari, J.L. Glass, M. Grabherr, J.M. Greally, W. Gu, T.A. Hore, G.A. Huttley, M. Kleber, R.L. Jirtle, E. Koina, J.T. Lee, S. Mahony, M.A. Marra, R.D. Miller, R.D. Nicholls, M. Oda, A.T. Papenfuss, Z.E. Parra, D.D. Pollock, D.A. Ray, J.E. Schein, T.P. Speed, K. Thompson, J.L. VandeBerg, C.M. Wade, J.A. Walker, P.D. Waters, C. Webber, J.R. Weidman, X. Xie, M.C. Zody, J.A.M. Graves, C.P. Ponting, M. Breen, P.B. Samollow, E.S. Lander, and K. Lindblad-Toh. 2007. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447: 167-177.

Junctional DNA.

Posted on April 25, 2007 by T. Ryan Gregory

JR Minkel at the Scientific American blog has responded to the post on Evolgen about his earlier story regarding “junk DNA” (did you catch all that?). At the end of the post, he asks:

Scientists and scientist bloggers: Again, do you care [if journalists call it junk DNA]? If so, what term would you propose instead, or how would you make the distinction between functional and nonfunctional noncoding DNA clear to a popular audience?

Yes, I care, and here are my suggestions. If you mean the general category without any speculation either way about function, then it is simply and accurately “noncoding DNA”. If it has a function, then you specify what that function is: “regulatory DNA” or “structural DNA” or what have you. If the type of sequence is known, then you can use that as well or instead: “transposable elements” or “mobile DNA” or “pseudogenes” or “introns”. Maybe readers won’t know what those terms mean. This is a good opportunity to inform them.

What is missing is a term to describe a given collection of noncoding DNA for which there is thought to be some function, but for which that function and/or the type of sequence is unknown. This would reside somewhere between “junk DNA” (in the vernacular sense) and “functional DNA” (to which specific names can be applied). I therefore suggest the neologism “junctional DNA” to encompass this category. Note that Petsko (2003) suggested “funk DNA” to represent “functionally unknown DNA”, but I think “junctional DNA” is a little less, uh, funky.

Let me be even more specific. The proposed term “junctional DNA” derives from a dual etymology: 1) a simple portmanteau of “junk” and “functional”; 2) an indication that the sequences so described reside at the crossroads between DNA with no evident function and that with a clear function.

Two terms in one day — “the onion test” and “junctional DNA” — how ’bout that.

Incidentally, my annoyance with such reports has less to do with the terminology than with the fact that the highly conserved sequences in question make up about 5% of the total genome. To jump from this to imply that all noncoding DNA is recognized as functional is inappropriate and misleading. I also wish they would cite the source papers they reference; some of us would like to look up the primary material when we see a summary in a news story.

_______________

Update: Other bloggers (RPM of Evolgen in personal correspondence, Sandwalk) seem to think this term is not needed. I point out that this post was given in direct response to Minkel’s appeal for a term that would “make the distinction between functional and nonfunctional noncoding DNA clear to a popular audience”. In light of the fact that a journalist sees the need for such a term, and that it was coined in response to that need, I think ‘junctional DNA’ could be a useful term.