Quotes of interest — Nobel Prize special edition.

Posted on February 15, 2008 by T. Ryan Gregory

The story we have been told by creationists and neo-Panglossian scientists is that most if not all noncoding DNA is functional and that this fact has been obscured by long neglect in the scientific community of the potential importance of noncoding elements. In particular, the “junk DNA” and “selfish DNA” ideas put forth in the 1970s and 1980s are suggested to have stifled interest in the possible biological and medical importance of noncoding sequences, which have long been dismissed as irrelevant. The question is, did the scientific community turn its back on researchers interested in the roles of noncoding elements after 1980?

1983 Nobel Prize in Physiology or Medicine
to Barbara McClintock

For her discovery of
mobile genetic elements
[transposable elements]

Barbara McClintock discovered mobile genetic elements in plants more than 30 years ago. The discovery was made at a time when the genetic code and the structure of the DNA double helix were not yet known. It is only during the last ten years that the biological and medical significance of mobile genetic elements has become apparent. This type of element has now been found in microorganisms, insects, animals and man, and has been demonstrated to have important functions.

…

Such elements were also found to have an important function in the ability of unicellular parasites (trypanosomes) to change their surface properties, thereby avoiding the immune response of the host organism. Recombination of DNA segments proved to be an essential factor in the ability of lymphoid cells to produce a seemingly infinite number of different antibodies to foreign substances. In recent years, evidence has accumulated that transposition of genes or incomplete genes are involved in the transformation of normal cells into tumour cells. Thus, genes controlling cell growth have been found to undergo translocation from one chromosome to another during cancerogenesis. The initial discovery of mobile genetic elements by Barbara McClintock is of great medical and biological significance. It has also resulted in new perspectives on how genes are formed and how they change during evolution.

http://nobelprize.org/nobel_prizes/medicine/laureates/1983/press.html

1993 Nobel Prize in Physiology or Medicine
to Richard J. Roberts and Phillip A. Sharp

For their discovery of split genes
[introns and exons]

Roberts’ and Sharp’s discovery has changed our view on how genes in higher organisms develop during evolution. The discovery also led to the prediction of a new genetic process, namely that of splicing, which is essential for expressing the genetic information. The discovery of split genes has been of fundamental importance for today’s basic research in biology, as well as for more medically oriented research concerning the development of cancer and other diseases.

…

As a consequence of the discovery that genes are often split, it seems likely that higher organisms in addition to undergoing mutations may utilize another mechanism to speed up evolution: rearrangement (or shuffling) of gene segments to new functional units. This can take place in the germ cells through crossing-over during pairing of chromosomes. This hypothesis seems even more attractive following the discovery that individual exons in several cases correspond to building modules in proteins, so-called domains, to which specific functions can be attributed. An exon in the genome would thus correspond to a particular subfunction in the protein and the rearrangement of exons could result in a new combination of subfunctions in a protein. This kind of process could drive evolution considerably by rearranging modules with specific functions.

http://nobelprize.org/nobel_prizes/medicine/laureates/1993/press.html

____________

Part of the Quotes of interest series.

Quotes of interest — long neglected, some noncoding DNA is actually functional.

Posted on February 15, 2008 by T. Ryan Gregory

I have started a series listing quotes from papers published during the supposed period of neglect of noncoding DNA that, we are told repeatedly by authors of various persuasions, was inspired by the “junk DNA” and “selfish DNA” ideas. For this installment, I want to quote at length from one article which represents a typical discussion of some eukaryotic “junk DNA” turning out to have functions. This is the sort of thing we see regularly in the media and in the scientific literature, so a single example should be sufficient.

The protein-coding portions of the genes account for only about 3% of the DNA in the human genome; the other 97% encodes no proteins. Most of this enormous, silent genetic majority has long been thought to have no real function — hence its name: “junk DNA”. But one researcher’s trash is another researcher’s treasure, and a growing number of scientists believe that hidden in the junk DNA are intellectual riches that will lead to a better understanding of diseases (possibly including cancer), normal genome repair and regulation, and perhaps even the evolution of multicellular organisms.
Rather than the genes, junk DNA “is actually the challenge right now,” says Eric Lander of the Massachusetts Institute of Technology, who is himself a prominent Human Genome Project researcher. And in rising to meet that challenge, geneticists are beginning to formulate a new view of the genome. Rather than being considered a catalogue of useful genes interspersed with useless junk, each chromosome is beginning to be viewed as a complex “information organelle,” replete with sophisticated maintenance and control systems — some embedded in what was thought to be mere waste.

…

…when geneticists started studying complex, multicellular organisms, it was easy to dismiss the vast reaches of non-protein-coding DNA as a wasteland. Now, however, that notion is being overturned as researchers find that junk DNA is not a single midden heap, but a complex mix of different types of DNA, many of which are vital to the life of the cell.

…

Some of the earliest indications that junk DNA might have important functions came from studies on gene control. Those studies found that genes have regulatory sequences, short segments of DNA that serve as targets for the “transcription factors” that activate genes. Many of the regulatory sequences lie outside the protein-coding sequences — in the genetic garbage can. “There’s at least five regulatory elements for each [human] gene, probably many more,” says gene control expert Robert Tjian of the University of California, Berkeley. “For a long time it wasn’t appreciated how widespread those elements can be, but now it seems that patches of really important regulatory elements can be buried among the junk DNA.”

…

Now, however, it appears that some repetitive sequences may contain stretches of DNA needed for gene regulation. What is more, the function of these stretches must be significant, because if their sequences go astray they may result in cancer.

…

But housing sequences that control the genes isn’t the only role that so-called genetic trash plays. Some repetitive sequences also seem to have a crucial function in maintaining the structure of the genome.

…

Thus, in a dramatic reversal, the repetitive sequences, once thought to be the epitome of genetic debris, now seem to be needed to maintain the integrity of the chromosomes. But the repetitive sequences aren’t the only forms of genetic garbage moving up in the world. Whereas the repetitive sequences are usually found outside genes, a second type of genetic junk, the introns, are scattered through the genes of higher organisms.

…

Koop and Hood have found that the DNA of the T cell receptor complex, a crucial immune system protein, shows 71% identity between humans and mice. That finding is startling, since only 6% of the DNA encodes the actual protein sequence, while the rest consists of introns and noncoding regions. “[The finding] certainly questions the assumption that introns are junk,” says Koop. Instead, he says, “it fits the view that chromosomes are information organelles that carry out a variety of functions besides encoding genes, such as maintenance of genome structure and gene regulation.”
That opinion appeals to John Mattick, a molecular biologist at the University of Queensland in Australia, currently on sabbatical at Cambridge University in England. Mattick has proposed that introns provide a previously unsuspected system for regulating gene expression.

…

“[Mattick’s] idea is very interesting indeed,” says evolutionary geneticist Laurence Hurst of Cambridge University, England. “And it’s perfectly testable.” For example, he says, Mattick’s model predicts that certain genes, like regulatory developmental genes, that must be finely controlled, will likely bear intron-encoded regulatory RNAs.

…

“There’s too many cases of odd RNAs,” says molecular geneticist Marvin Wickens of the University of Wisconsin, Madison. “It smells like there might be a whole family of regulatory RNAs.” And if that suspicion proves correct, it would be a big boost for Mattick’s new theory, as well as for the status of junk DNA — a status that is likely to keep on rising over the next couple of years. Enough gems have already been uncovered in the genetic midden to show that what was once thought to be waste is definitely being transmuted into scientific gold.

You may be wondering why, out of all the discussions like this that are being published, I singled out this one.

For a simple reason: it was written 14 years ago.

Nowak, R. 1994. Mining treasures from ‘junk DNA’. Science 263: 608-610.

I will talk about the timeline of the “junk DNA” discussion more comprehensively later, but here is what we can tell so far. The term “junk DNA” was coined by Ohno (1972), and in the first detailed discussion of the topic (Comings 1972), the likelihood that some noncoding DNA would be functional was explicitly noted. In any case, Ohno (1972) seems not to have had much influence during the first decade after he coined the term, because in 1980, when “selfish DNA” was introduced, the overwhelming tendency was to assume that all noncoding DNA was present because it was adaptive — this is why Orgel and Crick (1980) and Doolittle and Sapienza (1980) wrote their papers. There was strong resistance to the idea of selfish DNA for at least the first few years after the idea was proposed (Doolittle 1982), and even in the late 1980s there was at most discussion about how much noncoding DNA might be parasitic versus functional. Keep in mind also that DNA sequencing did not become a common method until the late 1970s/early 1980s, and that introns weren’t even discovered until 1977, and then much of the study focused on seeing how abundant they were and on their origin (were they present from the beginning, or did they arise only among eukaryotes?). The term “pseudogene” was coined in 1977 as well. So, the kind of work that people expect when they say detailed functional research wasn’t done could not have started until the 1980s in any case, and in fact there was abundant research investigating possible roles of satellite DNA, introns, and transposable elements during that decade. By the early 1990s, people had begun proposing additional functions for noncoding DNA, including Mattick’s idea about regulatory RNA sequences.

In other words, there was no real period in which noncoding DNA was dismissed by the scientific community, though there was a much-needed shift away from strictly adaptive interpretations in the 1980s. Some individual researchers ignored noncoding regions, but there is no gap in the literature other than limits on what could be done in a methodological capacity. The “new” view of noncoding DNA as potentially important has been proclaimed regularly for at least as long as the claimed period of neglect between 1980 and 1994.

One wonders just how long we will be told that we have long been neglecting noncoding DNA.

________

Part of the Quotes of interest series.
________

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Doolittle, W.F. 1982. Selfish DNA after fourteen months. In Genome Evolution (eds. G.A. Dover and R.B. Flavell), pp. 3-28. Academic Press, New York.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.


Quotes of interest — 1980s edition (part one).

Posted on February 14, 2008 by T. Ryan Gregory

I previously posted a few quotes from the original authors of the “junk DNA” and “selfish DNA” hypotheses. These showed that the early discussions of these notions did not rule out possible functions for noncoding DNA. Nevertheless, creationists, many science writers, and far too many biologists insist on claiming that noncoding DNA was long dismissed as unimportant because of these ideas. I will be discussing the history of research in this field in some detail later, but for the time being I thought it would be interesting to give some more quotes from papers written in top journals during the supposed period of disregard of noncoding sequences. This is quote-mining, of course, so you are encouraged to consult the original sources. I have not hand-picked these; rather, they are the types of papers that come up in searches from this period. By all means, if you know of works from any time that claimed that all noncoding DNA is nonfunctional or that discouraged research into possible functions, let me know the citation.

There is obviously a continuum of possible selective advantages (positive or negative) to the organism. We had excluded from our definition of selfish DNA those cases where the selective advantage is very high. To decide whether a repeated sequence is parasitic or not, one must determine whether the presence of the repeated sequence in the population is mainly due to the efficiency with which the sequence spreads intragenomically or mainly due to the reproductive success of those individuals in the population who possess repeated copies of the sequence. Only in the former case do we consider it useful to use the term selfish or parasitic DNA, as opposed to useful or symbiotic DNA — the borderline between the two may not be sharp.

…

In our recent experience most people will agree, after discussion, that ignorant DNA, parasitic DNA, symbiotic DNA (that is, parasitic DNA which has become useful to the organism) and ‘dead’ DNA of one sort or another are all likely to be present in the chromosomes of higher organisms. Where people differ is in their estimates of the relative amounts. We feel that this can only be decided by experiment.

Orgel, L.E., F.H.C. Crick, and C. Sapienza. 1980. Selfish DNA. Nature 288: 645-646.

Perhaps the most surprising discovery in the initial studies of eukaryotic gene structure has been that many genes contain interruptions in the coding sequences. The origin and the function of these intervening sequences (IVS or introns) are not yet well understood but are the subject of intense investigation.

Wallace, R.B., P.F. Johnson, S. Tanaka, M. Schöld, K. Itakura, and J. Abelson. 1980. Directed deletion of a yeast transfer RNA intervening sequence. Science 209: 1396-1400.

As long ago as 1970, Ohno argued, on the basis of genetic load, that much of the eukaryote genome was little more than junk. This viewpoint, which is still unpalatable to many biologists, now has a substantial supporting DNA data base. More recently, this has led Ohno to conclude that genes in the mammalian genome are like ‘oases in a barren desert’ (Ohno 1982) and that for every copy of a new gene that has arisen during evolution, hundreds of other copies have ‘degenerated’ to swell the ranks of junk DNA (Ohno 1985).

…

It has, in the past, been commonplace to assume that most, if not all, aspects of the morphology, physiology and behaviour of an organism represent adaptive responses to the environment in which that organism lives. This assumption, however, is difficult to test objectively and represents more an article of faith than of fact. Indeed, biologists have become addicted to the adaptationist viewpoint not so much because of the compelling evidence in favour of it, but rather because it seems so eminently logical and reasonable. This view, of course, assumes that functional explanations must necessarily exist for all facets of the bewildering diversity we see within and between genomes. An alternative extreme viewpoint is that eukaryote genomes are, in effect, simply larger, more sophisticated and embellished prokaryote genomes, loaded with non-coding DNA sequences which are in a constant state of flux but without any significant short-term impact on the phenotype.
To decide which, if either, of these interpretations is the more realistic, we need to determine the number of functional genes within a genome and the proportion of these that are developmentally significant. We also require precise information on the changes that go on within a genome at the molecular level and the extent to which these lead to meaningful evolutionary change. Compared to the differences in structural gene composition between related species, we now know that there are much more striking molecular differences in their repeated DNA components. This raises the question of whether this is because such sequences are important or unimportant. There is also a clear need to distinguish between historical chance and biological necessity as causative factors in determining genome structure.

John, B. and G.L.G. Miklos. 1988. The Eukaryote Genome in Development and Evolution. Allen & Unwin, London. pp. 24-25.

Interest in repetitive DNA sequences goes back many years but, as with many aspects of molecular biology, the advent of recombinant DNA technology and DNA sequencing now permits previously unmatched scrutiny of the structures of interest.

…

If mobility is a reality, and most agree that it probably is, then it seems likely that at least some members of repeat families will have important effects in the genome, even if they have no formal function. Enhancing recombination and altering rates of gene expression are obvious possibilities, while the initiation of new species is a more recondite proposal.

…

The truth is, however, that the functions of the large and motley collection of repeated DNA families are proving particularly resistant to elucidation. Putative functions are many, including, variously, involvement in chromosome pairing, control of gene expression, processing of messenger RNA precursors, and participation in DNA replication. So far none has been established, save for the single exception of a small family that gives rise to 7S RNA, a molecule that recently was serendipitously discovered to be an essential component of a particle that mediates the secretion of proteins from cells.

…

Some repetitive DNA will undoubtedly be shown to have a function, in the formal sense; some will likely be shown to exert important effects; and the remainder may well have no function or effect at all and can therefore be called selfish DNA. Repetitive DNA constitutes a substantial proportion of the genome (up to 90 percent in some cases), and there is considerable speculation on how it will eventually be divided between these three groups. Current bets would put a small fraction in the function category, with distribution of the rest rising steeply through the effect and selfish categories.

…

Satellite DNA unquestionably is a puzzle. What determines the number of copies in a repeat family? And how does the genome tolerate so much of it? Perhaps, as Singer has recently promulgated, just a small fraction of the satellite sequences is essential to some genomic function while the remainder is harmless surplus. This, she indicates, is a comfortable middle ground between the extreme selfish DNA position, which sees no function in all this “junk DNA,” and the adaptationist position, which looks for functions in every structure. The same questions and speculations can be applied to dispersed repetitive DNA.

…

One observation that might be taken as evidence of function in repeated sequences is the frequency of transcription into RNA. A significant proportion of nuclear RNA contains transcripts of repeated sequences, although 90 percent of this is lost in RNA processing and exit to the cytoplasm. Davidson and his colleagues have shown that in sea urchin the spectrum of repeat families that are transcribed changes during development, an appealing argument for some regulatory function. Most intriguing, however, is the discovery that only a small proportion of any repeat family is ever transcribed. “Most members appear to be quiescent, which must make you cautious when isolating samples in search of their function.”

…

It is clear that, from their abundance, their unusual structure, and their frequent transcription, dispersed repetitive DNA families cannot be ignored. But it is equally clear that for the most part they, like their tandemly repeated relatives, remain a phenomenon in search of a function.

Lewin, R. 1982. Repeated DNA still in search of a function. Science 217: 621-623.
[Reporting about International Workshop in Highly Repeated DNA, NIH, July, 1982]

Even though the human β-globin complex contains a relatively large number of active genes, 95 percent of the locus is made up of DNA that does not code for proteins. What is the role of this extra DNA, if any? The pseudogenes constitute just a small proportion of the region, although more pseudogenes might exist. Some of the DNA is made up of representatives of well-known families of repetitive sequences. And the remainder is DNA of no known function or comparable sequence.
“We wanted to test the hypothesis that this extra DNA is ‘junk DNA,’” says Jeffreys, “so we compared the β loci in humans, gorillas, and baboons.” Jeffreys and his colleagues reasoned that if it were junk DNA, then over the 20 to 40 million years of evolution represented by humans, apes, and Old World monkeys both the sequence and the overall quantity of intergenic DNA could be expected to vary. “It turned out that the cluster is remarkably stable,” reports Jeffreys. “The overall pattern and size of the cluster is the same, and the rate of nucleotide substitutions is one-quarter to one-fifth of what would be expected in functionless DNA”. The noncoding DNA therefore appears not to be junk, but what function it might perform is still a mystery.

Lewin, R. 1981. Evolutionary history written in globin genes. Science 214: 426-427.

Since the discovery that many eukaryotic genes are discontinuous, a number of studies have been directed towards identifying a function for intervening sequences (IVSs).

…

Whilst the results presented here point out a clear role for the intron in one tRNA gene family, a common function for all tRNA intervening sequences is not evident. Perhaps tRNA IVSs represent remnants of evolutionary gene rearrangements and only occasionally evolve a role in RNA synthesis. Alternatively, there may be a common but as yet unidentified function for these IVSs, and the role for the IVS described here for tRNA(tyr) may represent an auxiliary use of the precursor RNA. Clearly, analysis of IVS mutants in other tRNA gene families will be necessary to obtain definitive answers to these questions.

Johnson, P.F. and J. Abelson. 1983. The yeast tRNA(tyr) gene intron is essential for correct modification of its tRNA product. Nature 302: 681-687.

Repetitive sequences are interspersed with single-copy regions in the human genome. Because this arrangement is conserved in heterogeneous nuclear (hn) RNA, the role of repetitive sequences in the control of gene expression at the transcriptional and posttranscriptional levels is conjectured.

…

A large amount of evidence suggests that most double-stranded regions in hnRNA are transcripts of Alu repeats. The presence of the Alu repeat in mRNA may result from incomplete removal of Alu sequences in the nucleus, such that a region of homology to the Alu repeat is preserved. In this regard, we note that the region of association in RNA complexes (120bp) and the average size of R loops in groups III, IV, and V are significantly smaller than the Alu DNA sequence. This observation could also reflect involvement of Alu sequences in mRNA processing. Recently, evidence of molecular interactions among different species of cytoplasmic RNA has been reported. The presence of Alu repeat transcripts in different cytoplasmic molecules of either mRNA or 7S RNA suggests the potential for in vivo occurrence of interactions involving Alu repeat transcripts. Such interactions may also play a role in the cytoplasmic stability or translation efficiency of mRNA.
Finally, we find that it is most intriguing to have detected a significant frequency of complexes in hybridized RNA of normal T lymphocytes but not of placental tissue. This observation could reflect tissue-specific transcription of Alu sequences.

Calabretta, B., D.L. Robberson, A.L. Maizel, and G.F. Saunders. 1981. mRNA in human cells contains sequences complementary to the Alu family of repeated DNA. Proceedings of the National Academy of Sciences of the USA 78: 6003-6007.

The most striking feature of the Alu repeat family is its large numerical representation in the human genome, which suggests that Alu repeat sequences might be involved in genetic rearrangements, a role which could be identified if we consider the human genome to be a dynamic structure. Although most members of the Alu family are scattered throughout the human genome, some may be clustered in certain genomic regions. Such an arrangement would provide a good opportunity to test the hypothesis that repetitive sequences facilitate genetic rearrangements.

…

The pattern of interspersion may have been fixed in evolution, with certain Alu repeat members having been recruited for specific cellular functions, for example, in the initiation of DNA replication and as promotor sites for RNA polymerase III.

…

In general, the human genome seems to be a dynamic structure in which variations can be introduced by sequence rearrangement, certain of which can lead to the formation of circular duplex DNA molecules. This genetic plasticity is quite characteristic of transposable elements, and the consequent genome alterations are relevant to evolutionary changes, while the DNA rearrangements may be involved in human cancer.

Calabretta, B., D.L. Robberson, H.A. Barrera-Saldana, T.P. Lambrou, and G.F. Saunders. 1982. Genome instability in a region of human DNA enriched in Alu repeat sequences. Nature 296: 219-225.

Many students of DNA analysis have been unsuspectingly struck by the regularity and length of banding patterns on sequencing gels produced by simple repetitive DNA. When discussing the meaning of these simple DNA sequences, most professional genome watchers hesitate and simply refer to the enormous amount of sometimes unpalatable literature on the subject. They stress the complexity and intractability of the problem of simple sequences (and repetitive DNA as a whole). Investigators working directly in the field of repetitive sequences must justify the relevance of their efforts before their peers and granting agencies. On this truly meaningful and even existential note, I review here certain related members of the family of simple sequences — the GA(TC)A repeats.

…

Finally, possible functional implications are touched upon by covering RNA expression data of GA(TC)A-containing sequences. Hypotheses on the control of gene expression by GA(TC)A sequences are not covered because the experimental basis is at best scarce in animal systems. Nevertheless, it should be evident from this review that the conception that all the simple repetitive sequences are just “junk” or genes is simplistic. It is interesting but exceedingly difficult to speculate on why they are a characteristic component of the genomes of present-day animals.

…

All attempts to identify any natural GA(TC)A translation products in eukaryotes, for example, in monoclonal antibodies, proved fruitless. Hence the question of the functional meaning, if any, of simple, tandemly repeated sequences such as GA(TC)A DNA remains unanswered.
Because of the high copy numbers, the analysis of simple repetitive DNA is a serious, difficult, and unspectacular Sisyphean labor. We have learned to question many of the general preconceptions about the functionality of DNA sequences. Merely because they exist in the genomes of more or less related animal species does not mean that they have a function.

Epplen, J.T. 1988. On simple repeated GA(TC)A sequences in animal genomes: a critical reappraisal. Journal of Heredity 79: 409-417.

The slime moulds can therefore help us to investigate the structure and evolution of repetitive DNA in ‘simple’ eukaryotes and to understand how these sequences contribute to the architecture and function of the eukaryotic genome. Several questions remain, including perhaps the most important: do repetitive sequences perform some definable function?

…

DNA satellites and mobile genetic elements have both seemingly developed or adapted mechanisms which permit their sequences to multiply in eukaryotic genomes. As suggested a number of years ago, and recently reviewed, this line of thinking suggests that most, if not all, families of repetitive sequence may serve no useful function in eukaryotic DNA. This is the ‘selfish’ or ‘junk’ DNA hypothesis. There have been many supporters of the opposing view that at least some families of repeated sequence must perform some useful function, but so far no fully convincing case has been made for a clearly identifiable role for any repeated sequence family other than repeated genes such as those for rRNA. This may mean either that no such functions exist, or that experimentalists have hitherto possibly not been looking in the right direction. What new information has arisen from recent work that may provide clues as to which new directions to take? …

Hardman, N. 1986. Slime moulds and the origin of foldback DNA. BioEssays 5: 105-111.

There have been several suggested explanations for the presence of noncoding intervening sequences in many eukaryotic structural genes. They may be examples of ‘selfish DNA’, conferring little phenotypic advantage, or they may have some importance in gene expression and/or evolution.

…

It is possible that the relationship between the location of the splice junction in the gene at the surface of the protein confers a biological advantage and hence is a result of natural selection. Introns and their associated splicing systems could be exploited in many ways during the evolution of a protein.

Craik, C.S., S. Sprang, R. Fletterick, and W.J. Rutter. 1982. Intron-exon splice junctions map at protein surfaces. Nature 299: 180-182.

We conclude from this experiment that the intron in the yeast actin gene does not have an observable function. It is possible that the role of the intron is too subtle to be observed in laboratory conditions of growth or that the intron, while having evolutionary significance, has no present role. To conclude that this is true for all yeast genes that contain introns would of course be premature, but there exist strains in which mitochondrial introns have been removed with no observable effect.

Ng, R., H. Domdey, G. Larson, J.J. Rossi, and J. Abelson. 1985. A test for intron function in the yeast actin gene. Nature 314: 183-184.

Solutions to problems of how introns are dealt with by cells do not address the question of why introns are there at all, questions about intron function. Some introns in some genes perform clearly regulatory roles, since splicing factors specific to the tissue or developmental stage decide when and where splicing should occur (Breitbart et al. 1985). In addition, some introns in some genes contain enhancers or modulators of the expression of those genes (Slater et al. 1985). However, the great majority of introns in protein-coding genes have no such “functions.” Direct experimental as well as indirect comparative data show that most introns can be removed from genes without phenotypic effect (Blake 1985). Thus, in terms of beneficial effects on the fitnesses of organisms, we almost certainly cannot account for the presence of the majority of individual introns, nor for the propensity to have introns at all, even though introns may on the average represent as much as 90% of the length of a gene and perhaps as much as half of the total DNA in some complex eukaryotes such as humans.

…

Thinking about introns challenges basic concepts of adaptation and function. In particular, it challenges the rather strict adaptationist approach that molecular biologists have traditionally taken toward elements of gene structure.

Doolittle, W.F. 1987. The origin and function of intervening sequences in DNA: a review. American Naturalist 130: 915-928.

Ever since the discovery of split genes, there has been a debate about why they are split. This can be resolved into three separate problems: the origin of the introns that split the genes (separating exons from each other), the role of introns in evolution, and their present function, if any.

Rogers, J. 1985. Exon shuffling and intron insertion in serine protease genes. Nature 315: 458-459.

____________

Part of the Quotes of interest series.

Evolutionary trees for Darwin Day.

Posted on February 12, 2008 by T. Ryan Gregory

In time for Darwin Day, my article on “Understanding evolutionary trees” in the forthcoming issue of Evolution: Education and Outreach is now freely available as a preprint online.

Here is the article abstract:

Charles Darwin sketched his first evolutionary tree in 1837, and trees have remained a central metaphor in evolutionary biology up to the present. Today, phylogenetics—the science of constructing and evaluating hypotheses about historical patterns of descent in the form of evolutionary trees—has become pervasive within and increasingly outside evolutionary biology. Fostering skills in “tree thinking” is therefore a critical component of biological education. Conversely, misconceptions about evolutionary trees can be very detrimental to one’s understanding of the patterns and processes that have occurred in the history of life. This paper provides a basic introduction to evolutionary trees, including some guidelines for how and how not to read them. Ten of the most common misconceptions about evolutionary trees and their implications for understanding evolution are addressed.

Gregory, TR. 2008. Understanding evolutionary trees. Evolution: Education and Outreach 1: in press.

I have also started a series on the topic at DNA and Diversity.

My earlier piece “Evolution as fact, theory, and path” is also free to access.

Non-functional DNA: non-functional vs. inconsequential.

Posted on February 12, 2008 by T. Ryan Gregory

Each copy of the human genome consists of about 3,200,000,000 base pairs, and includes about 500,000 repeats of the LINE-1 transposable element (a LINE) and twice as many copies of Alu (a SINE), as compared to around 20,000 protein-coding genes. Whereas protein-coding regions represent about 1.5% of the genome, about half is made up of LINE-1, Alu, and other transposable element sequences. These begin as parasites, and some continue to behave as detrimental mutagens implicated in disease. However, most of those in the human genome are no longer mobile, and it is possible that many of these persist as commensal freeloaders. Finally, it has long been expected that a significant subset of non-coding elements would be co-opted by the host and take on functional roles at the organism level, and there is increasing evidence to support this.
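To make the proportions above concrete, here is a minimal sketch that simply redoes the arithmetic using the approximate figures quoted in the paragraph. The function and variable names are illustrative only, and the derived values are only as good as the rounded inputs.

```python
# Approximate figures quoted in the paragraph above (all rounded).
GENOME_BP = 3_200_000_000          # base pairs per copy of the human genome
LINE1_COPIES = 500_000             # LINE-1 repeats
ALU_COPIES = 2 * LINE1_COPIES      # "twice as many copies of Alu"
PROTEIN_CODING_GENES = 20_000
CODING_PER_MILLE = 15              # about 1.5% of the genome is protein-coding
TE_PER_MILLE = 500                 # about half is transposable element sequence


def genome_summary():
    """Derive a few back-of-envelope quantities from the quoted figures."""
    # Integer arithmetic avoids floating-point noise in the base-pair totals.
    coding_bp = GENOME_BP * CODING_PER_MILLE // 1000
    te_bp = GENOME_BP * TE_PER_MILLE // 1000
    return {
        "coding_bp": coding_bp,
        "te_bp": te_bp,
        "te_to_coding_ratio": te_bp / coding_bp,
        "line1_plus_alu_per_gene": (LINE1_COPIES + ALU_COPIES) / PROTEIN_CODING_GENES,
        "mean_coding_bp_per_gene": coding_bp / PROTEIN_CODING_GENES,
    }


if __name__ == "__main__":
    for name, value in genome_summary().items():
        print(f"{name}: {value:,.1f}")
```

On these inputs, transposable element sequence outweighs protein-coding sequence by roughly 33 to 1, and there are about 75 LINE-1 or Alu copies for every protein-coding gene.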

A notable fraction of the non-genic portion of human DNA is undoubtedly involved in regulation, chromosomal function, and other important processes, but based on what we know about non-coding DNA sequences, it remains a reasonable default assumption — though one that should continue to be tested empirically — that much or perhaps most of it is not functional at the organism level. This does not mean that a search for the functional segments is futile or irrelevant — far from it, as many non-genic regions are critical for normal genomic operation and some have played an important role in many evolutionary transitions. It simply means that one must not extrapolate without warrant from discoveries involving a small fraction of sequences to the genome as a whole.

More generally, it has been known for more than 50 years that the total quantity of DNA in the genome is linked to nucleus size, cell size, cell division rate, and a wide range of organism-level characteristics that derive from these cytological features. Thus, large amounts of DNA tend to be found in large, slowly dividing cells, which in turn typically make up the bodies of organisms with low metabolisms, slow development, or other such traits. On this basis alone, one would expect to see consequences for the organism if a large quantity of non-coding DNA were eliminated from or added to the genome, even if most of the particular elements in question were neutral or detrimental under normal circumstances. Non-functional is not equivalent to inconsequential. This is especially true when there are factors operating at different levels, for example when an abundant and diverse collective of entities includes components that are variously neutral, beneficial, and detrimental to a host.

Though they cannot prove an argument, analogies are often useful for understanding an issue. In this capacity, consider the following:

There are roughly 10¹³ to 10¹⁴ individual microorganisms living in your digestive tract (Gill et al. 2006), which is on par with, or perhaps even 10x larger than, the number of cells making up your own body. It is also two or three orders of magnitude larger than both the number of humans who have ever lived and the number of stars in the Milky Way galaxy.
The assemblage of microorganisms in your intestines comprises some 500 species, most of which have never been cultured in the lab or studied in detail (Gilmore and Ferretti 2003). To put this diversity in perspective, there are only about 5,000 species of mammals on Earth today.
The combined “metagenome” of the microorganisms in your gut contains at least 100 times as many genes as your own genome (Gill et al. 2006).

We do not know the specific characteristics of many of the microorganisms in the gut. However, we do know that at least some of them are essential, or at least highly beneficial, for human health. Several of the species found in the gut are important mutualists, assisting with digestion and in return drawing nutrients from the food that we consume. In this sense, it is hard not to agree with Gill et al. (2006), who argue that “humans are superorganisms whose metabolism represents an amalgamation of microbial and human attributes”.

The question is, are all 10,000,000,000,000+ microbial cells that we carry with us functional for our well-being? Some certainly are. But many, maybe even most, are probably commensal freeloaders who neither harm nor benefit us, though of course their total abundance is limited to what can be carried by the host without deleterious consequences. By contrast, some gut bacteria are implicated in gastrointestinal disorders. A few are actively parasitic, but their numbers may be kept in check by our own immune system or through competition with non-pathogenic species, or because they kill the host or are killed by antibiotics. Some, such as the well known Escherichia coli, can be harmless or deadly depending on the presence of particular genes. Thus, the total number of microorganisms, and the relative diversity of species that this encompasses, is influenced by a complex interaction of factors internal to the gut (e.g., who invades, which microorganisms are already present, how efficiently they reproduce) and higher-level conditions (e.g., human immune response, dietary effects on which nutrients are present, positive or negative effects on the host).

What we know about bacteria and other microorganisms makes for a reasonable default assumption that much or even most of what is found in the gut is not there because it provides a direct benefit to humans. On the flipside, we have good reason to expect that some, perhaps even a large fraction, of these organisms are beneficial. Therefore, we require evidence to show that any particular species is functional from the human point of view, and that its abundance is determined on this basis. The search for such evidence is important, but it occurs against a backdrop of realizing that bacteria could be there for their own benefit only, whether or not that has any adverse effects on our well-being as hosts. Establishing that a specific strain of bacteria in the digestive tract is beneficial does not justify the conclusion that all bacteria in the gut are mutualistic. It does not even imply that all individuals of the helpful strain are essential, because the optimal abundance for the host and the pressures for reproduction of the microorganisms may not converge on the same quantity.

If one were to remove the microorganisms from the gut, or to significantly alter their species composition or abundance, one would expect to see consequences for host health. This would be true even if most of the particular organisms in question were neutral or detrimental in normal circumstances. As with non-genic elements in the genome, this means that even if many organisms in the gut are non-functional from the host’s perspective, their presence is not inconsequential for the biology of an animal carrying them.

Darwin Day.

Posted on February 12, 2008 by T. Ryan Gregory

Today is the 199th anniversary of Charles Darwin’s (and Abraham Lincoln’s) birth, and is being celebrated around the globe as Darwin Day. Here are some things to check out.

DarwinDay.org

Darwin Day in The Guardian

Darwin Correspondence Project

Darwin Digital Library of Evolution

The Complete Works of Charles Darwin Online

Darwin Today: Celebrating Modern Evolutionary Research

Darwin Exhibit curated by Niles Eldredge of the AMNH in New York

Opening March 8th at the Royal Ontario Museum in Toronto

Darwin Day blog carnival at Scientific Blogging

Including my own post — Do you understand evolutionary trees? (Part One)

The evidence and the quality of the theory, not the man, is the source of authority.
Source

The junk DNA collection.

Posted on February 12, 2008 by T. Ryan Gregory

In this post, I will maintain an up-to-date list of substantive posts dealing with the topic of “junk DNA” on this blog and various others.

Genomicron

Sandwalk

Non-functional DNA: quantity.

Posted on February 11, 2008 by T. Ryan Gregory

In my previous post, I noted that because of what we understand about the nature, origins, and cross-taxon quantitative diversity of the various sorts of non-genic DNA in large eukaryote genomes, the default assumption is that much or even most of it is not functional at the cell and organism levels. Thus, the burden of proof rests with authors who claim that a large fraction, or indeed most or all, of this DNA is functional for the organisms in which it occurs.

This should not be construed as claiming that all non-genic DNA is assumed to be non-functional. I have pointed out in various preceding posts that even those who postulated non-adaptive explanations for its existence did not rule out — and indeed, explicitly favoured — the notion that a significant portion would turn out to serve a function. You need not take my word for this, as it is not difficult to find unambiguous statements from the original authors themselves.

For example, here are Orgel and Crick (1980) who, along with Doolittle and Sapienza (1980), first proposed the concept of “selfish DNA” in detail:

It would be surprising if the host genome did not occasionally find some use for particular selfish DNA sequences, especially if there were many different sequences widely distributed over the chromosomes. One obvious use … would be for control purposes at one level or another.

Here, too, is Comings (1972), the first person to use the term “junk DNA” in print and the first to provide a substantive discussion of the topic. (The term was coined by Ohno in 1972, but Comings’s paper appeared in print first, citing Ohno as ‘in press’, and Ohno used the term only in the title.)

These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA.

The use of the terms “selfish DNA” or “junk DNA” has changed over time, and both are now often applied to all non-genic DNA, rather than to the sequences to which they originally referred (i.e., transposable elements and pseudogenes, respectively). Moreover, it seems that many authors — at least those whose studies focus primarily on protein-coding genes and DNA sequencing — believe that the assumption has been that all non-genic DNA is “junk” in the sense of totally non-functional. However, amidst any such assumptions there has always been a diversity of views on the subject, ranging from assuming that most non-genic DNA is non-functional (as in the quotes above) to expecting it all to be functional — the latter being a position held by strict adaptationists, and a large part of the motivation for proposing the alternative view of selfish DNA in the first place.

As with many issues in evolution, this is a matter of relative quantity, not an exclusive dichotomy. We may reasonably expect a significant fraction of non-genic DNA to show evidence of function, and the pursuit of such evidence is a valid and important endeavour. It does not follow, however, that the pendulum must be perceived to swing from entirely functional to entirely non-functional and back again. We will undoubtedly refine our estimates of the amount of non-genic DNA that is mutualistic at the organism level, how much is commensal, and how much is best characterized as parasitic in nature.

As it stands, the evidence suggests that about 5% of the human genome is functional at the organism level. The total may be higher — as noted, Comings suggested 20% is actively utilized. It is conceivable that 50% or more of the genome is functional, perhaps in structural roles or some other higher-order capacity. It would require evidence to support this contention, however, and the question would remain as to why an onion requires 5x more of this structural or otherwise essential DNA, and why some of its close relatives can get by with half as much while others have twice the onion amount. There is nothing remarkable about onions in this sense, by the way — animal genome sizes alone cover a more than 7,000-fold range, and even among vertebrates there is a 350-fold difference. The range among single-celled protozoa is at least 30,000-fold, though even higher estimates have been presented.
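The onion comparison can be put in numbers with a small sketch. It assumes, purely for illustration, that the same fraction of every genome is claimed to be functional; the human genome size is the roughly 3.2 billion base pair figure quoted elsewhere on this blog, the relative sizes are the ones named in the paragraph above, and the dictionary keys and function name are made up for the example.

```python
HUMAN_GENOME_BP = 3_200_000_000  # approximate figure quoted elsewhere on this blog

# Genome sizes relative to human, as described in the paragraph above:
# the onion has about 5x the human amount, some close relatives of the
# onion have half the onion amount, and others have twice the onion amount.
RELATIVE_GENOME_SIZE = {
    "human": 1.0,
    "onion_relative_small": 2.5,
    "onion": 5.0,
    "onion_relative_large": 10.0,
}


def implied_functional_bp(functional_fraction):
    """Base pairs that would have to be functional in each genome if the
    same fraction were functional everywhere (a hypothetical assumption)."""
    return {
        species: round(HUMAN_GENOME_BP * relative_size * functional_fraction)
        for species, relative_size in RELATIVE_GENOME_SIZE.items()
    }


if __name__ == "__main__":
    for fraction in (0.05, 0.20, 0.50):
        print(f"functional fraction = {fraction:.0%}")
        for species, bp in implied_functional_bp(fraction).items():
            print(f"  {species}: {bp:,} bp")
```

Whatever fraction is chosen, the implied amount of functional DNA scales directly with genome size, which is why a claim that half or more of the genome is functional has to explain why closely related plants would need such different amounts.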

The take home message is simply this. What we know about eukaryote genomes suggests that there are many mechanisms that can add non-coding DNA that do not require it to be functional. This does not in any way preclude the possibility of, or invalidate the search for, function in some, many, or possibly even most of those non-coding components. How much proves to be functional is an empirical question, and at present the indication seems to be that most non-genic DNA is non-functional. That said, non-functional is not the same as inconsequential.

________

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Non-functional DNA: the burden of proof.

Posted on February 10, 2008 by T. Ryan Gregory

If one studies a genome sequence and comes across a region that is of the length and arrangement of a protein-encoding gene, and is enclosed but not interrupted by start and stop codons, then one can reasonably infer that this sequence is likely to be functional, even if no other evidence is yet in hand for what its potential protein product may do. If someone doubted that this were actually a protein-coding gene, then additional evidence would be expected to show this, for example by showing that it is really an artifact, or a pseudogene, or indicating that it is probably not functional despite the characteristics that suggest that it is.
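The kind of inference described above, spotting a stretch bounded but not interrupted by start and stop codons, can be sketched as a simple open reading frame scan. This is a minimal illustration and not a gene-prediction method: it assumes the standard start codon (ATG) and stop codons (TAA, TAG, TGA), looks only at the forward strand, ignores introns, and uses a hypothetical `min_codons` threshold to stand in for "of the length of a protein-encoding gene".

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end) index pairs of open reading frames on the forward
    strand: an ATG followed in-frame by at least `min_codons` codons in total
    (counting the ATG) before the first stop codon. `end` is exclusive and
    includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current candidate
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # the stop closes this candidate either way
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=3))
```

A hit from a scan like this is only a reason to suspect function, in the sense the paragraph describes; showing that a candidate is an artifact or a pseudogene takes additional evidence.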

Many non-genic sequences that are conserved across taxa or which exhibit characteristics consistent with regulatory regions or binding sites or structural components may be treated in the same way as finding an open reading frame as noted above. For most non-coding DNA, there is no such evidence of probable function, however. In fact, as the Sandwalk series on “junk DNA” notes, most non-coding DNA is of a type for which there is little reason to expect function. Inactive vestiges of transposable elements and pseudogenes may occasionally be functional, but there is no reason to assume that they all are given what we know about how they form. Moreover, the massive differences in genome size (i.e., amount of non-genic DNA) among taxa, including species that would be expected to have similar regulatory, mutational buffering, or structural requirements, suggest that much of it is indeed lacking in a universally applied function.

The default assumption by those who accept non-adaptive evolutionary outcomes is that much or even most of a larger genome is not functional at the cell or organism level. This is because of what we do know about these sequences, not because of what we don’t know. Therefore, the burden of providing evidence to the contrary is on those who argue that all or even a large percentage of non-coding DNA has a function. Meeting that burden requires an explanation for the variation in DNA amount among eukaryotes, in addition to empirical evidence for function at least in general terms, if not an indication of what the function probably is.

Dinosaurs made from pseudogenes?

Posted on February 9, 2008 by T. Ryan Gregory

Matt Ridley, author of such books as The Red Queen, Genome, and The Origins of Virtue (and not to be confused with biologist Mark Ridley), asks the question “Will we clone a dinosaur?” in Time Magazine. His answer, at least in terms of the Jurassic Park sense of cloning a dinosaur from ancient DNA, is either “no” or “definitely not”.

Yet, Ridley argues for a different possible revival of dinosaur-like animals, ones built through genetic engineering. He notes three things that he considers encouraging in this regard. The first is that dinosaurs aren’t really extinct, or at least that they did leave a diverse line of descendants — namely birds. Second, important regulatory genes, such as the Hox genes that play a major role in directing development, are generally quite conserved across animal lineages. No doubt, the third will be of particular interest to readers of this blog and indeed Ridley singles it out:

Third, and most exciting, geneticists are finding many “pseudogenes” in human and animal DNA–copies of old, discarded genes. It’s a bit like finding the manual for a typewriter bound into the back of the manual for your latest word-processing software. There may be a lot of interesting obsolete instructions hidden in our genes.

Put these three premises together, and the implication is clear: the dino genes are still out there.

I remember an episode of Star Trek – The Next Generation in which the introns of the crew members’ genomes were “reactivated”, and this caused them to de-evolve through various stages in their species’ ancestries. Of course, introns include various types of DNA sequence, most of which are probably not something that could be activated in any sense. The writers probably meant to focus on pseudogenes, as Ridley did.

Pseudogenes are duplicates of protein-coding genes that either maintain the intron/exon structure of the original gene (classical pseudogenes) or lack introns because they were copied back into the genome from a processed RNA transcript by reverse transcription (processed pseudogenes) — either way, they are defined by two characteristics: 1) their obvious similarity to and derivation from protein-coding genes, and 2) the fact that they no longer function in coding for a protein.

Pseudogenes can form at any time in the ancestry of a lineage, may be derived from a wide variety of genes, and may degrade by mutation or be partially deleted without consequence due to a relaxation of selection given that they no longer fulfill sequence-specific functions. Taken together, this means that it can be difficult to identify something as a pseudogene, let alone what the original sequence encoded and in which ancestor the duplication occurred. In other words, pseudogenes are not like an easily legible manual of a particular obsolete technology. They are a jumble of distorted and half-erased text from a manual that is continually being modified haphazardly.

______

Hat tip: Evolving Thoughts