Quotes of interest — science news stories.

Science news stories have told us since the early 1990s that biologists long neglected the potential significance of noncoding DNA. (Sadly, this is in line with the claims of creationists, who blame “Darwinism” despite the obvious fact that Darwinian adaptationism would lead one to expect functions. Some biologists likewise play up the notion that we have ignored noncoding sequences and are only now coming to appreciate them, thanks, no doubt, to their own revolutionary insights, but this too ignores a diverse literature on the topic that spans the period from the rise of the tools necessary for such work up to the present.) But what about the science stories that were actually written during the period when noncoding DNA was supposedly dismissed as uninteresting (i.e., 1980 to the early 1990s)?

If you had a subscription to Science in the 1980s, you would have read stories like these by their science writer Roger Lewin:

Lewin, R. 1981. Evolutionary history written in globin genes. Science 214: 426-427.

Even though the human β-globin complex contains a relatively large number of active genes, 95 percent of the locus is made up of DNA that does not code for proteins. What is the role of this extra DNA, if any? The pseudogenes constitute just a small proportion of the region, although more pseudogenes might exist. Some of the DNA is made up of representatives of well-known families of repetitive sequences. And the remainder is DNA of no known function or comparable sequence.
“We wanted to test the hypothesis that this extra DNA is ‘junk DNA,’” says Jeffreys, “so we compared the β loci in humans, gorillas, and baboons.” Jeffreys and his colleagues reasoned that if it were junk DNA, then over the 20 to 40 million years of evolution represented by humans, apes, and Old World monkeys both the sequence and the overall quantity of intergenic DNA could be expected to vary. “It turned out that the cluster is remarkably stable,” reports Jeffreys. “The overall pattern and size of the cluster is the same, and the rate of nucleotide substitutions is one-quarter to one-fifth of what would be expected in functionless DNA”. The noncoding DNA therefore appears not to be junk, but what function it might perform is still a mystery.

Lewin, R. 1982. Repeated DNA still in search of a function. Science 217: 621-623.
[Reporting on an NIH International Workshop on Highly Repeated DNA, July 1982]

Interest in repetitive DNA sequences goes back many years but, as with many aspects of molecular biology, the advent of recombinant DNA technology and DNA sequencing now permits previously unmatched scrutiny of the structures of interest.

If mobility is a reality, and most agree that it probably is, then it seems likely that at least some members of repeat families will have important effects in the genome, even if they have no formal function. Enhancing recombination and altering rates of gene expression are obvious possibilities, while the initiation of new species is a more recondite proposal.

The truth is, however, that the functions of the large and motley collection of repeated DNA families are proving particularly resistant to elucidation. Putative functions are many, including, variously, involvement in chromosome pairing, control of gene expression, processing of messenger RNA precursors, and participation in DNA replication. So far none has been established, save for the single exception of a small family that gives rise to 7S RNA, a molecule that recently was serendipitously discovered to be an essential component of a particle that mediates the secretion of proteins from cells.

Some repetitive DNA will undoubtedly be shown to have a function, in the formal sense; some will likely be shown to exert important effects; and the remainder may well have no function or effect at all and can therefore be called selfish DNA. Repetitive DNA constitutes a substantial proportion of the genome (up to 90 percent in some cases), and there is considerable speculation on how it will eventually be divided between these three groups. Current bets would put a small fraction in the function category, with distribution of the rest rising steeply through the effect and selfish categories.

Satellite DNA unquestionably is a puzzle. What determines the number of copies in a repeat family? And how does the genome tolerate so much of it? Perhaps, as Singer has recently promulgated, just a small fraction of the satellite sequences is essential to some genomic function while the remainder is harmless surplus. This, she indicates, is a comfortable middle ground between the extreme selfish DNA position, which sees no function in all this “junk DNA,” and the adaptationist position, which looks for functions in every structure. The same questions and speculations can be applied to dispersed repetitive DNA.

One observation that might be taken as evidence of function in repeated sequences is the frequency of transcription into RNA. A significant proportion of nuclear RNA contains transcripts of repeated sequences, although 90 percent of this is lost in RNA processing and exit to the cytoplasm. Davidson and his colleagues have shown that in sea urchin the spectrum of repeat families that are transcribed changes during development, an appealing argument for some regulatory function. Most intriguing, however, is the discovery that only a small proportion of any repeat family is ever transcribed. “Most members appear to be quiescent, which must make you cautious when isolating samples in search of their function.”

It is clear that, from their abundance, their unusual structure, and their frequent transcription, dispersed repetitive DNA families cannot be ignored. But it is equally clear that for the most part they, like their tandemly repeated relatives, remain a phenomenon in search of a function.


Lewin, R. 1982. Adaptation can be a problem for evolutionists. Science 216: 1212-1213.

Molecular biology of recent years has revealed many new and intriguing categories of DNA, some of which appear to have no role. One explanation of this has been that the nonaptive sequences provide raw material for future evolution. But the logic of natural selection does not allow for selection for future use. More likely is that the accumulation of nonaptive DNA is a consequence of the innate property of repeated sequences of nucleic acid to replicate and move around the genome. Later it may be recruited to perform some role, in which case it becomes an exaptation.

Lewin, R. 1983. A naturalist of the genome. Science 222: 402-405.

Some mobile elements are large and complex, measuring as much as 10,000 nucleotides in length and carrying many genes, while others are simple sections of repeated DNA just a few hundred nucleotides long. Some people would classify all such elements as “junk” or “parasitic” DNA. Others strongly demur and insist that, for instance, although there is yet to be found any convincing evidence for the involvement of a limited class of elements in development in organisms other than maize, the possibility should by no means be dismissed. In any case it is clear that the mobility of certain genetic elements is essential in the generation of the huge diversity of antibodies in vertebrates and in the production of different antigenic coats in certain parasites. Jumping genes clearly represent a potentially rich source of mutation. In addition, an evolutionary link between mobile elements and retroviruses now seems incontrovertible, as does a causal relationship with certain cancers.

Lewin, R. 1985. More progress in messenger RNA splicing. Science 228: 977.

This summer marks 8 years since eukaryotic genes were first discovered to be interrupted by noncoding sequences, known variously as intervening sequences or introns. The discovery raised two sets of questions. The first concerns the origin and function, if any, of introns, which, by its very nature, is a very difficult question to test and therefore remains somewhat in the realms of speculation, although significant insights are being made. The second focuses on the mechanics of removal of these sequences in the production of mature RNA molecules, and in principle should be experimentally more tractable. The immense effort directed at this second question has produced during the past 8 years some conventional biochemistry, some novel and surprising nucleic acid chemistry, and a great deal of frustration.

Lewin, R. 1986. “Computer genome” is full of junk DNA. Science 232: 577-578.

Many biologists were unhappy with the idea that much of the DNA might have no function, says Loomis. “There is a very strong feeling that if a molecule, or any kind of biological structure, exists, then it must be serving some kind of selectively advantageous purpose. I disagree with this viewpoint very strongly.” Loomis prefers to turn the question around. “We should ask, ‘what is the selective advantage of getting rid of a particular structure?’ This is not common thinking.”

It is of course very difficult to prove that a structure or a sequence of DNA has no function. “People will always say, ah, but you haven’t looked under the right conditions,” says Loomis. In the case of multigene families, the best data come from mutation experiments.

Lewin, R. 1988. Chance and repetition. Science 240: 603.

With some kind of concerted effort to map and sequence the entire human genome now appearing to be inevitable, there will be much excitement at the prospect of discovering what is encoded in the 3-billion-base “message”. There are certain to be some surprises, perhaps even equivalent in magnitude to the discovery a decade ago of long, noncoding sequences that interrupt the great majority of eukaryotic genes. But there are many biologists who expect large parts of the genome to be devoid of any function at all: “We face the prospect of trudging through huge tracts of junk DNA,” remarked British molecular biologist Sydney Brenner during one of the many recent panel discussions on the project.

At least some proportion of the DNA in the genomes of most organisms is in the form of these so-called middle repetitive sequences, ranging from 3% to as much as 70%: typically, the bigger the genome, the more repetitive DNA. There is a long tradition in biology that, seeing structures as extensive as these, argues that there must be a functional explanation for them.

Biologists have long speculated about the function of middle repetitive sequences, with regulation of gene expression being one popular notion. Loomis and Gilpin’s perspective, however, is that, although some middle repetitive sequences may have acquired a function once they have formed, there is no need to invoke function as a selective pressure for their origin.

____________

Part of the Quotes of interest series.


Quotes of interest — 1980s edition (part two).

This is the second installment in the quotes of interest series that focuses in particular on research and discussions from the 1980s, when noncoding DNA was supposedly ignored as irrelevant. The main message is that there was plenty of research into the possible functions, or lack thereof, of noncoding sequences of all types, and that whatever the authors concluded, their conclusions rested on the evidence available at the time, not on ideology. This includes the parallel development of neutral theory, many proponents of which did conclude that pseudogenes were nonfunctional on the basis of their high mutation rates compared with coding sequences. Again, the point is not that no one argued against function (I argue against function at the organism level for most noncoding DNA), but that such arguments were based on evidence, not unsupported assumption.

Members of the Alu family of interspersed repeated sequences and its rodent equivalents may be the normal cellular DNA replication initiation sites. In mammalian cells DNA replication proceeds bidirectionally simultaneously from many sites, and thus the initiation sites for replication might be expected to be interspersed repeated sequences with two-fold rotational symmetry. The inverted repeated examples of the Alu family of interspersed repeated sequences and their Chinese hamster equivalents show these attributes. These considerations raise the question of whether the transcription of these repeated sequences by RNA polymerase III, or the interaction of these sequences with the low molecular weight RNA, or both, may play a role in the initiation of DNA replication.

Jelinek, W.R., T.P. Toomey, L. Leinwand, C.H. Duncan, P.A. Biro, P.V. Choudary, S.M. Weissman, C.M. Rubin, C.M. Houck, P.L. Deininger, and C.W. Schmid. 1980. Ubiquitous, interspersed repeated sequences in mammalian genomes. Proceedings of the National Academy of Sciences of the USA 77: 1398-1402.

We have assigned six members of the human β-actin multigene family to specific human chromosomes. The functional gene, ACTB, is located on human chromosome 7, and the other assigned β-actin-related sequences are dispersed over at least four different chromosomes including one locus assigned to the X chromosome. Using intervening sequence probes, we showed that the functional gene is single copy and that all of the other β-actin related sequences are recently generated in evolution and are probably processed pseudogenes. The entire nucleotide sequence of the functional gene has been determined and is identical to cDNA clones in the coding and 5′ untranslated regions. We have previously reported that the 3′ untranslated region is well conserved between humans and rats (Ponte et al., Nucleic Acids Res. 12:1687-1696, 1984). Now we report that four additional noncoding regions are evolutionarily conserved, including segments of the 5′ flanking region, 5′ untranslated region, and, surprisingly, intervening sequences I and III. These conserved sequences, especially those found in the introns, suggest a role for internal sequences in the regulation of β-actin gene expression.

Our finding of highly conserved blocks of nucleotides in two of the five intervening sequences of β-actin genes raises the possibility that these segments have regulatory functions. Conserved internal regions have been reported previously, such as the internal transcriptional enhancer regions of immunoglobulin genes. However, the locations of these enhancers were initially regarded as a peculiarity of the immunoglobulin gene loci. More recently, internal control regions have been detected (but yet unidentified) for the adenovirus E1A gene, human globin genes, and chicken thymidine kinase gene. Any conclusion that the conserved β-actin intron sequences, especially those of IVS I, function as transcriptional enhancers must await direct experimentation. Nevertheless the evolutionary conservation of the immunoglobulin enhancer segments indicates that other transcriptional enhancers or cis-acting regulatory signals would be under selective pressure. It is interesting to note in this regard that the IVS I of both α- and β-globin genes are the most conserved introns of these genes. The IVS I of the human and mouse β-globin genes, for example, has 81 base pairs matching to give a KN(1) value of 0.302. Therefore these introns may well contain part of the proposed downstream regulatory elements.

Ng, S.-Y., P. Gunning, R. Eddy, P. Ponte, J. Leavitt, T. Shows, and L. Kedes. 1985. Evolution of the functional human β-actin gene and its multi-pseudogene family: conservation of noncoding regions and chromosomal dispersion of pseudogenes. Molecular and Cellular Biology 5: 2720-2732.

Although the presence and similar location of pseudogenes in all the mammalian globin gene clusters suggest that pseudogenes may have some as yet unidentified function, the simplest explanation for their existence is that they are the natural consequence of the mechanisms of gene amplification and sequence divergence. The arrangement of genes within the human α-globin gene cluster is consistent with this possibility.

Proudfoot, N.J. and T. Maniatis. 1980. The structure of a human α-globin pseudogene and its relationship to α-globin gene duplication. Cell 21: 537-544.

In summary, the structural analysis of a number of different globin gene clusters suggests that globin gene families are in evolutionary flux. Perhaps pseudogenes are simply a natural consequence of the mechanisms by which multigene families evolve.

Lacy, E. and T. Maniatis. 1980. The nucleotide sequences of a rabbit β-globin pseudogene. Cell 21: 545-553.

Particularly surprising are the intron-exon splice borders of the H3.3 gene. Not only do they contain the standard splice consensus sequences, but in all cases the introns are flanked by 7-8 base pair direct repeats. The function, if any, of these repeats is unclear, since the repeats include both intron and exon bases. One functional difference between these introns can be inferred from the structures of the previously isolated cDNAs. Three of the cDNAs were shown to contain an unspliced intron, but did not carry introns 2 and 3. This could reflect the preferential splicing out of introns 2 and 3 before the splicing out of intron 1. If there is a tendency toward 5′ to 3′ splicing, the unusual splice junctions seen for the H3.3 gene could act to supersede this tendency. The advantage to the organism to remove intron 1 last is unclear but could point to some as yet undetermined function for this intron. In support of this, we have found that a DNA probe derived from intron 1 hybridizes to a single fragment in a Southern blot of total mouse genomic DNA indicating that the sequences in this intron may be conserved, whereas a DNA probe derived from intron 2 does not hybridize.

Wells, D., D. Hoffman, and L. Kedes. 1987. Unusual structure, evolutionary conservation of non-coding sequences and numerous pseudogenes characterize the human H3.3 histone multigene family. Nucleic Acids Research 15: 2871-2889.

A mouse α-globin-related pseudogene (ψα30.5) completely lacks intervening sequences, and could not code for a functional globin polypeptide because of frameshifts. The widespread occurrence of globin pseudogenes in other species suggests that they are not ‘dead’ genes but may be important in controlling globin expression.

The general hypothesis that pseudogenes control the productive genes in some fashion, nevertheless, remains attractive and we are investigating the hypothesis further, including tests in non-erythroid tissues. Certainly, the widespread occurrence of globin pseudogenes argues strongly for their functional importance.

Vanin, E.F., G.I. Goldberg, P.W. Tucker, and O. Smithies. 1980. A mouse α-globin-related pseudogene lacking intervening sequences. Nature 286: 222-226.

The foregoing data support the concept that the so-called “junk” or genetically inactive DNA centered around the centromeric region has a function in controlling the separation of centromere (or its replication into two daughter centromeres) at the junction of metaphase-anaphase in mitosis.

Vig, B.K. 1982. Sequence of centromere separation: role of centromeric heterochromatin. Genetics 102: 795-806.

A highly conserved repetitive DNA sequence, (TTAGGG)n, has been isolated from a human recombinant repetitive DNA library. Quantitative hybridization to chromosomes sorted by flow cytometry indicates that comparable amounts of this sequence are present on each human chromosome. Both fluorescent in situ hybridization and BAL-31 nuclease digestion experiments reveal major clusters of this sequence at the telomeres of all human chromosomes. The evolutionary conservation of this DNA sequence, its terminal chromosomal location in a variety of higher eukaryotes (regardless of chromosome number or chromosome length), and its similarity to functional telomeres isolated from lower eukaryotes suggest that this sequence is a functional human telomere.

The human genome contains a variety of DNA sequences present in multiple copies. These repetitive DNA sequences are thought to arise by many mechanisms, from direct sequence amplification to the unequal recombination of homologous DNA regions to the reverse flow of genetic information. While it is likely that some of these repetitive DNA sequences influence the structure and function of the human genome, little experimental evidence supports this idea at present.
We reasoned, however, that evolutionary conservation of a particular repetitive DNA sequence family might imply that the sequence is essential to cellular function.

Moyzis, R.K., J.M. Buckingham, L.S. Cram, M. Dani, L.L. Deaven, M.D. Jones, J. Meyne, R.L. Ratliff, and J.-R. Wu. 1988. A highly conserved repetitive DNA sequence, (TTAGGG)n, present in the telomeres of human chromosomes. Proceedings of the National Academy of Sciences of the USA 85: 6622-6626.



Quotes of interest — pseudogene.

The term “pseudogene” was coined by Jacq and colleagues in 1977. The standard tale of biologists dogmatically ignoring possible functions of noncoding DNA would have it that such a sequence automatically would be dismissed as “junk” when discovered, especially since the notion of a degraded and now non-coding former gene matches Ohno’s concept of “junk DNA” as originally proposed. The reality is that Jacq et al. (1977) did consider whether the sequence had a function, but based on the available data they concluded that the best explanation is that it is “an evolutionary relic”. They did not cite Ohno.

Summary
The 5S DNA of Xenopus laevis, coding for oocyte-type 5S RNA, consists of many copies of a tandemly repeated unit of about 700 base pairs. Each unit contains a “pseudogene” in addition to the gene. The pseudogene has been partly sequenced and appears to be an almost perfect repeat of 101 residues of the gene. The order of components in the repeat unit is (5′) long spacer-gene-linker-pseudogene (3′) in the “+” strand (or H strand) of the DNA. The possible function of the pseudogene is discussed.

The functions of the different regions of the 5S DNA are only imperfectly understood. The gene region 1-121 codes for the mature oocyte 5S RNA, and the presence of a pppG sequence at residue 1 of the mature 5S RNA defines this residue as the point of initiation of transcription by RNA polymerase III (Roeder, 1976). The point of termination of transcription, however, is less clear. Brown and Brown (1976) have argued that the high A + T-rich sequence of residues 119-123 of the gene region is a signal for the termination of transcription. But low yields of a larger transcription product–about 135 residues long–have been isolated by Denis and Wegnez (1973) in pulse-labeling experiments in Xenopus laevis oocytes. Similar length molecules have also been isolated in heat-shocked Drosophila cells by Rubin and Hogness (1975). While clear evidence that these 135-long molecules are precursors of the mature 5S RNA in Xenopus (or Drosophila) is lacking, their isolation clearly demonstrates that longer transcripts may be synthesized in vivo. It is therefore possible that the structural gene for 5S RNA is larger than the 121 residues of the mature 5S RNA and extends into the region of DNA, linking gene and pseudogene for at least another 15 residues.

Thus the known transcription of the 5S DNA system does not explain the presence of the pseudogene. Moreover, no RNA products corresponding to the pseudogene have been isolated, although it is conceivable that these may well have been overlooked or confused with tRNA in earlier studies (Denis and Wegnez, 1973), especially if they occur only in low yield. We are thus forced to the conclusion that the most probable explanation for the existence of the pseudogene is that it is a relic of evolution. During the evolution of the 5S DNA of Xenopus laevis, a gene duplication occurred producing the pseudogene. Presumably the pseudogene initially functioned as a 5S gene, but then, by mutation, diverged sufficiently from the gene in its sequence so that it was no longer transcribed into an RNA product.

This evolutionary explanation for the presence of the pseudogene, however, is incomplete by itself in that it ignores the conservation in sequence of the pseudogene, and indeed of the entire G + C-rich spacer of 5S DNA. In an attempt to explain this, it has been suggested (Brownlee, 1976) that the pseudogene may be a “transcribed spacer” corresponding to a primary transcript of 5S RNA, which is a transient precursor and has not been detected. If this is so, then most of the G + C-rich region of 5S DNA would be the structural gene for 5S RNA. This function, if true, would provide the necessary selective pressure to conserve the sequence of the “linker” and pseudogene region so that the correct processing of the postulated 300-long precursor was maintained. In the absence of any experimental evidence for such a long precursor, however, this suggestion must be regarded as speculative; it is more probable that the pseudogene is a relic of evolution.


Jacq, C., J.R. Miller, and G.G. Brownlee. 1977. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12: 109-120.


Quotes of interest — Nobel Prize special edition.

The story we have been told by creationists and neo-Panglossian scientists is that most if not all noncoding DNA is functional and that this fact has been obscured by long neglect in the scientific community of the potential importance of noncoding elements. In particular, the “junk DNA” and “selfish DNA” ideas put forth in the 1970s and 1980s are suggested to have stifled interest in the possible biological and medical importance of noncoding sequences, which have long been dismissed as irrelevant. The question is, did the scientific community turn its back on researchers interested in the roles of noncoding elements after 1980?

1983 Nobel Prize in Physiology or Medicine
to Barbara McClintock
For her discovery of
mobile genetic elements
[transposable elements]

Barbara McClintock discovered mobile genetic elements in plants more than 30 years ago. The discovery was made at a time when the genetic code and the structure of the DNA double helix were not yet known. It is only during the last ten years that the biological and medical significance of mobile genetic elements has become apparent. This type of element has now been found in microorganisms, insects, animals and man, and has been demonstrated to have important functions.

Such elements were also found to have an important function in the ability of unicellular parasites (trypanosomes) to change their surface properties, thereby avoiding the immune response of the host organism. Recombination of DNA segments proved to be an essential factor in the ability of lymphoid cells to produce a seemingly infinite number of different antibodies to foreign substances. In recent years, evidence has accumulated that transposition of genes or incomplete genes are involved in the transformation of normal cells into tumour cells. Thus, genes controlling cell growth have been found to undergo translocation from chromosome to another during cancerogenesis. The initial discovery of mobile genetic elements by Barbara McClintock is of great medical and biological significance. It has also resulted in new perspectives on how genes are formed and how they change during evolution.

http://nobelprize.org/nobel_prizes/medicine/laureates/1983/press.html


1993 Nobel Prize in Physiology or Medicine
to
Richard J. Roberts and Phillip A. Sharp
For their discovery of split genes
[introns and exons]

Roberts’ and Sharp’s discovery has changed our view on how genes in higher organisms develop during evolution. The discovery also led to the prediction of a new genetic process, namely that of splicing, which is essential for expressing the genetic information. The discovery of split genes has been of fundamental importance for today’s basic research in biology, as well as for more medically oriented research concerning the development of cancer and other diseases.

As a consequence of the discovery that genes are often split, it seems likely that higher organisms in addition to undergoing mutations may utilize another mechanism to speed up evolution: rearrangement (or shuffling) of gene segments to new functional units. This can take place in the germ cells through crossing-over during pairing of chromosomes. This hypothesis seems even more attractive following the discovery that individual exons in several cases correspond to building modules in proteins, so-called domains, to which specific functions can be attributed. An exon in the genome would thus correspond to a particular subfunction in the protein and the rearrangement of exons could result in a new combination of subfunctions in a protein. This kind of process could drive evolution considerably by rearranging modules with specific functions.

http://nobelprize.org/nobel_prizes/medicine/laureates/1993/press.html






Quotes of interest — long neglected, some noncoding DNA is actually functional.

I have started a series listing quotes from papers published during the supposed period of neglect of noncoding DNA that, we are told repeatedly by authors of various persuasions, was inspired by the “junk DNA” and “selfish DNA” ideas. For this installment, I want to quote at length from one article which represents a typical discussion of some eukaryotic “junk DNA” turning out to have functions. This is the sort of thing we see regularly in the media and in the scientific literature, so a single example should be sufficient.

The protein-coding portions of the genes account for only about 3% of the DNA in the human genome; the other 97% encodes no proteins. Most of this enormous, silent genetic majority has long been thought to have no real function — hence its name: “junk DNA”. But one researcher’s trash is another researcher’s treasure, and a growing number of scientists believe that hidden in the junk DNA are intellectual riches that will lead to a better understanding of diseases (possibly including cancer), normal genome repair and regulation, and perhaps even the evolution of multicellular organisms.
Rather than the genes, junk DNA “is actually the challenge right now,” says Eric Lander of the Massachusetts Institute of Technology, who is himself a prominent Human Genome Project researcher. And in rising to meet that challenge, geneticists are beginning to formulate a new view of the genome. Rather than being considered a catalogue of useful genes interspersed with useless junk, each chromosome is beginning to be viewed as a complex “information organelle,” replete with sophisticated maintenance and control systems — some embedded in what was thought to be mere waste.

…when geneticists started studying complex, multicellular organisms, it was easy to dismiss the vast reaches of non-protein-coding DNA as a wasteland. Now, however, that notion is being overturned as researchers find that junk DNA is not a single midden heap, but a complex mix of different types of DNA, many of which are vital to the life of the cell.

Some of the earliest indications that junk DNA might have important functions came from studies on gene control. Those studies found that genes have regulatory sequences, short segments of DNA that serve as targets for the “transcription factors” that activate genes. Many of the regulatory sequences lie outside the protein-coding sequences — in the genetic garbage can. “There’s at least five regulatory elements for each [human] gene, probably many more,” says gene control expert Robert Tjian of the University of California, Berkeley. “For a long time it wasn’t appreciated how widespread those elements can be, but now it seems that patches of really important regulatory elements can be buried among the junk DNA.”

Now, however, it appears that some repetitive sequences may contain stretches of DNA needed for gene regulation. What is more, the function of these stretches must be significant, because if their sequences go astray they may result in cancer.

But housing sequences that control the genes isn’t the only role that so-called genetic trash plays. Some repetitive sequences also seem to have a crucial function in maintaining the structure of the genome.

Thus, in a dramatic reversal, the repetitive sequences, once thought to be the epitome of genetic debris, now seem to be needed to maintain the integrity of the chromosomes. But the repetitive sequences aren’t the only forms of genetic garbage moving up in the world. Whereas the repetitive sequences are usually found outside genes, a second type of genetic junk, the introns, are scattered through the genes of higher organisms.

Koop and Hood have found that the DNA of the T cell receptor complex, a crucial immune system protein, shows 71% identity between humans and mice. That finding is startling, since only 6% of the DNA encodes the actual protein sequence, while the rest consists of introns and noncoding regions. “[The finding] certainly questions the assumption that introns are junk,” says Koop. Instead, he says, “it fits the view that chromosomes are information organelles that carry out a variety of functions besides encoding genes, such as maintenance of genome structure and gene regulation.”
That opinion appeals to John Mattick, a molecular biologist at the University of Queensland in Australia, currently on sabbatical at Cambridge University in England. Mattick has proposed that introns provide a previously unsuspected system for regulating gene expression.

“[Mattick’s] idea is very interesting indeed,” says evolutionary geneticist Laurence Hurst of Cambridge University, England. “And it’s perfectly testable.” For example, he says, Mattick’s model predicts that certain genes, like regulatory developmental genes, that must be finely controlled, will likely bear intron-encoded regulatory RNAs.

“There’s too many cases of odd RNAs,” says molecular geneticist Marvin Wickens of the University of Wisconsin, Madison. “It smells like there might be a whole family of regulatory RNAs.” And if that suspicion proves correct, it would be a big boost for Mattick’s new theory, as well as for the status of junk DNA — a status that is likely to keep on rising over the next couple of years. Enough gems have already been uncovered in the genetic midden to show that what was once thought to be waste is definitely being transmuted into scientific gold.

You may be wondering why, out of all the discussions like this being published, I singled out this one.

For a simple reason: it was written 14 years ago.

Nowak, R. 1994. Mining treasures from ‘junk DNA’. Science 263: 608-610.

I will talk about the timeline of the “junk DNA” discussion more comprehensively later, but here is what we can tell so far. The term “junk DNA” was coined by Ohno (1972), and in the first detailed discussion of the topic (Comings 1972), the likelihood that some noncoding DNA would be functional was explicitly noted. In any case, Ohno (1972) seems not to have had much influence during the first decade after he coined the term, because in 1980, when “selfish DNA” was introduced, the overwhelming tendency was to assume that all noncoding DNA was present because it was adaptive — this is why Orgel and Crick (1980) and Doolittle and Sapienza (1980) wrote their papers. There was strong resistance to the idea of selfish DNA for at least the first few years after the idea was proposed (Doolittle 1982), and even in the late 1980s there was at most discussion about how much noncoding DNA might be parasitic versus functional. Keep in mind also that DNA sequencing did not become a common method until the late 1970s/early 1980s, and that introns weren’t even discovered until 1977, and then much of the study focused on seeing how abundant they were and on their origin (were they present from the beginning, or did they arise only among eukaryotes?). The term “pseudogene” was coined in 1977 as well. So, the kind of work that people expect when they say detailed functional research wasn’t done could not have started until the 1980s in any case, and in fact there was abundant research investigating possible roles of satellite DNA, introns, and transposable elements during that decade. By the early 1990s, people had begun proposing additional functions for noncoding DNA, including Mattick’s idea about regulatory RNA sequences.

In other words, there was no real period in which noncoding DNA was dismissed by the scientific community, though there was a much-needed shift away from strictly adaptive interpretations in the 1980s. Some individual researchers ignored noncoding regions, but the only gaps in the literature are those imposed by the methods available at the time. The “new” view of noncoding DNA as potentially important has been proclaimed regularly for at least as long as the claimed period of neglect between 1980 and 1994.

One wonders just how long we will be told that we have long been neglecting noncoding DNA.

________

Part of the Quotes of interest series.
________

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Doolittle, W.F. 1982. Selfish DNA after fourteen months. In Genome Evolution (eds. G.A. Dover and R.B. Flavell), pp. 3-28. Academic Press, New York.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Quotes of interest — 1980s edition (part one).

I previously posted a few quotes from the original authors of the “junk DNA” and “selfish DNA” hypotheses. These showed that the early discussions of these notions did not rule out possible functions for noncoding DNA. Nevertheless, creationists, many science writers, and far too many biologists insist on claiming that noncoding DNA was long dismissed as unimportant because of these ideas. I will be discussing the history of research in this field in some detail later, but for the time being I thought it would be interesting to give some more quotes from papers written in top journals during the supposed period of disregard of noncoding sequences. This is quote-mining, of course, so you are encouraged to consult the original sources. I have not hand-picked these; rather, these are the types of papers that come up in searches from this period. By all means, if you know of works from any time that claimed that all noncoding DNA is nonfunctional or that discouraged research into possible functions, let me know the citation.

There is obviously a continuum of possible selective advantages (positive or negative) to the organism. We had excluded from our definition of selfish DNA those cases where the selective advantage is very high. To decide whether a repeated sequence is parasitic or not, one must determine whether the presence of the repeated sequence in the population is mainly due to the efficiency with which the sequence spreads intragenomically or mainly due to the reproductive success of those individuals in the population who possess repeated copies of the sequence. Only in the former case do we consider it useful to use the term selfish or parasitic DNA, as opposed to useful or symbiotic DNA — the borderline between the two may not be sharp.

In our recent experience most people will agree, after discussion, that ignorant DNA, parasitic DNA, symbiotic DNA (that is, parasitic DNA which has become useful to the organism) and ‘dead’ DNA of one sort or another are all likely to be present in the chromosomes of higher organisms. Where people differ is in their estimates of the relative amounts. We feel that this can only be decided by experiment.

Orgel, L.E., F.H.C. Crick, and C. Sapienza. 1980. Selfish DNA. Nature 288: 645-646.

Perhaps the most surprising discovery in the initial studies of eukaryotic gene structure has been that many genes contain interruptions in the coding sequences. The origin and the function of these intervening sequences (IVS or introns) are not yet well understood but are the subject of intense investigation.

Wallace, R.B., P.F. Johnson, S. Tanaka, M. Schöld, K. Itakura, and J. Abelson. 1980. Directed deletion of a yeast transfer RNA intervening sequence. Science 209: 1396-1400.

As long ago as 1970, Ohno argued, on the basis of genetic load, that much of the eukaryote genome was little more than junk. This viewpoint, which is still unpalatable to many biologists, now has a substantial supporting DNA data base. More recently, this has led Ohno to conclude that genes in the mammalian genome are like ‘oases in a barren desert’ (Ohno 1982) and that for every copy of a new gene that has arisen during evolution, hundreds of other copies have ‘degenerated’ to swell the ranks of junk DNA (Ohno 1985).

It has, in the past, been commonplace to assume that most, if not all, aspects of the morphology, physiology and behaviour of an organism represent adaptive responses to the environment in which that organism lives. This assumption, however, is difficult to test objectively and represents more an article of faith than of fact. Indeed, biologists have become addicted to the adaptationist viewpoint not so much because of the compelling evidence in favour of it, but rather because it seems so eminently logical and reasonable. This view, of course, assumes that functional explanations must necessarily exist for all facets of the bewildering diversity we see within and between genomes. An alternative extreme viewpoint is that eukaryote genomes are, in effect, simply larger, more sophisticated and embellished prokaryote genomes, loaded with non-coding DNA sequences which are in a constant state of flux but without any significant short-term impact on the phenotype.
To decide which, if either, of these interpretations is the more realistic, we need to determine the number of functional genes within a genome and the proportion of these that are developmentally significant. We also require precise information on the changes that go on within a genome at the molecular level and the extent to which these lead to meaningful evolutionary change. Compared to the differences in structural gene composition between related species, we now know that there are much more striking molecular differences in their repeated DNA components. This raises the question of whether this is because such sequences are important or unimportant. There is also a clear need to distinguish between historical chance and biological necessity as causative factors in determining genome structure.

John, B. and G.L.G. Miklos. 1988. The Eukaryote Genome in Development and Evolution. Allen & Unwin, London. pp. 24-25.

Interest in repetitive DNA sequences goes back many years but, as with many aspects of molecular biology, the advent of recombinant DNA technology and DNA sequencing now permits previously unmatched scrutiny of the structures of interest.

If mobility is a reality, and most agree that it probably is, then it seems likely that at least some members of repeat families will have important effects in the genome, even if they have no formal function. Enhancing recombination and altering rates of gene expression are obvious possibilities, while the initiation of new species is a more recondite proposal.

The truth is, however, that the functions of the large and motley collection of repeated DNA families are proving particularly resistant to elucidation. Putative functions are many, including, variously, involvement in chromosome pairing, control of gene expression, processing of messenger RNA precursors, and participation in DNA replication. So far none has been established, save for the single exception of a small family that gives rise to 7S RNA, a molecule that recently was serendipitously discovered to be an essential component of a particle that mediates the secretion of proteins from cells.

Some repetitive DNA will undoubtedly be shown to have a function, in the formal sense; some will likely be shown to exert important effects; and the remainder may well have no function or effect at all and can therefore be called selfish DNA. Repetitive DNA constitutes a substantial proportion of the genome (up to 90 percent in some cases), and there is considerable speculation on how it will eventually be divided between these three groups. Current bets would put a small fraction in the function category, with distribution of the rest rising steeply through the effect and selfish categories.

Satellite DNA unquestionably is a puzzle. What determines the number of copies in a repeat family? And how does the genome tolerate so much of it? Perhaps, as Singer has recently promulgated, just a small fraction of the satellite sequences is essential to some genomic function while the remainder is harmless surplus. This, she indicates, is a comfortable middle ground between the extreme selfish DNA position, which sees no function in all this “junk DNA,” and the adaptationist position, which looks for functions in every structure. The same questions and speculations can be applied to dispersed repetitive DNA.

One observation that might be taken as evidence of function in repeated sequences is the frequency of transcription into RNA. A significant proportion of nuclear RNA contains transcripts of repeated sequences, although 90 percent of this is lost in RNA processing and exit to the cytoplasm. Davidson and his colleagues have shown that in sea urchin the spectrum of repeat families that are transcribed changes during development, an appealing argument for some regulatory function. Most intriguing, however, is the discovery that only a small proportion of any repeat family is ever transcribed. “Most members appear to be quiescent, which must make you cautious when isolating samples in search of their function.”

It is clear that, from their abundance, their unusual structure, and their frequent transcription, dispersed repetitive DNA families cannot be ignored. But it is equally clear that for the most part they, like their tandemly repeated relatives, remain a phenomenon in search of a function.

Lewin, R. 1982. Repeated DNA still in search of a function. Science 217: 621-623.
[Reporting on the International Workshop in Highly Repeated DNA, NIH, July 1982]

Even though the human β-globin complex contains a relatively large number of active genes, 95 percent of the locus is made up of DNA that does not code for proteins. What is the role of this extra DNA, if any? The pseudogenes constitute just a small proportion of the region, although more pseudogenes might exist. Some of the DNA is made up of representatives of well-known families of repetitive sequences. And the remainder is DNA of no known function or comparable sequence.
“We wanted to test the hypothesis that this extra DNA is ‘junk DNA,’” says Jeffreys, “so we compared the β loci in humans, gorillas, and baboons.” Jeffreys and his colleagues reasoned that if it were junk DNA, then over the 20 to 40 million years of evolution represented by humans, apes, and Old World monkeys both the sequence and the overall quantity of intergenic DNA could be expected to vary. “It turned out that the cluster is remarkably stable,” reports Jeffreys. “The overall pattern and size of the cluster is the same, and the rate of nucleotide substitutions is one-quarter to one-fifth of what would be expected in functionless DNA”. The noncoding DNA therefore appears not to be junk, but what function it might perform is still a mystery.

Lewin, R. 1981. Evolutionary history written in globin genes. Science 214: 426-427.

Since the discovery that many eukaryotic genes are discontinuous, a number of studies have been directed towards identifying a function for intervening sequences (IVSs).

Whilst the results presented here point out a clear role for the intron in one tRNA gene family, a common function for all tRNA intervening sequences is not evident. Perhaps tRNA IVSs represent remnants of evolutionary gene rearrangements and only occasionally evolve a role in RNA synthesis. Alternatively, there may be a common but as yet unidentified function for these IVSs, and the role for the IVS described here for tRNA(tyr) may represent an auxiliary use of the precursor RNA. Clearly, analysis of IVS mutants in other tRNA gene families will be necessary to obtain definitive answers to these questions.

Johnson, P.F. and J. Abelson. 1983. The yeast tRNA(tyr) gene intron is essential for correct modification of its tRNA product. Nature 302: 681-687.

Repetitive sequences are interspersed with single-copy regions in the human genome. Because this arrangement is conserved in heterogeneous nuclear (hn) RNA, the role of repetitive sequences in the control of gene expression at the transcriptional and posttranscriptional levels is conjectured.

A large amount of evidence suggests that most double-stranded regions in hnRNA are transcripts of Alu repeats. The presence of the Alu repeat in mRNA may result from incomplete removal of Alu sequences in the nucleus, such that a region of homology to the Alu repeat is preserved. In this regard, we note that the region of association in RNA complexes (120bp) and the average size of R loops in groups III, IV, and V are significantly smaller than the Alu DNA sequence. This observation could also reflect involvement of Alu sequences in mRNA processing. Recently, evidence of molecular interactions among different species of cytoplasmic RNA has been reported. The presence of Alu repeat transcripts in different cytoplasmic molecules of either mRNA or 7S RNA suggests the potential for in vivo occurrence of interactions involving Alu repeat transcripts. Such interactions may also play a role in the cytoplasmic stability or translation efficiency of mRNA.
Finally, we find that it is most intriguing to have detected a significant frequency of complexes in hybridized RNA of normal T lymphocytes but not of placental tissue. This observation could reflect tissue-specific transcription of Alu sequences.

Calabretta, B., D.L. Robberson, A.L. Maizel, and G.F. Saunders. 1981. mRNA in human cells contains sequences complementary to the Alu family of repeated DNA. Proceedings of the National Academy of Sciences of the USA 78: 6003-6007.

The most striking feature of the Alu repeat family is its large numerical representation in the human genome, which suggests that Alu repeat sequences might be involved in genetic rearrangements, a role which could be identified if we consider the human genome to be a dynamic structure. Although most members of the Alu family are scattered throughout the human genome, some may be clustered in certain genomic regions. Such an arrangement would provide a good opportunity to test the hypothesis that repetitive sequences facilitate genetic rearrangements.

The pattern of interspersion may have been fixed in evolution, with certain Alu repeat members having been recruited for specific cellular functions, for example, in the initiation of DNA replication and as promotor sites for RNA polymerase III.

In general, the human genome seems to be a dynamic structure in which variations can be introduced by sequence rearrangement, certain of which can lead to the formation of circular duplex DNA molecules. This genetic plasticity is quite characteristic of transposable elements, and the consequent genome alterations are relevant to evolutionary changes, while the DNA rearrangements may be involved in human cancer.

Calabretta, B., D.L. Robberson, H.A. Barrera-Saldana, T.P. Lambrou, and G.F. Saunders. 1982. Genome instability in a region of human DNA enriched in Alu repeat sequences. Nature 296: 219-225.

Many students of DNA analysis have been unsuspectingly struck by the regularity and length of banding patterns on sequencing gels produced by simple repetitive DNA. When discussing the meaning of these simple DNA sequences, most professional genome watchers hesitate and simply refer to the enormous amount of sometimes unpalatable literature on the subject. They stress the complexity and intractability of the problem of simple sequences (and repetitive DNA as a whole). Investigators working directly in the field of repetitive sequences must justify the relevance of their efforts before their peers and granting agencies. On this truly meaningful and even existential note, I review here certain related members of the family of simple sequences — the GA(TC)A repeats.

Finally, possible functional implications are touched upon by covering RNA expression data of GA(TC)A-containing sequences. Hypotheses on the control of gene expression by GA(TC)A sequences are not covered because the experimental basis is at best scarce in animal systems. Nevertheless, it should be evident from this review that the conception that all the simple repetitive sequences are just “junk” or genes is simplistic. It is interesting but exceedingly difficult to speculate on why they are a characteristic component of the genomes of present-day animals.

All attempts to identify any natural GA(TC)A translation products in eukaryotes, for example, in monoclonal antibodies, proved fruitless. Hence the question of the functional meaning, if any, of simple, tandemly repeated sequences such as GA(TC)A DNA remains unanswered.
Because of the high copy numbers, the analysis of simple repetitive DNA is a serious, difficult, and unspectacular Sisyphean labor. We have learned to question many of the general preconceptions about the functionality of DNA sequences. Merely because they exist in the genomes of more or less related animal species does not mean that they have a function.

Epplen, J.T. 1988. On simple repeated GA(TC)A sequences in animal genomes: a critical reappraisal. Journal of Heredity 79: 409-417.

The slime moulds can therefore help us to investigate the structure and evolution of repetitive DNA in ‘simple’ eukaryotes and to understand how these sequences contribute to the architecture and function of the eukaryotic genome. Several questions remain, including perhaps the most important: do repetitive sequences perform some definable function?

DNA satellites and mobile genetic elements have both seemingly developed or adapted mechanisms which permit their sequences to multiply in eukaryotic genomes. As suggested a number of years ago, and recently reviewed, this line of thinking suggests that most, if not all, families of repetitive sequence may serve no useful function in eukaryotic DNA. This is the ‘selfish’ or ‘junk’ DNA hypothesis. There have been many supporters of the opposing view that at least some families of repeated sequence must perform some useful function, but so far no fully convincing case has been made for a clearly identifiable role for any repeated sequence family other than repeated genes such as those for rRNA. This may mean either that no such functions exist, or that experimentalists have hitherto possibly not been looking in the right direction. What new information has arisen from recent work that may provide clues as to which new directions to take? …

Hardman, N. 1986. Slime moulds and the origin of foldback DNA. BioEssays 5: 105-111.

There have been several suggested explanations for the presence of noncoding intervening sequences in many eukaryotic structural genes. They may be examples of ‘selfish DNA’, conferring little phenotypic advantage, or they may have some importance in gene expression and/or evolution.

It is possible that the relationship between the location of the splice junction in the gene at the surface of the protein confers a biological advantage and hence is a result of natural selection. Introns and their associated splicing systems could be exploited in many ways during the evolution of a protein.

Craik, C.S., S. Sprang, R. Fletterick, and W.J. Rutter. 1982. Intron-exon splice junctions map at protein surfaces. Nature 299: 180-182.

We conclude from this experiment that the intron in the yeast actin gene does not have an observable function. It is possible that the role of the intron is too subtle to be observed in laboratory conditions of growth or that the intron, while having evolutionary significance, has no present role. To conclude that this is true for all yeast genes that contain introns would of course be premature, but there exist strains in which mitochondrial introns have been removed with no observable effect.

Ng, R., H. Domdey, G. Larson, J.J. Rossi, and J. Abelson. 1985. A test for intron function in the yeast actin gene. Nature 314: 183-184.

Solutions to problems of how introns are dealt with by cells do not address the question of why introns are there at all, questions about intron function. Some introns in some genes perform clearly regulatory roles, since splicing factors specific to the tissue or developmental stage decide when and where splicing should occur (Breitbart et al. 1985). In addition, some introns in some genes contain enhancers or modulators of the expression of those genes (Slater et al. 1985). However, the great majority of introns in protein-coding genes have no such “functions.” Direct experimental as well as indirect comparative data show that most introns can be removed from genes without phenotypic effect (Blake 1985). Thus, in terms of beneficial effects on the fitnesses of organisms, we almost certainly cannot account for the presence of the majority of individual introns, nor for the propensity to have introns at all, even though introns may on the average represent as much as 90% of the length of a gene and perhaps as much as half of the total DNA in some complex eukaryotes such as humans.

Thinking about introns challenges basic concepts of adaptation and function. In particular, it challenges the rather strict adaptationist approach that molecular biologists have traditionally taken toward elements of gene structure.

Doolittle, W.F. 1987. The origin and function of intervening sequences in DNA: a review. American Naturalist 130: 915-928.

Ever since the discovery of split genes, there has been a debate about why they are split. This can be resolved into three separate problems: the origin of the introns that split the genes (separating exons from each other), the role of introns in evolution, and their present function, if any.

Rogers, J. 1985. Exon shuffling and intron insertion in serine protease genes. Nature 315: 458-459.
