I previously posted a few quotes from the original authors of the “junk DNA” and “selfish DNA” hypotheses. These showed that the early discussions of these notions did not rule out possible functions for noncoding DNA. Nevertheless, creationists, many science writers, and far too many biologists insist on claiming that noncoding DNA was long dismissed as unimportant because of these ideas. I will be discussing the history of research in this field in some detail later, but for the time being I thought it would be interesting to give some more quotes from papers published in top journals during the supposed period of disregard of noncoding sequences. This is quote-mining, of course, so you are encouraged to consult the original sources. I have not hand-picked these; rather, these are the types of papers that come up in searches from this period. By all means, if you know of works from any time that claimed that all noncoding DNA is nonfunctional or discouraged research into possible functions, let me know the citation.
There is obviously a continuum of possible selective advantages (positive or negative) to the organism. We had excluded from our definition of selfish DNA those cases where the selective advantage is very high. To decide whether a repeated sequence is parasitic or not, one must determine whether the presence of the repeated sequence in the population is mainly due to the efficiency with which the sequence spreads intragenomically or mainly due to the reproductive success of those individuals in the population who possess repeated copies of the sequence. Only in the former case do we consider it useful to use the term selfish or parasitic DNA, as opposed to useful or symbiotic DNA — the borderline between the two may not be sharp.
In our recent experience most people will agree, after discussion, that ignorant DNA, parasitic DNA, symbiotic DNA (that is, parasitic DNA which has become useful to the organism) and ‘dead’ DNA of one sort or another are all likely to be present in the chromosomes of higher organisms. Where people differ is in their estimates of the relative amounts. We feel that this can only be decided by experiment.
Orgel, L.E., F.H.C. Crick, and C. Sapienza. 1980. Selfish DNA. Nature 288: 645-646.
Perhaps the most surprising discovery in the initial studies of eukaryotic gene structure has been that many genes contain interruptions in the coding sequences. The origin and the function of these intervening sequences (IVS or introns) are not yet well understood but are the subject of intense investigation.
Wallace, R.B., P.F. Johnson, S. Tanaka, M. Schöld, K. Itakura, and J. Abelson. 1980. Directed deletion of a yeast transfer RNA intervening sequence. Science 209: 1396-1400.
As long ago as 1970, Ohno argued, on the basis of genetic load, that much of the eukaryote genome was little more than junk. This viewpoint, which is still unpalatable to many biologists, now has a substantial supporting DNA data base. More recently, this has led Ohno to conclude that genes in the mammalian genome are like ‘oases in a barren desert’ (Ohno 1982) and that for every copy of a new gene that has arisen during evolution, hundreds of other copies have ‘degenerated’ to swell the ranks of junk DNA (Ohno 1985).
It has, in the past, been commonplace to assume that most, if not all, aspects of the morphology, physiology and behaviour of an organism represent adaptive responses to the environment in which that organism lives. This assumption, however, is difficult to test objectively and represents more an article of faith than of fact. Indeed, biologists have become addicted to the adaptationist viewpoint not so much because of the compelling evidence in favour of it, but rather because it seems so eminently logical and reasonable. This view, of course, assumes that functional explanations must necessarily exist for all facets of the bewildering diversity we see within and between genomes. An alternative extreme viewpoint is that eukaryote genomes are, in effect, simply larger, more sophisticated and embellished prokaryote genomes, loaded with non-coding DNA sequences which are in a constant state of flux but without any significant short-term impact on the phenotype.
To decide which, if either, of these interpretations is the more realistic, we need to determine the number of functional genes within a genome and the proportion of these that are developmentally significant. We also require precise information on the changes that go on within a genome at the molecular level and the extent to which these lead to meaningful evolutionary change. Compared to the differences in structural gene composition between related species, we now know that there are much more striking molecular differences in their repeated DNA components. This raises the question of whether this is because such sequences are important or unimportant. There is also a clear need to distinguish between historical chance and biological necessity as causative factors in determining genome structure.
John, B. and G.L.G. Miklos. 1988. The Eukaryote Genome in Development and Evolution. Allen & Unwin, London. p.24-25.
Interest in repetitive DNA sequences goes back many years but, as with many aspects of molecular biology, the advent of recombinant DNA technology and DNA sequencing now permits previously unmatched scrutiny of the structures of interest.
If mobility is a reality, and most agree that it probably is, then it seems likely that at least some members of repeat families will have important effects in the genome, even if they have no formal function. Enhancing recombination and altering rates of gene expression are obvious possibilities, while the initiation of new species is a more recondite proposal.
The truth is, however, that the functions of the large and motley collection of repeated DNA families are proving particularly resistant to elucidation. Putative functions are many, including, variously, involvement in chromosome pairing, control of gene expression, processing of messenger RNA precursors, and participation in DNA replication. So far none has been established, save for the single exception of a small family that gives rise to 7S RNA, a molecule that recently was serendipitously discovered to be an essential component of a particle that mediates the secretion of proteins from cells.
Some repetitive DNA will undoubtedly be shown to have a function, in the formal sense; some will likely be shown to exert important effects; and the remainder may well have no function or effect at all and can therefore be called selfish DNA. Repetitive DNA constitutes a substantial proportion of the genome (up to 90 percent in some cases), and there is considerable speculation on how it will eventually be divided between these three groups. Current bets would put a small fraction in the function category, with distribution of the rest rising steeply through the effect and selfish categories.
Satellite DNA unquestionably is a puzzle. What determines the number of copies in a repeat family? And how does the genome tolerate so much of it? Perhaps, as Singer has recently promulgated, just a small fraction of the satellite sequences is essential to some genomic function while the remainder is harmless surplus. This, she indicates, is a comfortable middle ground between the extreme selfish DNA position, which sees no function in all this “junk DNA,” and the adaptationist position, which looks for functions in every structure. The same questions and speculations can be applied to dispersed repetitive DNA.
One observation that might be taken as evidence of function in repeated sequences is the frequency of transcription into RNA. A significant proportion of nuclear RNA contains transcripts of repeated sequences, although 90 percent of this is lost in RNA processing and exit to the cytoplasm. Davidson and his colleagues have shown that in sea urchin the spectrum of repeat families that are transcribed changes during development, an appealing argument for some regulatory function. Most intriguing, however, is the discovery that only a small proportion of any repeat family is ever transcribed. “Most members appear to be quiescent, which must make you cautious when isolating samples in search of their function.”
It is clear that, from their abundance, their unusual structure, and their frequent transcription, dispersed repetitive DNA families cannot be ignored. But it is equally clear that for the most part they, like their tandemly repeated relatives, remain a phenomenon in search of a function.
Lewin, R. 1982. Repeated DNA still in search of a function. Science 217: 621-623.
[Reporting about International Workshop in Highly Repeated DNA, NIH, July, 1982]
Even though the human β-globin complex contains a relatively large number of active genes, 95 percent of the locus is made up of DNA that does not code for proteins. What is the role of this extra DNA, if any? The pseudogenes constitute just a small proportion of the region, although more pseudogenes might exist. Some of the DNA is made up of representatives of well-known families of repetitive sequences. And the remainder is DNA of no known function or comparable sequence.
“We wanted to test the hypothesis that this extra DNA is ‘junk DNA,’” says Jeffreys, “so we compared the β loci in humans, gorillas, and baboons.” Jeffreys and his colleagues reasoned that if it were junk DNA, then over the 20 to 40 million years of evolution represented by humans, apes, and Old World monkeys both the sequence and the overall quantity of intergenic DNA could be expected to vary. “It turned out that the cluster is remarkably stable,” reports Jeffreys. “The overall pattern and size of the cluster is the same, and the rate of nucleotide substitutions is one-quarter to one-fifth of what would be expected in functionless DNA”. The noncoding DNA therefore appears not to be junk, but what function it might perform is still a mystery.
Lewin, R. 1981. Evolutionary history written in globin genes. Science 214: 426-427.
Since the discovery that many eukaryotic genes are discontinuous, a number of studies have been directed towards identifying a function for intervening sequences (IVSs).
Whilst the results presented here point out a clear role for the intron in one tRNA gene family, a common function for all tRNA intervening sequences is not evident. Perhaps tRNA IVSs represent remnants of evolutionary gene rearrangements and only occasionally evolve a role in RNA synthesis. Alternatively, there may be a common but as yet unidentified function for these IVSs, and the role for the IVS described here for tRNA(tyr) may represent an auxiliary use of the precursor RNA. Clearly, analysis of IVS mutants in other tRNA gene families will be necessary to obtain definitive answers to these questions.
Johnson, P.F. and J. Abelson. 1983. The yeast tRNA(tyr) gene intron is essential for correct modification of its tRNA product. Nature 302: 681-687.
Repetitive sequences are interspersed with single-copy regions in the human genome. Because this arrangement is conserved in heterogeneous nuclear (hn) RNA, the role of repetitive sequences in the control of gene expression at the transcriptional and posttranscriptional levels is conjectured.
A large amount of evidence suggests that most double-stranded regions in hnRNA are transcripts of Alu repeats. The presence of the Alu repeat in mRNA may result from incomplete removal of Alu sequences in the nucleus, such that a region of homology to the Alu repeat is preserved. In this regard, we note that the region of association in RNA complexes (120bp) and the average size of R loops in groups III, IV, and V are significantly smaller than the Alu DNA sequence. This observation could also reflect involvement of Alu sequences in mRNA processing. Recently, evidence of molecular interactions among different species of cytoplasmic RNA has been reported. The presence of Alu repeat transcripts in different cytoplasmic molecules of either mRNA or 7S RNA suggests the potential for in vivo occurrence of interactions involving Alu repeat transcripts. Such interactions may also play a role in the cytoplasmic stability or translation efficiency of mRNA.
Finally, we find that it is most intriguing to have detected a significant frequency of complexes in hybridized RNA of normal T lymphocytes but not of placental tissue. This observation could reflect tissue-specific transcription of Alu sequences.
Calabretta, B., D.L. Robberson, A.L. Maizel, and G.F. Saunders. 1981. mRNA in human cells contains sequences complementary to the Alu family of repeated DNA. Proceedings of the National Academy of Sciences of the USA 78: 6003-6007.
The most striking feature of the Alu repeat family is its large numerical representation in the human genome, which suggests that Alu repeat sequences might be involved in genetic rearrangements, a role which could be identified if we consider the human genome to be a dynamic structure. Although most members of the Alu family are scattered throughout the human genome, some may be clustered in certain genomic regions. Such an arrangement would provide a good opportunity to test the hypothesis that repetitive sequences facilitate genetic rearrangements.
The pattern of interspersion may have been fixed in evolution, with certain Alu repeat members having been recruited for specific cellular functions, for example, in the initiation of DNA replication and as promotor sites for RNA polymerase III.
In general, the human genome seems to be a dynamic structure in which variations can be introduced by sequence rearrangement, certain of which can lead to the formation of circular duplex DNA molecules. This genetic plasticity is quite characteristic of transposable elements, and the consequent genome alterations are relevant to evolutionary changes, while the DNA rearrangements may be involved in human cancer.
Calabretta, B., D.L. Robberson, H.A. Barrera-Saldana, T.P. Lambrou, and G.F. Saunders. 1982. Genome instability in a region of human DNA enriched in Alu repeat sequences. Nature 296: 219-225.
Many students of DNA analysis have been unsuspectingly struck by the regularity and length of banding patterns on sequencing gels produced by simple repetitive DNA. When discussing the meaning of these simple DNA sequences, most professional genome watchers hesitate and simply refer to the enormous amount of sometimes unpalatable literature on the subject. They stress the complexity and intractability of the problem of simple sequences (and repetitive DNA as a whole). Investigators working directly in the field of repetitive sequences must justify the relevance of their efforts before their peers and granting agencies. On this truly meaningful and even existential note, I review here certain related members of the family of simple sequences — the GA(TC)A repeats.
Finally, possible functional implications are touched upon by covering RNA expression data of GA(TC)A-containing sequences. Hypotheses on the control of gene expression by GA(TC)A sequences are not covered because the experimental basis is at best scarce in animal systems. Nevertheless, it should be evident from this review that the conception that all the simple repetitive sequences are just “junk” or genes is simplistic. It is interesting but exceedingly difficult to speculate on why they are a characteristic component of the genomes of present-day animals.
All attempts to identify any natural GA(TC)A translation products in eukaryotes, for example, in monoclonal antibodies, proved fruitless. Hence the question of the functional meaning, if any, of simple, tandemly repeated sequences such as GA(TC)A DNA remains unanswered.
Because of the high copy numbers, the analysis of simple repetitive DNA is a serious, difficult, and unspectacular Sisyphean labor. We have learned to question many of the general preconceptions about the functionality of DNA sequences. Merely because they exist in the genomes of more or less related animal species does not mean that they have a function.
Epplen, J.T. 1988. On simple repeated GA(TC)A sequences in animal genomes: a critical reappraisal. Journal of Heredity 79: 409-417.
The slime moulds can therefore help us to investigate the structure and evolution of repetitive DNA in ‘simple’ eukaryotes and to understand how these sequences contribute to the architecture and function of the eukaryotic genome. Several questions remain, including perhaps the most important: do repetitive sequences perform some definable function?
DNA satellites and mobile genetic elements have both seemingly developed or adapted mechanisms which permit their sequences to multiply in eukaryotic genomes. As suggested a number of years ago, and recently reviewed, this line of thinking suggests that most, if not all, families of repetitive sequence may serve no useful function in eukaryotic DNA. This is the ‘selfish’ or ‘junk’ DNA hypothesis. There have been many supporters of the opposing view that at least some families of repeated sequence must perform some useful function, but so far no fully convincing case has been made for a clearly identifiable role for any repeated sequence family other than repeated genes such as those for rRNA. This may mean either that no such functions exist, or that experimentalists have hitherto possibly not been looking in the right direction. What new information has arisen from recent work that may provide clues as to which new directions to take? …
Hardman, N. 1986. Slime moulds and the origin of foldback DNA. BioEssays 5: 105-111.
There have been several suggested explanations for the presence of noncoding intervening sequences in many eukaryotic structural genes. They may be examples of ‘selfish DNA’, conferring little phenotypic advantage, or they may have some importance in gene expression and/or evolution.
It is possible that the relationship between the location of the splice junction in the gene at the surface of the protein confers a biological advantage and hence is a result of natural selection. Introns and their associated splicing systems could be exploited in many ways during the evolution of a protein.
Craik, C.S., S. Sprang, R. Fletterick, and W.J. Rutter. 1982. Intron-exon splice junctions map at protein surfaces. Nature 299: 180-182.
We conclude from this experiment that the intron in the yeast actin gene does not have an observable function. It is possible that the role of the intron is too subtle to be observed in laboratory conditions of growth or that the intron, while having evolutionary significance, has no present role. To conclude that this is true for all yeast genes that contain introns would of course be premature, but there exist strains in which mitochondrial introns have been removed with no observable effect.
Ng, R., H. Domdey, G. Larson, J.J. Rossi, and J. Abelson. 1985. A test for intron function in the yeast actin gene. Nature 314: 183-184.
Solutions to problems of how introns are dealt with by cells do not address the question of why introns are there at all, questions about intron function. Some introns in some genes perform clearly regulatory roles, since splicing factors specific to the tissue or developmental stage decide when and where splicing should occur (Breitbart et al. 1985). In addition, some introns in some genes contain enhancers or modulators of the expression of those genes (Slater et al. 1985). However, the great majority of introns in protein-coding genes have no such “functions.” Direct experimental as well as indirect comparative data show that most introns can be removed from genes without phenotypic effect (Blake 1985). Thus, in terms of beneficial effects on the fitnesses of organisms, we almost certainly cannot account for the presence of the majority of individual introns, nor for the propensity to have introns at all, even though introns may on the average represent as much as 90% of the length of a gene and perhaps as much as half of the total DNA in some complex eukaryotes such as humans.
Thinking about introns challenges basic concepts of adaptation and function. In particular, it challenges the rather strict adaptationist approach that molecular biologists have traditionally taken toward elements of gene structure.
Doolittle, W.F. 1987. The origin and function of intervening sequences in DNA: a review. American Naturalist 130: 915-928.
Ever since the discovery of split genes, there has been a debate about why they are split. This can be resolved into three separate problems: the origin of the introns that split the genes (separating exons from each other), the role of introns in evolution, and their present function, if any.
Rogers, J. 1985. Exon shuffling and intron insertion in serine protease genes. Nature 315: 458-459.
Part of the Quotes of interest series.