I am hopeful that our exploration of the peer-reviewed scientific literature and related news stories in scientific journals from the 1960s to the 1990s convincingly reveals that those who claim that junk DNA was "long dismissed as irrelevant" have it exactly backwards. Throughout this period, but especially before the non-adaptationist (though not exclusive) alternative offered by the selfish DNA hypothesis began to influence thinking on the topic in the mid-1980s, it was assumed, following Darwinian logic, that the very existence of so much DNA meant that it must be functional for the organism. It was only after considerable empirical investigation of potential functions that it became a common view that most (but certainly not all) noncoding DNA is unlikely to be functional at the level of the organismal phenotype.
I have already mentioned Alu elements — by far the most common single type of noncoding DNA element in the human genome. Alu elements are part of the category of repetitive DNA known as SINEs, which stands for short interspersed repeated sequences (or short interspersed nuclear elements). These sequences are now recognized as a type of transposable element that uses an RNA intermediate (i.e., undergoes retrotransposition) but which cannot do so without borrowing (some say parasitizing) the molecular transposition apparatus of other elements, namely long interspersed repeated sequences (LINEs). LINEs are not as common in the human genome as SINEs, but as they are much larger, they make up more of the total DNA. Whereas there are about 1.5 million SINEs (1 million of them Alu) making up about 13% of the genome sequence, the 870,000 or so copies of LINE elements (more than 500,000 of them LINE-1) constitute more than 20% of human DNA.
The terms SINE and LINE were coined by Maxine Singer in 1982 (Singer 1982a). By that time, the term "junk DNA" (Ohno 1972; Comings 1972) had been in circulation for a decade, and the "selfish DNA" hypothesis had been put forward two years earlier by Orgel and Crick (1980) and Doolittle and Sapienza (1980). Singer (1982b) cited these latter papers (but not Ohno's) in her longer review of mammalian repeated DNA sequences. So once again, we have a prime candidate for assessing the general attitude in the scientific community regarding the possible function of noncoding DNA sequences during the supposed period of neglect.
Were SINEs and LINEs dismissed as mere junk unworthy of further exploration?
The critical question about SINEs and LINEs concerns their function, if they have any. The catalog of proposed functions for SINEs includes many of the unsolved problems in molecular biology, but none has been demonstrated directly. The existence of RNA transcripts from some SINE-family members is the most compelling argument available that they have a function, although functions independent of transcription (and in addition to transposition) have also been suggested. (The possibility that LINEs are transcribed requires investigation). Particularly striking is the fact that the 4.5S transcripts of Alu-like SINEs of hamster and mice are more than 95% identical in sequence, which is significantly closer than the variation among the different copies of a SINE family in a single species. If we assume that one or a few SINEs encode the 4.5S RNAs, is there any functional significance to the many other dispersed copies of family members? It seems reasonable to expect that there is some trade-off between an advantage imparted to cells by SINEs and the disadvantage of a promiscuous and abundant mobile element that is presumably destructive if implanted in an essential coding region.
[A number of in-line citations have been omitted for clarity]
Are SINES functional?
As a background, it is interesting to recall proposals suggesting that highly repeated dispersed sequences may be without function (Orgel and Crick 1980; Doolittle and Sapienza 1980) and also disagreement concerning those proposals (Cavalier-Smith 1980; Dover 1980; T.F. Smith 1980; Orgel et al. 1980; Dover and Doolittle 1980). Specific functions that have been suggested include the control of gene expression, perhaps by involvement of transcripts of SINES in the maturation of messenger RNA, and service as origins of DNA replication.
The following additional point may be important, in view of the suggestions that highly repeated sequences have no function at all. A mobile element may generate diversity with a potential selective advantage, but it can also generate disadvantage if it moves into an essential gene. Mutation by movable elements has been demonstrated in yeast and Drosophila. The high frequency of mutation caused by the presence of large numbers of movable elements within a mammalian genome might have proven intolerable and been selected against, unless it was counterbalanced by some positive functional advantage.
Finally, the suggestion that SINES may serve as origins for DNA replication should be considered. The basis for the suggestion is the presence in SINES of a short (14bp) homology to a sequence associated with the origin of replication of murine and primate papovaviruses. Georgiev et al. (1981) describe some preliminary experiments that are consistent with this suggestion. However, in papovavirus genomes this region is part of a complex control region and may be involved in the control of transcription as well as replication. Only additional experiments will resolve these questions.
Are LINES functional?
The discovery of LINE families in mammals is recent and there is very little information available regarding function. Adams et al. (1980) found no transcripts homologous to the human Kpn-LINE family in bone marrow cells and Manuelidis also reports negative preliminary experiments. There is no information available regarding the possibility that LINES are mobile in mammalian genomes.
As noted previously, the SINE Alu was first described in 1979, and the first LINEs were discovered using similar methods around 1980. Singer (1982b) cites several publications and articles in press detailing sequences of this type from the human and mouse genomes. Most of these papers did not include any discussion one way or the other about function and focused instead on the technique used or the specific molecular characteristics of the sequences. However, one of the early papers did discuss function (and non-function).
Adams et al. (1980):
As to the function or genesis of this sequence we can make only vague hypotheses. The fact that it is not expressed into RNA, at least in bone marrow cells, at levels proportionate to its reiteration frequency, suggests that it does not code for a protein or major nuclear RNA in this tissue. However, there may be a low-level transcript which has some functional role, or there may be transcription in some other tissue. Alternatively this sequence may be a binding site for a chromosomal protein, or serve as a signal for chromosomal folding. As such it could conceivably have some role in the regulation of expression of the β-globin or other nearby genes. The interspersion of this sequence among other DNA is consistent with but not by itself supportive of such a role. Finally it is possible that this repeated sequence has no function relevant to the organism, but is carried in the genome in an essentially parasitic fashion (Doolittle and Sapienza 1980).
Part of the Quotes of interest series.
Adams, J.W., R.E. Kaufman, P.J. Kretschmer, M. Harrison, and A.W. Nienhuis. 1980. A family of long reiterated DNA sequences, one copy of which is next to the human beta globin gene. Nucleic Acids Research 8: 6113-6128.
Cavalier-Smith, T. 1980. How selfish is DNA? Nature 285: 617-618.
Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.
Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.
Dover, G. and W.F. Doolittle. 1980. Modes of genome evolution. Nature 288: 646-647.
Georgiev, G.P., Y.V. Ilyin, V.G. Chmeliauskaite, A.P. Ryskov, D.A. Kramerov, K.G. Skryabin, A.S. Krayev, E.M. Lukanidin, and M.S. Grigoryan. 1981. Mobile dispersed genetic elements and other middle repetitive DNA sequences in the genomes of Drosophila and mouse: transcription and biological significance. Cold Spring Harbor Symposia on Quantitative Biology 45: 641-654.
Manuelidis, L. 1982. Repeated DNA sequences and nuclear structure. In Genome Evolution (eds. G. Dover and A. Flavell), pp. 263-285. Academic Press, New York.
Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.
Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.
Orgel, L.E., F.H.C. Crick, and C. Sapienza. 1980. Selfish DNA. Nature 288: 645-646.
Singer, M.F. 1982a. SINEs and LINEs: highly repeated short and long interspersed sequences in mammalian genomes. Cell 28: 433-434.
Singer, M.F. 1982b. Highly repeated sequences in mammalian genomes. International Review of Cytology 76: 67-112.
Smith, T.F. 1980. Occam’s razor. Nature 285: 620.