Quotes of interest — beware single citations and non-citations.

As readers who have been following the Quotes of interest series will know, I have been arguing that from the discovery of repetitive DNA until at least the mid-1980s, the general expectation was that it must somehow be functional for the organism. By 1989 or 1990, we start to see claims that noncoding sequences were “long dismissed” as mere “junk” or “parasites”, and that biologists are finally beginning to recognize that they are interesting. We have heard this line, more or less unaltered, ever since. This statement is usually not backed up with any reference to the literature; it is more of a “general sense” sort of claim. Everybody knows junk DNA was dismissed as irrelevant, so using the standard statement to that effect need not be supported by any actual citations. Either that, or authors who make the statement will cite Ohno (1972), and possibly also Doolittle and Sapienza (1980) and Orgel and Crick (1980), even though these original publications did not in any way dismiss possible functions for some noncoding elements.

This trend of picking one or two references to cite (if any) in support of the claim that all of biology neglected the possible functions of noncoding DNA goes back nearly two decades. I have already provided this quote in a previous post, but I want to use it as an example of how single citations (not to mention non-citations) can be extraordinarily misleading about the past state of the field when it comes to this issue. In 1990, Willard argued that,

Although it has been recognized for over 20 years that the centromeric heterochromatin in chromosomes from virtually all complex eukaryotic organisms consists of various families of satellite DNA, they have only recently been taken seriously as candidates for something other than ‘junk’ DNA or genomic ‘flotsam and jetsam’ (Miklos 1985).

I think it is obvious from the summary of the literature on satellite DNA that the claim that its possible role in chromosomal structure was neglected is incorrect. But what of the citation of Miklos (1985)? Did he represent at least a minority of biologists who long dismissed satellite DNA as nothing more than “junk” or “flotsam and jetsam”?

As it turns out, I have already cited a few of Miklos’s publications from the 1970s and 1980s. Here are some highlights:

Yamamoto and Miklos (1978):

The most important aspect of satellite DNA remains the nature of its functions. Although a large body of data has been gathered concerning its structure, distribution and properties in several different organisms, most of these results have in fact neither supported nor disproved any one of the particular hypotheses of function. The most popular hypothesis on satellite DNA function has been, and still is, that satellite DNA is involved in some aspect of chromosome mechanics such as chromosome pairing.

John and Miklos (1979):

Thus the large amount of satellite DNA in some species, its apparent rigid conservation in sequence and, as we shall see, its effects on the genome when it is altered in amount or position lead us to be unimpressed in general with the argument that most of it constitutes a functionless burden which many eukaryotes must bear. However, for the moment we will retain an open mind and examine the hard data pertaining to function before casting a final judgment at the end of this article.

So, it would appear that Miklos had an interest in the possible functions of satellite DNA, and he certainly took seriously the possibility that it could be important for the chromosome. By 1985, his view on this had changed to a large extent, but this was due to his evaluation of the data, not a long-held assumption about nonfunction.

As he wrote in 1985,

It is this blatant profligacy of repeated sequences within some eukaryotic genomes that fascinates many investigators, and has led to two general and mutually exclusive conclusions concerning their status in the genome. One is that these sequences have functions; the other is that they represent the natural outcomes of genomic turnover processes that involve selfish, ignorant, or junk DNA. The functionalists rationalize the existence of these sequences have been held to function, for example, in determining centromere strength, chromosome pairing, recombination, the three-dimensional architecture of the nucleus, genomic reorganization, and speciation.

On the other hand the proponents of the second class of hypotheses propose that these sequences are inevitable byproducts of molecular mechanisms involved in DNA replication and recombination, and that such genomic turnover is not supervised by natural selection acting through the phenotype.

These viewpoints accurately reflect other deep divisions in contemporary biology, such as those concerning selection versus neutrality and function versus junk. An overt unease is now apparent in studies on highly repetitive DNAs, since, despite focusing the awesome power of recombinant DNA technology onto the structure of these sequences, the expected biological roles have still proved elusive.

The whole problem of whether there was a function at all for highly repetitive sequences exploded into the harsh glare of criticism when the concept of selfish DNA was propounded. Doolittle and Sapienza (1980), presenting arguments based on the properties of mobile elements, and Orgel and Crick (1980), using similar arguments pertinent to highly repetitive DNAs, argued that sequences having little effect on the phenotype could be spread throughout a genome simply on the basis of their preferential replicative properties.

I have spelled out these classical hypotheses because they can be compared with the themes of the 1970s and 1980s, which are expressed in terms of DNA sequences. Thus satellite or highly repetitive DNAs have been invoked in chromosome pairing, chromosomal rearrangements, speciation, germ line processes, alteration of regulatory pathways via genomic reorganization, and the three-dimensional structure of the interphase nucleus. Highly repetitive DNA sequences may, alternatively, be thought of as examples of selfish DNA or ignorant DNA. Thus highly repetitive DNAs are considered to be subject to natural selection, to be neutral, or to be subject to molecular drive.

This litany of functional explanations led Doolittle and Sapienza (1980) and Orgel and Crick (1980) to formulate their concept of selfishness, in which no phenotypic or evolutionary function was called for.

After this introduction, Miklos (1985) spent 58 pages reviewing the available data on highly repetitive DNA sequences. In the end, it is clear that two major developments influenced Miklos’s views on satellite DNA: 1) despite considerable effort to generate supporting data for proposed functions, these remained lacking in his view, and 2) there was now an alternative, non-adaptive explanation that could account for the existence of much noncoding DNA.

Miklos (1985) further discussed the significance of the new, non-adaptationist, selfish DNA hypothesis near the end of his chapter.

Doolittle and Sapienza (1980), Doolittle (1982), and Orgel and Crick (1980) have drawn attention to what has bogged down our evolutionary concepts of the genome. We have been wedded to thinking of it in functional terms and have considered natural selection as an omnipotent reaper that would rapidly slash any nonproductive DNA from the genome. The existence of highly repetitive sequences has inevitably led to a search for their function, and this in turn has revolved around a role in chromosome mechanics or chromosome architecture. The data, however, do not support either of these theories. Consequently, evolutionary avenues have been explored. Thus, highly repetitive sequences have been invoked in facilitating chromosome rearrangements, but once again the data base does not favor such excursions. Similarly, a role has been sought for them in speciation processes, but again no unbroken chain of data leads to this as a likely possibility. Doolittle and Sapienza (1980), Doolittle (1982), stressed that most “explanations” are unrealistic. If there are DNA sequences without effects on phenotype, then they can arise, spread, and be maintained in a functional vacuum. The data indicate that satellite sequences per se have no known phenotypic effects.

Miklos was not one to simply dismiss noncoding DNA as “junk” without a thought. If anything, he was among the most ardent advocates of the need for data to test functional hypotheses about noncoding DNA, and he contributed a significant number of these data himself. His view that most noncoding DNA is non-functional for organisms came only after careful evaluation of the data. As he also noted in 1985,

The cloned sequence data together with the biological data are impressive in their implications; they have negated every functional test to which satellite DNA has been put in a mechanistic cellular domain and they have seriously compromised many of the evolutionary concepts pertaining to this class of DNA. In this field there has been an almost overpowering reluctance to lay aside hypotheses that were erected in the absence of relevant data. There has been a desire to avoid the unpalatable, or the unthinkable — that tandem arrays of highly repetitive sequences have no functional characteristics whatsoever but are the ignorant playthings of genomic turnover processes. There is a fear that if not one [adaptive] hypothesis is left standing, then little of substance will remain in this field. However, highly repetitive DNAs have provided a vanguard for the problems that must be encountered in interfacing molecular biology with developmental and evolutionary biology at any level. These sequences have revealed many of the hurdles likely to be encountered in both coding and noncoding parts of the genome. These problems will not go away, nor will our understanding of genomic relevance be helped if the hypotheses that are erected are of little direct predictive value.

I believe that an acceptable state of affairs has been reached for highly repetitive localized arrays at the cellular level. Most hypotheses have been dispensed with, and we are confronted with the stark realism that the obsessive hunt for function has been tangential and illusory.

Revising one’s view in the face of new theoretical explanations and accumulated data (and indeed, producing some of the data oneself) is exactly how science is supposed to work.

____________

Part of the Quotes of interest series.
____________

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

John, B. and G.L.G. Miklos. 1979. Functional aspects of satellite DNA and heterochromatin. International Review of Cytology 58: 1-114.

Miklos, G.L.G. 1985. Localized highly repetitive DNA sequences in vertebrate and invertebrate genomes. In Molecular Evolutionary Genetics (ed. R.J. MacIntyre), pp. 241-321. Plenum Press, New York.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Willard, H.F. 1990. Centromeres of mammalian chromosomes. Trends in Genetics 6: 410-416.

Yamamoto, M. and G.L.G. Miklos. 1978. Genetic studies on heterochromatin in Drosophila melanogaster and their implications for the functions of satellite DNA. Chromosoma 66: 71-98.


Quotes of interest — 1970s edition (part one).

I have argued that prior to 1980, when the selfish DNA hypothesis was proposed, it was taken more or less as a given by most biologists that noncoding DNA had some function(s), even if the specific adaptive significance of these sequences had yet to be demonstrated. This was based on simple Darwinian, adaptationist logic: if it’s there, it must be doing something good for the organism or else natural selection would have done away with it a long time ago.

The point of the selfish DNA hypothesis was to provide an alternative view that did not preclude the potential function of some noncoding DNA, but also did not require it all to be adaptive for its presence to be accounted for. In reality, this simply shifted the process of natural selection down one level, from organisms in populations to sequences of DNA within the genome, but the consequence was important because it meant that evidence had to be provided for organism-level function — this could not just be assumed simply because noncoding DNA exists.

I have already quoted from many papers published during the 1980s, which showed that many biologists continued to explore possible functions for noncoding DNA of different kinds throughout that decade, as they continue to do today. In this post, I want to back up the claim that a common assumption during the 1970s was that most DNA in the genome is functional for the organism. By the late 1970s we start to see a shift away from this position and toward (untenable) ideas regarding “evolutionary” functions, but that is a topic for another time. Here, then, are some samples of the peer-reviewed literature from 1970 to 1979.

At present, one can only speculate on the functions of much of the repeated DNA in eukaryotes. For example, repetitious DNA might contain binding sites for chromosomal proteins or chromosomal RNA… A certain portion of the repetitious DNA may play a role in determining how DNA fibers fold in the chromosome… A fraction of repetitious DNA may not contain any genetic “information” at all but may have evolved to function merely as space-filler in the genome.
[Note: this quote is heavily abbreviated from more than one paragraph, so please consult the original for the full text.]

We can summarize these speculations about the genetic functions of repetitious DNA by concluding that, whatever they are, they are probably diverse. Regardless of the function of repetitious DNA, however, its very existence raises fundamental problems for [adaptationist] evolutionary theory.

Edelman, G.M. and J.A. Gally. 1970. Arrangement and evolution of eukaryotic genes. In The Neurosciences Second Study Program (ed. F.O. Schmitt), pp. 962-972. Rockefeller University Press, New York.

It was first demonstrated by Boivin and the Vendrelys that mammalian cells from different tissues contain the same amount of DNA, and furthermore that sperm cells contain half this amount. In several species of fungi vegetative diploid cells contain twice the amount of DNA contained by haploid ones. It is also well known that DNA is metabolically stable. These observations strongly suggest that most, if not all, the DNA has a genetic function. The difficulty arises when the actual amounts of DNA per nucleus in different species are considered.

The problem of large amounts of DNA per genome might be explained if only a proportion of the DNA had a genetic function and the rest was redundant or played some other role in the cell.

In summary, there appear to be several possible functions for DNA which does not encode for messenger RNA and protein synthesis. Direct information about such functions is negligible and one can only guess at the total amount of DNA which may be involved.

Holliday, R. 1970. The organization of DNA in eukaryotic chromosomes. Symposia of the Society for General Microbiology 20: 359-379.

Alternative hypotheses concerning the function of repetitious DNA have recently been proposed, and it seems an appropriate time to review the evidence for the existence of repetitious DNA and see how it adds to our understanding of the evolution and organization of DNA within organisms.

None of the recognition functions, i.e., recognition of centromeres, initiation sites, pairing sites, recombination sites, folding sites, or regulatory sites, that we have discussed is mutually exclusive of the others. They all relate to cellular phenomena that have been demonstrated or inferred from other data. All these phenomena probably exist within every higher organism. Therefore, DNA involved in each of these functions could contribute in varying degrees to the repeated portion of the genome.

Bostock, C. 1971. Repetitious DNA. Advances in Cell Biology 2: 153-223.

There is no doubt, however, that the presence of satellite DNA confers some selective advantage; otherwise we cannot account for its spread throughout a population as large as that of the house mouse.


The origin and spread through the population of such diverse sequences therefore present considerable problems, and I would like to suggest that the best explanation might be that satellite DNA confers a direct advantage on the chromosome which carries it, because such a chromosome survives the mechanical processes of, primarily, meiosis better than a sister chromosome not so well endowed.

Walker, P.M.B. 1971. Origin of satellite DNA. Nature 229: 306-308.

With the assumption that a portion that comprises some 10 percent of the genomes in higher organisms cannot be without a raison d’être, an extensive review led us to conclude that a certain amount of constitutive heterochromatin is essential in multicellular organisms at two levels of organization, chromosomal and nuclear. At the chromosomal level, constitutive heterochromatin is present around vital areas within the chromosomes. Around the centromeres, for example, heterochromatin is believed to confer protection and strength to the centromeric chromatin. Around secondary constrictions, heterochromatic blocks may ensure against evolutionary change of ribosomal cistrons by decreasing the frequency of crossing-over in these cistrons in meiosis and absorbing the effects of mutagenic agents. During meiosis heterochromatin may aid in the initial alignment of chromosomes prior to synapsis and may facilitate speciation by allowing chromosomal rearrangement and providing, through the species specificity of its DNA, barriers against cross-fertilization.

At the nuclear level of organization, constitutive heterochromatin may help maintain the proper spatial relationships necessary for the efficient operation of the cell through the stages of mitosis and meiosis. In the unicellular procaryotes, the presence of a small amount of genetic information in one chromosome obviates the need for constitutive heterochromatin and a nuclear membrane. At higher levels of organization, with an increase in the size of the genome and with evolution of cellular and sexual differentiation, the need for compartmentalization and structural components in the nucleus became imminent. The portion of the genome that was concerned with synthesis of ribosomal RNA was enlarged and localized in specific chromosomes, and the centromere became part of each chromosome when the mitotic spindle was developed in evolution. Concomitant with these changes in the genome, repetitive sequences in the form of constitutive heterochromatin appeared, probably as a result of large-scale duplication. The repetitive DNA’s were kept through natural selection because of their importance in preserving these vital regions and in maintaining the structural and functional integrity of the nucleus.

The association of satellite (or highly repetitive) DNA with constitutive heterochromatin is understandable, since it stresses the importance of the structural rather than transcriptional roles of these entities. Nuclear satellite DNA’s have one property in common despite their species specificity, namely heterochromatization. In this sense the apparent species specificity of satellite DNA may be the result of natural selection for duplicated short polynucleotide segments that are nontranscriptional and can be utilized in specific structural roles.

Yunis, J.J. and W.G. Yasmineh. 1971. Heterochromatin, satellite DNA, and cell function. Science 174: 1200-1209.

The great variability between species (and even populations) in the type and quantity of satellite and/or cryptic sequence DNA suggests that organisms function within wide limits for organizing the type and extent of constitutive heterochromatin. On the other hand, the striking species specificity of major satellite sequences, together with a rather low level of base sequence divergence within a species and the precise localization of much of this DNA to centric heterochromatin, suggest that these DNA’s perform important nuclear functions.

Rae, P.M.M. 1972. The distribution of repetitive DNA sequences in chromosomes. Advances in Cell and Molecular Biology 2: 109-149.

Today there are over 100 laboratories studying different aspects of the problem concerning highly repetitive sequences and, as is always the case, a new and often confusing jargon has grown up around the rapidly evolving field. It is the purpose of this chapter to define and describe these new terms and to focus on the following three questions: What are highly repetitive sequences? How did they evolve? What is their function?

Flamm, W.G. 1972. Highly repetitive sequences of DNA in chromosomes. International Review of Cytology 32: 1-51.

[Satellite DNAs in mice] are widely established in populations throughout the world and thus clearly cannot be relegated to a random and functionless quirk of evolution. Simple-sequence DNAs must obviously possess a certain role in the genome.

There is one major hope for making sense of the fact that many higher organisms seem to carry in every nucleus a large portion of their DNA that looks, superficially, to be completely worthless. This lies in the comparative approach. When do simple-sequence DNAs arise in evolution? Can we find two closely related species one with and one without a major block of heterochromatin?

Swift, H. 1973. The organization of genetic material in eukaryotes: progress and prospects. Cold Spring Harbor Symposia on Quantitative Biology 38: 963-979.

Alternatively, large variations in genome size could readily be accommodated if a high proportion of the DNA were used for functions other than coding for proteins. A number of such functions have been proposed and incorporated into hypothetical structures for the eukaryotic genome.

The outstanding problem presented by eukaryotic DNA is that of finding a role for these large fractions not used in coding for proteins or cytoplasmic RNAs.

Southern, E. 1974. Eukaryotic DNA. In MTP International Review of Science, Biochemistry Series One, Volume 6, Biochemistry of Nucleic Acids (ed. K. Burton), pp. 101-139. University Park Press, Baltimore.

Genetic redundancy is common in eukaryotes and thus must confer substantial selective advantages upon this group of organisms. One function of certain redundant DNA sequences (satellite DNAs) that are located in the centromeric heterochromatin may be to facilitate proper chromosome pairing and segregation.

Tartof, K.D. 1975. Redundant genes. Annual Review of Genetics 9: 355-385.

Proposed functions for satellite DNA were evaluated and formally set forth by Walker (1971) and have since been expanded by Mazrimas and Hatch (1972), Lagowski et al. (1973), Lee (1975), Bostock (1971), Walker (1972), and Comings (1972). In a masterful summary and evaluation of current ideas relating repeated DNA to the organization of the eukaryote chromosome (Cold Spring Harbour Symposia on Quantitative Biology 1973) Swift stated that the function of simple sequence DNAs not only appeared to have most investigators mystified, but that the present theories concerning their function were not accepted with much enthusiasm. He did, however, point out that “There is one major hope for making sense of the fact that many higher organisms seem to carry in every nucleus a large portion of their DNA that looks superficially to be completely worthless. This lies in the comparative approach. When do simple sequence DNAs arise in evolution? Can we find two closely related species one with and one without a major block of heterochromatin?”

Miklos, G.L.G. and R.N. Nankivell. 1976. Telomeric satellite DNA functions in regulating recombination. Chromosoma 56: 143-167.

Satellites constitute from 1% to 65% of the total DNA of numerous organisms, including that of animals, plants, and prokaryotes. Their existence has been known for about 15 years, but, although it is thought that they must be biologically important, with few exceptions … their functions are still largely in the realm of speculation. This remains true despite their ubiquity and, except for polytenized tissues, their constancy as a fraction of the total DNA in all tissues of the particular animal or plant species in which they are observed.

Skinner, D.M. 1977. Satellite DNA’s. BioScience 27: 790-796.

The most important aspect of satellite DNA remains the nature of its functions. Although a large body of data has been gathered concerning its structure, distribution and properties in several different organisms, most of these results have in fact neither supported nor disproved any one of the particular hypotheses of function (see Comings, 1972; Swift, 1973; Hsu, 1975; Miklos and Nankivell, 1976; for evaluations of functions). The most popular hypothesis on satellite DNA function has been, and still is, that satellite DNA is involved in some aspect of chromosome mechanics such as chromosome pairing.

Yamamoto, M. and G.L.G. Miklos. 1978. Genetic studies on heterochromatin in Drosophila melanogaster and their implications for the functions of satellite DNA. Chromosoma 66: 71-98.

In spite of the large amount of information which now exists on the structure of satellite DNA, it is clear that the central issue, namely, function, has not been directly tackled. Probably the most important reason for this unsatisfactory state of affairs has been the signal failure to approach the problem of function experimentally, despite the considerable effort that has gone toward elucidating structural properties. In part this refractory state of affairs stems from the assumption that a knowledge of function necessarily follows from a knowledge of structure. In part too it is explained by the fact that the properties of satellite DNA have been evaluated within the framework of prokaryotic dogma without sufficient consideration of the higher-order phenomena which characterize the biology of eukaryotes.

It appears very obvious that we have now reached a stage in satellite DNA research where additional structural analyses are not revealing the nature of its function, and indeed there is a very good reason for this. The initial success of the prokaryotic approach to genetic function was due to its manipulative aspects. This approach, involving perturbation of a system by mutation, deletion, substitution and translocation, proved critical. Only recently has a similar approach been applied specifically in investigating satellite DNA function, although an enormous literature exists on experimental and natural modifications of heterochromatin, which bear directly on this issue.

In the absence of experimental evidence the problem has in general been discussed in terms largely modified from earlier theoretical considerations relating to the functions of heterochromatin. A summary of the comparisons of heterochromatin and satellite DNA functions is presented in Table II. As can be seen from this table, the assumption has generally been made that there is at least one positive function. However, since similar organisms have widely different amounts of satellite DNA, and since such differences are found even between species that form viable hybrids, some investigators have suggested that these sequences are simply evolutionary by-products with no particular function. This fails to explain why so many eukaryotes have been found to contain highly repeated DNA and why its amount varies so considerably even between closely related species. Equally difficult to explain is why in some cases mechanisms have evolved to regulate replication of this DNA in particular tissues independently of the rest of the genome or indeed of other repetitious sequences in the same nucleus. Thus the large amount of satellite DNA in some species, its apparent rigid conservation in sequence and, as we shall see, its effects on the genome when it is altered in amount or position lead us to be unimpressed in general with the argument that most of it constitutes a functionless burden which many eukaryotes must bear. However, for the moment we will retain an open mind and examine the hard data pertaining to function before casting a final judgment at the end of this article.

John, B. and G.L.G. Miklos. 1979. Functional aspects of satellite DNA and heterochromatin. International Review of Cytology 58: 1-114.



Quotes of interest — Alu.

Whereas each copy of the human genome contains about 20,000 protein-coding genes, it is also home to more than 1 million copies of a short interspersed repetitive element (SINE) known as Alu. For this reason, Doolittle (1997), perhaps only half jokingly, suggested that the genomes of humans “might be ironically viewed as vehicles for the replication of Alu sequences”.

Alu elements are now known to be transposable elements and are restricted to primate genomes, though neither of these facts was recognized until several years after they were first discovered. They are not capable of autonomous movement and replication in the genome; rather, they are “parasites” of other elements like LINE-1, which encode their own means of transposition. Their origin seems to trace to a duplication of a 7SL RNA gene near the origin of the primates. Today, some Alus are implicated in genomic functions while others continue to cause disease; as a general group of sequences, some are parasitic, some are mutualistic, and many are probably commensal, neither conferring benefit nor doing harm. For an excellent overview of the biology of Alu elements, see Batzer and Deininger (2002).

The history that is most commonly recounted when it comes to the study of genomic components like pseudogenes and transposable elements is that they were long dismissed as irrelevant “junk”. As I noted with reference to pseudogenes, they were not interpreted this way when discovered, even though this happened 5 years after the idea of “junk DNA” was proposed (Jacq et al. 1977).

Alu elements were first isolated in 1979, and are so named because this involved digestion of genomic DNA with the AluI restriction enzyme (which in turn is named for the bacterium from which it is derived, Arthrobacter luteus) (Houck et al. 1979). Again, if the typical story about noncoding DNA is true, then we should expect the discovery of these elements to have been discussed in terms of their biological insignificance.

Here is what Houck et al. (1979) actually said about the newly discovered elements:

Renatured DNA from human and many other eukaryotes is known to contain 300-nucleotide duplex regions from renatured repeated sequences. These short repeated DNA sequences are widely believed to be interspersed with single copy DNA sequences. In this work we show that at least half of these 300-nucleotide duplexes share a cleavage site for the restriction enzyme AluI. This site is located 170 nucleotides from one end. This Alu family of repeated sequences makes up at least 3% of the genome and is present in several hundred thousand copies.

Since these 300-nucleotide sequences, as well as their interspersed unique sequences, occupy such a large fraction of the genome in widely divergent eukaryotes, one imagines that they serve some important biological function. Among other possibilities, it has been proposed that they are involved in gene regulation. Unfortunately, their function remains unproven. In deciding what biological function these repeated sequences might serve, it is important to know the number of different families to which they belong.

It has been proposed that the 300-nucleotide interspersed repeated sequences perform a regulatory function either at the DNA or RNA level. The inclusion of over half of these 300-nucleotide sequences in a single family of repetitive sequences (the Alu family) would limit their ability to function as complex regulatory elements.

We have found in this work that at least half of 300-nucleotide inverted repeated DNA sequences and half of all other 300-nucleotide repeated sequences belong to one family. Comparing our independent results on inverted repeated DNA sequences, it seems likely that the heterogeneous nuclear RNA duplexes studied by Jelinek are transcribed from the Alu family of repeated sequences. We are currently testing this hypothesis by RNA-DNA hybridization and DNA sequencing. This hypothesis suggests that the function of the Alu family occurs at the level of the heterogeneous nuclear RNA. It has been proposed that such repeated sequences might be processing sites for heterogeneous nuclear RNA. Although other possibilities cannot be ruled out at this time, we find this to be an especially attractive proposal for the function of a single simple class of repeated sequences that are so widely distributed throughout the genome.

In a second paper published in the following year, Rubin et al. (1980) said:

The biological function of this family of sequences is unknown. We and our colleagues have recently noted sequence similarities between a selected portion of the Alu family and several other RNA or DNA sequences, which are known or suspected to be involved in DNA replication, transcription control, and mRNA processing. Together these observations reinforce our belief that a family of DNA sequences which includes 300,000 highly conserved members interspersed throughout much of the mammalian genome, must have an important function.

____________

Part of the Quotes of interest series.
____________

Batzer, M.A. and P.L. Deininger. 2002. Alu repeats and human genomic diversity. Nature Reviews Genetics 3: 370-380.

Doolittle, W.F. 1997. Why we still need basic research. Annals of the Royal College of Physicians and Surgeons of Canada 30: 76-80.

Houck, C.M., F.P. Rinehart, and C.W. Schmid. 1979. A ubiquitous family of repeated DNA sequences in the human genome. Journal of Molecular Biology 132: 289-306.

Jacq, C., J.R. Miller, and G.G. Brownlee. 1977. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12: 109-120.

Rubin, C.M., C.M. Houck, P.L. Deininger, T. Friedmann, and C.W. Schmid. 1980. Partial nucleotide sequence of the 300-nucleotide interspersed repeated human DNA sequences. Nature 284: 372-374.


Quotes of interest — Ohno (1973) and discussion.

The term “junk DNA” was coined by Susumu Ohno in conference presentations at the Brookhaven National Laboratory in Upton, New York and in Rhein, Germany, which were printed in conference proceedings volumes a short time later (Ohno 1972, 1973). Ohno used the term “junk DNA” only once per article (in the titles). Most of the argument focused on his view that gene duplication was necessary for evolutionary change; that only a small fraction of duplicated genes would mutate in such a way as to acquire novel protein-coding functions, so that for every new gene that emerged a large number of non-functional gene copies would result; and that much of the genome cannot be directly essential, because otherwise any mutation anywhere in the genome would be deleterious and organisms could not withstand even a mild mutation rate.

My suspicion is that most people who cite these papers have not read them, because they tend to cite them (especially Ohno 1972) as “the” claim, supposedly adopted generally soon thereafter, that noncoding DNA was totally unimportant and should be dismissed in favour of studying protein-coding sequences. This history of the field is totally incorrect, as I have attempted to show by referring directly to the primary literature from the relevant period of the 1970s and ’80s. (Notably, creationists seem to think the discussion of the topic began in the 1990s, which is the only way they can argue that “intelligent design” predicted function when “Darwinists” rejected it.)

I will be posting a series of excerpts from papers written in the 1970s soon. My point, as previously, is that up to and beyond Ohno’s proposed mechanism for generating “junk DNA”, the standard assumption, based on adaptationism (i.e., “Darwinism” under the proper definition), was that noncoding sequences were functional. The current view, that much or even most (but certainly not all) noncoding DNA is probably not adaptive for the organism, had to arise against the current of adaptationist assumptions and survived only because it is supported by far more evidence than the alternative.

For now I simply will give some indication of the literally immediate reaction to Ohno’s suggestion, recorded as a transcript of the discussion following his Rhein presentation and printed after the text of Ohno (1973). Here it is, beginning at the section relating to Ohno’s presentation:

EVANS: Dr. Ohno, I suppose the take-home message was that in your view certainly the highly repetitive DNA is junk and you said that lots of unique sequences are also junk. This point of view is now open to discussion.

YUNIS: I wonder if you really mean “junk”. You are equating non-translational and non-transcriptional DNA with junk. I agree that you must be right up to some extent, but I wonder whether you have ignored the proven polyploidization as a way of evolution.

OHNO: If there is any gene which is doing some good for your general well-being, you will suffer when you lose that gene. For this very reason a fraction of randomly sustained mutations of that locus would be deleterious. There is simply no way of having a useful gene without paying a certain price for the cost of natural selection. If, on the other hand, there is a gene which is totally irrelevant, you will lose that gene sooner or later, for natural selection would not police that gene.

YUNIS: We know that constitutive heterochromatin is rich in repetitive DNA and that satellite DNA spaces essential regions such as the centromere and nucleolar organizer. Isn’t this an important role?

OHNO: Yes, spacer is important in the same negative way as fibrinopeptides A and B. Only a short stretch of base sequence at its end would have to be conserved as a signal to be nicked by ribonuclease.

HENNIG: I feel one could accept to some extent both views. From all what is known so far we can conclude that probably the nucleotide sequence as such does not matter. Furthermore the actual amount of simple sequence DNA (within some limits) seems not to be important. But since this kind of DNA is there one has to correlate it to some kind of function. That means that either simply the presence of some portion of this material is essential for structural or functional reasons. Or one could imagine that this kind of DNA is a product of certain molecular mechanisms which as such are essential for the eukaryote genome. This, of course, should be some kind of multiplication mechanism. Such a mechanism may, for example, exist in order to keep ribosomal cistrons or 5S cistrons etc. by occasional multiplications alike. Such multiplication steps could, accidentally or not, include adjacent DNA sequences, thus producing simple sequence DNA. It is remarkable that nucleoli usually are associated with heterochromatin which contains simple sequence DNA. Also the histone genes in Drosophila seem to be associated with heterochromatin as Dr. Pardue has shown.

YUNIS: It is important to emphasize that, in general, there is a certain constancy in the amount and distribution of satellite DNA.

HENNIG: It is certainly true that there is a constancy in certain limits of simple sequence DNA. But these limits could be governed by simple mechanical requirements of the chromosomes, for example in segregation of the chromosomes. Extremely large chromosomes do have difficulties during cell division and thus an upper limit could be introduced by the size of chromosomes. I think the variability of the heterochromatic arm of the X chromosome of the Drosophila species, which we are studying, is a good example for a block of simple sequence DNA which seems not to be essential. Deletion of the long heterochromatic arm of the X in D. hydei has no obvious consequences for the flies. This could mean that there is some DNA which may not be necessary but is there and is kept. Of course, this deletion stock has not been tested for its success in a population competition with the wild type.

FORD: I think the word “junk” is a powerful word. The only thing I would seriously question is this assumption that perhaps just 10 units would be sufficient to act as a spacer when 100 or more were there. Do we really know enough to be sure of that point?

OHNO: If we argue that a given spacer can change its base sequence any way it likes, but that a length of it has to be conserved rigidly, deletions would become deleterious to spacer function. It follows that spacer sequences, too, contribute to the overall mutation load, and for this very reason, we cannot even afford to keep too many spacers.

FORD: I think it just wouldn’t be there unless it would do it. Something was a functional reason of some kind for it.

YUNIS: This is what I emphasized earlier, that this DNA must have a functional value since nothing is known so widespread and universal in nature that has proven useless.

FRACCARO: Well, there is an exception to that rule. A lot of us have permanent positions at the University but are considered by others (mainly by students) meaningless and of no utility whatsoever.

EVANS: Well with that very splendid comment I think we should now draw today’s discussion to a close. I should remind you, however, that we have at least a full hour available tomorrow to continue this discussion and would like to end by thanking all the speakers for their excellent presentations and the discussants for taking part in the discussion.

_________

Part of the Quotes of interest series.
_________

Ohno, S. 1972. So much “junk” DNA in our genome. In: Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Ohno, S. 1973. Evolutional reason for having so much junk DNA. In: Modern Aspects of Cytogenetics: Constitutive Heterochromatin in Man (ed. R.A. Pfeiffer), pp. 169-173. F.K. Schattauer Verlag, Stuttgart, Germany.


Junk DNA — the quotes of interest series.

To facilitate access to the series of posts on what has been said in the literature about noncoding DNA and its potential functions, I will maintain an updated list here.


MASHing junk DNA.

The Korean War lasted from June 1950 to July 1953. It served as the basis of the 1968 book MASH: A Novel About Three Army Doctors by Richard Hooker (pen name of former army surgeon H. Richard Hornberger), describing the experiences of surgeons at a Mobile Army Surgical Hospital during the war. This was adapted into a film (MASH) starring Donald Sutherland in 1970, which in turn inspired the TV series M*A*S*H starring Alan Alda.

The finale of the television series M*A*S*H remains one of the most-watched episodes in the history of the medium, with more than 100 million viewers. By that time, the series had spanned 251 episodes over 11 seasons from 1972 to 1983. The Korean War, by contrast, lasted three years.

What does this have to do with junk DNA?

As I have been attempting to show by referencing the scientific literature in a series of Quotes of interest, there was a diversity of views on the functional significance of noncoding sequences of all sorts before and after the terms “junk DNA” and “selfish DNA” were introduced. In fact, it is clear that the default view for many authors was adaptationist and therefore that noncoding elements must have a function by virtue of their very existence. Not knowing the function of noncoding DNA stimulated, rather than stifled, research in this area. Also, it is clear that Ohno (1972) had minimal impact at least up to the time when the selfish DNA hypothesis was proposed (1980), because at that time the assumption of function was still pervasive — that’s why selfish DNA was presented in the first place, as an alternative to strict adaptationism. Moreover, two other points must be borne in mind: 1) introns, pseudogenes, satellite DNA, and various kinds of transposable elements were not characterized, or indeed in some cases not even discovered, until at least the late 1970s, and 2) the tools necessary to study noncoding elements, most notably DNA sequencing, did not arise until the late 1970s and early 1980s.

In other words, the most common assumption at least up to 1980, made on the basis of Darwinian adaptationist logic, was that most or all noncoding DNA was functional, and the tools for testing functional (or non-functional) hypotheses about noncoding DNA were not available until the 1980s. What this indicates is that the only period during which one could possibly argue that noncoding DNA was dismissed when it should have been studied begins around 1980. As I have tried to demonstrate, there was intense interest in noncoding DNA, much of it related to searches for function, during the 1980s. (Note: I have not even begun to discuss the history of thoughts about possible functions, or at least effects, of total DNA content on the cell, which traces to the 1870s.) Nevertheless, in 1989 or 1990 there is a switch in the discussion to claiming that noncoding DNA had long been dismissed as inconsequential junk, first in scientific review papers, and then in science news stories beginning by 1994. We have been told ever since that scientists long neglected noncoding DNA. Creationists have expanded this to include the absurd and supremely illogical argument that “Darwinism” is at fault in this, even though Darwinian adaptationism was the basis for assuming that noncoding DNA must be functional from the time noncoding sequences were first identified.

To summarize, then, there was a period of about 9-10 years during which it is conceivable that noncoding DNA could have been ignored when it should have been studied as biologically relevant. The period of telling us that it was ignored but is now thought to be important, by contrast, has been going on for about 18 years and shows no signs of abating. Never mind the fact that there was no such period during which functions were not postulated and tested, or the fact that the originators of the ideas of junk DNA and selfish DNA unambiguously noted that a non-trivial portion of noncoding sequences would be functional for the organisms carrying them.

M*A*S*H was entertaining, thought-provoking, and served as a statement about the horrors of war in general. It is really just an interesting afterthought that it lasted nearly four times longer than the war on which it was based. The repetitive claim that scientists have long ignored noncoding DNA, on the other hand, serves no function and refers to a period of history that never actually happened.


Quotes of interest — satellite DNA.

Satellite DNA, also known as tandemly repeated DNA, represents a diverse class of highly repetitive elements consisting of clusters of short repeated sequences. The general category of satellite DNA is now divided into several subcategories according to the size of the individual repeats, though the specific classification scheme can vary among authors. Thus, one may read references to satellites (up to hundreds of base pairs per repeat), minisatellites (10-100 bp per repeat), and microsatellites (only a few bp per repeat).

The term “satellite” in the genetic sense was first coined by the Russian cytologist Sergius Navashin in 1912, initially in Russian (“sputnik”) and Latin (“satelles”), and only later translated to “satellite” (Battaglia 1999). This original usage referred to the morphology of a chromosome possessing a secondary constriction at a certain point along its length. The more familiar usage of “satellite” relates to a small band of DNA with a density different (usually lower, because of a high AT-content) from the bulk of the genomic DNA, and which becomes separated from the main band following CsCl centrifugation (Kit 1961; Sueoka 1961). Satellite DNA was discovered in the early 1960s as an artifact of genetic studies involving this technique of centrifugation.

Satellite DNAs are non-protein-coding, and these and other repetitive sequences should have been neglected according to standard renditions of the history of research on noncoding DNA. Does the scientific literature support this claim?

Before “junk DNA” (pre-1972):

A concept that is repugnant to us is that about half of the DNA of higher organisms is trivial or permanently inert (on an evolutionary time scale). Furthermore, at least some of the members of DNA families find expression as RNA. We therefore believe that the organization of DNA into families of related sequences will ultimately be found important to the phenotype. However, at present we can only speculate on the actual role of the repeated sequences.

Britten, R.J. and D.E. Kohne. 1968. Repeated sequences in DNA. Science 161: 529-540.

The existence of repeated sequences in higher organisms led us independently to consider models of gene regulation of the type we describe here. This model depends in part on the general presence of repeated DNA sequences. The model suggests a present-day function for these repeated DNA sequences in addition to their possible evolutionary role as the raw material for creation of novel producer gene sequences. The apparently universal occurrence of large quantities of sequence repetition in the genomes of higher organisms suggests strongly that they have an important current function.

Britten, R.J. and E.H. Davidson. 1969. Gene regulation for higher cells: a theory. Science 165: 349-357.

Although we have localized mouse satellite DNA in the centromeric heterochromatin, this localization does not establish a function for either satellite DNA or heterochromatin. It seems that this function is one which is necessary to the chromosome since the proportion of satellite DNA is maintained in established mouse cell lines even though the chromosomes have undergone other morphological change.

Pardue, M.L. and J.G. Gall. 1970. Chromosomal localization of mouse satellite DNA. Science 168: 1356-1358.

One of the potentially significant aspects of this approach is that it can discover the location of defined DNA sequences on the chromosomes and relate this to their functional distribution at interphase. Thus it seems clear, from the evidence of the enriched content of nucleoli and of centric regions of chromosomes, that these become associated in interphase. The respective functions of centromeres and satellite DNA in this phenomenon are not clear, but a mechanism which obviously coordinates the physical, and perhaps the functional, aspects of different chromosomes may rely to some extent on the chemical homology of the associated satellite DNA.

Jones, K.W. 1970. Chromosomal and nuclear location of mouse satellite DNA in individual cells. Nature 225: 912-915.

Recent reports indicate that the DNA of constitutive heterochromatin is composed to a large extent of short repeated polynucleotide sequences, termed satellite DNA. This discovery has necessitated a critical review of current ideas concerning the origin and function of this portion of the genome of higher organisms. A careful appraisal of the information that has accumulated about heterochromatin since the time of Heitz and on satellite DNA during the last decade suggests that these entities have vital structural functions: they maintain nuclear organization, protect vital regions of the genome, serve as an early pairing mechanism in meiosis, and aid in speciation.

With the assumption that a portion that comprises some 10 percent of the genomes in higher organisms cannot be without a raison d’être, an extensive review led us to conclude that a certain amount of constitutive heterochromatin is essential in multicellular organisms at two levels of organization, chromosomal and nuclear. At the chromosomal level, constitutive heterochromatin is present around vital areas within the chromosomes. Around the centromeres, for example, heterochromatin is believed to confer protection and strength to the centromeric chromatin. Around secondary constrictions, heterochromatic blocks may ensure against evolutionary change of ribosomal cistrons by decreasing the frequency of crossing-over in these cistrons in meiosis and absorbing the effects of mutagenic agents. During meiosis heterochromatin may aid in the initial alignment of chromosomes prior to synapsis and may facilitate speciation by allowing chromosomal rearrangement and providing, through the species specificity of its DNA, barriers against cross-fertilization.

At the nuclear level of organization, constitutive heterochromatin may help maintain the proper spatial relationships necessary for the efficient operation of the cell through the stages of mitosis and meiosis. In the unicellular procaryotes, the presence of a small amount of genetic information in one chromosome obviates the need for constitutive heterochromatin and a nuclear membrane. At higher levels of organization, with an increase in the size of the genome and with evolution of cellular and sexual differentiation, the need for compartmentalization and structural components in the nucleus became imminent. The portion of the genome that was concerned with synthesis of ribosomal RNA was enlarged and localized in specific chromosomes, and the centromere became part of each chromosome when the mitotic spindle was developed in evolution. Concomitant with these changes in the genome, repetitive sequences in the form of constitutive heterochromatin appeared, probably as a result of large-scale duplication. The repetitive DNA’s were kept through natural selection because of their importance in preserving these vital regions and in maintaining the structural and functional integrity of the nucleus.

The association of satellite (or highly repetitive) DNA with constitutive heterochromatin is understandable, since it stresses the importance of the structural rather than transcriptional roles of these entities. Nuclear satellite DNA’s have one property in common despite their species specificity, namely heterochromatization. In this sense the apparent species specificity of satellite DNA may be the result of natural selection for duplicated short polynucleotide segments that are nontranscriptional and can be utilized in specific structural roles.

Yunis, J.J. and W.G. Yasmineh. 1971. Heterochromatin, satellite DNA, and cell function. Science 174: 1200-1209.

After “junk DNA” (1972-1980):

It has recently become possible to measure the interspersion of repetitive and single-copy DNA sequences and to estimate the length of the interspersed sequence elements. Interspersion of repetitive and non-repetitive sequences appears to be a general, if not universal, property of higher organism DNA. Similarities in the lengths of the different classes of sequence are present in the two species for which measurements are available.

These patterns are very likely of functional significance. It is our purpose in this section to focus on the evidence which, in our judgment, leads toward understanding the functional organization of the genome. We do not intend to review the entire subject of DNA sequence organization, and, for example, we only touch on the large literature dealing with satellite DNAs.

In concluding, we return to the question of the organization of DNA sequences. Our approach to gene regulation implies that the location of repetitive sequences provides the hereditary physical basis for the patterns of gene regulation. From this viewpoint, perhaps the most direct and crucial approach to the mechanism of gene regulation in higher organisms now available is the study of DNA sequence organization. More generally, an argument can be made that whether or not this particular model of gene regulation contains some elements of reality, the placement of sequences in the genome is bound to play a basic and significant role. Among the criteria of usefulness for models of gene regulation, therefore, is the extent to which they specify the structural and functional properties of DNA sequence organization. The present state of our technology, in particular of nucleic acid reassociation technology, suggests that the tools are now in hand to unravel the patterns of DNA sequence organization and their functional meaning.

Davidson, E.H. and R.J. Britten. 1973. Organization, transcription, and regulation in the animal genome. Quarterly Review of Biology 48: 565-613.

The DNA of eukaryotic organisms contains serially repeated sequences which vary in amount and complexity from one species to the next. Some of these sequences differ from the bulk of the DNA in G + C content, and hence appear as “satellites” when the DNA is banded in a CsCl density gradient. Several satellite DNAs, such as those of the mouse, the fly, Rhynchosciara and several species of Drosophila have been shown by in situ RNA-DNA hybridization to be located in the centromeric heterochromatin. However, very little is known about the function of satellite DNAs. There is no evidence that they code for proteins, and it is unlikely that they are even transcribed within the cell.

Because of their simple sequences the satellites of D. virilis obviously have no coding function for ordinary proteins. This conclusion is in keeping with the fact, known since the 1920’s, that the heterochromatin of Drosophila contains only a very few genes. Also because of their simple structure, and especially because they are not located in the genetic part of the chromosome, the satellites are poor candidates for regulatory genes. It is difficult to postulate any generalized function for the satellites, necessary for all cells of the organism, since the amount of satellite DNA is reduced so drastically in the polytene tissue. Similarly, there is evidence from D. melanogaster that large segments of the heterochromatin can be deleted without adverse effects either on viability or on the normal mitotic behavior of the chromosomes. Indeed the major known effect of deletion of heterochromatin, as in the sc4L scaR chromosome, is disturbance of meiotic disjunction. If the satellite DNAs have any function, it would seem to lie in the rather ill-defined category of “chromosome mechanics”, possibly including chromosome folding, meiotic pairing, or disjunction. One could even speculate that the major role is an evolutionary one, permitting only chromosomes of closely related populations to pair in meiosis, or to be involved in interchromosome exchange of the sort seen regularly in “Robertsonian fusions”.

Gall, J.G. and D.D. Atherton. 1974. Satellite DNA sequences in Drosophila virilis. Journal of Molecular Biology 85: 633-664.

An increasing proportion of the mysteriously abundant DNA of higher organisms is becoming easier to comprehend, in general terms at least. Variations in nuclear DNA content among organisms are being correlated with specific types of non-genic DNA. Several levels of apparent “bureaucracy” in the genome are becoming defined: (1) unique sequences including structural genes and other specific sequences occurring in one or two copies per genome, (2) repeated genes in a few special instances requiring high output of gene products, (3) moderately repetitive DNA sequences that are interspersed in several patterns with DNA of levels (1) and (2) and that may be involved in regulation of gene expression, and (4) highly repetitive and satellite DNA sequences, which are variable in quantity, located in massive tandem arrays, and are organized into condensed forms of chromatin. The present report has dealt with the fourth level of the hierarchy and has described its involvement in the determination of the macrostructure of chromosomes and the genome as a whole. This fourth level appears to exert the most global form of control through playing roles in adaptation to the environment and in the evolution of new species. The term “chromosome-engineering DNA” seems to express appropriately the mode of action of highly reiterated, simple sequence DNA.

Hatch, F.T., A.J. Bodner, J.A. Mazrimas, and D.H. Moore. 1976. Satellite DNA and cytogenetic evolution: DNA quantity, satellite DNA and karyotypic variations in kangaroo rats (Genus Dipodomys). Chromosoma 58: 155-168.

Proposed functions for satellite DNA were evaluated and formally set forth by Walker (1971) and have since been expanded by Mazrimas and Hatch (1972), Lagowski et al. (1973), Lee (1975), Bostock (1971), Walker (1972), and Comings (1972). In a masterful summary and evaluation of current ideas relating repeated DNA to the organization of the eukaryote chromosome (Cold Spring Harbour Symposia on Quantitative Biology 1973) Swift stated that the function of simple sequence DNAs not only appeared to have most investigators mystified, but that the present theories concerning their function were not accepted with much enthusiasm. He did, however, point out that “There is one major hope for making sense of the fact that many higher organisms seem to carry in every nucleus a large portion of their DNA that looks superficially to be completely worthless. This lies in the comparative approach. When do simple sequence DNAs arise in evolution? Can we find two closely related species one with and one without a major block of heterochromatin?”.

In summary, we believe that the Atractomorpha results focus attention on aspects of repeated DNA which are quite different to previously postulated functions. We argue that a large proportion of the highly repeated localised DNA as well as some of the repeated interspersed DNA acts in regularizing recombinational frequency and position. Thus if repeated DNA really does play a role in homologue recognition and chromosome pairing, it is now clear that only a minimum amount functions in this way, and this minimum amount need not be expressed as visible heterochromatin (as in A. australis).

Miklos, G.L.G. and R.N. Nankivell. 1976. Telomeric satellite DNA functions in regulating recombination. Chromosoma 56: 143-167.

Although much discussion has centred on the possible functions of satellite DNA (Edelman and Gally, 1970; Kohne, 1970; Walker, 1971a, b; Bostock, 1971; Yunis and Yasmineh, 1971; Comings, 1972; Rae, 1972; Jones, 1973; Hennig, 1973; Swift, 1973; Southern, 1974; Tartof, 1975; Hsu, 1975; Hatch et al., 1976) the major problem in evaluating function has been a lack of direct experimental manipulation of the satellite DNA content of any chromosome. Of the postulated functions, the more common ones would assign a role for satellite DNA in determining centromere strength (Walker, 1971a, b), aspects of chromosome pairing for regular segregation of homologs (Yunis and Yasmineh, 1971), involvement in the processes of speciation (Hatch et al., 1976), and alterations in the recombination system (Miklos and Nankivell, 1976).

The most important aspect of satellite DNA remains the nature of its functions. Although a large body of data has been gathered concerning its structure, distribution and properties in several different organisms, most of these results have in fact neither supported nor disproved any one of the particular hypotheses of function (see Comings, 1972; Swift, 1973; Hsu, 1975; Miklos and Nankivell, 1976; for evaluations of functions). The most popular hypothesis on satellite DNA function has been, and still is, that satellite DNA is involved in some aspect of chromosome mechanics such as chromosome pairing.

Yamamoto, M. and G.L.G. Miklos. 1978. Genetic studies on heterochromatin in Drosophila melanogaster and their implications for the functions of satellite DNA. Chromosoma 66: 71-98.

Satellites constitute from 1% to 65% of the total DNA of numerous organisms, including that of animals, plants, and prokaryotes. Their existence has been known for about 15 years, but, although it is thought that they must be biologically important, with few exceptions … their functions are still largely in the realm of speculation. This remains true despite their ubiquity and, except for polytenized tissues, their constancy as a fraction of the total DNA in all tissues of the particular animal or plant species in which they are observed.

The molecular diversity of this group of DNA’s, all taken together and classified as “satellites,” may be reflected in each satellite (or possibly groups of satellites) having a distinct function. This belief is based in part on the fact that there are many exceptions to nearly every generalization that has been made about satellite DNA’s.

Skinner, D.M. 1977. Satellite DNA’s. BioScience 27: 790-796.

The idea that the coordinate regulatory system of animal genomes is encoded in networks of repetitive sequence relationships is now a decade old. We and others have developed the concept that genes could be regulated by specific interactions occurring at repetitive sequences in the DNA genome. The premises have been (i) that the differentiated properties of animal cells derive from diverse and specific cytoplasmic messenger RNA (mRNA) sequence sets and (ii) that the cell-specific populations of mRNA’s result from cell-specific patterns of structural gene transcription.

Davidson, E.H. and R.J. Britten. 1979. Regulation of gene expression: possible role of repetitive sequences. Science 204: 1052-1059.

Evolutionary conservation of W [sex chromosomal] satellite DNA strongly suggests that functional constraints may have limited sequence divergence.

Singh, L., I.F. Purdom, and K.W. Jones. 1980. Sex chromosome associated satellite DNA: evolution and conservation. Chromosoma 79: 137-157.

Since the discovery that satellite DNA is located in heterochromatin, its possible role in mediating various heterochromatic functions has been the subject of both controversy and other reviews. Heterochromatin shows many very well defined functions in such diverse processes as chromosome pairing and segregation, position effect variegation, chromosome rearrangements, speciation, and recombination. All of these functions have been analyzed in great detail either genetically or cytogenetically, but in no case have the specific DNA sequences responsible for these phenomena been determined. Long tandem arrays that can change rapidly in evolution both qualitatively and quantitatively could act to disrupt normal chromosome behavior. However, the question remains whether such simple tandem arrays have an important positive contribution toward any of the functions attributed to heterochromatin.

With the application of recombinant DNA technology to such highly repeated sequences we now have the tools to characterize genetically altered states of heterochromatin with sufficient precision as to answer these questions.

Brutlag, D.L. 1980. Molecular arrangement and evolution of heterochromatic DNA. Annual Review of Genetics 14: 121-144.

After “selfish DNA”, the decade during which noncoding DNA supposedly was ignored (1980-1989):

The foregoing data support the concept that the so-called “junk” or genetically inactive DNA centered around the centromeric region has a function in controlling the separation of centromere (or its replication into two daughter centromeres) at the junction of metaphase-anaphase in mitosis.

Vig, B.K. 1982. Sequence of centromere separation: role of centromeric heterochromatin. Genetics 102: 795-806.

Satellite DNAs were first discovered over twenty years ago as species of DNA which, due to their unusual base composition, band at densities distinct from bulk DNA upon equilibrium sedimentation (Kit, 1961). Subsequently, it was shown that these DNAs are highly repetitious, that they are arranged in long tandem arrays, and that they are localized typically in pericentric or telocentric heterochromatin. Many of these DNAs, including mouse satellite DNA, have been sequenced. Despite detailed knowledge of the structure and location of satellite DNAs, their potential function(s) have only been hypothesized. These range from none (i.e., selfish DNA) to roles in many events including enhanced or reduced recombination, spindle attachment, gene amplification, chromosome pairing and/or segregation. Unfortunately, most of these hypotheses do not readily lend themselves to experimental investigation.

One major conclusion from the work described is that the association of kinetochores with centromeric regions of mouse chromosomes is not simply due to the presence of mouse satellite DNA sequences. However, mouse satellite DNA does appear to play a crucial role in the maintenance of contact between sister chromatids during metaphase.

Lica, L.M., S. Narayanswami, and B.A. Hamkalo. 1986. Mouse satellite DNA, centromere structure, and sister chromatid pairing. Journal of Cell Biology 103: 1145-1151.

Repetitive DNA evolves more rapidly than other genomic regions. Still, long regions of homology can be found between satellites from closely related species. Statistically significant homologies can even be found between satellites from species very distantly related as the Drosophila and Bovine satellites or between animal and plant species. Whether such homologies have any functional significance, is not known.

The interpretation of these homologies can be addressed with respect to two different theories concerning the function of repeated DNA. The striking coincidence between the size of these repeat units and the mononucleosome DNA length suggests that these repeats have a role in determining chromatin structure. In fact, a sequence-dependent phasing of nucleosomes along repetitive DNA has been found in a mouse satellite DNA and in the African green monkey satellite. This could explain the homologies found between these repeats at the sequence level and also the striking conservation of their size. On the other hand, if this DNA is functionless as suggested by some authors, the homologies found could be a consequence of a common origin for many tandemly repeated families. They could have arisen from conserved genomic sequences by independent amplification events. For example, several families of interspersed repetitive sequences found in animal species are known to derive from different tRNA genes by independent amplification events. Thus, the conservation of size could be explained if, for example, nucleosomes have a role in determining the size of the sequence to be amplified.

No experimental approach to the study of the functional significance of these sequences is readily apparent at present. However, Arabidopsis, with its small genome and simple pattern of repeated DNA may eventually be a useful system for the study of these ubiquitous components of the higher eukaryotic genome.

Martinez-Zapater, J.M., M.A. Estelle, and C.R. Somerville. 1986. A highly repeated DNA sequence in Arabidopsis thaliana. Molecular and General Genetics 204: 417-423.

Tandemly repeated DNA families have long attracted considerable attention from genome-watchers, ever since satellite DNAs were originally isolated, over 20 years ago, as subsets of genomic DNA that were separable from the bulk of DNA by isopycnic centrifugation.

Willard, H.F. and J.S. Waye. 1987. Hierarchical order in chromosome-specific human alpha satellite DNA. Trends in Genetics 3: 192-198.

The species specificity of satellite profiles has long been interpreted as evidence for evolutionary instability of this class of DNA. In turn, this has led to the notion that either satellite DNAs have no function and are simply excess DNA, or that any function would be of a general nature involving chromosome condensation, pairing or recombination.

Lohe, A.R. and D.L. Brutlag. 1987. Identical satellite DNA sequences in sibling species of Drosophila. Journal of Molecular Biology 194: 161-170.

A highly conserved repetitive DNA sequence, (TTAGGG)n, has been isolated from a human recombinant repetitive DNA library. Quantitative hybridization to chromosomes sorted by flow cytometry indicates that comparable amounts of this sequence are present on each human chromosome. Both fluorescent in situ hybridization and BAL-31 nuclease digestion experiments reveal major clusters of this sequence at the telomeres of all human chromosomes. The evolutionary conservation of this DNA sequence, its terminal chromosomal location in a variety of higher eukaryotes (regardless of chromosome number or chromosome length), and its similarity to functional telomeres isolated from lower eukaryotes suggest that this sequence is a functional human telomere.

Moyzis, R.K., J.M. Buckingham, L.S. Cram, M. Dani, L.L. Deaven, M.D. Jones, J. Meyne, R.L. Ratliff, and J.-R. Wu. 1988. A highly conserved repetitive DNA sequence, (TTAGGG)n, present in the telomeres of human chromosomes. Proceedings of the National Academy of Sciences of the USA 85: 6622-6626.

The chromosomes of most mammalian species contain centromeric domains which comprise repetitive DNA sequences. Most of these domains contain blocks of simple sequence DNA families, the properties of which give rise to the characteristic C-band patterns present in mammalian chromosomes. More than one simple sequence DNA family can occupy the same centromeric domain. The biological role of these sequences in the function of an active centromere is unknown; however, one of the simple sequence DNAs in the mouse genome can bind microtubule spindle fibers, which may imply an active role for these particular DNA sequences. Recently, we have shown that one member of the human alphoid family of DNA sequences is physically closer to the functional kinetochore within the centromeric domain of human chromosome 9 than are members of the simple sequence DNA family termed satellite III.

Joseph, A., A.R. Mitchell, and O.J. Miller. 1989. The organization of the mouse satellite DNA at centromeres. Experimental Cell Research 183: 494-500.

“Noncoding DNA has been ignored until recently” (beginning around 1989-1990):

The prevailing view that satellite DNA is mostly ‘junk’ whose presence or absence has no bearing on the fitness of its carriers, has been widely accepted. Most of the support for this came from interspecific comparisons. By adding extra heterochromatic materials or by deleting nearby essential (ribosomal RNA) genes, previous studies only addressed the issue indirectly. We have provided the first direct test of this hypothesis by comparing the fitnesses of Drosophila with, and without, a well characterized array of satellite repeats. A fitness effect is clearly detectable. The observed effect is also inconsistent with the view that the functions of satellite DNA, if any, must be in the germ cells.

It is far fetched to think that all satellite DNAs have a useful role, but it is equally unwise to label them universally as junk in the absence of any other direct proof.

Wu, C.-I., J.R. True, and N. Johnson. 1989. Fitness reduction associated with the deletion of a satellite DNA array. Nature 341: 248-251.

The centromere is the major cis-acting genetic locus involved in chromosome segregation in mitosis and meiosis. The mammalian centromere is characterized by large amounts of tandemly repeated satellite DNA and by a number of specific centromere proteins, at least one of which has been shown to interact directly with centromeric satellite DNA sequences. Although direct functional assays of chromosome segregation are still lacking, the data are most consistent with a structural and possibly functional role for satellite DNA in the mammalian centromere.

As a necessary first step in the identification and characterization of DNA at mammalian centromeres, one approach has been to focus on the structure and organization of the DNA from the primary constriction. Although it has been recognized for over 20 years that the centromeric heterochromatin in chromosomes from virtually all complex eukaryotic organisms consists of various families of satellite DNA, they have only recently been taken seriously as candidates for something other than ‘junk’ DNA or genomic ‘flotsam and jetsam’ (Miklos 1985). Satellite DNA families in different mammalian orders (e.g. rodents and primates) appear largely unrelated in terms of their actual sequences; however, similarities in their overall chromosomal organization and in specific short sequences implicated in centromere protein recognition may offer enticing clues to the potential involvement of at least some satellite DNAs in centromere structure and/or function.

Willard, H.F. 1990. Centromeres of mammalian chromosomes. Trends in Genetics 6: 410-416.

____________

Part of the Quotes of interest series.
____________

Other citations

Battaglia, E. 1999. The chromosome satellite (Navashin’s “sputnik” or satelles): a terminological comment. Acta Biologica Cracoviensia, Series Botanica 41: 15-18.

Kit, S. 1961. Equilibrium sedimentation in density gradients of DNA preparations from animal tissues. Journal of Molecular Biology 3: 711-716.

Sueoka, N. 1961. Variation and heterogeneity of base composition of deoxyribonucleic acids: a compilation of old and new data. Journal of Molecular Biology 3: 31-40.

Quotes of interest — science news stories.

We have been told in science news stories since the early 1990s that biologists long neglected the potential significance of noncoding DNA. (Sadly, this is in line with the claims of creationists, who hold that “Darwinism” is to blame despite the obvious fact that Darwinian adaptationism would expect functions. Some biologists likewise play up the notion that we have ignored noncoding sequences and just now are coming to appreciate them, thanks, no doubt, to their own revolutionary insights, but again, this ignores a diverse literature on the topic spanning the rise of the tools necessary for such work up to the present.) But what about the science stories that were actually written during the period when noncoding DNA was supposedly dismissed as uninteresting (i.e. 1980 to the early 1990s)?

If you had a subscription to Science in the 1980s, you would have read stories like these by their science writer Roger Lewin:

Lewin, R. 1981. Evolutionary history written in globin genes. Science 214: 426-427.

Even though the human β-globin complex contains a relatively large number of active genes, 95 percent of the locus is made up of DNA that does not code for proteins. What is the role of this extra DNA, if any? The pseudogenes constitute just a small proportion of the region, although more pseudogenes might exist. Some of the DNA is made up of representatives of well-known families of repetitive sequences. And the remainder is DNA of no known function or comparable sequence.
“We wanted to test the hypothesis that this extra DNA is ‘junk DNA,’” says Jeffreys, “so we compared the β loci in humans, gorillas, and baboons.” Jeffreys and his colleagues reasoned that if it were junk DNA, then over the 20 to 40 million years of evolution represented by humans, apes, and Old World monkeys both the sequence and the overall quantity of intergenic DNA could be expected to vary. “It turned out that the cluster is remarkably stable,” reports Jeffreys. “The overall pattern and size of the cluster is the same, and the rate of nucleotide substitutions is one-quarter to one-fifth of what would be expected in functionless DNA”. The noncoding DNA therefore appears not to be junk, but what function it might perform is still a mystery.

Lewin, R. 1982. Repeated DNA still in search of a function. Science 217: 621-623.
[Reporting about an NIH International Workshop in Highly Repeated DNA July, 1982]

Interest in repetitive DNA sequences goes back many years but, as with many aspects of molecular biology, the advent of recombinant DNA technology and DNA sequencing now permits previously unmatched scrutiny of the structures of interest.

If mobility is a reality, and most agree that it probably is, then it seems likely that at least some members of repeat families will have important effects in the genome, even if they have no formal function. Enhancing recombination and altering rates of gene expression are obvious possibilities, while the initiation of new species is a more recondite proposal.

The truth is, however, that the functions of the large and motley collection of repeated DNA families are proving particularly resistant to elucidation. Putative functions are many, including, variously, involvement in chromosome pairing, control of gene expression, processing of messenger RNA precursors, and participation in DNA replication. So far none has been established, save for the single exception of a small family that gives rise to 7S RNA, a molecule that recently was serendipitously discovered to be an essential component of a particle that mediates the secretion of proteins from cells.

Some repetitive DNA will undoubtedly be shown to have a function, in the formal sense; some will likely be shown to exert important effects; and the remainder may well have no function or effect at all and can therefore be called selfish DNA. Repetitive DNA constitutes a substantial proportion of the genome (up to 90 percent in some cases), and there is considerable speculation on how it will eventually be divided between these three groups. Current bets would put a small fraction in the function category, with distribution of the rest rising steeply through the effect and selfish categories.

Satellite DNA unquestionably is a puzzle. What determines the number of copies in a repeat family? And how does the genome tolerate so much of it? Perhaps, as Singer has recently promulgated, just a small fraction of the satellite sequences is essential to some genomic function while the remainder is harmless surplus. This, she indicates, is a comfortable middle ground between the extreme selfish DNA position, which sees no function in all this “junk DNA,” and the adaptationist position, which looks for functions in every structure. The same questions and speculations can be applied to dispersed repetitive DNA.

One observation that might be taken as evidence of function in repeated sequences is the frequency of transcription into RNA. A significant proportion of nuclear RNA contains transcripts of repeated sequences, although 90 percent of this is lost in RNA processing and exit to the cytoplasm. Davidson and his colleagues have shown that in sea urchin the spectrum of repeat families that are transcribed changes during development, an appealing argument for some regulatory function. Most intriguing, however, is the discovery that only a small proportion of any repeat family is ever transcribed. “Most members appear to be quiescent, which must make you cautious when isolating samples in search of their function.”

It is clear that, from their abundance, their unusual structure, and their frequent transcription, dispersed repetitive DNA families cannot be ignored. But it is equally clear that for the most part they, like their tandemly repeated relatives, remain a phenomenon in search of a function.


Lewin, R. 1982. Adaptation can be a problem for evolutionists. Science 216: 1212-1213.

Molecular biology of recent years has revealed many new and intriguing categories of DNA, some of which appear to have no role. One explanation of this has been that the nonaptive sequences provide raw material for future evolution. But the logic of natural selection does not allow for selection for future use. More likely is that the accumulation of nonaptive DNA is a consequence of the innate property of repeated sequences of nucleic acid to replicate and move around the genome. Later it may be recruited to perform some role, in which case it becomes an exaptation.

Lewin, R. 1983. A naturalist of the genome. Science 222: 402-405.

Some mobile elements are large and complex, measuring as much as 10,000 nucleotides in length and carrying many genes, while others are simple sections of repeated DNA just a few hundred nucleotides long. Some people would classify all such elements as “junk” or “parasitic” DNA. Others strongly demur and insist that, for instance, although there is yet to be found any convincing evidence for the involvement of a limited class of elements in development in organisms other than maize, the possibility should by no means be dismissed. In any case it is clear that the mobility of certain genetic elements is essential in the generation of the huge diversity of antibodies in vertebrates and in the production of different antigenic coats in certain parasites. Jumping genes clearly represent a potentially rich source of mutation. In addition, an evolutionary link between mobile elements and retroviruses now seems incontrovertible, as does a causal relationship with certain cancers.

Lewin, R. 1985. More progress in messenger RNA splicing. Science 228: 977.

This summer marks 8 years since eukaryotic genes were first discovered to be interrupted by noncoding sequences, known variously as intervening sequences or introns. The discovery raised two sets of questions. The first concerns the origin and function - if any - of introns, which, by its very nature, is a very difficult question to test and therefore remains somewhat in the realms of speculation, although significant insights are being made. The second focuses on the mechanics of removal of these sequences in the production of mature RNA molecules, and in principle should be experimentally more tractable. The immense effort directed at this second question has produced during the past 8 years some conventional biochemistry, some novel and surprising nucleic acid chemistry, and a great deal of frustration.

Lewin, R. 1986. “Computer genome” is full of junk DNA. Science 232: 577-578.

Many biologists were unhappy with the idea that much of the DNA might have no function, says Loomis. “There is a very strong feeling that if a molecule, or any kind of biological structure, exists, then it must be serving some kind of selectively advantageous purpose. I disagree with this viewpoint very strongly.” Loomis prefers to turn the question around. “We should ask, ‘what is the selective advantage of getting rid of a particular structure?’ This is not common thinking.”

It is of course very difficult to prove that a structure or a sequence of DNA has no function. “People will always say, ah, but you haven’t looked under the right conditions,” says Loomis. In the case of multigene families, the best data come from mutation experiments.

Lewin, R. 1988. Chance and repetition. Science 240: 603.

With some kind of concerted effort to map and sequence the entire human genome now appearing to be inevitable, there will be much excitement at the prospect of discovering what is encoded in the 3-billion-base “message”. There are certain to be some surprises, perhaps even equivalent in magnitude to the discovery a decade ago of long, noncoding sequences that interrupt the great majority of eukaryotic genes. But there are many biologists who expect large parts of the genome to be devoid of any function at all: “We face the prospect of trudging through huge tracts of junk DNA,” remarked British molecular biologist Sydney Brenner during one of the many recent panel discussions on the project.

At least some proportion of the DNA in the genomes of most organisms is in the form of these so-called middle repetitive sequences, ranging from 3% to as much as 70%: typically, the bigger the genome, the more repetitive DNA. There is a long tradition in biology that, seeing structures as extensive as these, argues that there must be a functional explanation for them.

Biologists have long speculated about the function of middle repetitive sequences, with regulation of gene expression being one popular notion. Loomis and Gilpin’s perspective, however, is that, although some middle repetitive sequences may have acquired a function once they have formed, there is no need to invoke function as a selective pressure for their origin.

____________

Part of the Quotes of interest series.


Quotes of interest — 1980s edition (part two).

This is the second installment in the quotes of interest series that focuses in particular on research and discussions from the 1980s, when noncoding DNA supposedly was ignored as irrelevant. The important message being offered is that there was plenty of research into possible functions or lack thereof in noncoding sequences of all types, and that whatever authors concluded was based on the evidence available at the time, not ideology. This includes the parallel development of neutral theory, many proponents of which did conclude that pseudogenes were nonfunctional on the basis of their high mutation rates compared with coding sequences. Again, the point is not that no one argued against function (I argue against function at the organism level for most noncoding DNA), but that this is based on evidence, not unsupported assumption.

Members of the Alu family of interspersed repeated sequences and its rodent equivalents may be the normal cellular DNA replication initiation sites. In mammalian cells DNA replication proceeds bidirectionally simultaneously from many sites, and thus the initiation sites for replication might be expected to be interspersed repeated sequences with two-fold rotational symmetry. The inverted repeated examples of the Alu family of interspersed repeated sequences and their Chinese hamster equivalents show these attributes. These considerations raise the question of whether the transcription of these repeated sequences by RNA polymerase III, or the interaction of these sequences with the low molecular weight RNA, or both, may play a role in the initiation of DNA replication.

Jelinek, W.R., T.P. Toomey, L. Leinwand, C.H. Duncan, P.A. Biro, P.V. Choudary, S.M. Weissman, C.M. Rubin, C.M. Houck, P.L. Deininger, and C.W. Schmid. 1980. Ubiquitous, interspersed repeated sequences in mammalian genomes. Proceedings of the National Academy of Sciences of the USA 77: 1398-1402.

We have assigned six members of the human β-actin multigene family to specific human chromosomes. The functional gene, ACTB, is located on human chromosome 7, and the other assigned β-actin-related sequences are dispersed over at least four different chromosomes including one locus assigned to the X chromosome. Using intervening sequence probes, we showed that the functional gene is single copy and that all of the other β-actin related sequences are recently generated in evolution and are probably processed pseudogenes. The entire nucleotide sequence of the functional gene has been determined and is identical to cDNA clones in the coding and 5′ untranslated regions. We have previously reported that the 3′ untranslated region is well conserved between humans and rats (Ponte et al., Nucleic Acids Res. 12:1687-1696, 1984). Now we report that four additional noncoding regions are evolutionarily conserved, including segments of the 5′ flanking region, 5′ untranslated region, and, surprisingly, intervening sequences I and III. These conserved sequences, especially those found in the introns, suggest a role for internal sequences in the regulation of β-actin gene expression.

Our finding of highly conserved blocks of nucleotides in two of the five intervening sequences of β-actin genes raises the possibility that these segments have regulatory functions. Conserved internal regions have been reported previously, such as the internal transcriptional enhancer regions of immunoglobulin genes. However, the locations of these enhancers were initially regarded as a peculiarity of the immunoglobulin gene loci. More recently, internal control regions have been detected (but yet unidentified) for the adenovirus E1A gene, human globin genes, and chicken thymidine kinase gene. Any conclusion that the conserved β-actin intron sequences, especially those of IVS I, function as transcriptional enhancers must await direct experimentation. Nevertheless the evolutionary conservation of the immunoglobulin enhancer segments indicates that other transcriptional enhancers or cis-acting regulatory signals would be under selective pressure. It is interesting to note in this regard that the IVS I of both α- and β-globin genes are the most conserved introns of these genes. The IVS I of the human and mouse β-globin genes, for example, has 81 base pairs matching to give a KN(1) value of 0.302. Therefore these introns may well contain part of the proposed downstream regulatory elements.

Ng, S.-Y., P. Gunning, R. Eddy, P. Ponte, J. Leavitt, T. Shows, and L. Kedes. 1985. Evolution of the functional human β-actin gene and its multi-pseudogene family: conservation of noncoding regions and chromosomal dispersion of pseudogenes. Molecular and Cellular Biology 5: 2720-2732.

Although the presence and similar location of pseudogenes in all the mammalian globin gene clusters suggest that pseudogenes may have some as yet unidentified function, the simplest explanation for their existence is that they are the natural consequence of the mechanisms of gene amplification and sequence divergence. The arrangement of genes within the human α-globin gene cluster is consistent with this possibility.

Proudfoot, N.J. and T. Maniatis. 1980. The structure of a human α-globin pseudogene and its relationship to α-globin gene duplication. Cell 21: 537-544.

In summary, the structural analysis of a number of different globin gene clusters suggests that globin gene families are in evolutionary flux. Perhaps pseudogenes are simply a natural consequence of the mechanisms by which multigene families evolve.

Lacy, E. and T. Maniatis. 1980. The nucleotide sequences of a rabbit β-globin pseudogene. Cell 21: 545-553.

Particularly surprising are the intron-exon splice borders of the H3.3 gene. Not only do they contain the standard splice consensus sequences, but in all cases the introns are flanked by 7-8 base pair direct repeats. The function, if any, of these repeats is unclear, since the repeats include both intron and exon bases. One functional difference between these introns can be inferred from the structures of the previously isolated cDNAs. Three of the cDNAs were shown to contain an unspliced intron, but did not carry introns 2 and 3. This could reflect the preferential splicing out of introns 2 and 3 before the splicing out of intron 1. If there is a tendency toward 5′ to 3′ splicing, the unusual splice junctions seen for the H3.3 gene could act to supersede this tendency. The advantage to the organism to remove intron 1 last is unclear but could point to some as yet undetermined function for this intron. In support of this, we have found that a DNA probe derived from intron 1 hybridizes to a single fragment in a Southern blot of total mouse genomic DNA indicating that the sequences in this intron may be conserved, whereas a DNA probe derived from intron 2 does not hybridize.

Wells, D., D. Hoffman, and L. Kedes. 1987. Unusual structure, evolutionary conservation of non-coding sequences and numerous pseudogenes characterize the human H3.3 histone multigene family. Nucleic Acids Research 15: 2871-2889.

A mouse α-globin-related pseudogene (ψα30.5) completely lacks intervening sequences, and could not code for a functional globin polypeptide because of frameshifts. The widespread occurrence of globin pseudogenes in other species suggests that they are not ‘dead’ genes but may be important in controlling globin expression.

The general hypothesis that pseudogenes control the productive genes in some fashion, nevertheless, remains attractive and we are investigating the hypothesis further, including tests in non-erythroid tissues. Certainly, the widespread occurrence of globin pseudogenes argues strongly for their functional importance.

Vanin, E.F., G.I. Goldberg, P.W. Tucker, and O. Smithies. 1980. A mouse α-globin-related pseudogene lacking intervening sequences. Nature 286: 222-226.

The foregoing data support the concept that the so-called “junk” or genetically inactive DNA centered around the centromeric region has a function in controlling the separation of centromere (or its replication into two daughter centromeres) at the junction of metaphase-anaphase in mitosis.

Vig, B.K. 1982. Sequence of centromere separation: role of centromeric heterochromatin. Genetics 102: 795-806.

A highly conserved repetitive DNA sequence, (TTAGGG)n, has been isolated from a human recombinant repetitive DNA library. Quantitative hybridization to chromosomes sorted by flow cytometry indicates that comparable amounts of this sequence are present on each human chromosome. Both fluorescent in situ hybridization and BAL-31 nuclease digestion experiments reveal major clusters of this sequence at the telomeres of all human chromosomes. The evolutionary conservation of this DNA sequence, its terminal chromosomal location in a variety of higher eukaryotes (regardless of chromosome number or chromosome length), and its similarity to functional telomeres isolated from lower eukaryotes suggest that this sequence is a functional human telomere.

The human genome contains a variety of DNA sequences present in multiple copies. These repetitive DNA sequences are thought to arise by many mechanisms, from direct sequence amplification to the unequal recombination of homologous DNA regions to the reverse flow of genetic information. While it is likely that some of these repetitive DNA sequences influence the structure and function of the human genome, little experimental evidence supports this idea at present. We reasoned, however, that evolutionary conservation of a particular repetitive DNA sequence family might imply that the sequence is essential to cellular function.

Moyzis, R.K., J.M. Buckingham, L.S. Cram, M. Dani, L.L. Deaven, M.D. Jones, J. Meyne, R.L. Ratliff, and J.-R. Wu. 1988. A highly conserved repetitive DNA sequence, (TTAGGG)n, present in the telomeres of human chromosomes. Proceedings of the National Academy of Sciences of the USA 85: 6622-6626.

____________

Part of the Quotes of interest series.


Quotes of interest — pseudogene.

The term “pseudogene” was coined by Jacq and colleagues in 1977. The standard tale of biologists dogmatically ignoring possible functions of noncoding DNA would have it that such a sequence would automatically have been dismissed as “junk” when discovered, especially since the notion of a degraded and now noncoding former gene matches Ohno’s concept of “junk DNA” as originally proposed. The reality is that Jacq et al. (1977) did consider whether the sequence had a function, but based on the available data they concluded that the best explanation was that it is “an evolutionary relic”. They did not cite Ohno.

Summary
The 5S DNA of Xenopus laevis, coding for oocyte-type 5S RNA, consists of many copies of a tandemly repeated unit of about 700 base pairs. Each unit contains a “pseudogene” in addition to the gene. The pseudogene has been partly sequenced and appears to be an almost perfect repeat of 101 residues of the gene. The order of components in the repeat unit is (5′) long spacer-gene-linker-pseudogene (3′) in the “+” strand (or H strand) of the DNA. The possible function of the pseudogene is discussed.

The functions of the different regions of the 5S DNA are only imperfectly understood. The gene region 1-121 codes for the mature oocyte 5S RNA, and the presence of a pppG sequence at residue 1 of the mature 5S RNA defines this residue as the point of initiation of transcription by RNA polymerase III (Roeder, 1976). The point of termination of transcription, however, is less clear. Brown and Brown (1976) have argued that the high A + T-rich sequence of residues 119-123 of the gene region is a signal for the termination of transcription. But low yields of a larger transcription product–about 135 residues long–have been isolated by Denis and Wegnez (1973) in pulse-labeling experiments in Xenopus laevis oocytes. Similar length molecules have also been isolated in heat-shocked Drosophila cells by Rubin and Hogness (1975). While clear evidence that these 135-long molecules are precursors of the mature 5S RNA in Xenopus (or Drosophila) is lacking, their isolation clearly demonstrates that longer transcripts may be synthesized in vivo. It is therefore possible that the structural gene for 5S RNA is larger than the 121 residues of the mature 5S RNA and extends into the region of DNA, linking gene and pseudogene for at least another 15 residues.

Thus the known transcription of the 5S DNA system does not explain the presence of the pseudogene. Moreover, no RNA products corresponding to the pseudogene have been isolated, although it is conceivable that these may well have been overlooked or confused with tRNA in earlier studies (Denis and Wegnez, 1973), especially if they occur only in low yield. We are thus forced to the conclusion that the most probable explanation for the existence of the pseudogene is that it is a relic of evolution. During the evolution of the 5S DNA of Xenopus laevis, a gene duplication occurred producing the pseudogene. Presumably the pseudogene initially functioned as a 5S gene, but then, by mutation, diverged sufficiently from the gene in its sequence so that it was no longer transcribed into an RNA product.

This evolutionary explanation for the presence of the pseudogene, however, is incomplete by itself in that it ignores the conservation in sequence of the pseudogene, and indeed of the entire G + C-rich spacer of 5S DNA. In an attempt to explain this, it has been suggested (Brownlee, 1976) that the pseudogene may be a “transcribed spacer” corresponding to a primary transcript of 5S RNA, which is a transient precursor and has not been detected. If this is so, then most of the G + C-rich region of 5S DNA would be the structural gene for 5S RNA. This function, if true, would provide the necessary selective pressure to conserve the sequence of the “linker” and pseudogene region so that the correct processing of the postulated 300-long precursor was maintained. In the absence of any experimental evidence for such a long precursor, however, this suggestion must be regarded as speculative; it is more probable that the pseudogene is a relic of evolution.

Jacq, C., J.R. Miller, and G.G. Brownlee. 1977. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12: 109-120.

____________

Part of the Quotes of interest series.