Genome Evolution | First, a Bang Then, a Shuffle
Posted on 01/31/2003 4:19:03 PM PST by jennyp
Picture an imperfect hall of mirrors, with gene sequences reflecting wildly: That's the human genome. The duplications that riddle the genome range greatly in size, clustered in some areas yet absent in others, residing in gene jungles as well as within vast expanses of seemingly genetic gibberish. And in their organization lie clues to genome origins. "We've known for some time that duplications are the primary force for genes and genomes to evolve over time," says Evan Eichler, director of the bioinformatics core facility at the Center for Computational Genomics, Case Western Reserve University, Cleveland.
For three decades, based largely on extrapolations from known gene families in humans, researchers have hypothesized two complete genome doublings--technically, polyploidization--modified by gene loss, chromosome rearrangements, and additional limited duplications. But that view is changing as more complete evidence from genomics reveals a larger role for recent small-scale changes, superimposed on a probable earlier single doubling. Ken Wolfe, a professor of genetics at the University of Dublin, calls the new view of human genome evolution "the big bang" followed by "the slow shuffle."
It's a controversial area.
"There has been a lot of debate about whether there were two complete polyploid events at the base of the vertebrate lineages. The main problem is that vertebrate genomes are so scrambled after 500 million years, that it is very difficult to find the signature of such an event," explains Michael Lynch, a professor of biology at Indiana University, Bloomington, With accumulating sequence data from gene families, a picture is emerging of a lone, complete one-time doubling at the dawn of vertebrate life, followed by a continual and ongoing turnover of about 5-10% of the genome that began in earnest an estimated 30-50 million years ago. Short DNA sequences reinvent themselves, duplicating and sometimes diverging in function and dispersing among the chromosomes, so that the genome is a dynamic, ever-changing entity.
Duplication in the human genome is more extensive than it is in other primates, says Eichler. About 5% of the human genome consists of copies longer than 1,000 bases. Some doublings are vast. Half of chromosome 20 recurs, rearranged, on chromosome 18. A large block of chromosome 2's short arm appears again as nearly three-quarters of chromosome 14, and a section of its long arm is also on chromosome 12. The gene-packed yet diminutive chromosome 22 sports eight huge duplications. "Ten percent of the chromosome is duplicated, and more than 90% of that is the same extremely large duplication. You don't have to be a statistician to realize that the distribution of duplications is highly nonrandom," says Eichler.
The idea that duplications provide a mechanism for evolution is hardly new. Geneticists have long regarded a gene copy as an opportunity to try out a new function while the original sequence carries on. More often, though, the gene twin mutates into a nonfunctional pseudogene or is lost, unconstrained by natural selection because the old function persists. Or, a gene pair might diverge so that they split a function.
Some duplications cause disease. A type of Charcot-Marie-Tooth disease, for example, arises from a duplication of 1.5 million bases in a gene on chromosome 17. The disorder causes numb hands and feet.
INFERRING DUPLICATION ORIGINS
A duplication's size and location may hold clues to its origin. A single repeated gene is often the result of a tandem duplication, which arises when chromosomes misalign during meiosis, and crossing over distributes two copies of the gene (instead of one) onto one chromosome. This is how the globin gene clusters evolved, for example. "Tandem duplicates are tandemly arranged, and there may be a cluster of related genes located contiguously on the chromosome, with a variable number of copies of different genes," says John Postlethwait, professor of biology in the Institute of Neuroscience at the University of Oregon, who works on the zebrafish genome.
In contrast to a tandem duplication, a copy of a gene may appear on a different chromosome when messenger RNA is reverse-transcribed into DNA that inserts at a new genomic address. This is the case for two genes on human chromosome 12, called PMCHL1 and PMCHL2, that were copied from a gene on chromosome 5 that encodes a neuropeptide precursor. Absence of introns in the chromosome 12 copies betrays the reverse transcription, which removes them.1 (Tandem duplicates retain introns.)
The hallmarks of polyploidy are clear too: Most or all of the sequences of genes on one chromosome appear on another. "You can often still see the signature of a polyploidization event by comparing the genes on the two duplicated chromosomes," Postlethwait says.
Muddying the waters are the segmental duplications, which may include tandem duplications, yet also resemble polyploidy. "Instead of a single gene doubling to make two adjacent copies as in a tandem duplication, in a segmental duplication, you could have tens or hundreds of genes duplicating either tandemly, or going elsewhere on the same chromosome, or elsewhere on a different chromosome. If the two segments were on different chromosomes, it would look like polyploidization for this segment," says Postlethwait. Compounding the challenge of interpreting such genomic fossils is that genetic material, by definition, changes. "As time passes, the situation decays. Tandem duplicates may become separated by inversions, transpositions, or translocations, making them either distant on the same chromosome or on different chromosomes," he adds.
QUADRUPLED GENES
Many vertebrate genomes appear to be degenerate tetraploids, survivors of a quadrupling--a double doubling from haploid to diploid to tetraploid--that left behind scattered clues in the form of genes present in four copies. This phenomenon is called the one-to-four rule. Wolfe compares the scenario to having four decks of cards, throwing them up in the air, discarding some, selecting 20, and then trying to deduce what you started with. Without quadruples in the sample, it is difficult to infer the multideck origin. So it is for genes and genomes.
"How can you tell whether large duplications built up, or polyploidy broke down? People are saying that they can identify blocks of matching DNA that are evidence for past polyploidization, which have been broken up and overlain by later duplications. But at what point do blocks just become simple duplications?" asks Susan Hoffman, associate professor of zoology at Miami University, Oxford, Ohio.
The idea that the human genome has weathered two rounds of polyploidy, called the 2R hypothesis, is attributed to Susumu Ohno, a professor emeritus of biology at City of Hope Medical Center in Duarte, Calif.2 The first whole genome doubling is postulated to have occurred just after the vertebrates diverged from their immediate ancestors, such as the lancelet (Amphioxus). A second full doubling possibly just preceded the divergence of amphibians, reptiles, birds, and mammals from the bony fishes.
Evidence for the 2R hypothesis comes from several sources. First, polyploidy happens. The genome of flowering plants doubled twice, an estimated 180 and 112 million years ago, and rice did it again 45 million years ago.3 "Plants have lots of large blocks of chromosomal duplications, and the piecemeal ones originated at the same time," indicating polyploidization, says Lynch. The yeast Saccharomyces cerevisiae is also a degenerate tetraploid, today bearing the remnants of a double sweeping duplication.4
Polyploidy is rarer in animals, which must sort out unmatched sex chromosomes, than in plants, which reproduce asexually as well as sexually. "But polyploidization is maintained over evolutionary time in vertebrates quite readily, although rarely. Recent examples, from the last 50 million years or so, include salmonids, goldfish, Xenopus [frogs], and a South American mouse," says Postlethwait. On a chromosomal level, polyploidy may disrupt chromosome compatibility, but on a gene level, it is an efficient way to make copies. "Polyploidy solves the dosage problem. Every gene is duplicated at the same time, so if the genes need to be in the right stoichiometric relationship to interact, they are. With segmental duplications, gene dosages might not be in the same balance. This might be a penalty and one reason why segmental genes don't survive as long as polyploidy," Lynch says.
Traditional chromosome staining also suggests a double doubling in the human genome's past, because eight chromosome pairs have near-doppelgängers, in size and band pattern.5 A flurry of papers in the late 1990s found another source of quadrupling: Gene counts for the human, then thought to be about 70,000, were approximately four times those predicted for the fly, worm, and sea squirt. The human gene count has since been considerably downsized.
Finally, many gene families occur in what Jurg Spring, a professor at the University of Basel's Institute of Zoology in Switzerland, dubs "tetrapacks."6 The HOX genes, for example, occupy one chromosome in Drosophila melanogaster but are dispersed onto four chromosomes in vertebrate genomes.7 Tetrapacks are found on every human chromosome, and include zinc-finger genes, aldolase genes, and the major histocompatibility complex genes.
"In the 1990s, the four HOX clusters formulated the modern version of the 2R model, that two rounds of genome duplication occurred, after Amphioxus and before bony fishes," explains Xun Gu, an associate professor of zoology and genetics at Iowa State University in Ames. "Unfortunately, because of the rapid evolution of chromosomes as well as gene losses, other gene families generated in genome projects did not always support the classic 2R model. So in the later 1990s, some researchers became skeptical of the model and argued the possibility of no genome duplication at all."
THE BIG BANG/SLOW SHUFFLE EVOLVES
Human genome sequence information has enabled Gu and others to test the 2R hypothesis more globally, reinstating one R. His group used molecular-clock analyses to date the origins of 1,739 duplications from 749 gene families.8 If these duplications sprang from two rounds of polyploidization, the dates should fall into two clusters. This isn't exactly what happened. Instead, the dates point to a whole genome doubling about 550 million years ago and a more recent round of tandem and segmental duplications since 80 million years ago, when mammals radiated.
Ironically, sequencing of the human genome may have underestimated the number of duplications. The genome sequencing required that several copies be cut, the fragments overlapped, and the order of bases derived. The algorithm could not distinguish whether a particular sequence counted twice was a real duplication, present at two sites in the genome, or independent single genes obtained from two of the cut genomes.
Eichler and his group developed a way around this methodological limitation. They compare sequences at least 15,000 bases long against a random sample of shotgunned whole genome pieces. Those fragments that are overrepresented are inferred to be duplicated.8 The technique identified 169 regions flanked by large duplications in the human genome.
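The overrepresentation idea can be sketched in a few lines of Python. This is only a toy illustration, not the actual method used by Eichler's group: the function name, the fixed 15,000-base windows, and the cutoff of 1.5 times the median depth are all assumptions made for the example. The intuition assumed here is that shotgun reads from every copy of a duplicated sequence pile up on the same stretch of the assembled sequence, so that stretch shows more reads than its neighbors.

```python
from statistics import median

def flag_overrepresented(read_starts, genome_length, window=15_000, ratio=1.5):
    """Return (window_start, depth) for windows with depth > ratio * median depth.

    read_starts: positions where randomly sampled shotgun reads align.
    window and ratio are illustrative assumptions, not published values.
    """
    n_windows = (genome_length + window - 1) // window
    depth = [0] * n_windows
    for pos in read_starts:
        depth[pos // window] += 1
    typical = median(depth)
    return [(i * window, d) for i, d in enumerate(depth) if d > ratio * typical]

# Made-up data: four windows with 10 reads each, except the third, which has 20.
reads = []
for w, count in enumerate([10, 10, 20, 10]):
    reads.extend(w * 15_000 + k for k in range(count))

flagged = flag_overrepresented(reads, genome_length=60_000)
print(flagged)  # the third window (starting at base 30,000) stands out
```

A real analysis would have to deal with uneven sampling, repeats, and sequencing error, none of which this sketch attempts.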
Although parts of the human genome retain a legacy of a long-ago total doubling, the more recent, smaller duplications provide a continual source of raw material for evolution. "My view is that both happen. A genome can undergo polyploidy, duplicating all genes at once, but the rate of segmental duplications turns out to be so high that every gene will have had the opportunity to duplicate" by this method also, concludes Lynch. It will be interesting to see how the ongoing analyses of the human and other genome sequences further illuminate the origins and roles of duplications.
Ricki Lewis is a contributing editor.
Does it, or does it merely speculate that such has occurred in animals? The duplication of genes is (according to the article) far from random. Maybe genes that get expressed the most or tend to mutate the most have the most copies. That is engineering and it does not demand a naturalistic explanation.
The article assumes that gene duplication will automatically produce an increase in complexity. There is little evidence that this is true. The reason is that there is precious little evidence that gene copies can ever become a gene with a function much different from the old gene. Because of that, all the duplications in the world won't get you from amoeba to man. It will just get you to a man with a lot of pseudogenes. We have lots of such genes, but so much "junk DNA" is conserved that some have been led to wonder if it does not have a function after all, such as steering proteins to the right part of the cell.
Yup, facts are such a problem for evolutionists, one more thing for them to need to try to explain away! They also have their own math. For example, one would think that if you take away something you would end up with less not more. However in evo math when natural selection takes away from the gene pool you get more genes than you had before!
The author is not speaking to the distribution of duplication events before selection. From our POV that is still random. We don't know initial conditions, don't know a mechanism, and can't predict future events. That's as random as things get.
What it comes down to is that nobody can distinguish between a random and an "intelligent design" event.
There are supposedly some 10 million years of mutations separating man from chimps. Chimps and men differ by some 5% of their DNA (the evolutionist 1% has been proven wrong by the same man who originally made the statement).
BZZZZZT! It's more like 1.4% where it counts - in the genes themselves:
The new estimate could be a little misleading, said Saitou Naruya, an evolutionary geneticist at the National Institute of Genetics in Mishima, Japan. "There is no consensus about how to count numbers or proportion of nucleotide insertions and deletions," he said.
Besides, IIRC his method of counting insertions & deletions would treat a 100 base pair insertion as 100 mutations. Nebullis, do you remember if this is true?
Indels are common in the non-functional sections of the genome, said Peter Oefner, a researcher at Stanford's Genome Technology Center in Palo Alto, California. Scientists estimate that up to 97 percent of DNA in the human genome has no known function. However, he added, indels are extremely rare in gene sequences.
"We haven't observed a single indel in a [gene] to date between human and chimp," said Oefner. Therefore, the revised estimate doesn't alter the amount of DNA that holds information about our species. Humans and chimps still differ by about one percent in gene sequences, he said.
Since chimps and men have about 3 billion DNA base pairs that 5% represents some 150,000,000 favorable mutations in those ten million years. Since with all our science, all our billions in research on DNA for decades have not shown a single favorable mutation has ever happened, I think that your statement is absolutely wrong scientifically - just as evolution is completely wrong scientifically.
Plugging in the correct numbers & assumptions:
Since chimps and men have about [struck: 3 billion] 90 million gene-encoding DNA base pairs that [struck: 5%] 1.4% represents some [struck: 150,000,000] 14,000 neutral or favorable mutations in those ten million years.
Trying to use bogus numbers to defend their ideas makes us furious. :-)
She did, in #27.
Not quite. This 'study' is not a study at all. It is a reinterpretation of the work done by Roy Britten in comparing the sequences of human and chimp DNA. It is a reductionist view of the DNA differences between humans and chimps. It throws away most of the differences because supposedly they are unimportant because they are not in genes. Well the rest of the DNA does matter unlike what this hack has to say. Yes 97% of DNA does not code for genes, but his statement that
Scientists estimate that up to 97 percent of DNA in the human genome has no known function.
is totally false and he is not a scientist if he made it. The last half dozen years of biological research have been concerned with finding out just exactly what that 97% of DNA which evolutionists call "junk" does. What this DNA does is control what the gene does, when and how much protein it is to make, and even what specific proteins, amongst several which many genes can make, are to be made by the gene. In other words 'this junk' which this hack says scientists say 'has no known function' is what makes an organism function. In one single discovery, they have found what 10% of that DNA does - it acts as a zipper during cell division. So your article is total nonsense and National Geographic should be ashamed to publish such garbage.
So the 3.9% difference you wish to throw away is indeed important as is the 97% of DNA which your phony article claims is non-functional. What this shows is the quality of science being peddled by what were once respectable magazines in their attempt to save the totally discredited theory of evolution by discrediting the good reputation they had built up for decades.
In short, gore3000's numbers are better; it's not 14K gene changes between man and chimp in 10 million years, but rather 150K changes that have established themselves throughout the population.
(Gore3000 claimed 150 million mutations.)
The point of Britten's new study is that these previously missing mutations were simple insertions & deletions. So if you have a 1000 bp duplication, it's still just one mutation. I think the "extra 3.9%" figure refers to the increased difference in sequence, not to 2 1/2 times more mutations. I couldn't find the post I was thinking of from back in September (on another board) that explained the point directly, but here's an article from CalTech that hints at what I'm saying:
To describe exactly what Britten did, it is helpful to explain the old method as it was originally used to determine genetic similarities between two species. Called hybridization, the method involved collecting tiny snips of the DNA helix from the chromosomes of the two species to be studied, then breaking the ladder-like helixes apart into strands. Strands from one species would be radioactively labeled, and then the two strands recombined.
The helix at this point would contain one strand from each species, and from there it was a fairly straightforward matter to "melt" the strands to infer the number of good base pairs. The lower the melting temperature, the less compatibility between the two species because of the lower energy required to break the bonds.
In the case of chimps and humans, numerous studies through the years have shown that there is an incidence of 1.2 to 1.76 percent base substitutions. This means that these are areas along the helix where the bases (adenine, thymine, guanine, and cytosine) do not correspond and hence do not form a bond at that point. The problem with the old studies is that the methods did not recognize differences due to events of insertion and deletion that result in parts of the DNA being absent from the strands of one or the other species. These are different from the aforementioned substitutions. Such differences, called "indels," are readily recognized by comparing sequences, if one looks beyond the missing regions for the next regions that do match.
To accomplish the more complete survey, Britten wrote a Fortran program that did custom comparisons of strands of human and chimp DNA available from GenBank. With nearly 780,000 suitable base pairs available to him, Britten was able to better infer where the mismatches would actually be seen if an extremely long strand could be studied. Thus, the computer technique allowed Britten to look at several long strands of DNA with 780,000 potential base pairings.
As expected, he found a base substitution rate of about 1.4 percent, well in keeping with earlier reported results, but also an incidence of 3.9 percent divergence attributable to the presence of indels. Thus, he came up with the revised figure of 5 percent. [emphasis mine]
That really sounds to me like what I was saying: The 5% represents the total difference in base pair sequences, but it took a number of mutations equal to 1.4% of the total length to produce those differences.
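To make the distinction concrete, here is a minimal Python sketch that tallies an alignment three ways: substitution sites, indel events (each run of gaps counted once), and indel bases (every gapped position counted). It is not Britten's Fortran program; the function name is illustrative, the two short sequences are made up, and the sketch assumes the sequences are already aligned with "-" marking gaps.

```python
def divergence_summary(a, b, gap="-"):
    """Compare two aligned sequences of equal length (gaps marked with '-').

    Returns counts of substitution sites, indel events (runs of gaps),
    and indel bases (individual gapped positions).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    subs = indel_events = indel_bases = 0
    prev_gap_side = None  # which sequence held the gap in the previous column
    for x, y in zip(a, b):
        if x == gap or y == gap:
            side = "a" if x == gap else "b"
            indel_bases += 1
            if side != prev_gap_side:
                indel_events += 1  # a new run of gaps starts here
            prev_gap_side = side
        else:
            prev_gap_side = None
            if x != y:
                subs += 1
    return {"substitutions": subs,
            "indel_events": indel_events,
            "indel_bases": indel_bases}

# Made-up aligned sequences: one substitution and one 4-base deletion.
human = "ACGTACGTTTTTACGA"
chimp = "ACGAACGT----ACGA"
summary = divergence_summary(human, chimp)
print(summary)
```

In this toy case the single deletion contributes four differing bases but only one event, which is the gap between "percent of sequence that differs" and "number of mutations" that the post is pointing at.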
Uh-oh... I think my math was off, too. 3 billion total bps x 1.4% mutations = 42 million mutations. 90 million gene-encoding bps x 1.4% = 1.26 million mutations. That is a lot, though much less than gore3000's 150 million mutations.
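For anyone who wants to check the arithmetic, here is a minimal Python sketch. The 3 billion and 90 million base-pair figures and the 1.4% and 5% rates are the ones quoted in this thread; the function name is just illustrative, and the sketch assumes the percentage applies uniformly to whatever stretch of DNA is being counted.

```python
from fractions import Fraction

GENOME_BP = 3_000_000_000  # total base pairs, as quoted in the thread
CODING_BP = 90_000_000     # gene-encoding base pairs, as quoted in the thread

def differing_bases(total_bp, percent):
    """Base pairs that differ at the given percent divergence.

    percent is passed as a string so Fraction keeps the arithmetic exact.
    """
    return int(total_bp * Fraction(percent) / 100)

whole_genome_5pct = differing_bases(GENOME_BP, "5")
whole_genome_1_4pct = differing_bases(GENOME_BP, "1.4")
coding_1_4pct = differing_bases(CODING_BP, "1.4")

print(f"5%   of 3 billion:  {whole_genome_5pct:,}")
print(f"1.4% of 3 billion:  {whole_genome_1_4pct:,}")
print(f"1.4% of 90 million: {coding_1_4pct:,}")
```

Note that these are counts of differing bases. Treating each one as a separate mutation is itself an assumption, since a single insertion or deletion can change many bases at once.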
However that is just the numbers. I think you are right on one important part. He seems to be counting all of those mutations as favorable, when you point out that many of them, most even, could be neutral. I'd like to know what gore3000's reasoning is on that. It seems to me that there is no reason all of those changes have to be favorable.
So how fast do mutations, neutral or favorable, work their way into populations today? That should give us a measuring stick to see if 150,000 mutations can work their way into the human genome in ten million years. Perhaps it would be better to say "work their way into the genome of an isolated group like Icelanders" since human populations were much smaller during most of our history.
That would be one mutation (neutral or favorable) working its way into the whole population every 67 years. I wish someone who knows about the rate now would speak up here, but that sounds like a really, really really short time, don't you think? I mean, we don't breed like flies, it takes a while for mutations to be established, yes?
Let's see... 10 million years divided by 42 million mutations = 1 fixation every .238 years (3 months or so). But keep in mind that there are always many mutations at different locations in the genome working in parallel to get themselves fixed at the same time. How many? I have no idea, but if there were 1000 different alleles out there in the population at the same time that would mean an average allele would have 238 years in which to fixate for the numbers to work out. If there are 100,000 alleles then the average allele has 23,800 years to achieve fixation for the numbers to work out. (Did I state that clearly?)
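The back-of-envelope numbers can be reproduced with a short Python sketch. The 10-million-year span and the mutation counts come from this thread; the function names are illustrative, and the "alleles in parallel" figures are the same guesses used above, not measured values.

```python
TOTAL_YEARS = 10_000_000  # time since the split, as assumed in this thread

def years_per_fixation(n_mutations, total_years=TOTAL_YEARS):
    """Average spacing between fixations if they happened strictly one after another."""
    return total_years / n_mutations

def window_per_allele(n_mutations, alleles_in_parallel, total_years=TOTAL_YEARS):
    """Average time each allele can take if this many are heading to fixation at once."""
    return years_per_fixation(n_mutations, total_years) * alleles_in_parallel

for label, n in [("150,000 mutations", 150_000), ("42 million mutations", 42_000_000)]:
    print(label, "->", round(years_per_fixation(n), 3), "years per fixation")

for parallel in (1_000, 100_000):
    print(parallel, "alleles in parallel ->",
          round(window_per_allele(42_000_000, parallel)), "years each")
```

Carried through without rounding, the 100,000-allele case comes to roughly 23,810 years; the 23,800 figure comes from rounding to .238 first.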
As for how long it takes for an individual allele to achieve fixation, I don't know the exact numbers, but they do fixate more quickly in small populations than in large ones. (If there are 10 in the population, 1 has a new neutral mutation, & every breeding pair produces 2 offspring, then the mutation could represent 0%, 10%, or 20% of the next generation's population. In the 3rd generation I think it would represent 0%, 10%, 20%, 30%, or 40%.)
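A small simulation makes the population-size effect easier to see. The sketch below uses the standard Wright-Fisher idealization (each gene copy in the next generation is drawn at random from the current pool), which is a swapped-in textbook model and not the exact "every pair has 2 offspring" scheme described above. The function names, population sizes, trial counts, and seed are all illustrative assumptions.

```python
import random

def simulate_neutral_allele(n_copies, rng):
    """Wright-Fisher drift for one new neutral allele among n_copies gene copies.

    Returns (fixed, generations); fixed is True if the allele reached 100%.
    """
    count = 1  # the mutation starts as a single copy
    generations = 0
    while 0 < count < n_copies:
        freq = count / n_copies
        # Each copy in the next generation is drawn at random from the current pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        generations += 1
    return count == n_copies, generations

def summarize(n_copies, trials, seed=0):
    """Fraction of trials in which the allele fixed, and mean generations to fixation."""
    rng = random.Random(seed)
    results = [simulate_neutral_allele(n_copies, rng) for _ in range(trials)]
    fix_times = [g for fixed, g in results if fixed]
    fixed_fraction = len(fix_times) / trials
    mean_time = sum(fix_times) / len(fix_times) if fix_times else float("nan")
    return fixed_fraction, mean_time

small = summarize(n_copies=10, trials=2000)
large = summarize(n_copies=50, trials=2000)
print("10 copies: fixed fraction, mean generations =", small)
print("50 copies: fixed fraction, mean generations =", large)
```

Under this model a new neutral allele is usually lost, but it fixes with probability equal to its starting frequency (1 in n_copies), and the runs that do fix get there in fewer generations when the population is small.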
Another thing to ponder is that through most of humanity's history, we were divided into many small, somewhat isolated tribes that had relatively little gene flow between them. I'll bet that genetic drift was rampant for a long time, even when the total human population number was relatively large. It wasn't until a couple thousand years ago that we truly became one big population with lots of biracial children. ("Lots" as measured over several generations.) So the total amount of genetic change was probably higher thousands of years ago than is happening today.
So even with 42 million mutations between humans & chimps, I don't think it presents any problem.
First of all, the article you cited is not a new study. All it does is rework what Britten did and make it sound more favorable towards evolutionary theory. The person, as I pointed out is an ideological hack who continues to tell the EVOLUTIONIST LIE that 97% of the DNA is junk. The article ahban cited on the brain - about this very DNA which this EVOLUTIONIST LIAR says scientists consider nonsense, shows what he is.
In fact, what modern biology has found is that it is not the genes but what evolutionists call 'junk dna' that is the most important part of our genome, it is what makes us tick and makes the genes work properly:
Within a single bacterial cell, genes are reversibly induced and repressed by transcriptional control in order to adjust the cell's enzymatic machinery to its immediate nutritional and physical environment. Single-celled eukaryotes, such as yeasts, also possess many genes that are controlled in response to environmental variables (e.g., nutritional status, oxygen tension, and temperature). Even in the organs of higher animals --- for example, the mammalian liver --- some genes can respond reversibly to external stimuli such as noxious chemicals. ...
The most characteristic and exacting requirement of gene control in multicellular organisms is the execution of precise developmental decisions so that the right gene is activated in the right cell at the right time during development of the many different cell types that collectively form a multicellular organism. In most cases, once a developmental step has been taken by a cell, it is not reversed. Thus these decisions are fundamentally different from bacterial induction and repression. In executing their genetic programs, many differentiated cells (e.g., skin cells, red blood cells, lens cells of the eye, and antibody-producing cells) march down a pathway to final cell death, leaving no progeny behind. The fixed patterns of gene control leading to differentiation serve the needs of the whole organism and not the survival of an individual cell.
From: Regulation of transcription initiation
So what this hack says about 'junk dna' is total unscientific nonsense. Without the mechanisms set up by this 'junk DNA' the genes would not work at all, period. The organism would not function, period. What we see here is an evolutionist lying through his teeth trying to save a totally decrepit and false theory through lies.
You are misreading the article you cite:
The problem with the old studies is that the methods did not recognize differences due to events of insertion and deletion that result in parts of the DNA being absent from the strands of one or the other species.
What the above means is simply that because of deletions in each species, the strands selected did not align properly, hence a simple 'alphabetic' comparison of the sequences gave a wrong number. What Britten did, and the reason he revised the figures, is he properly aligned the strands according to what was the purpose of them. In this way he came up with the more accurate 5% number.
Now as to neutral mutations, they just cannot spread throughout a species - according to studies made by evolutionists themselves when they were trying to solve the problem posed by genetics. The basis of population genetics is the Hardy-Weinberg principle which says that in a stable population the genetic mix of the population will remain stable absent any genetic advantage of a particular genetic makeup. What this means is that a neutral mutation in a population of 1 million organisms will continue to be in only 1 millionth of the population if it is neutral. In fact it will likely disappear completely due to chance (if you play a game at odds of 2 to 1 with two dollars long enough you will lose both dollars), so neutral mutations cannot be in any way responsible for these differences in any significant way.
Due to the above, yes, the differences are 5%. Yes, you need some 150 million mutations. Yes, mostly all of them have to be favorable to have survived.
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.