10,000 genomes.

Lots of genomes are going to be sequenced. Some of the members of the group are colleagues at Guelph. Very cool. That is all.

Genome 10K: A Proposal to Obtain Whole-Genome Sequence for 10 000 Vertebrate Species

Genome 10K Community of Scientists

The human genome project has been recently complemented by whole-genome assessment sequence of 32 mammals and 24 nonmammalian vertebrate species suitable for comparative genomic analyses. Here we anticipate a precipitous drop in costs and increase in sequencing efficiency, with concomitant development of improved annotation technology and, therefore, propose to create a collection of tissue and DNA specimens for 10 000 vertebrate species specifically designated for whole-genome sequencing in the very near future. For this purpose, we, the Genome 10K Community of Scientists (G10KCOS), will assemble and allocate a biospecimen collection of some 16 203 representative vertebrate species spanning evolutionary diversity across living mammals, birds, nonavian reptiles, amphibians, and fishes (ca. 60 000 living species). In this proposal, we present precise counts for these 16 203 individual species with specimens presently tagged and stipulated for DNA sequencing by the G10KCOS. DNA sequencing has ushered in a new era of investigation in the biological sciences, allowing us to embark for the first time on a truly comprehensive study of vertebrate evolution, the results of which will touch nearly every aspect of vertebrate biological enquiry.




In which Dr. Eisen gets scooped.

Jonathan Eisen, of Tree of Life, has an excellent feature called the “Overselling Genomics Award”. Here, I am gonna scoop him and hand out something similar, at least based on the heading.

A genome may reduce your carbon footprint

This somewhat rhetorical title must excite many scientists, particularly those with ongoing research on biomass, feedstock development, and lignocellulosic breakdown/fermentation. With the costs of sequencing rapidly decreasing, and with the infrastructure now developed for almost anyone with access to a computer to cheaply store, access, and analyze sequence information, emphasis will increasingly be placed on ways to apply genome data to real world problems such as reducing dependency on fossil fuel. For the efficient production of bioenergy, this may be accomplished through development of improved feedstocks. This article will consider more closely the impact of very cheap sequence data (approximately 1USD per genome) on improvement of switchgrass (Panicum virgatum L.), a perennial grass well suited to biomass production.

Wow, $1 per genome? That would be sweet!

Lower and basal.

The story:

New Tree Of Life Divides All Lower Metazoans From Higher Animals, Molecular Research Confirms

The response:

“It is absurd to talk of one animal being higher than another” (Charles Darwin, 1837)

More information:

Understanding evolutionary trees

(By the way, Rob DeSalle, who is quoted in the story, was one of my postdoc advisors and he definitely understands phylogenies — but the story is pretty sloppy nonetheless)

Generic genome sequence press release (by Andy).

This comment by Andy was too good not to repost.

Generic press release for genome sequencing

Scientists map genome of (insert name).

A team of researchers from (insert university/institute/lockup garage) has completed mapping the genome of (animal/plant/squashy deep-sea thing).

“We were amazed how (strike one) similar/dissimilar it is to the human genome,” said (insert name of lead scientist/grad student/custodian who happened to answer the phone).

The discovery should help scientists (strike all but one) cure cancer/end world hunger/prevent hair loss.

Science by press release.

With apologies to Jonathan Eisen for encroaching on his annoyance specialty, here is yet another case of science via press release.

Big hop forward: Scientists map kangaroo’s DNA

Taking a big hop forward in marsupial research, scientists say they have unraveled the DNA of a small kangaroo named Matilda. And they’ve found the Aussie icon has more in common with humans than scientists had thought. The kangaroo last shared a common ancestor with humans 150 million years ago.

“We’ve been surprised at how similar the genomes are,” said Jenny Graves, director of the government-backed research effort. “Great chunks of the genome are virtually identical.”

The scientists also discovered 14 previously unknown genes in the kangaroo and suspect the same ones are also in humans, Graves said.

The animal whose DNA was decoded is a small kangaroo known as a Tammar wallaby and named Matilda. Researchers working with the government-funded Centre of Excellence for Kangaroo Genomics sequenced Matilda’s DNA last year. Last week, they finished putting the pieces of the sequence together to form a genetic map. The group plans to publish the research next year, Graves said.

Scientists have already untangled the DNA of around two dozen mammals, including mice and chimps, which are closer to humans on the evolutionary timeline. But Graves said it’s the kangaroo’s distance from people that make its genetic map helpful in understanding how humans evolved.

By lining up the genomes of different species, scientists can spot genes they never knew existed and figure out what DNA features have stayed the same or changed over time. Elements that have remained the same are usually important, Graves said.

The research is an important step in the understanding of genomes in general, said geneticist Bill Sherman, an associate professor of molecular ecology and conservation biology at the University of New South Wales.

But another genetic researcher was more skeptical of the project’s significance.

“If you are in Australia and you want to show that you are a major player in genomics, then it’s important,” said Penn State University biology and computer science professor Webb Miller. “But two guys in their garage are going to sequence another marsupial very soon.”

Those “two guys” are Miller and Penn State colleague Stephan Schuster, who are working on a shoestring budget to map the genome of the Tasmanian devil, which is in danger of extinction because of a contagious facial tumor disease. Miller and Schuster said their project could lead to a way to keep the species alive.

But check out the last line for the biggest problem in the story.

This isn’t the first time Australia’s unique wildlife has provided evolutionary clues. Earlier this year, scientists mapped the DNA of a platypus and found that it crosses different classifications of animals.

No. They. Did. Not.

1,000 genomes on the way (sort of).

ScienceNOW and ScienceDaily are reporting the announcement of the 1000 Genomes Project, which will be supported by agencies in the UK, China, the US, and elsewhere. It will include analyses of the genomes of 1000 individual humans, and will build upon the International HapMap Project.

ScienceDaily describes the early phases of the project:

In the first phase of the 1000 Genomes Project, lasting about a year, researchers will conduct three pilots. The results of the pilots will be used to decide how to most efficiently and cost effectively produce the project’s detailed map of human genetic variation.

The first pilot will involve sequencing the genomes of two nuclear families (both parents and an adult child) at deep coverage that averages 20 passes of each genome. This will provide a comprehensive dataset from six people that will help the project figure out how to identify variants using the new sequencing platforms, and serve as a basis for comparison for other parts of the effort.

The second pilot will involve sequencing the genomes of 180 people at low coverage that averages two passes of each genome. This will test the ability to use low-coverage data from new sequencing platforms to identify sequence variants and to put them in their genomic context.

The third pilot will involve sequencing the coding regions, called exons, of about 1,000 genes in about 1,000 people. This is aimed at exploring how best to obtain an even more detailed catalog in the approximately 2 percent of the genome that is comprised of protein-coding genes.

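So, it’s really six “complete” sequences like those available for Jim Watson and Craig Venter, plus low-redundancy coverage for 180 additional people. Then the rest are subsets of genes (1,000 of around 20,000, or 5% of genes) from 1,000 people. That works out to about 0.075% of the genome if protein-coding sequence is taken as roughly 1.5% of the total, or 0.1% using the 2 percent figure quoted above.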
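For anyone who wants to check the back-of-envelope arithmetic, here is a minimal sketch. The inputs are assumptions rather than project figures: roughly 20,000 protein-coding genes, coding sequence at 1.5% to 2% of the genome, and every gene contributing an equal share of coding sequence (which is certainly not true in detail). The function name `targeted_fraction` is just for illustration.

```python
# Rough arithmetic for the third pilot: what share of the genome is targeted?
# Assumptions (not official project numbers): ~20,000 protein-coding genes,
# coding sequence = 1.5% to 2% of the genome, all genes the same size.

def targeted_fraction(genes_targeted, genes_total, coding_fraction):
    """Fraction of the whole genome covered by the targeted genes' exons."""
    return (genes_targeted / genes_total) * coding_fraction

for coding_fraction in (0.015, 0.02):
    frac = targeted_fraction(1_000, 20_000, coding_fraction)
    print(f"coding = {coding_fraction:.1%} -> targeted = {frac:.3%} of the genome")
```

Either way, the third pilot touches on the order of a tenth of a percent of the genome per person.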

Though it isn’t 1000 genomes sensu stricto, it is definitely a very exciting project.

Are you a cat genome person or a dog genome person?

The most recent issue of Genome Research contains a report of the cat genome sequence (Pontius et al. 2007), adding Felis catus to the rapidly growing collection of animal genome sequences. One of the reasons that the number of mammal sequences is increasing so quickly is that there have been reduced standards for sequence coverage. To wit, the cat is one of 24 mammal species approved by NHGRI for “low redundancy” sequencing, meaning that the sequence will be covered only 2-fold (vs. up to 7x coverage in dog, chimp, human, mouse, and rat). Moreover, in this report, only 60% of the euchromatic DNA was actually sequenced (and never mind the heterochromatin). Seventeen of these low redundancy genomes have already been released, as noted in the table from Green (2007). This leaves many gaps in the sequence, but the rationale is that having incomplete genomes from many species can be at least as informative as having more thorough sequences from only a few species.

In the trade-off between breadth vs. depth — or phylogenetic diversity vs. individual resolution — this leans more towards the former. Of course, this does not preclude improving coverage later, and in fact many of the 2x genomes are already being sequenced to a higher redundancy.
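To get a rough feel for what 2x versus 7x means, here is a small sketch based on the idealized Lander-Waterman model, which is not something the cat or dog papers are quoted as using here. It assumes reads land uniformly at random, so the expected fraction of bases hit at least once at coverage c is 1 - e^(-c). Real projects fall short of this ideal for various reasons (repeats, cloning bias, quality filtering), so treat the output as an upper bound, not a prediction. The function name is hypothetical.

```python
import math

def expected_fraction_covered(coverage):
    """Idealized Lander-Waterman estimate: expected fraction of the genome
    hit by at least one read, assuming uniformly random read placement."""
    return 1.0 - math.exp(-coverage)

# Coverage depths mentioned in these posts: dog survey (1.5x), low-redundancy
# mammals (2x), and the deeper dog/human-style assemblies (7x to 7.5x).
for c in (1.5, 2.0, 7.0, 7.5):
    print(f"{c:>4}x -> {expected_fraction_covered(c):.1%} of bases expected to be covered")
```

Even under these generous assumptions, a 2x genome leaves well over a tenth of the bases untouched, while 7x leaves almost nothing, which is the breadth vs. depth trade-off in numerical form.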


Of the greatest interest to me, about 32% of the available cat sequence is made up of transposable elements, mostly LINEs and SINEs as in other mammals. The percentage might be higher overall since much of the non-coding portion of the genome was not sequenced in the cat. Not having this information is one of the downsides of low coverage. On the other hand, the TE content looks to be very similar to dog anyway, so this is useful information that would not be available yet if we had to insist on 7x coverage for every species.

Speaking of the dog genome, it bears noting that a survey sequence of only about 25% of the genome at 1.5x coverage was released in 2003 (Kirkness et al. 2003). This initial sequence (from Craig Venter’s poodle Shadow) was followed by work from a different set of authors who released a complete dog genome (7.5x coverage) in 2005 (Lindblad-Toh et al. 2005). So again, releasing a partial sequence certainly does not stop a more detailed coverage from being done down the line.

In an ideal world we might have high redundancy, totally complete (not just euchromatic), fully annotated, completely accurate genome sequences from multiple individuals from thousands of species — but that isn’t reality for the time being.

Given such constraints, do you think we should have incomplete data from lots of species, or high depth information from a few species? In other words, are you a cat genome person or a dog genome person?


ps: You’ll note that I resisted the temptation to post pictures of my own cats — you’re welcome.