On Christmas morning this year, most people hoped to find an iPad, a puppy, or a luxury car wrapped in a giant red bow under their tree. But geneticists received their present a day early, in the form of two landmark papers published on Christmas Eve in the journal Science. The two extremely dense data sets described in the articles represent new knowledge about gene expression in two seemingly obscure animals: a roundworm and a fruit fly. But the information gained in these massive, multi-institutional projects represents the next important step after the Human Genome Project toward understanding the connection between genes and human health.
Drosophila melanogaster, a species of fruit fly, and Caenorhabditis elegans, a kind of roundworm, are two of the unlikeliest heroes in science. Both are model organisms used in laboratories around the world, prized for their short reproductive cycles and relatively small, easily manipulated genomes. Much of what we know about the mechanisms of evolution and development has been gained from studies of the fly and the worm, and the Human Genome Project was built on technologies first used on these tiny critters.
Now that scientists are beginning to search beyond the roughly 1.5 percent of DNA that codes for proteins, Drosophila and C. elegans are once again called upon to be pioneers. The modENCODE consortium, short for Model Organism Encyclopedia of DNA Elements, is building a library of gene expression and interaction in these species to get a better handle on the dynamics of gene function.
“These efforts in model organisms pave the way for similar annotations of the human genome,” said Kevin White, professor of human genetics and ecology & evolution at the University of Chicago, and one of the leaders of the Drosophila side of the modENCODE project.
Figuring out the billions of A, C, G, and T nucleotides that make up an organism’s genome is only the first step in understanding genetic function – the equivalent of a recipe that lists the ingredients without giving any further instruction. The same set of genes controls the proper development and function of hundreds of different cell types, as different as the specialized cells of the retina and the strong cells of bone and muscle. Accomplishing this is a matter of timing, with genetic regulators (themselves controlled by genes) turning on the right genes at the right times.
When you consider that there are 22,000 genes in C. elegans and 17,000 genes in Drosophila, figuring out which genes are turned on when is no modest undertaking. Hence the project is as much computational as biological, as enormous data sets are shaped into networks revealing the intricate genetic choreography cells use to control themselves based on internal and external signals.
“An animal cell behaves as though it contains a tiny computer, assessing the many signals that it receives from its neighborhood and then deciding whether to maintain itself unchanged (its usual fate), grow and divide, or kill itself for the good of the entire cell collective,” Science editor-in-chief Bruce Alberts writes in an accompanying editorial. “Powerful techniques such as those used in these two landmark studies can provide us with lists of all the molecules involved.”
Those lists and networks are not easily described in a scientific article, never mind a humble science blog (as you can see from the figure at right). New genes encoding proteins were discovered, and new non-protein regions – described as “the dark matter of the genome” by commentator Mark Blaxter – were characterized for their regulatory ability. One interesting discovery made in both the fly and worm projects is that cells use relatively simple hierarchies of regulatory factors to perform a multitude of functions, in a sort of “Kevin Bacon game” of gene expression.
“In D. melanogaster each regulator is, on average, only two (and no more than five) links away from any other,” Blaxter writes.
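For readers who like to see the “Kevin Bacon game” spelled out, here is a minimal sketch of how that kind of “links away” number can be computed. Everything in it is an assumption made for illustration: the six regulators (A through F) and their links are invented placeholders, not modENCODE data, and links are treated as two-way for simplicity. The code uses a breadth-first search to count the fewest links between every pair of regulators, then reports the average and the maximum.

```python
from collections import deque

# Hypothetical toy network: names and links are made up for illustration only.
TOY_LINKS = [
    ("A", "B"), ("A", "C"), ("B", "D"),
    ("C", "D"), ("D", "E"), ("E", "F"),
]


def build_graph(links):
    """Turn a list of (regulator, regulator) pairs into a two-way adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, start):
    """Breadth-first search: fewest links from `start` to every reachable regulator."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def separation_stats(graph):
    """Return (average, maximum) links between all pairs of connected regulators."""
    lengths = []
    for start in graph:
        for node, d in distances_from(graph, start).items():
            if node != start:
                lengths.append(d)
    return sum(lengths) / len(lengths), max(lengths)


toy_graph = build_graph(TOY_LINKS)
average_links, max_links = separation_stats(toy_graph)
print(f"average links apart: {average_links:.2f}, maximum: {max_links}")
# prints "average links apart: 1.93, maximum: 4"
```

In this made-up network the regulators sit about two links apart on average and never more than four, which is the same flavor of statement Blaxter makes about the real fly network.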
The descriptive information provided by modENCODE will set up further exploration by scientists interested in particular pieces of the gene program. To enable such research, all the data is available online, in an open repository that scientists can use to develop their own experiments. And because many of the regulatory systems of roundworms and fruit flies are conserved in humans, such research will also offer insights into human development and health, informing the ENCODE project proper and the eventual future of genetics-based medicine. So the modENCODE data was a Christmas present for all of us…even if you didn’t know you wanted it.