Biology used to be the scientific discipline where data was at a premium, a rare resource painstakingly collected in the field or the laboratory. Today’s biologists, by contrast, are confronted with a flood of data, a fire-hose torrent of genetic and clinical information that only builds with the spread of fast sequencing and electronic medical records. Yet as these databases fill terabyte after terabyte of computer storage, the successful transformation of that data into practical information about human biology and disease has lagged behind. Genome-wide association studies (GWAS) have explained only a small percentage of disease heritability, clinical records remain largely unstudied on a large scale, and the complications created by environmental influences and multi-gene disorders have frustrated scientists.
Into this impasse comes a new multi-institutional project based at the University of Chicago: the Silvio O. Conte Center, funded by a nearly $14 million combination of grants from the National Institute of Mental Health and the Chicago Biomedical Consortium. Led by Andrey Rzhetsky, professor of medicine and human genetics at the Medical Center, the collaboration of 15 scientists from 7 institutions will apply the power of advanced computation and data-mining to the growing tide of data collected about neuropsychiatric disorders. The trick will be to focus not on just one database, be it genetics or environmental factors or clinical outcomes, but on all of them at once, creating a higher-resolution image of what goes awry in the brain to cause mental disease.
“A great deal of data already exists, yet nobody is already looking at it the way we plan to do and we have very smart people on this team,” said Rzhetsky, who is also a senior fellow of the Computation Institute at the University of Chicago and the Institute for Genomics and Systems Biology. “When you have multiple communities that partially study the same subject you can get a kind of three-dimensional picture of a phenomenon.”
Rzhetsky has previously demonstrated the promise of data-mining – the discovery of patterns and information in large pools of data – using clinical records and scientific literature. In a 2007 study, his team examined 1.5 million patient records and found significant overlap between mental disorders such as schizophrenia, bipolar disorder, and autism, suggesting a similar overlap of the genetic factors that cause these conditions. Two years later, Rzhetsky and colleagues applied text-mining computation to the scientific literature database PubMed, creating a network of genes and biological interactions associated with cerebellar conditions such as ataxia and degeneration.
Beyond demonstrating the potential of data-mining, those studies also shed light on the hazy borders separating different psychiatric disorders. While the overlaps could complicate psychiatric diagnosis in the clinic, they might also make the disorders susceptible to the multi-faceted approach proposed by the Conte Center.
“Most studies are done one disorder at a time, and that’s like studying the trunk or the hoof or the tail of an elephant; you might miss the big picture,” said Benjamin Lahey, Irving B. Harris Professor of epidemiology at the University of Chicago and a co-investigator at the Conte Center. “This project will enable us to look at things in a way that has never been done before, at a scale that dwarfs anything that’s ever been done.”
Most of the work will draw on databases that already exist, including GWAS data, clinical records data “warehouses” containing information about millions of patients, the PharmGKB database at Stanford that collects side-effect data for therapeutic drugs, gene expression data, and text mining of the scientific literature. The team hopes that applying computational methods to this ocean of data will produce novel hypotheses about the biological mechanisms underlying neuropsychiatric disorders, which can then be tested by collaborators at the Institute for Genomics and Systems Biology and other institutions.
“We are taking a very ‘data driven’ perspective on neuropsychiatric disease,” said Russ Altman, professor of bioengineering, genetics, and medicine at Stanford University and another investigator on the project. “We do not come in with many pre-conceived theories, and so we may create disruptive and interesting hypotheses about the molecular mechanisms and genetic modulators of these diseases. We may also come up with novel polypharmaceutical ways to treat these disorders based on our data mining approaches.”
If the approach works, it could create a new model of science applicable to almost any disorder or phenotype where such massive data collection has occurred. Heart disease, diabetes, obesity, even human behaviors could potentially be analyzed in the same manner, Rzhetsky said. With a large enough pool of data to mine – and the right team of talented data-miners – the flood of biological data can be channeled into a canal of clinically relevant science.
“We definitely have one of the strongest genomics groups in the country, we have probably one of the strongest statistical genetics groups, and we have excellent world-renowned experts in phenotypes,” Rzhetsky said. “It’s exciting because there is potential, but now we have to work hard to get there.”