The public perception of science in action typically involves a person in a white coat pouring brightly colored fluids in and out of test tubes. Sure, a little of that does go on in a laboratory. But before the glassware is broken out, a lot of less glamorous work has to happen. Every experiment is built on a foundation of reading – devouring the scientific literature that came before you and keeping up with the pile of new journals from your field that arrives each week.
Once upon a time – say the 17th century – it was possible for a scientist to keep up with the flow of science because there were only a couple dozen people contributing to it. Today, University of Chicago sociologist James Evans and professor of medicine Andrey Rzhetsky estimate, a cancer biologist is confronted with millions of relevant journal articles within their field, with thousands more added every month. Obviously, there’s no way for a mere mortal to keep up with that kind of reading list, much less keep an eye on other fields to inspire creative ideas.
In last week’s issue of Science, Evans and Rzhetsky suggest a stand-in to handle this impossible task: the computer. Appearing the same day as Gary An’s pitch for using computer models to test hypotheses, Evans and Rzhetsky’s piece is an argument for using computers to generate those hypotheses by boiling down the unreadable avalanche of scientific literature into promising questions.
“During Newton’s time, a scientist could read everything that was published, at least in English,” Rzhetsky, a senior fellow of the University’s Computation Institute, told Wired’s Brandon Keim in an interview. “That’s just not an option anymore. We can’t deal with all this information.”
But the answer is not simply to plug data into a computer with zero context and hope for promising treatments to pop out the other end, Rzhetsky told Keim, deploying an excellent film reference as a metaphor.
“In the movie Memento, a man has only a short-term memory. Every 15 minutes he has to reconstruct causal relationships. He observes people talking to him, and doesn’t know who’s a friend and who’s a foe. That’s my metaphor for abandoning hypothesis and context,” Rzhetsky said.
Instead, the Science article proposes that a computerized super-reader could not only pinpoint new hypotheses within a field, but also make exciting connections between fields that might not normally intersect. As an example, Evans and Rzhetsky cite the ABC model of University of Chicago professor emeritus Don Swanson: when one field studies the connection between concepts A and B and another studies B and C, the connection between A and C may hold unappreciated promise. If you can get the terminology to match up between fields (a big if, the authors admit), a computer could find the links in the literature between biochemistry and cancer biology that overworked biochemists and cancer biologists might have missed.
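The ABC pattern is simple enough to sketch in a few lines of Python. This is only an illustration, not the authors’ actual system: the function below takes literature co-occurrence pairs from two fields and flags the transitive A–C links. The example pairs echo Swanson’s classic fish-oil-and-Raynaud’s discovery, but they stand in for whatever term pairs a real text-mining pipeline would extract.

```python
# Sketch of Swanson's ABC model: if field 1 links A-B and field 2
# links B-C, flag A-C as a candidate hypothesis worth testing.

def abc_candidates(field1_links, field2_links):
    """Return (A, C, shared_B) triples implied by transitive links."""
    candidates = set()
    for a, b in field1_links:
        for b2, c in field2_links:
            if b == b2 and a != c:
                candidates.add((a, c, b))
    return candidates

# Illustrative co-occurrence pairs (loosely based on Swanson's
# fish oil / Raynaud's example, not mined from real literature):
biochem = {("fish oil", "blood viscosity"),
           ("magnesium", "vascular spasm")}
clinical = {("blood viscosity", "Raynaud's syndrome"),
            ("vascular spasm", "migraine")}

for a, c, b in sorted(abc_candidates(biochem, clinical)):
    print(f"{a} -> {c}  (via {b})")
```

A real system would face the terminology-matching problem the authors flag: the B terms from the two fields rarely line up as cleanly as string equality assumes.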
Rzhetsky has already demonstrated the value of this approach, in a paper published last November in PLoS Computational Biology. Working with the laboratory of Kathleen Millen, who studies brain development, Rzhetsky and colleagues built a data set from more than 300,000 papers and 8 million abstracts on ataxia (a failure of muscle coordination) in mice and humans. By condensing that mountain of scientific work into two publicly available networks of molecular interactions (one for mice, one for humans), the authors were able to reveal novel genes associated with brain defects that have yet to be explored in the laboratory.
“That’s something no human curator, or even a group of human curators, could ever do,” Rzhetsky told Wired. “In a program, it’s possible.”
(The Guardian also covered the Science paper and polled other experts in the field about the current and future plausibility of “automatic hypothesis generation.”)
Evans, J., & Rzhetsky, A. (2010). Machine Science. Science, 329(5990), 399–400. DOI: 10.1126/science.1189416