Sabermetrics for Scientists

This post originally ran on the Computation Institute website.

The last decade has seen a statistical revolution in sports, where new, smarter measures of player performance in baseball, football, and soccer are replacing more traditional stats. Often known as “sabermetrics” in tribute to the Society for American Baseball Research, advanced statistics such as VORP, BABIP, and FIP try to quantify a player’s performance more accurately, while forecasting tools such as PECOTA try to predict their future. While imperfect, these stats have given general managers new tools for deciding which players to sign to long-term contracts and which to release.

The scientific community has its own measures of career performance, but the use of these figures in personnel decisions remains controversial. Decisions on hiring or tenure remain largely in the hands of committees, who judge applicants based on their CV, interviews, pedigree, or myriad other potentially subjective factors. Attempts to come up with more objective measures of scientific achievement are handicapped by disagreement over what factors make a “good” scientist and predict a successful career: is it number of publications, or citations, or something else entirely?

Stefano Allesina, Computation Institute faculty and assistant professor of ecology and evolution at the University of Chicago, was interested in whether there might be a better way to make these important decisions about scientists’ careers. For a study published this week in Nature, he first had to test whether a scientist’s future is predictable from his or her accomplishments to date.

“If the future is unpredictable, we may as well pick CVs and hire people at random. We’re wasting our time in these hiring committees,” Allesina said. “But if the future is quite predictable, then we just need to read the signs and figure out which factors influence your future potential the most.”

Allesina and collaborators Daniel Acuna and Konrad Kording of Northwestern University were blessed with an abundance of open data to conduct their test. Via the online academic genealogy Neurotree, they found the names and academic relationships of over 34,000 neuroscientists, whom they could then match to their publication histories via the scientific journal database Scopus. The researchers then used machine-learning techniques to find the best formula for using that data to “predict the past,” as Allesina put it.

The trick is to turn back the clock on scientists’ lives to see if the CV data available early in their career – say, during their post-doctoral period – could be used to predict how their career would look one, five, or ten years later. As a read-out, they used one popular measure of scientist performance: the h-index, a statistic developed by physicist Jorge Hirsch that takes into account both number of publications and citations. An h-index of 12 (informally considered to be a good number for a physicist to achieve a tenured position) means that a scientist has published at least 12 papers that have been cited at least 12 times each. To put that number in perspective, Albert Einstein had an h-index of 96, while Richard Feynman’s was 53.
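The h-index definition described above is simple to express in code. A minimal sketch in Python (the citation counts in the test comment are made up for illustration):

```python
def h_index(citations):
    """Return the h-index: the largest h such that the author has
    at least h papers cited at least h times each."""
    h = 0
    # Rank papers from most to least cited; a paper at rank i
    # contributes to the index only if it has at least i citations.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# A hypothetical scientist with 12 papers cited
# [40, 30, 25, 20, 18, 15, 14, 13, 12, 12, 3, 1] times has h = 10:
# the 10th-ranked paper has 12 citations, but the 11th has only 3.
```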

“We wanted to keep it as simple as possible and focus on finding out which quantities were important,” Allesina said. “The problem is that there’s no measure for success, so we focused on something that was very easy to measure and quite accepted in the field, this h-index.”

The experiments started with all of the factors the researchers could find for each scientist, everything from number of publications and citations to how many journals they published in, the prestige of those journals, time spent in graduate school, the h-index of their advisor, the prestige of their institution, amount of funding received, student evaluations on RateMyProfessors, and so on. The algorithm then selected the factors that best predicted a scientist’s future h-index, dropping those that were less predictive or highly correlated with other factors.

After many rounds of model selection, the researchers were left with a simple formula using five factors: current h-index, number of published articles, years since first article, number of articles in the most prestigious journals, and number of distinct journals published in (an interactive version of the model was published online). That combination suggests it’s less who you know or where you work than the type of science you do that predicts a successful future.

“I think that this is a very reassuring study in a way,” Allesina said. “The things that we value the most are in fact the things that matter the most. And your future is quite predictable; we all get rejections and bad experiences here and there, but in the long run, if you are a good scientist, you’ll do fine.”

The model’s forecasts were much more accurate than both the 5-year and 10-year predictions made with the h-index alone. But when applied outside of neuroscience, to scientists studying fruit fly genetics or evolution, the model was slightly less accurate, suggesting that the details of the formula would need to be adjusted for different scientific fields.

While the model may be simple, and the authors insist that peer review is still the best way of making hiring, promotion, or funding decisions, boiling down a scientific career to an accurate and objective number has its advantages. Allesina said that the number could be used to judge where racial, gender, or other biases exist within the system, and offer a “blinded” score by which to avoid these biases.

For funding purposes, basing decisions on an index that predicts a scientist’s future could lead to more ambitious science. Instead of choosing “safe” grant proposals that are more likely to work, scientists with high potential could be chosen and funded to pursue riskier ideas.

“Let’s try to find good people, give them money with no strings attached, tell them to do whatever you want with it, and we see what happens,” Allesina said. “I think that’s a great idea, but then you have to have a way to select which ones are the good candidates for this kind of strategy. The number of scientists is amazingly high, so you’re really looking for needles in a haystack.”


Acuna, D.E., Allesina, S., & Kording, K.P. (2012). Future impact: Predicting scientific success. Nature, 489, 201–202. DOI: 10.1038/489201a

About Rob Mitchum
Rob Mitchum is communications manager at the Computation Institute, a joint initiative between The University of Chicago and Argonne National Laboratory.