How to Recycle Cancer GWAS Data

In the 2000s, a new kind of genetic experiment was born: the genome-wide association study, or GWAS. Geneticists believed that if they could recruit enough people with a particular disease and compare them to an equal number of disease-free controls, GWAS would point the way to common gene variants associated with disease risk and to novel biological pathways. One of the strengths of GWAS was that it was hypothesis-free: an unbiased comparison that could reveal surprising risk-associated genes that scientists had never thought to examine. More than 1,000 GWAS have been conducted to date, on diseases ranging from diabetes to Parkinson’s disease to Crohn’s disease to various types of cancer.

While these studies have identified thousands of gene variants (called single nucleotide polymorphisms, or SNPs) associated with disease risk, together those variants still explain only a small fraction of the heritability of disease. Some scientists have therefore moved on from GWAS to the next wave of genetic studies, including whole-genome sequencing to look for rare variants and studies of gene-environment interactions. But some geneticists think the field may be moving too quickly to the next big thing, and that the volumes of GWAS data collected over the last decade still hold value. A second generation of GWAS is now under way, in which data from the first round are approached in new ways to find previously hidden gems of information.

In two recent studies, assistant professor of health studies Brandon Pierce applied this Reuse/Recycle/Reduce philosophy to GWAS data on pancreatic cancer risk, a disease for which genetic and biological explanations are particularly lacking. In both experiments, Pierce bent the “hypothesis-free” rule of GWAS to narrow the field of candidate gene variants, allowing a more selective scan of existing data. Cutting the number of candidates from the roughly 550,000 SNPs tested in a full GWAS lowers the bar a SNP must clear to be counted as significantly associated with risk, because far fewer tests have to be corrected for. If the original GWAS experiments were the equivalent of looking for a needle in a haystack, he said, the new approach is a much less daunting task.

“You conduct fewer tests, so the haystack is smaller,” Pierce said. “In all of the tests you are conducting, you know the SNPs are biologically meaningful, whereas in a typical GWAS, a large percentage of the SNPs may have very little to do with human biology.”
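The arithmetic behind the smaller haystack can be sketched in a few lines of Python. This is an illustration under assumptions, not the studies' actual analysis: it assumes a Bonferroni correction at a family-wise error rate of 0.05 (the article does not say which correction either study used), and the effect size of 4.5 standard errors is a made-up value. For each test count mentioned in the article, it prints the per-SNP p-value cutoff and the probability (statistical power) that a SNP with that hypothetical effect would clear it.

```python
from statistics import NormalDist

# Illustrative sketch: how the number of tests sets the significance bar.
# Assumption: Bonferroni correction at a family-wise error rate of 0.05;
# the article does not state which correction each study used.

FWER = 0.05
STD_NORMAL = NormalDist()


def bonferroni_cutoff(n_tests: int, fwer: float = FWER) -> float:
    """Per-SNP p-value cutoff that keeps the family-wise error rate at fwer."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return fwer / n_tests


def power_two_sided(effect_z: float, cutoff: float) -> float:
    """Chance that a SNP whose true mean z-score is effect_z passes a two-sided test."""
    z_crit = STD_NORMAL.inv_cdf(1 - cutoff / 2)
    upper = 1 - STD_NORMAL.cdf(z_crit - effect_z)
    lower = STD_NORMAL.cdf(-z_crit - effect_z)
    return upper + lower


# Test counts come from the article; the effect size of 4.5 is hypothetical.
for label, n in [("full GWAS", 550_000), ("pleiotropy scan", 1_087), ("diabetes SNPs", 37)]:
    cutoff = bonferroni_cutoff(n)
    print(f"{label:16s} p < {cutoff:.2e}  power = {power_two_sided(4.5, cutoff):.2f}")
```

Because the cutoff is simply the error rate divided by the number of tests, shrinking the scan from about 550,000 SNPs to 1,087 relaxes the cutoff roughly 500-fold, and the same modest effect becomes far more likely to be detected.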

In the first study, published in March in Cancer Causes & Control, Pierce built his genetic scan around a connection discovered in epidemiological studies. People with type 2 diabetes have been found to have an elevated risk of pancreatic cancer, a logical relationship given that diabetes is primarily a disease of the pancreas. Pierce took 37 SNPs associated with type 2 diabetes and tested them in the GWAS data collected by a previous study of pancreatic cancer, PanScan-I. None of the SNPs showed a strong association with pancreatic cancer, though two of them produced suggestive evidence of an association. The results suggested that the biological link between type 2 diabetes and pancreatic cancer may not be as strong as the epidemiological data indicated.
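A candidate lookup like the diabetes study could be sketched as follows. The function name, the input format (a dictionary mapping SNP IDs to p-values from an existing GWAS), and the definition of “suggestive” as a nominal p-value below 0.05 are all hypothetical choices made for illustration; the paper's actual criteria may differ. The SNP IDs and p-values in the usage lines are placeholders, not real data.

```python
from typing import Dict, List

# Hypothetical sketch of a candidate-SNP lookup in existing GWAS results.
# Assumptions (not from the article): results map SNP IDs to p-values,
# the strict cutoff is Bonferroni-corrected for the number of candidates
# tested, and "suggestive" means a nominal p-value below alpha.


def classify_candidates(
    gwas_pvalues: Dict[str, float],
    candidates: List[str],
    alpha: float = 0.05,
) -> Dict[str, str]:
    """Label each candidate SNP that appears in the GWAS results."""
    # Only candidates present in the existing data can be tested.
    tested = [snp for snp in candidates if snp in gwas_pvalues]
    if not tested:
        return {}
    # Correct for the candidates tested, not for the whole genome.
    strict_cutoff = alpha / len(tested)
    labels = {}
    for snp in tested:
        p = gwas_pvalues[snp]
        if p < strict_cutoff:
            labels[snp] = "significant"
        elif p < alpha:
            labels[snp] = "suggestive"
        else:
            labels[snp] = "not significant"
    return labels


# Placeholder IDs and p-values, for illustration only.
toy_results = {"snp_a": 0.0004, "snp_b": 0.03, "snp_c": 0.6}
print(classify_candidates(toy_results, ["snp_a", "snp_b", "snp_c", "snp_x"]))
```

Correcting only for the candidates actually found in the GWAS data, rather than for every SNP genotyped, is what lets a modest p-value count as significant in a scan like this.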

“We didn’t find any major associations that popped out at us from the diabetes study, so the conclusion was that these established genes for type 2 diabetes don’t seem to have a big effect on pancreatic cancer risk,” Pierce said.

But a second study, published in Cancer Research, would lead Pierce almost full circle. This time, he ran the pancreatic cancer GWAS data through what he dubbed a “pleiotropy scan,” testing only SNPs previously shown to have a biological effect in humans. For many of the more than half-million SNPs typically tested in a GWAS, scientists have yet to find a link to any disease or biological effect, suggesting that these markers may sit, without effect, in the long stretches of DNA between protein-coding genes. As in the first study, limiting the tests to these SNPs (1,087 in this case) allowed Pierce to pick up more subtle associations than a full-blown GWAS could.
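The filtering step of a pleiotropy scan might look something like the sketch below. It is hypothetical throughout: it assumes a catalog mapping each SNP ID to the set of traits it has previously been associated with (in practice such a list might be drawn from a public catalog of published GWAS results), assumes a Bonferroni cutoff, and uses illustrative names and placeholder data rather than anything from the study.

```python
from typing import Dict, List, Set

# Hypothetical sketch of a "pleiotropy scan" filter. Assumptions (not from
# the article): a catalog maps SNP IDs to traits they were previously linked
# to, GWAS results map SNP IDs to p-values, and the cutoff is Bonferroni.


def pleiotropy_scan(
    gwas_pvalues: Dict[str, float],
    trait_catalog: Dict[str, Set[str]],
    fwer: float = 0.05,
) -> List[str]:
    """Return SNPs with prior trait associations that pass the reduced cutoff."""
    # Keep only SNPs with at least one known association that were also genotyped.
    candidates = [
        snp for snp, traits in trait_catalog.items()
        if traits and snp in gwas_pvalues
    ]
    if not candidates:
        return []
    # The correction uses the size of the filtered set, not the whole GWAS.
    cutoff = fwer / len(candidates)
    hits = [snp for snp in candidates if gwas_pvalues[snp] < cutoff]
    # Order hits from strongest to weakest evidence.
    return sorted(hits, key=lambda snp: gwas_pvalues[snp])


# Placeholder data for illustration only.
catalog = {"snp_a": {"cholesterol"}, "snp_b": {"diabetes", "liver enzymes"}, "snp_c": set()}
results = {"snp_a": 0.2, "snp_b": 1e-5, "snp_c": 1e-6, "snp_d": 1e-7}
print(pleiotropy_scan(results, catalog))
```

The filter has a built-in trade-off: in the toy data, snp_c and snp_d have smaller p-values than the hit but are never tested, because neither has a catalogued association (snp_d is not in the catalog at all). That is the price of bending the hypothesis-free rule in exchange for a smaller haystack.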

With this test, Pierce scored a hit: a SNP in the gene HNF1A that had never before been linked to increased risk of pancreatic cancer. SNPs in this gene had previously been implicated in levels of blood and liver enzymes, cholesterol, heart disease…and diabetes. (The first study did not detect it because the diabetes-associated variant it tested is a different SNP in the same gene.) Because so many biological effects of HNF1A are already known, studies of how it might influence pancreatic cancer risk already have a head start, said Habibul Ahsan, director of the Center for Cancer Epidemiology and Prevention and senior author of both studies.

“Because it’s a gene that has known biology already, the path to discovering the biological underlying mechanism for the association between the gene and pancreatic cancer will be faster to elucidate compared to other GWAS signals,” Ahsan said. “We already know a lot about HNF1A, so we are already ahead in the knowledge of what this gene does and its other associated pathologies. So the biological leap from this GWAS finding to some real clinical implications will be faster and easier.”

Fueled by the success of the pleiotropy scan, Pierce already has ideas for more experiments in the same vein, looking for new SNPs associated with prostate and breast cancer. Wherever the data have already been gathered and shared, he said, anyone can run a similar data-mining project to look for genes that may have been missed at first glance. In the second generation of GWAS, these studies show, a little recycling can go a long way.


Pierce BL, Austin MA, & Ahsan H (2011). Association study of type 2 diabetes genetic susceptibility variants and risk of pancreatic cancer: an analysis of PanScan-I data. Cancer Causes & Control, 22(6), 877-883. PMID: 21445555

Pierce BL, & Ahsan H (2011). Genome-wide “Pleiotropy Scan” Identifies HNF1A Region as a Novel Pancreatic Cancer Susceptibility Locus. Cancer Research. PMID: 21498636

About Rob Mitchum

Rob Mitchum is communications manager at the Computation Institute, a joint initiative between The University of Chicago and Argonne National Laboratory.
