Taking Cancer Data to the Cloud


Until now, researchers authorized by the National Institutes of Health (NIH) to analyze data from The Cancer Genome Atlas (TCGA) had to set up a secure, compliant computing environment capable of managing and analyzing terabytes of data, download the data, which could take weeks, and then install the tools needed to perform the desired analysis.

Now, the University of Chicago is launching the first secure cloud-based computing system that will enable researchers to access and analyze human genomic cancer information without the costly and cumbersome infrastructure normally needed to download and store massive amounts of data.

The Bionimbus Protected Data Cloud, which is the only NIH-approved cloud-based system for TCGA data, will be equipped with the most commonly used query pipelines and will allow researchers to focus solely on the analysis of large-scale cancer genome sequencing, which experts believe can unlock paths to appropriate treatment, early detection and prevention of cancer.

“Our hope is that the Bionimbus environment will help democratize access to cancer genomics data so that more researchers can fruitfully work with large datasets to understand genomic variations that seem to be one of the keys to the precise diagnosis and treatment of cancer,” said Dr. Robert Grossman, principal investigator of the Bionimbus project and professor of medicine at the University of Chicago Medicine.

The Bionimbus Protected Data Cloud continues to add to its current collection of the most widely used TCGA cancer DNA datasets, which includes breast, ovarian and prostate cancers.

TCGA is a comprehensive project to improve the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. It contains data from more than 6,000 cancer patients, spanning 20 different types of cancer. The project is a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), both part of the NIH.

“The Bionimbus Protected Data Cloud provides cancer researchers a simple way to analyze TCGA data without having to become experts at managing big data,” said Kenna Shaw, director of the TCGA Program Office.

Dr. Megan McNerney, instructor of pathology at the University of Chicago, used Bionimbus to analyze data that led to her discovery that the gene CUX1, which acts as a tumor suppressor, is frequently inactivated in acute myeloid leukemia.

“Bionimbus was critical for my work, as it was used for all aspects of the project, including secure storage of protected data, quality control of next-generation sequencing results, alignments, expression analysis, and algorithm development,” she said. “The strength of Bionimbus, however, is the support that is provided for end users, which enabled both expert and non-expert team members to use the cloud.”

The cloud technology for the Bionimbus Protected Data Cloud was developed in part by the Open Science Data Cloud, a National Science Foundation-supported project that is developing cloud infrastructure to manage, analyze and share large scientific datasets.
