New Data Repository Could Speed Precision Medicine for Cancer

The Proteomic Data Commons will allow researchers to analyze cancer data and rapidly develop precision medicine therapies.

By Jessica Kent

The National Cancer Institute (NCI) has launched the Proteomic Data Commons (PDC), a next-generation proteomic data repository that will facilitate data access, sharing, and analysis, and speed the development of precision medicine therapies for cancer.

Housed within NCI’s Cancer Research Data Commons (CRDC), PDC includes corresponding genomics and imaging data sets to enable integrative research. Cancer researchers can access this data for analyses and submit their own data sets to share with the research community.

In the past, researchers analyzed data sets with separate pipelines. With the PDC, multi-omics data is harmonized through a common set of analytic pipelines, making the information easier to study.

“The PDC was developed to advance our understanding of how proteins help to shape the risk, diagnosis, development, progression, and treatment of cancer,” NCI stated. “In-depth analysis of proteomic data allows us to study both how and why cancer develops and to devise ways of personalizing treatment for patients using precision medicine.”

The PDC is one of several repositories within NCI’s CRDC, a secure, cloud-based infrastructure featuring diverse data sets and innovative analytic tools designed to advance data-driven discovery.

The CRDC provides access to data from several NCI programs, including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET). The CRDC operates on the principle that repositories will be built with the input and collaboration of the broad research community, and that they will make data components expandable and reusable.

“The vision for the CRDC is a virtual, expandable infrastructure that provides secure access to many different data types across scientific domains, allowing users to analyze, share, and store results, leveraging the storage and elastic compute, or ability to easily scale resources, of the cloud,” NCI stated.

The PDC joins other CRDC offerings, including the Genomic Data Commons (GDC), which enables data sharing across cancer genome studies, and the NCI Cloud Resources, which bring together data and computational power to enable cancer research.

Through the PDC, researchers will have access to highly curated and standardized biospecimen, clinical, and proteomic data, as well as an intuitive interface to filter, query, search, visualize, and download all data and metadata.

Additionally, the PDC features cloud-based infrastructure from Amazon Web Services (AWS), facilitating interoperability with AWS-based data analysis tools. An application programming interface (API) provides cloud-agnostic data access and allows third parties to extend the functionality beyond the PDC.

The new repository also offers a highly structured workspace that serves as a private user data store and a data submission portal.

By making these data sets available using modern computing and network technology, the PDC will make it possible for any researcher to ask new and fundamental questions about cancer. The data repository will also provide much-needed tools to accelerate research and the development of personalized treatments for individual patients.

“The ability to combine diverse data types and perform cross-domain analysis of large datasets can lead to new discoveries in cancer prevention, treatment and diagnosis, and supports the goals of precision medicine and the Cancer Moonshot,” NCI said.