New Machine Learning Approach Supports Patient Data Privacy

By Jessica Kent

July 29, 2020 - A new technique enables clinicians to train machine learning models while preserving patient data privacy and could advance the field of brain imaging, according to a study published in Scientific Reports.

To train machine learning models, researchers typically have to use large, diverse datasets from a variety of organizations. However, this can be challenging as hospitals and health systems are often resistant to sharing patient data due to legal, privacy, and cultural obstacles.

An emerging method called federated learning could be the solution to this issue, researchers stated. The approach trains an algorithm across multiple decentralized devices or servers holding local data samples without exchanging them.

"The more data the computational model sees, the better it learns the problem, and the better it can address the question that it was designed to answer," said senior author Spyridon Bakas, PhD, an instructor of Radiology and Pathology & Laboratory Medicine in the Perelman School of Medicine at the University of Pennsylvania.

"Traditionally, machine learning has used data from a single institution, and then it became apparent that those models do not perform or generalize well on data from other institutions."

To test the effectiveness of the federated learning approach, researchers from Penn Medicine collaborated with teams from the University of Texas MD Anderson Cancer Center, Washington University, and the Hillman Cancer Center at the University of Pittsburgh.

The study started with a model that was pre-trained on multi-institutional data from an open-source repository known as the International Brain Tumor Segmentation (BraTS) challenge. BraTS currently provides a dataset that includes more than 2,600 brain scans captured with MRI from 661 patients.

Next, ten hospitals participated in the study by training artificial intelligence models with their own patient data. Researchers then used the federated learning technique to aggregate the locally trained models, rather than the underlying patient data, into a single consensus model.
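The aggregation step described above can be illustrated with a minimal sketch of weighted parameter averaging, a common way to combine locally trained models in federated learning. The article does not specify the study's exact aggregation rule, so the weighting by sample count, the function name federated_average, and the hospital values below are all hypothetical assumptions made for illustration.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site parameter vectors into one consensus vector.

    Each site's parameters are weighted by its share of the total
    training samples (an assumed weighting, not taken from the study).
    Only parameters and sample counts are shared, never patient data.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(site_weights[0])
    if any(len(w) != n_params for w in site_weights):
        raise ValueError("all sites must report the same number of parameters")

    consensus = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        share = size / total  # this site's fraction of all samples
        for i, value in enumerate(weights):
            consensus[i] += share * value
    return consensus


# Hypothetical example: three hospitals, each reporting two parameters.
hospital_updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
hospital_sizes = [100, 100, 200]
consensus = federated_average(hospital_updates, hospital_sizes)
print(consensus)  # [3.5, 4.5]
```

In practice this averaging would be repeated over many rounds, with the consensus parameters sent back to each site for further local training.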

Researchers compared federated learning to models trained by single institutions and to other collaborative-learning approaches. Each method's effectiveness was measured by testing it against scans that were annotated manually by neurologists.
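One way to score a model against manual annotations is an overlap measure such as the Dice coefficient, which is widely used for image segmentation. The article does not name the metric used in the study, so the choice of Dice here is an assumption, and the function name dice_score and the tiny masks are hypothetical. The sketch treats each scan as a flat list of 0/1 labels, where 1 marks a voxel labeled as tumor.

```python
from typing import Sequence


def dice_score(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Return the Dice overlap between two binary masks (0.0 to 1.0).

    Dice = 2 * |A and B| / (|A| + |B|), where A is the model's mask
    and B is the manually annotated mask.
    """
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * overlap / total


# Hypothetical six-voxel example.
model_mask = [0, 1, 1, 1, 0, 0]   # model marks three voxels as tumor
expert_mask = [0, 1, 1, 0, 0, 0]  # annotator marks two voxels as tumor
score = dice_score(model_mask, expert_mask)
print(score)  # 0.8
```

A score of 1.0 would mean the model's tumor outline matches the manual annotation exactly, while 0.0 would mean no overlap at all.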

Federated learning performed almost identically to a model trained on centralized data, an approach that does not protect patient privacy. The findings also suggested that increased access to data through data-private, multi-institutional collaborations can benefit model performance.

While the approach could be used to answer many different medical questions, the method shows particular potential for analyzing MRI scans of brain tumor patients and distinguishing healthy brain tissue from cancerous regions.

The federated learning model will need to be validated and approved by the FDA before it can be licensed and commercialized as a clinical tool for physicians. Once approved, the model could help radiologists and oncologists make important decisions about patient care, researchers noted.

"Studies have shown that, when it comes to tumor boundaries, not only can different physicians have different opinions, but the same physician assessing the same scan can see different tumor boundary definition on one day of the week versus the next," said Bakas. "Artificial intelligence allows a physician to have more precise information about where a tumor ends, which directly affects a patient's treatment and prognosis."

The results from this study have led to a much larger collaboration between Penn Medicine, Intel, and 30 partner institutions, supported by a $1.2 million grant from the National Cancer Institute (NCI). The institutions will use the federated learning approach to train a consensus AI model on brain tumor data. The ultimate goal of the project will be to create an open-source tool for any clinician at any hospital to use.

The findings from this study and the larger federated learning project open up possibilities for further use of AI in healthcare.

"Radiomics is to radiology what genomics was to pathology. AI will revolutionize this field, because, right now, as a radiologist, most of what we do is descriptive. With deep learning, we're able to extract information that is hidden in this layer of digitized images,” said Rivka Colen, MD, an associate professor of Radiology at the University of Pittsburgh School of Medicine.
