Maybe it takes one to know one – and maybe it takes one to train one.
Researchers at the Mayo Clinic, NVIDIA, and the MGH & BWH Center for Clinical Data Science are exploring how to use MRI images generated by artificial intelligence to train a deep learning model designed to identify clinical abnormalities in imaging data.
The research is aimed at overcoming the perpetual challenge of accessing enough high-quality, varied data to train AI algorithms for complex diagnostic tasks, said the team in a research paper accompanying a blog post on NVIDIA’s corporate website.
“Data diversity is critical to success when training deep learning models,” the researchers said. “Medical imaging data sets are often imbalanced as pathologic findings are generally rare, which introduces significant challenges when training deep learning models.”
“We propose a method to generate synthetic abnormal MRI images with brain tumors by training a generative adversarial network (GAN).”
GANs are an established method of “filling in the gaps” in images or creating new types of images from samples of existing data.
The strategy can leverage existing images of the brain’s anatomy alongside snapshots of tumors from real patients to create completely new images based on patterns observed in the originals.
The result is a computer-generated dataset that carries no privacy concerns – none of the new images are of actual patients – and allows for nearly infinite variations of tumor size, placement, and type.
“This offers an automatable, low-cost source of diverse data that can be used to supplement the training set,” the researchers explained.
“For example, we can alter a tumor’s size, change its location, or place a tumor in an otherwise healthy brain, to systematically have the image and the corresponding annotation.”
To create the training dataset, the research team used two publicly available resources: the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). The public data provided the original, labeled and annotated images necessary for the project.
“The GAN trained to generate synthetic images from labels allows for the generation of arbitrary multi-series abnormal brain MRIs,” said the team. “Since we have the brain anatomy label and tumor label separately, we can alter either the tumor label or the brain label to get synthetic images with the characteristics we desire.”
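The label editing the team describes can be pictured with a small sketch. This is an illustration only, not the researchers' code: the label values, the function name `place_tumor`, and the toy 2D arrays are assumptions made for the example, whereas the actual work uses multi-series brain MRI volumes and passes the edited label map to a trained GAN to produce the image.

```python
import numpy as np

# Hypothetical label values, chosen only for this sketch.
BACKGROUND, BRAIN, TUMOR = 0, 1, 2


def place_tumor(brain_labels, tumor_mask, shift=(0, 0)):
    """Return a copy of brain_labels with a shifted tumor mask overlaid.

    brain_labels: 2D integer array of anatomy labels.
    tumor_mask:   2D array of the same shape, truthy where the tumor is.
    shift:        (rows, cols) offset applied to the tumor before placing it.
    """
    tumor_mask = np.asarray(tumor_mask, dtype=bool)
    if brain_labels.shape != tumor_mask.shape:
        raise ValueError("label map and tumor mask must have the same shape")

    rows, cols = np.nonzero(tumor_mask)
    rows, cols = rows + shift[0], cols + shift[1]

    # Drop tumor pixels that fall outside the image after the shift.
    height, width = tumor_mask.shape
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)

    shifted = np.zeros_like(tumor_mask)
    shifted[rows[inside], cols[inside]] = True

    # Only place tumor inside brain tissue, never in the background.
    shifted &= brain_labels != BACKGROUND

    merged = brain_labels.copy()
    merged[shifted] = TUMOR
    return merged


# Toy "healthy brain": a 6x6 block of tissue inside an 8x8 image.
brain = np.zeros((8, 8), dtype=int)
brain[1:7, 1:7] = BRAIN

# Toy tumor label: a 2x2 blob in the upper-left corner of the brain.
tumor = np.zeros((8, 8), dtype=bool)
tumor[1:3, 1:3] = True

original = place_tumor(brain, tumor)              # tumor where it was drawn
moved = place_tumor(brain, tumor, shift=(3, 3))   # same tumor, new location
```

Because the anatomy and tumor labels are kept separate, the same healthy label map can be reused with many tumor positions, and each merged map doubles as the annotation for the image generated from it.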
While the resulting images did contain some features that made it possible to distinguish the AI-generated data from the original MRI data, the team believes that expanding the training dataset could help to produce synthetic images that are nearly identical to real-life clinical data.
“More attention likely needs to be paid for the tumor boundaries so it does not look superimposed and discrete when synthetic tumor is placed,” the team observed. “Also, performance of brain segmentation algorithm and its ability to generalize across different data sets needs to be examined to obtain higher quality synthetic images combining data sets from different patient population.”
Continuing to develop the strategy and improve the quality of computer-generated images may help the AI research community overcome the significant patient privacy concerns that accompany the analysis of personal health information (PHI).
“Protection of PHI is a critical aspect of working with patient data,” the researchers acknowledged. “Oftentimes concern over dissemination of patient data restricts the data availability to the research community, hindering development of the field.”
“While removing all DICOM metadata and skull-stripping will often eliminate nearly all identifiable information, demonstrably proving this to a hospital’s data sharing committee is near impossible. Simply de-identifying the data is insufficient.”
Research has shown that private data can still be extracted from a trained model, they added, contributing to privacy concerns that carry both financial and reputational penalties for healthcare organizations.
“Development of a GAN that generates synthetic, but realistic, data may address these challenges,” they noted.
Expanding the availability of training data sets for deep learning and other AI applications could accelerate the development of clinical decision support tools, especially in organizations that may not see enough patients to collect a large number of images of rare conditions.
“When combined with smaller, institution-specific data sets, modestly sized organizations are provided the opportunity to train successful deep learning models,” the team said.
“These results offer a potential solution to two of the largest challenges facing machine learning in medical imaging, namely the small incidence of pathological findings, and the restrictions around sharing of patient data.”
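As a rough illustration of combining a small institution-specific data set with synthetic cases, the sketch below mixes the two into one shuffled training list. Everything in it is an assumption made for the example: the function name `build_training_set`, the `synthetic_per_real` ratio, and the placeholder case identifiers do not come from the paper, and the right mixing ratio would have to be determined experimentally.

```python
import random


def build_training_set(real_cases, synthetic_cases, synthetic_per_real=1.0, seed=0):
    """Combine all real cases with a random sample of synthetic cases.

    synthetic_per_real: how many synthetic cases to add per real case
                        (hypothetical knob; capped by what is available).
    Returns a shuffled list of (case, source) pairs.
    """
    if synthetic_per_real < 0:
        raise ValueError("synthetic_per_real must be non-negative")

    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    wanted = round(len(real_cases) * synthetic_per_real)
    wanted = min(wanted, len(synthetic_cases))

    sampled = rng.sample(synthetic_cases, wanted)
    combined = [(case, "real") for case in real_cases]
    combined += [(case, "synthetic") for case in sampled]
    rng.shuffle(combined)
    return combined


# Placeholder identifiers standing in for image/annotation pairs.
real = [f"real_{i:03d}" for i in range(10)]
synthetic = [f"synthetic_{i:03d}" for i in range(100)]

training_set = build_training_set(real, synthetic, synthetic_per_real=2.0)
```

Keeping the source tag alongside each case makes it possible to evaluate on real images only, so that gains are not measured on the synthetic data itself.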
The research team has made its code publicly available for download from GitHub, allowing other researchers to validate the team’s work and continue to develop applications for the strategy.