RSNA Creates Medical Imaging Dataset for Machine Learning Research

As the largest public collection of brain hemorrhage CT scans, the medical imaging dataset has greatly accelerated the development of machine learning models.

By Jessica Kent

The Radiological Society of North America (RSNA) has created a public medical imaging dataset of expert-annotated brain hemorrhage CT scans, leading to the development of machine learning algorithms that can help detect and characterize this condition.

Intracranial hemorrhage is a potentially life-threatening problem that has both direct and indirect causes. Accurately diagnosing the presence and type of intracranial hemorrhage is a critical part of effective treatment, RSNA said.

RSNA set out to create a brain hemorrhage CT scan dataset for the most recent edition of its Artificial Intelligence Challenge. In the 2019 edition, participants were tasked with creating a machine learning algorithm that could help detect and characterize intracranial hemorrhage on brain CT.

Instead of using an existing dataset, as the team had done for the first two challenges, the competition’s organizers decided to create one from scratch. They compiled the dataset from three institutions: Stanford University in Palo Alto, California; Universidade Federal de São Paulo in São Paulo, Brazil; and Thomas Jefferson University Hospital in Philadelphia, Pennsylvania.

RSNA partnered with the American Society of Neuroradiology (ASNR) to curate the dataset, and organizers issued an open call for volunteers within the ASNR membership to annotate the images. A day and a half later, they had 140 volunteers, from whom they selected 60 to annotate a collection of 874,035 brain hemorrhage CT images in 25,312 unique exams.

Volunteers marked each image as normal or abnormal. For abnormal images, they marked the hemorrhage subtype.

"It was a nail-biter all the way along," said the paper's lead author, Adam E. Flanders, MD, neuroradiologist and professor at Thomas Jefferson University Hospital.

"We were building the airplane while it was in flight. When you consider the number of images that we had to de-identify locally, consume, curate, label, cross-check and then organize into just the right datasets to release to the contestants, there was a lot of work involved by the volunteer workforce, the RSNA Machine Learning Subcommittee, data scientists, contractors and RSNA staff."

Upon releasing the dataset, organizers received 22,200 submissions from competitors in 75 countries, including some from people outside the medical realm.

"I was really impressed by the huge volunteer effort and the tremendous worldwide interest in this project," said Flanders. "The ten top solutions came from all over the world. Some of the winners had absolutely no background in medical imaging."

RSNA released the dataset under a non-commercial license, so it is freely available to all AI researchers for non-commercial use and further refinement.

The RSNA team noted that engaging with a subspecialty society to leverage its unique expertise is an effective model for future collaborations. Organizers are using the approach again for this year’s competition, a partnership with the Society of Thoracic Radiology that aims to improve detection and characterization of pulmonary embolism on chest CT.

"The value of this challenge is to create a dataset that might lead to a generalizable solution, and the best way to do that is to train a model from data originating from multiple institutions that use a variety of CT scanners from various manufacturers, scanning protocols and a heterogeneous patient population," said Flanders.

"In this case, we had data from three institutions and international participation. The dataset is unique, not only in terms of the volume of abnormal images but also the heterogeneity of where they all came from. The dataset we created for this challenge will endure as a valuable ML research resource for years to come."