Framework to Mitigate Bias in Radiology Machine-Learning Models

A new report outlines best practices for mitigating bias in artificial intelligence and machine-learning tools often used in radiology.


By Shania Kennedy

A special report published last week in Radiology: Artificial Intelligence highlights the practices that can lead to bias in artificial intelligence (AI) and machine-learning (ML) models increasingly used in radiology and provides strategies to mitigate these issues.

The report is the first in a three-part series on the topic. Part one focuses on data handling, part two on model development, and part three on performance evaluation.

In this first installment, the authors identify 12 suboptimal data handling practices that can potentially lead to bias. The report defines data handling as “all data-related processes following the initial planning for an ML study up to model development and training.”

Using this definition, the researchers outlined a framework that divides data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Within each step, there are three overarching practices that have the potential to lead to biases, according to the authors.

During data collection, these practices are improper identification of the dataset, use of a single data source, and use of an unreliable data source. Improper identification of the dataset can happen when researchers fail to collect all of the available data, while relying on a single source can result in the model being fed redundant data. Use of an unreliable data source can occur when researchers do not consider best practices for reliability and generalizability.

To address these pitfalls, the authors recommend in-depth reviews of the relevant clinical and technical literature and input from domain experts. They also advise collecting data from trusted institutions and using multiple datasets.
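
As a rough illustration of these recommendations (not code from the report), the sketch below pools exam metadata from several institutions and flags redundant records; the file paths and column names (patient_id, exam_id) are hypothetical placeholders.

```python
# Sketch: pooling exam metadata from multiple institutions and flagging
# redundant records. File paths and column names are hypothetical.
import pandas as pd

sources = {
    "site_a": "site_a_exams.csv",
    "site_b": "site_b_exams.csv",
    "site_c": "site_c_exams.csv",
}

frames = []
for institution, path in sources.items():
    df = pd.read_csv(path)
    df["institution"] = institution  # keep provenance for later audits
    frames.append(df)

exams = pd.concat(frames, ignore_index=True)

# Redundant data: the same exam appearing more than once across sources.
dupes = exams.duplicated(subset=["patient_id", "exam_id"], keep="first")
exams = exams[~dupes]
print(f"Removed {int(dupes.sum())} duplicate exams")

# Single-source skew: check how many patients each institution contributes.
print(exams.groupby("institution")["patient_id"].nunique())
```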

Data investigation involves evaluating the collected data to detect potential issues. The potential for bias arises when this process involves inadequate exploratory data analysis (EDA), EDA conducted without domain expertise, or failure to observe the actual data.

EDA serves to organize and summarize the raw data to identify important patterns within it, including flagging any deviations; findings are then interpreted and addressed. An inadequate EDA, or one conducted without domain expertise, can miss important deviations in the data, which can degrade data integrity and quality. Failing to observe the actual data, that is, reviewing only its statistical properties rather than personally inspecting the records themselves, can also lead to missed insights.

Successfully mitigating bias in these steps requires thoroughly investigating the collected data using statistical and observational tools and engaging in knowledge sharing with clinical and data science experts.
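
A minimal EDA pass along these lines might look like the following sketch, which pairs summary statistics with direct inspection of individual records; it is illustrative only, and the column names (age, pixel_spacing, label) are hypothetical.

```python
# Sketch: a minimal exploratory data analysis pass over a pooled exam table.
# Column names are hypothetical placeholders.
import pandas as pd

exams = pd.read_csv("pooled_exams.csv")

# Summary statistics and missingness: flag obvious deviations early.
print(exams[["age", "pixel_spacing"]].describe())
print(exams.isna().sum())

# Class balance: a heavily skewed label distribution is worth surfacing now,
# before splitting or training.
print(exams["label"].value_counts(normalize=True))

# "Observe actual data": inspect a handful of raw rows, not just statistics.
print(exams.sample(5, random_state=0))
```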

Data splitting, which refers to the process of dividing data into training, validation, and testing sets, is where many data handling errors occur in studies using medical data, according to the report. Data splitting practices that can lead to bias are broken down into leakage between datasets, imbalanced datasets, and overfitting to hyperparameters.

Leakage between datasets occurs when data from one set is also present in another, while imbalanced datasets do not represent real-world data or are otherwise not generalizable. Overfitting to hyperparameters occurs when hyperparameters are unintentionally tuned to perform well on the test set, so the reported test performance overstates how the model will perform on other datasets.

Addressing these issues requires splitting data at the most appropriate level, which for medical data is typically the patient level; splitting data in ways that accurately represent real-world data trends; and reserving a subset of the training data for hyperparameter tuning.
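
A patient-level split along these lines could be sketched as follows, assuming a pandas DataFrame `exams` with a `patient_id` column; scikit-learn's GroupShuffleSplit is used here as one possible tool, not as the report's prescribed method.

```python
# Sketch: patient-level splitting so no patient appears in more than one set,
# plus a validation split carved from the training data for hyperparameter tuning.
from sklearn.model_selection import GroupShuffleSplit

def patient_level_split(df, test_size, seed=0):
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))
    return df.iloc[train_idx], df.iloc[test_idx]

train_val, test = patient_level_split(exams, test_size=0.2)   # hold out a test set
train, val = patient_level_split(train_val, test_size=0.25)   # tune on val, never on test

# Guard against leakage: no patient ID should appear in more than one set.
assert not set(train["patient_id"]) & set(test["patient_id"])
assert not set(val["patient_id"]) & set(test["patient_id"])
```

Hyperparameters would then be tuned against the validation set, with the held-out test set touched only once for final evaluation.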

Feature engineering transforms or removes features from the input data so that an ML model 'sees' more meaningful, less redundant features. Within feature engineering, improper feature removal, improper feature scaling, and mismanagement of missing data can lead to bias.

Feature removal seeks to strip noise from the data so that the ML model can learn more easily; improper feature removal can involve removing the wrong features or making incorrect assumptions about what counts as noise. Feature scaling applies mathematical operations to features so that they share similar scales and magnitudes, and improper scaling can cause data normalization issues. Missing data affects data representation and dataset balance, making its handling critical.

To combat these issues, the authors recommend evaluating feature removal scenarios in separate experiments, consulting statisticians to assist with standardization and normalization, and using imputation to handle missing data.
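
One way to follow these recommendations in code is to wrap imputation and scaling in a preprocessing pipeline that is fit on the training set only, as in the sketch below; it builds on the hypothetical `train` and `test` DataFrames from the splitting sketch, and the feature names are invented for illustration.

```python
# Sketch: imputation and feature scaling fit on the training set only, so that
# test-set statistics never leak into preprocessing. Feature names are hypothetical.
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

features = ["age", "pixel_spacing", "lesion_diameter"]

preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # zero mean, unit variance
])

X_train = preprocess.fit_transform(train[features])  # statistics learned here only
X_test = preprocess.transform(test[features])        # reused, never refit, on test data
```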

These issues and the proposed solutions in the report are not exhaustive, but they provide a starting point for researchers interested in mitigating bias in radiology AI, the authors stated.