Researchers Validate NLP Tool to Extract SDOH from Clinical Notes

By Shania Kennedy

August 25, 2023 - Researchers from the Indiana University Richard M. Fairbanks School of Public Health (IU Fairbanks School) and Regenstrief Institute have validated the generalizability and portability of a natural language processing (NLP) model designed to extract social determinants of health (SDOH) data from clinical notes.

Developing an artificial intelligence (AI) tool for use in healthcare comes with a host of challenges, and ensuring generalizability and portability is often among the most significant. The research defines these as the ease and accuracy with which the model can be deployed in a new environment and updated to meet the needs of new data, in addition to how well it performs when applied to that data.

These issues are especially difficult to tackle in NLP-based models or those dealing with SDOH data, as the unstructured and often less standardized nature of the data sources can make training or tuning a model for use across more than one health system cumbersome.

The model in this study, previously developed by IU and Regenstrief researchers, is designed to use NLP to extract individual social determinants from clinical notes to help inform patient care.

“Social factors have a great impact on our health. It’s not just the medical care that we receive, but it’s also the places where we live, the places where we work and our access to food and transportation and other resources that have a major influence on our health,” said Chris Harle, PhD, MS, senior author on the study, who serves as professor and chair, Health Policy and Management at the IU Fairbanks School, in a press release discussing the study.

“It’s important for the clinicians and health systems providing medical care to know about people’s social risk factors so when prescribing medications, ordering tests or planning to perform a procedure, they can better treat the whole person — perhaps with lower cost drugs or alternative sources for tests — and can also link them to services that help address their needs for a safe place to live and healthy food to eat,” he continued.

To evaluate how well the model would be able to extract these factors in a different clinical setting, the research team applied it to six million clinical notes generated during a six-month period at the University of Florida Health (UF Health).

The tool, which had been developed on notes from IU, was tasked with extracting financial insecurity and housing instability data from the UF Health notes. The researchers adjusted the model to accommodate those notes.

The NLP model’s performance was measured in terms of accuracy, positive predictive value, sensitivity, and specificity.

The tool flagged 13,000 notes for financial insecurity and 19,000 for housing instability. For both factors, the model achieved 0.87 or higher across all performance metrics.
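For readers unfamiliar with the four metrics named above, the following is a minimal sketch of how each is conventionally computed from a confusion matrix. The class name, function name, and counts are hypothetical and chosen purely for illustration; they are not figures from the study, and this is not the researchers' evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing model flags against manually reviewed notes."""
    tp: int  # notes flagged by the model that truly mention the factor
    fp: int  # notes flagged by the model that do not mention the factor
    tn: int  # notes correctly left unflagged
    fn: int  # notes the model missed


def evaluate(c: ConfusionCounts) -> dict[str, float]:
    """Return the four standard metrics for a binary classifier."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "positive_predictive_value": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Illustrative counts only (assumed, NOT taken from the study).
counts = ConfusionCounts(tp=90, fp=10, tn=85, fn=15)
metrics = evaluate(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this framing, positive predictive value answers "of the notes the model flagged, how many were correct?", while sensitivity answers "of the notes that truly mention the factor, how many did the model find?"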

The researchers noted that these findings underscore the need to accommodate institution-specific note-writing templates and clinical terminology of emergent diseases when leveraging NLP to extract SDOH data.

“The more that we can disseminate and adapt natural language processing and other artificial intelligence methods that fully describe a patient to give clinicians a full 360 understanding of patients’ needs, the better,” Harle stated.

“If we can extract social information more efficiently, it’s less costly,” he continued. “Then we can start to take what we’d call a population health perspective. So, if a health system can efficiently identify the patients who have housing instability — the population of patients who have this need — then the healthcare system may be able to employ a more proactive population-based intervention to serve that whole group of people, connecting them, for example, to the housing services in the community or financial resources that might be available.”

This research is the latest to look at how NLP can help healthcare organizations more effectively tackle patients’ SDOH needs.

Last week, researchers shared that they had developed an NLP model capable of using unstructured EHR data to flag SDOH for Alzheimer's disease and related dementias (ADRD) patients.

SDOH are key risk factors for adverse health events in ADRD populations, but identifying and addressing those needs is a persistent challenge for care teams.

The model is designed to help combat this by automatically extracting data related to transportation; food; housing; financial difficulties; social isolation; abuse, neglect, or exploitation; and medication insecurities.


An NLP-based SDOH tool has displayed significant generalizability, portability, and accuracy across two organizationally and geographically distinct health systems.
