
AI Model Uses Free-Text Data to Accurately Classify COVID-19 Symptoms

Researchers have developed an artificial-intelligence method that can use EMR-based free-text data to classify COVID-19 symptoms, which may improve future predictive analytics models.


By Shania Kennedy

Researchers have developed a prediction model that can convert coronavirus-related free-text information from EMRs into symptom-based data.

In a study published in JMIR Medical Informatics, researchers aimed to determine whether it was feasible to create and deploy an artificial-intelligence (AI) method that extracts raw free-text data related to COVID-19 from patient EMRs and classifies it into usable, symptom-based data suitable for large-scale analysis.

Free-text data is less frequently used in predictive analytics research because it is challenging to normalize. However, the researchers note that free-text data often contains valuable information that could complement coded data, particularly during a pandemic. Thus, AI models that can transform free-text data into standardized data have significant potential in clinical research.

The researchers developed their model using the iCAREdata database, which houses EMR data from out-of-hours (OOH) healthcare providers. From this, the researchers created a dataset of records from Jan. 1, 2019, to Nov. 30, 2020. They then extracted 15 free-text fields per record as input to be classified into a list of 27 COVID-19 signs and symptoms. Both objective and subjective signs and symptoms were included.
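To make the task concrete, the sketch below shows one way a record's free-text fields could be mapped to a set of symptom labels, where each record may receive several labels at once. It uses plain keyword matching purely as a stand-in and is not the study's method. The field names, symptom codes, and keywords are hypothetical and written in English for readability.

```python
import re

# Hypothetical symptom codes and trigger keywords (illustration only;
# the study's list of 27 signs and symptoms is not reproduced here).
SYMPTOM_KEYWORDS = {
    "fever": ["fever", "febrile"],
    "cough": ["cough"],
    "dyspnea": ["short of breath", "dyspnea"],
}


def classify_record(fields):
    """Return the set of symptom codes whose keywords appear in any field.

    `fields` maps a (hypothetical) free-text field name to its text or None.
    A record can receive zero, one, or several codes (multi-label).
    """
    # Join all non-empty fields into one lowercase string.
    text = " ".join(v for v in fields.values() if v).lower()
    found = set()
    for code, keywords in SYMPTOM_KEYWORDS.items():
        for kw in keywords:
            # Word-boundary match so a keyword does not fire inside a longer word.
            if re.search(r"\b" + re.escape(kw) + r"\b", text):
                found.add(code)
                break
    return found


record = {
    "reason_for_contact": "Fever since yesterday, dry cough",
    "history": "No known contact with a confirmed case",
    "examination": None,
}
print(sorted(classify_record(record)))
```

Keyword matching of this kind ignores negation ("no fever"), misspellings, and abbreviations, which is one reason learned text classifiers are attractive for clinical notes.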

A random sample of the free-text inputs was split so that one-third came from before the pandemic and two-thirds from after it began. These inputs were given to five clinicians to annotate for symptom codes. Samples were then presented to a machine-learning (ML) text categorization model and two deep neural networks (DNNs), BERTje and domain-adapted BERTje, which were evaluated on their ability to correctly classify the data.
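As a rough illustration of how model output might be scored against clinician annotations, the sketch below computes micro-averaged precision, recall, and F1 over multi-label samples. The article does not say which metrics the study used, so the choice of micro-averaging is an assumption, and the example labels are invented.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall, and F1 over multi-label samples.

    `gold` and `predicted` are equal-length lists of sets of symptom codes:
    gold holds the annotators' codes, predicted holds the model's codes.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # codes both annotator and model assigned
        fp += len(p - g)  # codes only the model assigned
        fn += len(g - p)  # codes only the annotator assigned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented example: three samples with hypothetical symptom codes.
gold = [{"fever", "cough"}, {"dyspnea"}, set()]
pred = [{"fever"}, {"dyspnea", "cough"}, set()]
precision, recall, f1 = micro_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Micro-averaging pools every label decision before computing the scores, so frequent symptoms weigh more heavily than rare ones; a macro average would instead weight each symptom equally.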

Overall, all models performed well, with BERTje performing slightly better than the domain-adapted version and the ML model. The researchers posit that these results indicate that developing and using an AI model for mining free-text data and transforming it into relevant, symptom-coded data is feasible and necessary.

Once coded, symptoms generated from free-text data can be used to develop and test other algorithms, assess the quality of history-taking and record-keeping, and support real-time symptom surveillance during a public health emergency, the researchers said.

Other studies have also utilized free-text data for clinical analytics.

A study published last month in JMIR Medical Informatics shows that free-text data from EMRs can be combined with structured clinical data to help predict lymph node metastasis (LNM) in lung cancer patients. Knowing a patient's LNM status is necessary for clinical decision-making, but LNM is often challenging to diagnose preoperatively.

Free-text data from EMRs often contains valuable information related to LNM status. To test whether it could help build prediction models, the researchers developed six ML models that leveraged natural language processing (NLP).

The NLP algorithm extracted relevant predictive features from 794 EMRs and related computed tomography (CT) reports, and the data was given to the ML models. All the models achieved high performance in predicting LNM status. The models also outperformed clinicians, but more research is needed to validate the models and confirm the findings.