Data Analytics Tool Distinguishes Cancer Cells from Normal Cells

By Jessica Kent

January 19, 2021 - A data analytics tool can evaluate complex gene expression information and distinguish cancer cells from normal cells in tumor samples, according to a study published in Nature Biotechnology.

Researchers have historically studied tumors as a mixture of all cells present, many of which are not cancerous. With the emergence of single-cell RNA sequencing in recent years, researchers can now analyze tumors at much greater resolution. Scientists can examine the gene expression of each individual cell to better understand the tumor landscape, including the surrounding microenvironment.

However, it’s difficult to distinguish between cancer cells and normal cells without a reliable computational approach, researchers noted. To improve upon older methods, a team from The University of Texas MD Anderson Cancer Center developed a new data analytics algorithm called the CopyKAT (copy number karyotyping of aneuploid tumors) model.

CopyKAT increases accuracy by adjusting for the newest generation of single-cell RNA sequencing data. The tool could help researchers more easily evaluate the complex data obtained from large single-cell RNA sequencing experiments, which deliver gene expression data from many thousands of individual cells.

CopyKAT uses this gene expression data to look for aneuploidy, or the presence of abnormal chromosome numbers, which the team noted is common in most cancers. The tool could also help identify distinct subpopulations, or clones, within the cancer cells.
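To make the idea of inferring aneuploidy from expression data concrete, here is a minimal Python sketch. It is not the CopyKAT algorithm, whose internals this article does not describe. It is a toy illustration that assumes genes are already ordered by genomic position and that most cells in the sample are normal. The function names, the smoothing window, and the threshold are all hypothetical.

```python
import math
from statistics import mean, median, pvariance


def smooth(values, window=5):
    """Moving average along the genome-ordered gene axis."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(mean(values[lo:hi]))
    return out


def aneuploidy_scores(expr, window=5):
    """Score each cell by how much its smoothed expression profile
    deviates from a per-gene baseline.

    expr: list of cells, each a list of expression counts, with genes
    ordered by genomic position (an assumption of this sketch).
    """
    logs = [[math.log2(x + 1) for x in cell] for cell in expr]
    n_genes = len(logs[0])
    # Baseline: per-gene median across all cells. This only works if
    # normal (diploid) cells dominate the sample.
    baseline = [median(cell[g] for cell in logs) for g in range(n_genes)]
    scores = []
    for cell in logs:
        relative = [v - b for v, b in zip(cell, baseline)]
        # A chromosome-scale gain or loss shifts a long run of genes,
        # which survives smoothing and raises the variance.
        scores.append(pvariance(smooth(relative, window)))
    return scores


def flag_aneuploid(expr, threshold=0.05, window=5):
    """Return True for cells whose score exceeds a hypothetical cutoff."""
    return [s > threshold for s in aneuploidy_scores(expr, window)]


# Toy data: three flat "normal" cells and one cell with a gain
# across the first ten genes.
normal = [10] * 20
tumor = [30] * 10 + [10] * 10
flags = flag_aneuploid([normal, normal, normal, tumor])
```

In this toy input only the last cell is flagged, because only that cell has a long run of neighboring genes shifted away from the baseline. Real single-cell data is far noisier and sparser, which is why a dedicated tool is needed.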

The team first benchmarked its tool by comparing results to whole-genome sequencing data, which showed high accuracy in predicting copy number changes. In three additional datasets from pancreatic cancer, triple-negative breast cancer, and anaplastic thyroid cancer, CopyKAT was able to accurately distinguish between tumor cells and normal cells in mixed samples.

In analyzing these samples, the team also showed that the tool can effectively identify subpopulations of cancer cells within the tumor based on copy number differences, as confirmed by experiments in triple-negative breast cancers.

“We developed CopyKAT as a tool to infer genetic information from the transcriptome data. By applying this tool to several datasets, we showed that we could unambiguously identify, with about 99 percent accuracy, tumor cells versus the other immune or stromal cells present in a mixed tumor sample,” said Nicholas Navin, PhD, senior author of the study and associate professor of genetics and computational biology.

“We could then go one step further to discover the subclones present and understand their genetic differences.”

The study was made possible by MD Anderson’s Moon Shots Program, a collaborative effort to rapidly develop scientific discoveries into meaningful clinical advances that save patients’ lives. The program leverages ten research platforms to find patterns, evaluate treatments, and predict outcomes, bringing experts together to find new ways to end cancer.

The CopyKAT tool is freely available to researchers. The team noted that the tool is not applicable to the study of all cancer types. For example, aneuploidy is relatively rare in pediatric and hematologic cancers.

Still, researchers expect that the CopyKAT tool will improve the identification of cancer cells and facilitate better cancer care.

“By using CopyKAT, we were able to identify rare subpopulations within triple-negative breast cancers that have unique genetic alterations not widely reported, including those with potential therapeutic implications,” said Ruli Gao, PhD, assistant professor of cardiovascular sciences at Houston Methodist Research Institute.

“We hope this tool will be useful to the research community to make the most of their single-cell RNA-sequencing data and to drive new discoveries in cancer.”

Researchers have increasingly looked to genetic data to improve cancer treatment and make more informed care decisions. A separate study recently published in Gastroenterology showed that using genetic data from diverse populations could help researchers develop better risk prediction scores for inflammatory bowel diseases.

“The ability to accurately predict genetic disease risk in individuals across ancestries is a critical avenue that may positively affect patient outcomes, as early interventions and even preventive measures are being considered and developed,” said the study’s senior author Judy H. Cho, MD, Dean of Translational Genetics and Director of The Charles Bronfman Institute for Personalized Medicine at the Icahn School of Medicine at Mount Sinai.

“These findings support a need for greater genetic diversity, including more data on African American populations, to enhance disease risk predictions and reduce health disparities for all populations.”
