The healthcare industry has been diving into the pool of precision medicine with great enthusiasm, setting up dedicated big data analytics centers and genomics research labs to unravel one of life's greatest mysteries: the impact of a patient's DNA on their risk of developing disease and their individual response to available therapies.
Though researchers have been able to dramatically reduce the time and costs involved with sequencing the human genome, the process still requires vast computing resources and a thorough knowledge of big data analytics principles in order to extract meaningful insights from the results.
But even though supercomputers are processing genomic information at lightning speed, the ability of researchers to interpret and analyze the results hasn’t kept pace.
To understand the particular variations of a patient's genome, sets of data containing millions of fragments of information, called "reads," must be matched against a reference genome, like tissue paper over a template, so researchers can pinpoint the differences.
The mismatch between processing speed and read analysis is creating a big data “bottleneck,” according to researchers from the FDA, shifting the problems of precision medicine downstream as experts try to operationalize the potential of extraordinarily personalized care.
With the number of sequenced genomes expected to double every twelve months, reaching approximately 1.6 million sequenced human genomes by 2017, the development of next-generation sequencing (NGS) techniques is vital to the continued success of early precision medicine explorations.
“Many genetic biomarkers that are used in clinical practice and drug development were identified through genome-wide association studies (GWAS) using genotyping microarray technology,” write Hao Ye, Joe Meehan, Weida Tong, and Huixiao Hong from the Division of Bioinformatics and Biostatistics at the FDA’s National Center for Toxicological Research. The study was published in the November edition of Pharmaceutics.
Genome-wide association studies have seen some measure of success, but they are prone to microarray genotyping errors and subject to the effects of variation in genotype calling algorithms, the study says. Scientists who have been disappointed by some of these early techniques are leveraging new big data analytics strategies to overcome many of these obstacles.
“In theory, a read can be successfully aligned onto a reference genome by applying a series of insertions, deletions, and substitutions,” the study says. “An alignment algorithm assigns a score to the alignment of a short read onto a reference to estimate how well they align.”
“The score is used to identify the optimized location of the read in the reference genome. A good alignment algorithm is able to map reads onto a reference genome rapidly and accurately.”
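The scoring idea the researchers describe can be illustrated with a simple dynamic program. The sketch below is a minimal local-alignment scorer in the style of Smith-Waterman; the match, mismatch, and gap values are illustrative assumptions, and production aligners such as BWA or Bowtie use heavily optimized variants of this approach rather than this naive version.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a short read against a reference.

    Illustrative sketch only: scoring parameters are assumptions, and real
    NGS aligners use far faster, heuristic-accelerated implementations.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    # H[i][j] holds the best score of any alignment ending at read[:i], ref[:j]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Local alignment: scores never drop below zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# A perfect four-base match scores 4 * match = 8, whether or not the
# reference carries extra flanking bases around the matching region.
print(smith_waterman("ACGT", "ACGT"))      # 8
print(smith_waterman("ACGT", "TTACGTTT"))  # 8
```

The higher the score, the better the candidate location in the reference; as the study notes, a good algorithm must compute such scores for millions of reads both rapidly and accurately.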
Current methods rely on two main strategies: "seed and extend" alignment and q-gram filtering. Both extract small fragments of data from a read to compare the new data against the reference genome, based on certain assumed parameters.
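As a rough sketch of how the q-gram idea works, the hypothetical functions below index every length-q substring of a reference and then let a read's own q-grams "vote" for plausible alignment offsets. The function names and the voting scheme are illustrative assumptions, not the algorithm of any specific aligner.

```python
from collections import defaultdict

def qgram_index(ref, q):
    """Map each q-length substring of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(ref) - q + 1):
        index[ref[i:i + q]].append(i)
    return index

def candidate_positions(read, index, q):
    """Tally votes for reference offsets where the read's q-grams line up.

    If the q-gram starting at read position i occurs at reference position
    pos, the whole read would start at offset pos - i; offsets with many
    votes are handed off to a full (slower) alignment step for verification.
    """
    votes = defaultdict(int)
    for i in range(len(read) - q + 1):
        for pos in index.get(read[i:i + q], []):
            votes[pos - i] += 1
    return votes

ref = "GATTACAGATTACA"
votes = candidate_positions("TTACA", qgram_index(ref, 3), 3)
print(dict(votes))  # offsets 2 and 9 each collect all three q-gram votes
```

This filtering step is what makes the approach fast, but it is also why repetitive regions are troublesome: in the toy example above, the read matches the reference equally well at two offsets, and only a more expensive verification step could break the tie.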
Next-generation sequencing platforms can use a DNA library to generate a raw image of the data, which is then translated into raw reads before being aligned to a reference sequence and analyzed to produce downstream results.
“The rapid development of next-generation sequencing technologies provides a promising opportunity to extend the capability of biomarker discovery in precision medicine,” the researchers write. But certain challenges remain.
Repetitive regions make up approximately half of the human genome, yet many of these regions contain very slight variations that can affect the accuracy of sequencing results. Researchers have yet to devise strategies that map reads with both speed and high sensitivity; existing seeding and q-gram techniques are often slow and may not be specific enough, the authors note.
In addition, many systems are designed specifically to handle shorter reads, and might not produce acceptably low error rates when analyzing longer data sequences. The team expects that future systems will be developed with these longer sequences in mind, and will be better able to identify gene variants and other critical information as they become more sophisticated.
As more and more human genomes are sequenced, and precision medicine becomes more integrated into clinical practice, researchers are likely to face additional challenges, especially when it comes to reference data. Patients of different ethnic backgrounds have notably different genomic variants, which means that researchers have to be careful when selecting a reference genome to work with.
The FDA is working closely with partner organizations to develop these next-generation sequencing techniques as precision medicine initiatives begin to gain steam. While the White House is still waiting on Congress to approve funding for its centerpiece million-patient DNA database, other efforts are proceeding at a healthy pace.
Researchers, geneticists, and data scientists across the healthcare system are putting grant money to work to tackle neurodegenerative diseases, rare cancers, and patient monitoring. Collaborations between healthcare providers, the National Institutes of Health (NIH), and academic research centers are producing exciting precision medicine breakthroughs, and new sources of funding keep rolling in.
Last week, the Kraft Family Foundation announced a $20 million pledge to Harvard Business School to establish the Kraft Endowment for Advancing Precision Medicine, a tribute to Myra Kraft, who succumbed to ovarian cancer in 2011.
The gift follows a similar donation from Joshua and Marjorie Harris to the Icahn School of Medicine at Mount Sinai, and contributes to the pattern of linking academic institutions with clinical researchers and big data experts to craft real-world applications for narrowly targeted therapies.
“We are at a point in history where big data should not intimidate, but inspire us,” said NIH Director Dr. Francis Collins in 2014. “We are in the midst of a revolution that is transforming the way we do biomedical research. In some cases, rather than posing a question, designing experiments to answer that question, and then gathering data, we already have the needed data in hand. We just have to devise creative ways to sift through this mountain of data and make sense of it.”