- The term “artificial intelligence” is often used to describe a wide range of machine learning methodologies with the potential to radically alter the detection, diagnosis, treatment, and management of diseases. While true artificial intelligence is still somewhere over the horizon, there is a rapidly growing market for machine learning tools that can deliver clinical decision support, financial modeling, fraud detection, and other critical insights.
In 2016, Frost & Sullivan projected 40 percent annual growth in the AI market for healthcare and life sciences over the following five years, reaching $6.6 billion by 2021. There may be untold potential for AI to augment and support a huge variety of healthcare-related tasks, but early successes have largely focused on pattern recognition and deep learning techniques that help analyze medical images and sift through patient data.
Such is the case at the MGH & BWH Center for Clinical Data Science, the collaboration of Mass General and Brigham and Women’s Hospitals in Boston to create, promote and commercialize artificial intelligence for healthcare. As Partners HealthCare Innovation noted one year after the launch of the center in 2016, advancements in computing power could very well “revolutionize the way healthcare is delivered.”
“The fundamental goal is in extracting meaningful insights from massive amounts of data and making them actionable,” said Trung Do, Vice President of Business Development. These advancements in artificial intelligence will be on display during Partners HealthCare’s World Medical Innovation Forum in April.
In this HealthITAnalytics.com podcast, Mark Michalski, MD, Executive Director at the MGH & BWH Center for Clinical Data Science (CCDS), details the work of data scientists and researchers at CCDS, the technical infrastructure enabling current efforts at applying machine learning to healthcare, and the potential for machine learning to become a powerful tool in the hands of clinical professionals in the years to come.
KYLE MURPHY: First off, tell us about yourself and the MGH & BWH Center for Clinical Data Science.
MARK MICHALSKI: I’m a radiologist by training, but I sort of lost my way, ended up in a startup or two, and now find myself principally in machine learning as it applies to diagnostics. And it’s a lot of fun. I get to work with phenomenal people here at the center. These are folks who come from all sorts of places with all sorts of backgrounds — from industry and defense and of course healthcare as well as from academics. We have a nice mix of people who understand the data and know how to manage it and create machine learning models that bring meaning from it.
MURPHY: So it makes perfect sense why radiology has been a focus at the center. Can you go into a bit more detail about the machine learning applications in use and those that are in the works?
MICHALSKI: Well, it’s absolutely true that I have a bias because of my background. But I would also say that one of the things about medical imaging data, in particular, is that it’s really nicely posed relative to a lot of other healthcare data for this kind of work. And that’s partly because the data is already digital. It’s standardized in a way. All the chest x-rays more or less look the same. You can feed those sorts of digitized, standardized images into the existing technology around deep learning relatively easily. That’s in comparison to clinical notes, which can be a little bit messy, and it’s a little harder to standardize some of the other healthcare data out there. The other part of this is that when you hear people talking about artificial intelligence these days, artificial intelligence can actually mean a lot of things to a lot of different people. But when most of us on the technology side are thinking about AI, we’re actually thinking about a really narrow set of tools, mostly deep learning. Deep learning is a narrower set of tools that comes out of machine learning, and it’s particularly good when it comes to images. It actually came out of the computer vision community, really. So those are part of the reasons why I think the community, in general, has tried applying this technology to images, whether they’re CT scans, x-rays, MRIs, or pathology slides; or images of moles to try to determine whether it’s melanoma or not; or the back of the eye to determine whether diabetic retinopathy is progressing. Images are a big part of the field today. In the future, I think it’ll look quite a bit different, with all sorts of data coming to bear for diagnosis and treatment.
MURPHY: From a man versus machine standpoint, how reliable is the work of these machine learning tools compared to, say, a radiologist looking at the same series of images? Do you have a growing sense of comfort with what the computer is able to do?
MICHALSKI: If you think about the problems that we’re solving on a spectrum: on one side of the spectrum, the technology is probably sufficient to meet what humans can do now, if not exceed our capability, and where it hits the ceiling is actually our capability. We train these models with our data, so if our annotations are incorrect, the model is only going to be as good as we are. That’s one side of the spectrum. There’s a set of problems where we’re there. The technology is probably as good as humans, and it’s only going to get as good as our training data is — at least for now. On the other side of the spectrum, we’ve got the problems where we really need a lot of context for what’s going on. You need to be able to reliably see a lot about the patient’s background, where they come from, and all this contextual information that our current systems don’t know how to handle or can’t capture. In those areas, these systems don’t work very well, or where they do work, the delta between what we had before and what we have now with deep learning isn’t that great. So there’s really a spectrum. One thing I would say is that, at least right now, the systems only get as good as the data we have to train them on, so we have to start thinking about things like, "How do we actually define truth? How do we actually label this data?" And that’s becoming a more and more important question.
MURPHY: How would you define deep learning for the average healthcare professional and its implications for the industry in terms of what it entails from a clinical data science perspective?
MICHALSKI: The definitions are really important, and I think the most tech-savvy amongst us are still working on making sure we get the definitions right. Data science has the broadest scope — a combination of computer science and statistics — and was really born out of big data. A subset of data science is a set of analytical tools known as machine learning. Machine learning is really a set of tools where you don’t necessarily have to start out with a hypothesis. You can apply a computational method or a machine learning method to make meaning out of a set of labeled or unlabeled data. So it’s something that learns without you having to necessarily give it an initial hypothesis. It’s really a set of statistical and computational tools. Within machine learning is an even narrower set of tools. Deep learning is really the natural evolution of artificial neural networks, a longstanding concept in classification and segmentation that is actually seven decades or more old, but one that now has new life with the combination of large annotated datasets and GPUs, which taken together make this old concept of artificial neural networks much more effective and new again.
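The artificial-neural-network idea Michalski describes can be sketched in a few lines of plain Python: each neuron takes a weighted sum of its inputs and passes it through a nonlinearity, and neurons are stacked into layers. The weights below are made up purely for illustration; in real deep learning, networks have many more layers and the weights are learned from large annotated datasets (typically on GPUs), not written by hand.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of inputs
    passed through a sigmoid nonlinearity, yielding a value in (0, 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def tiny_network(features):
    """A toy two-layer network: two hidden neurons feeding one output
    neuron. Weights here are illustrative placeholders, not trained."""
    h1 = neuron(features, [0.5, -0.2, 0.1], bias=0.0)
    h2 = neuron(features, [-0.3, 0.8, 0.4], bias=0.1)
    return neuron([h1, h2], [1.2, -0.7], bias=0.0)

# Three made-up input features (e.g., pixel intensities)
score = tiny_network([0.9, 0.1, 0.4])
print(round(score, 3))  # a classification score between 0 and 1
```

Training would adjust the weights and biases so the output score matches human-provided labels — which is why, as Michalski notes elsewhere, these models are only as good as their training data.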
MURPHY: Where do you see the most near-term potential for AI and machine learning in healthcare, and what are some current challenges these types of tools would be able to solve?
MICHALSKI: What we’re going to see in the near-term is a progression of healthcare data toward a more quantified approach. So for example, most of the time today if you’re tracking a liver tumor or something like that, you might measure that by hand using a digital ruler. But in the future what’s probably going to happen is you’re going to have a machine learning algorithm that’s going to look at that tumor and then give you a quantified volume automatically, and then that quantified volume is going to be populated to your report in structured data. So what we’re looking at in the future is more quantified values that used to be pretty subjective — that’s going to be true for medical imaging, that’s going to be true for pathology, that’s going to be true for just about anything you can apply machine learning to — and that has huge implications for all sorts of things. For example, if you’re being treated for cancer or if you’re being treated for heart disease, you want to really quantify measurements of what the size of your heart is, what the size of your left ventricle is. Right now that’s done through subjective measurements. In the future, that will all be done automatically with machine learning. And that has big implications for the way that we treat people and the way that we diagnose them.
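The quantification step Michalski describes — a model measuring a tumor and populating a structured report value — can be sketched as below. This is a minimal, hypothetical example: the binary mask would in practice come from a trained segmentation model, and the voxel size and report fields are invented here for illustration.

```python
def tumor_volume_ml(mask, voxel_volume_mm3):
    """Volume = (number of voxels labeled 1) x volume per voxel,
    converted from cubic millimeters to milliliters."""
    n_voxels = sum(v for slice_2d in mask for row in slice_2d for v in row)
    return n_voxels * voxel_volume_mm3 / 1000.0

# A toy 2-slice, 2x3 binary segmentation mask with 4 tumor voxels,
# assuming 2.0 mm^3 per voxel (made-up geometry)
mask = [
    [[0, 1, 1],
     [0, 0, 1]],
    [[0, 1, 0],
     [0, 0, 0]],
]

# The quantified value lands in the report as structured data
report = {"finding": "liver lesion", "volume_ml": tumor_volume_ml(mask, 2.0)}
print(report["volume_ml"])  # 4 voxels x 2.0 mm^3 = 8.0 mm^3 = 0.008 mL
```

The point is the workflow, not the arithmetic: instead of a hand measurement with a digital ruler, the volume becomes a reproducible number that can be tracked across scans.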
MURPHY: Computing power plays a crucial role in making machine learning possible. Based on your own experience, why is infrastructure important for machine learning in healthcare, the actual physical machines that make this all possible?
MICHALSKI: That’s a great question. The infrastructure necessary to do this work is a combination of things: hardware stacks, software tools for annotation, and ultimately the consumption of these models in a clinical workflow. In each one of those areas, there’s still quite a bit of work to do. On the hardware side, there is an ongoing discussion about where the technology should fit. There’s probably a combination of on-premise and cloud-based solutions you want to use to have the most cost-effective and scalable system. That’s something this community is still engaging in. On the software side, there are going to be a lot of folks thinking about how you rapidly annotate images and data elements for machine learning training. The more that you can integrate that into the existing clinical workflows, the better. That’s still a big software task, and all of that relates to infrastructure development in the healthcare environment.
MURPHY: What happens when health IT infrastructure does not have the performance and bandwidth needed for machine learning? What do healthcare organizations need to do or where do they need to look to shore up those inefficiencies?
MICHALSKI: That’s a tough one, because unfortunately there aren’t any easy answers there. It is often a multifaceted problem. You have to have a robust network infrastructure to build one of these systems, because you’re talking about using large training data sets to build these models. Fortunately, there is a lot of good work happening right now on trying to create scaled solutions on-premise, and cloud providers are also working pretty hard to democratize this technology. But like lots of hard things, the infrastructure development piece will have to happen over time, and there are still some challenges we have to overcome. For example, I’m spending a fair bit of my time now thinking about how we apportion components of our GPU hardware back to a large set of investigators in the Boston area. Currently, there are some solutions for that, but they’re not all totally robust, so we’re working hard to solve some of those problems. Check in again in 12 months or so, and hopefully we’ll have some solutions there, and probably a whole other set of problems.
MURPHY: How can health systems support machine learning efforts on the clinical side and for researchers with services in the datacenter?
MICHALSKI: It partly depends on what you want to do. If you’re a large academic center, then very typically you have researchers internally who want to build models, so you have to think about how you’re going to help them do that. That requires things like GPU resources and annotation tools, as well as the interest and resources required to integrate those solutions into your clinical workflow. If you’re a small-town practice, or a smaller practice rather, you’re probably not doing a whole lot of model development, but you might want to consume these models. So you have to think about what kind of providers can serve those models up to you in a safe, reliable way. That’s a validation question. That’s a question about which services are going to be able to deliver that value in the most cost-effective, reliable way. The field is early enough that that’s a pretty open question. Who the vendors will be, and who can provide these services in the best way, is something that I frankly am trying to solve with a lot of my compatriots out there, along with how we can get this out into the reading rooms most effectively. I think the who and the what are still under development and discovery.
MURPHY: What are certain assumptions about AI and machine learning that you’d like to see done away with? Are there misconceptions about the technology that get in the way of real understanding?
MICHALSKI: When you use the term artificial intelligence, it just means so many different things to different people. Many people think of artificial intelligence almost as the solution to any problem. General artificial intelligence might be that, but we don’t have general artificial intelligence yet. What we have is a narrow set of tools — deep learning — which is very, very useful. But how useful it is depends on the kind of problem that you’re trying to solve. It’s very effective in computer vision, and it’s also effective in other places, but the difference in efficacy between what we had ten years ago and what we have now varies by problem. In other words, you shouldn’t think about artificial intelligence, or what’s happening in artificial intelligence, as a generic solution for all things. It really pays to know what the existing technologies have allowed us to do differently and where they are really impactful. That’s what I really wish I could communicate to the rest of the world: a set of narrow tools that are very useful, but differentially so when it comes to different kinds of problems.