The data warehouse is often seen as the holy grail of analytics tools. A single, centralized, normalized data set containing every piece of information used to produce reports leaves your organization with clean, vetted data, uninhibited by silos or the need for additional conversion. But the healthcare industry was collecting data for years before “warehousing” ever became a Wikipedia page, and that data is still critically relevant when evaluating disease trends, long-term outcomes, and population health. It’s stored in a hundred different local systems, and that might be okay. Can you perform good analytics without a data warehouse? Should you?
There’s no question that a data warehouse has advantages. But it’s also expensive and often difficult to build, and it can require an organization to rip out existing systems and replace them with a new solution. While this might be worth the effort for some health systems, many more facilities are already buckling under the financial strain of EHR implementation, changes in payment models, penalties for quality measures, and fierce competition with rivals. An argument could be made that better data analytics will give a hospital an edge in the fight to stay afloat, but realistically, not every CFO is willing to take the risk.
That doesn’t mean you can’t perform meaningful analytics by drawing on disparate data sources. If you leave data in its original form, you may encounter more errors, since the data has not been pre-vetted on its way into a warehouse, but you will also preserve the full granularity of information that would otherwise have to be pushed and molded into a single format. You can start right away with significantly less financial outlay. And you avoid the time and expense of duplicating data just for the sake of adding it to the big pot.
“It’s not always necessary to duplicate and replicate everything. Some data is already readily available in the enterprise data warehouse, with fast, random access through highly optimized schemas and indexing,” says Yves de Montcheuil, Chief Marketing Officer at Talend. “You may need to bring in a subset of this data when needed to perform lookups or joins, but…some other data sets might be better off just residing where they are produced.” As long as there is a logical layer that allows you to draw on multiple sources to produce a single report, you can skip the centralization and get straight to the results.
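To make the idea of a logical layer concrete, here is a minimal sketch in Python. It assumes two hypothetical sources, an admissions system reachable over SQL and a lab system that only exports CSV; the table, column names, and sample values are invented for illustration and do not come from any real product. The report function joins the two at query time without copying either into a central store.

```python
import csv
import io
import sqlite3


def make_admissions_db():
    """Hypothetical source 1: an admissions system exposed as a SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admissions (patient_id TEXT, diagnosis TEXT)")
    conn.executemany(
        "INSERT INTO admissions VALUES (?, ?)",
        [("p1", "diabetes"), ("p2", "asthma"), ("p3", "diabetes")],
    )
    return conn


# Hypothetical source 2: a lab system that only offers a CSV export.
LAB_CSV = "patient_id,hba1c\np1,8.1\np3,6.9\n"


def diabetes_report(conn, lab_csv):
    """Join both sources at query time; nothing is moved into a central store."""
    # Read the lab export into a lookup keyed by patient ID.
    labs = {
        row["patient_id"]: float(row["hba1c"])
        for row in csv.DictReader(io.StringIO(lab_csv))
    }
    # Ask the admissions system only for the patients the report needs.
    rows = conn.execute(
        "SELECT patient_id FROM admissions WHERE diagnosis = ? ORDER BY patient_id",
        ("diabetes",),
    )
    # A patient with no lab result is kept in the report with a value of None.
    return [(patient_id, labs.get(patient_id)) for (patient_id,) in rows]


report = diabetes_report(make_admissions_db(), LAB_CSV)
```

In a real deployment the join keys rarely line up this cleanly, which is where the data quality caveats above come in: patient identifiers, units, and code sets would all need reconciling inside the logical layer.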
Many healthcare organizations are turning to Hadoop to achieve this. The open-source framework allows data to be distributed across a cluster of nodes, each running on an independent server and coordinated by a master node. This harnesses the power of an entire IT network without the need for a centralized repository, keeping the network scalable and agile. It’s possible to have a single-tier Hadoop architecture, which mimics the warehouse in that data is centralized, but the real advantage of the system lies in its flexibility and abstraction.
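The division of labor between worker nodes and a master can be illustrated with a small simulation. The sketch below is plain Python, not Hadoop's API: the function names, the round-robin split, and the sample records are all assumptions made for illustration. Each simulated node counts diagnoses in its own shard, and the master only merges the partial results.

```python
from collections import Counter


def partition(records, n_nodes):
    """Split records round-robin into one shard per simulated node."""
    shards = [[] for _ in range(n_nodes)]
    for i, record in enumerate(records):
        shards[i % n_nodes].append(record)
    return shards


def node_count(shard):
    """Work done locally on one node: count diagnoses in its own shard."""
    return Counter(record["diagnosis"] for record in shard)


def master_merge(partials):
    """Work done on the master: combine the partial counts from every node."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Invented sample data standing in for records held on different servers.
records = [
    {"diagnosis": d}
    for d in ["asthma", "diabetes", "asthma", "copd", "diabetes", "asthma"]
]
shards = partition(records, 3)
totals = master_merge(node_count(shard) for shard in shards)
```

The point of the pattern is that the raw records never travel to the master; only the small per-node summaries do, which is why adding servers adds capacity.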
“The Hadoop data warehouse is expected to continue the long-term trend in the data warehouse–evolution movement away from centralized and hub-and-spoke topologies toward the new worlds of cloud-oriented and federated architectures,” explains James Kobielus, IBM Senior Program Director, Product Marketing, Big Data Analytics solutions. “The Hadoop data warehouse itself is evolving away from a single master schema and more toward database virtualization behind a semantic abstraction layer. Under this new paradigm, the Hadoop data warehouse will require virtualized access to the disparate schemas of the relational, dimensional, and other constituent database management systems (DBMSs) as well as other repositories that constitute a logically unified cloud-oriented resource.”
Could this be the solution cash-strapped healthcare organizations have been looking for? As long as you keep the big picture in mind by creating a logical architecture to link clusters and paying close attention to data quality, there are alternatives to the single warehouse model. Creating a scalable, flexible big data framework can reduce implementation costs while taking advantage of data that has already been laboriously collected through EHRs by overworked clinicians.