Healthcare Data: Batch vs. Stream Processing

Editor's Note: This blog was published prior to the transition to WebMD Ignite.
The healthcare industry is undergoing a big data revolution. With the massive influx of patient and physician data from EHRs, surveys, personal data sources, and more, healthcare institutions need effective ways to harness the information to master precision marketing, optimize the patient experience, and improve network utilization.
Historically, the healthcare industry has lagged in technology adoption. In fact, many health systems still rely on batch processing for large data sets, which limits how quickly processed data becomes available. Stream processing, on the other hand, supports near real-time data processing, giving health systems more current, and therefore more accurate, results for time-sensitive data.
Let’s take a look at batch vs. real-time stream processing and explore the implications for the healthcare industry:
What is batch processing?
Batch processing is an efficient way of processing high volumes of healthcare data: a group of transactions is collected over a period of time and then processed together in a single run. By contrast, complex event processing (CEP), which processes and aggregates data event by event, is generally associated with streaming systems rather than batch systems.
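To make the collect-then-process pattern concrete, here is a minimal Python sketch. The record fields (`patient_id`, `type`, `cost`) and the function name `run_batch_job` are hypothetical, chosen only for illustration; a production batch job would normally read from a data store rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical records accumulated during one collection period (e.g. one day).
collected_transactions = [
    {"patient_id": "p1", "type": "visit", "cost": 120.0},
    {"patient_id": "p2", "type": "lab", "cost": 45.5},
    {"patient_id": "p1", "type": "lab", "cost": 30.0},
]


def run_batch_job(transactions):
    """Process the whole accumulated group in a single pass."""
    totals = defaultdict(float)
    for tx in transactions:
        totals[tx["patient_id"]] += tx["cost"]
    return dict(totals)


# Nothing is computed until the collection period ends and the job runs.
batch_result = run_batch_job(collected_transactions)
print(batch_result)
```

The key property is that no result exists until the whole group has been collected and the job has run.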
The biggest challenges when working with healthcare big data are volume, velocity, variety, and veracity. Batch processing addresses volume and variety in the big data architecture. The masses of structured and semi-structured historical data are typically stored in Hadoop with a batch processing system.
One of the primary challenges of batch processing is the latency of the computation. In other words, data that comes in big batches and is cleansed through a batch processing system can be several hours, days, or sometimes weeks to a month old by the time it reaches healthcare professionals.
The end result is often outdated data, a byproduct of “too late” architecture.1
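The latency described above can be worked through with simple arithmetic: a record's age at delivery is the time it waits for the batch to close plus the time the batch takes to run. The sketch below assumes a hypothetical weekly batch and a six-hour processing run; the dates and the helper name `age_at_delivery` are illustrative only.

```python
from datetime import datetime, timedelta


def age_at_delivery(event_time, batch_close, processing_time):
    """Return how old a record is when the batch results are delivered."""
    return (batch_close + processing_time) - event_time


# Hypothetical timeline: a record created Monday morning, a weekly batch
# that closes the following Monday at midnight, and a six-hour batch run.
event_time = datetime(2024, 1, 1, 9, 0)
batch_close = datetime(2024, 1, 8, 0, 0)
processing_time = timedelta(hours=6)

record_age = age_at_delivery(event_time, batch_close, processing_time)
print(record_age)
```

Under these assumptions, a record created early in the collection period is almost a week old before anyone can act on it.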
What is stream processing?
Stream processing is a data processing model that computes on one data element or a small window of data at a time, typically within seconds to minutes. Technology capable of stream processing produces near real-time results because it processes data as it comes through the health system. Because results reflect the most recent data, they are more accurate at the moment they are used, which is significant for time-sensitive data.
In contrast to batch processing, stream processing analyzes and acts on real-time data using “continuous queries.”
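One common way to compute over "a small window of data" is a tumbling (fixed-size, non-overlapping) window. The sketch below is an illustration rather than any particular product's API: it assumes events arrive in time order as `(timestamp_seconds, payload)` pairs, and the function name `tumbling_window_counts` is hypothetical.

```python
def tumbling_window_counts(events, window_seconds=60):
    """Yield (window_start, count) each time a fixed-size window closes.

    `events` is an iterable of (timestamp_seconds, payload) pairs that is
    assumed to arrive in time order. Windows with no events are not emitted.
    """
    current_start = None
    count = 0
    for ts, _payload in events:
        window_start = ts - (ts % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The previous window is complete, so emit its result right away.
            yield current_start, count
            current_start, count = window_start, 0
        count += 1
    if current_start is not None:
        # Flush the final, possibly partial, window when the stream ends.
        yield current_start, count


# Hypothetical event stream with timestamps in seconds.
events = [(0, "admit"), (15, "lab"), (59, "vitals"), (61, "lab"), (130, "discharge")]
window_counts = list(tumbling_window_counts(events))
print(window_counts)
```

Each result is available as soon as its window closes, instead of waiting for a large batch to accumulate.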
“Essential to stream processing is streaming analytics, or the ability to continuously calculate mathematical or statistical analytics on the fly within the stream...[Such] solutions are designed to handle high volume in real time with a scalable, highly available and fault tolerant architecture. This enables analysis of data in motion.”1
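As an illustration of calculating statistics "on the fly within the stream," the sketch below keeps a running mean and variance that are updated one value at a time, without storing the full history. It uses Welford's online algorithm, a well-known technique chosen here for illustration; the source does not specify any particular method, and the class name `RunningStats` and the heart-rate readings are hypothetical.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 until two values have arrived."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical heart-rate readings arriving one at a time.
stats = RunningStats()
for heart_rate in [72, 75, 71, 90]:
    stats.update(heart_rate)
    print(stats.n, round(stats.mean, 2), round(stats.variance, 2))
```

Because each update is constant-time and constant-memory, the statistics stay current however long the stream runs.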
Stream processing in healthcare
As the healthcare industry increasingly moves toward a value-based healthcare model, there’s a greater need for real-time data to inform key decisions, personalize patient marketing campaigns, improve patient outcomes, and encourage patient engagement.
This is where stream-processing architecture comes into play.
When everything is connected, from administrative systems to historical data to near real-time data points, health institutions have the opportunity to deepen patient and physician connections and enhance the patient experience.
Near real-time data processing solutions allow healthcare systems to make better decisions based on more robust and better quality data. As a result, they can take immediate action on enriched data insights and analysis, which can be significant to the health of a patient as well as their experience with a hospital or health institution.
Near real-time output has the additional benefit of more agile data processing. Because data is processed incrementally, health systems can keep refining it as it arrives until it meets their quality standard. What’s more, if there is a data quality issue, the root cause can be found and remediated much faster with stream processing than with a batch processing system, where the problem may not surface until the next batch completes.
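The data quality point can be sketched as a per-record check applied on arrival, so that a bad record is flagged immediately rather than discovered after a batch run. The field names, the validation rules, and the 20 to 250 heart-rate range below are hypothetical assumptions chosen for illustration, not clinical guidance.

```python
def validate(record):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    if not record.get("patient_id"):
        problems.append("missing patient_id")
    heart_rate = record.get("heart_rate")
    if heart_rate is None or not (20 <= heart_rate <= 250):
        problems.append("heart_rate out of range")
    return problems


def process_stream(records):
    """Check each record as it arrives and separate clean from suspect data."""
    clean, quarantined = [], []
    for record in records:
        problems = validate(record)
        if problems:
            # Flagged at arrival time, so the source can be investigated now.
            quarantined.append((record, problems))
        else:
            clean.append(record)
    return clean, quarantined


# Hypothetical incoming records; the second has an implausible reading.
incoming = [
    {"patient_id": "p1", "heart_rate": 72},
    {"patient_id": "p2", "heart_rate": 900},
    {"patient_id": "p3", "heart_rate": 64},
]
clean, quarantined = process_stream(incoming)
print(len(clean), quarantined)
```

Quarantining suspect records instead of dropping them keeps the evidence needed to trace the root cause.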
Batch vs. stream processing
Though stream processing has its benefits, there’s room for both data processing methods in the field of health analytics. Batch processing is often less complex and more cost effective than stream processing and can be applicable for certain bulk data processing needs.
As outlined above, however, there is a lag time with batch processing systems whereas stream processing manages more data at a faster pace. By collecting, curating, enriching, and analyzing data in near real time, stream processing solutions implicitly allow for higher quality data that can inform agile decision making.
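The collect, curate, enrich, and analyze steps can be pictured as a chain of stages that each handle one record at a time. The sketch below uses Python generators as a stand-in for a streaming framework; the stage functions, field names, clinic lookup table, and 30-minute threshold are all hypothetical.

```python
def curate(records):
    """Drop records that lack a patient identifier."""
    for record in records:
        if record.get("patient_id"):
            yield record


def enrich(records, clinic_names):
    """Attach a readable clinic name from a lookup table."""
    for record in records:
        name = clinic_names.get(record["clinic_id"], "unknown")
        yield {**record, "clinic_name": name}


def analyze(records, max_wait_minutes=30):
    """Flag records whose wait time exceeds an assumed threshold."""
    for record in records:
        flagged = record["wait_minutes"] > max_wait_minutes
        yield {**record, "needs_follow_up": flagged}


# Hypothetical incoming records and lookup data.
incoming = iter([
    {"patient_id": "p1", "clinic_id": "c1", "wait_minutes": 45},
    {"patient_id": "", "clinic_id": "c1", "wait_minutes": 5},
    {"patient_id": "p2", "clinic_id": "c9", "wait_minutes": 10},
])
clinic_names = {"c1": "Downtown Clinic"}

# Each record flows through every stage as soon as it is read.
results = list(analyze(enrich(curate(incoming), clinic_names)))
for row in results:
    print(row)
```

Because generators are lazy, a record passes through all three stages before the next one is read, which mirrors how a streaming pipeline avoids waiting for a full batch.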
Final thoughts
As healthcare’s data sophistication continues to evolve in the years to come, one of the fundamental challenges for healthcare organizations will be prioritizing “small data” that delivers actionable intelligence, rather than suffering analysis paralysis under the mountains of data at their disposal. While there is no “one-size-fits-all” solution that will be the right fit for every organization, it’s important to consider the advantages and disadvantages of batch and stream processing systems to determine the right approach.