Uncertain Health in an Insecure World – 29
“Garbage in – Garbage Out”
Ninety percent (90%) of the world’s data has been generated in the last 2 years.
A physician assistant just completed my life insurance
physical exam and lab work. In 20 minutes or so, he recorded a medical history,
measured vital statistics – height, weight, pulse, blood pressure – then
collected my blood and urine for biochemistry and other assays. Just imagine
how many times per second this type of medical data and personal health
information (PHI) is obtained in doctors’ offices, ambulances & hospitals,
and entered into databases around the world.
Global healthcare data is growing exponentially.
In fact, the average hospital generates 665 terabytes of
medical and PHI data per year! California-based Kaiser Permanente healthcare
system has accumulated 40-50 petabytes of insurance and treatment data from its
~9 million members!! The entire U.S. healthcare system will soon exceed a zettabyte
(10²¹ bytes) of data!!!
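To get a feel for those numbers, here is a rough back-of-the-envelope comparison in decimal units (1 petabyte = 1,000 terabytes; 1 zettabyte = 10²¹ bytes). The figures are only the ones quoted above, not new data:

```python
# Rough scale comparison of the healthcare data volumes quoted above
# (decimal units: 1 PB = 1,000 TB, 1 ZB = 10**21 bytes).

TB = 10**12          # bytes in a terabyte
PB = 10**15          # bytes in a petabyte
ZB = 10**21          # bytes in a zettabyte

hospital_per_year = 665 * TB   # average hospital, per the figure above
kaiser_archive    = 45 * PB    # midpoint of the 40-50 PB Kaiser estimate

print(f"Kaiser archive ≈ {kaiser_archive / hospital_per_year:,.0f} hospital-years of data")
print(f"One zettabyte ≈ {ZB / PB:,.0f} petabytes, or {ZB / hospital_per_year:,.0f} hospital-years")
```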
Significant advances in desktop gene (DNA) sequencing and polymerase chain reaction (PCR) testing procedures provide doctors with rapid diagnoses of
infectious microbes, drug resistance and rare diseases. These new technologies
are also generating massive amounts of data. A key business strategy of some
Big Pharma and global medical lab services conglomerates is to use such
companion diagnostics data to bring greater treatment efficiency &
effectiveness to the healthcare marketplace – so-called precision or
personalized medicine.
This is the promise of “Big Data!”
International Business Machines (IBM) has defined three big
data characteristics: volume (the sheer amount of data to be stored),
velocity (the speed at which data is generated and must be processed) and variety
(the mix of structured, semi-structured and unstructured data types). McKinsey estimates that up to 80% of the information collected in the U.S. healthcare
system is unstructured – medical device recordings,
doctors’ notes, monitor & sensor readouts, lab results, imaging studies,
clinical outcomes and financial claims data.
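To make the “variety” point concrete, here is a toy sketch (the field names and values are hypothetical) of how the same lab result might show up as a structured row, a semi-structured message, and an unstructured free-text note:

```python
# Hypothetical examples of the three data shapes mentioned above.
import json

# Structured: fixed columns, ready for a relational table.
structured_row = ("PT-1001", "2015-06-01", "HbA1c", 7.2, "%")

# Semi-structured: self-describing JSON; fields may vary from record to record.
semi_structured = {
    "patient_id": "PT-1001",
    "test": {"name": "HbA1c", "value": 7.2, "unit": "%"},
    "device": "desktop-analyzer-3",
}

# Unstructured: free text from a doctor's note; needs parsing before analysis.
unstructured_note = "Pt's HbA1c came back at 7.2% today - discussed diet and metformin dose."

print(json.dumps(semi_structured, indent=2))
```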
In healthcare, there is a fourth big data ‘V’ called veracity.
This blog previously noted the insecurity of PHI data (see post #21). Healthcare data quality assurance is also critical. Like financial data, healthcare data (e.g., a handwritten prescription) must be error-free and credible. Poor healthcare data quality in a data warehouse can have life & death consequences, especially when using unstructured data.
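A minimal sketch of the kind of veracity gate a data warehouse might apply before loading records; the field names and plausibility ranges below are illustrative assumptions, not any particular system’s rules:

```python
# Minimal data-quality (veracity) check for incoming vital-sign records.
# Field names and plausibility ranges are illustrative assumptions.

PLAUSIBLE = {
    "pulse_bpm": (20, 250),
    "systolic_mmhg": (60, 260),
    "diastolic_mmhg": (30, 160),
    "weight_kg": (2, 400),
}

def validate(record: dict) -> list[str]:
    """Return a list of quality problems; an empty list means the record passes."""
    problems = []
    for field, (low, high) in PLAUSIBLE.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing {field}")
        elif not (low <= value <= high):
            problems.append(f"{field}={value} outside plausible range {low}-{high}")
    return problems

print(validate({"pulse_bpm": 72, "systolic_mmhg": 120, "diastolic_mmhg": 80, "weight_kg": 81}))
print(validate({"pulse_bpm": 720, "systolic_mmhg": 120, "weight_kg": 81}))  # typo + missing field
```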
Analytics is the key value proposition for healthcare big
data.
The scaling up of such unstructured healthcare data requires
a different analytics architecture than conventional business intelligence tools provide. Industrial-strength
big data computing demands distributed processing across
many servers (or nodes), using the parallel “divide & process” open architecture computing that has only
recently become available.
Rapidly increasing data requires adequate storage
capacity. Check!
Data centers can now store data on large servers (100-terabyte capacity) for later processing using code designed for the relevant application. And while 2015 computer hard disk storage (terabytes and even petabytes), random access memory (RAM >16 gigabytes) and read speeds (>100 megabytes per second) have increased roughly 1,000-fold since the 1990s, it was the advent of open architecture computing that made the promise of big data analytics possible.
The vision of governments and businesses operating in the
healthcare sector is to combine big data, advanced computing science and
analytics to solve the complexity of chronic disease management, improve
clinical trial accuracy and enable personalized treatments in daily medical
practice. Big data only becomes useful through prescriptive &
predictive analytics (see post #18).
Before the advent of open architecture computing, data was
stored on disks and computation was processor-bound. Conventional relational
databases were accessed using structured query language (SQL) on a single
server. Analysis programs, typically written in Java or a scripting language, ran wherever the processor
was located, so the data had to be brought to the program rather than the program sent to the data. Processing speed
declined when large data sets hit that server at the same time.
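For contrast, that single-server relational world looks roughly like this, using Python’s built-in sqlite3 module as a stand-in for a conventional SQL database (the table and columns are hypothetical):

```python
# A conventional single-server relational query: all the data lives in one
# database, and SQL does the aggregation on that one machine.
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for one departmental server
conn.execute("CREATE TABLE labs (patient_id TEXT, test TEXT, value REAL)")
conn.executemany(
    "INSERT INTO labs VALUES (?, ?, ?)",
    [("PT-1001", "HbA1c", 7.2), ("PT-1002", "HbA1c", 5.6), ("PT-1001", "LDL", 130.0)],
)

# Everything funnels through this single server; with very large tables,
# queries like this are exactly what slowed down before distributed processing.
for test, avg in conn.execute("SELECT test, AVG(value) FROM labs GROUP BY test"):
    print(test, round(avg, 1))
```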
Huge data sets generated by the web search engines Google and
Yahoo led Doug Cutting to invent, and the Apache Software Foundation to release, the Hadoop Distributed File
System (HDFS) for storing big data, building on Google’s 2003 file-system and 2004 MapReduce papers. MapReduce software splits a job into sub-tasks distributed across 100’s or 1,000’s of servers in a Hadoop cluster,
maps each chunk of input into intermediate outputs, “shuffles” those outputs between nodes, then
reduces & tracks them in parallel processing jobs.
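Here is a single-machine sketch of that map-shuffle-reduce pattern, counting a diagnosis term across free-text notes; in a real Hadoop cluster each phase would run across many nodes, and the notes below are made up:

```python
# Map-shuffle-reduce on one machine, mimicking what Hadoop distributes
# across hundreds or thousands of nodes. The notes are made-up examples.
from collections import defaultdict

notes = [
    "pneumonia suspected, started antibiotics",
    "follow-up: pneumonia resolved",
    "new diagnosis: diabetes, pneumonia ruled out",
]

# Map: each "node" turns its chunk of input into (key, value) pairs.
mapped = [(word, 1)
          for note in notes
          for word in note.replace(",", "").replace(":", "").split()]

# Shuffle: group all values for the same key (Hadoop moves these between nodes).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: each "node" collapses one group into a final result.
counts = {key: sum(values) for key, values in groups.items()}
print(counts["pneumonia"])   # -> 3
```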
Open architecture "write once, read many" computing ecosystems like Hadoop improve
processing time by using multiple servers to handle a large amount of stored
data, reducing the time from query to output. Hadoop achieves high processing
speed by putting many servers to work in parallel, scaling processing power to match
the huge amount of data generated.
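The “divide & process” idea itself can be sketched with Python’s standard multiprocessing pool, splitting one large batch of records across worker processes the way a cluster splits it across servers (the per-record work below is just a stand-in):

```python
# Divide-and-process in miniature: split one large batch of records across
# several worker processes, as a Hadoop cluster splits work across servers.
from multiprocessing import Pool

def process_record(record_id: int) -> float:
    # Stand-in for per-record work (parsing, scoring, feature extraction, ...).
    return sum(i * i for i in range(1_000)) + record_id

if __name__ == "__main__":
    record_ids = range(100_000)
    with Pool(processes=4) as pool:            # four "nodes" working in parallel
        results = pool.map(process_record, record_ids, chunksize=5_000)
    print(len(results), "records processed")
```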
The healthcare sector has been strangely slow to join this big data revolution. Why?
Patient confidentiality concerns have slowed adoption
compared to the retail and banking industries. Big data capture by healthcare
systems, governments and Big Pharma has paralleled technical open architecture advances
in computing science. Ever-increasing healthcare costs raise serious
sustainability concerns that can now be confronted by aggregating insurance
risks (“bundling”), managing
utilization (“right care”) and tracking
patient outcomes in linked databases (e.g., Kaiser Permanente’s HealthConnect).
Working with such big data at scale may eventually allow healthcare
stakeholders to create value by exchanging efficacy information and incentivizing greater
efficiency.
Until this happens, it’s healthcare big data “garbage in – garbage out” littering the Square.