Michael Poremba, Director, Data Architecture at Practice Fusion
DESCRIPTION
Practice Fusion, the largest cloud-based electronic health records (EHR) system in the US, used by more than 100,000 health care providers managing over 100 million patient medical records, faced the need to move their four terabyte HIPAA audit reporting system off of a relational database. Practice Fusion selected MongoDB for their new HIPAA audit reporting system. Learn how the team designed and implemented a highly scalable system for storing protected health information in the cloud. This case study covers the move from a relational database to a document database; data modeling in JSON; sharding strategies; indexing; sharded cluster design supporting high availability and disaster recovery; performance testing; and data migration of billions of historical audit records.
TRANSCRIPT
Transitioning a 4 TB Health Care Security Auditing System to MongoDB
Michael Poremba
Director, Data Architecture
Practice Fusion
Introductions
Getting started
+ 20 years software engineering
+ Data architect / application architect
+ High-volume OLTP relational databases
+ Application performance and scalability
+ Domain experience: health care; financial services; IT management; content management and distribution; targeted advertising; telecom billing; manufacturing; insurance
Michael Poremba @ Practice Fusion
+ Cloud-based electronic health records (EHR)
+ Over 100,000 health care providers in US
+ Over 90,000,000 patient medical records
+ OLTP database: weekday peak ~ 40,000 transactions per second
+ 4 TB security auditing records ~ 50% of OLTP database storage
Practice Fusion
+ HIPAA: Health Insurance Portability and Accountability Act of 1996
+ Who did what to which patient’s medical record when?
+ Regulatory requirement—audit log must be kept and reviewed
+ Law enforcement and evidence in legal discovery
+ Save the audit log forever
+ Primary use cases:
    Audit report in EHR: Security audit log viewer
    Physician data analytics: Clinical quality measures (CQM)
HIPAA Security Audit Log
HIPAA Security Auditing on MongoDB
Project anatomy & lessons learned
Security Auditing – Legacy Architecture
[Diagram components: Public load balancer; App 1, App 2, ... App n; EHR (OLTP DB) with ActivityFeed and ActivityFeedParameter tables (4..8); ETL; CQM (reporting); Audit Report]
+ Latency on SAN increased
+ Response time slowed for writes
+ Database connections held longer
+ Connection pool expanded
+ User interface locked up—waiting
+ Users tried to log in again
+ Login is heaviest user operation
+ [Repeat]
The Log Jam
Audit Service – New Architecture
[Diagram components: Public load balancer; App 1, App 2, ... App n; Audit Service; AMQ Queue; Listener; MongoDB Audit Log; Audit Report; ETL; CQM (reporting)]
+ Isolate auditing system from EHR OLTP database
+ Move audit I/O off the EHR SAN
+ New service interface for audit events
+ Scale out audit service
+ Scale out data store for auditing
Benefits of New Architecture
Project Objectives
+ New infrastructure for MongoDB and AMQ
+ Modernize audit service API
+ Modernize audit report UI
+ Convert ~200 audit write operations to new service API
+ Data warehouse ETL from MongoDB
+ Migrate 4 billion existing audit records
New Security Auditing System
Colette: program management
Ernest: services expert
Bhavik: test engineering
Michael: data architecture
Jeff: cluster architecture
Jay: MongoDB expert
Brett: AMQ expert
Bryan: infrastructure coordination
Carlos: data warehouse ETL
+ Transaction volume: Sustain 1,000 new documents per second
+ Data volume: Scale to tens of billions of audit event records
+ High availability and disaster recovery: higher SLA than EHR
+ Quick UI response time for interactive audit report
+ Tamper prevention and detection
    No updates or deletes permitted on audit log
    Security alerts when audit log is altered
+ Leverage industry standards for health care security audit logging
    ~300 distinct auditable user actions
    Required and varying data elements
Security Auditing – Application Requirements
[Diagram: entities AuditEvent, ParticipantObject, AuditSystem, User; cardinalities 0..n, 1..1, 1..2]
Health Care Industry Standards for Audit Logging
+ ISO 27789:2013: Health Informatics – Audit trails for electronic health records
+ ASTM E2147-01(2013): Standard Specification for Audit Disclosure Logs for Use in Health Information Systems
+ FHIR SecurityEvent – resource definition for auditing
{
    "_id" : <BinaryData(4)>,                          // The audit event GUID
    "docHash" : <String; Required>,                   // Tamper detection
    "audOrgGuid" : <BinaryData(4); Required>,         // Shard key
    "crtdDttmUtc" : <Date; Required>,                 // Datetime record was inserted
    "evnt" : {                                        // Required subdocument
        "dttmUtc" : <Date; Required>,                 // Date/time that event occurred
        "typ" : <String; Required>,                   // Event record type; ~ 300 types
        "ptDataTyp" : <String; Required>,             // Standard set of patient data types
        "actn" : <String; Required>,                  // Standard set of actions
        "sys" : <String; Required>                    // Source system for audit event
    },
    "usr" : {                                         // Required subdocument
        "usrId" : <String; Required>,                 // Human-readable ID
        "usrGuid" : <BinaryData(4); Required>,        // Machine-readable ID
        "dispNm" : <String; Required>,                // Display name for user
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "altUsr" : {                                      // Optional subdocument for second user
        ...                                           // Subdocument contains same properties as "usr"
    },
    "pt" : {                                          // Optional subdocument
        "ptId" : <String; Required>,                  // Human-readable ID for patient
        "ptPracGuid" : <BinaryData(4); Required>,     // Machine-readable ID for patient
        "dispNm" : <String; Required>,                // Display name for patient
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "body" : {                                        // Optional subdocument
        ...                                           // Flattened list of attributes, specific to audit event subtype
    }
}
JSON Document Schema for Audit Events
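Because the application, not the database, is responsible for enforcing this schema, a small pre-insert check can catch documents with missing required properties. The following is a minimal sketch, not the team's actual implementation: the function names and the sample event values are hypothetical, and only the property names are taken from the schema listing.

```javascript
// Required dotted paths, taken from the schema listing in the slides.
const REQUIRED_PATHS = [
  "docHash", "audOrgGuid", "crtdDttmUtc",
  "evnt.dttmUtc", "evnt.typ", "evnt.ptDataTyp", "evnt.actn", "evnt.sys",
  "usr.usrId", "usr.usrGuid", "usr.dispNm", "usr.orgId", "usr.orgNm",
];

// Properties that become required only when the optional subdocument is present.
const OPTIONAL_SUBDOCS = {
  altUsr: ["usrId", "usrGuid", "dispNm", "orgId", "orgNm"],
  pt: ["ptId", "ptPracGuid", "dispNm", "orgId", "orgNm"],
};

// Walk a dotted path such as "evnt.typ"; returns undefined if any step is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (node, key) => (node == null ? undefined : node[key]), doc);
}

// Return the list of required paths that are absent (hypothetical helper).
function missingFields(doc) {
  const missing = REQUIRED_PATHS.filter((p) => getPath(doc, p) == null);
  for (const [sub, props] of Object.entries(OPTIONAL_SUBDOCS)) {
    if (doc[sub] == null) continue; // optional subdocument absent: nothing to check
    for (const prop of props) {
      if (doc[sub][prop] == null) missing.push(`${sub}.${prop}`);
    }
  }
  return missing;
}

// Illustrative event with made-up values; "evnt.typ" and "pt.orgNm" are left out.
const sampleEvent = {
  docHash: "abc123",
  audOrgGuid: "org-guid",
  crtdDttmUtc: new Date(0),
  evnt: { dttmUtc: new Date(0), ptDataTyp: "Chart", actn: "View", sys: "EHR" },
  usr: { usrId: "u1", usrGuid: "u-guid", dispNm: "Dr. A", orgId: "o1", orgNm: "Clinic" },
  pt: { ptId: "p1", ptPracGuid: "p-guid", dispNm: "Pat B", orgId: "o1" },
};

console.log(missingFields(sampleEvent));
```

The check only tests presence; type checks (Date, binary subtype 4) would need the driver's BSON types and are left out of this sketch.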
Schema Design – Lessons Learned
+ Prop nms strd per doc (property names are stored in every document)
    Long names add up for large collections (ours: 1 TB)
    Consider using abbreviated property names
    Up-vote this feature request:
    https://jira.mongodb.org/browse/SERVER-863
+ Know your application read/write patterns
+ Application responsible for data integrity
+ Be aware of data type behaviors
    Indexed string search is case sensitive
    Several binary data types for UUID: use type 4
    (default type is specific to database driver)
Schema Design – Lessons Learned
Leverage native data types:
+ Date
+ Boolean
+ Numeric "1" + "1" → "11"
"11" + "1" → "111"
+ UUID "8c290139-f4e3-49c1-9ba2-a883defc6a15"
"8C290139-F4E3-49C1-9BA2-A883DEFC6A15"
"8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15"
"8c290139f4e349c19ba2a883defc6a15"
"{8c290139-f4e3-49c1-9ba2-a883defc6a15}"
"{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}"
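The six UUID strings on this slide all denote the same identifier, yet as strings they compare as six different values. The sketch below illustrates why a native binary representation avoids the problem: every variant reduces to the same 16 bytes. The helper name `uuidToBytes` is hypothetical, and wrapping the bytes in a driver's binary subtype 4 value is omitted because it depends on the driver in use.

```javascript
// Reduce any of the textual UUID spellings to its 16 raw bytes (hypothetical helper).
function uuidToBytes(text) {
  const hex = text.replace(/[^0-9a-fA-F]/g, "").toLowerCase();
  if (hex.length !== 32) throw new Error(`not a UUID: ${text}`);
  return Buffer.from(hex, "hex");
}

// The six spellings shown on the slide.
const variants = [
  "8c290139-f4e3-49c1-9ba2-a883defc6a15",
  "8C290139-F4E3-49C1-9BA2-A883DEFC6A15",
  "8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15",
  "8c290139f4e349c19ba2a883defc6a15",
  "{8c290139-f4e3-49c1-9ba2-a883defc6a15}",
  "{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}",
];

// As strings the variants are all distinct; as bytes they collapse to one value.
const distinctStrings = new Set(variants).size;
const distinctBinary = new Set(
  variants.map((v) => uuidToBytes(v).toString("hex"))).size;

console.log({ distinctStrings, distinctBinary });
```

Binary storage is also more compact: 16 bytes per identifier instead of 32 to 38 characters.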
[Diagram: ActivityFeed (~4 billion rows); ActivityFeedParameter (~30 billion rows); Audit Event Type (~300); Action Type (10); Patient Data Type (18); User (~100,000); Patient (~90 million); Practice (~50,000)]
Legacy Auditing System – Relational Schema
Issues around data normalization
+ New requirements introduced
+ Filter criteria and sort criteria stored in five different tables
+ Audit events must be read into memory for filtering and sorting
    Join and expand data set by practice
    Sort and filter expanded data set
+ Response time suffers for large practices with many audit events
Schema Design – Lessons Learned
Denormalize with care:
{
    "_id" : <BinaryData(4)>,
    "docHash" : <String; Required>,
    "audOrgGuid" : <BinaryData(4); Required>,
    "crtdDttmUtc" : <Date; Required>,
    "evnt" : {
        "dttmUtc" : <Date; Required>,
        "typ" : <String; Required>,
        "ptDataTyp" : <String; Required>,
        "actn" : <String; Required>,
        "sys" : <String; Required>
    },
    "usr" : {
        "usrId" : <String; Required>,
        "usrGuid" : <BinaryData(4); Required>,
        "dispNm" : <String; Required>,
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "pt" : {
        "ptId" : <String; Required>,
        "ptPracGuid" : <BinaryData(4); Required>,
        "dispNm" : <String; Required>,
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "body" : { ... }
}
+ Millions of events per owning organization
+ Quick UI Response Time for Interactive Audit Reports
+ Audit report UI allows events to be sorted/filtered five different ways
+ UI allows paging through audit events
+ Create a secondary index for each sort method
Index Design
+ Organization, event date DESC
db.auditEvent.ensureIndex( { "audOrgGuid": 1, "evnt.dttmUtc": -1 } );
+ Organization, patient, event date DESC
db.auditEvent.ensureIndex( { "audOrgGuid": 1, "pt.ptId": 1, "evnt.dttmUtc": -1 } );
+ Organization, user, event date DESC
db.auditEvent.ensureIndex( { "audOrgGuid": 1, "usr.usrId": 1, "evnt.dttmUtc": -1 } );
+ Organization, patient data type, event date DESC
db.auditEvent.ensureIndex( { "audOrgGuid": 1, "evnt.ptDataTyp": 1, "evnt.dttmUtc": -1 } );
+ Organization, user action type, event date DESC
db.auditEvent.ensureIndex( { "audOrgGuid": 1, "evnt.actn": 1, "evnt.dttmUtc": -1 } );
+ Document created date DESC
db.auditEvent.ensureIndex( { "crtdDttmUtc": -1 } );
Index Definitions
+ Filter by practice GUID
+ Sort by event created date time, descending order
+ Limit to 20 documents
db.auditEvent.find( {"audOrgGuid": BinData(4,"ABrlAG57Rx6gY3zyHzFK3Q==")} )
.sort( {"evnt.dttmUtc" : -1} ).limit(20).explain();
{
"clusteredType" : "ParallelSort",
"shards" : {
"RepSet02/MNGODDB03-SHRD02:27018, MNGODDB04-SHRD02:27018" : [
{
"cursor" : "BtreeCursor auditEvent_audOrgGuid_dttmUtc",
...
} ] }
...
"numshards" : 1,
...
Query Plan
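Reading explain output by eye does not scale to a test suite, so a check like the following could flag collection scans automatically. It is a minimal sketch that assumes the legacy sharded explain layout shown on this slide (a `shards` map whose values are arrays of per-shard plans with a `cursor` string); the function name and the second sample object are hypothetical.

```javascript
// Return the names of shards whose plan reports a full collection scan.
// Assumes the legacy explain() shape: { shards: { "<shard name>": [ { cursor: "..." } ] } }.
function findCollectionScans(explainOutput) {
  const scans = [];
  for (const [shardName, plans] of Object.entries(explainOutput.shards || {})) {
    for (const plan of plans) {
      if (plan.cursor === "BasicCursor") scans.push(shardName);
    }
  }
  return scans;
}

// Trimmed copy of the plan from the slide: the index is used on the single shard.
const indexedPlan = {
  clusteredType: "ParallelSort",
  shards: {
    "RepSet02/MNGODDB03-SHRD02:27018, MNGODDB04-SHRD02:27018": [
      { cursor: "BtreeCursor auditEvent_audOrgGuid_dttmUtc" },
    ],
  },
};

// Made-up counterexample: what an unindexed query would report.
const scanningPlan = {
  shards: { "RepSet01/hypothetical-host:27018": [{ cursor: "BasicCursor" }] },
};

console.log(findCollectionScans(indexedPlan));
console.log(findCollectionScans(scanningPlan));
```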
Indexing Strategy – Lessons Learned
+ As with relational databases, indexes are essential for efficient queries
+ Learn how to use .explain() to read query plans
+ Avoid collection scans: "cursor" : "BasicCursor"
+ For compound indexes, query sort order must match index sort order
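The last point can be made concrete with a simplified model of the rule. In this sketch a sort is served by a compound index when, after skipping leading index fields that the query pins with equality filters, the sort fields appear in index order with directions either all matching or all reversed (an index can be walked backwards). This is an illustration of the rule only, not the server's query planner, and the function name is hypothetical.

```javascript
// indexKeys and sortKeys are ordered lists of [field, direction] pairs.
// equalityFields lists fields the query filters with an exact match.
function sortCanUseIndex(indexKeys, equalityFields, sortKeys) {
  // Skip the leading index fields that are fixed by equality filters.
  let start = 0;
  while (start < indexKeys.length &&
         equalityFields.includes(indexKeys[start][0])) {
    start++;
  }
  if (sortKeys.length === 0) return true;
  if (start + sortKeys.length > indexKeys.length) return false;

  // Pair each sort key with the index key at the same position.
  const pairs = sortKeys.map((sortKey, i) => [indexKeys[start + i], sortKey]);
  const fieldsMatch = pairs.every(([ix, s]) => ix[0] === s[0]);
  const sameDirection = pairs.every(([ix, s]) => ix[1] === s[1]);
  const reversed = pairs.every(([ix, s]) => ix[1] === -s[1]);
  return fieldsMatch && (sameDirection || reversed);
}

// Index definitions from the slides.
const byOrgDate = [["audOrgGuid", 1], ["evnt.dttmUtc", -1]];
const byOrgPatientDate = [["audOrgGuid", 1], ["pt.ptId", 1], ["evnt.dttmUtc", -1]];

// Filter on organization, newest first: matches the index directly.
console.log(sortCanUseIndex(byOrgDate, ["audOrgGuid"], [["evnt.dttmUtc", -1]]));
// Same filter, oldest first: the index is traversed in reverse.
console.log(sortCanUseIndex(byOrgDate, ["audOrgGuid"], [["evnt.dttmUtc", 1]]));
// No patient filter: "pt.ptId" sits between the prefix and the sort field.
console.log(sortCanUseIndex(byOrgPatientDate, ["audOrgGuid"], [["evnt.dttmUtc", -1]]));
```

The third case shows why the deck defines one index per sort method rather than relying on a single wide index.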
Principle of least privilege
+ MongoDB cluster not accessible from public Internet
+ Security enabled on cluster
+ Application users granted minimum permissions required
Signed audit events
+ Audit events signed with hash of audit event contents
+ Recompute hash on reads—test the data against hash value
+ Send security alert when hash does not match
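A sketch of the sign-and-verify cycle follows. The slides do not say which hash function or serialization the team used, so SHA-256 over a key-sorted JSON rendering is an assumption made here for illustration, as are all function names and sample values. Only the `docHash` property name comes from the schema.

```javascript
const { createHash } = require("crypto");

// Produce a stable structure: keys sorted, dates and buffers rendered as strings.
function canonicalize(value) {
  if (value instanceof Date) return value.toISOString();
  if (Buffer.isBuffer(value)) return value.toString("hex");
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Hash everything except the docHash property itself (assumed: SHA-256, hex).
function computeDocHash(doc) {
  const { docHash, ...content } = doc;
  return createHash("sha256")
    .update(JSON.stringify(canonicalize(content)))
    .digest("hex");
}

// Sign on write: return a copy of the event carrying its hash.
function signEvent(doc) {
  return { ...doc, docHash: computeDocHash(doc) };
}

// Verify on read: recompute and compare against the stored hash.
function verifyEvent(doc) {
  return doc.docHash === computeDocHash(doc);
}

// Illustrative event with made-up values.
const signedEvent = signEvent({
  crtdDttmUtc: new Date(0),
  evnt: { typ: "ChartViewed", actn: "View" },
  usr: { usrId: "u1", dispNm: "Dr. A" },
});
const tamperedEvent = { ...signedEvent, usr: { usrId: "u1", dispNm: "Dr. Z" } };

console.log(verifyEvent(signedEvent), verifyEvent(tamperedEvent));
```

A plain hash detects accidental or naive alteration; anyone able to rewrite the document could also recompute the hash, so a keyed construction such as an HMAC with a secret held outside the database may be worth considering.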
Oplog monitoring
+ Use mongo-connector Python scripts to monitor oplog
+ Watch for .update() and .delete() operations on collection
+ Send security alert when data changes are detected
Tamper Prevention and Detection
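The oplog check reduces to a filter over oplog entries. The talk used mongo-connector Python scripts for this; the sketch below is not that tooling but a plain function illustrating the same test. It relies on the standard oplog fields `op` ("i" insert, "u" update, "d" delete) and `ns` (namespace), uses the `AuditLog.auditEvent` namespace from the shard command, and runs over made-up sample entries rather than a live oplog.

```javascript
// Return oplog entries that modify or delete documents in the given namespace.
function findTamperingOps(oplogEntries, namespace) {
  return oplogEntries.filter(
    (entry) => entry.ns === namespace && (entry.op === "u" || entry.op === "d"));
}

// Hypothetical oplog entries, trimmed to the fields the check needs.
const sampleOplog = [
  { op: "i", ns: "AuditLog.auditEvent", o: { _id: 1 } },
  { op: "u", ns: "AuditLog.auditEvent", o2: { _id: 1 }, o: { $set: { "usr.dispNm": "X" } } },
  { op: "d", ns: "AuditLog.auditEvent", o: { _id: 1 } },
  { op: "u", ns: "Other.collection", o2: { _id: 9 }, o: { $set: { a: 1 } } },
];

const suspicious = findTamperingOps(sampleOplog, "AuditLog.auditEvent");
console.log(suspicious.map((entry) => entry.op));
```

Inserts are the only expected write on an append-only audit log, so any entry this filter returns is grounds for a security alert.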
Security – Lessons Learned
+ Minimize network access to MongoDB cluster
+ Enable authentication
+ Leverage role-based authorization
+ Use SSL (MongoDB Enterprise)
+ Disable REST interface and HTTP status interface
+ Shard the database to scale out
+ Begin with small number of shards (2 or 3)
+ Group all audit events from the same medical practice
Every audit event is “owned” by some practice
Audit report UI always queries events by medical practice
+ Composite shard key on { PracticeGuid, _id } (PracticeGuid is stored as audOrgGuid)
db.runCommand({
    shardcollection : "AuditLog.auditEvent",
    key : { audOrgGuid: 1, _id: 1 }
});
Transaction Volume: 1,000 New Documents per Second
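To see why grouping events by practice keeps report queries on few shards, consider a toy model of range-based routing on the compound key. Each chunk covers a half-open key range [min, max) and lives on one shard; a query that names the practice touches only chunks overlapping that practice's key range, while a query without the shard key must visit every shard. The chunk boundaries, practice identifiers, and function names below are invented for illustration; this is not how mongos is implemented.

```javascript
// Sentinels standing in for MongoDB's MinKey and MaxKey.
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Compare two key components, treating MIN as lowest and MAX as highest.
function compareValue(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : 1;
}

// Compare two [practice, id] shard keys component by component.
function compareKey(x, y) {
  return compareValue(x[0], y[0]) || compareValue(x[1], y[1]);
}

// Return the shards a query must visit. With no practice given, all shards.
function targetShards(chunks, practice) {
  const hit = practice === undefined
    ? chunks
    : chunks.filter((chunk) =>
        compareKey(chunk.min, [practice, MAX]) <= 0 &&
        compareKey(chunk.max, [practice, MIN]) > 0);
  return [...new Set(hit.map((chunk) => chunk.shard))].sort();
}

// Hypothetical chunk map; the second boundary splits one large practice by _id.
const chunks = [
  { min: [MIN, MIN], max: ["org-g", MIN], shard: "shard1" },
  { min: ["org-g", MIN], max: ["org-p", "5000"], shard: "shard2" },
  { min: ["org-p", "5000"], max: [MAX, MAX], shard: "shard3" },
];

console.log(targetShards(chunks, "org-c"));   // a small practice
console.log(targetShards(chunks, "org-p"));   // a practice split across two chunks
console.log(targetShards(chunks, undefined)); // no shard key: scatter/gather
```

Including `_id` as the second key component is what lets a very large practice be split across chunks instead of growing one unsplittable chunk.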
Sharding the Database – Lessons Learned
+ At the onset of development determine whether to shard
+ Specify shard key in queries
    Allows mongos to route query
    Minimize distributed “scatter/gather” queries
    Queries spanning chunks likely span shards
+ Choose a key that allows even balancing
    Balancing is performed in 32 MB chunks
    Design shard key to ensure chunks will not exceed 32 MB
High Availability and Disaster Recovery – Replica Sets
+ If audit log is down, then 100,000 health care providers are idle
+ Audit logging subsystem must be more reliable than customer EHR
+ Node failover must be automatic
+ Protect against network and data center failure scenarios
Sharded Cluster Replicated Across Multiple Data Centers
[Diagram labels: locations Primary DC, Disaster Recovery DC, DC2 AZ1, DC2 AZ2, DC3 AZ1; shard 1, shard 2, and shard 3, each shown three times with two arbiters and two mongos routers; three config servers; four amq nodes]
Performance and Stress Testing – Lessons Learned
+ Acquire or build load testing tools
+ Test using a realistic, unbiased data set
+ Test database cluster to ensure write throughput
+ Ensure read & write performance meets load requirements
+ Find the performance ceiling
+ Find and resolve bottlenecks
+ Tune IO and memory
Data Migration – Lessons Learned
Data Migration
+ Parallelize data migration process
+ Identify and remove bottlenecks
+ Scale out MongoDB cluster to handle heavy write load
+ Determine whether best to add indexes before or after migration
+ It takes a while to extract, transform, and load billions of documents
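A back-of-the-envelope estimate shows how long "a while" can be and why parallelizing matters. The sketch uses the 4 billion record count from the project objectives; the per-worker rate of 1,000 documents per second is borrowed from the steady-state write target purely as an illustrative figure, and the assumption that throughput scales linearly with workers is optimistic. The function name is hypothetical.

```javascript
const SECONDS_PER_DAY = 86400;

// Estimate migration duration in days, assuming throughput scales linearly
// with the number of parallel workers (an idealization).
function estimateMigrationDays(totalDocs, docsPerSecondPerWorker, workers = 1) {
  const seconds = totalDocs / (docsPerSecondPerWorker * workers);
  return seconds / SECONDS_PER_DAY;
}

// 4 billion documents at an assumed 1,000 documents per second per worker.
const singleWorkerDays = estimateMigrationDays(4e9, 1000);
const tenWorkerDays = estimateMigrationDays(4e9, 1000, 10);

console.log(singleWorkerDays.toFixed(1), tenWorkerDays.toFixed(1));
```

Under these assumptions a single stream would need roughly a month and a half, which is why the bottleneck hunting and cluster scale-out listed on the slide come before the migration run itself.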
Choosing the Appropriate Data Store
MongoDB over relational?
+ Scale out for transaction volume and data volume
+ Highly varying document structure
+ Developer productivity
    Easy map between application and data store
+ Offload read activity in optimized format different from data writes (a.k.a. CQRS pattern)
Choosing the Appropriate Data Store
Relational over MongoDB?
+ Complex normalized data model
+ Diverse read patterns requiring joins
+ Ad hoc reporting and analysis
+ Data integrity difficult to manage in application layer
MongoDB @ Practice Fusion
Upcoming MongoDB projects
+ Read cache for patient medical records
+ Online patient intake process
+ Ad campaign segmentation
+ Scale-out data store for patient clinical observations