Why EDP Chose MongoDB

Post on 23-Jan-2015

DESCRIPTION

The NYC Department of Health and Mental Hygiene (DOHMH) uses MongoDB for its internal document management system, DocSpace. This presentation outlines:
- the system
- how we came to adopt MongoDB
- migrating from a relational database to a document-oriented one
- the advantages and disadvantages we have encountered and how we have managed them
- next steps with MongoDB

TRANSCRIPT

Page 1: Why EDP Chose MongoDB

Why EDP Chose MongoDB

Artyom Diky

William Biesty

Mark Velez

Page 2: Why EDP Chose MongoDB

Agenda

• Who are we?

• Evolution of Document Management

• File system to relational DB

• Relational to document-oriented DB

• Paper to electronic

• Advantages and Challenges

• Questions?

Page 3: Why EDP Chose MongoDB

Who Are We?

• New York City Department of Health and Mental Hygiene

• Environmental Health Services (EHS)

• Environmental Disease Prevention (EDP)

• Lead Poisoning Prevention Program (LPPP)

• MIS Unit (we are here)

• We support many programs within EDP

• Who are our stakeholders?

o Inspectors

o Researchers

o Clinical Staff

o Lawyers (FOIL)

Page 4: Why EDP Chose MongoDB

Evolution of Document Management: Paper

• A lot of legal documents on paper

o Historic (from the 1970s onward)

o Current (ongoing)

• Problems with Paper

o Time and Labor Intensive: locate, copy, redact, copy, mail (repeat...)

o Storage Space

o Disaster Recovery

Page 5: Why EDP Chose MongoDB

Evolution of Document Management: eFiles

• VB6

• Scanning utilities

• File-system based storage

• Millions of files

• Identifiers based on child ID

Page 6: Why EDP Chose MongoDB

Evolution of Document Management: eFiles Issues

• Technical

o VB6 phased out

o Outdated 3rd party tools changed API

o License expired

• Security

o Documents have been redacted permanently

o No access control to private information

• Scalability

o New document types

o New indexing (tagging) mechanisms for search

Page 7: Why EDP Chose MongoDB

Evolution of Document Management

• Need for better document management

• Paperless offices mandate

• Expand searchable attributes and document text

• Update technology

• Improved security

• HIPAA compliance

• Platform for future applications

Page 8: Why EDP Chose MongoDB

File System to Relational DB

• Challenges:

• 1M+ historical documents as image files

• Need for document metadata

• Various and evolving schemas

• Security

• Updates and migration

• Fail-safe storage

Page 9: Why EDP Chose MongoDB

Technologies

• We use Microsoft technologies

• SQL Server

• .NET

• We are a small team that develops and supports dozens of data collection apps (forms)

• Risk assessments

• Inspection Reports

• Research

• Case Management

Page 10: Why EDP Chose MongoDB

Example Documents

[Figure: example documents annotated with the fields event_date, child ID, document_type, and me_num]
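The field names on this slide hint at what per-document metadata might look like. The following is a minimal sketch in Python, not the DocSpace data model: only the field names come from the slide, the `child_id` spelling is an assumption (the slide labels it "child ID"), and every value is made up for illustration.

```python
from datetime import date

# Hypothetical metadata for two documents. Only the field names
# event_date, child ID (spelled child_id here), document_type and me_num
# come from the slide; all values are invented.
inspection_report = {
    "document_type": "inspection_report",
    "child_id": 1001,
    "event_date": date(2012, 5, 14),
}
case_record = {
    "document_type": "case_record",
    "me_num": "ME-0001",
    "event_date": date(2012, 6, 2),
}


def shared_fields(a, b):
    """Return the metadata keys two documents have in common, sorted."""
    return sorted(set(a) & set(b))


print(shared_fields(inspection_report, case_record))
```

The point of the sketch is that different document types carry different identifying fields, so no single fixed set of columns fits all of them.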

Page 11: Why EDP Chose MongoDB

File System to Relational DB

• MSSQL 2008

o Data storage with FileStream

o Metadata with Entity-Attribute-Value (sql_variant)

o Data-driven application design

• Rich service-oriented API through WCF

• Search engine

• Added features

o Versioning: change and revert
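To make the Entity-Attribute-Value idea concrete, here is a rough sketch of the pattern, not the actual DocSpace schema. It assumes SQLite (from the Python standard library) as a stand-in for SQL Server, with an untyped column playing the role of `sql_variant`; the table name `doc_attribute`, its columns, and the helper functions are all hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server here (an assumption). The untyped
# "value" column accepts any type, roughly as sql_variant does.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc_attribute ("
    " doc_id INTEGER NOT NULL,"
    " attribute TEXT NOT NULL,"
    " value,"
    " PRIMARY KEY (doc_id, attribute))"
)


def save_metadata(doc_id, metadata):
    """Store one row per (document, attribute) pair."""
    conn.executemany(
        "INSERT INTO doc_attribute (doc_id, attribute, value) VALUES (?, ?, ?)",
        [(doc_id, name, value) for name, value in metadata.items()],
    )


def load_metadata(doc_id):
    """Reassemble the attribute rows of one document into a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM doc_attribute WHERE doc_id = ?", (doc_id,)
    )
    return dict(rows.fetchall())


save_metadata(1, {"document_type": "inspection_report", "child_id": 1001})
save_metadata(2, {"document_type": "case_record", "me_num": "ME-0001"})
print(load_metadata(2))
```

Every read and write has to translate between the application's view of a document and a set of attribute rows, which is the kind of extra work on both the application and database side that an EAV design brings with it.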

Page 12: Why EDP Chose MongoDB

DocSpace SQL Architecture

Page 13: Why EDP Chose MongoDB

Limitations of Relational Model

• Need faster development cycle

• Double effort for development and maintenance

o On application and database level

• Document definition (metadata) first, content later

• Changing schema

• Rigid document structure

o Not amenable to change

• No support for non-primitive values

Page 14: Why EDP Chose MongoDB

Effects on Development Cycle

• SQL: Waterfall-like approach

o Fully develop requirements before implementation

o Gotta get the schema right to avoid hassle

o Change discouraged

• MongoDB: Rapid Application Development

o Prototyping

o Change accommodated

Page 15: Why EDP Chose MongoDB

Document Management System Done Right

• Faster development cycles

• No translation of complex document structure into relational model

• Application driven schema

• Document content first, metadata later

• Flexible document structure driven by user requirements

• GridFS for large documents
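GridFS stores a large file as a sequence of numbered chunk documents that reference a file entry. The sketch below mimics that layout in plain Python, without a MongoDB connection, to show the idea. The function names and the tiny chunk size are assumptions for illustration; real GridFS chunks are far larger and are managed by the driver.

```python
def split_into_chunks(file_id, data, chunk_size):
    """Split bytes into numbered chunk records, in the style of GridFS."""
    return [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Join chunk records back into the original bytes, ordered by n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


scan = b"0123456789"  # stands in for a scanned document image
chunks = split_into_chunks("doc-1", scan, chunk_size=4)
print(len(chunks))
print(reassemble(reversed(chunks)) == scan)
```

Because each chunk carries its sequence number, the file can be rebuilt regardless of the order in which chunks are read back.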

Page 16: Why EDP Chose MongoDB

DocSpace MongoDB Architecture

Page 17: Why EDP Chose MongoDB

Case Study - Traffic Fatalities

• A study of traffic-related fatalities in NYC

• Injury Surveillance and Prevention

• Offline data collection

• 330+ data points

• Multiple weekly changes to schema

o Add/remove fields

o Value types

• Developed in 500 hrs (3 months)

• 1 intermediate developer, 1 novice
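One way an application can absorb frequent field additions and value-type changes is to normalize records as it reads them, instead of migrating stored data. The sketch below assumes that approach; it is not taken from the project, and the field names `case_no`, `speed_limit`, and `vehicle_type` are hypothetical.

```python
def normalize(record, defaults):
    """Return a copy of record with missing fields filled from defaults."""
    merged = dict(defaults)
    merged.update(record)
    # Hypothetical value-type change: speed_limit was first collected
    # as text and later as a number, so older values are converted.
    if isinstance(merged.get("speed_limit"), str):
        merged["speed_limit"] = int(merged["speed_limit"])
    return merged


# Fields added after data collection started get a default value.
DEFAULTS = {"vehicle_type": None}

week_1 = {"case_no": 1, "speed_limit": "25"}
week_2 = {"case_no": 2, "speed_limit": 30, "vehicle_type": "truck"}

records = [normalize(r, DEFAULTS) for r in (week_1, week_2)]
print(records[0])
```

Older records stay as they were stored, and the application decides how to interpret them, which fits a schema that changes several times a week.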

Page 18: Why EDP Chose MongoDB

Evolving Use of MongoDB

• Single Node with Database Security

• Nightly Dump for Backup Archiving

• Master – Slave Nodes

• Replica Sets – 3 Nodes

• Distributed across Metropolitan Area Network

• Bare Iron Primary, VMware ESX and Hyper-V VM Secondaries

• Hurricane Sandy – no downtime, one node failed
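A replica set can elect a primary only while a majority of its voting members can reach each other, which is why three nodes can ride out the loss of one. The arithmetic can be sketched as follows; the function names are illustrative and not part of any MongoDB API.

```python
def majority(voting_members):
    """Smallest number of members that forms a strict majority."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable):
    """True if the reachable members are enough to elect a primary."""
    return reachable >= majority(voting_members)


print(majority(3))
print(can_elect_primary(3, reachable=2))  # one node lost
print(can_elect_primary(3, reachable=1))  # two nodes lost
```

With three members the majority is two, so the set keeps a primary after one failure but not after two.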

Page 19: Why EDP Chose MongoDB

Thank you!

Questions