© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DAT204
How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to Minutes with MongoDB & AWS
World leader in serving science
Revenues of $17 billion
50,000 employees
50 countries
A Mass Spectrometer tells you…
What’s in there and how much
Making the world cleaner and safer
Mars Organic Molecule Analyzer (MOMA) will take a modified Thermo Linear Ion Trap Mass Spectrometer to Mars in 2020
What beer looks like in a mass spec
Demo
Instrument
MongoDB
MS Instrument Connect
Demo: instrument connect
Demo: remote monitoring a mass spectrometer
Why does Thermo use MongoDB?
ThermoFisher apps using MongoDB
XML → MongoDB
Starting on MongoDB
Oracle → MongoDB
SQLite → MongoDB
Postgres → MongoDB
Amazon DynamoDB → MongoDB Atlas
Scientific apps = humongous data
Big molecules = big data
instrument {
    UserId : "[email protected]",
    MachineName : "TRACEFINDER8",
    Location : "Austin",
    AcquisitionStationName : "TSQ 8000",
    LastErrorEventDate : "2016-09-05",
    LastErrorEventValue : null,
    RuntimeEstimate : {
        MeasuredElaspedDuration : 0.21966,
        Confidence : HighConfidence
    },
    RunManagerStatus : {
        Status : "Acquire",
        Sequence : "Testosterone",
        SampleName : "Drugx",
        VialPosition : "1",
        Rawfile : "2pg_161029205505",
        Instmethod : "1x.meth",
        Instrument : "TSQ 8000",
        IsPaused : false,
        Operator : "Fred",
    }
}
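The instrument document above is the kind of device state an instrument could report on every status change. The following is a minimal sketch, assuming that MachineName identifies one instrument document and that each report overwrites the previously stored fields; the helper name build_status_upsert and the choice of key are illustrative assumptions, not taken from the talk. The returned pair is shaped for PyMongo's update_one(filter, update, upsert=True).

```python
from typing import Any, Dict, Tuple


def build_status_upsert(report: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (filter, update) for an upsert of one instrument status report.

    Assumption: MachineName uniquely identifies an instrument document.
    """
    key = {"MachineName": report["MachineName"]}
    # $set overwrites only the reported fields and leaves any others untouched.
    fields = {k: v for k, v in report.items() if k != "MachineName"}
    return key, {"$set": fields}


report = {
    "MachineName": "TRACEFINDER8",
    "Location": "Austin",
    "RunManagerStatus": {"Status": "Acquire", "IsPaused": False},
}
status_filter, status_update = build_status_upsert(report)
# With a live PyMongo collection (hypothetical name `instruments`):
# instruments.update_one(status_filter, status_update, upsert=True)
```

Because the whole status is one document, a single upsert keeps the stored device state current without a separate insert-or-update check.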
Why MongoDB was chosen
• Performance
• Developer productivity
• Cost effective
• Runs anywhere
• Rich feature set
• Achieved legal and regulatory approval
MongoDB is a Swiss army knife
• Hierarchical data
• Relational data
• Queues
• File storage
• Device state
Amazon SQS
Amazon S3
Amazon IoT
Join example
• Version 3.2 introduced the $lookup operator
• SQL query
• MongoDB C# driver query
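The slide's SQL and C# queries did not survive in this transcript. As a stand-in, here is a minimal Python sketch of what a $lookup join can look like; the collection names (samples, compounds) and field names (sampleId) are hypothetical and not from the talk. The function only builds the pipeline, which could then be passed to PyMongo's aggregate().

```python
from typing import Any, Dict, List


def build_lookup_pipeline(sample_id: str) -> List[Dict[str, Any]]:
    """Aggregation pipeline joining one sample to its compound results.

    Roughly equivalent SQL (table names are hypothetical):
        SELECT * FROM samples s
        LEFT JOIN compounds c ON c.sampleId = s._id
        WHERE s._id = :sample_id
    """
    return [
        {"$match": {"_id": sample_id}},
        {
            "$lookup": {
                "from": "compounds",         # collection to join (hypothetical)
                "localField": "_id",         # field on the samples side
                "foreignField": "sampleId",  # field on the compounds side
                "as": "compounds",           # output array of matched documents
            }
        },
    ]


pipeline = build_lookup_pipeline("S-001")
# With a live PyMongo database: db.samples.aggregate(pipeline)
```

$lookup performs a left outer join, so a sample with no matching compounds still comes back, with an empty compounds array.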
MongoDB has caught up to relational DBs
“Notably, we show that the MUPG (match, unwind, project, group) fragment is already at least as expressive as full relational algebra over (the relational view of) a single collection, and in particular able to express arbitrary joins.”
– Bolzano University in Italy
The evolution of MongoDB
Headline Features by Release
1.0 (2009)
2.4 (GA 2013): Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring
2.6 (GA 2014): $out, Index Intersection, Text Search, Field-Level Redaction, LDAP & x509, Auditing
3.0 (GA 2015): Doc-Level Concurrency, Compression, Storage Engine API, ≤50 replicas, Auditing ++, Ops Manager
3.2 (GA 2015): Document Validation, $lookup, Fast Failover, Simpler Scalability, Aggregation ++, Encryption At Rest, In-Memory Storage Engine, BI Connector, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System
3.4 (GA 2016): Linearizable reads, Intra-cluster compression, Views, Log Redaction, Graph Processing, Decimal, Collations, Faceted Navigation, Spark Connector ++, Zones ++, Aggregation ++, Auto-balancing ++, ARM, Power, zSeries, BI Connector ++, Compass ++, Hardware Monitoring, Server Pool, LDAP Authorization, Encrypted Backups, Cloud Foundry Integration
Atlas
MySQL vs. MongoDB
Database schema
MySQL schema
MongoDB schema
Inserting data: MongoDB vs. MySQL
• Inserting 1,615 chemical compound records into two parent-child tables.
• To optimize the MySQL query, we turned off foreign keys during insert and used a string builder to create a bulk insert SQL statement. This improved insert performance by a factor of 360.
• Compare to MongoDB.
Database              Milliseconds            Lines of code
MySQL not optimized   147,600 (2.5 minutes)   21
MySQL optimized       410                     40
MongoDB               68                      1
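The bulk-insert optimization described above can be sketched as follows. This is a hedged illustration, not the code from the talk: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names are hypothetical. The idea is the same, though: build one multi-row INSERT statement instead of issuing one statement per row.

```python
import sqlite3
from typing import Sequence


def build_bulk_insert(table: str, columns: Sequence[str], n_rows: int) -> str:
    """Build one multi-row INSERT statement with placeholders for n_rows rows."""
    row = "(" + ", ".join("?" for _ in columns) + ")"
    values = ", ".join([row] * n_rows)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"


# Illustrative rows; the compound names are sample data only.
rows = [(1, "Testosterone"), (2, "Caffeine"), (3, "Cortisol")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT)")

sql = build_bulk_insert("compound", ["id", "name"], len(rows))
flat_params = [value for row in rows for value in row]
conn.execute(sql, flat_params)  # one round trip for all rows

inserted = conn.execute("SELECT COUNT(*) FROM compound").fetchone()[0]
```

With a document database the parent and child records can be stored together, which is why the MongoDB version in the table is a single line; in PyMongo that would typically be one insert_many call on a list of documents.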
Inserting data: MongoDB vs. MySQL
Selecting data: MongoDB vs. MySQL
• Query 600,000 rows of SampleCompound result data
• To optimize the MySQL select query, we created a dictionary to look up child records for each parent. This improved performance by a factor of 300. Optimization effort: 2 engineers and 2 weeks.
Database              Seconds              Lines of code
MySQL not optimized   2,400 (40 minutes)   20
MySQL optimized       8.2                  29
MongoDB               17.5                 7
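The dictionary optimization mentioned above can be illustrated with a short sketch. The names (attach_children, sample_id, the sample rows) are hypothetical and the talk's actual code is not shown in the transcript; the technique is simply to index child rows by parent key once, so each parent needs one dictionary lookup instead of a scan over all children.

```python
from collections import defaultdict
from typing import Any, Dict, List


def attach_children(
    parents: List[Dict[str, Any]],
    children: List[Dict[str, Any]],
    fk: str,
) -> List[Dict[str, Any]]:
    """Attach child rows to their parent using a dictionary keyed by parent id."""
    by_parent: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for child in children:  # single pass over all child rows
        by_parent[child[fk]].append(child)
    # One dictionary lookup per parent replaces a scan of every child row.
    return [{**p, "compounds": by_parent.get(p["id"], [])} for p in parents]


samples = [{"id": 1, "name": "Drugx"}, {"id": 2, "name": "Blank"}]
sample_compounds = [
    {"sample_id": 1, "compound": "Testosterone"},
    {"sample_id": 1, "compound": "Cortisol"},
]
joined = attach_children(samples, sample_compounds, "sample_id")
```

The work drops from roughly parents × children comparisons to one pass over each list, which is the kind of change that can account for a speedup of several hundred times on large result sets.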
Update: MongoDB vs. MySQL
Migrating to MongoDB reduced code by 3.5x
                           SQLite   MongoDB
Data Layer Lines of Code   4271     1260
MongoDB compared to DynamoDB
MongoDB                                                     DynamoDB
Anywhere                                                    AWS
Rich Ad-hoc Query Language + IDE                            No Ad-hoc query language
Many operators (Joins, Aggregation, etc.)                   Fewer operators
Excellent Performance                                       Excellent Performance
Easy to deploy (with Atlas)                                 Easy to Deploy each table
Adding tables requires no configuration changes             Adding tables requires additional configuration and cost
Easy to use from AWS services but not natively integrated   Native integration with AWS Services: IAM, VPC, Lambda, Kinesis
Released in 2009                                            Released in 2012
MongoDB vs. S3 performance
Downloading a 220 KB object from MongoDB was 7x faster than from Amazon S3 when cold and 3x faster when warm
                                MongoDB   Amazon S3
Retrieve document first time    68 ms     468 ms
Retrieve document second time   13 ms     38 ms
MongoDB vs. S3 performance
MongoDB was 11x faster than S3 for partial document loading
              MongoDB     S3
Data size     400 Bytes   2.1 MB
Performance   19 ms       214 ms
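Partial document loading means asking the database for only the fields you need, whereas an object store returns the whole object. A minimal sketch of how such a request could be expressed with a MongoDB projection follows; the helper name partial_query and the field paths are illustrative assumptions, and the result is shaped for PyMongo's find_one(filter, projection).

```python
from typing import Any, Dict, Tuple


def partial_query(doc_id: Any, *fields: str) -> Tuple[Dict[str, Any], Dict[str, int]]:
    """Return (filter, projection) that fetches only the named fields of one document."""
    projection = {name: 1 for name in fields}  # 1 means "include this field"
    return {"_id": doc_id}, projection


query_filter, projection = partial_query(
    "TSQ-8000-run-1",             # hypothetical document id
    "RunManagerStatus.Status",    # dotted paths select nested fields
    "RuntimeEstimate",
)
# With a live PyMongo collection (hypothetical name `runs`):
# runs.find_one(query_filter, projection)
```

Only the projected fields travel over the network, which is consistent with the 400 Bytes versus 2.1 MB data sizes in the table above.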
Reducing processing from days to minutes
Frameworks used to parallelize algorithms
• AWS Lambda
• Docker and Amazon ECS
• Spark and Elastic MapReduce
Parallel data processing
Why Atlas?
• Easy
• Performant
• Seamless Migration
• Robust
• No downtime, even when scaling up
Building MongoDB Atlas on Amazon Web Services
Operations burden
PATCHES
UPGRADES
SECURITY
BACKUPS
RECOVERY
99.999% UPTIME
UPSCALE
DOWNSCALE
PERFORMANCE
UAT
STAGING
MONITORING
ALERTS
PROVISION
CONFIGURE
INSTALL
Automated
Available On-Demand
Secure
Highly Available
Automated Backups
Elastically Scalable
Database as a service for MongoDB
Fully managed MongoDB clusters
Customer only needs to choose the shape and size of the cluster
● Instance size (CPU and RAM)
● Replication factor
● Number of shards
● Disk space
● Disk speed
Screenshot of create dialog
Cluster features
VPC peering
IP address whitelist
SCRAM-SHA-1 authentication
readWriteAnyDatabase
enableSharding
clusterMonitor
SSL
Using well-known CA
Trust system CAs by default
Security features
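To make the security features above concrete, here is a hedged sketch of a MongoDB connection string that requests SSL and SCRAM-SHA-1 authentication. The host names, user, password, and replica set name are placeholders, and authSource=admin is an assumption about where the database user is defined; only the Python standard library is used, so nothing here connects to a server.

```python
from typing import Sequence
from urllib.parse import quote_plus, urlencode


def build_uri(user: str, password: str, hosts: Sequence[str], replica_set: str) -> str:
    """Build a mongodb:// URI with SSL and SCRAM-SHA-1 enabled."""
    # Credentials must be percent-encoded in case they contain ':' or '@'.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    options = urlencode(
        {
            "replicaSet": replica_set,
            "ssl": "true",
            "authSource": "admin",  # assumption: users are defined in admin
            "authMechanism": "SCRAM-SHA-1",
        }
    )
    return f"mongodb://{credentials}@{','.join(hosts)}/?{options}"


uri = build_uri(
    "app_user",
    "p@ss",  # placeholder password
    ["host-a.example.net:27017", "host-b.example.net:27017"],
    "rs0",
)
# A driver such as PyMongo could then be given this string: MongoClient(uri)
```

Because the certificates are issued by a well-known CA, a client that trusts the system CA store normally needs no extra certificate configuration.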
Backup
Automation
Monitoring
Key components
Customer container with replica set
AWS Account X, Region Y
VPC (Customer N)
Availability Zone A, Subnet A: mongod (port 27017)
Availability Zone B, Subnet B: mongod (port 27017)
Availability Zone C, Subnet C: mongod (port 27017)
Customer container with sharded cluster
AWS Account X, Region Y
VPC (Customer N)
Availability Zone A, Subnet A: shard0, S, shard1, S, shard2, config
Availability Zone B, Subnet B: shard0, S, shard1, S, shard2, config
Availability Zone C, Subnet C: shard0, S, shard1, S, shard2, config
mongod—27017
mongod—27017
mongod—27017
One security group per VPC applied to all Amazon EC2 instances
Three classes of security rules:
● MongoDB traffic between cluster members
● MongoDB traffic between application and clusters
● SSH traffic between production support jump box and EC2 instance
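The three classes of rules above can be modeled as a small allow list. This is a sketch under stated assumptions, not the actual Atlas configuration: the mapping of the two CIDR blocks from the slides to cluster and application traffic is a guess, the jump box address is a documentation placeholder, and port 22 is simply the standard SSH port.

```python
import ipaddress
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    description: str
    source_cidr: str
    port: int


RULES = (
    Rule("MongoDB traffic between cluster members", "172.31.248.0/21", 27017),
    Rule("MongoDB traffic from the application", "10.0.0.0/16", 27017),
    Rule("SSH from the support jump box", "192.0.2.10/32", 22),  # placeholder address
)


def is_allowed(source_ip: str, port: int, rules: Sequence[Rule] = RULES) -> bool:
    """Return True if any rule allows traffic from source_ip to the given port."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        port == rule.port and addr in ipaddress.ip_network(rule.source_cidr)
        for rule in rules
    )


app_can_query = is_allowed("10.0.3.7", 27017)
app_can_ssh = is_allowed("10.0.3.7", 22)
```

Anything that matches no rule is denied, which mirrors how security groups behave: they only contain allow rules.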
App Server
Jump Box
IP firewall using security groups
172.31.248.0/21
10.0.0.0/16
VPC peering
Your VPC (CIDR Block: 10.0.0.0/16): Elastic LB
Atlas VPC (CIDR Block: 172.31.248.0/21): AZ 1, AZ 2, AZ 3
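AWS VPC peering requires that the two VPCs have non-overlapping CIDR blocks. A quick way to check the two blocks shown on the slide, using only Python's standard ipaddress module (the variable names are illustrative):

```python
import ipaddress

your_vpc = ipaddress.ip_network("10.0.0.0/16")
atlas_vpc = ipaddress.ip_network("172.31.248.0/21")

# Peering is only possible when the address ranges do not overlap.
can_peer = not your_vpc.overlaps(atlas_vpc)

# A /21 leaves 11 host bits, so the Atlas block spans 2**11 addresses.
atlas_addresses = atlas_vpc.num_addresses
```

Running the same check against your own VPC's CIDR block before requesting a peering connection avoids a rejected request later.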
“We want Prime to be such a good value, you’d be irresponsible not to be a member.”
– Jeff Bezos
Questions?
Thank you!
Remember to complete your evaluations!