MapR-DB – The First In-Hadoop Document Database
TRANSCRIPT
© 2015 MapR Technologies
MapR-DB: New Options For Creating Breakthrough Next Gen Apps with NoSQL And Hadoop
NoSQL Was Designed For Big Data
• RDBMSs have been the default choice for applications
– But they face cost/time challenges with rapidly growing, varying data sets
• NoSQL was designed for big data
– E.g., user transaction data, sensor data, IoT data, time series data
[Diagram labels: RDBMS, NoSQL]
Known Challenges Today With Other NoSQL Databases
• Data loss
• Data inconsistency
• Long maintenance downtime (e.g., compactions, anti-entropy)
• Coarse-grained access controls
• Cluster/silo sprawl
– Maintenance pains
– Complexity, more error prone
• Constant data movement between database and analytics cluster
– Excessive bandwidth utilization
– Delays in accessing data
• Modeling of complex data
– Longer app development cycles
– Higher chance of coding errors
• Multiple databases for multiple kinds of applications
Requirements to Resolve Today’s Challenges
• Tighter Hadoop integration
– Reduce cluster sprawl
– Reduce data movement
– Enable real-time analytics on live data
– Lower administrative overhead
• Flexible JSON data model
• Automatic optimizations
– Less maintenance downtime
– Consistent high performance
• Fine-grained access controls
– More than simply table/document level
• Globally consistent deployment capability
[Diagram labels: Hadoop, NoSQL, Data Platform]
MapR-DB Architectural Principles
Dramatically Simpler, High Performance at Global Scale
• Self-healing from HW and SW failures
– Replicated state and data for instant recovery
– Automated re-replication of data
• High performance and low latency
– Integrated system with fewer software layers
– Single hop to data
– No compactions, low I/O amplification (patented secret sauce)
• Minimal administration
– Single namespace for files and tables (and streams going forward)
– Built-in data management & protection
– Automatic splits and merges as data grows and shrinks
• Global low-latency replication for disaster recovery
Built Into Hadoop = Real-Time
Non-MapR: Hadoop (analytical) and NoSQL (operational) linked by batch copies
• Batch-only
• Cluster sprawl
With MapR: Hadoop and MapR-DB together in the MapR Distribution (analytical + operational)
• Real-time data access
• Multi-use-case platform
• Analytics as it happens, no cross-cluster copying
[Diagram labels, use cases: Churn Analysis, Offers, Fraud Detection, Revenue Optimization, Predictive Analytics, Sentiment Analysis]
[Diagram labels, data: Customer Profiles, Log Files, IoT Data, Click Streams, Call Logs, Social Media]
Real-Time Integration with Other Systems
The MapR-DB replication engine is extensible for integration with any external system
[Diagram labels: MapR-DB, Real-Time Reliable Transport, Streaming (Storm), Elasticsearch, Remote MapR-DB Tables, Future]
Designed For Global Deployments
Multi-master (a.k.a. active/active) replication
[Diagram labels: Active Read/Write, End Users]
• Faster data access – minimize network latency on global data with local clusters
• Reduced risk of data loss – real-time, bi-directional replication for synchronized data across active clusters
• Application failover – upon any cluster failure, applications continue via redirection to another cluster
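The application failover bullet can be illustrated with a minimal client-side sketch. This is not MapR-DB client code: `FakeCluster`, `ClusterUnavailable`, and `with_failover` are hypothetical names invented for illustration, and the sketch assumes every active cluster already holds the same replicated data.

```python
class ClusterUnavailable(Exception):
    """Raised by a (hypothetical) cluster client when the cluster is down."""


class FakeCluster:
    """Stand-in for a cluster client; not a real MapR-DB class."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.rows = {}

    def get(self, key):
        if not self.up:
            raise ClusterUnavailable(self.name)
        return self.rows.get(key)


def with_failover(clusters, operation):
    """Run operation against the first reachable cluster, in preference order.

    Returns (cluster_name, result). Raises ClusterUnavailable if all fail.
    """
    last_error = None
    for cluster in clusters:
        try:
            return cluster.name, operation(cluster)
        except ClusterUnavailable as err:
            last_error = err  # remember the failure and redirect to the next cluster
    raise ClusterUnavailable("all clusters failed") from last_error


# The local cluster is down; the application continues on the next one.
london = FakeCluster("london", up=False)
new_york = FakeCluster("new_york")
new_york.rows["user42"] = {"name": "Mary"}

served_by, row = with_failover([london, new_york], lambda c: c.get("user42"))
print(served_by, row)
```

Listing the nearest cluster first keeps latency low in the normal case, and the same list doubles as the redirection order when a cluster fails.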
Real-Time Analytics With Hadoop
Distributed clusters close to the end users, with real-time analytics at a central cluster
• MapR-DB cluster (London)
• MapR-DB cluster (New York)
• MapR-DB cluster (Singapore)
• MapR-DB/Hadoop cluster: Hadoop analytics
Operational and analytical workloads combined in a single cluster in a single datacenter
• Operationally efficient, consolidated MapR cluster: database operations and Hadoop analytics
[Diagram labels: Active Read/Write, End Users]
Granular Security
Use Access Control Expressions (ACEs) to set granular permissions.
Example: user:mary | (group:admins & group:VP) & user:!bob
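To make the example expression concrete, here is a minimal sketch of how such an expression could be evaluated for a given user. It is not MapR's implementation: the function names are hypothetical, and it assumes that `!` binds tightest, then `&`, then `|`, and that `user:!bob` means "any user other than bob".

```python
import re

# One token: an operator/parenthesis, or a principal such as user:mary or user:!bob.
TOKEN = re.compile(
    r"\s*(?:(?P<op>[|&()!])|(?P<kind>user|group):(?P<neg>!?)(?P<name>\w+))"
)


def tokenize(expr):
    """Split an ACE string into operators and (kind, negated, name) tuples."""
    expr = expr.strip()
    pos, tokens = 0, []
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected input at {pos}: {expr[pos:]!r}")
        if m.group("op"):
            tokens.append(m.group("op"))
        else:
            tokens.append((m.group("kind"), m.group("neg") == "!", m.group("name")))
        pos = m.end()
    return tokens


def allowed(expr, user, groups):
    """Return True if `user`, a member of `groups`, satisfies the expression."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():  # lowest precedence (assumed)
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_factor()
        while peek() == "&":
            take()
            rhs = parse_factor()
            value = value and rhs
        return value

    def parse_factor():
        tok = take()
        if tok == "!":
            return not parse_factor()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if isinstance(tok, tuple):
            kind, negated, name = tok
            matches = (name == user) if kind == "user" else (name in groups)
            return matches != negated
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


ACE = "user:mary | (group:admins & group:VP) & user:!bob"
print(allowed(ACE, "mary", set()))
print(allowed(ACE, "bob", {"admins", "VP"}))
```

Under the assumed precedence, the example grants access to mary unconditionally, and to anyone in both the admins and VP groups except bob.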
Open Source OJAI API for JSON-Based Applications on Hadoop
Open JSON Application Interface (OJAI)
[Diagram labels: {JSON}, MapR-Client, Databases (MapR-DB), File Systems, Other Systems]
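As a rough illustration of the kind of data a JSON-based application handles, the sketch below builds a nested document and reads fields by a dotted path. It uses only Python's standard library and is not the OJAI API; the document contents, the `get_path` helper, and the dotted-path convention are assumptions made for illustration.

```python
import json

# A hypothetical customer profile with nested and repeated fields.
profile = {
    "_id": "user42",
    "name": {"first": "Mary", "last": "Smith"},
    "devices": [
        {"type": "phone", "os": "android"},
        {"type": "laptop", "os": "linux"},
    ],
    "address": {"city": "London", "country": "UK"},
}


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'address.city' through nested dicts and lists."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return default
    return current


# Documents round-trip through JSON text without a predefined schema.
restored = json.loads(json.dumps(profile))

print(get_path(restored, "address.city"))
print(get_path(restored, "devices.1.os"))
print(get_path(restored, "address.zip", default="unknown"))
```

New fields can be added to one document without changing the others, which is the flexibility the JSON data model is meant to provide.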
Single Cluster Data Lake Capabilities
[Diagram labels: MapR Data Platform (MapR-DB, MapR-FS), distribution including Apache Hadoop]
• MapR-DB: relational, time series, structured data
• MapR-FS: emails, blogs, tweets, log files, unstructured data
• Agile, self-service data exploration
• ETL into operational reporting formats (e.g., Parquet)
• Multi-tenancy: job/data placement control, volumes
• Access controls: file, table, column, column family, doc, sub-doc levels
• Auditing: compliance, analyze user accesses
• Snapshots: track data lineage and history
• Table replication: global multi-master, business continuity
Sources
• Relational, SaaS, mainframe
• Documents, emails
• Log files, clickstream, sensors
• Blogs, tweets, link data
• Data warehouses, data marts
Q & A
@mapr maprtech
@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies