NoSQL for the SQL Server DBA
Slides from my talk at SQLSaturday 120 in Huntington Beach, CA, in March 2012.
noSQL for the DBA
Lynn Langit, Practitioner, Author, Instructor
March 2012, for SQL Saturday SoCal
BigData = ‘Next State’ Questions
• What could happen?
• Why didn’t this happen?
• When will the next new thing happen?
• What will the next new thing be?
• What happens?
Collecting Behavioral Data
BigData = Exponentially More Data
• Retail Example -> ‘Feedback Economy’
– Number of transactions
– Number of behaviors (collected every minute)
[Chart: 12:00 to 2:30 on the x-axis, 0 to 2500 on the y-axis; series: Purchases, Locations, Phone data]
So Why Change?
Hitting (Relational) Walls
• For Writes
– Scale (partition / shard)
– Speed (latency)
• For Reads
– Failures (availability)
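Partitioning (sharding) spreads writes across nodes by routing each row to a shard chosen from its key. A minimal sketch, assuming simple hash-modulo routing over a fixed number of hypothetical shards; production systems often use consistent hashing or range partitioning instead:

```python
import hashlib
from collections import defaultdict

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number with a stable hash (MD5), not Python's per-process salted hash()."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def partition(rows, num_shards):
    """Group (key, value) rows by the shard that owns each key."""
    shards = defaultdict(list)
    for key, value in rows:
        shards[shard_for(key, num_shards)].append((key, value))
    return dict(shards)

if __name__ == "__main__":
    rows = [(f"customer-{i}", i * 10) for i in range(8)]
    for shard_id, shard_rows in sorted(partition(rows, 3).items()):
        print(shard_id, [key for key, _ in shard_rows])
```

Because the same key always lands on the same shard, reads can be routed with the same function. The catch is that changing `num_shards` remaps most keys, which is one reason resharding a relational system is painful.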
Is NoSQL just Hadoop?
• HUGE Hype factor in 2011 / 2012
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license:
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers
Working with Hadoop
Common Tools / Languages
• Java (JDK) / Eclipse
• MapReduce
– Map (query/format)
– Reduce (aggregate)
– Plug-in for Eclipse (Java)
• Pig (ETL, Java)
• Hive (HQL query)
• HBase tables
• Others
– Mahout (analyze)
– Karmasphere (analyze)
– R (analyze)
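The Map (query/format) and Reduce (aggregate) steps can be illustrated without a cluster. A minimal single-process sketch in Python (not the Hadoop Java API) of the classic word count, with an explicit shuffle step that groups intermediate pairs by key, which is what the framework does between the two phases:

```python
from __future__ import annotations
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: format each input line into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

if __name__ == "__main__":
    print(word_count(["big data big questions", "next data"]))
```

Running it prints `{'big': 2, 'data': 2, 'questions': 1, 'next': 1}`. On a real cluster the map and reduce functions run in parallel on many nodes, and the shuffle moves data over the network.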
Oracle Loader for Hadoop
SQL Server Connector for Hadoop
Demo - Hadoop on Azure – Cluster Allocation
The reality… two pivots
Storage Methods
• SQL (RDBMS)
• noSQL
Storage Locations
• On premises
• Cloud-hosted
So many NoSQL options
• More than just the Elephant in the room
• More than 120 types of noSQL databases
Flavors of noSQL
Graph Database
Use for data with:
– a lot of many-to-many relationships
– recursive self-joins
Use when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
– Examples: Neo4j, Freebase (Google)
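In a relational database, "friends of friends of friends" costs one recursive self-join per hop; a graph database walks the relationships directly. A minimal sketch of that traversal (breadth-first search), assuming an in-memory adjacency list with illustrative names; this is not the Neo4j API:

```python
from __future__ import annotations
from collections import deque
from typing import Optional

def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: return the shortest chain of connections, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back through 'previous' to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

friends = {"ann": ["bob", "cat"], "bob": ["dan"], "cat": ["dan"], "dan": ["eve"]}
```

Here `shortest_path(friends, "ann", "eve")` returns `["ann", "bob", "dan", "eve"]`: three hops, which in SQL would mean three self-joins on a relationships table.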
Column Database
• Wide, sparse column sets
• Examples:
– Cassandra
– HBase
– BigTable
– GAE HR DS (Google App Engine High Replication Datastore)
– Azure Tables
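"Wide, sparse column sets" means each row stores only the columns it actually has, so two rows in the same table can carry completely different column names. A minimal in-memory sketch of that layout, assuming a dictionary of row key to {column: value}; it deliberately ignores the column families, timestamps and sorted on-disk storage that HBase, BigTable and Cassandra add:

```python
from __future__ import annotations
from collections import defaultdict

class SparseColumnTable:
    """Hypothetical table: row key -> {column name: value}; absent columns take no space."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, object]] = defaultdict(dict)

    def put(self, row_key: str, column: str, value: object) -> None:
        self._rows[row_key][column] = value

    def get(self, row_key: str, column: str, default: object = None) -> object:
        # .get() on the defaultdict does not create empty rows for missing keys.
        return self._rows.get(row_key, {}).get(column, default)

    def row(self, row_key: str) -> dict[str, object]:
        return dict(self._rows.get(row_key, {}))

    def scan_prefix(self, prefix: str) -> list[str]:
        """Row keys starting with prefix, in sorted order (like a range scan)."""
        return sorted(k for k in self._rows if k.startswith(prefix))

table = SparseColumnTable()
table.put("user1", "email", "a@example.com")
table.put("user2", "phone", "555-0100")
table.put("user2", "city", "Irvine")
```

`table.row("user1")` holds only an `email` column while `user2` holds `phone` and `city`; in a relational table both rows would carry all three columns, mostly NULL.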
Demo - Document Database (MongoDB)
• Use for data that is document-oriented (a collection of JSON documents) with semi-structured data
• Encodings include XML, YAML, JSON & BSON
– Binary forms (PDF, Microsoft Office documents such as Word, Excel…)
• Examples: MongoDB, CouchDB
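Documents in one collection need not share a schema. A minimal sketch, assuming JSON documents loaded into a Python list and a hypothetical `find` helper that matches on field equality; it is loosely in the spirit of a MongoDB query document with dotted field paths, but it is not the MongoDB driver API:

```python
from __future__ import annotations
import json

_MISSING = object()  # sentinel: distinguishes "field absent" from a stored null

def find(collection: list[dict], query: dict) -> list[dict]:
    """Return documents whose fields equal every key/value in query.

    Dotted keys such as "address.city" reach into nested documents;
    documents that lack a queried field simply do not match.
    """
    def lookup(doc: dict, dotted: str):
        value = doc
        for part in dotted.split("."):
            if not isinstance(value, dict) or part not in value:
                return _MISSING
            value = value[part]
        return value

    return [d for d in collection if all(lookup(d, k) == v for k, v in query.items())]

raw = """[
  {"_id": 1, "name": "Ann", "address": {"city": "Irvine"}},
  {"_id": 2, "name": "Bob", "tags": ["dba", "sql"]},
  {"_id": 3, "name": "Cat", "address": {"city": "Irvine", "zip": "92618"}}
]"""
customers = json.loads(raw)
```

`find(customers, {"address.city": "Irvine"})` returns documents 1 and 3 even though only document 3 has a `zip` field and document 2 has no address at all.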
Key / Value Database
• Schema-less
• State (persistent or volatile)
• Examples:
– AWS DynamoDB
– Project Voldemort
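A key/value store exposes little more than put, get and delete on an opaque value; "persistent or volatile" is about whether those values survive a restart. A minimal sketch, assuming a hypothetical `KeyValueStore` class that keeps data in memory (volatile) and, when given a file path, also rewrites it as JSON after each change (persistent). Real stores such as DynamoDB and Voldemort add replication, partitioning and versioning on top:

```python
from __future__ import annotations
import json
import os
import tempfile
from typing import Optional

class KeyValueStore:
    """Hypothetical store: volatile by default, persistent if given a path."""

    def __init__(self, path: Optional[str] = None) -> None:
        self._path = path
        self._data: dict[str, object] = {}
        if path and os.path.exists(path):
            # Persistent mode: reload whatever survived the last run.
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    def put(self, key: str, value: object) -> None:
        self._data[key] = value  # values must be JSON-serializable in persistent mode
        self._flush()

    def get(self, key: str, default: object = None) -> object:
        return self._data.get(key, default)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
        self._flush()

    def _flush(self) -> None:
        """Write the whole map to disk; a no-op for a volatile store."""
        if self._path:
            tmp = self._path + ".tmp"
            with open(tmp, "w", encoding="utf-8") as f:
                json.dump(self._data, f)
            os.replace(tmp, self._path)  # swap in the new file in one step

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "kv.json")
        KeyValueStore(path).put("user:1", "Ann")
        print(KeyValueStore(path).get("user:1"))  # Ann: survived a "restart"
        print(KeyValueStore().get("user:1"))      # None: a volatile store starts empty
```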
So which type of NoSQL? Back to CAP…
CAP: Consistency, Availability, Partition tolerance
• CP = noSQL/column: Hadoop, BigTable, HBase, MemcacheDB, (graph)?
• CA = SQL/RDBMS: SQL Server / SQL Azure, Oracle, MySQL
• AP = noSQL/document or key/value: DynamoDB, CouchDB, Cassandra, Voldemort
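The CP versus AP split comes down to what a replicated store does when a network partition cuts its replicas off from each other. A toy sketch, assuming two in-memory replicas and a hypothetical `mode` switch: in CP mode a write that cannot reach the peer is refused (consistency kept, availability lost); in AP mode it is accepted locally, so the other replica serves stale data until the partition heals (availability kept, consistency relaxed):

```python
class Unavailable(Exception):
    """Raised when a CP store refuses a write it cannot replicate."""

class ReplicatedRegister:
    """Toy two-replica store contrasting CP and AP behavior during a partition."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]   # two copies of the same data
        self.partitioned = False   # True while the replicas cannot reach each other
        self._pending = []         # AP writes accepted during a partition, in arrival order

    def write(self, replica: int, key: str, value) -> None:
        if not self.partitioned:
            for copy in self.replicas:  # normal case: update every copy
                copy[key] = value
        elif self.mode == "CP":
            raise Unavailable("peer unreachable; refusing write to stay consistent")
        else:
            self.replicas[replica][key] = value  # accept locally, stay available
            self._pending.append((key, value))

    def read(self, replica: int, key: str):
        return self.replicas[replica].get(key)

    def heal(self) -> None:
        """End the partition and replay pending writes to every copy (last writer wins)."""
        self.partitioned = False
        for key, value in self._pending:
            for copy in self.replicas:
                copy[key] = value
        self._pending.clear()
```

A CA system in this picture is one that assumes partitions do not happen, which is realistic for a single-server RDBMS but not for a large distributed cluster.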
Example Comparison: RDBMS vs. Hadoop
Feature | Traditional RDBMS | Hadoop
Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes)
Access | Interactive and Batch | Batch – NOT Interactive
Updates | Read / Write many times | Write once, Read many times
Structure | Static Schema | Dynamic Schema
Integrity | High (ACID) | Low
Scaling | Nonlinear | Linear
Query Response Time | Can be near immediate | Has latency (due to batch processing)
Real-World Examples – not only SQL
• Facebook runs on Hadoop & MySQL
• Twitter runs on Hadoop (ran on FlockDB/graph)
• Yahoo runs on Hadoop
• LinkedIn runs on Hadoop & Voldemort
• Klout runs Hadoop (on Azure) & HBase (Hive) & SQL Server SSAS BISM cubes
What about the cloud?
Cloud-hosted NoSQL up to 50x CHEAPER
NoSQL (Cloud) BLOB Storage Buckets
• Amazon – S3
– The gold standard
• Google – Cloud Storage
– Free for developers
• Microsoft Azure BLOBs
• Dropbox, Box…
Cloud-hosted RDBMS
• AWS RDS – MySQL, Oracle
– Medium cost
– Solid feature set, e.g. backup, snapshot
• Google – MySQL
– Lowest cost
– Most limited RDBMS functionality
• Microsoft – SQL Azure
– Best tooling integration
– Highest cost
Other types of cloud data services
Hosting public datasets
• Pay to read
• Earn revenue by offering for read
Cleaning / matching (your) data
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure DataMarket, InfoChimps, DataMarket.com
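Matching messy values often starts by reducing each one to a normalized key and grouping the values that share it. A minimal sketch of that key-collision idea, similar in spirit to the "fingerprint" clustering method in Google Refine but simplified (for example, it does not fold accented characters to ASCII):

```python
from __future__ import annotations
import re
import string
from collections import defaultdict

_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")

def fingerprint(value: str) -> str:
    """Trim, lowercase, drop punctuation, then de-duplicate and sort the tokens."""
    cleaned = _PUNCT.sub("", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))

def cluster(values: list[str]) -> list[list[str]]:
    """Group values whose fingerprints collide; keep only groups with 2+ members."""
    groups: dict[str, list[str]] = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

cities = ["Huntington Beach", "Beach, Huntington", "huntington  beach.", "Irvine"]
```

`cluster(cities)` groups the three spellings of Huntington Beach (all fingerprint to `"beach huntington"`) and leaves Irvine out; a person then decides which spelling to keep.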
Cloud – RDBMS AND NoSQL
Service | AWS | Google | Microsoft | Others
Cloud RDBMS | Oracle / MySQL | MySQL | SQL Azure | Hosted RDBMS on Rackspace
noSQL buckets | S3 | Cloud Storage | HDFS on Azure |
NoSQL databases | DynamoDB | H/R Datastore on GAE | Azure Tables | Heroku
Streaming / Machine Learning | Custom EC2 | Prospective Search & Prediction API | StreamInsight & Mahout with Hadoop |
Document or Graph | MongoDB on EC2 | Freebase (g) | MongoDB on Windows Azure | Cassandra on Rackspace
Hadoop | Elastic MapReduce on S3 & EC2 | BigQuery (HBase-like) | Hadoop on Azure |
Data sets & other | Karmasphere | Translation API, Full-text search | Azure DataMarket | Database.com
Pick your mix and then…
NoSQL
• Host locally
• Host in the Cloud
RDBMS
• Host locally
• Host in the Cloud
Other Services
• Use Cloud Data Markets
• Use Cloud ETL
What about me?
Common DBA Tasks in NoSQL
RDBMS | NoSQL
Import Data | Import Data
Set Up Security | Set Up Security
Perform a Backup | Make a copy of the data
Restore a Database | Move a copy to a location
Create an Index | Create an Index
Join Tables Together | Run MapReduce
Schedule a Job | Schedule a (Cron) Job
Run Database Maintenance | Monitor space and resources used
Send an Email from SQL Server | Set up resource threshold alerts
Search BOL (Books Online) | Interpret Documentation
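"Join Tables Together" becoming "Run MapReduce" is the least obvious swap for a DBA. One common pattern is a reduce-side join: the map step tags each record with the table it came from and emits it under the join key, and the reduce step pairs up the tagged records for each key. A minimal single-process sketch of that pattern, assuming two small hypothetical tables (customers and orders) held as lists of tuples:

```python
from collections import defaultdict
from itertools import product

def map_tagged(table_name, rows, key_index):
    """Map: emit (join key, (table tag, row)) for every row."""
    for row in rows:
        yield row[key_index], (table_name, row)

def reduce_join(key, tagged_rows):
    """Reduce: inner-join the rows from each side that share this key."""
    left = [row for tag, row in tagged_rows if tag == "customers"]
    right = [row for tag, row in tagged_rows if tag == "orders"]
    for customer, order in product(left, right):
        yield key, customer, order

def join(customers, orders):
    groups = defaultdict(list)  # shuffle: group mapped records by join key
    for key, tagged in map_tagged("customers", customers, 0):  # customer id is column 0
        groups[key].append(tagged)
    for key, tagged in map_tagged("orders", orders, 1):        # customer id is column 1
        groups[key].append(tagged)
    return [out for key in sorted(groups) for out in reduce_join(key, groups[key])]

customers = [(1, "Ann"), (2, "Bob"), (3, "Cat")]
orders = [(100, 1, 25.0), (101, 1, 10.0), (102, 2, 99.0)]
```

`join(customers, orders)` yields the same rows as `SELECT ... FROM customers c JOIN orders o ON o.customer_id = c.id`: two orders for Ann, one for Bob, and nothing for Cat.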
Demo - HadoopOnAzure – Part 2
• Show MapReduce Job
• Show JS / Hive consoles
Making Sense – Asking Questions
Data Scientists…
Comparing…
Karmasphere Studio for AWS
Hadoop Connector to Excel - Demo
NoSQL To-Do List
Understand CAP & types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mash up cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn noSQL access technologies
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc.
The Changing Data Landscape
NoSQL
RDBMS
Other Services
www.TeachingKidsProgramming.org
• Free courseware (recipes)
• Do a Recipe, Teach a Kid (ages 10+)
• Java or Microsoft Small Basic
Toward Data Craftsmanship…
Follow me @LynnLangit
RSS my blog www.LynnLangit.com
Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions