dan mccreary, kelly-mccreary & associates · 2014-02-05 · making sense of nosql dan mccreary,...

49
Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February 3rd, 2014 6:00pm to 8:30pm

Upload: others

Post on 22-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates

  Minnesota Web Design Community

MeetupMonday, February 3rd, 2014

6:00pm to 8:30pm

Page 2: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Food for thought…• What percentage of database transactions run on RDBMSs in the

following organizations?

2

• What percentage of all transactions in Minnesota run on RDBMSs?• Why is this number different?, Is our data fundamentally different?• If you want to start a new company, what database should you choose?

Page 3: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Background for Dan McCreary

• Co-founder of the NoSQL Now! conference• Background

– Bell Labs– NeXT Computer (Steve Jobs)– Owner of 75-person software consulting firm– US Federal data integration (National Information

Exchange Model NIEM.gov)– Native XML/XQuery for metadata management since

2006– Advocate of web standards, NoSQL and XRX systems

3Copyright Kelly-McCreary & Associates, LLC

Page 4: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Making Sense of NoSQL

• Author (with Ann Kelly) of "Making Sense of NoSQL"• Guide for managers and architects• Focus on NoSQL architectural tradeoff analysis• Basis for 40 hour course on database architectures• http://manning.com/mccreary

4

Page 5: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Kelly-McCreary & Associates 5

2006

250 Elements45 SQL Inserts

Page 6: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

6

XForms from XML Schema!

Page 7: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Kelly-McCreary & Associates 7

Page 8: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Four Translations

8

• T1 – HTML into Java Objects• T2 – Java Objects into SQL Tables• T3 – Tables into Objects• T4 – Objects into HTML

T1

T4

T2

T3

Object MiddleTier

RelationalDatabaseWeb Browser

Page 9: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Kelly-McCreary & Associates

Kurt's Suggestion

store($collection, $file-name, $data)

9

Web Browser

Save

Web Form

Use a Native XML Database!

Kurt Cagle

eXist-db

Equivalent of 45 SQL inserts in 1 line of code!

Page 10: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Kelly-McCreary & Associates

Zero Translation

• XML lives in the web browser (XForms)• REST interfaces• XML in the database (Native XML, XQuery)• XRX Web Application Architecture• No "impedance mismatch", No translation!• Department tried it and then went back to HTML, Java and SQL• …but I was forever changed

10

Web Browser XML database

XForms

Page 11: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

NoSQL on Google Trends

11

RDBMS NoSQL

http://www.google.com/trends/explore?q=NoSQL%2C+RDBMS#q=NoSQL%2C%20RDBMS&cmpt=q

Kelly-McCreary & Associates, LLC

Page 12: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Before NoSQL

12

Relational Analytical (OLAP)

Copyright Kelly-McCreary & Associates, LLC

Page 13: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

After NoSQL

13

Relational Analytical (OLAP) Key-Value

Column-Family DocumentGraph

key value

key value

key value

key value

Copyright Kelly-McCreary & Associates, LLC

Page 14: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

The NO-SQL UniverseDocument StoresKey-Value Stores

Graph/Triple Stores

Object StoresColumn-Family Stores

XML

14

Page 15: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

NoSQL – The Big Tent

• "NoSQL" – a label for a "meme" that now encompasses a large body of innovative ideas on data management

• "Not Only SQL"• Focus on non-relational

databases and hybrids• A community where new ideas

are quickly recombined to create innovative new business solutions

15

http://www.flickr.com/photos/morgennebel/2933723145/

Copyright Kelly-McCreary & Associates, LLC

Page 16: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Copyright Kelly-McCreary & Associates

Historical Context

16

Mainframe Era Commodity Processors

• 10,000 CPUs• Functional programming• MapReduce "farms"• Pennies per CPU hour

• 1 CPU• COBOL and FORTRAN• Punchcards and flat files• $10,000 per CPU hour

Page 17: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Copyright Kelly-McCreary & Associates

Two Approaches to Computation

17

Alonzo ChurchJohn von Neumann

Manage state with a program counter. Make computations act like math functions.

Which is simpler? Which is cheaper? Which will scale to 10,000 CPUs?

1930s and 40s

Page 18: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Standard vs. MapReduce Prices

18

http://aws.amazon.com/elasticmapreduce/#pricing

John's Way Alonzo's Way

Copyright Kelly-McCreary & Associates, LLC

Page 19: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

MapReduce CPUs Cost Less!

0

3

6

9

12

Cost Per CPU Hour (Cents)

19

http://aws.amazon.com/elasticmapreduce/#pricing

Cut cost from 12 to 3 cents per CPU hour!Perhaps Alonzo was right!

75% CostReduction!

Why? (hint: how "shareable" is this process)

EC2 MapReduce

Copyright Kelly-McCreary & Associates, LLC

Page 20: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Key Events on NoSQL Timeline

20

1998

Carlo StrozziIntros NoSQL

2006

GoogleBigtable

2009

Eric EvansReintroduces Term

2005

GoogleMapReduce

XQuery 1.0Recommendation

NoSQL MeetupsAmazonDynamo

2007 2008 201020042003

LiveJournal'sMemcache

Open SourceBecomes Popular

MDX

Page 21: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Pressures on Single Node RDBMS Architectures

21

OLAP/BI/DataWarehouse

Social Networks

Scalability

AgileSchema

Free

Document-

Data

Large Data Sets Reliability

Linked Data

Single NodeRDBMS

Page 22: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

An evolving tree of data types

22

Read MostlyRead/Write

StructuredUnstructured

Transactional

RDBMS BI/DWWeb Crawlers

Documents

Log Files

XML

JSON

Binary

Open Linked Data

Graph

Page 23: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Many Uses of Data

23

• Transactions (OLTP)• Analysis (OLAP)• Search and Findability• Enterprise Agility• Discovery and Insight• Speed and Reliability• Consistency and Availability

Page 24: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Three Eras of Enterprise Data

• NoSQL will not replace ERP or BI/DW systems – but they will complement them and also facilitate the integration of unstructured document data

24

1970s, 80s, 90s 1990s, 2000s

HR

Inventory Sales

FinanceERP

BI/DataWarehouse

ERP

BI/DataWarehouse

OLTP

OLAP NoSQL

Today

Siloed Systems

DocumentsDocuments

ERP Drives BI/DW NoSQL and Documents

Copyright Kelly-McCreary & Associates, LLC

Page 25: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Simplicity is a Virtue

• Many modern systems derive their strength by dramatically limiting the features in their system and focus on a specific task

• Simplicity allows database designer to focus on the primary business drivers

• Simplicity promotes "separation of concerns"

25

Photo from flickr by PSNZ Images

Copyright Kelly-McCreary & Associates, LLC

Page 26: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Google MapReduce

• 2004 paper that had huge impact of functional programming on the entire community

• Copied by many organizations, including Yahoo

26

Page 27: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Google Bigtable Paper

• 2006 paper that gave focus to scaleable databases• designed to reliably scale to petabytes of data and thousands of machines

27

Page 28: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Amazon's Dynamo Paper• Werner Vogels• CTO - Amazon.com• October 2, 2007• Used to power Amazon's

Web Sites• One of the most influential

papers in the NoSQL movement

28

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.

Page 29: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Copyright Kelly-McCreary & Associates

Scale Up vs. Scale Out

29

Scale Up• Make a single CPU as

fast as possible• Increase clock speed• Add RAM• Make disk I/O go faster

Scale Out• Make Many CPUs work

together• Learn how to divide your

problems into independent threads

Page 30: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Automatic Sharding

• When one node in a cluster has too much of a load

30

Warning processor at 90% capacity!

Time to "Shard" – copy ½ data to a new processorBeforeShard

AfterShard

Each processor gets ½ the load

Original processor New processor

Page 31: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

M

D

Key-Value Stores• Keys used to access opaque

blobs of data• Values can contain any type

of data (images, video)Pros: scalable, simple API (put,

get, delete)Cons: no way to query based

on the content of the value

31

key value

key value

key value

key value

Examples:Berkley DB, Memcache, DynamoDB, S3, Redis, Riak

Page 32: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

The Tunable SLA

• NoSQL systems that use many commodity processors can be precisely tuned to meet an organizations service level agreements

32

Max ReadTime

Max WriteTime

Reads PerSecond

DuplicateCopies

MultipleDatacenters

TransactionGuarantees

Writes PerSecond

$435.50

EstimatedPrice/Month

inputs output

Page 33: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

M

D

Column-Family

• Key includes a row, column family and column name

• Store versioned blobs in one large table

• Queries can be done on rows, column families and column names

• Pros: Good scale out, versioning• Cons: Cannot query blob content,

row and column designs are critical

33

Examples:Cassandra, HBase, Hypertable, Apache Accumulo, Bigtable

Page 34: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Cassendra

• Apache open source project• Originally developed by Facebook• Used as a key-value store and a bigtable store• Now used by Twitter and many others• Written in Java• Similar to Bigtable and DynamoDB• Designed for highly distributed systems• Column-family data model

34

Page 35: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Netflix

35

Page 36: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

M

D

Graph Store

• Data is stored in a series of nodes, relationships and properties

• Queries are really graph traversals• Ideal when relationships between

data is key: – e.g. social networks

• Pros: fast network search, works with public linked data sets

• Cons: Poor scalability when graphs don't fit into RAM, specialized query languages (RDF uses SPARQL)

36

Examples:Neo4j, AllegroGraph, Bigdata triple store, InfiniteGraph, StarDog

Page 37: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Ann Kelly

Dan McCreary

Each color is a different social network with many Interconnections between friends

XML Community

Dataversity Contacts

FinancialContacts

Software Engineers

Ex-NeXT

Page 38: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

M

D

Document Store• Data stored in nested hierarchies• Logical data remains stored

together as a unit• Any item in the document can be

queried• Pros: No object-relational

mapping layer, ideal for search• Cons: Complex to implement,

incompatible with SQL

38

Examples:MarkLogic, MongoDB, Couchbase, CouchDB, RavenDB, eXist-db

Page 39: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Structured Search"Bag of Words"

• All keywords in a single container• Only count frequencies are stored

with each word

"Retained Structure"

• Keywords associated with each sub-document component

• Each node (blue circle) is its own document

39

'love''hate''new'

'fear'

keywords

keywords

keywords

keywords

keywords

keywords

doc-id

Page 40: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

MongoDB

• Originally created by 10gen– Originally New York, now CA

• Open Source License (AGPL)• Document/collection centric• Sharding built-in, automatic• Stores data in binary JSON format BSON• Native Query language is JSON-like• Implemented in C++ with many APIs (C++,

JavaScript, Java, Perl, Python etc.)

40

Page 41: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

MarkLogic• Native XML database designed to scale to

Petabyte data stores• Leverages commodity hardware• ACID compliant, schema-free document

store• Heavy use by federal agencies, document

publishers and "high-variability" data• Arguably the most successful NoSQL

company

41

Page 42: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

eXist• Open source native (LGPL) XML database with strong

support in Europe• Application packaging system (.xar files)• Strong support for XQuery and XQuery extensions• Heavily used by the Text Encoding Initiative (TEI)

community and XRX/XForms communities• Integrated Lucene search• Collection triggers and versioning• Extensive XQuery libraries (EXPath)• Replication

42

Page 43: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

ATAM Process Flow

43

BusinessDrivers

QualityAttributes

UserStories

AnalysisArchitecture

PlanArchitecturalApproaches

ArchitecturalDecisions

Tradeoffs

SensitivityPoints

Non-Risks

RisksRisk ThemesDistilled info

Impacts

Page 44: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Copyright Kelly-McCreary & Associates

Quality Attribute Tree App

44

Page 45: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Copyright Kelly-McCreary & Associates 45

Page 46: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Copyright Kelly-McCreary & Associates

Start Finish

Using the Wrong Architecture

Credit: Isaac Homelund – MN Office of the Revisor

Page 47: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Copyright Kelly-McCreary & Associates

Using the Right Architecture

Start Finish

Find ways to remove barriers and empowerthe non programmers on your team.

Page 48: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Copyright Kelly-McCreary & Associates

Technology Moves Via People

48

Page 49: Dan McCreary, Kelly-McCreary & Associates · 2014-02-05 · Making Sense of NoSQL Dan McCreary, Kelly-McCreary & Associates Minnesota Web Design Community Meetup Monday, February

Further Reading and QuestionsDan [email protected]

49

http://manning.com/mccreary

@dmccreary

Thank You!