nosql, xrx and functional programming version 4.0 dan mccreary monday may 6th, 2013

128
M D NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

Upload: gregory-seyler

Post on 31-Mar-2015

221 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

NoSQL, XRX and Functional Programming

Version 4.0

Dan McCreary

Monday May 6th, 2013

Page 2: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

2

Topics• NoSQL

– History– Business Motivation

• XRX (XForms, REST, XQuery)• Functional Programming• NoSQL Concepts

– MapReduce/Hadoop– Key-value stores– CAP Theorem– Object-Relational Mapping

• NoSQL Product Classifications– Key-value stores, graph-stores, bigtable stores and document stores

• Sampling of Solutions– Big Data– Search– High availabily– Agility

• Analysis methods (ATAM)

Page 3: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

3

Background for Dan McCreary

• Bell Labs• NeXT Computer (Steve Jobs)• Owner of 75-person software

consulting firm• US Federal data integration (National

Information Exchange Model NIEM.gov)

• Native XML/XQuery for metadata management since 2006

• Advocate of web standards, NoSQL and XRX systems

Page 4: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

4

Making Sense of NoSQL

• Working with NoSQL since 2006• Co-founders of the NoSQL Now! conference• Authors of Manning book on NoSQL (MEAP now, print July 2013)• Guide for managers with a focus on business benefits• Focus on NoSQL architectural tradeoff analysis• http://manning.com/mccreary

Page 5: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

• How got into NoSQL• Background as of 2006

– Worked for Steve Jobs at NeXT– Object-oriented development Objective-C, Java– Strong focus on object-relational mapping

• DBKit, WebObjects, Hibernate, Oracle, Sybase, MS-SQL Server

– 7 years doing XML (CriMNet), Metadata Registries (ISO/IEC 11179) and Semantics (RDF, OWL etc.)

My Story

5

Page 6: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

6

Page 7: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

7

Page 8: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

8

Page 9: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

9

XForms Rocks!

Page 10: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

10

Page 11: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Four Translations

11Copyright 2010 Dan McCreary & Associates

• T1 – HTML into Java Objects• T2 – Java Objects into SQL Tables• T3 – Tables into Objects• T4 – Objects into HTML

T1

T4

T2

T3

Object MiddleTier

RelationalDatabaseWeb Browser

Page 12: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Kurt's Suggestion

store($collection, $file-name, $data)

12Copyright 2010 Dan McCreary & Associates

Web Browser

Save

Web Form

Use a Native XML

Database!

Kurt Cagle

eXist-db

Page 13: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Zero Translation

• XML lives in the web browser (XForms)• REST interfaces• XML in the database (Native XML, XQuery)• XRX Web Application Architecture• No translation!

13Copyright 2010 Dan McCreary & Associates

Web Browser XML database

XFormsModel

REST Interface

Native XML Database

Page 14: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

14

Database Tradeoff Analysis

My Way• 10,000 lines of code to break

CRV XML document into Java component and use Hibernate to store each element into columns of tables

• 45 SQL inserts store• 20 joins to extract• Six months• Four developers (Java, SQL,

DBA, Project Manager)

Kurt's Way• One line of codes to store

CRB• One day to write• One week to test• 100x increase in agility

Page 15: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

15

The Paradigm Shift

2006

RDBMSs are the onlyway to store enterprise

data

There are many ways to store enterprise data and if you pick the right one your agility can go

up x1,000

Page 16: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

16

Reality Sets In

What I wanted• Objective architecture

analysis• Fairly weigh the pros and

cons of each alternative• Be surrounded by people

that know the strengths and weakness of many alternatives

What I got• Architecture decisions

driven by an RDBMS license• Architecture decisions

made by one person with limited exposure to alternatives

• Architecture decisions made by lack of knowledge

• Architecture by fear of the unknown

Page 17: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates

Evaluating Software Architectures: Methods and Case Studies

by Paul Clements, Rick Kazman, and Mark Klein

• Addison-Wesley, 2001

Key Book for ATAM

17

We need "Evaluating Database Architectures"

Page 18: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

18

Three Eras of Databases

• RDBMS for transactions, Data Warehouse for analytics and NoSQL for scalability

RDBMSRDBMS

DataWarehouse

1985-1995

1995-2010 2010-Now

DataWarehouseRDBMS

NoSQL

Page 19: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

19

Sample of NoSQL Jargon

Document orientation

Schema free

MapReduce

Horizontal scaling

Sharding and auto-sharding

Brewer's CAP Theorem

Consistency

Reliability

Partition tolerance

Single-point-of-failure

Object-Relational mapping

Key-value stores

Column stores

Document stores

Memcached

Indexing

B-Tree

Configurable durability

Documents for archives

Functional programming

Document Transformation

Document Indexing and Search

Alternate Query Languages

Aggregates

OLAP

XQuery

MDX

RDF

SPARQL

Architecture Tradeoff Modeling

ATAM

Note that within the context of NoSQL many of these terms have different meanings!

Page 20: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

20

NoSQL – The Big Tent

• "NoSQL" – a label for a "meme" that now encompasses a large body of innovative ideas on data management

• "Not Only SQL"• Focus on non-relational

databases and hybrids• A community were new ideas

are quickly recombined to create innovative new business solutions

http://www.flickr.com/photos/morgennebel/2933723145/

Page 21: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

21

Pressures on SQL Only Systems

SQLOLAP/BI/DataWarehouse

Social Networks

Scalability

AgileSchema

FreeDocument-D

ata

Large Data Sets Reliability

Linked Data

Page 22: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

22

Before NoSQL DB Selection Was Easy!

Does it look like

document?

Use MicrosoftOffice

Use theRDBMS

Start

Stop

No

Yes

Page 23: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

23

An evolving tree of data types

Read MostlyRead/Write

StructuredUnstructured

Transactional

RDBMS BI/DWWeb Crawlers

Documents

Log Files

XML

JSON

Binary

Open Linked Data

Graph

Page 24: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

24

Many Uses of Data

• Transactions (OLTP)• Analysis (OLAP)• Search and Findability• Enterprise Agility• Discovery and Insight• Speed and Reliability• Consistency and Availability

Page 25: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Before NoSQL

Relational Analytical (OLAP)

25

Page 26: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

After NoSQL

Relational Analytical (OLAP) Key-Value

Column-Family DocumentGraph

key value

key value

key value

key value

26

Page 27: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

27

Three Eras of Enterprise Data

• NoSQL will not replace ERP or BI/DW systems – but they will complement them and also facilitate the integration of unstructured document data

1970s, 80s, 90s 1990s, 2000s

HR

Inventory Sales

FinanceERP

BI/DataWarehouse

ERP

BI/DataWarehouse

OLTP

OLAP NoSQL

Today

Siloed Systems

DocumentsDocuments

ERP Drives BI/DW NoSQL and Documents

Page 28: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

28

Simplicity is a Virtue

• Many modern systems derive their strength by dramatically limiting the features in their system and focus on a specific task

• Simplicity allows database designer to focus on the primary business drivers

Photo from flickr by PSNZ Images

Page 29: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

29

Simplicity is a Design Style

• Focus only on simple systems that solve many problems in a flexible way

• Examples:– Touch screen interfaces– Key/Value data stores

Page 30: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

30

Historical Context

Mainframe Era

• 1 CPU• COBOL and FORTRAN• Punchcards and flat files• $10,000 per CPU hour

Commodity Processors

• 10,000 CPUs• Functional programming• MapReduce "farms"• Pennies per CPU hour

Page 31: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Two Approaches to Computation

31Copyright 2010 Dan McCreary & Associates

Alonzo ChurchJohn von Neumann

Manage state with a program counter. Make computations act like math functions.

Which is simpler? Which is cheaper? Which will scale to 10,000 CPUs?

1930s and 40s

Page 32: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

32

Standard vs. MapReduce Prices

http://aws.amazon.com/elasticmapreduce/#pricing

John's Way Alonzo's Way

Page 33: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

33

MapReduce CPUs Cost Less!

0

5

10

15

20

25

30

35

40Cost Per CPU Hour (Cents)

http://aws.amazon.com/elasticmapreduce/#pricing

Cust cost from 32 to 6 cents per CPU hour!Perhaps Alanzo was right!

82% CostReduction!

Why? (hint: how "shareable" is this process)

Page 34: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

34

Its about transformation!

34

Page 35: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

35

Architectural Tradeoffs

I want a fast car with good mileage.

I want a scaleable database with low cost that runs well on the 1000 CPUs in our data center.

Page 36: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

36

Recent History

• The term NoSQL became re-popularized around 2009

• Used for conferences of advocates of non-relational databases

• Became a contagious idea "meme"• First of many "NoSQL meetups" in San

Francisco organized by Jon Oskarsson• Conversion from "No SQL" to "Not Only

SQL" in recent year

Page 37: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

37

Key Events on NoSQL Timeline

1998

Carlo StrozziIntros NoSQL

2006

GoogleBigtable

2009

Eric EvansReintroduces Term

2005

GoogleMapReduce

XQuery 1.0Recommendation

NoSQL MeetupsAmazonDynamo

2007 2008 201020042003

LiveJournal'sMemcache

Open SourceBecomes Popular

MDX

Page 38: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

38

RDBMS vs. NoSQL

• NoSQL is real and it’s here to stay

http://www.google.com/trends/explore#q=nosql%2C%20rdbms&date=1%2F2009%2051m&cmpt=q

RDBMS

NoSQL

Google Trends

Page 39: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

39

NoSQL and Web 2.0 Startups

• Many web 2.0 startups did not use Oracle or MySQL

• They built their own data stores influenced by Amazon’s Dynamo and Google’s Bigtable in order to store and process huge amounts of data

• In the social community or cloud computing applications most of these data stores became open source software

Page 40: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

40

2003 - Memcached

• Originally developed for LiveJournal as a "side-cache"• Allowed databases to put data in a "RAM Cache" in front

of a SQL server to speed up read-mostly operations• Cache in all front-end servers can be shared via a simple

protocol• Dramatically lowers stress on SQL server• Ideal "front end" to SQL systems to increase performance

Page 41: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

41

memcachedb

Partition Select and Updates

• Most frequently read data always stays in RAM• Complex update transactions go directly to the database

memcachedbSelect

SQL "System ofRecord"

InsertUpdateDelete

memcachedb

Sync with memcached protocol

Page 42: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

42

Google MapReduce

• 2004 paper that had huge impact of functional programming on the entire community

• Copied by many organizations, including Yahoo

Page 43: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

43

Google Bigtable Paper

• 2006 paper that gave focus to scaleable databases

• designed to reliably scale to petabytes of

data and thousands of machines

Page 44: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

44

Amazon's Dynamo Paper

• Werner Vogels• CTO - Amazon.com• October 2, 2007• Used to power Amazon's

Web Sites• One of the most

influential papers in the NoSQL movement

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.

Page 45: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

45

2009: the NoSQL "Revolt"

“NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.”

Computerworld magazine, July 1st, 2009

NoSQL!

Page 46: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright 2008 Dan McCreary & Associates

46

Many Processes Today Are Driven By…

The constraints of yesterday…

Challenge:Ask ourselves the question…Do our current method of solving problems with tabular data…Reflect the storage of the 1950s…Or our actual business requirements?What structures best solve the actual business problem?

Page 47: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright 2008 Dan McCreary & Associates

47

No-Shredding!

• Relational databases take a single hierarchical document and shred it into many pieces so it will fit in tabular structures

• Document stores prevent this shredding

My FormData

Page 48: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright 2008 Dan McCreary & Associates

48

Is Shredding Really Necessary?

• Every time you take hierarchical data and put it into a traditional database you have to put repeating groups in separate tables and use SQL “joins” to reassemble the data

Page 49: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

The Addition of XML Web Services

• T1 – HTML into Java Objects• T2 – Java Objects into SQL Tables• T3 – Tables into Objects• T4 – Objects into HTML• T5 – Objects to XML• T6 – XML to Objects

49

Copyright 2011 Kelly-McCreary & Associates

T1

T3

T2

T4

Object MiddleTier

RelationalDatabaseWeb Browser

T5

Web Service

T6

Page 50: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

50

"The Vietnam of Applications"

• Object-relational mapping has become one of the most complex components of building applications today

• A "Quagmire" where many projects get lost• Many "heroic efforts" have been made to solve

the problem– Java Hibernate Framework– Ruby on Rails

• But sometimes the best way to avoid complexity is to keep your architecture very simple

Page 51: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

"Schema Free"

• Systems that automatically determine how to index data as the data is loaded into the database

• No a priori knowledge of data structure• No need for up-front logical data modeling

– …but some modeling is still critical

• Adding new data elements or changing data elements is not disruptive

• Searching millions of records still has sub-second response time

51Copyright 2010 Dan McCreary & Associates

Page 52: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

52

Schema-Free Integration

"We can easily store the data that we actually get, not the data we thought we would get."

XMLv1

XMLv2

XMLv3

Enterprise Messaging System NoSQLDatabase

Page 53: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

53

Upfront ER Modeling is Not Required

• You do not have to finish modeling your data before you insert your first records

• No Data Definition Language "DDL" is needed

• Metadata is used to create indexes as data arrives

• Modeling becomes a statistical process – write queries to find exceptions and normalize data

• Exceptions make the rules but can still be used

• Data validation can still be done on documents using tools such as XML Schema and business rules systems like Schematron

Page 54: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Monoculture and Mono-architecture

54Copyright 2011 Kelly-McCreary & Associates

Image Source: Wikipedia

Page 55: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

55

Eric Evans

“The whole point of seeking alternatives [to RDBMS systems] is that you need to solve a problem that relational databases are a bad fit for.”

Eric EvansRackspace

Page 56: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Finding the Right Match

56Copyright 2011 Dan McCreary & Associates

Schema-Free

Mature Query Language

Standards Compliant

Use CMU's Architectural Tradeoff and Method (ATAM) Process

Page 57: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

57

ACID Transactions

• Atomic – transaction either complete or fail – no in-between

• Consistent – no reports can ever catch data between states

• Isolated – never allow users to touch data that is incomplete

• Durable – recover committed transactions from partial system failures

Page 58: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

58

BASE

• Base Availability of Soft state Eventually Consistent

• "Eventually Consistent" is not appropriate for many financial reporting systems

Page 59: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

59

Sharding and Partition Tolerance

• When a database gets full how easy is it for your system to automatically move half the data to new systems?

Time to "Shard"

New Server

Before autoshard the node is near 100% capacity.

After autoshard the nodes are "rebalanced" with each node taking half the load.

copy 1/2 data

Page 60: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

60

Brewer's CAP Theorem

Consistency

Availability Partition Tolerance

You can not have all three in

so pick two!

Page 61: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

61

Proof of Brewer's CAP Theorem

• In 2002, Seth Gilbert and Nancy Lynch of MIT, formally proved Brewer's Theorem to be correct

• Focus turned away from trying to get all three to trying to pick the right two for the job

• Many systems are becoming "CAP tunable"

Page 62: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

62

CAP Systems

• CA – Consistent and Available– Example – a single site, single database, no auto-

sharding– If you have to partition, stop, partition, restart

• CP – Consistent with Partition Tolerance– Some data not available when partitioning– What data you have is always consistent

• AP – Available with Partition Tolerance– Always on, but during partitioning, some data may

be inconsistent– After sharding eventual consistency

Page 63: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

63

CAP Regions

Consistency

Availability Partition Tolerance

CP

AP

CA

NoSQL Innovation

RDBMS

Page 64: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

64

When does CAP Apply?

Consistency AND Availability

Each transaction can select between Consistency OR Availability depending on

the context

If you have…

A single processor or many processors on a

working network:

Many processors and network failures:

then you get…

Page 65: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

65

Scale Up vs. Scale Out

Scale Up• Make a single CPU as fast as

possible• Increase clock speed• Add RAM• Make disk I/O go faster

Scale Out• Make Many CPUs work

together• Learn how to divide your

problems into independent threads

Page 66: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

66

The Tunable SLA

• NoSQL systems that use many commodity processors can be precisely tuned to meet an organizations service level agreements

Max ReadTime

Max WriteTime

Reads PerSecond

DuplicateCopies

MultipleDatacenters

TransactionGuarantees

Writes PerSecond

$435.50

EstimatedPrice/Month

inputs output

Page 67: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

67

Example: Amazon DynamoDB

• Tune your service to the business requirements (challenging to do with a RDBMS)

TuningKnobs

Page 68: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

68

"One Size Fits…"

"One Size Does Not Fit All"James Hamilton Nov. 3rd, 2009

http://perspectives.mvdirona.com/CommentView,guid,afe46691-a293-4f9a-8900-5688a597726a.aspx

James works for Amazon Web Services

Page 69: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

69

NoSQL and Cloud Computing

• High scalability– Especially in the horizontal

direction (multi CPUs)

• Low administration overhead– Simple web page

administration

• Lower fees for MapReduce transformations

Page 70: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

70

Distribution Models

Master-Slave Peer-to-Peer

MasterStandbyMaster

Node Node

Node

Node

Node

Node

Node

requestsrequests

Used only if primary master fails

Page 71: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

The NO-SQL Universe

71

Copyright 2010 Dan McCreary & Associates

Document StoresKey-Value Stores

Graph/Triple Stores

Object StoresColumn-Family Stores

XML

Page 72: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

72

Relational

• Data is usually stored in row by row manner (row store)

• Standardized query language (SQL)• Data model defined before you add

data• Joins merge data from multiple tables• Results are tables• Pros: mature ACID transactions with

fine-grain security controls• Cons: Requires up front data

modeling, does not scale well

Examples:Oracle, MySQL,PostgreSQL,Microsoft SQL Server, IBM DB/2

Page 73: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

73

Analytical (OLAP)

• Based on "Star" schema with central fact table for each event

• Optimized for analysis of read-analysis of historical data

• Use of MDX language to count query "measures" for "categories" of data

• Pros: fast queries for large data• Cons: not optimized for

transactions and updates

Examples:Cognos, Hyperion, Microstrategy, Pentaho, Microsoft, Oracle, Business Objects

Page 74: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

74

Key-Value Stores

• Keys used to access opaque blobs of data

• Values can contain any type of data (images, video)

Pros: scalable, simple API (put, get, delete)

Cons: no way to query based on the content of the value

key value

key value

key value

key value

Examples:Berkley DB, Memcache, DynamoDB, S3, Redis, Riak

Page 75: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

75

Key Value Stores

• A table with two columns and a simple interface– Add a key-value– For this key, give me the

value– Delete a key

• Blazingly fast and easy to scale (no joins)

Key Value

Blob datatype

string datatype

Page 76: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

76

The Locker Metaphor

Key:

Value:An arbitrary container data

Page 77: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

77

No Subset Queries in Key-Value Stores

Page 78: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

78

Types of Key-Value Stores

• Eventually‐consistent key‐value store• Hierarchical key‐value stores• Key-Value stores in RAM• Key-Value stores on disk• High availability key-value store• Ordered key‐value stores• Values that allow simple list operations

Page 79: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

79

Memcached

• Open source in-memory key-value caching system• Make effective use of RAM on many distributed web servers• Designed to speed up dynamic web applications by alleviating database

load• RAM resident key-value store for small chunks of arbitrary data (strings,

objects) from results of database calls, API calls, or page rendering• Simple interface for highly distributed RAM caches• 30ms read times typical• Designed for quick deployment, ease of development• APIs in many languages

Page 80: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

80

Riak

• Open source distributed key-value store with support and commercial versions by Basho

• A "Dynamo-inspired" database• Focus on availability, fault-tolerance,

operational simplicity and scalability• Support for replication and auto-sharding and

rebalancing on failures• Support for MapReduce, fulltext search and

secondary indexes of value tags• Written in ERLANG

Page 81: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

81

Redis

• Open source in-memory key-value store with optional durability

• Focus on high speed reads and writes of common data structures to RAM

• Allows simple lists, sets and hashes to be stored within the value and manipulated

• Many features that developers like– expiration, transactions, pub/sub, partitioning

Page 82: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

82

Amazon DynamoDB

• Amazon DynamoDB• Based around scalable key-value store• Fastest growing product in Amazon's

history• SSD only database service• Focus on throughput not storage and

predictable read and write times• Strong integration with S3 and Elastic

MapReduce

Page 83: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

83

Column-Family

• Key includes a row, column family and column name

• Store versioned blobs in one large table

• Queries can be done on rows, column families and column names

• Pros: Good scale out, versioning• Cons: Cannot query blob content,

row and column designs are critical

Examples:Cassandra, HBase, Hypertable, Apache Accumulo, Bigtable

Page 84: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

84

Column Family (Bigtable)

• The champion of "Big Data"• Excel at highly saleable systems• Tightly coupled with MapReduce• Technically a "sparse matrix" were most cells have

no data• Generating a list of all columns is non-trivial• Examples:

– Google Bigtable– Hadoop HBase– Hypertable

Page 85: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

85

Spreadsheets Use a Row/Column as a Key

• Bigtable systems use a combination of row and column information as part of their key

Column ID

Row ID

Page 86: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

86

Keys Include Family and Timestamps

• Bigtable systems have keys that include not just row and column ID but other attributes

• Column Families are created when a table is created

• Timestamps allows multiple versions of values• Values are just ordered bytes and have no

strongly typed data system

Page 87: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

87

Column Store Concepts

• Preserve the table-structure familiar to RDBMS systems

• Not optimized for "joins"• One row could have millions of

columns but the data can be very "sparse"

• Ideal for high-variability data sets• Colum families allow to query all

columns that have a specific property or properties

• Allow new columns to be inserted without doing an "alter table"

• Trigger new columns on inserts

Col1 Col100000…

Page 88: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

88

Column Families

• Group columns into "Column families"

• Group column families into "Super-Columns"

• Be able to query all columns with a family or super family

• Similar data grouped together to improve speed

Table

SuperCol X

SuperCol Y

Fam1 Fam2

Col-A Col-B

Page 89: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

89

Hadoop/Hbase

• Open source implementation of MapReduce algorithm written in Java

• Initially created by Yahoo– 300 person-years development

• Column-oriented data store• Java interface• HBase designed specifically to work with

Hadoop• High-level query language (Pig)• Strong support by many vendors

Page 90: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

90

Cassandra

• Apache open source column family database supported by DataStax

• Peer-to-peer distribution model• Strong reputation for linear scale out

(millions of writes/second)• Database side security• Written in Java and works well with HDFS

and MapReduce

Page 91: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

91

Netflix

Page 92: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

92

Graph Store

• Data is stored in a series of nodes, relationships and properties

• Queries are really graph traversals• Ideal when relationships between

data is key: – e.g. social networks

• Pros: fast network search, works with public linked data sets

• Cons: Poor scalability when graphs don't fit into RAM, specialized query languages (RDF uses SPARQL)

Examples:Neo4j, AllegroGraph, Bigdata triple store, InfiniteGraph, StarDog

Page 93: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

93

Graph Stores

• Used when the relationship and relationships types between items are critical

• Used for– Social networking queries: "friends of my friends"– Inference and rules engines– Pattern recognition– Used for working with open-linked data

• Automate "joins" of public data

Page 94: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

94

Nodes are "joined" to create graphs

• How do you know that two items reference the same object?

• Node identification – URI or similar structure

Page 95: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

95

Open Linked Data

Page 96: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

96

Neo4J

• Graph database designed to be easy to use by Java developers

• Dual license (community edition is GPL)

• Works as an embedded java library in your application

• Disk-based (not just RAM)• Full ACID

Page 97: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

97

Document Store

• Data stored in nested hierarchies

• Logical data remains stored together as a unit

• Any item in the document can be queried

• Pros: No object-relational mapping layer, ideal for search

• Cons: Complex to implement, incompatible with SQL

Examples:MarkLogic, MongoDB, Couchbase, CouchDB, eXist-db

Page 98: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

98

Document Stores

• Store machine readable documents together as a single blob of data

• Use JSON or XML formats to store documents• Similar to "object stores" in many ways• No shredding of data into tables• Sub-trees and attributes of documents can still be

queried XQuery or other document query languages• Quickly maturing to include ACID transaction support• Lack of object-relational mapping permits agile

development• Fastest growing revenues (MarkLogic, MongoDB,

Couchbase)

Page 99: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

99

Estimated Big Data and NoSQL Sales

Document Stores

Page 100: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

100

Document Structure

<books> is our root element

<books> contain a

sequence of one to many

<book> elements

Each <book> contains

the following sequence of

elements

Darker lines mean "required" and light lines mean optional

elements

Id and title are required

Books have 0 to many author-names

Format and license elements are codes that must be in a fixed list of choicesOnly valid URL

characters

Must be a valid decimal number

Page 101: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

101

MarkLogic

• Native XML database designed to scale to Petabyte data stores

• Leverages commodity hardware• ACID compliant, schema-free document

store• Heavy use by federal agencies, document

publishers and "high-variability" data• Arguably the most successful NoSQL

company

Page 102: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

102

MongoDB

• Open Source JSON data store created by 10gen

• Master-slave scale out model• Strong developer community• Sharding built-in, automatic• Implemented in C++ with many APIs (C++,

JavaScript, Java, Perl, Python etc.)

Page 103: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

103

Couchbase

• Open source JSON document store• Code base separate from CouchDB• Built around memcached• Peer to peer scale out model• Written in C++ and Erlang• Strengths in scale out, replication and

high-availability

Page 104: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

104

CouchDB

• Apache CouchDB• Open source JSON data store• Document Model• Written in ERLANG• RESTful JSON API• Distributed, featuring robust, incremental

replication with bi-directional conflict detection and management

• B-Tree based indexing• Mobile version

Page 105: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

105

eXist

• Open source native XML database• Strong support for XQuery and XQuery

extensions• Heavily used by the Text Encoding Initiative

(TEI) community and XRX/XForms communities

• Integrated Lucene search• Collection triggers and versioning• Extensive XQuery libs (EXPath)• Version 2.0 has replication

Page 106: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

106

Structured Search

• What happens if you retain document structure when you index your documents?

• How can use document structure to increase your precision and relevancy

Page 107: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Information Retrieval Textbook

Introduction to Information Retrieval

by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze

Cambridge University Press, 2008

107

http://nlp.stanford.edu/IR-book/information-retrieval-book.html

Page 108: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Table 10.1

RDB search unstructured retrieval

structured retrieval

objects records unstructured documents

trees with text at leaves

model relational model vector space & others ?

main data structure table inverted index ?

queries SQL free text queries ?

108

XML - Table 10.1 and structured information retrieval. SQLRDB (relational database) search, unstructured information retrieval

Page 109: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Table 10.1 - Revised

RDB search unstructured retrieval

structured retrieval

objects records unstructured documents

trees with text at leaves

model relational model vector space & others XML hierarchy

main data structure table inverted index

trees with node-ids for document ids

queries SQL free text queries XQuery fulltext

109

XML - Table 10.1 and structured information retrieval. SQLRDB (relational database) search, unstructured information retrieval

Page 110: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

Two Models

"Bag of Words"

• All keywords in a single container• Only count frequencies are stored

with each word

"Retained Structure"

• Keywords associated with each sub-document component

110

'love'

'hate''new'

'fear'

keywords

keywords

keywords

keywords

keywords

keywords

doc-id

Page 111: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D

Keywords and Node IDs

• Keywords in the reverse index are now associated with the node-id in every document

111

Node-id

Node-id

Node-id

Node-id

Node-id

Node-id

keywords

keywords

keywords

keywords

keywords

keywords

document-id

Page 112: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

112

Hybrid architectures

• Most real world implementations use some combination of NoSQL solutions

• Example:– Use document stores for data– Use S3 for image/pdf/binary storage– Use Apache Lucene for document index stores– Use MapReduce for real-time index and

aggregate creation and maintenance– Use OLAP for reporting sums and totals

Page 113: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

113

Other Terms

• Hash (Merkle) Trees –where a hash function is used for each leaf and branches up to the root of a file system – fast ways to see if complex documents in tree structures have changed

• B-Trees, B+ Trees – binary trees that are used to create fast searches of large data sets

• Vector Clocks – an algorithm for detecting dependencies between remote computer systems – used to track what systems may have corrupted data

• Bloom Filters – hashing tricks to avoid unnecessary disk operations on large data sets

• Consistent Hashing – A way of converting strings of data into a unique key that can be used for fast comparisons

Page 114: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

114

Tools to Help You Select A System

• ATAM – Architecture Tradeoff Methodology

• CMU developed process to objectively select a system architecture based on business driven use-cases and quality metrics

Page 115: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

115

ATAM Process Flow

BusinessDrivers

QualityAttributes

UserStories

AnalysisArchitecture

PlanArchitecturalApproaches

ArchitecturalDecisions

Tradeoffs

SensitivityPoints

Non-Risks

RisksRisk ThemesDistilled info

Impacts

Page 116: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

116

Sample Use Cases/User Stories

• Effort to Load New Data• Effort to Query Data• Effort to Transform Data• Effort to Create RESTful Web Services

Page 117: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D 117

Insert/Select/Publish Comparison

Insert Query CreatePublishing

Web Service

SQL

WebDAV SQL XQuery

JavaTomcatAXISJDBC

Total Effort

XQuery

logicaldatamodeling

SQL

SQL

XQuery

Effort

Page 118: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

118

Sample Quality Attribute Utility Tree

UtilityUtility

SearchabilitySearchability

XML ImportabilityXML Importability

TransformabilityTransformability

AffordabilityAffordability

SustainabilitySustainability

InteroperabilityInteroperability

Easy to add new XML data. (C, H)

Use OpenSource Software. (H, H)

Use long standing Standards. (VH, H)

Use W3C Standards. (VH, H)

Fulltext search on document data. (H, H)

Easy to transform XML data. (H, H)

StandardsStandards

Ease of ChangeEase of Change Use declarative languages. (VH, H)

SecuritySecurity Prevents unauthorized access. (H, H)

Fulltext SearchFulltext Search

XML SearchXML Search

Custom ScoringCustom Scoring

Drag-and-dropDrag-and-drop

Bulk ImportBulk Import

No License FeesNo License Fees

XQueryXQuery

XSLTXSLT

Web ServicesWeb Services

Fine Grain ControlFine Grain Control

Standards BasedStandards Based

No TranslationNo Translation

Key: (Importance, Score)Important to the Project C=Critical, VH=Very High, H=High, M=MediumArchitectural Score H=High, M=Medium, L=Low

Scoring via XQuery. (M, H)

Role-basedRole-based

Works with W3C Forms. (H, M)

Works with web standards. (VH, H)

No complex languages to learn. (H, H)

Fast searching. (H, H)

Staff training. (H, M)

Collection-based access. (H, M)Centralized security policy. (H, M)

Mashups wtih REST Interfaces. (VH, H)

Batch Import tools. (M, M)

Transform to HTML or PDF. (VH, H)

Customizable by non-programmers. (H, H)

Page 119: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

119

The Hard Part

• Assigning organizational "value" to– Ease of use– Standards– Vendor viability– Open source project viability– Portability

• Example:– WebDAV support

Page 120: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

120

Evolution of Ideas in OpenSource

• How quickly can new ideas be recombined into new database products?• OpenSource software has proved to be the most efficient way to quickly recombine

new ideas into new products

Product A

Product B

Product B

OpenSource

Proprietary SoftwareNew Database Ideas

Schema-free

MapReduceAuto-sharding

New Products

Cloud Computing

The Nature of Technology: What It Is and How It Evolves by W. Brian Arthur

Page 121: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

121

Start Finish

Using the Wrong Architecture

Credit: Isaac Homelund – MN Office of the Revisor

Page 122: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DKelly-McCreary & Associates, LLC

122

Using the Right Architecture

Start Finish

Find ways to remove barriers to empoweringthe non programmers on your team.

Page 123: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

D 123

If You Give a Kid a Hammer…

• People solve problems using familiar tools

• People develop specific Cognitive Styles* based on training and experience

• What are we teaching the next generation of developers?

* Source: Shoshana Zuboff: In the Age of the Smart Machine (1988)

…the whole world becomes a nail

Page 124: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

124

Cognitive Bias in NoSQL• Anchoring bias - the tendency to produce an estimate near a cue amount "Our managers were

expecting an RDBMS solution so that’s what we gave them"• Availability heuristic - the tendency to estimate that what is easily remembered is more likely

than that which is not. - "I hear that NoSQL does not support ACID"• Bandwagon effect - the tendency to do or believe what others do or believe - "Everyone else

here and in our local area uses RDBMS"• Confirmation bias - the tendency to seek out only that information that supports one's

preconceptions – "We only read posts from the Oracle groups"• Framing effect - the tendency to react to how information is framed, beyond its factual content

"We know of some NoSQL projects that failed"• Gambler's fallacy (aka sunk cost bias) the failure to reset one's expectations based on one's

current situation – "We already paid for our Oracle license…"• Hindsight bias, the tendency to assess one's previous decisions as more efficacious than they

were – "Our last five systems worked on RDBMS solutions".• Halo effect, the tendency to attribute unverified capabilities in a person based on an observed

capability. – "Oracle sells billions of dollars of licenses each year, how could so many people be wrong".

• Representativeness heuristic, the tendency to judge something as belonging to a class based on a few salient characteristics  - "Our accounting systems work on RDBMS so why not our product search?"

http://en.wikipedia.org/wiki/Cognitive_bias_mitigation

Page 125: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

125

The Future of the NoSQL Movement

• Will data sets continue to grow at exponential rates?• Will new system options become more diverse?• Will new markets have different demands?• Will some ideas be "absorbed" into existing RDBMS vendors products?• Will the NoSQL community continue to be the place where new

database ideas and products are incubated?• Will the job of doing high-quality architectural tradeoffs analysis become

easier?

Growth Diversity

Page 126: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

126

Making Sense of NoSQL

http://manning.com/mccreary

Page 127: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

127

2013 NoSQL Now!

• Dataversity's NoSQL Conference• August 20-22• San Jose California

Page 128: NoSQL, XRX and Functional Programming Version 4.0 Dan McCreary Monday May 6th, 2013

M

DCopyright Kelly-McCreary & Associates, LLC

128

Questions

Thank You!

Dan McCreary

President, Kelly-McCreary & Associates

[email protected]

twitter: dmccreary