MongoDB and Big Data - University of Stirling

Presenter: John Page

Posted on 17-Oct-2020


MongoDB and Big Data Presenter: John Page


2

DINGBATS

CUT CUT CUT

CUT


3

DINGBATS

JOB AN


4

DINGBATS

DATA


5

What is ‘Big Data’?

“Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.”


6

What is ‘Big Data’?

“Big Data problems are those where the Volume, Velocity or Variety of the data means traditional data processing techniques are no longer successful.”


7

DINGBATS


8

DINGBATS

C++

PHP

JavaScript

Python

Haskell

Algol68

Perl

Go

Scheme

Erlang

COBOL


9

Options for Building an Operational Database

Column Family

Key/Value Store

Relational

Document Store

Graph Database


10

The World Has Changed

Data: Volume, Velocity, Variety

Time: Iterative, Agile, Short Cycles

Risk: Always On, Scale, Global

Cost: Open-Source, Cloud, Commodity


11

What is MongoDB?

Modern Document-model operational database.

Designed to back today’s business applications.

•  Developer and Operations oriented.

•  Easy to scale horizontally.

•  Business Critical is the norm.

•  Lessons learned from 40 years of RDBMS.


12

Relational Model

SKU       Name              Category  Brand  Stock Count  Price
11574646  Pentax 50mm Lens  500       1500   6531         179.99

SKUAttributeID  ItemSKU   AttrVal
1               11574646  100
2               11574646  200

AttrValID  AttrName  AttrVal
100        1         50mm
200        2         130g

AttrNameID  AttrName
1           Focal Length
2           Weight

CategoryID  Department
500         Photography

BrandID  Brand
1500     Pentax


13

Relational Model

SKU       Name              Category     Brand   Stock Count  Price
11574646  Pentax 50mm Lens  Photography  Pentax  6531         179.99

SKUAttributeID  SKU       AttrVal
1               11574646  100
2               11574646  200

AttrValID  AttrName     AttrVal
100        FocalLength  50mm
200        Weight       130g


14

Document Model

SKU       Name              Category     Brand   Stock Count  Price   Attributes
11574646  Pentax 50mm Lens  Photography  Pentax  6531         179.99  Focal: 50mm, Weight: 130g


15

Document Model

SKU      Name              Category     Brand   Stock Count  Price   Attributes
1157464  Pentax 50mm Lens  Photography  Pentax  6531         179.99  Focal: 50mm, Weight: 130g

{
  _id: 1157464,
  Name: "Pentax 50mm Lens",
  Category: "Photography",
  Brand: "Pentax",
  StockCount: 6531,
  Price: 179.99,
  Attributes: [
    { name: "FocalLen", value: "50mm" },
    { name: "Weight", value: "130g" }
  ]
}
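The document above is, in effect, the relational rows pre-joined. As a minimal sketch in plain JavaScript (not a MongoDB API), the code below takes the rows from the second relational-model slide, where attribute names are already resolved, and embeds them into one product document. The variable names and the `toDocument` helper are hypothetical.

```javascript
// Rows copied from the relational-model slide (SKU 11574646).
const products = [
  { SKU: 11574646, Name: "Pentax 50mm Lens", Category: "Photography",
    Brand: "Pentax", StockCount: 6531, Price: 179.99 },
];
const skuAttributes = [
  { SKUAttributeID: 1, SKU: 11574646, AttrVal: 100 },
  { SKUAttributeID: 2, SKU: 11574646, AttrVal: 200 },
];
const attrValues = [
  { AttrValID: 100, AttrName: "FocalLength", AttrVal: "50mm" },
  { AttrValID: 200, AttrName: "Weight", AttrVal: "130g" },
];

// Hypothetical helper: performs the joins once, at write time, and embeds
// the attributes as an array inside the product document.
function toDocument(product) {
  const attributes = skuAttributes
    .filter((link) => link.SKU === product.SKU)                     // join on SKU
    .map((link) => attrValues.find((v) => v.AttrValID === link.AttrVal)) // join on AttrValID
    .map((v) => ({ name: v.AttrName, value: v.AttrVal }));
  const { SKU, ...rest } = product; // the SKU becomes the document's _id
  return { _id: SKU, ...rest, Attributes: attributes };
}

const doc = toDocument(products[0]);
console.log(JSON.stringify(doc, null, 2));
```

Doing the join once, when the document is written, is what lets a read of the product be a single lookup by `_id` instead of a multi-table join.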


16

MongoDB - Agility

SKU      Name              Category     Brand   Stock Count  Price   Attributes
1157464  Pentax 50mm Lens  Photography  Pentax  6531         179.99

SKU       Name      Category  Brand       Stock Count  Price  Restricted
11574646  Penknife  Camping   Victorinox  156          18.99  TRUE

SKU     Name           Category  Price   Attributes
228918  3 yr Warranty  Service   179.99
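A minimal sketch, in plain JavaScript, of why these three differently shaped rows can live in one collection: each document carries only its own fields, and an equality query on a field simply does not match documents that lack it. The `findMatching` helper is a hypothetical, equality-only stand-in that is far simpler than MongoDB's query engine.

```javascript
// The three products from the slide as documents in one collection,
// each with its own set of fields.
const catalog = [
  { _id: 1157464, Name: "Pentax 50mm Lens", Category: "Photography",
    Brand: "Pentax", StockCount: 6531, Price: 179.99,
    Attributes: [{ name: "FocalLen", value: "50mm" }, { name: "Weight", value: "130g" }] },
  { _id: 11574646, Name: "Penknife", Category: "Camping",
    Brand: "Victorinox", StockCount: 156, Price: 18.99, Restricted: true },
  { _id: 228918, Name: "3 yr Warranty", Category: "Service", Price: 179.99 },
];

// Hypothetical equality-only matcher: a document matches when every filter
// field is present and strictly equal. Documents without the field are skipped.
function findMatching(collection, filter) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value));
}

const restricted = findMatching(catalog, { Restricted: true });
const samePrice = findMatching(catalog, { Price: 179.99 });
console.log(restricted.map((d) => d.Name));
console.log(samePrice.map((d) => d.Name));
```

Adding the `Restricted` field to the penknife needed no change to the other two documents, which is the agility point the slide makes.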


17

MongoDB - Usability

Shell: command-line shell for interacting directly with the database

> db.collection.insert({product: "MongoDB", type: "Document Database"})
> db.collection.findOne()
{
  "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
  "product" : "MongoDB",
  "type" : "Document Database"
}

Drivers: drivers for most popular programming languages and frameworks

Java

Python

Perl

Ruby

Haskell

JavaScript


18

MongoDB - Utility

•  Complex indexed queries.

•  Aggregation.

Example query: Age > 65 AND Male living near “LEEDS”

Age    Profit Margin
1-17   0
18-35  20
36-50  80
51-65  50
66+    5
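Both examples on this slide can be written as MongoDB query documents. This sketch assumes a hypothetical `customers` collection with fields `age`, `gender`, `location` (a GeoJSON point) and `profitMargin`. The operators `$gt`, `$near`, `$bucket`, `$sum` and `$avg` are real MongoDB operators; the field names, the approximate Leeds coordinates and the 20 km radius are illustrative assumptions.

```javascript
// Approximate position of Leeds as a GeoJSON point: [longitude, latitude].
const LEEDS = { type: "Point", coordinates: [-1.5491, 53.8008] };

// "Age > 65 AND Male living near LEEDS" as a find() filter.
// $near needs a 2dsphere index on `location`; with a GeoJSON point,
// $maxDistance is in metres. The 20 km radius is an arbitrary choice.
const over65MenNearLeeds = {
  age: { $gt: 65 },
  gender: "M",
  location: { $near: { $geometry: LEEDS, $maxDistance: 20000 } },
};

// The slide's age bands as an aggregation pipeline. Each boundary is the
// inclusive lower bound of a bucket and the next is its exclusive upper bound;
// anything outside them (66 and over, below 1, or no numeric age) goes to "66+".
const marginByAgeBand = [
  {
    $bucket: {
      groupBy: "$age",
      boundaries: [1, 18, 36, 51, 66],
      default: "66+",
      output: {
        customers: { $sum: 1 },
        avgProfitMargin: { $avg: "$profitMargin" },
      },
    },
  },
];

// In the mongo shell these would be passed as, for example:
//   db.customers.find(over65MenNearLeeds)
//   db.customers.aggregate(marginByAgeBand)
console.log(JSON.stringify(over65MenNearLeeds));
console.log(JSON.stringify(marginByAgeBand));
```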


19

MongoDB - Scalability

•  High Availability.

•  Auto Sharding.

•  Data Compression.

•  Fine-grained concurrency.

•  Enterprise Management.


20

High Availability

•  Automated replication and failover

•  Multi-data center support

•  Improved operational simplicity (e.g., HW swaps)

•  Data durability and consistency
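Automated failover rests on a majority rule: a replica-set member can be elected primary, or remain primary, only while it can reach a strict majority of the voting members. A small plain-JavaScript sketch of that arithmetic follows; it is a simplification that ignores member priorities, election timeouts and the election protocol itself, and the function names are hypothetical.

```javascript
// Smallest number of voters that forms a strict majority.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A candidate (counting itself) must reach a majority of the voters.
function canElectPrimary(votingMembers, reachableVoters) {
  return reachableVoters >= majority(votingMembers);
}

// How many voting members can fail while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} voters: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

A fourth voter raises the majority from 2 to 3 without tolerating any extra failure, which is why an odd number of voting members is the usual recommendation.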


21

MongoDB - Scalability

•  High Availability.

•  Auto Sharding.

•  Data Compression.

•  Fine-grained concurrency.

•  Enterprise Management.


22

Working Set Exceeds Physical Memory


23

Scalability

Auto-Sharding

•  Increase capacity as you go

•  Commodity and cloud architectures

•  Improved operational simplicity and cost visibility


24

Sharding and Replication


25

Routing and Balancing

Diagram: a mongos router in front of three shards, with numbered steps 1 to 4.

Operations run on specific shards, or in parallel on many.


26

Partitioning

•  User defines shard key

•  Shard key defines range of data

•  Key space is like points on a line

•  Range is a segment of that line

Diagram: the key space drawn as a line from -∞ to +∞.
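Range partitioning is also what makes routing possible. A hypothetical sketch in plain JavaScript: a chunk table of half-open ranges covering the whole key line from -Infinity to +Infinity, a binary search for the chunk that owns a key, and a router that sends a shard-key equality query to one shard, a shard-key range to the overlapping shards, and anything else to every shard. The chunk boundaries, shard names and query shape are assumptions for illustration, not how `mongos` is implemented.

```javascript
// Hypothetical chunk table for a numeric shard key. Each chunk owns the
// half-open range [min, max); together they cover -Infinity..+Infinity.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard1" },
  { min: 1000, max: 5000, shard: "shard2" },
  { min: 5000, max: Infinity, shard: "shard3" },
];

// Binary search for the chunk whose range contains `key`.
function chunkFor(key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (key < chunks[mid].min) hi = mid - 1;
    else if (key >= chunks[mid].max) lo = mid + 1;
    else return chunks[mid];
  }
  throw new Error(`no chunk covers ${key}`);
}

// Targeted when the shard key is known, scatter-gather otherwise.
function targetShards(query) {
  if (query.key !== undefined) return [chunkFor(query.key).shard];
  if (query.range !== undefined) {
    const [from, to] = query.range; // treated as [from, to)
    const hits = chunks.filter((c) => c.min < to && from < c.max);
    return [...new Set(hits.map((c) => c.shard))];
  }
  return [...new Set(chunks.map((c) => c.shard))]; // broadcast, run in parallel
}

console.log(targetShards({ key: 42 }));
console.log(targetShards({ range: [900, 1200] }));
console.log(targetShards({}));
```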


27

Data Distribution

•  Initially 1 chunk

•  Default max chunk size: 64 MB

•  MongoDB automatically splits and migrates chunks when the maximum is reached

Diagram: mongos routers, a config server, and shards (Shard 1, Shard 2) made up of mongod nodes, including secondaries.
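A hypothetical, much simplified model of the behaviour on this slide: one chunk covering the whole key space is split at its median key whenever it holds too many documents, and a balancer then migrates chunks from the fullest shard to the emptiest until their chunk counts differ by at most one. Measuring size in documents instead of the 64 MB limit, assuming distinct shard-key values, and the balancing threshold of one are all assumptions of this sketch; MongoDB's real splitter and balancer choose their own split points and thresholds.

```javascript
// A chunk is { min, max, shard, keys }: the half-open range [min, max) and
// the distinct shard-key values stored in it. maxDocs must be >= 1.
function splitIfNeeded(chunk, maxDocs) {
  if (chunk.keys.length <= maxDocs) return [chunk];
  const keys = [...chunk.keys].sort((a, b) => a - b);
  const mid = Math.floor(keys.length / 2);
  const splitKey = keys[mid]; // the right-hand chunk starts at the median key
  const left = { min: chunk.min, max: splitKey, shard: chunk.shard, keys: keys.slice(0, mid) };
  const right = { min: splitKey, max: chunk.max, shard: chunk.shard, keys: keys.slice(mid) };
  return [...splitIfNeeded(left, maxDocs), ...splitIfNeeded(right, maxDocs)];
}

function chunkCounts(chunks, shards) {
  const counts = Object.fromEntries(shards.map((s) => [s, 0]));
  for (const c of chunks) counts[c.shard] += 1;
  return counts;
}

// Migrates one chunk at a time from the fullest shard to the emptiest one.
// Mutates the `shard` field of the chunks it moves and returns the same array.
function balance(chunks, shards) {
  for (;;) {
    const counts = chunkCounts(chunks, shards);
    const bySize = [...shards].sort((a, b) => counts[a] - counts[b]);
    const emptiest = bySize[0];
    const fullest = bySize[bySize.length - 1];
    if (counts[fullest] - counts[emptiest] <= 1) return chunks;
    chunks.find((c) => c.shard === fullest).shard = emptiest;
  }
}

// Initially one chunk covering the whole key space, holding keys 1..10.
const initial = { min: -Infinity, max: Infinity, shard: "shard1",
  keys: Array.from({ length: 10 }, (_, i) => i + 1) };
const split = splitIfNeeded(initial, 3);
const balanced = balance(split, ["shard1", "shard2"]);
console.log(balanced.map((c) => `[${c.min}, ${c.max}) -> ${c.shard}`));
```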


28

Aggregation

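// Mandelbrot iteration as an aggregation stage: each document is one point
// c = cr + i*ci with current value z = zr + i*zi and an iteration count it.
var zrsquare = { '$multiply': ['$zr', '$zr'] };                       // zr^2
var zisquare = { '$subtract': [0, { '$multiply': ['$zi', '$zi'] }] };  // -(zi^2)
var zixzrx = { '$multiply': [{ '$multiply': ['$zr', '$zi'] }, 2] };    // 2*zr*zi
// Count this iteration while |z|^2 = zr^2 - (-(zi^2)) <= 4, i.e. before escape.
var inciflow = { '$cond': [{ '$lte': [{ '$subtract': [zrsquare, zisquare] }, 4] }, 1, 0] };
// One step z <- z^2 + c; every expression reads the input (previous) values.
var iterate = { '$project': {
  'cr': 1, 'ci': 1,
  'zr': { '$add': [zrsquare, zisquare, '$cr'] },
  'zi': { '$add': [zixzrx, '$ci'] },
  'it': { '$add': ['$it', inciflow] }
} };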
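To see what such a stage computes, here is the standard Mandelbrot step it encodes, written in plain JavaScript for a single point: z(n+1) = z(n)² + c, counting the iterations for which |z|² ≤ 4. The starting values zr = zi = 0 and it = 0 are assumptions, since the slide does not show the input documents, and the function names are hypothetical.

```javascript
// One iteration per call, mirroring a $project stage: every new field is
// computed from the previous values of zr, zi and it.
function iterate({ cr, ci, zr, zi, it }) {
  const inside = zr * zr + zi * zi <= 4 ? 1 : 0; // not escaped yet
  return {
    cr,
    ci,
    zr: zr * zr - zi * zi + cr, // real part of z^2 + c
    zi: 2 * zr * zi + ci,       // imaginary part of z^2 + c
    it: it + inside,
  };
}

// Roughly what a pipeline of `steps` copies of the stage does to one point.
function mandelbrotCount(cr, ci, steps) {
  let doc = { cr, ci, zr: 0, zi: 0, it: 0 };
  for (let n = 0; n < steps; n++) doc = iterate(doc);
  return doc.it;
}

console.log(mandelbrotCount(-1, 0, 10)); // 10
console.log(mandelbrotCount(2, 0, 10));  // 2
```

Points inside the set are counted on every step, while points outside stop being counted once they escape; the count per point is what colours the rendered image.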


29

Aggregation


30

Sharding Aggregation

Diagram: each shard (shard1, shard2, shard3) runs the $match, $project and $group stages on its own data; the partial results are combined into a single result.
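A plain-JavaScript sketch of the idea in this diagram: each shard runs the early stages on its own documents and returns partial groups, and a merge step combines them. For an average, each shard returns a sum and a count rather than its own average, because averaging averages is wrong when groups differ in size. The data, field names and `minAmount` filter are hypothetical.

```javascript
// Hypothetical documents spread over three shards.
const shards = {
  shard1: [{ region: "UK", amount: 10 }, { region: "US", amount: 40 }],
  shard2: [{ region: "UK", amount: 20 }, { region: "FR", amount: 5 }],
  shard3: [{ region: "US", amount: 60 }, { region: "UK", amount: 30 }],
};

// Shard-local part: $match, $project, then a partial $group that keeps a
// running sum and count per group key. In a cluster these run in parallel.
function shardStage(docs, minAmount) {
  const partial = new Map();
  for (const d of docs) {
    if (d.amount < minAmount) continue;        // $match
    const { region, amount } = d;              // $project
    const acc = partial.get(region) || { sum: 0, count: 0 };
    acc.sum += amount;                         // partial $group
    acc.count += 1;
    partial.set(region, acc);
  }
  return partial;
}

// Merge step: combine the partial groups, then finish the average.
function mergeStage(partials) {
  const total = new Map();
  for (const partial of partials) {
    for (const [region, { sum, count }] of partial) {
      const acc = total.get(region) || { sum: 0, count: 0 };
      acc.sum += sum;
      acc.count += count;
      total.set(region, acc);
    }
  }
  return [...total]
    .map(([region, { sum, count }]) => ({ _id: region, avgAmount: sum / count }))
    .sort((a, b) => a._id.localeCompare(b._id));
}

const partials = Object.values(shards).map((docs) => shardStage(docs, 10));
console.log(mergeStage(partials));
```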


31

When MongoDB should be used.

•  When you need high-speed access to complex objects:
    -  A complex object can be updated in a single fast, atomic operation.
    -  A complex object can be retrieved in a single quick operation.
    -  A complex object can be queried.
    -  Searches don’t need joins.

•  When you want to store larger data structures:
    -  Arrays of 10,000 values or objects.
    -  Text up to 16 MB.
    -  Transparent, not opaque, BLOBs.
    -  BLOBs can be stored alongside the data.



32

When MongoDB should be used.

•  When you value rapid development and evolution:
    -  Direct object models, with no mapping layer.
    -  Application-defined schemas.
    -  Rich feature sets and search.

•  Where you need to store structures of any shape:
    -  Direct object models.
    -  Application-defined schemas.
    -  Heterogeneous schemas.


33

When MongoDB should be used.

•  When you have large data volumes:
    -  When data volumes are growing.
    -  Where growth is potentially unlimited.
    -  Where you don’t want to pay for future growth up front.

•  When you want distributed data access or high uptime:
    -  Sites worldwide need low access times.
    -  Data must legally stay at its point of origin.
    -  Data mirroring should be as close to real time as possible.


34

MongoDB - Global Community

9,000,000+ MongoDB Downloads

180,000+ Online Education Registrants

30,000+ MongoDB User Group Members

20,000+ MongoDB Days Attendees

35,000+ MongoDB Management Service (MMS) Users


35

Single View Case Study: Tier 1 Global Insurance Provider

Global 360-degree view of customers’ policy portfolios and interactions

Problem

•  70 systems and 20 screens to view customer policies

•  Many CSR calls taken just to reroute the customer

•  Poor customer experience

•  Source systems are hard to change

Why MongoDB

•  Dynamic schema: can combine 70 systems easily

•  Performance: can handle all data in one DB

•  Replication: local reads and high availability

•  Sharding: can add data easily by scaling out

Results

•  Delivered in 3 months for $3M; previous attempts failed at a cost of $25M

•  Unified customer view available to all channels

•  Shorter calls, and fewer calls rerouted

•  Increased customer satisfaction


36

Single View Case Study: Large Retail Bank

Mainframe offloading / mirroring for the next generation of applications

Problem

•  Mainframe is costly to maintain and won’t handle additional load

•  No way to meet customer needs for mobile and similar apps

Why MongoDB

•  High degree of scalability

•  Ability to define data formats from the mainframe views as needed

•  Broad range of functional capability

Results

•  New applications online with mainframe data cached in MongoDB

•  Increase in customer satisfaction across Personal Banking


37

Case Study

Stores billions of posts in myriad formats with MongoDB

Problem

•  1.5M posts per day, with different structures

•  Inflexible MySQL, with lengthy delays for making changes

•  Data piling up in the production database

•  Poor performance

Why MongoDB

•  Flexible document-based model

•  Horizontal scalability built in

•  Easy to use

•  Interface in a familiar language

Results

•  Initial deployment held over 5B documents and 10TB of data

•  Automated failover provides high availability

•  Schema changes are quick and easy


38

Case Study

Stores one of the world’s largest record repositories and searchable catalogues in MongoDB

Problem

•  One of the world’s largest record repositories

•  Move to SOA required a new approach to the data store

•  RDBMS could not support centralized data management and federation of information services

Why MongoDB

•  Fast, easy scalability

•  Full query language

•  Complex metadata storage

Results

•  Will scale to hundreds of TB by 2013 and petabytes by 2020

•  Searchable catalogue of varied data types

•  Decreased software and support costs