141115 DevOpsDays JSON to SQL

Posted on 21-Aug-2015


cwvhogue@gmail.com @cwvhogue

Schema From Nothing

Transformation of JSON into PostgreSQL 9.4 SQL+JSONB Power Schema

Christopher W.V. Hogue, Ph.D.


From JSON to Data Warehouse

• SQL Normalization

You stay in DevOpsLand and I show you how deep the rabbit hole goes.

• NoSQL Solution

This story ends. You wake up in your bed and believe whatever you want to believe.


From JSON to Data Warehouse

• 1st Normal Form
• 2nd Normal Form
• 3rd Normal Form
• BCNF
• Denormalization
• Star Schema
• Materialized Views
• Non-unique Indexes

Normalization:

Like a splinter in your mind, driving you mad.

Schema:

A prison for your mind.


JSON as The Matrix.


Power Law – Power Schema

Sparse Edge

Repetitive Edge


Power Schema

• Sparse Table
  – Simple key-value store
  – Flexible, agile
  – Add anything you like, later.
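A key-value sparse table can be sketched as one (record id, key, value) row per leaf value. This is a minimal sketch, not the talk's code. It assumes a hypothetical integer record id, dotted key paths for nested objects, and values kept as JSON text so that any type fits one value column. The talk specifies none of these.

```python
import json


def to_sparse_rows(record_id, doc, prefix=""):
    """Flatten a JSON object into (record_id, key_path, json_value) rows.

    Nested non-empty objects become dotted key paths (an assumption);
    leaf values, including arrays, are stored as JSON text.
    """
    rows = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            # Recurse into nested objects, extending the key path.
            rows.extend(to_sparse_rows(record_id, value, prefix=path + "."))
        else:
            rows.append((record_id, path, json.dumps(value)))
    return rows
```

New keys need no schema change here. A record with an unseen key just produces another row, which is the "add anything you like, later" property.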


Power Schema

• Sparse Table
  – Simple key-value store

• Main Table
  – SQL typed columns
  – JSONB columns


Power Schema

• Sparse Table
  – Simple key-value store

• Main Table

• Repetitive Table
  – Hashed lookup
  – Unique column combinations
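One reading of "hashed lookup" over "unique column combinations" is sketched below. It assumes the repetitive columns are already known. Each distinct combination is stored once in a lookup table keyed by its hash, and the main row keeps only that hash. SHA-256 over canonical JSON and the `_r_hash` column name are hypothetical choices, not taken from the talk.

```python
import hashlib
import json


def repetitive_key(values):
    """Hash one combination of repetitive-column values into a lookup key.

    Canonical JSON (sorted keys, compact separators) makes the same
    combination always produce the same digest.
    """
    canonical = json.dumps(values, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_repetitive(records, repetitive_cols):
    """Move repetitive columns into a deduplicated lookup table.

    Returns (main_rows, lookup): each main row keeps a hypothetical
    '_r_hash' reference, and lookup maps hash -> unique combination.
    """
    lookup = {}
    main_rows = []
    for rec in records:
        combo = {c: rec.get(c) for c in repetitive_cols}
        h = repetitive_key(combo)
        lookup.setdefault(h, combo)  # store each combination only once
        rest = {k: v for k, v in rec.items() if k not in repetitive_cols}
        rest["_r_hash"] = h
        main_rows.append(rest)
    return main_rows, lookup
```

With heavily repeated values, the lookup table stays small while the main table carries one fixed-width key per row.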

The Matrix, Reloaded

The Power Schema

Sparse Table

Main Table

Repetitive Table


Sample – Measure Information


Scalable Classifier Heuristic


Classifying JSON
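The talk presents the sampling and classifier steps only as diagrams. A minimal stand-in follows, and it is not the author's heuristic. It assumes that "information" means two measures per top-level key over a sample: the fill rate (share of records containing the key) and the distinct-value ratio. Rarely present keys go to the sparse table and low-cardinality keys to the repetitive table. Both thresholds are illustrative guesses.

```python
import json
from collections import defaultdict


def classify_keys(sample, sparse_fill=0.1, repetitive_ratio=0.05):
    """Assign each top-level key to the _m, _r or _s table.

    sample: list of parsed JSON objects drawn from the full stream.
    sparse_fill: keys present in fewer than this fraction of records -> _s.
    repetitive_ratio: keys whose distinct/occurrence ratio is at or
        below this value -> _r. Both thresholds are assumptions.
    """
    counts = defaultdict(int)
    distinct = defaultdict(set)
    for doc in sample:
        for key, value in doc.items():
            counts[key] += 1
            # Serialize so unhashable values (lists, dicts) can be counted.
            distinct[key].add(json.dumps(value, sort_keys=True))
    n = len(sample)
    classes = {}
    for key, c in counts.items():
        fill = c / n
        ratio = len(distinct[key]) / c
        if fill < sparse_fill:
            classes[key] = "_s"
        elif ratio <= repetitive_ratio:
            classes[key] = "_r"
        else:
            classes[key] = "_m"
    return classes
```

Only a sample is measured, so the cost of classification does not grow with the full data set. The full stream is then split using the resulting key map.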


JSONB Implementation

• PostgreSQL 9.4b introduces JSONB
  – A JSONB column behaves like a NoSQL JSON document store.

• Blue Pill Solution:
  – Jam the JSON into ONE JSONB column, done.

• Red Pill Solution:
  – Know that PostgreSQL columns are strongly typed.
  – JSONB CASTs may fail at query time, e.g. {"time" : "default"}.
  – Detect types, and CAST before LOADing.


Inferring Types from JSON

• Detect column type information
  – Streaming Decision Tree & 'Meta' Finite State Machine (node.js)

• Use type information to
  – Generate SQL table structure and types
  – Convert JSON into TSV for fast loading
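The talk's detector is a streaming decision tree plus a "meta" finite state machine in node.js, and its internals are not shown. The sketch below uses a simpler technique, candidate elimination, which is not the author's algorithm. Each column starts with every candidate PostgreSQL type, and each streamed value drops the candidates it rules out. The most specific survivor wins. Python, the candidate order, and the parsers used here are all assumptions.

```python
import ipaddress
import re
from datetime import datetime

# Six hex octets separated consistently by ':' or '-' (e.g. 08:00:2b:01:02:03).
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$")


def _is_int(v):
    return isinstance(v, int) and not isinstance(v, bool)


def _is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _is_inet(v):
    if not isinstance(v, str):
        return False
    try:
        ipaddress.ip_address(v)
        return True
    except ValueError:
        return False


def _is_mac(v):
    return isinstance(v, str) and MAC_RE.match(v) is not None


def _is_timestamp(v):
    if not isinstance(v, str):
        return False
    try:
        # Older Pythons reject a trailing "Z", so rewrite it as an offset.
        datetime.fromisoformat(v.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


# Candidate PostgreSQL types, most specific first; TEXT accepts anything.
CANDIDATES = [
    ("BIGINT", _is_int),
    ("DOUBLE PRECISION", _is_number),
    ("INET", _is_inet),
    ("MACADDR", _is_mac),
    ("TIMESTAMPTZ", _is_timestamp),
    ("JSONB", lambda v: isinstance(v, (dict, list))),
    ("TEXT", lambda v: True),
]
CHECKS = dict(CANDIDATES)


class ColumnTypeDetector:
    """Narrow one column's candidate types as values stream past."""

    def __init__(self):
        self.alive = [name for name, _ in CANDIDATES]
        self.seen_value = False

    def observe(self, value):
        if value is None:  # JSON null maps to SQL NULL; says nothing about type
            return
        self.seen_value = True
        self.alive = [name for name in self.alive if CHECKS[name](value)]

    def sql_type(self):
        # Most specific surviving candidate; all-null columns fall back to TEXT.
        return self.alive[0] if self.seen_value else "TEXT"


def detect_type(values):
    det = ColumnTypeDetector()
    for v in values:
        det.observe(v)
    return det.sql_type()
```

Consider a column that mixes timestamps with a stray `"default"` string. It ends up as TEXT rather than TIMESTAMPTZ, so the problem is caught at load time instead of surfacing as a failed CAST at query time.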


Detect Which Types?

• Numeric types
• VARCHAR
• Nested JSON
• IP addresses
• MAC addresses
• Timestamps
• Arrays (of the above)

Convert these to strict PostgreSQL types when possible prior to batch loading.

Especially JSON arrays!
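Converting a JSON array to a strict PostgreSQL array means rendering it in PostgreSQL's array literal syntax, e.g. `{1,2,3}`. The sketch below handles flat arrays only. It assumes the element type has already been settled by type detection. The function name is hypothetical, and nested arrays are not handled.

```python
def pg_array_literal(values):
    """Render a flat JSON array as a PostgreSQL array literal, e.g. {1,2,3}.

    Strings are always double-quoted, with embedded backslashes and double
    quotes backslash-escaped; JSON null becomes an unquoted NULL element.
    """
    parts = []
    for v in values:
        if v is None:
            parts.append("NULL")
        elif isinstance(v, bool):  # check bool before int: bool is an int subclass
            parts.append("true" if v else "false")
        elif isinstance(v, (int, float)):
            parts.append(str(v))
        elif isinstance(v, str):
            escaped = v.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'"{escaped}"')
        else:
            raise TypeError(f"unsupported array element: {v!r}")
    return "{" + ",".join(parts) + "}"
```

Always quoting strings avoids special cases for elements that contain commas, braces, spaces, or the word NULL.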


JSON ETL steps to TSV

• JSON
  – Sample & Classify
  – Split into 3 tables
    • _m (main)
    • _r (repetitive)
    • _s (sparse)
  – Detect types in each table, each column.

• Tab Separated Value
  – PostgreSQL load form
  – Fast parser
  – Column-type rules

• Type detection results drive both
  – the painless TSV load form
  – the SQL table declaration
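Both outputs can be sketched from one {column: type} map. The TSV side follows PostgreSQL's COPY text format, where fields are tab-delimited and a backslash-N stands for NULL. Backslash, tab, newline and carriage return must be backslash-escaped. The table name, the dict-based row shape, and the helper names are assumptions for illustration.

```python
import json


def copy_text_field(value):
    """Escape one value for PostgreSQL COPY text format (the default format).

    None becomes COPY's default NULL marker (backslash-N); dicts and lists
    are written as JSON text for JSONB columns; everything else uses str().
    """
    if value is None:
        return "\\N"
    if isinstance(value, (dict, list)):
        s = json.dumps(value)
    else:
        s = str(value)
    # Backslash first, so the escapes added below are not doubled again.
    return (s.replace("\\", "\\\\")
             .replace("\t", "\\t")
             .replace("\n", "\\n")
             .replace("\r", "\\r"))


def tsv_line(row, columns):
    """One tab-separated line, with columns in the declared table order."""
    return "\t".join(copy_text_field(row.get(c)) for c in columns) + "\n"


def create_table_sql(table, column_types):
    """CREATE TABLE from a {column: sql_type} map produced by type detection."""
    cols = ",\n  ".join(
        '"{}" {}'.format(name.replace('"', '""'), sql_type)
        for name, sql_type in column_types.items()
    )
    return f"CREATE TABLE {table} (\n  {cols}\n);"
```

Deriving the column order for `tsv_line` and the declaration from the same map keeps the TSV file and the table in sync. A value destined for an array column should be passed in already rendered as an array literal string, so that it is escaped once more for COPY rather than serialized as JSON.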


Load into PostgreSQL 9.4b

• COPY table_m FROM 'file.tsv';

• Query with SQL + JSONB operators, with fewer CASTs and fewer validation errors at query time.

• An analytical data source ready for Tableau, R, ODBC, and JDBC.

• Your JSON is now in SQL.


Repo (In progress)

https://github.com/joyent/moray-etl-jsonb

Email: cwvhogue@gmail.com

Acknowledgements: @benr
