141115 devopsdays_jsontosql

Schema From Nothing: Transformation of JSON into PostgreSQL 9.4 SQL+JSONB Power Schema. Christopher W.V. Hogue, Ph.D., [email protected], @cwvhogue

Uploaded by christopher-hogue on 21-Aug-2015.


TRANSCRIPT

Page 1: 141115 devopsdays_jsontosql

[email protected] @cwvhogue

Schema From Nothing

Transformation of JSON into PostgreSQL 9.4 SQL+JSONB Power Schema

Christopher W.V. Hogue, Ph.D.

Page 2

From JSON to Data Warehouse

• SQL Normalization

You stay in DevOpsLand, and I show you how deep the rabbit hole goes.

• NoSQL Solution

This story ends. You wake up in your bed and believe whatever you want to believe.

Page 3

From JSON to Data Warehouse

• 1st Normal Form
• 2nd Normal Form
• 3rd Normal Form
• BCNF
• Denormalization
• Star Schema
• Materialized Views
• Non-unique Indexes

Normalization:

Like a splinter in your mind, driving you mad.

Schema:

A prison for your mind.

Page 4

JSON as The Matrix.

Page 5

Power Law – Power Schema

Sparse Edge

Repetitive Edge

Page 6

Power Schema

• Sparse Table
  – Simple key-value store
  – Flexible, agile
  – Add anything you like, later.
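One way to picture the sparse table as a simple key-value store is the Python sketch below. It assumes one row per (document id, key path, value), with nested keys flattened into dotted paths and each leaf kept as JSON text; the function name `sparse_rows` and the dotted-path convention are illustrative assumptions, not the talk's design.

```python
import json
from typing import Any, Iterator


def sparse_rows(doc_id: int, doc: dict[str, Any],
                prefix: str = "") -> Iterator[tuple[int, str, str]]:
    """Flatten one JSON object into (doc_id, key_path, json_value) rows.

    Nested objects are walked recursively and their keys joined with dots;
    every other value (strings, numbers, arrays, null) is kept as JSON text,
    so new keys can be added later without any schema change.
    """
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from sparse_rows(doc_id, value, path)
        else:
            yield (doc_id, path, json.dumps(value))
```

For example, `list(sparse_rows(7, {"host": "db1", "tags": {"env": "prod"}}))` yields `[(7, 'host', '"db1"'), (7, 'tags.env', '"prod"')]`. Keys that already contain dots would collide under this convention, so a real loader would need an escaping rule.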

Page 7

Power Schema

• Sparse Table
  – Simple key-value store

• Main Table
  – SQL typed columns
  – JSONB columns
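A minimal Python sketch of how one JSON record might be projected onto a main table that mixes SQL-typed columns with JSONB columns. The two column lists are assumed to come from an earlier classification step, and `main_row` is a hypothetical helper: scalars are rendered as text for a typed column, nested values as compact JSON for a JSONB column, and missing keys as None (SQL NULL).

```python
import json
from typing import Any, Optional


def main_row(record: dict[str, Any],
             typed_cols: list[str],
             jsonb_cols: list[str]) -> list[Optional[str]]:
    """Project one JSON record onto a main table's column order.

    typed_cols are assumed to hold scalars destined for SQL-typed columns;
    jsonb_cols hold nested objects or arrays, serialized as compact JSON.
    A missing key or JSON null becomes None, i.e. SQL NULL.
    """
    row: list[Optional[str]] = []
    for col in typed_cols:
        value = record.get(col)
        row.append(None if value is None else str(value))
    for col in jsonb_cols:
        value = record.get(col)
        row.append(None if value is None else json.dumps(value, separators=(",", ":")))
    return row
```

The typed columns come first and the JSONB columns last, so the table declaration would need to list them in the same order.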

Page 8

Power Schema

• Sparse Table
  – Simple key-value store

• Main Table

• Repetitive Table
  – Hashed lookup
  – Unique column combinations
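The repetitive table's hashed lookup can be sketched as follows, assuming each distinct combination of a chosen set of columns is stored once and referenced from the main table by a hash key. The SHA-256-over-canonical-JSON key and the `RepetitiveTable` class are illustrative assumptions; the talk does not specify the hash.

```python
import hashlib
import json
from typing import Any


class RepetitiveTable:
    """Store each distinct combination of the chosen columns once.

    The key is a SHA-256 hex digest of the values' canonical JSON (an
    assumed scheme); main-table rows would reference the combination by it.
    """

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns
        self.rows: dict[str, tuple[Any, ...]] = {}

    def key_for(self, record: dict[str, Any]) -> str:
        """Return the hash key for record's combination, registering it if new."""
        values = tuple(record.get(c) for c in self.columns)
        # sort_keys makes nested objects hash the same regardless of key order.
        canonical = json.dumps(values, sort_keys=True, separators=(",", ":"))
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        self.rows.setdefault(key, values)
        return key
```

Because identical combinations hash to the same key, a main-table row only needs to store the 64-character key instead of repeating the values.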

Page 9

The Matrix, Reloaded

The Power Schema

Sparse Table

Main Table

Repetitive Table

Page 10

Sample – Measure Information

Page 11

Scalable Classifier Heuristic

Page 12

Classifying JSON

Page 13

JSONB Implementation

• PostgreSQL 9.4b introduces JSONB
  – JSONB lets a SQL column behave like a NoSQL JSON document store.

• Blue Pill Solution:
  – Jam JSON into ONE JSONB column, done.

• Red Pill Solution:
  – Realize that PostgreSQL is strongly typed.
  – JSONB CASTs will fail at query time.
  – Detect types, and CAST at LOAD time.

Page 14

JSONB Implementation

• PostgreSQL 9.4b introduces JSONB
  – JSONB lets a SQL column behave like a NoSQL JSON document store.

• Blue Pill Solution:
  – Jam JSON into ONE JSONB column, done.

• Red Pill Solution:
  – Know that PostgreSQL columns are strongly typed.
  – JSONB CASTs may fail at query time, e.g. {"time": "default"} cannot be CAST to a timestamp.
  – Detect types, and CAST before LOADing.
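To illustrate why casting is better done before loading: if a `time` field holds `"default"` in even one document, a query-time CAST to a timestamp raises an error and aborts the whole query. The sketch below, a hypothetical check rather than the talk's code, scans a column's sampled values before loading and picks `timestamptz` only when every value parses, falling back to `text` otherwise. It assumes ISO 8601 input and uses Python's `datetime.fromisoformat`, which only approximates PostgreSQL's own timestamp input rules.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(text: str) -> Optional[datetime]:
    """Return a datetime for ISO 8601 text, or None if it does not parse."""
    if text.endswith("Z"):
        # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on.
        text = text[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


def cast_failures(values: list[str]) -> list[str]:
    """Values that would make a timestamp CAST fail (e.g. "default")."""
    return [v for v in values if parse_timestamp(v) is None]


def column_type(values: list[str]) -> str:
    """Use timestamptz only if every sampled value parses; otherwise text."""
    if values and not cast_failures(values):
        return "timestamptz"
    return "text"
```

Reporting the offending values, not just the fallback type, lets the loader decide whether to keep the column as text or clean the data first.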

Page 15

Inferring Types from JSON

• Detect column type information
  – Streaming Decision Tree & ‘Meta’ Finite State Machine (node.js)

• Use type information to
  – Generate SQL table structure and types
  – Convert JSON into TSV for fast loading
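Below is a simplified Python stand-in for streaming type detection; it is not the author's node.js decision tree or meta finite state machine. Each column keeps one small state that can only widen (for example `bigint` to `numeric` to `text`) as values stream past, so memory stays constant per column regardless of input size. The type names, widening rules, and timestamp pattern are assumptions chosen for illustration; the same state then drives a generated CREATE TABLE.

```python
import re
from typing import Any, Optional

# ISO 8601-style timestamps, e.g. 2014-11-15T10:00:00Z (illustrative pattern).
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?$")


def value_type(value: Any) -> Optional[str]:
    """Classify one JSON value; None means JSON null (no type evidence)."""
    if value is None:
        return None
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "boolean"
    if isinstance(value, int):
        return "bigint" if -2**63 <= value < 2**63 else "numeric"
    if isinstance(value, float):
        return "numeric"
    if isinstance(value, (dict, list)):
        return "jsonb"  # nested objects and arrays stay JSON in this sketch
    return "timestamptz" if TS_RE.match(value) else "text"


def merge(current: Optional[str], seen: Optional[str]) -> Optional[str]:
    """Move a column's state to the narrowest type covering both inputs."""
    if current is None:
        return seen
    if seen is None or seen == current:
        return current
    if {current, seen} == {"bigint", "numeric"}:
        return "numeric"
    if "jsonb" in (current, seen):
        return "jsonb"  # JSONB can hold any JSON value
    return "text"


class ColumnInference:
    """Streaming per-column type state: constant work per observed value."""

    def __init__(self) -> None:
        self.state: dict[str, Optional[str]] = {}

    def observe(self, record: dict[str, Any]) -> None:
        for key, value in record.items():
            self.state[key] = merge(self.state.get(key), value_type(value))

    def ddl(self, table: str) -> str:
        """Emit CREATE TABLE; columns that only ever held null become text."""
        def ident(name: str) -> str:
            return '"' + name.replace('"', '""') + '"'
        cols = ",\n  ".join(f"{ident(k)} {t or 'text'}" for k, t in self.state.items())
        return f"CREATE TABLE {ident(table)} (\n  {cols}\n);"
```

In this sketch a single `"default"` in a timestamp column widens it to `text`, which is exactly the kind of decision the load step needs before any data reaches PostgreSQL.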

Page 16

Detect Which Types?

• Numeric types
• VARCHAR
• Nested JSON
• IP addresses
• MAC addresses
• Timestamps
• Arrays (of the above)

Convert these to strict PostgreSQL types when possible prior to batch loading.

Especially JSON arrays!
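The detection rules for IP addresses, MAC addresses, and arrays could look like the Python sketch below. It assumes PostgreSQL's `inet`, `macaddr`, and array types as targets and uses the standard-library `ipaddress` module; the function names and fallback choices (`varchar`, `text[]`, or a whole-array `jsonb` when elements are nested) are illustrative, not taken from the talk. `pg_array_literal` renders a flat JSON array in PostgreSQL's `{...}` array input syntax.

```python
import ipaddress
import re
from typing import Any

# Six hex octets with one consistent separator (':' or '-').
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}([:-])[0-9A-Fa-f]{2}(?:\1[0-9A-Fa-f]{2}){4}$")


def detect_string_type(s: str) -> str:
    """Map a JSON string to inet, macaddr, or plain varchar."""
    try:
        ipaddress.ip_address(s)
        return "inet"
    except ValueError:
        pass
    return "macaddr" if MAC_RE.match(s) else "varchar"


def detect_scalar_type(value: Any) -> str:
    """Type of one non-null JSON value; nested values are reported as jsonb."""
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return "boolean"
    if isinstance(value, int):
        return "bigint" if -2**63 <= value < 2**63 else "numeric"
    if isinstance(value, float):
        return "numeric"
    if isinstance(value, str):
        return detect_string_type(value)
    return "jsonb"  # dict or list


def detect_array_type(items: list[Any]) -> str:
    """Pick a PostgreSQL array type when all non-null elements agree."""
    kinds = {detect_scalar_type(i) for i in items if i is not None}
    if "jsonb" in kinds:
        return "jsonb"  # nested arrays/objects: keep the whole array as JSONB
    if len(kinds) == 1:
        return kinds.pop() + "[]"
    if kinds == {"bigint", "numeric"}:
        return "numeric[]"
    return "text[]"  # mixed or empty arrays fall back to text


def pg_array_literal(items: list[Any]) -> str:
    """Render a flat JSON array as a PostgreSQL array literal, e.g. {1,2,3}.

    Strings are double-quoted with backslash escapes so commas, braces and
    quotes survive; None becomes NULL. COPY's text format adds its own
    escaping layer on top of this literal.
    """
    parts = []
    for item in items:
        if item is None:
            parts.append("NULL")
        elif isinstance(item, bool):
            parts.append("true" if item else "false")
        elif isinstance(item, (int, float)):
            parts.append(str(item))
        elif isinstance(item, str):
            escaped = item.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'"{escaped}"')
        else:
            raise ValueError("nested arrays/objects are not flat array elements")
    return "{" + ",".join(parts) + "}"
```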

Page 17

JSON ETL steps to TSV

• JSON
  – Sample & Classify
  – Split into 3 tables
    • _m (main)
    • _r (repetitive)
    • _s (sparse)
  – Detect types in each table, each column.

• Tab-Separated Values (TSV)
  – PostgreSQL load form
  – Fast parser
  – Column-type rules

• Type detection step information is used to generate both the painless TSV load form AND the SQL table declaration.
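The Sample & Classify step can be sketched as a heuristic over a sample of documents: keys that rarely appear go to the sparse table, keys whose few distinct values repeat many times go to the repetitive table, and everything else goes to the main table. The thresholds (`sparse_below`, `repetitive_ratio`) and the function name are hypothetical placeholders, not values from the talk.

```python
import json
from collections import defaultdict
from typing import Any


def classify_keys(sample: list[dict[str, Any]],
                  sparse_below: float = 0.1,
                  repetitive_ratio: float = 0.05) -> dict[str, str]:
    """Assign each top-level key to "_m", "_r" or "_s" (hypothetical heuristic).

    * present in fewer than `sparse_below` of sampled docs -> "_s"
    * distinct values under `repetitive_ratio` of occurrences -> "_r"
    * everything else -> "_m"
    """
    present: dict[str, int] = defaultdict(int)
    distinct: dict[str, set[str]] = defaultdict(set)
    for doc in sample:
        for key, value in doc.items():
            present[key] += 1
            # Canonical JSON text makes unhashable values (dicts, lists) countable.
            distinct[key].add(json.dumps(value, sort_keys=True))
    n = len(sample)
    result: dict[str, str] = {}
    for key, count in present.items():
        if count / n < sparse_below:
            result[key] = "_s"
        elif len(distinct[key]) / count < repetitive_ratio:
            result[key] = "_r"
        else:
            result[key] = "_m"
    return result
```

For example, on 100 documents with a unique `id`, a `region` alternating between two values, and a `debug` flag present in only one document, this returns `{"id": "_m", "region": "_r", "debug": "_s"}`.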

Page 18

JSON ETL steps to TSV

• JSON
  – Sample & Classify
  – Split into 3 tables
    • _m
    • _r
    • _s
  – Detect types in each table, each column.

• Tab-Separated Values (TSV)
  – PostgreSQL load form
  – Fast parser
  – Column-type rules

• Type detection step information for
  – painless TSV load form
  – SQL table declaration
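A minimal sketch of the TSV load form, assuming PostgreSQL's default COPY text format: columns are separated by tabs, NULL is written as `\N`, and backslash, tab, newline, and carriage return inside values must be backslash-escaped. The helper names are hypothetical, and the column order must match the generated table declaration.

```python
from typing import Optional

# Escapes for PostgreSQL's COPY text format. The backslash is escaped first
# so the backslashes introduced by the later escapes are not doubled again.
_ESCAPES = [("\\", "\\\\"), ("\t", "\\t"), ("\n", "\\n"), ("\r", "\\r")]


def tsv_field(value: Optional[str]) -> str:
    """Encode one already-converted column value for COPY text format."""
    if value is None:
        return "\\N"  # COPY's default NULL marker
    for raw, escaped in _ESCAPES:
        value = value.replace(raw, escaped)
    return value


def tsv_line(values: list[Optional[str]]) -> str:
    """Join one row's values with tabs; column order must match the table."""
    return "\t".join(tsv_field(v) for v in values) + "\n"
```

Because every value has already been converted to the column's detected type, the database only has to parse plain text, with no CASTs at query time.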

Page 19

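Load into PostgreSQL 9.4b

• COPY table_m FROM 'file.tsv'; (a server-side file; from psql, \copy reads a client-side file instead. COPY's default text format is tab-delimited.)

• Query with SQL plus JSONB operators, with fewer CASTs and fewer validation errors at query time.

• A ready analytical data source for Tableau, R, ODBC, and JDBC.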

• Your JSON is now in SQL.

Page 20

Repo (In progress)

https://github.com/joyent/moray-etl-jsonb

Email: [email protected]

Acknowledgements: @benr