141115 DevOpsDays: JSON to SQL
TRANSCRIPT
[email protected] @cwvhogue
Schema From Nothing
Transformation of JSON into PostgreSQL 9.4 SQL+JSONB Power Schema
Christopher W.V. Hogue, Ph.D.
[email protected] @cwvhogue
From JSON to Data Warehouse
• SQL Normalization
You stay in DevOpsLand, and I show you how deep the Rabbit Hole goes.
• NoSQL Solution
This story ends. You wake up in your bed and believe whatever you want to believe.
From JSON to Data Warehouse
• 1st Normal Form
• 2nd Normal Form
• 3rd Normal Form
• BCNF
• Denormalization
• Star Schema
• Materialized Views
• Non-unique Indexes
Normalization:
Like a splinter in your mind, driving you mad.
Schema:
A prison for your mind.
JSON as The Matrix.
Power Schema
• Sparse Table
  – Simple key-value store
  – Flexible, agile
  – Add anything you like, later.
Power Schema
• Sparse Table
  – Simple key-value store
• Main Table
  – SQL typed columns
  – JSONB columns
Power Schema
• Sparse Table
  – Simple key-value store
• Main Table
• Repetitive Table
  – Hashed lookup
  – Unique column combinations
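The slides do not spell out how keys are assigned to the three tables, so the following is only a minimal Python sketch of one plausible rule, not the classifier used in the talk. The function name `classify_keys` and both thresholds are hypothetical: keys present in few records go to the sparse table, keys that are almost always present but take few distinct values go to the repetitive table, and everything else stays in the main table.

```python
import json
from collections import defaultdict

def classify_keys(records, sparse_below=0.5, repetitive_below=0.1):
    """Assign each top-level key to 'sparse', 'repetitive', or 'main'.

    Both thresholds are illustrative assumptions, not values from the talk.
    """
    total = len(records)
    present = defaultdict(int)   # key -> number of records containing it
    distinct = defaultdict(set)  # key -> serialized values seen for it
    for rec in records:
        for key, value in rec.items():
            present[key] += 1
            distinct[key].add(json.dumps(value, sort_keys=True))

    plan = {}
    for key in present:
        fill = present[key] / total                    # how often the key appears
        variety = len(distinct[key]) / present[key]    # how varied its values are
        if fill < sparse_below:
            plan[key] = "sparse"
        elif variety < repetitive_below:
            plan[key] = "repetitive"
        else:
            plan[key] = "main"
    return plan

# Hypothetical sample: 20 records, one rarely used key.
sample = [{"id": i, "dc": "us-east"} for i in range(20)]
sample[3]["note"] = "rebooted"
sample[7]["note"] = "migrated"
plan = classify_keys(sample)
```

With this sample, `id` is unique per record, `dc` repeats one value, and `note` appears in only two records, which is the kind of separation the three-table layout relies on.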
The Matrix, Reloaded
The Power Schema
Sparse Table
Main Table
Repetitive Table
Sample – Measure Information
Scalable Classifier Heuristic
Classifying JSON
JSONB Implementation
• PostgreSQL 9.4b introduces JSONB
  – JSONB makes each SQL column behave like a NoSQL JSON document store.
• Blue Pill Solution:
  – Jam JSON into ONE JSONB column, done.
• Red Pill Solution:
  – Know that PostgreSQL columns are strongly typed.
  – JSONB CASTs may fail at query time, e.g. on {"time": "default"}
  – Detect types, and CAST before LOADing.
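To illustrate why casting before loading helps, here is a minimal Python sketch (an assumption for illustration; the talk's tool is written in node.js). The helper names `to_timestamp` and `split_castable` are hypothetical. It checks whether a value would cast to a timestamp and sets aside records such as {"time": "default"} before they ever reach the database.

```python
from datetime import datetime

def to_timestamp(value):
    """Return an ISO 8601 string if value parses as a timestamp, else None."""
    if not isinstance(value, str):
        return None
    try:
        return datetime.fromisoformat(value).isoformat()
    except ValueError:
        return None

def split_castable(records, key):
    """Separate records whose `key` casts cleanly from those that do not."""
    good, bad = [], []
    for rec in records:
        if to_timestamp(rec.get(key)) is not None:
            good.append(rec)
        else:
            bad.append(rec)
    return good, bad

records = [{"time": "2014-11-15T09:30:00"}, {"time": "default"}]
good, bad = split_castable(records, "time")
```

Records in `bad` can be routed to a JSONB or text column instead, so the failure is handled once at load time rather than on every query.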
Inferring Types from JSON
• Detect column type information
  – Streaming Decision Tree & ‘Meta’ Finite State Machine (node.js)
• Use type information to
  – Generate SQL table structure and types
  – Convert JSON into TSV for fast loading
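The idea of streaming over records, keeping one type state per column, and then emitting a table declaration can be sketched as follows. This is a simplified Python illustration, not the node.js decision tree from the talk; the widening rule (integer and float merge to `numeric`, any other conflict falls back to `jsonb`) and the function names are assumptions.

```python
import json

def value_type(value):
    """Map one JSON value to a candidate PostgreSQL type."""
    if isinstance(value, bool):      # bool must be tested before int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "numeric"
    if isinstance(value, str):
        return "text"
    return "jsonb"                   # nested object or array

def merge(current, seen):
    """State transition for one column: widen, or fall back to jsonb."""
    if current is None or current == seen:
        return seen
    if {current, seen} == {"bigint", "numeric"}:
        return "numeric"
    return "jsonb"

def infer_columns(lines):
    """Stream newline-delimited JSON and return {column: sql_type}."""
    columns = {}
    for line in lines:
        for key, value in json.loads(line).items():
            if value is None:        # nulls carry no type information
                continue
            columns[key] = merge(columns.get(key), value_type(value))
    return columns

def create_table(name, columns):
    """Render a CREATE TABLE statement (names are assumed to need no escaping)."""
    body = ",\n".join(f'  "{col}" {sqltype}'
                      for col, sqltype in sorted(columns.items()))
    return f'CREATE TABLE "{name}" (\n{body}\n);'

lines = [
    '{"id": 1, "load": 0.5, "host": "a"}',
    '{"id": 2, "load": 3, "host": {"name": "b"}}',
]
columns = infer_columns(lines)
ddl = create_table("table_m", columns)
```

Because each line is processed once and only a small per-column state is kept, the same pass works on inputs too large to hold in memory.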
Detect Which Types?
• Numeric types
• VARCHAR
• Nested JSON
• IP addresses
• MAC addresses
• Timestamps
• Arrays (of the above)
Convert these to strict PostgreSQL types when possible prior to batch loading.
Especially JSON arrays!
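A minimal Python sketch of detecting these types and rendering a JSON array as a PostgreSQL array literal is shown below. The mapping to `inet`, `macaddr`, `timestamp`, and `type[]`, the MAC pattern, and the function names are assumptions for illustration; the talk does not list the exact rules it uses.

```python
import ipaddress
import re
from datetime import datetime

# Six hex pairs separated by ':' or '-' (an assumed, simplified pattern).
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}([:-][0-9A-Fa-f]{2}){5}$")

def scalar_type(value):
    """Guess a strict PostgreSQL type for one non-array JSON value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double precision"
    if isinstance(value, str):
        try:
            ipaddress.ip_address(value)
            return "inet"
        except ValueError:
            pass
        if MAC_RE.match(value):
            return "macaddr"
        try:
            datetime.fromisoformat(value)   # ISO 8601 strings only
            return "timestamp"
        except ValueError:
            return "varchar"
    return "jsonb"                          # nested object, nested list, null

def detect(value):
    """Arrays of one scalar type become type[]; mixed arrays stay jsonb."""
    if isinstance(value, list):
        kinds = {scalar_type(v) for v in value}
        if len(kinds) == 1 and "jsonb" not in kinds:
            return kinds.pop() + "[]"
        return "jsonb"
    return scalar_type(value)

def pg_array_literal(values):
    """Render a flat list in PostgreSQL array text form, e.g. {1,2,3}."""
    def quote(v):
        if v is None:
            return "NULL"
        if isinstance(v, bool):
            return "true" if v else "false"
        if isinstance(v, (int, float)):
            return str(v)
        # Strings are double-quoted; backslash and quote are escaped.
        return '"' + str(v).replace("\\", "\\\\").replace('"', '\\"') + '"'
    return "{" + ",".join(quote(v) for v in values) + "}"

ips = ["10.0.0.1", "10.0.0.2"]
ip_type = detect(ips)
ip_literal = pg_array_literal(ips)
```

Converting a homogeneous JSON array to a typed array column lets array operators and indexes work directly, instead of casting elements out of JSONB at query time.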
JSON ETL steps to TSV
• JSON
  – Sample & Classify
  – Split into 3 tables
    • _m
    • _r
    • _s
  – Detect types in each table, each column.
• Tab Separated Value
  – PostgreSQL load form
  – Fast parser
  – Column-type rules
• Type detection step information is used to generate both the painless TSV load form AND the SQL table declaration
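The TSV step can be sketched in Python as below. This is an illustrative assumption, not the repo's code: it writes one record as a line in PostgreSQL's COPY text format, where fields are tab-separated, NULL is written as `\N`, and backslash, tab, newline, and carriage return are backslash-escaped. The names `tsv_field` and `tsv_line` are hypothetical.

```python
import json

def tsv_field(value):
    """Encode one value for PostgreSQL COPY text format."""
    if value is None:
        return r"\N"                       # COPY's default NULL marker
    if isinstance(value, bool):
        return "t" if value else "f"
    if isinstance(value, (dict, list)):
        # Nested JSON is serialized for a jsonb column.
        value = json.dumps(value, separators=(",", ":"))
    text = str(value)
    # Escape the backslash first so later escapes are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def tsv_line(record, columns):
    """Emit fields in a fixed column order; missing keys become NULL."""
    return "\t".join(tsv_field(record.get(col)) for col in columns)

record = {"id": 7, "host": "a\tb", "tags": {"k": "v"}}
line = tsv_line(record, ["id", "host", "missing", "tags"])
```

Using the column order produced by the type detection step for both the table declaration and `tsv_line` keeps the file and the table aligned.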
Load into PostgreSQL 9.4b
• COPY table_m FROM 'file.tsv';
• Query with SQL + JSONB operators, with fewer CASTs and fewer validation errors at query time.
• A Tableau-, R-, ODBC-, and JDBC-ready analytical data source.
• Your JSON is now in SQL.
Repo (In progress)
https://github.com/joyent/moray-etl-jsonb
Email: [email protected]
Acknowledgements: @benr