Designing a Data Warehouse - what would a BI solution recommend?

Segah Meer Sr. Data Consultant, Professional Services Connect. Describe. Explore.


Post on 13-Apr-2017


Page 1: Designing a Data Warehouse - what would a BI solution recommend?

Segah Meer, Sr. Data Consultant, Professional Services

Connect. Describe. Explore.

Page 2: Designing a Data Warehouse - what would a BI solution recommend?

Designing a Data Warehouse - what would a BI solution recommend?

Page 3: Designing a Data Warehouse - what would a BI solution recommend?

4 Rules of Thumb

▪ Transparent E(T)L process

▪ Single copy of data

▪ Performance

▪ Shortest path

Page 4: Designing a Data Warehouse - what would a BI solution recommend?

Transparent E(T)L process

Perform transformations that improve performance and shorten the join path, but avoid baking broad assumptions about the final use case into the ETL. Example: how the revenue is calculated.

You see:

account  profit
1        1000

Actual data:

account  value
1        {revenue: 2000, expenses: 500, account_payable: 500, is_current: true}
2        {revenue: 2000, expenses: 100, is_current: false}
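The account example can be sketched in Python. Keeping the raw components leaves the definition of profit open to the analyst, while a single precomputed number hides which definition the ETL chose. The two function names below are hypothetical, and the slide does not say how the 1000 was derived; subtracting both expenses and account_payable from revenue merely happens to reproduce it.

```python
# Raw records as stored in production (values taken from the slide).
account_1 = {"revenue": 2000, "expenses": 500, "account_payable": 500, "is_current": True}
account_2 = {"revenue": 2000, "expenses": 100, "is_current": False}


def profit_simple(record):
    """Hypothetical definition: revenue minus expenses."""
    return record["revenue"] - record["expenses"]


def profit_net_of_payables(record):
    """Hypothetical definition: also subtract account_payable when it is present."""
    return profit_simple(record) - record.get("account_payable", 0)


# Both definitions stay available because the components were not collapsed in the ETL.
for name, record in (("account 1", account_1), ("account 2", account_2)):
    print(name, profit_simple(record), profit_net_of_payables(record))
```

If only the precomputed profit column were loaded, neither the alternative definition nor the second account's figures could be recovered downstream.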

Page 5: Designing a Data Warehouse - what would a BI solution recommend?

Single Copy of Data

If data can change, store it in a single row, so that an update touches exactly one place. Avoid redundant tables. Example: customer information.

Duplicate rows:

name        phone_number
Segah Meer  650-575-5410
...         ...
Segah Meer  650-575-5411

Redundant tables:

account  profit
1        1000

OR

account  revenue  cost
1        1500     500
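One way to keep changeable data in a single row is to key the table and overwrite on change. The sketch below uses Python's built-in sqlite3 module; the `customer_id` key and the `upsert_customer` helper are assumptions for illustration and do not appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, phone_number TEXT)"
)


def upsert_customer(conn, customer_id, name, phone_number):
    """Insert the customer, or replace the existing row with the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO customers (customer_id, name, phone_number) "
        "VALUES (?, ?, ?)",
        (customer_id, name, phone_number),
    )


# The phone number changes, but the customer still occupies one row.
upsert_customer(conn, 1, "Segah Meer", "650-575-5410")
upsert_customer(conn, 1, "Segah Meer", "650-575-5411")

rows = conn.execute("SELECT name, phone_number FROM customers").fetchall()
print(rows)
```

Whether history should be kept elsewhere (for example in a separate change log) is a design decision the slide does not address.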

Page 6: Designing a Data Warehouse - what would a BI solution recommend?

Performance

▪ Databases optimized for large-volume reads behave differently from those optimized for frequent and “easy” inserts

▪ Slow queries are a function of 1) the LookML model, 2) database resources, and 3) how the data is stored

Use flatter (wider) tables and don’t be afraid of redundant date columns
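One reading of "redundant date columns" is to derive the date parts once at load time and store them alongside the original timestamp, so queries can filter and group without recomputing them. A minimal sketch, in which the column names and the `date_columns` helper are hypothetical:

```python
from datetime import date


def date_columns(created_at):
    """Derive redundant date parts from an ISO date string (YYYY-MM-DD)."""
    d = date.fromisoformat(created_at)
    return {
        "created_date": d.isoformat(),
        "created_month": f"{d.year}-{d.month:02d}",
        "created_year": d.year,
    }


# Widen an event row with the derived columns; the original column is kept.
row = {"id": 100001, "created_at": "2016-01-01"}
wide_row = {**row, **date_columns(row["created_at"])}
print(wide_row)
```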

Page 7: Designing a Data Warehouse - what would a BI solution recommend?

Shortest Path

There is very little analytical value in modeling “long path” designs with Looker.

Each extra join adds modeling complexity.

Page 8: Designing a Data Warehouse - what would a BI solution recommend?

Imagine a ride-sharing app

App Events:

id      created_at  attribute_id
100001  2016-01-01  1
100002  2016-01-01  2

Attributes (example values):

id  value
1   {json...}

Page 9: Designing a Data Warehouse - what would a BI solution recommend?

One Intuitive Solution

- explore: events
  joins:
    - join: attributes
      sql_on: ${events.attribute_id} = ${attributes.id}

- explore: users
  joins:
    - join: attributes
      relationship: one_to_many
      sql_on: ${users.id} = ${attributes.user_id}
    - join: events
      relationship: one_to_many
      sql_on: ${attributes.id} = ${events.attribute_id}

- view: attributes
  fields:
    - dimension: user_id
      sql: JSON_EXTRACT(${value}, 'user_id')
    - dimension: service_charge
      sql: JSON_EXTRACT(${value}, 'service_charge')
    - dimension: amount
      sql: ${service_charge} + ${wait_charge} + ${tax}

Page 10: Designing a Data Warehouse - what would a BI solution recommend?

Let’s see how we did

Bad Bad Bad... Sure O.K.

Shortest Path ✗

Performance ✗

Single Source of Truth

Transparency ✓

Page 11: Designing a Data Warehouse - what would a BI solution recommend?

Can we do better?

[Diagram: Production -> ETL -> Data Warehouse]

Data Warehouse:

id      created_at  event_type    amount  location
100001  2016-01-01  transaction   14.3
100002  2016-01-01  ride_started          37.7833° N, 122.4167° W

Page 12: Designing a Data Warehouse - what would a BI solution recommend?

Pre-flattening the table

SELECT events.id
  , events.created_at
  , JSON_EXTRACT(attributes.value, 'type') AS event_type
  , JSON_EXTRACT(attributes.value, 'service_charge')
    + JSON_EXTRACT(attributes.value, 'wait_charge')
    + JSON_EXTRACT(attributes.value, 'tax') AS amount
  , JSON_EXTRACT(attributes.value, 'location') AS location
FROM events
LEFT JOIN attributes ON events.attribute_id = attributes.id

ETL
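The pre-flattening step can be sketched in plain Python to make its behavior concrete. The JSON contents of the attributes are assumptions pieced together from the columns shown on the other slides (the original only shows `{json...}`), and `flatten` is a hypothetical helper, not part of the deck.

```python
import json

# Production tables; the attribute payloads below are assumed for illustration.
events = [
    {"id": 100001, "created_at": "2016-01-01", "attribute_id": 1},
    {"id": 100002, "created_at": "2016-01-01", "attribute_id": 2},
]
attributes = {
    1: json.dumps({"type": "transaction", "service_charge": 10, "wait_charge": 3, "tax": 1.3}),
    2: json.dumps({"type": "ride_started", "location": "37.7833° N, 122.4167° W"}),
}


def flatten(event, attributes):
    """Mimic the LEFT JOIN plus JSON extraction for one event row."""
    raw = attributes.get(event["attribute_id"])  # no match behaves like a LEFT JOIN miss
    value = json.loads(raw) if raw is not None else {}
    charges = [value.get(key) for key in ("service_charge", "wait_charge", "tax")]
    # As in SQL, the sum is NULL (None) if any component is missing.
    amount = sum(charges) if all(c is not None for c in charges) else None
    return {
        "id": event["id"],
        "created_at": event["created_at"],
        "event_type": value.get("type"),
        "amount": amount,
        "location": value.get("location"),
    }


flat = [flatten(event, attributes) for event in events]
for row in flat:
    print(row)
```

Note that the flattened row keeps only the summed amount, so the individual charges are no longer visible to anyone querying the warehouse table.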

Page 13: Designing a Data Warehouse - what would a BI solution recommend?

Let’s see how we did #2

Bad Sure O.K.

Shortest Path ✓

Performance ✓

Single Source of Truth ✗

Transparency ✗

- explore: users
  joins:
    - join: event_attributes
      relationship: one_to_many
      sql_on: ${users.id} = ${event_attributes.user_id}

- view: event_attributes
  fields:
    - dimension: user_id
      sql: ${TABLE}.user_id
    - dimension: amount
      sql: ${TABLE}.amount
    ...

Page 14: Designing a Data Warehouse - what would a BI solution recommend?

Let’s try another improvement

Data Warehouse

Transaction Events:

id      created_at  user_id  service_charge  wait_charge  tax
100001  2016-01-01  1        10              3            1.3

Ride_started Events:

id      created_at  user_id  location
100002  2016-01-01  1        37.7833° N, 122.4167° W

Page 15: Designing a Data Warehouse - what would a BI solution recommend?

Let’s try another improvement

Model

- explore: events
  joins:
    - join: transaction_events
      view_label: 'Events'
      relationship: one_to_one
      sql_on: ${events.id} = ${transaction_events.id}

- explore: users
  joins:
    - join: events
      relationship: one_to_many
      sql_on: ${users.id} = ${events.user_id}

- view: events
  derived_table:
    sql: |
      SELECT id, created_at, user_id FROM transaction_events
      UNION ALL
      SELECT id, created_at, user_id FROM ride_started_events

- view: transaction_events
  ...

- view: ride_started_events
  ...
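To see what this model amounts to in SQL, the sketch below builds the two per-type tables in an in-memory SQLite database, unions them into a common events set, and joins the transaction columns back one-to-one. The `amount` expression is an assumption carried over from the earlier slides; the deck elides the contents of the `transaction_events` view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transaction_events (
        id INTEGER, created_at TEXT, user_id INTEGER,
        service_charge REAL, wait_charge REAL, tax REAL
    );
    CREATE TABLE ride_started_events (
        id INTEGER, created_at TEXT, user_id INTEGER, location TEXT
    );
    INSERT INTO transaction_events VALUES (100001, '2016-01-01', 1, 10, 3, 1.3);
    INSERT INTO ride_started_events VALUES (100002, '2016-01-01', 1, '37.7833° N, 122.4167° W');
    """
)

# The CTE plays the role of the derived "events" view; the join mirrors the
# one_to_one join on id from the explore.
query = """
WITH events AS (
    SELECT id, created_at, user_id FROM transaction_events
    UNION ALL
    SELECT id, created_at, user_id FROM ride_started_events
)
SELECT events.id,
       events.user_id,
       transaction_events.service_charge
         + transaction_events.wait_charge
         + transaction_events.tax AS amount
FROM events
LEFT JOIN transaction_events ON events.id = transaction_events.id
ORDER BY events.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the individual charges remain as columns, the amount calculation stays visible in the model rather than being fixed inside the ETL.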

Page 16: Designing a Data Warehouse - what would a BI solution recommend?

Let’s see how we did #3

Bad Sure O.K.

Shortest Path ✓

Performance ✓

Single Source of Truth ✓

Transparency ✓