DEMYSTIFYING DATA MODELING
© Sisense Inc, 2015
AGENDA
→Goals of Data Modelling
→The Challenge of Data Modelling
→Process of Data Modelling
→Three Challenges
  →What Data Do I Need? - Map Data
  →How Do I Connect Different Sources? - Join Data
  →How Do I Want To Analyze Data? - Clean Data
→Demo: Sisense Data Modelling
GOALS OF DATA MODELLING
What's the Goal?
The ability to create a new report or dashboard, or just get a new analytic question answered, in real time or at least in time.
What Business Needs
• ACCURATE
• UP-TO-DATE
• READY FOR ANALYSIS
CHALLENGE OF DATA MODELLING
Imagine This Scenario
CEO: "We need to increase our sales!"
MRKT MNGR: "What other offerings can we sell to customers?"
Most sold products? Most successful product bundles?
IT MNGR: (While upgrading platforms and implementing a new CRM system) Estimates that the information will be available in 20-30 days...
MRKT MNGR: "A month? Don't we already have this data in our system?"
IT MNGR: "Yes, the data is there, but it DOESN'T HAVE THE RIGHT STRUCTURE to answer those questions."
MRKT MNGR: (Keeps thinking) If the data is there, why is it so difficult to get answers?
IT MNGR: (Keeps thinking) The marketing manager asks for weird things with no time at all!
CEO: Just wants to sell more.
WHAT MAKES DATA COMPLEX
• DISPERSED
• SIZE
• QUERY LANGUAGE
• TYPE
• STRUCTURE
• GROWTH RATE
• DETAIL
Modelling Steps
[Diagram: DATA SOURCES (DS, DS, DS), QUERY/IMPORT, ETL (EXTRACT, TRANSFORM, LOAD), CENTRALIZE, ANALYZE]
[Diagram labels: DISPERSED, QUERY LANGUAGE, STRUCTURE, SIZE, GROWTH RATE, DETAIL]
MAP DATA: WHAT DATA DO I NEED?
WHAT DATA DO I NEED? - MAP THE DATA
Dimensions: Key business entities (subjects) that we want to analyze
Facts: Performance measurements
Filter & Order: A set of conditions and order that specify the data subset that we want to look at
DIMENSIONS
Dimensions Are Mostly Categorical - Each Has A Discrete Set Of Values
• Place - UK/USA
• Person - Customer
• Object - Products
• Time and Date - Year
• Process - Packaging
• Hierarchy - Country > City > Zip
FACTS
Facts are presented in aggregate format: Max, Sum, Average, Variance, Median, Count, Year-to-Date
• Number of transactions
• Quantity
• Amount
• Cost
• Revenue
• Discount
• Profit
FILTER & ORDER
A set of conditions that specify the data subset AND order in which to see the aggregations
• Greater than
• Between
• When
• True/False
• Range
Example Business Inquiry Structure
Show me an aggregation of certain FACTS
along certain DIMENSIONS
under certain FILTER CONDITIONS
in a certain ORDER
Example: Do certain stores sell "Bikes" significantly better than others?
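As a rough illustration of the inquiry structure above, the sketch below answers the "Bikes" question on a tiny data set: a fact (quantity) is summed along a dimension (store), under a filter condition (product is "Bikes", total above a threshold), in a chosen order (highest first). The sample rows, the threshold, and the `summarize` helper are all hypothetical and only serve the example.

```python
from collections import defaultdict

# Hypothetical sample rows: "store" and "product" are dimensions, "quantity" is a fact.
SALES = [
    {"store": "London", "product": "Bikes", "quantity": 12},
    {"store": "London", "product": "Helmets", "quantity": 30},
    {"store": "Boston", "product": "Bikes", "quantity": 45},
    {"store": "Boston", "product": "Bikes", "quantity": 5},
    {"store": "Leeds", "product": "Bikes", "quantity": 3},
]

def summarize(rows, dimension, fact, condition, min_total):
    """Sum `fact` per `dimension` value for rows that pass `condition`,
    keep totals greater than `min_total`, and order from highest to lowest."""
    totals = defaultdict(int)
    for row in rows:
        if condition(row):                        # filter: which rows to look at
            totals[row[dimension]] += row[fact]   # fact aggregated along the dimension
    kept = [(key, total) for key, total in totals.items() if total > min_total]
    return sorted(kept, key=lambda item: item[1], reverse=True)  # order

bike_sales = summarize(
    SALES, "store", "quantity",
    condition=lambda row: row["product"] == "Bikes",
    min_total=10,
)
print(bike_sales)  # [('Boston', 50), ('London', 12)]
```

Swapping the dimension, fact, or condition changes the question without changing the structure, which is the point the slide makes.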
Correspondence Between Business Question And SQL Queries
1. Business Question
"What were the best-selling products this year, per country? (Show only products that sold more than 20,000 units.)"
2. SQL Structure
Select <Dimensions>,
<Facts>
From <Tables>
Where <Conditions>
Group by <Dimensions>
Having <Conditions>
Order by <Order Specifications>
3. SQL Query
Select Country, Product,
Sum(quantity)
From OrdersSales
Where Getyear(SaleDate) = 2015
Group by Country, Product
Having Sum(quantity) > 20000
Order by Country, Sum(quantity) Desc
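To try the query end to end, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory table. The rows are invented for illustration, and the column types are an assumption. SQLite has no Getyear function, so the sketch uses strftime('%Y', ...) as a stand-in for the year filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrdersSales (Country TEXT, Product TEXT, SaleDate TEXT, quantity INTEGER)"
)
# Hypothetical rows, chosen so that each clause of the query has a visible effect.
conn.executemany(
    "INSERT INTO OrdersSales VALUES (?, ?, ?, ?)",
    [
        ("UK", "Bikes", "2015-03-01", 15000),
        ("UK", "Bikes", "2015-07-15", 9000),
        ("UK", "Helmets", "2015-05-20", 4000),    # dropped by HAVING (not above 20000)
        ("USA", "Bikes", "2015-02-11", 30000),
        ("USA", "Helmets", "2015-08-02", 26000),
        ("USA", "Bikes", "2014-12-30", 50000),    # dropped by WHERE (sold in 2014)
    ],
)

QUERY = """
SELECT Country, Product, SUM(quantity) AS total_units   -- dimensions, then the fact
FROM OrdersSales
WHERE strftime('%Y', SaleDate) = '2015'                 -- filter on rows
GROUP BY Country, Product
HAVING SUM(quantity) > 20000                            -- filter on aggregates
ORDER BY Country, total_units DESC                      -- order
"""
best_sellers = conn.execute(QUERY).fetchall()
for row in best_sellers:
    print(row)
# ('UK', 'Bikes', 24000)
# ('USA', 'Bikes', 30000)
# ('USA', 'Helmets', 26000)
```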
JOIN DATA: HOW DO I CONNECT DIFFERENT SOURCES?
HOW DO I CONNECT DIFFERENT SOURCES? - JOIN DATA
Relationship: The way separate data sources can reference each other
Join Types: The total portion of data included when connecting separate data sources
Key: Field(s) used to connect data sources
Data Relationships
How an instance of data from one source is related to data in another source
• One-to-One: Wife - Husband
• One-to-Many: Artist - Song
• Many-to-Many: Student - Subject
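The three relationship types can be sketched with plain Python structures. All names and values below are made up, and a real model would hold these in tables rather than dictionaries; the shapes are what matter.

```python
# One-to-one: each person maps to exactly one spouse.
spouse_of = {"Alice": "Bob"}

# One-to-many: one artist has many songs, and each song has one artist.
songs_by_artist = {
    "Artist A": ["Song 1", "Song 2"],
    "Artist B": ["Song 3"],
}

# Many-to-many: a student takes many subjects and a subject has many students.
# A list of (student, subject) pairs plays the role of a linking table.
enrollments = [("Dana", "Math"), ("Dana", "Physics"), ("Eli", "Math")]

def subjects_of(student):
    """All subjects a given student is enrolled in."""
    return sorted(subject for name, subject in enrollments if name == student)

def students_in(subject):
    """All students enrolled in a given subject."""
    return sorted(name for name, subj in enrollments if subj == subject)

print(subjects_of("Dana"))  # ['Math', 'Physics']
print(students_in("Math"))  # ['Dana', 'Eli']
```

The many-to-many case is the only one that needs a separate linking structure, which is why it tends to be the hardest relationship to model.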
Join Types
What portion of the connected data is required for analysis
• Inner Join
• Left Join
• Right Join
• Full Join
Other Join Options
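To make the difference between inner, left, right, and full joins concrete, here is a small pure-Python sketch. The `join` helper and the two sample lists are hypothetical. It mimics the row-matching behavior of SQL joins on a single key and fills unmatched columns with None; it is not how any particular database or Sisense implements joins.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`. `how` is 'inner', 'left', 'right', or 'full'.
    Rows without a partner get None for the other side's columns."""
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    result, matched_right = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        for i in hits:
            matched_right.add(i)
            result.append({**lrow, **right[i]})          # matched pair: in every join type
        if not hits and how in ("left", "full"):
            result.append({**dict.fromkeys(right_cols), **lrow})  # unmatched left row
    if how in ("right", "full"):
        for i, rrow in enumerate(right):
            if i not in matched_right:
                result.append({**dict.fromkeys(left_cols), **rrow})  # unmatched right row
    return result

# Hypothetical data: product 2 appears on both sides, 1 only in sales, 3 only in stock.
sales = [{"product_id": 1, "amount": 100}, {"product_id": 2, "amount": 250}]
stock = [{"product_id": 2, "units": 7}, {"product_id": 3, "units": 4}]

for how in ("inner", "left", "right", "full"):
    print(how, len(join(sales, stock, "product_id", how)))
# inner 1
# left 2
# right 2
# full 3
```

The row counts show the "portion of the connected data" each join type keeps: only matches, everything from one side, or everything from both.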
Data Keys
TABLE A: SALES
• PRODUCT ID
• EMPLOYEE ID
• ORDER DATE
• DELIVERY DATE
• CLIENT ID
• AMOUNT
TABLE B: STOCK
• PRODUCT ID
• STOCK DATE
• UNITS
• COST
• EMPLOYEE ID
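The sketch below shows how the choice of key field(s) changes which rows connect, using cut-down versions of the SALES and STOCK tables in SQLite. The date columns are left out for brevity, and the column types and sample rows are assumptions made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product_id INTEGER, employee_id INTEGER, client_id INTEGER, amount REAL);
CREATE TABLE stock (product_id INTEGER, employee_id INTEGER, units INTEGER, cost REAL);
-- Hypothetical rows: product 2 was sold by employee 11 but stocked by employee 12.
INSERT INTO sales VALUES (1, 10, 500, 120.0), (2, 11, 501, 80.0);
INSERT INTO stock VALUES (1, 10, 40, 60.0), (2, 12, 15, 35.0);
""")

# Single key: connect rows that describe the same product.
by_product = conn.execute("""
    SELECT s.product_id, s.amount, k.units
    FROM sales AS s
    JOIN stock AS k ON s.product_id = k.product_id
    ORDER BY s.product_id
""").fetchall()

# Composite key: connect rows only when product AND employee both match.
by_product_and_employee = conn.execute("""
    SELECT s.product_id, s.employee_id, s.amount, k.units
    FROM sales AS s
    JOIN stock AS k
      ON s.product_id = k.product_id AND s.employee_id = k.employee_id
    ORDER BY s.product_id
""").fetchall()

print(by_product)               # [(1, 120.0, 40), (2, 80.0, 15)]
print(by_product_and_employee)  # [(1, 10, 120.0, 40)]
```

Joining on the wrong key, or on too few fields, can silently connect rows that should not be connected, so the key choice deserves as much attention as the join type.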
CLEAN DATA: HOW DO I WANT TO ANALYZE DATA?
HOW DO I WANT TO ANALYZE DATA? – CLEAN DATA
Valid: Data is correct and reasonable
Accurate: Data is precise and shows the right values
Complete & Consistent: Corrections related to missing, incomplete, incorrect or inconsistent data
Valid
• Stable response. Example: Compare samples
• Have a sufficient portion of data. Example: Access a comprehensive portion of data
• Measures what it is supposed to. Example: Compare multiple measurements
Accurate
Data Capture
Example: Correct at source of entry
Data Decay + Movement
Example: Constant updates
Complete and Consistent
Data correction
Example: Transform data
Data consistency
Example: Standardization
Data completeness
Example: Merge Data
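A minimal sketch of the three cleaning steps above, on invented records: standardizing country names (consistency), filling a missing amount from a second source (completeness), and turning text amounts into numbers (correction). The field names, the lookup table, and the backup source are all hypothetical.

```python
# Hypothetical raw records with typical problems: stray spaces, mixed naming,
# numbers stored as text with thousands separators, and a missing value.
raw = [
    {"id": 1, "country": " uk ", "amount": "1,200"},
    {"id": 2, "country": "United Kingdom", "amount": "300"},
    {"id": 3, "country": "USA", "amount": None},
]

# Hypothetical second source used to fill gaps in the first.
backup_amounts = {3: "450"}

# Standard spelling for each known variant of a country name.
COUNTRY_NAMES = {"uk": "UK", "united kingdom": "UK", "usa": "USA"}

def clean(record):
    """Return a cleaned copy of one record."""
    country = COUNTRY_NAMES[record["country"].strip().lower()]  # consistency: standardize
    amount = record["amount"]
    if amount is None:                                           # completeness: merge data
        amount = backup_amounts.get(record["id"])
    if amount is not None:                                       # correction: transform data
        amount = int(amount.replace(",", ""))
    return {"id": record["id"], "country": country, "amount": amount}

cleaned = [clean(record) for record in raw]
for record in cleaned:
    print(record)
# {'id': 1, 'country': 'UK', 'amount': 1200}
# {'id': 2, 'country': 'UK', 'amount': 300}
# {'id': 3, 'country': 'USA', 'amount': 450}
```

After cleaning, both UK spellings group together and every amount can be summed, which is what makes the data ready for the aggregations described earlier.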
DATA MODELING IN SISENSE
Ease of Modelling in Sisense
ACCESS, PREPARE FOR ANALYSIS, ACCURATE + ON TIME
• Visual with No Coding
• Connect Directly to Raw Data
• Single Model - Many Sources, Rows & Columns
• Drag & Drop to Join Varied Data Sources
• Automatically Model Based on Query
• Complete Solution: ETL & Analysis
• Change Incrementally as Needed
• Synchronization
DEMO: Sisense Data Modelling Environment
Thank you
Image Credits
Images courtesy of pakorn, Stuart Miles, winnond, adamr, sattva, markuso, Mister GC, John Kasawa, and tungphoto at FreeDigitalPhotos.net