A Walk Through the Kimball ETL Subsystems with Oracle Data Integration
Post on 14-Apr-2017
info@rittmanmead.com www.rittmanmead.com @rittmanmead 1
Michael Rainey | KScope16
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration
Introduction
• Michael Rainey
- Data Integration Lead - America
- Oracle Data Integration expertise
- Blog: http://ritt.md/mRainey
- Oracle ACE Director
- @mRainey
About Rittman Mead
• World’s leading specialist partner for technical excellence, solutions delivery and innovation in Oracle Data Integration, Business Intelligence, Analytics and Big Data
• Providing our customers targeted expertise; we are a company that doesn’t try to do everything… only what we excel at
• 70+ consultants worldwide including 1 Oracle ACE Director and 3 Oracle ACEs, offering training courses, global services, and consulting
• Founded on the values of collaboration, learning, integrity and getting things done
Unlock the potential of your organization’s data
• Comprehensive service portfolio designed to support the full lifecycle of any analytics solution
• Visual Redesign
• Business User Training
• Ongoing Support
• Engagement Toolkit
Average user adoption for BI platforms is below 25%
Rittman Mead’s User Engagement Service can help
More info: http://ritt.md/ue
What’s Most Important for YOU in Data Integration?
• Big data?
• Cloud?
• Financial Reporting on “one version of the truth”?
Let’s take a walk…and talk about ETL
Wait! What are Kimball ETL Subsystems?
Do you all know of Ralph Kimball?
www.kimballgroup.com
Ralph Kimball founded the Kimball Group. Since the mid-1980s, he has been the DW/BI industry’s thought leader on the dimensional approach and trained more than 20,000 students. Prior to working at Metaphor and founding Red Brick Systems, Ralph co-invented the first commercially-available workstation with a graphical user interface at Xerox’s Palo Alto Research Center (PARC). Ralph has his Ph.D. in Electrical Engineering from Stanford University.
The Kimball Group
The Kimball 34 Subsystems of ETL
• Extracting Data
- Data Profiling
- Change Data Capture System
- Extract System
• Cleaning and Conforming Data
- Data Cleansing System
- Error Event Schema
- Audit Dimension Assembler
- Deduplication System
- Conforming System
• Delivering Data for Presentation
- Slowly Changing Dimension Manager
- Surrogate Key Generator
- Hierarchy Manager
- Special Dimensions Manager
- Fact Table Builders
- Surrogate Key Pipeline
- Late Arriving Data Handler
- Multi-Valued Dimension Bridge Table Builder
- Dimension Manager System
- Fact Provider System
- Aggregate Builder
- OLAP Cube Builder
- Data Propagation Manager
• Managing the ETL Environment
- Job Scheduler
- Backup System
- Recovery and Restart System
- Version Control System
- Version Migration System
- Workflow Monitor
- Sorting System
- Lineage & Dependency Analyzer
- Problem Escalation System
- Parallelizing / Pipelining System
- Security System
- Compliance Manager
- Metadata Repository Manager
Oracle Data Integration Solutions
• Data Integrator - NoETL Engine, 100% Native Data Transformation
• GoldenGate - Non-invasive CDC, Realtime streaming data delivery
• Data Quality - Profile, Cleanse, Match, and Remediate Data
• Big Data Preparation - Prepare, Secure, Enrich and Publish Unstructured Data
• Metadata Management - Catalog, Trace and View Models across the Enterprise
• Data Service Integrator - Federate Data Across DBs, Services and Applications
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Open World 2015
Who’s coming with us?
Data model - where we’re going
Now let’s take a walk through the ETL Subsystems
Extracting Data
• Data Profiling
- Oracle Enterprise Data Quality
• Change Data Capture System
• Extract System
- Oracle Data Integrator
- Oracle GoldenGate
Extracting Data - Data Profiling with EDQ
• Small dataset due to sampling percentage
• _projectid looks like a primary key
• Investigate school_district blanks
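The profiling observations above (a unique, fully populated column looks like a key; blanks need investigating) can be sketched in plain Python. This is only an illustration of the idea, not how EDQ works internally. The helper `profile_column` and the sample values are hypothetical; only the column names `_projectid` and `school_district` come from the slide.

```python
def profile_column(rows, column):
    """Return simple profiling metrics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    non_blank = [v for v in values if v is not None and str(v).strip() != ""]
    blanks = len(values) - len(non_blank)
    distinct = len(set(non_blank))
    return {
        "rows": len(values),
        "blanks": blanks,
        "distinct": distinct,
        # Key candidate: at least one row, no blanks, and no repeated values.
        "key_candidate": len(values) > 0 and blanks == 0 and distinct == len(values),
    }


# Hypothetical sample rows; the values are invented for illustration.
sample = [
    {"_projectid": "p001", "school_district": "Chicago"},
    {"_projectid": "p002", "school_district": ""},
    {"_projectid": "p003", "school_district": "Chicago"},
]

projectid_profile = profile_column(sample, "_projectid")
district_profile = profile_column(sample, "school_district")
print(projectid_profile)
print(district_profile)
```

On a sampled dataset a "key candidate" result is only a hint; uniqueness in the sample does not prove uniqueness in the full source.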
Extracting Data - Oracle Data Integrator
• Extract from many different systems? Yes!
- Multiple technologies OOTB
- Custom technologies can be added
• Data Server - connection to the data source
- Physical Schema
- Logical Schema
• Models
- Based on a single data source
• Datastores
- Logically represent a table, file, XML, etc.
- Reverse engineer or build manually
[Screenshot callouts: City, State, Zip Code]
Extracting Data - Changed Data Only
• Change Data Capture
- Extract only the data changed since the last ETL extract
• Methods
- Audit columns
- Timed extract
- Full “diff compare”
- Database log scraping
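Two of the methods listed above, audit columns and the full “diff compare”, can be sketched in a few lines of Python. This is a minimal illustration, not GoldenGate or ODI code; the function names, the `last_updated` audit column, and the sample rows are all assumptions made for the example.

```python
from datetime import datetime


def extract_by_audit_column(rows, last_extract_time, audit_column="last_updated"):
    """Audit-column CDC: keep rows modified after the previous extract."""
    return [row for row in rows if row[audit_column] > last_extract_time]


def diff_compare(previous, current, key="id"):
    """Full diff compare: classify rows as inserted, updated, or deleted."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}
    inserted = [curr_by_key[k] for k in curr_by_key if k not in prev_by_key]
    deleted = [prev_by_key[k] for k in prev_by_key if k not in curr_by_key]
    updated = [curr_by_key[k] for k in curr_by_key
               if k in prev_by_key and curr_by_key[k] != prev_by_key[k]]
    return inserted, updated, deleted


# Hypothetical snapshots of a source table at two points in time.
previous = [
    {"id": 1, "city": "Chicago", "last_updated": datetime(2016, 6, 1)},
    {"id": 2, "city": "Austin", "last_updated": datetime(2016, 6, 1)},
]
current = [
    {"id": 1, "city": "Chicago", "last_updated": datetime(2016, 6, 1)},
    {"id": 2, "city": "Dallas", "last_updated": datetime(2016, 6, 20)},
    {"id": 3, "city": "Denver", "last_updated": datetime(2016, 6, 21)},
]

changed = extract_by_audit_column(current, datetime(2016, 6, 15))
inserted, updated, deleted = diff_compare(previous, current)
print([row["id"] for row in changed])
print(len(inserted), len(updated), len(deleted))
```

The audit-column approach depends on the source reliably maintaining the timestamp and cannot see deletes; the diff compare catches deletes but needs both full snapshots.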
Extracting Data - CDC with Oracle GoldenGate
Cleaning and Conforming Data
• Data Cleansing System
- ODI & EDQ
• Error Event Schema
- Built on ODI E$ tables
• Audit Dimension Assembler
• Deduplication System
- EDQ
• Conforming System
Cleaning and Conforming - Data Cleansing System
• ODI
- Check Knowledge Module
- Check logical constraints
- “Bad” data moves to error table
• EDQ
- Data cleansing audit processors
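The idea of checking logical constraints and moving “bad” rows to an error table can be sketched as follows. This is a simplified illustration in Python, not the code a Check Knowledge Module generates and not the real E$ table layout. The constraint names, column names, and rows are hypothetical.

```python
def check_constraints(rows, constraints):
    """Split rows into valid rows and error records.

    `constraints` maps a constraint name to a predicate that returns True
    when the row satisfies the rule.
    """
    valid, errors = [], []
    for row in rows:
        failed = [name for name, rule in constraints.items() if not rule(row)]
        if failed:
            # One error record per violated constraint, keeping the offending row.
            errors.extend({"constraint": name, "row": row} for name in failed)
        else:
            valid.append(row)
    return valid, errors


# Hypothetical logical constraints on a staging table.
constraints = {
    "CK_ZIP_5_DIGITS": lambda r: len(r["zip"]) == 5 and r["zip"].isdigit(),
    "CK_STATE_NOT_BLANK": lambda r: bool(r["state"].strip()),
}

staged = [
    {"city": "Chicago", "state": "IL", "zip": "60601"},
    {"city": "Austin", "state": "", "zip": "7870"},
]

valid_rows, error_rows = check_constraints(staged, constraints)
print(len(valid_rows), len(error_rows))
```

Recording one error row per violated constraint keeps enough detail to report which rule failed, which is the kind of information an error event schema is meant to hold.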
Cleaning and Conforming - ODI Constraints
Cleaning and Conforming - EDQ Data Cleansing
Cleaning and Conforming - Error Event Schema
Image From: Data Warehouse Lifecycle Toolkit (Wiley Publishing, Inc: 2008).
[Figure labels: SNP_COND, SNP_KEY, SNP_JOIN, SNP_LPI_RUN, E$ Tables]
IOUG SELECT Journal Article
http://ritt.md/err-schema-odi
Cleaning and Conforming - Deduplication System
Delivering Data
• Slowly Changing Dimension Manager
• Surrogate Key Generator
• Hierarchy Manager
• Special Dimensions Manager
• Fact Table Builders
• Surrogate Key Pipeline
• Late Arriving Data Handler
• Multi-Valued Dimension Bridge Table Builder
• Dimension Manager System
• Fact Provider System
• Aggregate Builder
• OLAP Cube Builder
• Data Propagation Manager
Delivering Data
• Slowly Changing Dimension Manager
- ODI Integration Knowledge Module
- Set SCD behavior type for each target column
• Surrogate Key Generator
- Database Sequence objects and ODI Sequences
• Fact Table Builder
- Lookups in ODI
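To make the SCD and surrogate key bullets concrete, here is a minimal Type 2 sketch in Python: when a tracked column changes, the current row is closed and a new row is added with the next surrogate key. It does not reproduce the ODI Integration Knowledge Module; `itertools.count` stands in for a database or ODI sequence, and every table, column, and function name is hypothetical.

```python
from itertools import count


def apply_scd2(dimension, incoming, natural_key, tracked, key_gen, load_date):
    """Apply Type 2 changes to `dimension` (a list of dict rows) in place."""
    current = {row[natural_key]: row for row in dimension if row["current_flag"]}
    for source in incoming:
        existing = current.get(source[natural_key])
        if existing is not None and all(existing[c] == source[c] for c in tracked):
            continue  # nothing changed in the tracked columns
        if existing is not None:
            # Close out the old version of the row.
            existing["current_flag"] = False
            existing["end_date"] = load_date
        new_row = {
            "surrogate_key": next(key_gen),  # stand-in for a sequence
            natural_key: source[natural_key],
            **{c: source[c] for c in tracked},
            "start_date": load_date,
            "end_date": None,
            "current_flag": True,
        }
        dimension.append(new_row)
        current[source[natural_key]] = new_row
    return dimension


key_gen = count(1)
dim_school = []
apply_scd2(dim_school, [{"school_id": "S1", "city": "Chicago"}],
           "school_id", ["city"], key_gen, "2016-06-01")
apply_scd2(dim_school,
           [{"school_id": "S1", "city": "Evanston"},
            {"school_id": "S2", "city": "Austin"}],
           "school_id", ["city"], key_gen, "2016-06-29")
for row in dim_school:
    print(row)
```

The sketch treats every tracked column as "add a row on change"; a per-column choice between overwriting and adding a row would need an extra setting for each column.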
Delivering Data - Slowly Changing Dimension in ODI
Delivering Data - SCD in ODI - Surrogate Keys
Additional audit columns
Delivering Data - Fact Table Builder
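A fact table builder replaces each natural key in the source with the matching dimension surrogate key through a lookup. The sketch below shows that lookup in Python under stated assumptions: the dimension rows, the `donation_amount` measure, and the `-1` "unknown member" key are all hypothetical, and this is not how an ODI lookup component is implemented.

```python
UNKNOWN_KEY = -1  # assumed placeholder for source rows with no dimension match


def build_fact_rows(source_rows, dimension, natural_key, measure):
    """Swap each source natural key for the current dimension surrogate key."""
    lookup = {row[natural_key]: row["surrogate_key"]
              for row in dimension if row["current_flag"]}
    facts = []
    for source in source_rows:
        facts.append({
            "dim_key": lookup.get(source[natural_key], UNKNOWN_KEY),
            measure: source[measure],
        })
    return facts


# Hypothetical dimension with one expired and two current rows.
dim_rows = [
    {"surrogate_key": 1, "school_id": "S1", "current_flag": False},
    {"surrogate_key": 2, "school_id": "S1", "current_flag": True},
    {"surrogate_key": 3, "school_id": "S2", "current_flag": True},
]
source_facts = [
    {"school_id": "S1", "donation_amount": 50.0},
    {"school_id": "S9", "donation_amount": 20.0},
]

fact_rows = build_fact_rows(source_facts, dim_rows, "school_id", "donation_amount")
print(fact_rows)
```

Routing unmatched rows to a placeholder key is one simple option; a late arriving data handler could instead create the missing dimension member.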
Managing the ETL Environment
• Job Scheduler
• Backup System
• Recovery and Restart System
• Version Control System
• Version Migration System
• Workflow Monitor
• Sorting System
• Lineage & Dependency Analyzer
• Problem Escalation System
• Parallelizing / Pipelining System
• Security System
• Compliance Manager
• Metadata Repository Manager
Managing the ETL Environment - Job Scheduler
• Create ODI schedule on execution object
- Tied to an agent and context
• Limited flexibility
- Custom Fiscal Month end, for example
• Alternative to ODI scheduler - external scheduling tool
- ODI Scenarios and Load Plans can be executed via command line script or web service
./startloadplan.sh LOAD_EDW GLOBAL 6 -AGENT_URL=http://localhost:20910/oraclediagent
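An external scheduler typically wraps a command like the one above in a small script so that a failed load plan fails the scheduled job. A minimal Python sketch, assuming the argument order shown on the slide (load plan, context, log level, agent URL); the helper names `build_startloadplan_command` and `run_load_plan` are hypothetical.

```python
import subprocess


def build_startloadplan_command(load_plan, context, log_level, agent_url,
                                script="./startloadplan.sh"):
    """Build the argument list in the order used by the command on the slide."""
    return [script, load_plan, context, str(log_level), f"-AGENT_URL={agent_url}"]


def run_load_plan(command):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True)


command = build_startloadplan_command(
    "LOAD_EDW", "GLOBAL", 6, "http://localhost:20910/oraclediagent"
)
print(" ".join(command))
# run_load_plan(command)  # uncomment on a host where the ODI agent scripts exist
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` lets the calling scheduler see the failure through the exception or exit status.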
Managing the ETL Environment - Version Control/Migration
• ODI 12.2.1 Lifecycle Management
- Integrated with Subversion
- Deployment Archives for code migration between environments
Managing the ETL Environment - Workflow Monitor
Drilldown from ODI Session to SQL detailed activity report
Obtain real-time and historical agent statistics
Where did we end up?
• The Kimball ETL Subsystems will guide your data warehouse program
• Oracle Data Integration can help you fully implement the ETL Subsystems
- Extract, Load, Transform with ODI and GoldenGate
- Profile and cleanse data with Enterprise Data Quality
Where did we end up? One version of the truth…
Questions?
• Websites
- kimballgroup.com
- rittmanmead.com/blog
• Contact
- info@rittmanmead.com
- michael.rainey@rittmanmead.com
• Twitter
- @rittmanmead
- @mRainey
Rittman Mead at KScope16
Oracle GoldenGate and Apache Kafka: A Deep Dive into Real-Time Data Streaming
Michael Rainey | Monday Jun 27, 4:30pm | Level 2 - Missouri
Free-Form Data Visualizations: First Session
Charles Elliott | Tuesday Jun 28, 8:30am | Level 2 - Superior A
Lunch & Learn: BI and Data Warehousing
Michael Rainey | Tuesday Jun 28, 12:45pm | Ballroom Level - Sheraton II
Lunch & Learn: Big Data and Advanced Analytics
Mark Rittman | Tuesday Jun 28, 12:45pm | Ballroom Level - Sheraton III
OBIEE 12c and Essbase: What’s New for Integration and Reporting Against EPM Sources
Mark Rittman | Wednesday Jun 29, 10:15am | Ballroom Level - Sheraton III
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration
Michael Rainey | Wednesday Jun 29, 11:30am | Level 2 - Mayfair
How to Brand and Own Your OBIEE Interface: Past, Present, and Future
Andy Rocha & Pete Tamisin | Wednesday Jun 29, 2:00pm | Ballroom Level - Sheraton III
Free-Form Data Visualizations: Second Session
Charles Elliott | Wednesday Jun 29, 2:00pm | Level 2 - Superior A
Oracle Big Data Discovery: Extending into Machine Learning and Advanced Visualizations
Mark Rittman | Wednesday Jun 29, 3:15pm | Level 2 - Missouri