© 2011 IBM Corporation
DW and BI Architecture, Design and Implementation
Tuesday, November 15, 2011
Mike Cain
DB2 for i Center of Excellence
Rochester, MN
[email protected]
Today’s Reporting Requirements
• Remove Dependency on IT
  – Ease IT backlog of reporting requests
  – Reduce Report Maintenance
  – Empower End Users
• Client Independence
  – Web Based
• Reduced Software Maintenance
• Multiple Viewing Options
  – Dashboards/Scorecards
  – Spreadsheet Integration
  – Board Room Quality PDF
• Automated Report Distribution
  – E-mail and mobile distribution
• Application Integration
  – Reporting as a function of Line of Business apps
  – Portal interfaces
Terms, Concepts, Philosophies…
What is Business Intelligence?
[Diagram: the Business Intelligence spectrum]
• Questions answered:
  – REPORTING: What happened?
  – MONITOR: What is happening?
  – ANALYSIS: Why did it happen?
  – PREDICT: What will happen?
• Techniques: Query/Reporting, Dashboards/Scorecards, OnLine Analytics (Trending/OLAP), Data Mining (Predictive Analytics), Business Performance Management
• Foundation: DBMS, ODS / Data Warehouses / Data Marts, Real-Time Data, OS/EAI (Operational Systems / Enterprise Application Integration)
Source: The Data Warehousing Institute, Smart Companies in the 21st Century, July 2003
• Customers… C table
• Order Summaries… O table
• Order Details… D table
• Items… I table
• Sales People… S table
Very good design for maintenance and transactions
[Diagram: DB2 database holding tables C, O, D, I and S]
Normalized OLTP Data Base
• Update customer information
• Take an order
• Record a payment
OLTP usually works with small sets of data
Follow a transaction
Who are my best customers?
But Ask A Simple Question
• Who are my best Salespeople?
• Who are they selling to?
• What are they selling?
• When are they selling it?
And more Questions…
OLTP designs start to fall apart
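To illustrate why, here is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for DB2. The table names, column names, and sample rows are hypothetical; the slides name only the C, O, D, I and S tables. Even a simple question such as "who are my best salespeople, and who are they selling to?" needs a join across all five normalized tables plus an aggregation over detail rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers     (custno INTEGER PRIMARY KEY, custname TEXT);
CREATE TABLE sales_people  (repno INTEGER PRIMARY KEY, repname TEXT);
CREATE TABLE items         (itemno INTEGER PRIMARY KEY, itemname TEXT, price REAL);
CREATE TABLE orders        (orderno INTEGER PRIMARY KEY, custno INTEGER,
                            repno INTEGER, orderdate TEXT);
CREATE TABLE order_details (orderno INTEGER, itemno INTEGER, qty INTEGER);

INSERT INTO customers VALUES (1001, 'John Smith'), (1002, 'Mary Jones');
INSERT INTO sales_people VALUES (1, 'Rep A'), (2, 'Rep B');
INSERT INTO items VALUES (10, 'Widget', 5.0), (20, 'Gadget', 20.0);
INSERT INTO orders VALUES (500, 1001, 1, '2011-10-03'),
                          (501, 1002, 2, '2011-10-04'),
                          (502, 1001, 2, '2011-11-01');
INSERT INTO order_details VALUES (500, 10, 4), (500, 20, 1),
                                 (501, 10, 2), (502, 20, 3);
""")

# Every table in the normalized model takes part in answering one question.
BEST_REPS = """
SELECT s.repname, c.custname, SUM(d.qty * i.price) AS revenue
FROM order_details d
JOIN orders o       ON o.orderno = d.orderno
JOIN items i        ON i.itemno  = d.itemno
JOIN customers c    ON c.custno  = o.custno
JOIN sales_people s ON s.repno   = o.repno
GROUP BY s.repname, c.custname
ORDER BY revenue DESC
"""
rows = conn.execute(BEST_REPS).fetchall()
for row in rows:
    print(row)
```

On a production system the same query would scan and join large volumes of detail rows, which is a very different workload from the small-set transactions the design was built for.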
Are you in Spreadsheet quicksand?
[Diagram: data from source systems (ERP System, POS, Spreadsheets, Other Sources) is rekeyed, downloaded, uploaded, and cut and pasted between many Excel and Access files and reports, so the same calculation produces conflicting answers (1 + 1 = 2, 1 + 1 = 3, 1 + 3 = 7, 2 + 1 = 1.5)]
Are you stuck with “green screens”?
“The most widespread technical problem reported by practitioners was slow query performance.”
• Survey of over 2000 companies that have implemented Business Intelligence Applications
  – The BI Survey 8 – Nigel Pendse,
Are you sluggish and tardy with information?
Managing the Access to Production Data
• Shield report authors and end users from complexities of the database
• Optimize the environment and infrastructure
• Minimize impact on production systems
• Get some assistance
The Enterprise Data Warehouse Architecture
[Diagram components: Operational System(s); Changed Data Capture; Data Propagation; Data Staging Area; Extraction, Transformation and Loading; Cleansed, Transformed Data; ODS (tactical operational decision support); Data Warehouse; Data Marts (Sales, Finance, Mfg); OLAP Applications; PC or Browser Web Visualization Products]
• Define Requirements
• Create/Refine Data Model
• Design Data Warehouse
• Source Data Warehouse Data
• Select Data Mart Tools
• Design Data Mart
• Prototype Data Mart
• Source Data Mart Data
• Tune Data Warehouse and Data Mart
• Monitoring Usage and Maintenance
Technical Education and Project Management are needed throughout
Methodology to create Business Intelligence applications that can grow rapidly over time.
– Each subsequent data mart should be easier and less costly to implement.
– More value added to the business in a shorter amount of time.
Some steps can be done in parallel and by two separate groups.
Most of the process is iterative, so activities and conclusions in one step can lead you back to others.
Methodology
Reasons you may choose a data warehouse
– Different requirements than OLTP system and application
– Manage larger volumes of data in a different data model
  • Star Schema
– Add or augment data from sources other than production systems
  • Purchased demographic data
  • Non IBM i databases
– Cleanse/Transform the data
  • An ODS does not necessarily solve a lot of “data quality” issues
– Work Management
  • Separate server/partition allows for different tuning knobs to be turned
  • May be a different allocation of resources to manage this very different workload
– Separation of Powers
  • Data Warehouse Team versus Operational Systems Team
  • Separate Decisions
– OS or resource upgrades
– Single Version of the Truth
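The Star Schema data model named in the list above can be sketched as follows. This is a minimal illustration in Python with the standard sqlite3 module standing in for DB2 for i; the table names (sales_fact, customer_dim, item_dim, date_dim), columns, and rows are assumptions made for the example, not taken from the slides. A central fact table holds numeric measures and keys, and each key points to a small descriptive dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per customer, item, or day, keyed by an integer.
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, custname TEXT, region TEXT);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, itemname TEXT, category TEXT);
CREATE TABLE date_dim     (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

-- Fact table: one row per sale, holding only dimension keys and measures.
CREATE TABLE sales_fact (customer_key INTEGER, item_key INTEGER, date_key INTEGER,
                         qty INTEGER, revenue REAL);

INSERT INTO customer_dim VALUES (1, 'John Smith', 'US'), (5, 'Harry Potter', 'CANADA');
INSERT INTO item_dim VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO date_dim VALUES (20111003, 2011, 4), (20110615, 2011, 2);
INSERT INTO sales_fact VALUES
    (1, 1, 20111003, 4, 20.0),
    (1, 2, 20110615, 1, 20.0),
    (5, 1, 20111003, 2, 10.0);
""")

# Analytical questions become one fact table joined to the dimensions of interest.
revenue_by_region_quarter = conn.execute("""
    SELECT c.region, t.quarter, SUM(f.revenue) AS revenue
    FROM sales_fact f
    JOIN customer_dim c ON c.customer_key = f.customer_key
    JOIN date_dim t     ON t.date_key     = f.date_key
    GROUP BY c.region, t.quarter
    ORDER BY c.region, t.quarter
""").fetchall()
print(revenue_by_region_quarter)
```

Swapping the grouping columns (for example item category instead of region) changes the question without changing the shape of the query.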
Common “data” Challenges
• Hidden meanings and conditional rules…
  – 2nd character of column X means ..
  – If column Y = ‘S’, value Z must be multiplied by -1
  – If record type is ‘1’, there must be a matching record in table B. If type is ‘2’, there may be a record. If type is ‘3’, there should not be a record.
  – For data older than 2/11/2003, column X will be blank, but it must be a valid value from then on.
• Data errors…
  – failed lookups
  – invalid dates
  – missing values
  – inconsistent values
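A rough sketch of how such rules and errors might be handled in code is shown here in Python. The field names (custno, posted, amount, type_code) and sample records are hypothetical; type_code and amount stand in for the slide's "column Y" and "value Z". Each record is either cleaned or rejected with a reason, so bad data is set aside rather than silently loaded.

```python
from datetime import datetime

def clean_row(row, known_customers):
    """Return (cleaned_row, None) or (None, reason) for one source record."""
    # Data error: missing value.
    if not row.get("amount"):
        return None, "missing value: amount"
    # Data error: invalid date.
    try:
        posted = datetime.strptime(row["posted"], "%Y-%m-%d").date()
    except ValueError:
        return None, "invalid date: " + row["posted"]
    # Data error: failed lookup against the customer master.
    if row["custno"] not in known_customers:
        return None, "failed lookup: custno " + str(row["custno"])
    amount = float(row["amount"])
    # Hidden rule: if column Y = 'S', value Z must be multiplied by -1.
    if row["type_code"] == "S":
        amount = -amount
    return {"custno": row["custno"], "posted": posted, "amount": amount}, None

source = [
    {"custno": 1001, "posted": "2011-10-03", "amount": "25.00", "type_code": "P"},
    {"custno": 1002, "posted": "2011-10-04", "amount": "10.00", "type_code": "S"},
    {"custno": 1003, "posted": "2011-02-30", "amount": "5.00", "type_code": "P"},
    {"custno": 9999, "posted": "2011-10-05", "amount": "7.50", "type_code": "P"},
    {"custno": 1004, "posted": "2011-10-06", "amount": "", "type_code": "P"},
]

loaded, rejected = [], []
for row in source:
    cleaned, reason = clean_row(row, known_customers={1001, 1002, 1003, 1004})
    if cleaned is not None:
        loaded.append(cleaned)
    else:
        rejected.append((row, reason))  # keep the original row for follow-up

print(len(loaded), "loaded;", len(rejected), "rejected")
```

Keeping the rejected rows together with a reason gives the team something concrete to review when deciding what counts as good data.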
Common “data” Challenges
• The same, but different…
  – multiple instances of same table, with duplicate key values
  – or different versions of same entity
    • incompatible data types
    • duplicates

Customer File - US
CUSTNO  CUSTNAME
1001    John Smith
1002    Mary Jones
1003    Chris Anderson
1004    David Perry

Customer File - Canada
CUSTNO  CUSTNAME
1001    Harry Potter
1002    Jeremy Carr
1003    Penny Hayes
1004    Debbie Thornton

Customer File - Canada
CUSTID  CUSTNAM
AA234   Julie Johnson
AA235   Fred Hunter
AB670   John Smith
BD309   Alan Jordan

Customer File - US
CUSTNO  CUSTNAME
1001    John Smith
1002    Mary Jones
1003    Chris Anderson
1004    David Perry
Unlimited formats, structures & attributes
                    Personal Name        Address Information
Source 1            Bob Christiansan     416 Columbus Ave #2, Boston, Massachusetts 02116
                    Kate A. Roberts      4 New York Plaza Floor 23, Manhattan NY, 10036
                    James Trenton        125-A Washington, Los Angeles, CA 90066
Source 2            Robert Christiansen  Four sixteen Columbus Avenue APT2, Boston, Mass 02116
                    Katherine Roberts    Four NY Plaza, FL-23, New York New York, 10036
                    Trenton, James       125 Washington Unit A, LA, California, 90066
Source 3            R.J. Christensen     416 Columbus Suite #2, Suffolk County 02116
                    Mrs. K. Roberts      4 NY Plaza, LVL23, NYC 10036
                    Mr & Mrs J.Trenton   One-twenty-five Washington #A, Los Angeles Cnty 90066
Common “data” Challenges
E.T.L.
Extract data from somewhere (may be MANY sources)
Transform it somehow (may be simple or extensive)
Load it somewhere else (and load it FAST)
Transport it somehow (may be simple or complex)
Source(s) → Target
Surrogate key is a sequential integer with no correlation to replaced value(s)

Customer Table - US
CUSTNO  CUSTNAME
1001    John Smith
1002    Mary Jones
1003    Chris Anderson
1004    David Perry

Customer Table - Canada
CUSTNO  CUSTNAME
1001    Harry Potter
1002    Jeremy Carr
1003    Penny Hayes
1004    Debbie Thornton

Customer Table - Data Warehouse
CUSTNUMBER  CUSTNAME         REGION  OLDNUM
1           John Smith       US      1001
2           Mary Jones       US      1002
3           Chris Anderson   US      1003
4           David Perry      US      1004
5           Harry Potter     CANADA  1001
6           Jeremy Carr      CANADA  1002
7           Penny Hayes      CANADA  1003
8           Debbie Thornton  CANADA  1004

[Diagram labels on the warehouse table: PK, Secondary Index]
Transformation Example: Surrogate Keys
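The transformation can be sketched in a few lines of Python. This is an illustration only, not how any particular ETL tool does it; the variable names (key_map, warehouse) are made up for the example, while the data and the output column names come from the tables on this slide. The (region, old number) pair is remembered so the same source customer always maps to the same surrogate key.

```python
from itertools import count

us_customers = [(1001, "John Smith"), (1002, "Mary Jones"),
                (1003, "Chris Anderson"), (1004, "David Perry")]
canada_customers = [(1001, "Harry Potter"), (1002, "Jeremy Carr"),
                    (1003, "Penny Hayes"), (1004, "Debbie Thornton")]

next_key = count(1)  # sequential integers, unrelated to the source keys
key_map = {}         # (region, old number) -> surrogate key, used for lookups
warehouse = []       # rows of the combined warehouse customer table

for region, source_rows in (("US", us_customers), ("CANADA", canada_customers)):
    for custno, custname in source_rows:
        natural_key = (region, custno)
        if natural_key not in key_map:  # assign a key only on first sight
            key_map[natural_key] = next(next_key)
            warehouse.append({
                "CUSTNUMBER": key_map[natural_key],
                "CUSTNAME": custname,
                "REGION": region,
                "OLDNUM": custno,
            })

for row in warehouse:
    print(row["CUSTNUMBER"], row["CUSTNAME"], row["REGION"], row["OLDNUM"])
```

Customer 1001 from the US and customer 1001 from Canada end up as two distinct warehouse rows, which is the point of replacing the colliding source keys.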
E.T.L.
• There are two VITAL additional requirements
Validate
  – remember: garbage in / garbage out!
Manage
  – what do you do with bad data?
  – what represents good data?
  – how do you administer ETL jobs?
Validate, Transform, and Manage: 60-80% of the work
ETL Alternatives
• Build it yourself
  – Usually not recommended
• IBM i centric
  – IBM InfoSphere Change Data Capture
  – Information Builders’ Data Migrator
  – Coglin Mill’s Rodin
• Cross Platform
  – IBM InfoSphere DataStage
  – Informatica
Remote Journaling during normal business processing hours
Trickle Feed Staging Area/ODS
Eliminate EXTRACTION impact on production systems
No Charge Feature of IBM i
Requires Program (e.g., CDC) to read data from journal receivers
Can add SQL logic to remove unwanted fields, change data types,
[Diagram: an ERP IBM i LPAR ships logs by Remote Journaling to a DW Staging Area IBM i LPAR, where CDC produces Staged Data (or an ODS) and an ETL Tool feeds the DW]
Virtualization Engine Technologies
• Optimize resources for supporting production and daytime data warehouse queries
• High speed data transfers over Virtual Ethernet
• Common Backup and other Shared I/O
Near Real Time Architecture
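The trickle-feed idea can be sketched conceptually in Python. This is not the IBM i journal format or the InfoSphere CDC API: the entry layout (seq, op, key, row), the operation names, and WANTED_COLUMNS are all assumptions made for illustration. A reader applies shipped change entries to the staging copy in sequence order, drops unwanted fields, and records how far it got so that a rerun does not reapply old entries.

```python
WANTED_COLUMNS = ("custname", "city")  # hypothetical: fields the warehouse keeps

def apply_changes(staging, changes, last_applied):
    """Apply entries newer than last_applied to staging; return the new checkpoint."""
    for entry in sorted(changes, key=lambda e: e["seq"]):
        if entry["seq"] <= last_applied:
            continue  # already applied by an earlier run
        key = entry["key"]
        if entry["op"] == "DELETE":
            staging.pop(key, None)
        else:
            # INSERT or UPDATE: store the after-image, minus unwanted fields.
            staging[key] = {col: entry["row"][col] for col in WANTED_COLUMNS}
        last_applied = entry["seq"]
    return last_applied

changes = [
    {"seq": 1, "op": "INSERT", "key": 1001,
     "row": {"custname": "John Smith", "city": "Boston", "internal_flag": "X"}},
    {"seq": 2, "op": "INSERT", "key": 1002,
     "row": {"custname": "Mary Jones", "city": "Rochester", "internal_flag": ""}},
    {"seq": 3, "op": "UPDATE", "key": 1001,
     "row": {"custname": "John Smith", "city": "Chicago", "internal_flag": "X"}},
    {"seq": 4, "op": "DELETE", "key": 1002, "row": None},
]

staging = {}
checkpoint = apply_changes(staging, changes, last_applied=0)
print(checkpoint, staging)

# Running again with the saved checkpoint changes nothing.
checkpoint_again = apply_changes(staging, changes, last_applied=checkpoint)
```

Because the changes are read from shipped logs rather than queried from the production tables, the production system does no extraction work in this arrangement.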
On Line Analytical Processing (OLAP)
• OLAP is INTERACTIVE and ITERATIVE
  – vs query, which is usually batch, list oriented result sets
• Accessing business data with numerous dimensions
  – 'anything' by 'anything' by 'anything' analysis
  – data can be easily analyzed from many different viewpoints
  – data is modeled to the business
  – summaries and aggregations are calculated
  – data is viewed across, down and through the various dimensions
• Helps answer business questions
  – How are my different departments performing?
  – Is this pattern the same every year?
  – Can we look at the information another way?
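As a small illustration of "'anything' by 'anything'" analysis, the following Python sketch summarizes the same facts by whichever dimensions are asked for. The data, the dimension names (department, year, region), and the summarize helper are hypothetical; a real OLAP engine or BI tool would do this against the warehouse rather than an in-memory list.

```python
from collections import defaultdict

facts = [
    {"department": "Sales", "year": 2010, "region": "US", "amount": 100},
    {"department": "Sales", "year": 2011, "region": "US", "amount": 120},
    {"department": "Sales", "year": 2011, "region": "CANADA", "amount": 30},
    {"department": "Mfg", "year": 2010, "region": "US", "amount": 80},
    {"department": "Mfg", "year": 2011, "region": "US", "amount": 90},
]

def summarize(rows, dimensions, measure="amount"):
    """Total the measure for every combination of the requested dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(totals)

# How are my different departments performing?
by_department = summarize(facts, ["department"])

# Is this pattern the same every year? Add a second dimension.
by_department_year = summarize(facts, ["department", "year"])

# Can we look at the information another way? Drill into one cell by region.
sales_2011 = [r for r in facts if r["department"] == "Sales" and r["year"] == 2011]
sales_2011_by_region = summarize(sales_2011, ["region"])

print(by_department)
print(by_department_year)
print(sales_2011_by_region)
```

Each follow-up question reuses the same facts with a different set of dimensions, which is the interactive, iterative pattern the slide describes.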
OLAP is uniquely suited for applications like:
• Budgeting
• Planning
• Forecasting
• Business Modeling
• Financial Consolidation
• Sales & Performance Analysis
• Customer & Product Profitability
                MOLAP                                  ROLAP
# of users      Many                                   Few
engine          Cubing Engine                          Query Optimization
architecture    Depends                                DBMS Backend
via             complex loading                        complex SQL
metadata        in engine                              Meta Data Layer
Examples        Cognos                                 DB2 Web Query (OLAP option)
                speed of thought                       Will vary
data strategy   Summary with drill through to detail   Summary or Detail

[Diagram labels: BI Tool, Application, BI Tool, SQL 1, SQL 2, SQL 3, Relational Data, Data Load]
What is the right OLAP Technology?
DB2 for i Enablers for Data Warehousing
• POWER7 Processors
• Solid State Disks
• IBM i 7.1
• SQL Query Engine (SQE)
  – Self Learning, Self Adapting, Self tuning
• Database Parallelism via DB2 SMP feature
• Real time and autonomic statistics
• Materialized Query Tables
• Star Schema Join Optimization
• Query Rewrite
• Encoded Vector Indexing with Aggregates
• Remote Journaling (Trickle Feed)
• Single Level Storage and ASync Parallel IO
• System wide Index Advisor
• Autonomic Indexes
• Cloud Infrastructure
  – Starter Kit for Cloud (SOD)
  – IBM i Provisioning
• Unlock Core Business Data
  – DB2 for i
  – IBM DB2 Web Query for i
  – IBM i for Business Intelligence
• High Availability
  – PowerHA SystemMirror for i
  – IBM i CBU with PowerHA
• Deploy New Business Solutions
  – Application Development
  – Rational Application Development Enhancements
  – Application Runtime Expert
  – Zend Server Community Edition with Zend DBi
• Virtualize and Consolidate
  – System, Storage, and Network Virtualization
Top Client Initiatives
[Graphic: the five initiatives listed above (Unlock Core Business Data, High Availability, Virtualize & Consolidate, Cloud Infrastructure, Deploy New Business Solutions), with background labels Metropolitan Transportation and Roads, Tax and Revenue Management, Integrated Urban Infrastructure, Safety and Security]
October 2011 Announcements - IBM i 7.1 + TR3
IBM i for Business Intelligence is a packaged solution that is easy to order and easy to implement.
This solution combines the strengths of Power Systems (P7), IBM i 7.1, DB2 for i, DB2 Web Query, and data transportation software to deliver an integrated Business Intelligence platform.
This new solution is offered in three configurations (small, medium, and large BI Editions) that include software and required licensing, as well as three days of services designed to get you up and running.
See your IBM representative or business partner for more information.
IBM i for Business Intelligence
• Base Program Product
  – Order via econfig and AAS (5733-QU2)
  – Will require IBM i V5R4 at a minimum
  – Named user based pricing
    • Minimal Users included in BASE
  – Modernize Query/400 Reports
  – 4 Development tools
• Additional Chargeable IBM Features
  – Active Reports (Disconnected)
  – On Line Analytical Processing
  – Developer’s Workbench
  – Run Time User Licensing
  – Spreadsheet Client
  – SQL Server Adapter
  – JD Edwards Application Adapter
• Additional DB2 Web Query PRODUCTS
  – Automated Report Distribution with Report Broker
  – Application Integration with the Software Development Toolkit
IBM DB2 Web Query for System i Powered By Information Builders
http://www.ibm.com/systems/i/software/db2/webquery