Carnegie Mellon University ©2006 - 2008 Robert T. Monroe 45-875 BI Tools and Techniques
Business Intelligence Tools and Techniques
Robert Monroe
March 18, 2008
Agenda
• Quick survey
• Overview of Business Intelligence Tools and Techniques
• Course structure, grading, and expectations
• Data management fundamentals
Survey
• Please complete and hand back the survey
• Survey helps me to:
– Understand your goals and expectations for the course
– Evaluate your previous IT knowledge and experience
– … adjust the class accordingly
Introducing Business Intelligence Tools and Techniques
Corporations Are Drowning in Data
• … but thirsty for actionable knowledge
• Our ability to collect and store data seems to have surpassed our ability to make sense of it!
• Important trends:
– Storage capacity continues to rise rapidly
– Cost of storage continues to drop
Business Intelligence
• Core question: How can an organization manage and leverage large data sets to make better business decisions?
• Business Intelligence (BI)
– A broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. (Wikipedia)
• Two common uses for BI tools
– Measuring where you are / how your business is performing
– Identifying problems and opportunities
Business Intelligence Systems Improve Decision Making
Source: O’Brien, Management Information Systems, 6th ed.
In-Class Exercise
• Take out a piece of paper and pencil
• Select a company that you are familiar with and a managerial role in that company
• Write down five pieces of quantitative information that you would most want to have to manage your business (or your part of the business) effectively
A Business Intelligence Roadmap
Module 1: Course Intro, Data Management Fundamentals
• What is Business Intelligence?
– How can it help me make better business decisions?
– What kinds of questions can BI tools help me answer?
• What is the relationship between data, information, & knowledge?
• What does it mean to ‘Compete on Analytics’?
– Why would I want to do so?
– How might I do so effectively?
Data → Info → Knowledge
Module 2: Data Warehousing
• What is a Data Warehouse?
– How about a Data Mart?
– How is a Data Warehouse different from a ‘regular’ database?
• Why do we need another database that just duplicates data that we already have?
• How can we fill a data warehouse with comprehensive, timely, and high-quality data?
Module 3: Reporting and OLAP
• How do I convert the data in my data warehouse into actionable information or knowledge?
• What tools are available to help non-programmers analyze warehouse data?
• What is dimensional modeling? Why is it powerful?
• What kinds of questions are OLAP tools designed to answer?
Module 4: Info Viz and Data Mining
Module 5: Dashboards
• What is an executive dashboard?
– Are they only for executives?
– Why are they useful?
– What are their drawbacks?
• How can I implement dashboards effectively in my organization?
Module 6: ‘Real-Time’ Business Intelligence
• How can we move from historical analysis to ‘real-time’ analysis?
• Why is this hard to do in practice?
• What tools and techniques are available to support real-time analysis?
Module 7: Implementing BI, Ethical use of BI
• What does my organization need to do to implement a successful BI program?
• What ethical issues arise with BI capabilities?
• How can we ensure that our BI capabilities are used ethically?
– What does it mean to do so?
Dashboard: Expected Effort
• First two weeks focus on BI foundation
– Eat your vegetables, exercise more
• Middle classes focus on using various BI tools effectively
– Use the tools, Luke
• Final classes combine fundamentals, tools, people, processes, and ethics
– Pull it all together
Charts: Reading Load vs. Week #; Work with BI Tools vs. Week #
Course Structure, Grading, and Expectations
Course Goals
• Understand how to apply various Business Intelligence (BI) tools and techniques to analyze and evaluate large data sets to make better business decisions
• Understand the benefits, drawbacks, and applicability of various approaches to BI
• Improve awareness of a variety of challenges and ‘gotchas’ that arise when implementing BI systems
– … and how to avoid them
Course Philosophy
• Focus on applying BI technology to solve business problems, not building BI systems
• You will develop new skills by doing and participating
– You will need to use the BI tools
– When in doubt try something, experiment
– Most work done in teams – learn from/with your peers
– Casual interactive class – your participation is important
• Many of the technologies we will look at are relatively new
– Not everything will work perfectly the first time…
– Flexibility, patience, and a willingness to explore will help a lot
• Let’s have some fun – life’s too short to do otherwise
Expectations, Etiquette, and Academic Integrity
• Waitlist
• Office hours, 3:30 – 4:30 MWF
• Expectations and etiquette
• Academic integrity
• Teaching Assistant
– Bao-Jun Jiang, [email protected]
Pass/Fail
I allow students to take the course pass/fail provided that they agree to:
– Attend class regularly
– Prepare for class as if they were taking it for a grade
– Complete all of the assignments
– Take the final exam at its regular time and place
– Complete all of the necessary administrative paperwork
Blackboard And The Course Wiki
Grading
• Grades will be computed as follows:
– Homework exercises (3): 45%
– Final exam: 30%
– Class attendance, preparation, and participation: 25%
Late assignments policy: 25% deduction each day late
I curve final grades, not individual assignments
Please see regrade request policy in syllabus document
Assignments
• Three homework assignments
– Groups of 2-4 people
• Assignment #1: Data warehousing
– Analyze data warehousing scenario and make business, technology, and process recommendations based on your analysis (management option)
– Create a simple data warehouse and ETL process to load it (tech option)
• Assignment #2: Reporting and OLAP tools
– Use Microsoft’s Reporting and/or OLAP tools to retrieve, analyze, and present useful information from a data warehouse and OLAP cubes
• Assignment #3: Case analysis, dashboards or visualizations
– Case analysis: Continental or SYSCO cases (management option)
– Analyze scenario/case and design dashboard(s) and/or data visualizations to meet business needs (tech option)
Computing Resources
• There are many good BI platforms
• We will primarily use Microsoft’s SQL Server 2005
– Client tools
– Reporting Services
– Analysis Services
– Integration Services (ETL tool – optional)
• We will also experiment with a variety of other BI tools
• You must provide a laptop that can run SQL Server 2005 client
– At least client tools, servers are optional
– 600 MHz processor, 512 MB of RAM, 0.5–2.0 GB of disk space
– Install instructions are available on Blackboard
– Please try to install SQL Server 2005 client tools before Monday’s class
Data Management Fundamentals
Definitions
• What is the difference between data, information, and knowledge?
• Data is a collection of raw value elements or facts used for calculating, reasoning, or measuring. Data may be collected, stored, or processed but not put into a context from which any meaning can be inferred. [Los03]
• Information is the result of collecting and organizing data in a way that establishes relationships between data items, which thereby provides context and meaning. [Los03]
• Knowledge is information to which experience, interpretation, and reflection are added by individuals so that it becomes a high value form of information
– The OR Society http://www.orsoc.org.uk/about/topic/projects/kmwebfiles/knowledge.htm
Exercise
Closing Stock Prices
Ticker 3/21/05 3/22/05
MSFT $27.74 $27.01
INTC $19.78 $19.72
CSCO $21.41 $21.50
IBM $83.81 $84.24
3/21/05 3/22/05 3/22/05 3/21/05 3/22/05 3/22/05 3/21/05 3/21/05 $27.74 $19.78 $21.41 $83.81 $27.01 $19.72 $21.50 $84.24 CSCO MSFT INTC IBM
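One way to read this exercise: the same eight prices only become information once each value is tied to a ticker and a date. The sketch below assumes the tickers pair with the price rows in the order listed (MSFT, INTC, CSCO, IBM); that pairing is a reading of the slide, not something it states, and the helper name `daily_change` is hypothetical.

```python
# Closing prices keyed by (ticker, date). The ticker-to-price pairing is an
# assumption based on the order in which the values appear on the slide.
closing_prices = {
    ("MSFT", "3/21/05"): 27.74, ("MSFT", "3/22/05"): 27.01,
    ("INTC", "3/21/05"): 19.78, ("INTC", "3/22/05"): 19.72,
    ("CSCO", "3/21/05"): 21.41, ("CSCO", "3/22/05"): 21.50,
    ("IBM", "3/21/05"): 83.81, ("IBM", "3/22/05"): 84.24,
}


def daily_change(ticker, start="3/21/05", end="3/22/05"):
    """Return the change in closing price between two dates, in dollars."""
    change = closing_prices[(ticker, end)] - closing_prices[(ticker, start)]
    return round(change, 2)


# Once values are organized, a simple derived fact is easy to compute.
for ticker in ("MSFT", "INTC", "CSCO", "IBM"):
    print(f"{ticker}: {daily_change(ticker):+.2f}")
```

The unordered list of dates, prices, and tickers cannot support this calculation at all, which is the point of the exercise.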
Goal: Convert Data to (Actionable) Knowledge
Data → Info → Knowledge (Increasing Value)
Why is this so hard to do in practice?
Challenge: What To Capture and Store?
• The amount of data that can be captured is enormous
– Storing data is relatively cheap (free at the margin)
– Structuring and retrieving data is relatively expensive
– Converting large data sets to actionable knowledge tends to be relatively challenging and expensive
• Rules of thumb for deciding what to capture and store
– Start with what you want to get out and work backwards
– Evaluate what is already available
– Ensure that you capture high-quality data
– Analyze fundamental data requirements for the enterprise, independent of the specific project at hand
Exercise: What To Capture And Store
• Scenario 1: You are a marketing VP for a large chain food retailer. You need to figure out how to properly price and promote a specific brand of snack chips over the next year
• What questions do you need to ask?
• What analyses would you like to do to answer them?
• What data will you need to do these analyses?
• Where will you get that data?
– Is your organization likely to already have all the data that you need?
– Are there other data sources that you should try to take advantage of and incorporate into your analyses?
Exercise: What To Capture And Store
• Scenario 2: You are an executive at Ferrari who needs to decide how to allocate the latest and greatest sports car your company is introducing in six months to maximize your company’s profits long-term
• What questions do you need to ask?
• What analyses would you like to do to answer them?
• What data will you need to do these analyses?
• Where will you get that data?
– Is your organization likely to already have all the data that you need?
– Are there other data sources that you should try to take advantage of and incorporate into your analyses?
Exercise: What To Capture And Store
• Scenario 3: You are an HR executive responsible for recruiting salespeople. Your bonus each year is directly tied to how well the salespeople you bring in do in their first three years at your company. You’ve read Moneyball and Competing on Analytics, and you want to take a more analytic approach to your job
• What questions do you need to ask?
• What analyses would you like to do to answer them?
• What data will you need to do these analyses?
• Where will you get that data?
– Is your organization likely to already have all the data that you need?
– Are there other data sources that you should try to take advantage of and incorporate into your analyses?
The Relational Data Model
• The Relational Model has become the de-facto standard for managing operational business data
• Core concepts in a relational model:
– Tables (relations)
– Records (rows)
– Data fields (columns)
– Primary keys
– Foreign keys
Products
Product ID Description Color Size Qty Available
52 Shoes (pair) Blue 10 25
64 Socks (pair) White Large 200
145 Blouse Green 7 14
158 Pants Blue 32/34 0
Data, Information, Database Example
Purchases
Order ID Customer Name Product ID Quantity Date
5623 Jimmy Hwang 52 3 12/15/2004
5624 Sue Smith 64 5 12/16/2004
5625 Jane Chen 145 1 12/16/2004
Products
Product ID Description Color Size Qty Available
52 Shoes (pair) Blue 10 25
64 Socks (pair) White Large 200
145 Blouse Green 7 14
158 Pants Blue 32/34 0
Jimmy Hwang purchased 3 pairs of size 10 shoes on 12/15/2004
What other information can we derive from these data tables?
Data in Database Tables Information
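The step from data to information on this slide amounts to a join across the two tables on Product ID. The sketch below loads the slide's rows into an in-memory database and derives the sentence about Jimmy Hwang, plus one more derived fact. It uses Python's built-in sqlite3 module rather than the SQL Server 2005 tools named in the course, which is an assumption made only to keep the example self-contained; the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    Description TEXT, Color TEXT, Size TEXT, QtyAvailable INTEGER
);
CREATE TABLE Purchases (
    OrderID INTEGER PRIMARY KEY,
    CustomerName TEXT,
    ProductID INTEGER REFERENCES Products(ProductID),
    Quantity INTEGER,
    Date TEXT
);
INSERT INTO Products VALUES
    (52, 'Shoes (pair)', 'Blue', '10', 25),
    (64, 'Socks (pair)', 'White', 'Large', 200),
    (145, 'Blouse', 'Green', '7', 14),
    (158, 'Pants', 'Blue', '32/34', 0);
INSERT INTO Purchases VALUES
    (5623, 'Jimmy Hwang', 52, 3, '12/15/2004'),
    (5624, 'Sue Smith', 64, 5, '12/16/2004'),
    (5625, 'Jane Chen', 145, 1, '12/16/2004');
""")

# Join each purchase to its product via the ProductID foreign key.
purchase_facts = conn.execute("""
    SELECT pu.CustomerName, pu.Quantity, pr.Description, pr.Size, pu.Date
    FROM Purchases AS pu
    JOIN Products AS pr ON pr.ProductID = pu.ProductID
    ORDER BY pu.OrderID
""").fetchall()

for name, qty, description, size, date in purchase_facts:
    print(f"{name} purchased {qty} x {description}, size {size}, on {date}")

# Another derived fact: products that appear in no purchase.
unsold = [row[0] for row in conn.execute("""
    SELECT pr.Description
    FROM Products AS pr
    LEFT JOIN Purchases AS pu ON pu.ProductID = pr.ProductID
    WHERE pu.OrderID IS NULL
""")]
```

Neither table alone says what size Jimmy Hwang bought; the information only appears once the tables are related through the key.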
Relational Data, Tables, Records, and Metadata Example
Purchases
Order ID Customer Name Product ID Quantity Date
5623 Jimmy Hwang 52 3 12/15/2004
5624 Sue Smith 64 5 12/16/2004
5625 Jane Chen 145 1 12/16/2004
Products
Product ID Description Color Size Qty Available
52 Shoes (pair) Blue 10 25
64 Socks (pair) White Large 200
145 Blouse Green 7 14
158 Pants Blue 32/34 0
Table Name: Products
ProductID Int (pkey)
Description Text(50)
Color Text(50)
Size Text(20)
QtyAvailable Int
Table Name: Purchases
OrderID Int (pkey)
CustomerName Text(75)
ProductID Int (fkey)
Quantity Decimal
Date DateTime
Data (Records) in Database Tables Metadata
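Metadata of this kind is normally stored by the database itself and can be queried like any other data. As a rough sketch, the code below declares the two tables and reads their column definitions back. It assumes SQLite through Python's sqlite3 module (not the course's SQL Server), maps the slide's Text(n) types to VARCHAR(n), and uses hypothetical helper names `describe` and `foreign_keys`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ProductID    INTEGER PRIMARY KEY,
    Description  VARCHAR(50),
    Color        VARCHAR(50),
    Size         VARCHAR(20),
    QtyAvailable INTEGER
);
CREATE TABLE Purchases (
    OrderID      INTEGER PRIMARY KEY,
    CustomerName VARCHAR(75),
    ProductID    INTEGER REFERENCES Products(ProductID),
    Quantity     DECIMAL,
    Date         DATETIME
);
""")


def describe(table):
    """Return (column name, declared type, is primary key) for each column."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(name, col_type, bool(pk)) for _, name, col_type, _, _, pk in rows]


def foreign_keys(table):
    """Return (column, referenced table, referenced column) triples."""
    # PRAGMA foreign_key_list rows are
    # (id, seq, table, from, to, on_update, on_delete, match).
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return [(row[3], row[2], row[4]) for row in rows]


for table in ("Products", "Purchases"):
    print(table, describe(table), foreign_keys(table))
```

No records were inserted here: the metadata exists independently of the data it describes.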
Normalization And Denormalization
• Data normalization is the process of decomposing relations with anomalies to produce smaller, well-structured relations
– Basic idea: each table only holds data about one ‘thing’
• Goals of normalization include:
– Minimize data redundancy
– Simplify the enforcement of referential integrity constraints
– Simplify data maintenance (inserts, updates, deletes)
– Improve representation model to match “the real world”
• Normalization sometimes hurts query performance
Example: Denormalized Table
• Insertion anomaly: when an employee takes a new class we need to add duplicate data (Name, Dept_Name, and Salary)
• Deletion anomaly: If we remove employee 140, we lose information about the existence of a Tax Acc class
• Modification anomaly: Employee 100 salary increase forces update of multiple records
• These anomalies exist because two themes (entity types), course and employee, are combined into one relation, resulting in duplication and an unnecessary dependency between the entities
Employee
Emp_ID Name Dept_Name Salary Course_Title Date_Completed
100 Margaret Simpson Marketing 48000 SPSS 6/19/2005
100 Margaret Simpson Marketing 48000 Surveys 10/7/2004
140 Alan Beeton Accounting 52000 Tax Acc 12/8/2004
110 Chris Lucero Info Systems 43000 SPSS 1/12/2004
110 Chris Lucero Info Systems 43000 C++ 4/22/2003
190 Lorenzo Davis Finance 55000
150 Susan Martin Marketing 42000 Java 8/12/2002
150 Susan Martin Marketing 42000 SPSS 6/19/2005
Example Derived from Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Normalizing Previous Employee/Class Table
Course_Completion
Emp_ID Course_ID Date_Completed
100 1 6/19/2005
100 2 10/7/2004
140 3 12/8/2004
110 1 1/12/2004
110 4 4/22/2003
150 1 6/19/2005
150 5 8/12/2002
Employee
Emp_ID Name Dept_Name Salary
100 Margaret Simpson Marketing 48000
140 Alan Beeton Accounting 52000
110 Chris Lucero Info Systems 43000
190 Lorenzo Davis Finance 55000
150 Susan Martin Marketing 42000
Course
Course_ID Course_Title
1 SPSS
2 Surveys
3 Tax Acc
4 C++
5 Java
This seems more complicated
Why might this approach be superior to the previous one?
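One way to explore that question is to replay the modification and deletion anomalies against the three-table design. The sketch below is a minimal illustration using Python's sqlite3 module (an assumption; the course itself uses SQL Server). The composite primary key on Course_Completion and the new salary figure are also assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    Emp_ID INTEGER PRIMARY KEY, Name TEXT, Dept_Name TEXT, Salary INTEGER
);
CREATE TABLE Course (
    Course_ID INTEGER PRIMARY KEY, Course_Title TEXT
);
CREATE TABLE Course_Completion (
    Emp_ID INTEGER REFERENCES Employee(Emp_ID),
    Course_ID INTEGER REFERENCES Course(Course_ID),
    Date_Completed TEXT,
    PRIMARY KEY (Emp_ID, Course_ID)  -- assumed key, not shown on the slide
);
INSERT INTO Employee VALUES
    (100, 'Margaret Simpson', 'Marketing', 48000),
    (140, 'Alan Beeton', 'Accounting', 52000),
    (110, 'Chris Lucero', 'Info Systems', 43000),
    (190, 'Lorenzo Davis', 'Finance', 55000),
    (150, 'Susan Martin', 'Marketing', 42000);
INSERT INTO Course VALUES
    (1, 'SPSS'), (2, 'Surveys'), (3, 'Tax Acc'), (4, 'C++'), (5, 'Java');
INSERT INTO Course_Completion VALUES
    (100, 1, '6/19/2005'), (100, 2, '10/7/2004'), (140, 3, '12/8/2004'),
    (110, 1, '1/12/2004'), (110, 4, '4/22/2003'),
    (150, 1, '6/19/2005'), (150, 5, '8/12/2002');
""")

# Modification: a raise for employee 100 now touches exactly one row.
raise_cursor = conn.execute("UPDATE Employee SET Salary = 50000 WHERE Emp_ID = 100")
rows_updated = raise_cursor.rowcount

# Deletion: removing employee 140 no longer erases the Tax Acc course.
conn.execute("DELETE FROM Course_Completion WHERE Emp_ID = 140")
conn.execute("DELETE FROM Employee WHERE Emp_ID = 140")
course_titles = [
    row[0] for row in conn.execute("SELECT Course_Title FROM Course ORDER BY Course_ID")
]

# The original wide view can still be rebuilt with joins when it is needed.
wide_view = conn.execute("""
    SELECT e.Emp_ID, e.Name, e.Dept_Name, e.Salary, c.Course_Title, cc.Date_Completed
    FROM Employee AS e
    LEFT JOIN Course_Completion AS cc ON cc.Emp_ID = e.Emp_ID
    LEFT JOIN Course AS c ON c.Course_ID = cc.Course_ID
    ORDER BY e.Emp_ID, c.Course_ID
""").fetchall()
```

The price is the two joins needed to rebuild the wide view, which is the query-performance cost that normalization can impose.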
Indexing
• An index is a table or other data structure used to determine the location of rows in a file that satisfy some condition
• Indices reduce the time needed to retrieve records
• … but increase the time and cost to insert, update, or delete
• Indexing is critical for high performance in large, complex databases
– Especially data warehouses and data marts
Products
Product ID Description Color Size
52 Shoes (pair) Blue 10
145 Socks (pair) White Large
62 Blouse Green 7
12 Pants Blue 32/34
532 Skirt Green 7
… … … …
Product_Index
Product ID Row
12 4
52 1
62 3
145 2
532 5
… …
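The Product_Index table can be mimicked in a few lines to show why lookups get faster. This is a toy sketch, not how a real DBMS stores its indexes: it keeps (Product ID, row) pairs sorted by key and binary-searches them instead of scanning every row. The function names are hypothetical.

```python
import bisect

# Rows in storage order, as in the Products table on the slide (row 1 first).
products = [
    (52, "Shoes (pair)", "Blue", "10"),
    (145, "Socks (pair)", "White", "Large"),
    (62, "Blouse", "Green", "7"),
    (12, "Pants", "Blue", "32/34"),
    (532, "Skirt", "Green", "7"),
]

# Build the index: (Product ID, row number) pairs sorted by Product ID.
product_index = sorted(
    (row[0], row_number) for row_number, row in enumerate(products, start=1)
)
index_keys = [product_id for product_id, _ in product_index]


def find_by_scan(product_id):
    """Without an index: check every row until one matches."""
    for row in products:
        if row[0] == product_id:
            return row
    return None


def find_by_index(product_id):
    """With an index: binary-search the sorted keys, then jump to the row."""
    pos = bisect.bisect_left(index_keys, product_id)
    if pos < len(index_keys) and index_keys[pos] == product_id:
        row_number = product_index[pos][1]
        return products[row_number - 1]
    return None
```

Every insert, update, or delete on `products` would also have to keep `product_index` sorted and current, which is where the extra write cost comes from.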
Alternative Data Models
• The relational data model is the current de-facto standard for storing and managing corporate data
• There are other data storage models, usually associated with legacy systems
– The data you need for your analysis may be stored in them!
• Four common alternative data models
– Flat file
– Hierarchical
– Network
– Object
Structured Query Language (SQL)
• SQL provides a standard language for describing, manipulating, and querying data from relational databases
• SQL allows applications to interact with databases without requiring a tight binding between the application and the underlying DBMS
• All of the major relational database vendors implement some form of SQL in their database products
• Example Query:
SELECT ProductName, ProductPrice
FROM Products
WHERE SupplierName = 'Acme'
ORDER BY ProductPrice DESC;
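To see an application issue this kind of query without being bound to one DBMS, here is a small sketch that runs it from Python against an in-memory SQLite database. The choice of SQLite is an assumption for the sake of a self-contained example, the sample rows are invented, and the query orders by `ProductPrice`, the column named in the SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProductName TEXT, ProductPrice REAL, SupplierName TEXT)"
)
# Hypothetical sample rows, invented only to exercise the query.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        ("Anvil", 120.00, "Acme"),
        ("Rocket Skates", 350.00, "Acme"),
        ("Bird Seed", 4.50, "Acme"),
        ("Umbrella", 15.00, "Other Supplier"),
    ],
)

# The supplier name is passed as a parameter rather than pasted into the SQL.
acme_products = conn.execute(
    """
    SELECT ProductName, ProductPrice
    FROM Products
    WHERE SupplierName = ?
    ORDER BY ProductPrice DESC
    """,
    ("Acme",),
).fetchall()

for name, price in acme_products:
    print(f"{name}: {price:.2f}")
```

The same SELECT text would run largely unchanged on other relational databases; only the connection code is specific to the engine.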
Query Example
English: Find the 10 most expensive products that we stock
SQL:
SELECT TOP 10 Products.ProductName AS TenMostExpensiveProducts, Products.UnitPrice
FROM Products
ORDER BY Products.UnitPrice DESC;
Query Results:
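`TOP 10` is SQL Server syntax; several other engines express the same request with a `LIMIT` clause. As an illustration only, the sketch below runs an equivalent query on SQLite through Python's sqlite3 module, using an invented catalog of 25 products since the slide's own results are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, UnitPrice REAL)")
# Hypothetical catalog of 25 products priced 1.00 through 25.00.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [(f"Product {n}", float(n)) for n in range(1, 26)],
)

# Same intent as SELECT TOP 10 ... ORDER BY UnitPrice DESC in SQL Server.
top_ten = conn.execute(
    """
    SELECT ProductName AS TenMostExpensiveProducts, UnitPrice
    FROM Products
    ORDER BY UnitPrice DESC
    LIMIT 10
    """
).fetchall()

for name, price in top_ten:
    print(f"{name}: {price:.2f}")
```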
Transactional and Analytical Systems
• Transactional systems: Systems that are used to run a business in real time, based on current data. Also called “systems of record”
• Analytical systems: Systems designed to support decision making based on historical point-in-time and prediction data for complex queries or data mining applications
• BI systems are generally analytical systems
Examples of Transactional and Analytical Systems
Transactional System Examples
• Supermarket checkout system
• ATM machines
• Purchase order processing
• Student course registration
• Warehouse/inventory tracker
• Airline ticketing system
• E-Z Pass
Analytical System Examples
• Data warehouses
• Data marts
• Enterprise spend analysis
– Where do we spend our $$$
• Sales force productivity analysis
– By sales person, region, or product line
• Product-line profitability analysis
– Which products are most profitable?
– Which do we lose money on?
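The two kinds of system differ mainly in the shape of their work: many small writes versus a few large reads. The sketch below contrasts a checkout-style insert with a product-line profitability query. It assumes SQLite via Python's sqlite3 module and keeps both workloads in one in-memory database purely for brevity; the table, function names, and sales figures are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (SaleID INTEGER PRIMARY KEY, ProductLine TEXT, "
    "Revenue REAL, Cost REAL)"
)


def record_sale(product_line, revenue, cost):
    """Transactional-style operation: write one small record and commit."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO Sales (ProductLine, Revenue, Cost) VALUES (?, ?, ?)",
            (product_line, revenue, cost),
        )


def profit_by_product_line():
    """Analytical-style operation: scan and aggregate the accumulated history."""
    return conn.execute(
        """
        SELECT ProductLine, SUM(Revenue - Cost) AS Profit
        FROM Sales
        GROUP BY ProductLine
        ORDER BY Profit DESC
        """
    ).fetchall()


# Hypothetical sales, invented for illustration.
record_sale("Snacks", 100.0, 60.0)
record_sale("Snacks", 50.0, 30.0)
record_sale("Beverages", 80.0, 90.0)
record_sale("Produce", 40.0, 25.0)

for line, profit in profit_by_product_line():
    print(f"{line}: {profit:+.2f}")
```

The analytical query answers both profitability questions at once: the top row is the most profitable line, and any negative row is one that loses money.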
Why Not Use Operational Data Stores For BI?
• It is good practice to separate operational and analytical systems and data
• Why?
– To improve system performance
– To improve database manageability and maintainability
– To optimize each type of system for its primary purpose
Wrap Up
For Thursday
• We will be discussing part 1 of Competing on Analytics
– Reading assignment is available on the wiki
• Come prepared to apply the concepts in part 1 of the book in class discussions to analyze how some well-known organizations might be able to improve their business by aggressively pursuing the principles of analytic excellence described in the book
– Feel free to suggest organizations to discuss prior to class: I’ll be taking requests as I spin your favorite on-the-fly cases
– Post suggestions for organizations to discuss in class, along with a brief description of why they would be interesting to discuss, to the course wiki by Wednesday evening.