Final Exam Review
Spring 2011
Exam format
About 75 questions
45% multiple choice and T/F
30% short fill-ins
25% short-paragraph explanations
What to study
50 questions from Exam 1 and 2
12-15 questions about presentation topics
10-13 questions will come from labs
Concepts from:
Market Basket: Data Mining
SCM: RFID (hardware) + XML (concept)
Fund Trading: DBs for Optimization
Pivot Chart: DBs for Discovery & Prediction
Wagemart: DBs for Decision Support
Market Basket Analysis
Support: Probability (P) that an item is in someone’s checkout basket
Example baskets (10 transactions):
A,B,E   A,B,F   A,B   A,B,F,G   A,D,F
C,D   C,D,G   E,F,G   E,F   E,G
P(A) = 5/10 = 50%
P(AB) = 4/10 = 40%
P(C) = 2/10 = 20%
P(CD) = 2/10 = 20%
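The support figures above can be checked with a few lines of Python. This is a minimal sketch: the basket list is copied from the slide, while the names `BASKETS` and `support` are made up for illustration.

```python
# The ten example baskets from the slide, one set per checkout.
BASKETS = [
    {"A", "B", "E"}, {"A", "B", "F"}, {"A", "B"}, {"A", "B", "F", "G"}, {"A", "D", "F"},
    {"C", "D"}, {"C", "D", "G"}, {"E", "F", "G"}, {"E", "F"}, {"E", "G"},
]

def support(items, baskets=BASKETS):
    """Fraction of baskets that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for basket in baskets if wanted <= basket)
    return hits / len(baskets)

print(support("A"))   # 0.5
print(support("AB"))  # 0.4
print(support("C"))   # 0.2
print(support("CD"))  # 0.2
```

Passing a string such as `"AB"` works because `set("AB")` is the itemset {A, B}.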
Market Basket Analysis
Confidence X → Y = P(XY)/P(X): if item X is purchased, what is the probability that item Y is also purchased?
Confidence A → B = P(AB)/P(A) = 40%/50% = 80%
Confidence C → D = P(CD)/P(C) = 20%/20% = 100%
Given: P(A) = 5/10 = 50%
P(AB) = 4/10 = 40%
P(C) = 2/10 = 20%
P(CD) = 2/10 = 20%
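As a rough check of the confidence formula, the sketch below divides basket counts directly. The baskets come from the slide; the helper names `count` and `confidence` are assumptions made for this example.

```python
# The ten example baskets from the slide.
BASKETS = [
    {"A", "B", "E"}, {"A", "B", "F"}, {"A", "B"}, {"A", "B", "F", "G"}, {"A", "D", "F"},
    {"C", "D"}, {"C", "D", "G"}, {"E", "F", "G"}, {"E", "F"}, {"E", "G"},
]

def count(items):
    """Number of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in BASKETS if wanted <= basket)

def confidence(x, y):
    """Confidence X -> Y = P(XY) / P(X), computed from basket counts."""
    return count(set(x) | set(y)) / count(x)

print(confidence("A", "B"))  # 0.8
print(confidence("C", "D"))  # 1.0
print(confidence("B", "A"))  # 1.0
```

The last line shows that direction matters: B appears in 4 baskets and A is in all of them, so B → A has 100% confidence even though A → B has 80%.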
Market Basket Analysis
Quality X → Y = Confidence X → Y * P(XY)
High quality association rules
Quality A → B = 80% * 40% = 32%
Quality C → D = 100% * 20% = 20%
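The quality measure used in these notes (confidence times support) can be sketched the same way. The baskets are from the slide; `count` and `quality` are illustrative names, not part of any lab material.

```python
# The ten example baskets from the slide.
BASKETS = [
    {"A", "B", "E"}, {"A", "B", "F"}, {"A", "B"}, {"A", "B", "F", "G"}, {"A", "D", "F"},
    {"C", "D"}, {"C", "D", "G"}, {"E", "F", "G"}, {"E", "F"}, {"E", "G"},
]

def count(items):
    """Number of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in BASKETS if wanted <= basket)

def quality(x, y):
    """Quality X -> Y = confidence(X -> Y) * P(XY)."""
    both = count(set(x) | set(y))
    conf = both / count(x)        # P(XY) / P(X)
    sup = both / len(BASKETS)     # P(XY)
    return conf * sup

print(f"{quality('A', 'B'):.0%}")  # 32%
print(f"{quality('C', 'D'):.0%}")  # 20%
```

C → D has perfect confidence but lower quality than A → B, because only 2 of the 10 baskets support it.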
Apriori Algorithm
Calculates high quality association rules given
billions of transactions
millions of items
Complex Association Rule
ADGMS → CLPT (50% quality, 80% confidence), i.e., 5 items (A, D, G, M, and S) imply with great confidence that 4 items (C, L, P, and T) are purchased.
Without the Apriori Algorithm, the calculation would take too long (millions of years).
Apriori Algorithm
How it works:
By setting a minimum support level, the algorithm can prune low-support pairs (2-itemsets) before computing 3-itemsets.
Then, the surviving 3-itemsets are used to compute 4-itemsets. The algorithm is guaranteed to return all the itemsets above the minimum support level.
When you get to 5-, 6-, or 7-itemsets, the pruning reduces the number of possible sets from trillions to a few thousand or hundred, which can help humans discover very complex, high quality association rules.
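The level-by-level pruning described above can be sketched in a few lines. This is a simplified illustration run on the ten baskets from the earlier slide, not a production implementation; the function name `apriori` and the thresholds used below are assumptions for the example.

```python
from itertools import combinations

# The ten example baskets from the Market Basket slide.
BASKETS = [
    {"A", "B", "E"}, {"A", "B", "F"}, {"A", "B"}, {"A", "B", "F", "G"}, {"A", "D", "F"},
    {"C", "D"}, {"C", "D", "G"}, {"E", "F", "G"}, {"E", "F"}, {"E", "G"},
]

def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset at or above min_support."""
    n = len(baskets)

    def sup(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: single items that meet the minimum support.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if sup(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = sup(itemset)
        k += 1
        # Join surviving (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate is kept only if all its (k-1)-subsets survived.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support only for the candidates that passed the prune.
        current = {c for c in candidates if sup(c) >= min_support}
    return frequent

result = apriori(BASKETS, min_support=0.3)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), s)
```

At a 30% minimum support, the pair {B, F} falls below the threshold, so the 3-itemset {A, B, F} is discarded without ever being counted. That skipped counting is what keeps the search manageable when there are millions of items.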
Importance of Apriori Algorithm
A process
An innovation that takes terabytes of data and reduces it to meaningful rules
Raw Data → Relevant and Timely Information
A.I. Data Mining
Market Basket Analysis
Pivot Chart Lab
Great example of Online Analytical Processing (OLAP)
Slice & Dice data (Temp., Mood, Day, Weather)
Drill Down (look at only incorrect predictions)
Unlike Data Mining, the process is interactive: a person participates in the process
The process is ad hoc.
The process is not pre-determined like the Apriori algorithm.
Significance of Pivot Chart Lab
Business Intelligence (like A.I.)
Use OLAP to find patterns
Encode patterns as IF statements to predict future cases.
The spreadsheet can automate the human decision making process on a large scale, faster than a human.
Such a system enables timely, accurate predictions without a human decision-maker (Business Intelligence System)
Excel Pivot Charts as a tool
First: Pattern is noticed
Second: Interactive analysis tools (Pivot Chart) help to confirm and pinpoint the pattern
Example: A marketer thinks that geography plays a role in sales; a Pivot chart shows that Southern stores do have better sales.
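To make the marketer example concrete, here is a small sketch of what a pivot does (group by one field, average another) and how the confirmed pattern could then be written as an IF statement. All store records and sales numbers below are invented for illustration, as are the function names; none of them come from the lab.

```python
from collections import defaultdict

# Hypothetical store records, invented purely for illustration.
SALES = [
    {"store": "S1", "region": "South", "sales": 120},
    {"store": "S2", "region": "South", "sales": 140},
    {"store": "N1", "region": "North", "sales": 90},
    {"store": "N2", "region": "North", "sales": 110},
    {"store": "W1", "region": "West", "sales": 100},
]

def average_by(rows, key, value):
    """Group rows by `key` and average `value`, like a one-field pivot."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: sum(v) / len(v) for k, v in groups.items()}

def predict_sales(region):
    """The pattern seen in the pivot, encoded as an IF statement (hypothetical rule)."""
    if region == "South":
        return "above average"
    return "average or below"

by_region = average_by(SALES, "region", "sales")
print(by_region)  # {'South': 130.0, 'North': 100.0, 'West': 100.0}
print(predict_sales("South"))  # above average
```

The pivot step is the interactive part a person does; once the rule is written down, it can be applied to any number of new cases without a human in the loop.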
Database queries as tools
First: Data mining reveals numerous patterns (association rules)
Second: Human intelligence can derive the theory behind the pattern.
Example: The Apriori algorithm discovers a high quality association rule (Beer → Diapers). Later, marketers try to unravel the reason why.
The data analysis must come before the hypothesis because the data is too big for humans to analyze.
Fund Trading Lab
Decision Support Automation: Using a database to compute the optimal sequence of trades.
Too many combinations for a human to analyze
Another example of Business Intelligence
1. At first we use a graph and human intuition to make the trades
2. We do better if we use a query to calculate and sort all possible transactions
3. We use database tools to pick the best ones that don’t overlap
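The three steps above can be sketched in Python rather than in database queries. The fund values are hypothetical, and the rules assumed here (one fund held at a time, selling and re-buying allowed on the same day, no fees) are assumptions for the example, since the lab's actual rules are not stated in these notes.

```python
# Hypothetical daily fund values, invented for illustration.
FUNDS = {
    "FundA": [10, 12, 11, 15, 14],
    "FundB": [20, 19, 23, 22, 26],
}

def all_trades(funds):
    """Step 2: every buy-then-sell pair for every fund, sorted by growth factor."""
    trades = []
    for name, values in funds.items():
        for buy in range(len(values)):
            for sell in range(buy + 1, len(values)):
                growth = values[sell] / values[buy]
                trades.append((growth, name, buy, sell))
    return sorted(trades, reverse=True)

def best_sequence(funds):
    """Step 3: pick the non-overlapping trades with the largest combined growth."""
    trades = all_trades(funds)
    n_days = len(next(iter(funds.values())))
    # best[d] = (combined growth, trades used) achievable by the end of day d.
    best = [(1.0, [])] * n_days
    for day in range(1, n_days):
        best[day] = best[day - 1]  # option: make no trade ending today
        for growth, name, buy, sell in trades:
            if sell == day:
                total = best[buy][0] * growth
                if total > best[day][0]:
                    best[day] = (total, best[buy][1] + [(name, buy, sell)])
    return best[-1]

print(all_trades(FUNDS)[0])  # (1.5, 'FundA', 0, 3)
total, sequence = best_sequence(FUNDS)
print(round(total, 3))  # 2.341
print(sequence)
```

With this data the single best trade (FundA, day 0 to day 3, growth 1.5) is not part of the best sequence: four short trades that alternate between the funds combine to about 2.34. That is why picking by eye from a graph tends to lose to computing every combination.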
Decision Support Systems: Wagemart vs. Fund Trading
Wagemart
start with tons of data: individual salaries, availability
reduce it to simple info: total cost, average rating
to help make a decision.
Fund Trading
start with less data: fund value for each day
compute every possible transaction: much more data
Queries are used to find the optimal transactions
Decision Support Systems: Wagemart vs. Fund Trading
Both systems model scenarios to compute the outcome of decisions
one is structured: one scenario to optimize
the other is unstructured: many different scenarios to consider
Fund Trading was more structured, i.e., you can only buy and sell; you just have to decide the optimal day and funds to buy/sell.
Wagemart was very unstructured, many different ways to cut costs.
Porter’s 5-forces
Do companies compete because it’s fun?
Maybe some…
They compete because of the threat of going out of business.
Profitability is the ultimate measure of success
Why?
What are the threats?
A new competitor
Will they take away your sales and profits? Because they are better?
In business what does better really mean?
The five forces/threats
New entrants
Substitute products
Rivalry
Bargaining power of consumers
Bargaining power of suppliers
Example
Target forces its suppliers to use XML-formatted shipment data and boxes tagged with RFID chips.
Apple refuses and wins
Target has to use Apple’s system to sell Apple’s products.
What force is this?
Example
Indirect: Brooke visits Google Shopping and Shopzilla to compare prices on a new camera. She’ll buy from the least expensive online retailer.
Direct: Bradley uses LendingTree.com, where banks try to underbid each other to get his business.
Example
Disney World implements a new ride-tracking system that directs visitors to the rides with the shortest wait times.
Forces Universal Studios to invest in a similar system.
Example
Everyone at the gym is using their iPhone or Android phone to listen to music
MP3 players are now collecting dust
Example
Netflix emerges and puts 120 Blockbuster video stores out of business
Competitive strategies
To fight the forces
1. Do something totally new (innovation)
2. Be inexpensive (cost leadership)
3. Be big to increase power (growth)
Lock in your customers
Lock out your competition
4. Make mutually beneficial partnerships (alliance)
5. Be different but in a good way (differentiation)
Put up barriers to the competition
Example
Imagine if Blockbuster had decided to use Internet/mail delivery before Netflix.
But Blockbuster was NOT ______________
By the way, Netflix created a totally new process for renting videos.
How does an IS make this possible?
How is the IS better than the old-fashioned process?
E-commerce
It was an innovation at one point
Now it is necessary to stay in business
Example
Walmart’s efficient supply chain cuts costs.
RFID and XML play a role
Their size allows them to negotiate low prices with suppliers.
Large companies absolutely need information systems for good management
Walmart’s strategy is 2-fold.
How do Information Systems really help businesses to compete?
The labs provide many examples
RFID, XML
More accessible, timely information for improving the supply chain.
Market Basket
More relevant information for increasing sales/profits
How do Information Systems really help businesses to compete?
The labs provide many examples
Wagemart
More accurate information for modeling decisions
Pivot Chart & Fund Trading
Flexible information, manipulated in real time to solve problems (prediction & optimization)
The 11 information attributes are fair game
Flexibility and accessibility are different.
Putting something on the web makes it more accessible
Storing data electronically can make it more flexible
Putting electronic data in a robust, standardized format (XML) improves both.
Attribute Trade-offs
Simple vs. Complete
Secure vs. Accessible
Presentations
Don’t forget to review presentations
The websites will be linked on Tuesday
Textbook Reading
Low priority
Top Priority
Review past exams and look up correct answers (Text and Google)
Will post them on Tuesday
Skim lab materials and instructions on Blackboard
Create cheat sheet