Adding Value To Information


Post on 08-Dec-2014








1. Adding Value to Information via Analytics: Perspective from BA&MS Research and Projects, May 2008

2. Outline

  • Historical perspective. When can analytics enhance value of information?
  • Using analytics to utilize information.
    • Supply chain
    • Workforce management
    • Carbon management
  • Using analytics to extract information.
    • Collaborative filtering, Netflix challenge
    • ASCOT
    • BANTER
  • Using analytics to collect information.
    • Prediction markets
    • Peer-to-peer services
    • Personal benchmarking

3. Information / Analytic services start up when a new sector of economic activity begins to take off

Information / Analytic Service Starting Points (timeline, 1900 to 2000):

  • IMS Health: Brand pharmaceutical market begins to take off
  • Polk Auto Registry Database: R.L. Polk meets with Alfred Sloan to discuss information needs in growing auto market
  • A.C. Nielsen: Network TV advertising opens up
  • Getty Images: Digital photography takes over
  • Navteq: GPS becomes commercially usable
  • Moody's: Stock market crash of 1907
  • aQuantive: Internet advertising begins to grow
  • Morningstar: Take-off in individual mutual fund investing
  • Fair-Isaac: Consumer credit goes mass market

Early Mover position in an emerging market is critical

5. Utilizing Information

  • We consider situations where information is already available
    • From ERP or other business process automation tools
      • Historical data
      • Some enterprise-generated view of the future
    • May be combined with purchased data from information services
    • Most examples now are within an enterprise or an enterprise-driven value net
  • We focus on the case where analytics are applied to the information with the goal of optimizing the use of resources
  • Examples:
    • Supply Chain
    • Workforce management
    • Carbon management

6. Supply Chain Collaboration: IBM Buy Analysis Tool (iBAT). Improve Inventory Cost in IBM's Extended Supply Chain

  • Business Problem
    • A significant percentage of IBM's hardware sales in high-velocity servers are sold through major channel partners such as Arrow, Ingram, and Tech Data.
    • Lack of alignment between procurement, manufacturing, and channel sales resulted in significant price protection and sales incentive costs for IBM and high inventory-related costs for our channel partners
  • Solution
    • Web-based collaboration platform for IBM's channel replenishment planning that combines innovative forecasting and inventory analytics with up-to-date visibility of channel sales and inventory data
    • Optimized buy recommendations for channel partners based on statistical forecasting techniques and risk-optimized inventory replenishment models
    • Proactive review system that initiates demand shaping based on supply and demand imbalances
    • Standard SOA-based solution design which can easily be adapted to specific ERP environments
    • Patent-pending methodology
  • Business Value
    • Cornerstone of IBM Server Group's Business Partner Transformation Initiative
    • Fully deployed with IBM's largest channel partners across the United States, Canada and Europe
    • Solution enables business partners to carry 15-25% less inventory without negatively impacting their delivery performance
    • Lower channel inventory resulted in lower price protection expenses for IBM, improved cash flow, and higher operating margins

7. Available to Sell (ATS): Find saleable product recommendations to consume excess inventory

  • Business Problem
    • With shrinking product lifecycles, component supply overages can quickly lead to obsolescence requiring costly inventory write-offs. One way to avoid these costs is to find products to build and sell that would consume the excess supply.
    • In a complex product environment such as IBM Servers, product build-out typically requires additional procurement of non-excess parts to square with the excess supplies. With part commonality across many possible product configurations, this leads to an enormous number of potential build-out strategies to choose from. Additional factors such as part substitution, re-work costs, and marketing constraints make this a difficult optimization problem.
  • Solution
    • ATS Engine uses IBM's Watson Implosion Technology to find the optimal sales recommendation portfolio given: excess part supplies, bill of material, procurement and value-add costs, product demand upper bounds, and product pricing.
    • Pegging module assigns excess consumption and additional costs to each product in the sales recommendation, allowing users to pick which build-outs to execute and promote in market.
    • What-if capability enables users to cost a targeted build-out plan, supporting end-of-life processes.
  • Business Value
    • ATS Engine and Process fully deployed in IBM's Systems Technology Group (STG) since 2002.
    • Solution drove build-outs and sales recommendations which consumed $200 million worth of excess inventory in 2002.
    • Ongoing usage of the tool keeps excess supply from becoming obsolete.
    • System is integrated with IBM's Central Planning Engine with Web-based, on-demand availability within IBM STG.
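
To make the shape of the ATS problem concrete, here is a minimal sketch that picks build quantities to maximize margin while consuming only excess parts. It is not the Watson Implosion Technology named on the slide: it is a brute-force search over a toy instance, and every part name, product name, quantity and margin below is a hypothetical value chosen for illustration. Procurement of non-excess parts and value-add costs are assumed to be already netted out of the per-unit margin.

```python
from itertools import product

# Hypothetical inputs (not from the slides): excess parts on hand,
# bill of material per product, net margin per unit, and demand caps.
EXCESS = {"cpu": 40, "dimm": 100}
BOM = {
    "server_a": {"cpu": 2, "dimm": 4},
    "server_b": {"cpu": 1, "dimm": 8},
}
MARGIN = {"server_a": 500.0, "server_b": 350.0}
DEMAND_CAP = {"server_a": 15, "server_b": 10}


def best_build_plan(excess, bom, margin, demand_cap):
    """Exhaustively search build quantities that fit within the excess supply."""
    products = sorted(bom)
    best_plan, best_value = None, float("-inf")
    ranges = [range(demand_cap[p] + 1) for p in products]
    for qty in product(*ranges):
        # Total consumption of each excess part under this candidate plan.
        used = {part: 0 for part in excess}
        for p, q in zip(products, qty):
            for part, per_unit in bom[p].items():
                used[part] += per_unit * q
        if any(used[part] > excess[part] for part in excess):
            continue  # plan needs more of some part than is in excess
        value = sum(margin[p] * q for p, q in zip(products, qty))
        if value > best_value:
            best_plan, best_value = dict(zip(products, qty)), value
    return best_plan, best_value


plan, value = best_build_plan(EXCESS, BOM, MARGIN, DEMAND_CAP)
print(plan, value)
```

Exhaustive search only works for a handful of products; a realistic instance with part substitution and marketing constraints would need a proper optimization solver.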

8. Application Areas in Workforce Management

Many opportunities to improve workforce management through utilization of information

[Figure: application areas laid out over the calendar year (JAN, APR, JUL, DEC), from "Now" to "Target": demand forecasting, capacity planning, strategic planning, training and learning, skill & engagement analytics, matching & scheduling]

9. Workforce challenges: the data is distributed in many enterprise applications

  • There is no single Enterprise Resource Planning tool for labor management
  • Supply (given in terms of roles or skills)
    • Traditional HR systems contain information about the current job
      • Structured: Position code, salary, location, shift, etc.
      • Unstructured: Education, IBM courses, dept history, awards
    • New Job Role/Skill Set with job taxonomy and skill list
      • Full Text Resumes
  • Demand (given in terms of engagements or contracts)
    • Past and Current Contracts (and history of deal closure)
    • New opportunities: Sales Opportunity Database
  • Missing link
    • Bill of resources = set of skills required to deliver an engagement
    • But billing database includes detail (by individual) on employees' participation in engagements
    • And additional sources include contractor/engagement data

10. Business Consulting Examples: Supply Chain-PLM Engagements

  • Engagements can range from one month and one skill set to more than 10 months, 16K hours, and a wide range of job roles/skill sets
  • Weekly variations appear to be driven by calendar effects, vacation schedules, and resource availability

11. Analysis of Data to Estimate Bill of Resources

  • Several different sources of data
    • High-level account information, such as
      • Client name
      • Account description
      • Offering information
      • Billing (Fixed price, best estimate)
    • Ledger information
      • Project cost, revenue
    • Labor claiming information
      • Hours claimed per week by each employee on a project
    • Employee information
      • Line of Business, Job Role, Skill Set, global resource, etc.
  • For US contracts over past 18 months
    • Approximately 10K accounts
    • More than 2M labor claim records

  • Data Issues
    • Can't tell if an individual is deployed in their primary Job Role/Skill Set (JR/SS)
    • JR/SS table has current state only
      • Beginning to collect longitudinal data
    • High % of missing JR/SS information
      • JR/SS not tracked consistently at subcontractor or global resource level
      • No information for consultants no longer with IBM
  • Over 400 valid JR/SS combinations
  • Account descriptions give little to no indication of scope of work

History reflects what actually happened, not necessarily best practice

12. Engagement Profiling

  • Service offerings/opportunities are typically specified in terms of revenue and solution
    • Using statistical analysis and clustering, develop template staffing structure for offerings, which can be used to translate offering revenue forecasts and opportunity revenue into staffing resource requirements
    • Semi-automated and parameterized process for generating staffing templates and supporting software
  • Value
    • Standardized project templates allow for planning of staffing decisions at earlier stages of the engagement process, more reliable forecasting of resource needs and better workforce planning
    • Enables partners/project managers to quickly develop staffing plans early in the opportunity cycle
    • Predictive accuracy of 70-80% at engagement level and 90-95% at aggregate level for major job roles
    • Deployed by GBS in the Demand Capture Tool 2.1 released in December 2006
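
The idea of a staffing template can be sketched in a few lines. This is a simplified illustration only: it pools past engagements for one offering and computes hours per revenue dollar for each job role, skipping the clustering and statistical analysis the slide mentions. The offering name, revenues and hours are hypothetical.

```python
from collections import defaultdict

# Hypothetical history: (offering, revenue in $, hours claimed per job role).
HISTORY = [
    ("SCM", 1_000_000, {"consultant": 4000, "architect": 1000}),
    ("SCM", 3_000_000, {"consultant": 14000, "architect": 2000, "pm": 1000}),
]


def build_template(history, offering):
    """Hours per revenue dollar for each job role, pooled over past engagements."""
    total_revenue = 0
    hours = defaultdict(float)
    for name, revenue, claimed in history:
        if name != offering:
            continue
        total_revenue += revenue
        for role, h in claimed.items():
            hours[role] += h
    if total_revenue == 0:
        raise ValueError(f"no history for offering {offering!r}")
    return {role: h / total_revenue for role, h in hours.items()}


def staffing_requirement(template, forecast_revenue):
    """Translate a revenue forecast into estimated hours per job role."""
    return {role: rate * forecast_revenue for role, rate in template.items()}


template = build_template(HISTORY, "SCM")
print(staffing_requirement(template, 2_000_000))
```

A pooled ratio like this inherits the data issues listed earlier (missing job role data, history that is not best practice), which is presumably why the real process is semi-automated and parameterized.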

Example engagement record:

  • Client Name: ABC
  • Plan Names: (blank)
  • Linked to other projects?: No
  • Estimated Revenue: 4700000
  • End Date: 12/31/2004
  • Start Date: 1/2/2004
  • Project Type: Package Configuration and Implementation
  • Modules: SAP.SCM
  • ISV: SAP
  • Service: Supply Chain Management
  • Sector: Industrial

13. Risk Based Capacity Planning

Allows development of capacity plans according to business strategy. The best solution will be based on a combination of expected revenues/costs/profits, allowed risk tolerances with respect to revenue loss, and other business concerns, such as market share and growth.

[Figure: Technology Adoption Product Services, US, 3Q05. Revenue, labor cost, and gross profit curves, with revenue at risk ($M), plotted against capacity]

14. Workforce Does Not Happen Overnight

The use of analytics and optimization in workforce management applications requires significant maturity levels in terms of data, process and business understanding.

Maturity levels shown on the slide:

  • Automation
  • Job taxonomies: how to describe skills and activities
  • View of supply: infrastructure and process to capture available resources
  • Bills of materials: templates to describe projects/tasks to be performed
  • View of demand: infrastructure, process and analytics to forecast demand
  • Analytics & Optimization
  • Nothing

15. Carbon as a New Variable in Supply Chain Decisions

  • Typical supply chain optimization only considers the direct monetary costs
  • Inventory and supply policies can be significantly different with the inclusion of broader environmental costs and constraints
  • A good model can quantify both the cost and the carbon impact of various supply chain policies.
  • A comprehensive model can identify areas where carbon and cost reduction can be achieved simultaneously (e.g., minimization of wastage, rework, etc.)
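
As a minimal sketch of treating carbon as a second variable, the snippet below scores a few supply chain policies on both cost and CO2, then reports the non-dominated (Pareto) policies and those that beat a baseline on both dimensions. The policy names and all figures are hypothetical and chosen only to illustrate the comparison.

```python
# Hypothetical policies: (name, annual cost in $, annual emissions in tonnes CO2).
POLICIES = [
    ("baseline", 1_000_000, 500),
    ("air_freight", 1_200_000, 900),
    ("less_rework", 950_000, 450),
    ("rail_freight", 980_000, 380),
    ("bulk_packaging", 940_000, 470),
]


def pareto_front(policies):
    """Policies not dominated on both cost and carbon (lower is better)."""
    front = []
    for name, cost, co2 in policies:
        dominated = any(
            c <= cost and e <= co2 and (c < cost or e < co2)
            for _, c, e in policies
        )
        if not dominated:
            front.append(name)
    return front


def win_win(policies, baseline="baseline"):
    """Policies that cut both cost and carbon relative to the baseline."""
    base = next(p for p in policies if p[0] == baseline)
    return [n for n, c, e in policies if c < base[1] and e < base[2]]


print(pareto_front(POLICIES))
print(win_win(POLICIES))
```

The Pareto view matters because once the win-win options are taken, any further carbon reduction comes at a cost, and the trade-off has to be made explicitly.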

[Figure: Supply chain trade-offs among quality, CO2, cost and service, driven by transportation, inventory policy, design, energy, packaging, process and component options]

16. Any Supply Chain Carbon View must be Multi-Dimensional

  • Cost elements
    • Shrinkage ($, CO2 cost)
    • Breakage ($, CO2 cost)
    • Real Estate ($ cost)
    • Handling ($, CO2 cost)
    • Transportation ($, CO2 cost)
    • Utilities ($, CO2 cost)
    • Manufacturing ($, CO2 cost)
    • Component Supply ($, CO2 cost)
  • Options
    • Packaging Options
    • Transportation Options
    • Energy Options
    • Inventory Policy Options
    • Process Options
    • Supply Options

17. Green Sigma(TM) Carbon Management Dashboard

19. Extracting Information

  • We consider situations where a vast amount of data is available.
    • Typically a mix of structured and unstructured data
    • Often incomplete and/or noisy data
  • Data may come from multiple sources, but typically includes at least some private data.
  • The data owner wants to use the data to improve some aspect of the business operations, but a specific business objective is typically not fully articulated.
  • Analysis (and pre-analysis data preparation) need to be automated.
  • Examples:
    • KDD Cup and Netflix Challenge

21. October 2006 Announcement of the NETFLIX Competition

  • USA Today headline:
    • Netflix offers $1 million prize for better movie recommendations
  • Details:
    • Beat NETFLIX's current recommender model, Cinematch, by 10% in rating prediction error (root-mean-squared error, RMSE) before 2011
    • $50,000 annual progress prize (relative to baseline)
    • Data contains a subset of 100 million movie ratings from NETFLIX, covering 480,189 users and 17,770 movies
    • Performance is evaluated on holdout movie-user pairs
  • The NETFLIX competition has attracted 24,396 contestants on 19,799 teams from 155 different countries
  • 25,115 valid submissions from 3,335 different teams
  • Current best result is 9.08% better than baseline (up from 6.7% as of March 2007)
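
The "percent better than baseline" figures are relative reductions in prediction error on the holdout pairs. A minimal sketch of that arithmetic follows, assuming the error measure is RMSE (the measure the Netflix Prize was scored on); the function names are illustrative, not part of any competition toolkit.

```python
import math


def rmse(predicted, actual):
    """Root-mean-squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percent reduction in RMSE relative to the baseline model."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse
```

Under this definition, a candidate reaches the 10% target when its holdout RMSE is at most 0.9 times the baseline's.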

22. KDD-Cup 2007

  • The 2007 KDD-Cup was based on a subset of the Netflix prize data
    • The Netflix grand prize competition (a different task on the same data) has attracted 24,396 contestants on 19,799 teams from 155 different countries (no IBM participants due to IP issues)
    • The data contains a subset of 100 million movie ratings from NETFLIX, covering 480,189 users and 17,770 movies
    • Ratings of users and movies were collected from Nov-1999 until Dec-2005
  • Task 1: Who rated what in 2006
    • Given a list of 100,000 pairs of users and movies, predict for each pair the probability that the user rated the movie in 2006
  • Task 2: Number of ratings per movie in 2006
    • Given a list of 8,863 movies, predict the number of additional reviews that all existing users will give in 2006

23. Task 1: Probability of a member rating a movie

  • Extracted features:
    • Movie-based features
      • Graph topology: # of ratings per movie (across different years), adjacent scores between movies calculated using SVD on the graph matrix
      • Movie content: similarity of two movies calculated using Latent Semantic Indexing based on bag of words from (1) plots of the movie and (2) other information, such as director, actors
    • User profile
      • Graph topology: # of ratings per user (across different years), adjacent scores between users in the graph calculated using SVD
      • User content: user preference based on the movies being rated: keyword match count
  • Learning Algorithm:
    • Single classifiers: logistic regression, ridge regression, decision tree, support vector machines (best run: RMSE = 0.2647)
    • Naive ensemble: combining sub-classifiers built on different types of features with pre-set weights (best run: RMSE = 0.2642)
    • Ensemble classifiers: combining sub-classifiers with weights learned from the development set (best run: RMSE = 0.2629)
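
The difference between pre-set and learned ensemble weights can be illustrated with two sub-classifiers. The sketch below grid-searches a single blending weight that minimizes RMSE on a development set. It is a simplified stand-in for whatever weight-learning procedure the team used, which the slides do not specify, and the function names are hypothetical.

```python
import math


def rmse(pred, truth):
    """Root-mean-squared error of probability predictions against 0/1 labels."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def blend(preds_a, preds_b, w):
    """Weighted average of two sub-classifier outputs."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


def learn_weight(dev_a, dev_b, dev_truth, steps=100):
    """Grid-search the weight w in [0, 1] that minimizes development-set RMSE."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        err = rmse(blend(dev_a, dev_b, w), dev_truth)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

Because the grid includes w = 0 and w = 1, the blended model is never worse on the development set than either sub-classifier alone; whether the gain carries over to the test set is an empirical question.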

24. Task 2: Number of additional ratings per movie

  • Perform in-depth analysis of the domain
    • All movies and users were in the NETFLIX database already in Dec 2005
    • Model the aging process of movies
  • Understand the way the specific data for the competition was created
    • The new ratings in 2006 were split into two sets by random sampling of movies
    • The ratings for Task 1 were sampled according to the MARGINAL distribution of ratings in 2006
    • We can use the test set for Task 1 as a surrogate training set for Task 2, up to an unknown scaling factor that is modeled separately
  • Estimate Poisson regression on the marginal as found in the test set for Task 1
    • Variables: lagged reviews, genre, age, director, actor, etc.
    • Correct for missing duplicates based on the estimated rating marginal of the users
  • Estimate the scalar to rescale from marginal to total
    • 4 Poisson regression models: 1-, 2-, 3- and 4-quarter-ahead predictions of the number of ratings for all movies
    • Correct for decreasing user base by creating lagged datasets with removed users after deadline
  • Key point: Understanding the data domain and how the sampling was done was a critical factor in the accuracy of prediction
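
A minimal Poisson regression can be written with the standard library alone. The sketch below fits log E[y] = b0 + b1·x by Newton's method with a single predictor (think of x as a lagged rating count feature) and then applies a separate scale factor, mirroring the two-step "fit the marginal, then rescale" idea above. The single-variable model, the function names and the `scale` parameter are simplifying assumptions; the actual models used several variables.

```python
import math


def fit_poisson(xs, ys, iterations=50):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method on the Poisson likelihood."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - m for y, m in zip(ys, mu))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mu))
        # Entries of the (negated) Hessian.
        h00 = sum(mu)
        h01 = sum(m * x for x, m in zip(xs, mu))
        h11 = sum(m * x * x for x, m in zip(xs, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict(b0, b1, x, scale=1.0):
    """Expected count at x, rescaled from the sampled marginal to the total."""
    return scale * math.exp(b0 + b1 * x)
```

The `scale` argument stands in for the separately estimated scalar; it has to come from outside the regression because the sample alone does not identify it.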

25. ASCOT (Automated Search for Collaboration Opportunities by Text-mining)

  • We currently build OnTARGET models to predict purchase probability for existing IBM clients as well as Whitespace (e.g., will they purchase an IBM Rational software product?)
    • These models use historical IBM transactional data joined with D&B data
    • What if we added indexed content crawled from each company's website?
  • We apply Active Feature Acquisition to minimize the number of web sites we need to crawl
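
One simple way to decide which websites to crawl first is to prioritize the companies the current model is least sure about. The sketch below implements that uncertainty heuristic; it is a deliberately simplified stand-in and not the Active Feature Acquisition method used in ASCOT, whose details the slides do not give. The company names and probabilities in the usage line are hypothetical.

```python
def pick_sites_to_crawl(purchase_prob, budget):
    """Return the `budget` companies whose predicted probability is closest to 0.5.

    `purchase_prob` maps a company name to the current model's predicted
    purchase probability (without web content).
    """
    ranked = sorted(purchase_prob, key=lambda c: (abs(purchase_prob[c] - 0.5), c))
    return ranked[:budget]


scores = {"acme": 0.52, "globex": 0.95, "initech": 0.40, "umbrella": 0.05}
print(pick_sites_to_crawl(scores, 2))
```

The intuition is that web content is most likely to change the prediction where the existing features leave the outcome uncertain, so a limited crawling budget is spent there.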

We find interesting terms on a company website that increase the likelihood of a Rational SW purchase, and the resulting model is more accurate than our existing OnTARGET model.

[Figure: Accuracy (AUC) versus percent of websites processed (0 to 25), comparing Active Feature Acquisition with Random Acquisition. The gap between the model with web content and the existing OnTARGET model (without web content) is the improvement due to web content]

26. BANTER (Blog Analysis of Network Topology and Evolving Responses)

[Figure: 77M blogs narrowed to technology blogs, then to enterprise software blogs]

  • 1. How do we identify the relevant sub-universe of blogs?
    • We submit a set of relevant keywords to Technorati, include out-linked blogs, and then refine this sub-universe via active learning
  • 2. How do we determine authorities in this sub-universe?
    • We use page-rank-like algorithms against cross-reference structure, combined with SNA concepts (e.g., Information Flow)
  • 3. How do we detect emerging topics and themes in this sub-universe?
    • One approach is to predict link (cross-reference) formation using network evolution and content (keywords) at the nodes (blogs)
  • 4. How do we detect sentiment associated with specific posts?
    • One approach is to learn a model using text features against labeled product ratings (1-5 stars) scraped from Amazon
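
For the authority question, a basic PageRank by power iteration shows what a "page-rank-like" score over the blog cross-reference graph looks like. This is the textbook algorithm, not BANTER's actual scoring (which also draws on SNA concepts); the blog names in the usage lines are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; `links` maps each blog to the blogs it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling blog with no out-links: spread its rank evenly.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


blog_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(blog_links)
print(max(ranks, key=ranks.get))
```

A blog scores highly when many blogs, especially highly ranked ones, link to it, which is a reasonable first proxy for authority within the sub-universe.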

OBJECTIVE: Apply machine learning to extract business insight from technology-based blogs

[Figure: OpenID buzz in January]

28. Outline

  • Historical perspective. When can analytics enhance value of information?
  • Using analytics to utilize information.
    • Supply chain
    • Workforce management
    • Carbon management
  • Using analytics to extract information.
    • Collaborative filtering, Netflix challenge
    • ASCOT
    • BANTER
  • Using information and analytics to collect more information.
    • Prediction markets
    • Peer-to-peer services
    • Personal benchmarking

29. Collecting (more) Information

  • Can available data be made more useful through the addition of a small amount of additional data?
    • What to collect?
    • How to collect?
    • Where (from whom) to collect?
    • Given what you have, how do you determine what else you need?
  • What additional data is becoming available?
  • How can it be effectively utilized?
  • Examples:
    • Prediction markets: collective prediction of event probabilities, ranking bets in prediction markets to identify experts.
    • Peer-to-peer services: information exchange to establish reputation, common interests, groups of similar peers.
    • Personal benchmarking

30. What is a Prediction Market?

  • An online forum, usually in a stock market format, that gathers collective wisdom for decision-making and forecasting
    • One method of crowdsourcing, or using the "wisdom of crowds"
    • Considered an emerging Enterprise 2.0 technology
      • Concept is decades old, but until recently was not used within enterprises
  • Questions are posed regarding future events, and participants vote by investing in their forecast using virtual currency
    • e.g., "IBM stock price will hit $120 by January 1st" or "Proposition 123 will pass into law before YE 2008"
  • Different markets for different topics, events or decisions
    • No specific knowledge or expertise is required, regardless of the topic
  • Stock prices are interpreted as event probabilities, while analysis of trading behavior provides valuable data on how information flows
  • Participants are recognized for their prediction accuracy, providing motivation to share valuable knowledge truthfully
  • Contains algorithms for aggregating diverse opinions
  • Often used as sole prediction method, but also used to complement other forecasting mechanisms
  • Synonyms include: predictive markets, information markets, decision markets, idea futures, event derivatives, virtual markets
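
To show how a market price can be read as an event probability, here is a sketch of a logarithmic market scoring rule (LMSR) market maker, one well-known aggregation mechanism for prediction markets. The slides do not say which mechanism any particular market uses, so treat LMSR and the liquidity parameter `b` as assumptions made for illustration.

```python
import math


def lmsr_cost(quantities, b=100.0):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)), computed stably."""
    m = max(q / b for q in quantities)
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities, b=100.0):
    """Current prices, one per outcome; they sum to 1 and read as probabilities."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(quantities, outcome, shares, b=100.0):
    """Cost in virtual currency of buying `shares` of `outcome`, plus the new state."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b), after


state = [0.0, 0.0]                      # two outcomes, no shares sold yet
cost, state = trade_cost(state, 0, 50)  # a participant backs outcome 0
print(cost, lmsr_prices(state))
```

Each purchase moves the price of the chosen outcome up, so the price after many trades summarizes the beliefs participants were willing to back with their virtual currency.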

31. Political Examples

32. Public prediction markets?

33. Collective intelligence harnessed from prediction markets yields myriad benefits for enterprises and employees

  • Strategic foresight into emerging issues from a large, diverse and global population
  • Quick, efficient aggregation of employee knowledge
    • Insight which even the best Business Intelligence solution could not provide
  • Real-time analytics on social networking, social capital
  • More effective and more accurate than polls, surveys, ratings
  • Circumvention of bureaucracy impeding flow of information
  • Elimination of personal biases in decision-making
  • Improved innovation culture and employee morale
    • Participants given a voice in decision-making and/or forecasting
    • Sponsors provide non-monetary incentives for employees to disclose valuable information and often untapped knowledge
    • Increase in visibility and opportunities for participants by building a reputation for good decision-making and foresight

34. Do they work?

Properly executed prediction markets are more accurate than teams of experts, or any other traditional forecasting method

  • Examples of Market Accuracy
    • The Iowa Electronic Markets (IEM) predictions for the presidential elections between 1988 and 2000 were off by an average of 1.37%; more accurate than any exit polls
    • InTrade Markets correctly forecast the 2004 presidential race in all 50 states and 49 of 50 State Senate races
    • HP's internal prediction market, over a three-year period, outperformed HP's official printer sales forecasts 75% of the time
    • Intel established a prediction market to allocate manufacturing capacity, which yielded a 100% efficiency improvement
    • Siemens' prediction market, set up to assess their ability to meet a project deadline, correctly forecast the missed deadline; management had predicted success
    • Hollywood Stock Exchange (HSX) correctly predicted 32 of 39 Oscar nominees and 7 of 8 Oscar winners in 2006
    • Farmers' Almanac has long been a trusted source for weather predictions because of its surprising accuracy

35. Peer-to-peer services

  • Governments and large institutions are becoming less effective and efficient at providing affordable and reliable basic services (retirement benefits, health care, insurance, education) for individuals.
  • Individuals need to become increasingly self-sufficient in these regards
    • Individuals are turning to other individuals in a peer-to-peer fashion, to tap into the collective knowledge and financial pockets of communities (both virtual and physical).
  • In developing countries self-sufficiency may be the only practical solution.
  • As peer-to-peer networks progress from serving lighter (e.g., entertainment) needs to serving these long-term, basic needs, a more robust set of IT, communications and business services is required to:
    • Manage new peer-to-peer applications
    • Provide high-quality information and analytics services to individuals.

36. Needs and Opportunities

  • Peer-to-peer services (e.g., social/micro lending, peer-to-peer insurance, homeschooling) are growing
  • There are risks and sources of uncertainty associated with peer-to-peer services:
    • Reliability and accuracy of web-based data
    • Fraud & Reputation (how do you know who you are really dealing with?)
    • Security of personal information
    • Reliability of web-based IT infrastructure
  • These risk factors are not new. However, the models required to adequately capture the characteristics of uncertainty in a peer-to-peer services environment may be different from traditional models used in more centralized business environments.
  • Additionally, the types of services that participants in the P2P environment require may also be different (e.g., more personalized uncertainty analytics services, mobile web).
  • Core technologies are available and gaining adopters (P2P, electronic health records, social networking sites, business integrity, business intelligence)
  • Will we see an emergence of companies whose business is to support P2P services networks?

37. Example: Peer-to-Peer Lending

  • Potentially transformative financial business: Prosper, a peer-to-peer borrowing and lending system.
    • The system lets anybody make a case for why they need to borrow money.
    • Lenders can select which cases they want to take on and easily put a little money to work in dozens or even hundreds of them, diversifying their risk.
  • Since launch, over 200,000 consumers around the world have become Zopa members, as they seek the innovative loans and returns on investments that Zopa offers.
    • More recently, growth has been boosted by the global credit crunch, which is driving unprecedented demand for P2P loans as banks become less competitive and tighten their lending criteria.
  • Online peer-to-peer lending services Prosper, Zopa and CircleLending all have significant lead time and lots of venture backing
    • Zopa, for example, has raised around $34 million.
  • Lending Club is the first of its kind to integrate its services into a social network.
  • These services are generating a huge number of lending transactions
    • How can this transaction data be utilized to provide new information to government and/or industry?
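
The diversification point made for lenders above can be quantified under strong simplifying assumptions: equal-sized loans, independent defaults with the same probability, and total loss on default. None of these assumptions comes from the slides, and real defaults are correlated, so the sketch below is illustrative arithmetic rather than a risk model.

```python
import math


def loss_fraction_stats(n_loans, default_prob):
    """Mean and standard deviation of the fraction of principal lost.

    Assumes equal-sized loans, independent defaults, and total loss on default.
    """
    mean = default_prob
    std = math.sqrt(default_prob * (1.0 - default_prob) / n_loans)
    return mean, std


for n in (1, 10, 100):
    print(n, loss_fraction_stats(n, 0.05))
```

Spreading the same amount over more loans leaves the expected loss unchanged but shrinks its spread in proportion to one over the square root of the number of loans.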

39. Peer-to-Peer Insurance

  • Peer-to-Peer Insurance is preparing to launch a new type of insurance product, based on pooling people together to insure each other at rates cheaper than they currently pay, without automatically losing the money they pay as premium. The Peer-to-Peer Insurance Project:
    • Peer-to-Peer Auto Insurance (safe drivers pooled together to insure each other)
    • Peer-to-Peer Home Insurance (categories of homeowners pooled together to insure each other)
  • Value Proposition:
    • Participants will not automatically, and permanently, lose all the money paid for coverage.
    • Incentive for safe driving (personal, and social good)
    • Credit score will not be used to set premium.
    • No age discrimination
    • No fine print. None of that sleek legal lingo buried in the middle of a thousand pages of policy.
  • What information is used to create pools? What information about the pool is provided to participants? New methods for calculating risk may be required.

40. Personal benchmarking

  • Log onto your favorite web browser and you'll likely be offered a chance to do some personal benchmarking.
  • There are opportunities to compare everything from body mass index to the trade-in value of your car or how your local school district ranks.
  • Beyond a chance to feed any competitive streak, benchmarking can motivate change and help monitor progress.
  • But what else can the information be used for?
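
At its core, personal benchmarking places one person's value within a peer distribution. A minimal sketch of that comparison follows, using a percentile rank that counts ties as half; the function name and the sample values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, peers):
    """Percent of peer values below `value`, counting ties as half."""
    if not peers:
        raise ValueError("peers must be non-empty")
    ordered = sorted(peers)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


print(percentile_rank(7, [2, 5, 7, 9]))
```

The peer values are what make the service work: every user who enters data to be benchmarked also extends the reference distribution, which is the information the closing question above asks about.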

41. Examples

  • Carbon Footprint - Calculate, Reduce and Offset.
    • www.carbonfootprint.com calculates your footprint, compares it to the national average, and proposes products to reduce or offset it (like donating money for reforestation)
    • Enter information about your car make and model, miles you travel, energy bills, flights you take, number of people in household, and state of residence.
    • Can use for targeted marketing of alternative energy sources, hybrid cars, even travel packages.
  • Health and Fitness
    • www.revolutionhealth.com builds your profile, enables members to create webpages on topics interesting to them, supports blogs and communities, and helps people find communities with similar health-related interests.
    • Enter information such as age, interests, health history, fitness routine, etc.
    • Can use for health insurance marketing, drug marketing, weight loss programs, etc.

42. Examples

  • Diving community
    • Allows members to create a profile and log diving information.
    • Enter information about where and when you dive, how long, how deep, with whom, what equipment you bought for how much and when.
    • Can use for profiling travel preferences, frequency, destinations. Independent travel vs. large resorts, consumer profile, level of risk averseness.
  • Knitting community
    • A members-only knitting community, launched in May 2007
      • By February 2008 had over 80,000 members.
      • Adds 800+ members per day, but the waiting list is consistently over 5,000
    • Includes stash and project management tools, connections to Flickr for images of finished items, pattern repository, forum, groups (2 IBM groups, 4 math groups)
    • Enter information about what you own, finished and current projects, etc
    • Used for event and product promotions, pattern and material sales, social networking and assorted competitive events

43. Conclusions

  • Amount of information is growing
    • IT automation
    • Instrumentation
    • End users
  • There are established analytics methods for extracting additional value from data
    • For standard automated business processes
  • There are new analytic methods being developed
    • To support new business processes and business models
    • To leverage combinations of public and private data
  • Scalability will continue to be an issue
  • Personalization of analytics is an opportunity
  • Early Mover position in an emerging market is critical