*Business Intelligence Technologies - Data Mining
Lecture 5: Personalization, k-Nearest Neighbors
*Personalization
Personalization/customization tailors certain offerings by providers to consumers, based on knowledge about them, with certain goals in mind.
(Diagram: provider → personalized offerings → customer. How?)
*What is Currently Being Personalized
- Personalized recommendations of products and services, e.g., recommended books, CDs and vacations
- Personalized products for individual consumers, e.g., custom-made CDs, Dell computers
- Personalized emails
- Personalized content, e.g., Yahoo's personalized home page; Amazon's channel management
- Personalized (dynamic) prices
*Sample Automated E-mail
*Personalization ProcessUnderstand-Deliver-Measure Cycle
*Building Profiles from Data
Data needed:
- Personal information, preferences & interests
- Registration data, including demographic data
- Customer ratings
- Purchasing data: what was bought, when and where
- Browsing & visitation data: clickstream (Web log files)
Building customer profiles:
- Demographic (e.g., name, address, age)
- Behavioral (e.g., favorite type of book: adventure; largest transaction: $295)
- Things learned from data
*Matchmaking Problem
Example: a large e-commerce site with 10M customers and 1M products.
Question: How to match (target) the products to individual customers? For example, which 10 books (out of 1M) should be shown to Jane on her homepage?
Solution: to do matchmaking, use customer profiles and various recommendation technologies.
*Recommendation Technologies
- Collaborative filtering: find the closest customers and recommend what they buy.
- Content-based filtering: see what a customer has bought in the past, and use this information to predict what he would like in the future, e.g., recommend things that are similar to the things he bought before.
- Rule-based approach: identify business rules about what products should be recommended. Example: IF a customer fits a certain profile (e.g., male, age 25-35), THEN recommend a certain set of products.
*Nearest Neighbor Approaches
Based on the concept of similarity:
- Memory-Based Reasoning (MBR)
- k-Nearest Neighbor (KNN)
- Collaborative Filtering (CF)
*K Nearest Neighbor (KNN)K-Nearest Neighbor can be used for classification/prediction tasks.
Step 1: Using a chosen distance metric, compute the distance between the new example and all past examples.
Step 2: Choose the k past examples that are closest to the new example.
Step 3: Find the predominant class among those k nearest neighbors; that class is the prediction for the new example, i.e., classification is done by majority vote of the k nearest neighbors. For prediction problems with a numeric target variable, the (weighted) average of the k nearest neighbors is used as the predicted target value.
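The three steps can be sketched as a short function; a minimal illustration assuming numeric feature vectors (the helper name `knn_classify` is ours, not from the lecture):

```python
import math
from collections import Counter

def knn_classify(new_example, training_data, k=3):
    """Classify new_example by majority vote of its k nearest neighbors.

    training_data: list of (feature_vector, class_label) pairs.
    """
    # Step 1: compute the distance to every past example
    distances = [
        (math.dist(new_example, features), label)
        for features, label in training_data
    ]
    # Step 2: keep the k closest examples
    distances.sort(key=lambda pair: pair[0])
    neighbors = distances[:k]
    # Step 3: majority vote among the k neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```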
*How do we determine our neighbors? Distance measure revisited
Each example is represented with a set of numerical attributes. Closeness is defined in terms of the Euclidean distance between two examples. The Euclidean distance between X = (x1, x2, x3, ..., xn) and Y = (y1, y2, y3, ..., yn) is defined as:

Distance(X, Y) = sqrt[(x1-y1)^2 + (x2-y2)^2 + ... + (xn-yn)^2]

Example:
John: Age = 35, Income = $95K, No. of credit cards = 3
Rachel: Age = 41, Income = $215K, No. of credit cards = 2
Distance(John, Rachel) = sqrt[(35-41)^2 + (95K-215K)^2 + (3-2)^2]
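The distance computation above can be checked with a few lines of Python (income expressed in thousands, as on the slide; the helper name `euclidean_distance` is ours):

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

john   = (35, 95, 3)    # age, income in $K, number of credit cards
rachel = (41, 215, 2)

d = euclidean_distance(john, rachel)   # sqrt(36 + 14400 + 1)
```

Note how the income term (14400) dwarfs the other two, which motivates the standardization issue discussed later.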
*Example: 3-Nearest Neighbors (k-Nearest Neighbor Classifier)

Customer   Age   Income   No. credit cards   Response
John       35    35K      3                  No
Rachel     22    50K      2                  Yes
Hannah     63    200K     1                  No
Tom        59    170K     1                  No
Nellie     25    40K      4                  Yes
David      37    50K      2                  ?
*Example: k-Nearest Neighbor Classifier
Answer: Yes (2 of the 3 nearest neighbors responded Yes)

Customer   Age   Income (K)   No. cards   Response   Distance from David
John       35    35           3           No         sqrt[(35-37)^2+(35-50)^2+(3-2)^2] = 15.16
Rachel     22    50           2           Yes        sqrt[(22-37)^2+(50-50)^2+(2-2)^2] = 15
Hannah     63    200          1           No         sqrt[(63-37)^2+(200-50)^2+(1-2)^2] = 152.23
Tom        59    170          1           No         sqrt[(59-37)^2+(170-50)^2+(1-2)^2] = 122
Nellie     25    40           4           Yes        sqrt[(25-37)^2+(40-50)^2+(4-2)^2] = 15.74
David      37    50           2           ?
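The worked example can be reproduced in Python; this sketch recomputes the distances from David and takes the majority vote of the three nearest neighbors:

```python
import math
from collections import Counter

customers = {
    # name: (age, income in $K, number of credit cards, response)
    "John":   (35,  35, 3, "No"),
    "Rachel": (22,  50, 2, "Yes"),
    "Hannah": (63, 200, 1, "No"),
    "Tom":    (59, 170, 1, "No"),
    "Nellie": (25,  40, 4, "Yes"),
}
david = (37, 50, 2)

# Distance from David to every past customer, nearest first
distances = sorted(
    (math.dist(david, profile[:3]), profile[3], name)
    for name, profile in customers.items()
)
# Three nearest: Rachel (15.0), John (~15.2), Nellie (~15.7)
top3 = distances[:3]
prediction = Counter(response for _, response, _ in top3).most_common(1)[0][0]
print(prediction)  # "Yes" by a 2-of-3 vote
```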
*Some Issues with Euclidean Distance
- Scaling of values: since each numeric attribute may be measured in different units, they should be standardized.
- Weighting of attributes:
  - Manual weighting: weights may be suggested by experts.
  - Automatic weighting: weights may be computed based on discriminatory power or other statistics (e.g., in SAS, the weight of a dimension is based on its correlation with the target variable).
- Treatment of categorical variables: various ways of assigning distance between categories are possible.
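As a sketch of the scaling point: z-score standardization puts attributes measured in different units on a comparable scale before distances are computed (the helper name `standardize` is ours):

```python
def standardize(column):
    """Rescale a numeric column to zero mean and unit standard deviation."""
    mean = sum(column) / len(column)
    std = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / std for v in column]

incomes = [35, 50, 200, 170, 40]   # in $K; dominates raw Euclidean distance
ages    = [35, 22, 63, 59, 25]

# After standardization both attributes contribute on the same scale
z_incomes = standardize(incomes)
z_ages = standardize(ages)
```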
*Dealing with Categorical Values
For categorical values, we can convert them to numeric values.
We might treat being in class A as 1, and not being in class A as 0. Two items in the same class then have distance 0 for that attribute, and two items in different classes have distance 1. For example, take the bridge attributes (deck type, purpose) and the bridges:
Bridge 1 = (concrete, auto)
Bridge 2 = (steel, railway)
Bridge 3 = (concrete, railway)
We could compute distances as:
d(Bridge1, Bridge2) = 1 + 1 = 2
d(Bridge2, Bridge3) = 1 + 0 = 1
d(Bridge1, Bridge3) = 0 + 1 = 1
Again, some form of weighting for attributes of different importance may be useful.
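The 0/1 scheme above amounts to counting mismatched attributes; a minimal sketch:

```python
def categorical_distance(a, b):
    """Distance over categorical attributes: 0 per attribute if the
    categories match, 1 if they differ (mismatch count)."""
    return sum(0 if ai == bi else 1 for ai, bi in zip(a, b))

bridge1 = ("concrete", "auto")     # (deck type, purpose)
bridge2 = ("steel", "railway")
bridge3 = ("concrete", "railway")

d12 = categorical_distance(bridge1, bridge2)   # 1 + 1 = 2
d23 = categorical_distance(bridge2, bridge3)   # 1 + 0 = 1
d13 = categorical_distance(bridge1, bridge3)   # 0 + 1 = 1
```

Per-attribute weights could be folded in by multiplying each mismatch by its weight.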
*Dealing with Categorical Values
We might also construct aggregation hierarchies, so that categories that are conceptually far from each other are given higher distances.

(Hierarchy: Deck splits into Concrete deck and Steel deck; Concrete deck splits into Pre-cast deck and Cast-at-site deck.)

Using this hierarchy, we might regard the distance between pre-cast and cast-at-site as 1 (they have a common parent), while the distance between pre-cast and steel could be 2 (they have a common grandparent). The distance between concrete and steel would be 1 (they have a common parent).
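One way to implement such a hierarchy distance, assuming the tree described above, is to walk up parent links to the lowest common ancestor and count the steps from the deeper category; this particular counting rule is our reading of the slide's numbers, not something the lecture specifies:

```python
# Parent links encoding the assumed deck-type hierarchy
parent = {
    "pre-cast": "concrete",
    "cast-at-site": "concrete",
    "concrete": "deck",
    "steel": "deck",
}

def ancestors(node):
    """The node itself plus every ancestor, nearest first."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def hierarchy_distance(a, b):
    """Steps from the deeper of a, b up to their lowest common ancestor."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    common = next(n for n in chain_a if n in chain_b)
    return max(chain_a.index(common), chain_b.index(common))
```

This reproduces the slide's values: pre-cast/cast-at-site = 1, pre-cast/steel = 2, concrete/steel = 1.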
*How to Decide k?
Assume a new example X (at the center of the circles in the figure). Notice that:
- With a 3-nearest-neighbor classifier (inner circle), X is assigned to the majority class B, whereas
- With an 11-nearest-neighbor classifier (outer circle), X is assigned to the majority class A.
A validation data set can be used to decide k.
(Figure: points of Class A and Class B plotted against Attribute A and Attribute B, with X at the center of two concentric circles.)
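Choosing k with a validation set can be sketched as follows; the candidate list and the tiny dataset are illustrative assumptions, not from the lecture:

```python
import math
from collections import Counter

def knn_predict(x, train, k):
    """Majority-vote prediction from the k nearest training examples."""
    neighbors = sorted(train, key=lambda ex: math.dist(x, ex[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def choose_k(train, validation, candidates=(1, 3, 5, 11)):
    """Pick the candidate k with the highest accuracy on held-out data."""
    def accuracy(k):
        hits = sum(knn_predict(x, train, k) == y for x, y in validation)
        return hits / len(validation)
    return max(candidates, key=accuracy)

# Tiny illustrative dataset: two well-separated classes
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
validation = [((0.2, 0.2), "A"), ((5.2, 5.2), "B")]
best_k = choose_k(train, validation)
```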
*Strengths of k-Nearest Neighbor
- Often works well for classes that are hard to separate using parametric methods or the splits used by decision trees.
- Simple to implement and use.
- Comprehensible: easy to explain predictions.
- Robust to noisy data, by averaging the k nearest neighbors.
- Some appealing applications (e.g., personalization).
(Figure: classes A-D scattered against Attributes A and B.)
*Problems with k-Nearest Neighbor (KNN)
- How to choose k? Do we use 1 nearest neighbor, 10 nearest neighbors, 50 nearest neighbors?
- Computational cost: for a large database, we'd have to compute the distance between the new example and every old example, and then sort by distance, which can be very time-consuming. A possible resolution is sampling: store only a sample of the historical data so that you have fewer distances to compute.
*Applications of MBR
- Medicine / 911: find which diagnosis was made for similar symptoms in the past, and adapt treatment appropriately.
- Customer support (help desk): find which solution was proposed for similar problems in the past, and adapt appropriately (e.g., Compaq's SMART/QUICKSOURCE system).
- Engineering / construction: find what costing or design was made for projects with similar requirements in the past, and adapt appropriately.
- Law (legal advice): find what judgment was made for similar cases in the past, and adapt appropriately.
- Audit and consulting engagements: find similar past projects.
- Insurance claims settlement: find similar claims in the past.
- Real estate: property price appraisal based on previous sales.
*Collaborative Filtering: Finding Like-Minded People
- One seeks recommendations about movies, restaurants, books, etc. from people with similar tastes.
- CF automates the process of "word-of-mouth" by which people recommend products or services to one another.
- CF is a variant of MBR particularly well suited to personalized recommendations.
*Collaborative Filtering
- Starts with a history of people's personal preferences.
- Uses a distance function: people who like the same things are close.
- Uses votes which are weighted by distance, so close neighbors' votes count more.
*Collaborative Filtering
- Consumers' preferences are registered. David is seeking recommendations on restaurants.
- Using a similarity metric, the similarity between another person and David is calculated based on their preferences (i.e., restaurant ratings).
- Their (weighted) average ratings for any given restaurant are computed, and restaurants with a high average score are recommended to David.

Restaurant ratings (0: bad - 10: excellent)
          Fridays   Thai Food   The Barns   University Cafe   Cosi
Don       5         1           6           6                 2
Rachel    1         4           2           3                 5
David     1         3           2           ?                 ?
*Collaborative Filtering
Distance:
David and Don: sqrt[(5-1)^2+(1-3)^2+(6-2)^2] = 6
David and Rachel: sqrt[(1-1)^2+(4-3)^2+(2-2)^2] = 1
Weighted scores:
University Cafe: 6*(1/7) + 3*(6/7) = 3.4
Cosi: 2*(1/7) + 5*(6/7) = 4.6
Ranking: Cosi > University Cafe
Restaurant ratings (0: bad - 10: excellent)
          Fridays   Thai Food   The Barns   University Cafe   Cosi
Don       5         1           6           6                 2
Rachel    1         4           2           3                 5
David     1         3           2           ?                 ?
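The whole calculation can be reproduced with normalized inverse-distance weights, which yield exactly the 1/7 (Don) and 6/7 (Rachel) used above:

```python
import math

ratings = {
    # restaurant: {person: rating on a 0-10 scale}
    "Fridays":         {"Don": 5, "Rachel": 1, "David": 1},
    "Thai Food":       {"Don": 1, "Rachel": 4, "David": 3},
    "The Barns":       {"Don": 6, "Rachel": 2, "David": 2},
    "University Cafe": {"Don": 6, "Rachel": 3},
    "Cosi":            {"Don": 2, "Rachel": 5},
}

# Restaurants that David has rated, used to measure similarity
shared = ["Fridays", "Thai Food", "The Barns"]

def distance(a, b):
    return math.dist([ratings[r][a] for r in shared],
                     [ratings[r][b] for r in shared])

# Inverse-distance weights, normalized to sum to 1
neighbors = ["Don", "Rachel"]
inv = {p: 1 / distance("David", p) for p in neighbors}   # Don: 1/6, Rachel: 1
total = sum(inv.values())
weights = {p: inv[p] / total for p in neighbors}         # Don: 1/7, Rachel: 6/7

for restaurant in ["University Cafe", "Cosi"]:
    score = sum(weights[p] * ratings[restaurant][p] for p in neighbors)
    print(restaurant, round(score, 1))
# University Cafe 3.4, Cosi 4.6  →  recommend Cosi first
```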
*Collaborative Filtering: Drawbacks for Sellers
- Need real-time recommendation.
- Scale: millions of customers, thousands of items.
- Works well only once a "critical mass" of preferences has been obtained: a very large number of consumers must express their preferences about a relatively large number of products.
- Consumer input is difficult to get. Solution: identify preferences that are implicit in people's actions. For example, people wh