Mathematics Online: Some Common Algorithms


Brief overview of some basic algorithms used online and across data-mining, and a word on where to learn them. Prepared specially for UCC Boole Prize 2012.


Boole Prize 2012

Mark Moriarty

University College Cork

MATHEMATICS ONLINE: Data-Mining, Predictive Analytics, Clustering, A.I., Machine Learning… and where to learn all this.

3 SECTIONS:

• 1 - Overview of some applications of Maths online.

• 2 - Sample algorithms.

• 3 - Recommended online Maths courses.

SECTION 1 (MOTIVATION): MATHEMATICS IN ACTION

• User Clustering.

• Recommender Systems. Movie recommendations.

• Shopper analytics – send relevant coupons.

• Voice recognition. Machine Learning.

• Spam detection.

• Fraud detection.

• Facebook Feed.

• Google’s PageRank.

• DNA sequencing.

• Health analytics.

• Intelligent ad displays.

• etc.

AWKS…

“My daughter got this in the mail!

She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?! ”

HOW TARGET FIGURED OUT A TEEN GIRL WAS PREGNANT BEFORE HER FATHER DID

As Pole’s computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.

Take a fictional Target shopper who is 23, and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87% chance that she’s pregnant and that her delivery date is sometime in late August.
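To make that concrete, here is a hypothetical sketch of this kind of purchase-based scoring. The products echo the article's example, but every weight is invented for illustration; this is not Target's actual model.

```python
# Hypothetical purchase-based score in the spirit of the Target story.
# The products echo the article's example; all weights are invented.
PRODUCT_WEIGHTS = {
    "cocoa-butter lotion": 0.30,
    "large purse": 0.20,
    "zinc supplement": 0.15,
    "magnesium supplement": 0.15,
    "bright blue rug": 0.10,
}

def pregnancy_score(basket):
    """Sum the weights of the scoring products found in a shopper's basket."""
    return sum(PRODUCT_WEIGHTS.get(item, 0.0) for item in basket)

# The fictional March shopper's basket scores 0.90 on this toy scale.
```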

HOW KHAN ACADEMY IS USING MACHINE LEARNING TO ASSESS STUDENT MASTERY

Old method: to decide when a student had finished a certain exercise, Khan Academy awarded proficiency to any user who answered at least 10 problems in a row correctly — known as a streak.

New metric: accuracy.

What do I mean by accuracy? Now define it as

accuracy = P(next problem correct | proficiency just gained)

which is just notation desperately trying to say "Given that we just gained proficiency, what's the probability of getting the next problem correct?"
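As a toy illustration (not Khan Academy's code), that probability can be estimated as a simple frequency over users who just crossed the proficiency threshold:

```python
def estimated_accuracy(next_outcomes):
    """next_outcomes: one True/False per user who just gained proficiency,
    recording whether their next attempted problem was correct.
    Estimates P(next problem correct | proficiency just gained)."""
    return sum(next_outcomes) / len(next_outcomes)

# e.g. estimated_accuracy([True, True, False, True]) == 0.75
```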

NETFLIX PRIZE

The $1 million top prize went to the team whose verified submission on July 26, 2009 achieved the winning RMSE of 0.8567 on the test subset. This represents a 10.06% improvement over Cinematch's score on the test subset at the start of the contest.

PANDORA & THE MUSIC GENOME PROJECT®

• On January 6, 2000 a group of musicians and music-loving technologists came together with the idea of creating the most comprehensive analysis of music ever.

• Together we set out to capture the essence of music at the most fundamental level. We ended up assembling literally hundreds of musical attributes or "genes" into a very large Music Genome.

FACEBOOK NEWS FEED & MACHINE LEARNING

FACEBOOK NEWS FEED

The default wall setting is "Top News". EdgeRank is there to do the customising for you, based on how each item scores in the algorithm. The three main criteria for an item's score are:

1. Affinity: how often you and your friends interact on the platform.

2. Weight: each type of content is weighted differently, based on past interactions with that type of content.

3. Time: how old the published item is.
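A minimal sketch of that three-factor scoring. The multiplicative form matches how EdgeRank is usually described publicly, but the decay shape and every constant here are illustrative assumptions:

```python
def edge_score(affinity, weight, age_hours, decay_rate=0.1):
    """One interaction's contribution: affinity x content-type weight x
    time decay. The decay form and rate are assumptions for illustration."""
    time_decay = 1.0 / (1.0 + decay_rate * age_hours)
    return affinity * weight * time_decay

def item_score(edges):
    """An item's feed rank: the sum of the scores of its edges
    (likes, comments, shares, ...), each an (affinity, weight, age) triple."""
    return sum(edge_score(a, w, t) for a, w, t in edges)

# A fresh comment from a close friend outranks an old like from a stranger:
# item_score([(0.9, 2.0, 1)]) > item_score([(0.2, 1.0, 48)])
```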

AD PLACEMENT

MACHINE LEARNING IS EVERYWHERE

Mario learns to survive: http://www.youtube.com/watch?v=m0tJLTXNT0A

SECTION 2:SOME ALGORITHMS, BROKEN DOWN

• Recommender Systems

• Logistic Regression

• K nearest neighbours

• K-means clustering

• Naïve Bayes Classifiers

RECOMMENDER SYSTEMS [CONTENT-BASED EXAMPLE]

CONTENT-BASED VS COLLABORATIVE
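Since the slide's image is lost, here is a sketch of the content-based idea in the style of the ml-class notes it came from: each movie gets a feature vector x, each user a learned preference vector theta, and the predicted rating is their dot product. All names and numbers below are invented for illustration.

```python
# Content-based prediction: rating ~ theta . x.
# Movies, features, and numbers are invented for illustration.
movie_features = {
    "Love at Last":     [0.9, 0.0],  # [romance, action]
    "Swords vs Karate": [0.1, 0.9],
}

def predicted_rating(theta, movie):
    """theta: the user's learned preference weight per feature."""
    return sum(t * x for t, x in zip(theta, movie_features[movie]))

# A romance fan with theta = [5, 0]:
# predicted_rating([5, 0], "Love at Last")     -> 4.5 (out of 5)
# predicted_rating([5, 0], "Swords vs Karate") -> 0.5
```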

LOGISTIC REGRESSION

• At the most basic level, for one input variable, linear regression is simply “fitting a line to some data”.

• Let’s look at the sample case of the Khan Academy:

• vector x = the values of the input features (e.g. % correct).

• vector w = how much each feature makes it more likely that the user is proficient.

• We can write this compactly as a dot product: z = w · x

LOGISTIC REGRESSION ALGORITHM

Already, you can see that the higher z is, the more likely the user is to be proficient. To obtain our probability estimate, all we have to do is “shrink” z into the interval (0, 1). We can do this by plugging z into a sigmoid function: P(proficient) = 1 / (1 + e^(-z))
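A minimal sketch of that prediction step (the feature values and weights are illustrative, not the Khan Academy's actual ones):

```python
import math

def sigmoid(z):
    """Squash any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def proficiency_probability(x, w):
    """x: input feature values (e.g. % correct); w: learned weights.
    Compute z = w . x, then squash it with the sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)

# e.g. proficiency_probability([0.9, 1.0], [2.0, 1.5]) ~= 0.96
```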

LOGISTIC REGRESSION RESULTS

From http://david-hu.com/2011/11/02/how-khan-academy-is-using-machine-learning-to-assess-student-mastery.html

K-NEAREST NEIGHBOUR

Tarring you with the same brush as your k nearest peers.
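A minimal sketch of that majority vote (the data layout is a hypothetical choice; math.dist needs Python 3.8+):

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k=3):
    """Label `query` by majority vote among its k nearest labelled points."""
    nearest = sorted(range(len(points)),
                     key=lambda i: math.dist(query, points[i]))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# e.g. knn_classify((0, 0), [(0, 1), (1, 0), (5, 5)], ["a", "a", "b"]) -> "a"
```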

K-MEANS CLUSTERING

A personal favourite

K-MEANS ALGORITHM SUBSECTION:

• Introduction

• K-means Algorithm

• Example

• K-means Demo

• Relevant Issues

• Conclusion

K-MEANS: INTRODUCTION

• Partitioning clustering approach:

• a typical cluster-analysis approach that partitions the data set iteratively

• constructs a partition of the data set into several non-empty clusters (usually with the number of clusters given in advance)

• in principle, the partition is found by minimising the sum of squared distances within each cluster.

• Given a K, find a partition of K clusters that optimises the chosen partitioning criterion.

• K-means algorithm: each cluster is represented by its centroid, and the algorithm converges to stable cluster centres.

The criterion minimised is the total within-cluster sum of squared distances:

E = \sum_{i=1}^{K} \sum_{x \in C_i} \| x - m_i \|^2

where m_i is the centroid of cluster C_i.

K-MEANS ALGORITHM

• Given the cluster number K, the algorithm is carried out in three steps after initialisation (set K seed points):

1. Assign each object to the cluster with the nearest seed point.

2. Compute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e. mean point, of the cluster).

3. Go back to Step 1; stop when no assignments change.
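Those steps translate almost line-for-line into code. A minimal sketch for points stored as coordinate tuples, not production-quality clustering:

```python
import random

def kmeans(points, k, max_iter=100):
    """points: list of equal-length coordinate tuples.
    Returns (centres, clusters)."""
    centres = random.sample(points, k)  # initialisation: K seed points
    for _ in range(max_iter):
        # Step 1: assign each point to the cluster with the nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centres[j])))
            clusters[i].append(p)
        # Step 2: recompute each centre as the centroid (mean) of its cluster.
        new_centres = [tuple(sum(dim) / len(cl) for dim in zip(*cl))
                       if cl else centres[i]
                       for i, cl in enumerate(clusters)]
        # Step 3: repeat until no centre moves.
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters
```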

K-MEANS DEMO

1. User sets the number of clusters they’d like (e.g. K = 5).

2. Randomly guess K cluster centre locations.

3. Each data point finds out which centre it’s closest to (thus each centre “owns” a set of data points).

4. Each centre finds the centroid of the points it owns…

5. …and jumps there.

6. …Repeat until terminated!

Credit to Ke Chen for the example graphics used on these slides.

RELEVANT ISSUES

• Efficient in computation

• O(tKn), where n is number of objects, K is number of clusters, and t is number of iterations. Normally, K, t << n.

• Local optimum

• sensitive to initial seed points

• may converge to a local optimum that is an unwanted solution

• Other problems

• Need to specify K, the number of clusters, in advance

• Unable to handle noisy data and outliers (K-Medoids algorithm)

• Not suitable for discovering clusters with non-convex shapes

• Applicable only when the mean is defined; what about categorical data? (K-Modes algorithm)

RELEVANT ISSUES

• Cluster Validity

• With different initial conditions, the K-means algorithm may result in different partitions for a given data set.

• Which partition is the “best” one for the given data set?

• In theory there is no answer to this question, as no ground truth is available in unsupervised learning

• Nevertheless, there are several cluster validity criteria to assess the quality of clustering analysis from different perspectives

• A common cluster validity criterion is the ratio of the total between-cluster to the total within-cluster distances

• Between-cluster distance (BCD): the distance between means of two clusters

• Within-cluster distance (WCD): the sum of all distances between the data points and the mean within a specific cluster

• A large ratio of BCD:WCD suggests good compactness inside clusters and good separability among different clusters!
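A sketch of that criterion for the output of a K-means run (same tuple-based layout as the earlier sketch; math.dist needs Python 3.8+):

```python
import math

def validity_ratio(centres, clusters):
    """Total between-cluster distance (pairwise distances between centres)
    over total within-cluster distance (points to their own centre).
    A larger ratio suggests compact, well-separated clusters."""
    bcd = sum(math.dist(centres[i], centres[j])
              for i in range(len(centres))
              for j in range(i + 1, len(centres)))
    wcd = sum(math.dist(p, centres[i])
              for i, cluster in enumerate(clusters)
              for p in cluster)
    return bcd / wcd if wcd else float("inf")
```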

CONCLUSION

• The K-means algorithm is a simple yet popular method for clustering analysis

• Its performance is determined by the initialisation and by the choice of distance measure

• There are several variants of K-means to overcome its weaknesses

• K-Medoids: resistance to noise and/or outliers

• K-Modes: extension to categorical data clustering analysis

END OF K-MEANS SUBSECTION

• Nearly there now…

ALGORITHM: NAÏVE BAYES

• What is a classifier?

NAÏVE BAYES ALGORITHM

• Want: P(spam | words).

• Use Bayes’ Rule:

P(spam | words) = P(words | spam) P(spam) / P(words)

• In English: the chance a message is spam, given its words, is the chance of those words appearing in spam, times the overall spam rate, divided by the chance of those words appearing at all, where

P(words) = P(words | spam) P(spam) + P(words | good) P(good)

• Assume independence: the probability of each word is independent of the others, so

P(words | spam) = P(word1 | spam) P(word2 | spam) … P(wordn | spam)
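A minimal sketch of these equations as a spam filter. The Laplace smoothing is a standard addition (not on the slide) so that unseen words don't zero out the product:

```python
from collections import Counter

def train(messages):
    """messages: list of (words, label) pairs, label "spam" or "good"."""
    word_counts = {"spam": Counter(), "good": Counter()}
    label_counts = Counter()
    for words, label in messages:
        label_counts[label] += 1
        word_counts[label].update(words)
    return word_counts, label_counts

def p_spam_given_words(words, word_counts, label_counts):
    """Bayes' rule with the naive independence assumption."""
    total_msgs = sum(label_counts.values())
    score = {}
    for c in ("spam", "good"):
        total_words = sum(word_counts[c].values())
        vocab = len(word_counts[c])
        p = label_counts[c] / total_msgs  # prior P(class)
        for w in words:
            # P(word | class), Laplace-smoothed.
            p *= (word_counts[c][w] + 1) / (total_words + vocab + 1)
        score[c] = p  # P(words | class) * P(class)
    # Divide by P(words) = spam term + good term.
    return score["spam"] / (score["spam"] + score["good"])
```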

SECTION 3: TAKE FREE TOP-CLASS ONLINE MATH COURSES

• ml-class.org

• Udacity.com

• http://mitx.mit.edu/

FREE STANFORD CLASSES, SPRING 2012

SOME OFFER A STATEMENT OF ACCOMPLISHMENT

UDACITY.COM

ITUNES U

For philosophy lectures, I recommend Dreyfus or Searle. -Mark

REFERENCES

• “One Learning Hypothesis” image from http://www.ml-class.org

• Khan Academy discussion from http://david-hu.com/2011/11/02/how-khan-academy-is-using-machine-learning-to-assess-student-mastery.html

• K-Means images from http://www.cs.manchester.ac.uk/ugt/COMP24111/materials/slides/K-means.ppt

• Word equation for Naïve Bayes: http://www.wikipedia.org

• K nearest neighbours image from http://mlpy.sourceforge.net/docs/3.0/_images/knn.png

• Recommender Systems image from http://holehouse.org/mlclass/16_Recommender_Systems.html

QUESTIONS?

2012-02-22 UCC Boole Prize M@rkMoriarty.com
