Special Topics in Learning, Reinaldo Bianchi, Centro Universitário da FEI, 2012
TRANSCRIPT
- Slide 1
- Special Topics in Learning. Reinaldo Bianchi, Centro Universitário da FEI, 2012
- Slide 2
- Lecture 4, Part B
- Slide 3
- The K-means Algorithm
- Slide 4
- K-Means: a very well-known algorithm for clustering patterns. Used when the number of clusters can be fixed in advance: choose the desired number of clusters, then choose cluster centres and members so as to minimise the error. This cannot be done by exhaustive search: there are too many parameters.
- Slide 5
- K-Means algorithm: fix the cluster centres; assign each point to the nearest cluster; recompute the cluster centres as the mean of the points each one represents; repeat until the centres stop moving.
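The loop described on this slide can be sketched in Python with NumPy (a minimal illustration; the function name, tolerance, and the lack of empty-cluster handling are my own assumptions, not part of the slides):

```python
import numpy as np

def kmeans(X, init, max_iter=100):
    """Plain K-means: assign each point to its nearest centre, then move
    each centre to the mean of the points it owns, until centres stop moving.
    Note: empty clusters are not handled in this sketch."""
    centers = np.asarray(init, dtype=float)
    K = len(centers)
    for _ in range(max_iter):
        # Distance from every point to every centre (n x K matrix).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Each new centre is the mean of the points assigned to it.
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centers, centers):
            break  # centres stopped moving
        centers = new_centers
    return centers, labels

# Example run on the four 2-D points used later in the deck (medicines A-D),
# seeded at the first two points (an illustrative choice).
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
centers, labels = kmeans(X, init=[[1, 1], [2, 1]])
```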
- Slide 6
- K-Means can be used with any attribute for which a distance can be computed.
- Slide 7
- Clustering: the partitioning approach. A typical cluster-analysis approach that partitions the data set iteratively: it constructs a partition of the data set into several non-empty clusters (usually the number of clusters is given in advance). In principle, the partition is obtained by minimising the sum of squared distances within each cluster.
- Slide 8
- Clustering: given K, find a partition into K clusters that optimises the chosen partitioning criterion. Global optimum: exhaustively enumerate all partitions. Heuristic method: the K-means algorithm (MacQueen, 1967), in which each cluster is represented by its centre and the algorithm converges to stable cluster centres.
- Slide 9
- Algorithm. Given the number of clusters K, after initialisation (set the seed points) the K-means algorithm is carried out in three steps: 1) assign each object to the cluster with the nearest seed point; 2) recompute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e. the mean point, of the cluster); 3) go back to step 1); stop when there are no new assignments.
- Slide 10
- Example: suppose we have 4 types of medicines, each with two attributes: pH and weight index. Our goal is to group these objects into K = 2 groups of medicine.
- Slide 11
- Example data:
  Medicine  Weight  pH-Index
  A         1       1
  B         2       1
  C         4       3
  D         5       4
- Slide 12
- Step 1: use the initial seed points for partitioning. Assign each object to the cluster with the nearest seed point, using the Euclidean distance.
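For the four medicines above, this first assignment step can be worked out directly. Taking the seeds at A(1, 1) and B(2, 1) is an assumption on my part (the slide does not state the seeds), though it is a common choice for this example:

```python
import math

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
seeds = [(1, 1), (2, 1)]  # assumed initial seeds: medicines A and B

def euclidean(p, q):
    # sqrt((p1 - q1)^2 + (p2 - q2)^2)
    return math.dist(p, q)

# Each object joins the cluster of its nearest seed.
assignment = {name: min(range(len(seeds)), key=lambda k: euclidean(p, seeds[k]))
              for name, p in points.items()}
print(assignment)  # → {'A': 0, 'B': 1, 'C': 1, 'D': 1}
```

So after the first pass, A sits alone in one cluster while B, C and D share the other, since C and D are closer to seed B than to seed A.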
- Slide 13
- Step 2: compute the new centroids of the current partition. Knowing the members of each cluster, we compute the new centroid of each group from these new memberships.
- Slide 14
- Step 2: renew the memberships based on the new centroids. Compute the distance of all objects to the new centroids and reassign each object accordingly.
- Slide 15
- Step 3: repeat the first two steps until convergence. Knowing the members of each cluster, we compute the new centroid of each group from these new memberships.
- Slide 16
- Repeat the first two steps until convergence: compute the distance of all objects to the new centroids; stop when there are no new assignments.
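The full iteration on the medicine data can be traced end to end. This is a sketch of the repeat-until-no-new-assignment loop; again, seeding at A and B is an assumption:

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)  # A, B, C, D
centroids = X[:2].copy()  # assumed seeds: medicines A and B

labels = None
iterations = 0
while True:
    iterations += 1
    # Distances of all objects to the current centroids.
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    new_labels = d.argmin(axis=1)
    if labels is not None and np.array_equal(new_labels, labels):
        break  # no assignment changed: converged
    labels = new_labels
    # Recompute each centroid as the mean of its members.
    centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])

print(labels)     # A and B end up in one cluster, C and D in the other
print(centroids)  # final centres (1.5, 1) and (4.5, 3.5)
```

The run converges after the memberships settle on {A, B} and {C, D}: one more pass over the data produces no new assignment, which is the stopping condition on this slide.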
- Slide 17
- K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5).
- Slide 18
- K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations.
- Slide 19
- K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points).
- Slide 20
- K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points). 4. Each centre finds the centroid of the points it owns.
- Slide 21
- K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points). 4. Each centre finds the centroid of the points it owns. 5. ...and jumps there.
- Slide 22
- K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points). 4. Each centre finds the centroid of the points it owns. 5. ...and jumps there. 6. Repeat until terminated!
- Slide 23
- K-means example in Matlab
- Slide 24
- K-means example on the iPad
- Slide 25
- Relevant Issues. Computationally efficient: O(tKn), where n is the number of objects, K is the number of clusters, and t is the number of iterations. Normally, K, t