clustering heterogeneous data with mutual...

26
Clustering Heterogeneous Data with Mutual Semi-Supervision Knowledge Discovery & Web Mining Lab, Department of Computer Engineering and Computer Science, University of Louisville 19 th International Symposium on String Processing and Information Retrieval, SPIRE 2012, Cartagena de Indias, Colombia October 24, 2012 Artur Abdullin and Olfa Nasraoui [email protected], [email protected]

Upload: others

Post on 16-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Clustering Heterogeneous Data with Mutual Semi-Supervision

Knowledge Discovery & Web Mining Lab, Department of Computer Engineering and Computer Science,

University of Louisville

19th International Symposium on String Processing and Information Retrieval, SPIRE 2012, Cartagena de Indias, Colombia

October 24, 2012

Artur Abdullin and Olfa Nasraoui [email protected], [email protected]

Page 2: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Clustering Heterogeneous

Data Sets

Conversion Splitting

Algorithms for specific types of data

Algorithms for mixed data

Other approaches

Proposed Semi-supervised Learning Framework

Combining distance matrix

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 3: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Baselines for this paper: Conversion and Splitting (for the case of categorical & numerical domains)

Convert and run:

all numerical attributes to categorical attributes and run k-modes.

all categorical attributes to numerical attributes and run k-means.

Split and run:

k-means on the numerical subsets of attributes

k-modes on the categorical subsets of attributes

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 4: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Specifically Designed Clustering Algorithms Numerical data

k-means, DBSCAN, etc.

Categorical data k-modes, ROCK, CACTUS, etc.

Transactional or text data spherical k-means, or any specialized

algorithm.

Graph data KMETIS, spectral clustering, etc.

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 5: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Algorithms for Mixed Data Attributes

k-prototypes [Huang, 1998]: integrates the k-means and the k-modes. Disadvantage: weight parameter

INCONCO [Plant and Bohm, 2011] : extends the Cholesky decomposition to model dependencies in

heterogeneous data relying on the principle of Minimum Description Length, integrates

numerical and categorical information in clustering. Disadvantage: assumes a known probability distribution model for each

domain and assumes an identical number of clusters in each domain

k-mixed-means [Ahmad and Dey, 2007]: based on k-means paradigm that works with mixed numerical and categorical features.

CAVE [Hsu and Chen, 2007]: based on variance and entropy.

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 6: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Other Methods

Multiview Clustering [Zhou and Burges, 2007; Cortes et al., 2009] train two independent hypotheses with bootstrapping by

providing each other with labels for the unlabeled data.

Ensemble-based Clustering [He et al., 2005; Al-Shaqsi and Wang, 2010] Dedicating a different clustering process to each domain and

aggregating the final results within an ensemble.

Collaborative Clustering [Pedrycz and Rai, 2008; Hammouda and Kamel, 2006] Consider each site as dedicated to only one pure domain of the

data.

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 7: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Where is our SSL Framework Located? Independent vs. Integrated

approaches tend to tip scale in either direction

SSL framework can embrace continuum between the two!

Independent Fully

integrated

Combining distance matrix

CAVE

INCONCO

Specialized algorithms

Ensemble-based

Multiview

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

SSL Framework

Page 8: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Semi-supervised Clustering

Supervision (some labels) in the form of: Initial seeds [Basu et al., 2002]

Constraints [Wagstaff et al., 2001]

Feedback [Cohn et al., 2003]

Examples of Semi-supervised Learning Algorithms: Co-training [Blum and Mitchell, 1998]

Transductive support vector machines [Joachims, 1999]

Entropy minimization [Guerrero-Curieses and Cid-Sueiro, 2000]

Semi-supervised EM [Nigam et al., 2000]

Graph-based approaches [Blum and Chawla, 2001]

Clustering-based approaches [Zeng et al., 2003]

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 9: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Proposed: SSL Framework for Mixed Domain Clustering

We propose a new approach to cluster diverse representations or types of data Also, naturally, can cluster different sources of data

Our approach is rooted in Semi-Supervised Learning (SSL) However: no external labels are used! Instead, knowledge that is discovered from each domain

serves as a guide to supervise the other domain Note: some data may be missing from one domain or source In this paper, we explore semi-supervision/guidance based

on constraints exchange

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 10: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Hidden Markov Random Field

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Observed data Hidden MRF

1x

2x

3x

4x

5x

11 l

12 l13 l

24 l

25 l

Must Link

Cannot Link

Page 11: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

HMRF-Kmeans [Basu et al., 2004]

)9(log][),(

][),(),(

),(

),(

maxZllxxw

llxxwxDJ

Cxx

jijiDDij

Mxx

jijiDij

Xx

liobj

ji

jii

i

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

objJK

hh 1}{ The task is to minimize over , k, L, and D

1x 4x34w

321

7x8x

78w

Page 12: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

EM framework Initialization step:

Use constraints to guide initial cluster formations

Neighborhood inference

Cluster selection

E step: given the current cluster representatives, every data points is re-

assigned to the cluster which minimizes its contribution to the objective function

Use the iterative conditional model (ICM)

M step: Minimizes objective function over cluster representatives

Minimizes objective function over the parameter distortion measure

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 13: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

N Height Weight Eyes Color Education Shopping Cart 1 5 ft 9 in 170.4 lb Blue Master MP3 player, iPad 2 6 ft 5 in 181.7 lb Green High School Watch, car, T-shirt 3 5 ft 7 in 151.4 lb Black Bachelor Milk, Beer, Apples

Mixed-Type Data Set

Transactional Data

Algorithm 1

Result 2 Result 3 Result 1

Categorical Data Numerical Data

Constraint-based Semi-supervised Framework

Algorithm 2 Algorithm 3

constraints constraints

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 14: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

HMRF-KMeans (domain 1)

No constraints, C = Ø, M = Ø

Generate constraints from

the n closest points to cluster centroids

HMRF-KMeans (domain 2)

HMRF-KMeans (domain 1)

Result 2 Result 1

Stage 2

Stage 3

Stage 4

Stage 5

Generate constraints from the n closest points

to cluster centroids

Generate constraints from the n closest points

to cluster centroids

Data set

Domain 1 Domain 2

Stage 1

Page 15: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Complexity of Proposed SSL Approach Determined mainly by HMRF-KMeans complexity:

Initialization:

transitive closure on the must-link constraints - O(N3) time and O(N2) space.

add cannot-link constraints between every pair of points in that pair of connected components - O(k2)

cluster selection which is O(k2).

The EM-based minimization - O(kN).

Overhead complexity resulting from mutual supervision via inter-domain coordination and seed exchange: O(k2N).

Total computational complexity:

With initialization - O(N3)

Without initialization (random) - O(N)

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Initialization

EM-based minimization

Mutual supervision

HMRF-KMeans

)( 3NO )(NOOR

Page 16: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Clustering Evaluation

Internal validity metrics (cluster quality based only on data distribution – compact & separated)

Silhouette index [Rousseeuw, 1987]

External validity metrics (use external labels in validation step only – pure clusters)

Normalized Mutual Information [Strehl et al., 2000]

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 17: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Real Data Sets (UCI ML Repository)

Data set No. of Records

No. of Numerical Attributes

No. of Categorical Attributes

Missing Values

No. of Classes

Adult 45179 6 8 Yes 2

Heart Disease 303 6 7 Yes 2

Credit Approval

690 6 9 Yes 2

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

• Number of clusters = 2 • Repeated each experiment 50 times

• (10 for larger adult data set)

Page 18: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Experimental Results: Adult Data Set

Algorithm Semi-Supervised Conversion Splitting

Data type Numerical Categorical Numerical Categorical Numerical Categorical

Silhouette Index

0.92 0.28 0.07 0.22 0.21 0.25

NMI 0.06 0.17 0.08 0.09 0.10 0.09

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Significant improvements with Semi-supervised approach: • In Silhouette index:

• For both numerical and categorical domains • In NMI:

• For categorical domain

Page 19: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Experimental Results: Heart Disease Data Set

Algorithm Semi-Supervised Conversion Splitting

Data type Numerical Categorical Numerical Categorical Numerical Categorical

Silhouette Index

0.36 0.29 0.26 0.18 0.36 0.31

NMI 0.24 0.27 0.28 0.26 0.19 0.29

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

• Splitting yielded better clustering results for categorical domain

• Conversion yielded better clustering results for numerical domain in NMI index.

• However, Semi-supervised approach outperforms conversion and splitting in numerical domain in Silhouette index

Page 20: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Experimental Results: credit card data set

Algorithm Semi-Supervised Conversion Splitting

Data type Numerical Categorical Numerical Categorical Numerical Categorical

Silhouette Index

0.46 0.23 0.35 0.17 0.63 0.23

NMI 0.20 0.28 0.13 0.23 0.08 0.26

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

• Significant improvements with Semi-supervised approach: • In categorical domain:

• For all indices • In numerical:

• NMI

Page 21: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Conclusions

In general, better clustering results in the categorical domain.

Exchanged (supervising) constraints seem to provide additional helpful knowledge to guide the clustering in the categorical domain

Possible explanation: Avoiding local minima and obtaining better clustering in the categorical domain.

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 22: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Ongoing & Future Work

Ongoing work – since this paper: Continued work by exploiting seed-based SSL

Published results [Abdullin & Nasraoui, KDIR 2012]

Exploring Inter-Domain Compatibility with bigger and more challenging Multimedia Web data sets Visual + Text [Abdullin & Nasraoui, to be presented in LA-WEB 2012]

Future work: Further improve seed-based & constraint-based HMRF

Optimize parameters of the SSL framework

Improve scalability

Real applications (Web domain is fertile with heterogeneous data)

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 23: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

References

[He et al., 2005] He, Z. and Xu, X. and Deng, S., "Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach", eprint arXiv:cs/0509011 (2005).

[Al-Shaqsi and Wang, 2010] Al-Shaqsi, J. and Wenjia Wang, "A clustering ensemble method for clustering mixed data", in Neural Networks (IJCNN), The 2010 International Joint Conference on (, 2010), pp. 1 -8.

[Huang, 1998] Huang, Zhexue, "Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values", Data Mining and Knowledge Discovery 2 (1998), pp. 283-304.

[Guha et al., 2000] Sudipto Guha and Rajeev Rastogi and Kyuseok Shim, "Rock: A robust clustering algorithm for categorical attributes", Information Systems 25, 5 (2000), pp. 345 - 366.

[Ganti et al., 1999] Ganti, Venkatesh and Gehrke, Johannes and Ramakrishnan, Raghu, "CACTUS - clustering categorical data using summaries", in Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (New York, NY, USA: ACM, 1999), pp. 73--83.

[MacQueen, 1967] MacQueen, J. B., "Some Methods for Classification and Analysis of MultiVariate Observations", in Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability vol. 1, (University of California Press, 1967), pp. 281-297.

[Ester et al., 1996] Martin Ester and Hans-peter Kriegel and Jorg Sander and Xiaowei Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise", in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96) (AAAI Press, 1996), pp. 226--231.

[Huang, 1998] Huang, Zhexue, "Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values", Data Mining and Knowledge Discovery 2 (1998), pp. 283-304.

[Hsu and Chen, 2007] Chung-Chian Hsu and Yu-Cheng Chen, "Mining of mixed data with application to catalog marketing", Expert Systems with Applications 32, 1 (2007), pp. 12 - 23.

[Basu et al., 2004] S. Basu, M. Bilenko, and R.J. Mooney, “A Probabilistic Framework for Semi-Supervised Clustering,” Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), Aug. 2004.

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12

Page 24: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

[Ahmad and Dey, 2007] Amir Ahmad and Lipika Dey, "A k-mean clustering algorithm for mixed numeric and categorical data", Data and Knowledge Engineering 63, 2 (2007), pp. 503 - 527.

[Plant and Böhm , 2011] Plant, Claudia and Böhm, Christian, "INCONCO: interpretable clustering of numerical and categorical objects", in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (New York, NY, USA: ACM, 2011), pp. 1127--1135.

[Sheath and Sokal, 1973] Sheath, P. H. A. and Sokal, Robert C., Numerical taxonomy. The principles and practice of numerical classification (San Francisco : W.H. Freeman, 1973).

[Kaufman and Rousseeuw , 1990] Kaufman, L. and Rousseeuw, P.J., Finding Groups in Data An Introduction to Cluster Analysis (New York: John Wiley & Sons, Inc, 1990).

[Muller et al., 2001] Muller, K.-R. and Mika, S. and Ratsch, G. and Tsuda, K. and Scholkopf, B., "An introduction to kernel-based learning algorithms", Neural Networks, IEEE Transactions on 12, 2 (2001), pp. 181 -201.

[Girolami, 2002] Girolami, M., "Mercer kernel-based clustering in feature space", Neural Networks, IEEE Transactions on 13, 3 (2002), pp. 780 -784.

[Inokuchi and Miyamoto, 2004] Inokuchi, R. and Miyamoto, S., "LVQ clustering and SOM using a kernel function", in Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on vol. 3, (, 2004), pp. 1497 - 1500 vol.3.

[MacDonald and Fyfe, 2000] MacDonald, D. and Fyfe, C., "The kernel self-organising map", in Knowledge-Based Intelligent Engineering Systems and Allied Technologies, 2000. Proceedings. Fourth International Conference on vol. 1, (, 2000), pp. 317 -320 vol.1.

[Hur et al., 2001] Asa Ben-Hur and David Horn and Hava T. Siegelmann and Vladimir Vapnik, "A Support Vector Method for Clustering", in Advances in Neural Information Processing Systems 13 (MIT Press, 2001), pp. 367--373.

[Hur et at., 2002] Ben-Hur, Asa and Horn, David and Siegelmann, Hava T. and Vapnik, Vladimir, "Support vector clustering", J. Mach. Learn. Res. 2 (2002), pp. 125--137.

Page 25: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

[Frank and Asuncion, 2010] A. Frank and A. Asuncion, "UCI Machine Learning Repository", University of California, Irvine, School of Information and Computer Sciences (2010).

[Zhou and Burges, 2007] D. Zhou and C. Burges. Spectral clustering and transductive learning with multiple views. In ICML, pages 1159-1166, 2007.

[Cortes et al., 2009] C. Cortes, M. Mohri, and A. Rostamizadeh. Learning non-linear combinations of kernels. In NIPS, pages 396-404, 2009.

[A. Kumar and H. D. III., 2011] A co-training approach for multi-view spectral clustering. In ICML, pages 393-400, 2011.

[Cheng and Zhao, 2009] Y. Cheng and R. Zhao. Multiview spectral clustering via ensemble. In GrC, pages 101-106, 2009.

[de Sa, 2005] V. de Sa. Spectral clustering with two views. In ICML workshop on learning with multiple views, pages 20-27, 2005.

[Tang et al., 2009] W. Tang, Z. Lu, and I. S. Dhillon. Clustering with multiple graphs. In ICDM, pages 1016-1021, 2009.

[Pedrycz and Rai, 2008] Witold Pedrycz and Partab Rai, "Collaborative clustering with the use of Fuzzy C-Means and its quantification", Fuzzy Sets and Systems (2008), 2399 - 2427.

[Hammuoda and Kamel, 2006]Hammuoda, Khaled and Kamel, Mohamed, "Collaborative document clustering" (2006), 453--463.

[Cohl et al., 2003] Cohn, D., Carauana, R., and McCallum, A. (2003). Semi-supervised clustering with user feedback. Technical report.

[Abdullin & Nasraoui, KDIR 2012] A. Abdullin and O. Nasraoui, “A Semi-supervised Learning Framework to Cluster Mixed Data Types,” in Proceedings of KDIR 2012, October 2012.

[Abdullin & Nasraoui, LA-WEB 2012] A. Abdullin and O. Nasraoui, “Clustering Heterogeneous Data Sets,” in Proceedings of LA-WEB 2012 (Invited keynote), October 2012.

Page 26: Clustering Heterogeneous Data with Mutual Semi-Supervisionwebmining.spd.louisville.edu/~artur/static/docs/... · Knowledge Discovery & Web Mining Lab, ... 2 6 ft 5 in 181.7 lb Green

Thank you!

Questions ?

Artur Abdullin & Olfa Nasraoui - Clustering Heterogeneous Data with Mutual Semi-Supervision - SPIRE'12