On the Parallel Complexity of Minimum Sum of Diameters Clustering





Nopadon Juneam 1,2 and Sanpawat Kantabutra 2

1 Department of Computer Science, Chiang Mai University, Chiang Mai, Thailand

2 The Theory of Computation Group, Department of Computer Engineering, Chiang Mai University, Chiang Mai, Thailand

Nopadon Juneam (Chiang Mai University) 114th RGJ Seminar Series April 29, 2016 1 / 19


Outline

1 Introduction
    Minimum Sum of Diameters Clustering Problem (MSDCP)

2 Basic Parallel Complexity Theory
    Basic Concepts

3 Our Contributions
    Parallel Complexity
    More Practical Parallel Algorithm


Background

Conceptually, a cluster is a collection of similar entities, where entities within the same cluster are more similar to each other than to those in different clusters.

A basic problem of clustering is to partition a given finite set of entities into clusters.

Applications of clustering are known in a variety of areas such as data mining, machine learning, pattern recognition, bioinformatics, and social networking.


Basic Problem of Clustering

Given: a set of points in the plane (e.g., climate data such as GPS locations).


Basic Problem of Clustering (cont.)

The set of points partitioned into three clusters.


Minimum Sum of Diameters Clustering Problem (MSDCP)

Objective: Partition a given set of entities into clusters such that the sum of the clusters’ diameters is minimized.
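The objective can be made concrete with a small sketch. The slides do not define "diameter", so this assumes the usual definition (the largest pairwise dissimilarity inside a cluster) and uses Euclidean distance between points in the plane; the function names and sample points are illustrative only.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def diameter(cluster):
    """Largest pairwise distance in a cluster (0.0 for an empty or single-point cluster)."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def sum_of_diameters(clusters):
    """Objective value of a partition: the sum of its clusters' diameters."""
    return sum(diameter(c) for c in clusters)


# Hypothetical sample data: two clusters of points in the plane.
cluster_a = [(0, 0), (3, 4)]               # diameter 5.0
cluster_b = [(10, 10), (10, 11), (10, 13)]  # diameter 3.0
print(sum_of_diameters([cluster_a, cluster_b]))  # 8.0
```

The MSDCP asks for the partition that makes this value as small as possible.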


Minimum Sum of Diameters Clustering Problem (MSDCP) (cont.)

“In this work, we studied the complexity of the MSDCP on a parallel computer.”


Key Concepts

Research in computational complexity theory centers on the question of how difficult it is to solve a computational problem in terms of some computational resources.

On a parallel computer, the important computational resources are the computation time and the number of processors.

Problems can gain some speed improvement when solved on a parallel computer. However, a significant speedup may not be possible for every problem.


Time Order

source: https://apelbaum.wordpress.com/2011/05/05/big-o/


Theory of P-Completeness [2]

The parallel complexity class NC contains all the problems that have a polylogarithmic-time parallel algorithm using a polynomial number of processors.

Such a parallel algorithm is considered a highly parallel solution, or an efficient parallel algorithm, for the problem.

In contrast, the P-complete problems are the hardest problems in P to parallelize: no highly parallel solution is known for any of them, and none exists unless NC = P.
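As an illustration of what "polylogarithmic time with polynomially many processors" means, the sketch below simulates a balanced-tree maximum. This example is not from the talk: it runs sequentially and only counts the rounds that a parallel machine with about n/2 processors would need, assuming each round's comparisons happen simultaneously.

```python
def parallel_max_rounds(values):
    """Simulate a balanced-tree maximum; return (maximum, number of parallel rounds)."""
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    rounds = 0
    while len(level) > 1:
        # One parallel round: each adjacent pair is compared by its own simulated processor.
        nxt = [max(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired element is carried over unchanged
        level = nxt
        rounds += 1
    return level[0], rounds


print(parallel_max_rounds(range(16)))  # (15, 4)
```

The number of rounds grows like log2(n), while a single processor needs n - 1 comparisons one after another.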


Research Question

“Is the MSDCP in NC, or is it P-complete? If it is in NC, then how good can its highly parallel solution be?

in the use of parallel algorithms in practice.”


Previous Studies

For the case of a partition into three or more clusters:

In 1978, Brucker showed that the MSDCP is NP-hard [1].

The problem is extremely difficult.

No polynomial-time exact algorithm exists unless P = NP.

For the case of a partition into two clusters:

In 1987, Hansen and Jaumard gave an algorithm that runs in O(n^3 log n) time [3].

The problem becomes easy.
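For intuition only, the two-cluster case can be solved on tiny inputs by exhaustive search. The sketch below is not the algorithm of Hansen and Jaumard [3]: it tries all 2^(n-1) - 1 splits and so takes exponential time, which is exactly what their O(n^3 log n) algorithm avoids. It assumes Euclidean distance and diameter defined as the largest pairwise distance; all names are illustrative.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def diameter(cluster):
    """Largest pairwise distance in a cluster (0.0 for a single point)."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def best_two_clustering(points):
    """Try every split into two non-empty clusters; return (cost, cluster_a, cluster_b)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    best = None
    # points[0] always stays in cluster a, so each split is examined exactly once.
    for mask in range(1, 2 ** (n - 1)):
        a, b = [points[0]], []
        for i in range(1, n):
            (b if mask >> (i - 1) & 1 else a).append(points[i])
        cost = diameter(a) + diameter(b)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best


# Hypothetical sample data: two well-separated pairs of points.
cost, a, b = best_two_clustering([(0, 0), (0, 1), (5, 5), (5, 6)])
print(cost, a, b)  # 2.0 [(0, 0), (0, 1)] [(5, 5), (5, 6)]
```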


Our Contributions

For the case of a partition into two clusters, we show that the MSDCP is in the class NC. That is, the problem has highly parallel solutions.

1 At its fastest, the problem can be solved in O(log n) parallel time using n^6 processors on the Common CRCW PRAM model of parallel computer.

2 A more practical NC algorithm can be implemented in O(log^3 n) parallel time using n^3.376 processors on the EREW PRAM model of parallel computer.

These results will be published in [4].
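A rough way to see why the second bound is called more practical is to compare the processor counts and the time-processor products of the two bounds. The sketch below takes every hidden constant in the O-notation as 1 and uses base-2 logarithms, both of which are assumptions, so the numbers are indicative only.

```python
from math import log2


def cost_fast(n):
    """Time x processors for the O(log n)-time, n^6-processor bound (constants taken as 1)."""
    return log2(n) * n ** 6


def cost_practical(n):
    """Time x processors for the O(log^3 n)-time, n^3.376-processor bound (constants taken as 1)."""
    return log2(n) ** 3 * n ** 3.376


for n in (2 ** 10, 2 ** 20):
    print(n, f"{cost_fast(n):.3e}", f"{cost_practical(n):.3e}")
```

Under these assumptions the second algorithm uses far fewer processors and less total work, at the price of a larger (but still polylogarithmic) running time.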


References

[1] P. Brucker. On the complexity of clustering problems. In R. Henn, B. Korte, and W. Oettli, editors, Optimization and Operations Research, volume 157 of Lecture Notes in Economics and Mathematical Systems, pages 45–54. Springer Berlin Heidelberg, 1978.

[2] R. Greenlaw, H. J. Hoover, and W. L. Ruzzo. Limits to Parallel Computation: P-completeness Theory. Oxford University Press, Inc., New York, NY, USA, 1995.

[3] P. Hansen and B. Jaumard. Minimum sum of diameters clustering. Journal of Classification, 4(2):215–226, 1987.

[4] N. Juneam and S. Kantabutra. On the parallel complexity of minimum sum of diameters clustering. Journal of Internet Technology, 2017 (in press).


Acknowledgment

Financial support from the Thailand Research Fund through the Royal Golden Jubilee Ph.D. Program (Grant No. PHD/0267/2552) is acknowledged.


Thanks [email protected]
