ICDM 2004 Business Meeting 11/4/2004 1
Data Mining on ICDM Submission Data
Shusaku Tsumoto
Ning Zhong and Xindong Wu
Data Mining on ICDM Submission Data
38 countries, 445 Submissions
– Regular Papers: 39 (9%)
– Short Papers: 66 (14.8%)
High Acceptance Ratio (Regular)
– Germany: 4/15 (26.7%)
– Finland: 2/9 (22.2%)
– USA: 20/109 (18.3%)
Country
Country Regular Short Total Ratio
USA 20 28 109 44.0%
China 3 4 55 12.7%
UK 1 6 39 17.9%
Japan 0 5 28 17.9%
Canada 3 3 25 24.0%
Taiwan 0 1 18 5.6%
Australia 2 1 17 17.6%
Germany 4 5 15 60.0%
France 0 2 14 14.3%
India 1 0 14 7.1%
Singapore 0 3 12 25.0%
Brazil 0 1 12 8.3%
Italy 2 1 10 30.0%
Finland 2 1 9 33.3%
Spain 0 1 7 14.3%
Hong Kong 1 1 6 33.3%
Top 15 39 63 390 26.2%
Total 39 66 445 23.8%
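The per-country Ratio values in the table above are consistent with (Regular + Short) / Total, and the "High Acceptance Ratio (Regular)" figures with Regular / Total. The following is a minimal sketch of that arithmetic, using a few rows copied from the table; the function and variable names are illustrative and not part of the original slides.

```python
# Counts copied from the country table: (regular, short, total submissions).
COUNTRY_COUNTS = {
    "USA": (20, 28, 109),
    "China": (3, 4, 55),
    "Germany": (4, 5, 15),
    "Finland": (2, 1, 9),
}


def acceptance_ratio(regular, short, total):
    """Percentage of submissions accepted as regular or short papers."""
    return 100.0 * (regular + short) / total


def regular_ratio(regular, total):
    """Percentage of submissions accepted as regular papers only."""
    return 100.0 * regular / total


for country, (regular, short, total) in COUNTRY_COUNTS.items():
    overall = acceptance_ratio(regular, short, total)
    regular_only = regular_ratio(regular, total)
    print(f"{country}: overall {overall:.1f}%, regular only {regular_only:.1f}%")
```

Small totals make these ratios volatile: a single extra acceptance moves Finland's overall ratio by more than 11 percentage points.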
Data Mining on ICDM Submission Data
Top 5 Areas of Submissions:
– Data mining applications
– Data mining and machine learning algorithms and methods
– Mining text and semi-structured data, and mining temporal, spatial and multimedia data
– Data pre-processing, data reduction, feature selection and feature transformation
– Soft computing and uncertainty management for data mining
High Acceptance Ratio Areas (Regular+Short)
– Quality assessment and interestingness metrics of data mining results: 5/10 (50.0%)
– Data pre-processing, data reduction, feature selection and feature transformation: 14/35 (40.0%)
– Complexity, efficiency, and scalability issues in data mining: 4/11 (36.4%)
Topics
Topic Regular Short Total Ratio
Data mining applications 4 10 84 16.7%
Data mining and machine learning algorithms and methods 9 20 81 35.8%
Mining text and semi-structured data, and mining temporal, spatial and multimedia data 3 8 44 25.0%
Data pre-processing, data reduction, feature selection and feature transformation 7 7 35 40.0%
Soft computing and uncertainty management for data mining 3 34 8.8%
Foundations of data mining 2 1 26 11.5%
Mining data streams 3 4 25 28.0%
Human-machine interaction and visual data mining 1 16 6.3%
Security, privacy and social impact of data mining 2 1 15 20.0%
Data and knowledge representation for data mining 1 1 12 16.7%
Pattern recognition and trend analysis 1 11 9.1%
Complexity, efficiency, and scalability issues in data mining 2 2 11 36.4%
Quality assessment and interestingness metrics of data mining results 2 3 10 50.0%
Statistics and probability in large-scale data mining 1 9 11.1%
Integration of data warehousing, OLAP and data mining 1 9 11.1%
Collaborative filtering/personalization 2 7 28.6%
Post-processing of data mining results 1 1 7 28.6%
Others 2 6 33.3%
High performance and parallel/distributed data mining 1 2 50.0%
Query languages and user interfaces for mining 1 0.0%
Total 39 66 445 23.8%
Correspondence Analysis (Country vs Final Decision)
Figure: two-dimensional correspondence analysis map (r1=0.378, r2=0.177).
Decision points: Reject, Regular, Short.
Labeled countries: Slovenia, Japan, Hong Kong, USA, Germany, Italy, India, Finland, UK, France, Canada, Australia.
Correspondence Analysis (Topics vs Final Decision)
Figure: two-dimensional correspondence analysis map (r1=0.280, r2=0.184).
Decision points: Reject, Short, Regular.
Labeled topics: Statistics and probability; Security, privacy; Applications; Post-processing; Preprocessing, Feature Selection; High-performance; Quality-assessment; Collaborative Filtering; Soft-computing; DM Methods.
Correspondence Analysis
Country vs Final Decision
– Regular: Germany, USA
– Short: ?
– Reject: Most of the countries are located near this region.
Topics vs Final Decision
– Regular: Quality Assessment, Preprocessing/Feature Selection
– Short: DM/ML Methods, Collaborative Filtering
– Reject: DM Applications
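Correspondence analysis works on a contingency table such as country by final decision: it compares row profiles and spreads the table's total inertia (Pearson chi-square divided by the grand total) over a few axes. The sketch below computes only the row profiles and the total inertia for a handful of countries, with Reject counts derived as Total minus Regular minus Short from the country table. It does not perform the singular value decomposition that produces the map coordinates, and it does not attempt to reproduce the r1 and r2 values on the slides. All names are illustrative.

```python
# (reject, regular, short) per country; reject = total - regular - short,
# derived from the country table. Only a few countries are included.
TABLE = {
    "USA": (61, 20, 28),
    "China": (48, 3, 4),
    "UK": (32, 1, 6),
    "Germany": (6, 4, 5),
}


def row_profiles(table):
    """Divide each row by its own total: the profiles a CA map compares."""
    return {name: tuple(v / sum(row) for v in row) for name, row in table.items()}


def total_inertia(rows):
    """Pearson chi-square of the contingency table divided by the grand total."""
    rows = [list(r) for r in rows]
    n = sum(sum(r) for r in rows)
    row_sums = [sum(r) for r in rows]
    col_sums = [sum(col) for col in zip(*rows)]
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2 / n


for name, profile in row_profiles(TABLE).items():
    print(name, [round(p, 3) for p in profile])
print("total inertia:", round(total_inertia(TABLE.values()), 4))
```

A country whose profile leans toward one decision (Germany toward Regular, for instance) is drawn near that decision's point on the map; a table with identical profiles in every row has zero inertia and nothing to display.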
Rule Mining on ICDM Submission Data
Datasets
– Sample Size: 445
– Attributes: 5
  • Paper No.: ordered by submission date
  • # of Authors
  • # of Characters in Title
  • Country
  • Category
– Analyzed by Clementine 7.1 (and SPSS 12.0J)
Rule Mining (C5.0) on ICDM Submission Data
C5.0
– [Topic=Mining semi-structured data,…] & [129 < Paper No. <= 369] => Reject (Confidence 0.87, Support 10)
– [Country=USA] & [Topic=Mining semi-structured data,…] & [Paper No. > 369] & [# of Authors <= 3] => Accept (Confidence 0.667, Support 3)
– [Topic=Preprocessing/Feature Selection] & [# of Authors > 4] => Accept (Confidence 1.0, Support 3)
– Important features: Topic, Paper No., # of Authors
Rule Mining (GRI) on ICDM Submission Data
Generalized Rule Induction
– [# of Authors <2] & [Paper No. <120.5] => Rejected (Confidence 96.0%, Support 24)
– [# of Chars in Title< 27] & [Paper No. > 212]=> Accepted (Confidence 100%, Support 5)
Paper No., # of Chars in Title, # of Authors: Important Features
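Rules of the form [conditions] => decision can be checked against a table of submissions by counting matches. The sketch below assumes that support is the number of records satisfying the conditions and confidence is the share of those records with the stated decision; Clementine's exact definitions may differ. The records are hypothetical and invented for illustration, not taken from the ICDM submission data, and the field names are likewise assumptions.

```python
# Hypothetical records for illustration only; NOT the ICDM submission data.
RECORDS = [
    {"paper_no": 12, "authors": 1, "title_chars": 60, "decision": "Reject"},
    {"paper_no": 45, "authors": 1, "title_chars": 48, "decision": "Reject"},
    {"paper_no": 101, "authors": 1, "title_chars": 35, "decision": "Accept"},
    {"paper_no": 150, "authors": 1, "title_chars": 52, "decision": "Reject"},
    {"paper_no": 230, "authors": 3, "title_chars": 25, "decision": "Accept"},
    {"paper_no": 400, "authors": 2, "title_chars": 20, "decision": "Accept"},
]


def evaluate_rule(records, antecedent, outcome):
    """Return (support, confidence) of the rule `antecedent => outcome`.

    support    = number of records that satisfy the antecedent
    confidence = share of those records whose decision equals `outcome`
    """
    matched = [r for r in records if antecedent(r)]
    if not matched:
        return 0, 0.0
    hits = sum(1 for r in matched if r["decision"] == outcome)
    return len(matched), hits / len(matched)


# Same shape as the GRI rule [# of Authors < 2] & [Paper No. < 120.5] => Rejected.
support, confidence = evaluate_rule(
    RECORDS,
    lambda r: r["authors"] < 2 and r["paper_no"] < 120.5,
    "Reject",
)
print(f"support={support}, confidence={confidence:.3f}")
```

With supports as small as 3 or 5, as reported on the slides, a confidence of 100% rests on very few papers and should be read as a hint rather than a dependable pattern.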
Multidimensional Scaling (2004)
Figure: two-dimensional MDS map of the variables Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No. and Country.
Summary (2004) of Mining on ICDM Submission Data
Do not submit a paper too fast!
– Reflection is needed not only on the contents but also on the titles
Mining Text/Web/Semi-structured Data is very popular.
The # of Application papers is growing now. (But many were rejected.)
Strong Topics
– Preprocessing/Feature-Selection
– Postprocessing
– Security and Privacy
Several topics are emerging in ICDM2004:
– Mining Data Streams
– Collaborative Filtering
– Quality Assessment
Comparison between 02-04: Review Scores
Figure: box-plot of review score (axis 0.00 to 5.00) by year (2002, 2003, 2004).
Comparison between 02-04: Countries
Acceptance Ratio (2002): Hong Kong 64.7%; USA 47.9%; Canada 45.5%; Finland 33.3%; France 33.3%
Acceptance Ratio (2003): Israel 55.0%; Hong Kong 50.0%; Japan 37.0%; USA 33.0%; Germany 32.0%
Acceptance Ratio (2004): Germany 60.0%; USA 44.0%; Finland 33.0%; Hong Kong 33.0%; Italy 30.0%
Comparison between 02 and 04: Topics
Top 5 in 2002 (Acceptance Ratio): Graph Mining 75.0%; Temporal Data 52.6%; Theory 42.9%; Text Mining 42.1%; Rule 41.7%
Top 5 in 2003 (Acceptance Ratio): Process-centric DM 80.0%; Security, privacy 57.0%; Statistics and Probability 47.0%; Visual Data Mining 38.0%; Post-processing 41.7%
Top 5 in 2004 (Acceptance Ratio): Quality Assessment 50.0%; Preprocessing, Feature Selection 40.0%; Complexity/Scalability 36.4%; DM and ML Methods 35.8%; Collaborative Filtering 28.6%; Post-processing 28.6%
Multidimensional Scaling (2003 and 2004)
Figure: two MDS maps, one for 2003 and one for 2004, each plotting Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No. and Country.
The topological structure with respect to similarities does not seem to have changed between 2003 and 2004.
Data Mining on ICDM Submission Data
Acknowledgements
– Many thanks to
  • PC chairs, Vice Chairs and PC members
  • All the authors
  • All the contributors to ICDM2004
– See you again in ICDM2005!
Multidimensional Scaling (2003)
Figure: two-dimensional MDS map of the variables Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No. and Country.