Chaoyang University of Technology
Clustering web transactions using rough approximation
Source : Fuzzy Sets and Systems 148 (2004) 131–138
Authors : Supriya Kumar De, P. Radha Krishna
Adviser : RC. Chen
Presenter : Yu-Hsiang Fu (傅昱翔)
Date : 2006/12/14

Outline
• Abstract
• Introduction
• Rough Set
• Rough Set Approximation
• Experimental Results
• Conclusions
• References

Abstract
• Web usage mining is the application of data mining techniques to discover user access patterns from web access logs.
• Rough sets can be used to mine web log records effectively and discover web page access patterns.

Introduction (1/2)
• The WWW contains a huge number of hyperlinks, along with access and usage information.
• Web mining
– Web content mining
– Web structure mining
– Web usage mining

Introduction (2/2)
• User behavior
– A click stream is the sequence of clicks or pages requested as a visitor explores a Web site.
• Web transaction
– A user session is the click stream of page views for a single user across the entire Web.
• Usage patterns differ between users, who may navigate the same pages in different ways.

Rough Set (1/5)
• Rough set theory was introduced by Zdzislaw Pawlak in the early 1980s.
• Rough sets deal with the classification analysis of data tables.
• Rough set methods support efficient searching for relevant tolerance relations and the extraction of interesting patterns from data.

Rough Set (2/5)
• Universe and Relation

Rough Set (3/5)
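• Lower and Upper Approximation
– Lower approximation: objects that surely belong to the set
– Upper approximation: objects that possibly belong to the set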
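For reference, the standard Pawlak definitions behind these two notions can be written as below. This is the textbook formulation rather than a transcription of the slide; the symbols $U$ (the universe), $R$ (an equivalence relation on $U$), $[x]_R$ (the equivalence class of $x$) and $X \subseteq U$ (the set being approximated) are assumed notation.

```latex
\underline{R}X = \{\, x \in U : [x]_R \subseteq X \,\}, \qquad
\overline{R}X = \{\, x \in U : [x]_R \cap X \neq \emptyset \,\}
```

Every object in $\underline{R}X$ surely belongs to $X$, every object in $\overline{R}X$ possibly belongs to it, and $\underline{R}X \subseteq X \subseteq \overline{R}X$ always holds.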

Rough Set (4/5)
• Boundary and Negative region
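A small sketch may make the regions concrete. It uses the standard definitions (boundary = upper approximation minus lower approximation, negative region = universe minus upper approximation); the toy universe, the indiscernibility key, and the function names are hypothetical and chosen only for illustration.

```python
def partition_blocks(universe, key):
    """Group objects into equivalence classes: x ~ y iff key(x) == key(y)."""
    blocks = {}
    for x in universe:
        blocks.setdefault(key(x), set()).add(x)
    return list(blocks.values())


def rough_regions(universe, key, target):
    """Return the (lower, upper, boundary, negative) regions of `target`."""
    lower, upper = set(), set()
    for block in partition_blocks(universe, key):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    boundary = upper - lower          # objects that cannot be decided
    negative = set(universe) - upper  # objects surely outside the target
    return lower, upper, boundary, negative


# Hypothetical toy data: objects 1..8 are indiscernible when x // 2 matches.
U = set(range(1, 9))
X = {2, 3, 4}
lower, upper, boundary, negative = rough_regions(U, lambda x: x // 2, X)
print(lower, upper, boundary, negative)
```

Here the equivalence classes are {1}, {2, 3}, {4, 5}, {6, 7} and {8}, so object 4 drags its class {4, 5} into the boundary region: 5 is indiscernible from 4, and the available information cannot tell whether it belongs to X.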

Rough Set (5/5)

Rough Set Approximation (1/7)
• A user transaction is a sequence of items.
• Let there be m users, with user transactions t_1, t_2, …, t_m.
• Let U be the set of the n distinct clicks (hyperlinks/URLs) made by the users.
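As a small illustration of this notation, the sketch below builds U from a handful of made-up transactions. The URLs and variable names are hypothetical and do not come from the paper's data.

```python
# Hypothetical click sequences, one per user (m = 3 here).
transactions = [
    ["/index.html", "/about.html", "/contact.html"],
    ["/index.html", "/research.html"],
    ["/about.html", "/index.html"],
]

# U: the set of distinct clicks over all transactions, sorted for display.
U = sorted({url for t in transactions for url in t})
m, n = len(transactions), len(U)
print(m, n, U)
```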

Rough Set Approximation (2/7)

Rough Set Approximation (3/7)

Rough Set Approximation (4/7)

Rough Set Approximation (5/7)

Rough Set Approximation (6/7)

Rough Set Approximation (7/7)

Experimental Results (1/2)
• Log files from www.idrbt.ac.in
– The web site consists of 62 web pages and 283 links.
– The log files record every click that users make.
– The session time is 30 min.
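A 30-minute session time is commonly applied as an inactivity timeout when log records are split into sessions. The sketch below assumes that interpretation and a simplified record format of (user, timestamp, url) tuples; the function name `sessionize` and the sample records are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Split (user, timestamp, url) records into per-user sessions.

    A new session starts when the gap since the user's previous click
    exceeds `timeout`.
    """
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = []
    for user, clicks in by_user.items():
        clicks.sort(key=lambda c: c[0])  # order by timestamp only
        current, last_ts = [], None
        for ts, url in clicks:
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions


# Hypothetical sample records.
records = [
    ("u1", datetime(2006, 12, 14, 9, 0), "/index.html"),
    ("u1", datetime(2006, 12, 14, 9, 10), "/about.html"),
    ("u1", datetime(2006, 12, 14, 10, 0), "/index.html"),
    ("u2", datetime(2006, 12, 14, 9, 5), "/research.html"),
]
sessions = sessionize(records)
print(sessions)
```

In this sample, the 50-minute gap before the third click of `u1` starts a second session for that user.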

Experimental Results (2/2)
• Steps:
– First, the data is preprocessed and transformed.
– Second, the similarity upper approximation is computed for each transaction.
– Finally, the transactions are clustered using rough approximation (threshold = 0.5).
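The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: transactions are treated as sets of URLs, similarity is assumed to be the ratio |t ∩ s| / |t ∪ s|, and a transaction's similarity upper approximation is assumed to be the set of transactions at least `threshold` similar to it, expanded repeatedly until it stops growing. The function names and sample data are hypothetical.

```python
def similarity(t, s):
    """Ratio of shared items to all items of two transactions (as sets)."""
    t, s = set(t), set(s)
    union = t | s
    return len(t & s) / len(union) if union else 0.0


def upper_approximation(i, transactions, threshold):
    """Indices of transactions at least `threshold` similar to transaction i."""
    return {
        j for j, s in enumerate(transactions)
        if similarity(transactions[i], s) >= threshold
    }


def cluster(transactions, threshold=0.5):
    """Grow each upper approximation to a fixed point; return distinct clusters."""
    first = [upper_approximation(i, transactions, threshold)
             for i in range(len(transactions))]
    clusters = []
    for i in range(len(transactions)):
        current = set(first[i])
        while True:
            # Add the approximations of every transaction reached so far.
            expanded = set().union(*(first[j] for j in current))
            if expanded == current:
                break
            current = expanded
        if current not in clusters:
            clusters.append(current)
    return clusters


# Hypothetical transactions; single letters stand in for URLs.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "c", "d"],
    ["x", "y"],
    ["x", "y", "z"],
]
print(cluster(transactions))
```

With the threshold of 0.5, transactions 1 and 2 end up in the same cluster even though their direct similarity is only 0.25, because both fall in the upper approximation of transaction 0. Raising the threshold yields smaller, tighter clusters.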

Conclusion
• This paper presented a novel algorithm that uses rough approximation to cluster the web transactions of user accesses.
• The approach is useful for finding interesting user access patterns in web logs.
• The results can help build adaptive web sites that respond to users' behavior.

References
• Zdzislaw Pawlak, Jerzy Grzymala-Busse, Roman Slowinski, and Wojciech Ziarko, Rough Sets, Communications of the ACM, Vol. 38, No. 11, November 1995, 88–95.
• Zdzislaw Pawlak, Rough Sets (Abstract), 262–264.
• Zdzisław Pawlak, Andrzej Skowron, Rudiments of rough sets, Information Sciences 177 (2007) 3–27.
• Nils Kammenhuber, Julia Luxenburger, Anja Feldmann, Gerhard Weikum, Web Search Clickstreams, IMC'06, October 25–27, 2006.
• A. Jain, Data Clustering: A Review, ACM Computing Surveys, Vol. 31, No. 3, September 1999, 274–275, 281–285.