web mining
04/10/2023 1
WEB MINING
K. INIYA, CSE FINAL YEAR
INTRODUCTION
Web mining applies data mining techniques to extract and uncover knowledge from web documents and services.
The goal is to make the web more useful and more profitable, and to increase the efficiency of our interaction with it.
WEB MINING SERVICES
WWW SPECIFIES…
Web: a huge, widely distributed, highly heterogeneous, semi-structured, hypertext/hypermedia, interconnected information repository.
The web is a huge collection of documents plus:
- Hyperlink information
- Access and usage information
SUBTASKS
Resource Finding.
Information selection & Pre-processing.
Generalization.
Analysis.
WEB MINING TAXONOMY
WEB MINING
  WEB CONTENT MINING
    WEB PAGE CONTENT MINING
    SEARCH RESULT MINING
  WEB STRUCTURE MINING
  WEB USAGE MINING
    GENERAL ACCESS PATTERN TRACKING
    CUSTOMIZED USAGE TRACKING
WEB CONTENT MINING
Discovery of useful information from web contents, data, and documents.
Information Retrieval view.
Database View.
WEB STRUCTURE MINING
Researchers proposed methods of using citations among journal articles to evaluate the quality of research papers.
Customer behavior: evaluate the quality of a product based on the opinions of other customers (instead of the product’s description or advertisement).
WEB USAGE MINING
Also known as web log mining.
DEFINITION
Discovery of meaningful patterns from data generated by client-server transactions or from web server logs.
Typical sources of data:
- Automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies.
- User profiles.
- Metadata: page attributes, content attributes, usage data.
Cont.,
Generate simple statistical reports:
- A summary report of hits and bytes transferred
- A list of top requested URLs
- A list of top referrers
- A list of most common browsers used
- Hits per hour/day/week/month reports
- Hits per domain reports
Learn:
- Who is visiting your site
- The path visitors take through your pages
- How much time visitors spend on each page
- The most common starting page
- Where visitors are leaving your site
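The simple reports listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming the server writes its access log in Common Log Format; the sample lines, the `LOG_PATTERN` regular expression and the `summarize` helper are hypothetical names invented here, not part of the original slides.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input format (Common Log Format):
# host ident user [time] "METHOD url PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

# Hypothetical sample lines, invented for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [04/Oct/2023:10:15:02 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [04/Oct/2023:10:47:30 +0000] "GET /products.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [04/Oct/2023:11:05:11 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.3 - - [04/Oct/2023:11:20:45 +0000] "GET /missing.html HTTP/1.1" 404 -',
]


def summarize(lines):
    """Build a small report: hits, bytes, top URLs, hits per hour."""
    hits = 0
    total_bytes = 0
    urls = Counter()
    per_hour = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip lines that do not match the assumed format
        hits += 1
        if m["bytes"] != "-":  # "-" means no body was sent
            total_bytes += int(m["bytes"])
        urls[m["url"]] += 1
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        per_hour[ts.strftime("%Y-%m-%d %H:00")] += 1
    return {
        "hits": hits,
        "bytes": total_bytes,
        "top_urls": urls.most_common(3),
        "hits_per_hour": dict(per_hour),
    }


report = summarize(SAMPLE_LOG)
print(report)
```

Reports on referrers or browsers would follow the same counting pattern, but they need the extended (combined) log format, which this sketch does not parse.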
DESIGN OF WEB LOG MINER
The web log is filtered to generate a relational database.
A data cube is generated from the database.
OLAP is used to drill down and roll up in the cube.
[Figure: Web log -> (data cleaning) -> Database -> (data cube creation) -> Data cube -> (OLAP) -> Sliced and diced cube -> (data mining) -> Patterns / Knowledge]
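As a rough illustration of the cube operations named above, the sketch below treats cleaned log records as tuples and answers roll-up, drill-down and slice queries by counting. It assumes four dimensions (day, hour, domain, page); the records and the helper names `aggregate` and `slice_records` are hypothetical, and a real OLAP system would precompute and store the cube rather than rescan the records.

```python
from collections import Counter

# Assumed dimensions of the cleaned web log (hypothetical schema).
DIMENSIONS = ("day", "hour", "domain", "page")

# Hypothetical cleaned records, one per hit, invented for illustration.
RECORDS = [
    ("2023-10-04", 10, ".edu", "/index.html"),
    ("2023-10-04", 10, ".com", "/index.html"),
    ("2023-10-04", 11, ".edu", "/products.html"),
    ("2023-10-05", 9, ".edu", "/index.html"),
]


def aggregate(records, dims):
    """Count hits grouped by the chosen dimensions (one cuboid of the cube)."""
    idx = [DIMENSIONS.index(d) for d in dims]
    return Counter(tuple(r[i] for i in idx) for r in records)


def slice_records(records, dim, value):
    """Slice: keep only records where one dimension has a fixed value."""
    i = DIMENSIONS.index(dim)
    return [r for r in records if r[i] == value]


# Roll-up: summarize hits at the coarse "day" level.
by_day = aggregate(RECORDS, ["day"])

# Drill-down: refine the same counts to the (day, hour) level.
by_day_hour = aggregate(RECORDS, ["day", "hour"])

# Slice on domain == ".edu", then count hits per page.
edu_by_page = aggregate(slice_records(RECORDS, "domain", ".edu"), ["page"])

print(by_day)
print(by_day_hour)
print(edu_by_page)
```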
MINING THE WEB’S LINK STRUCTURES
Hubs.
Authority.
Mutual Reinforcing Relationship.
Finding Authoritative Web Pages.
Hyperlinks can be used to infer the notion of authority.
[Figure: Hub-Authority Relations (hubs and authorities)]
STRUCTURES
HITS
HITS stands for Hyperlink-Induced Topic Search.
It explores interactions between hubs and authoritative pages.
Expand the root set into a base set.
Apply weight propagation.
Systems based on the HITS algorithm:
- e.g. Clever (IBM). Google's ranking is based on PageRank, a related link-analysis algorithm, rather than on HITS.
Difficulties from ignoring textual contexts:
- Drifting: when hubs contain multiple topics.
- Topic hijacking: when many pages from a single web site point to the same single popular site.
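The weight-propagation step can be sketched as follows. This is a minimal illustration that assumes the base set has already been built and is given as a dictionary mapping each page to the pages it links to; the function names, the fixed iteration count and the tiny example graph are assumptions made here, not taken from the slides.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) scores for a link graph {page: [targets]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        # Hub update: sum the authority scores of the pages linked to.
        new_hub = {
            p: sum(new_auth[t] for t in links.get(p, ())) for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical base set: three hub-like pages pointing at two targets.
LINKS = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
}

hub, auth = hits(LINKS)
print(sorted(auth.items(), key=lambda kv: -kv[1]))
print(sorted(hub.items(), key=lambda kv: -kv[1]))
```

In this toy graph the page with the most in-links from good hubs ends up with the highest authority score, and the pages that point to both targets end up as the strongest hubs, which is the mutually reinforcing relationship described earlier.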
APPLICATIONS OF WEB MINING
Improve web server system performance.
Improve site Design.
Intrusion Detection.
Predict users’ actions.
Enhance the quality and delivery of internet information services to the end user.
Facilitate adaptive sites and personalization.