web mining

Upload: iniya-kannan

Post on 17-Jan-2015

Page 1: Web mining

04/10/2023 1

WEB MINING

K. INIYA, CSE FINAL YEAR

INTRODUCTION

Web mining is the application of data mining techniques to extract and uncover knowledge from web documents and services.

Its goal is to use data mining techniques to make the web more useful and more profitable, and to make our interaction with the web more efficient.

WEB MINING SERVICES

WWW SPECIFICS

Web: a huge, widely distributed, highly heterogeneous, semi-structured, interconnected hypertext/hypermedia information repository. The web is a huge collection of documents plus:

- Hyperlink information
- Access and usage information
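To make the three kinds of web data concrete, here is a minimal Python sketch of how pages, their hyperlinks, and access records might be held together. The `WebCollection` class and its fields are hypothetical, chosen only for illustration; each field feeds one branch of web mining (content, structure, usage).

```python
from dataclasses import dataclass, field


@dataclass
class WebCollection:
    """Hypothetical container for the three kinds of web data."""
    documents: dict[str, str] = field(default_factory=dict)          # URL -> page text (content)
    links: dict[str, set[str]] = field(default_factory=dict)         # URL -> outgoing URLs (structure)
    access_log: list[tuple[str, str]] = field(default_factory=list)  # (visitor, URL) records (usage)

    def add_page(self, url: str, text: str, out_links: set[str]) -> None:
        self.documents[url] = text
        self.links[url] = set(out_links)

    def record_visit(self, visitor: str, url: str) -> None:
        self.access_log.append((visitor, url))


web = WebCollection()
web.add_page("/a", "data mining on the web", {"/b"})
web.add_page("/b", "hubs and authorities", set())
web.record_visit("10.0.0.1", "/a")
web.record_visit("10.0.0.1", "/b")
print(len(web.documents), web.links["/a"], len(web.access_log))
```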

SUBTASKS

Resource Finding.

Information selection & Pre-processing.

Generalization.

Analysis.
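The four subtasks can be sketched as a small Python pipeline. This is an illustration under simple assumptions: "resource finding" reads a hypothetical in-memory dictionary instead of crawling, pre-processing only strips tags and a few stopwords, and generalization is reduced to counting term frequencies.

```python
import re
from collections import Counter

# Hypothetical in-memory "web": URL -> raw HTML. A real system would crawl or query a search engine.
RAW_PAGES = {
    "/intro": "<html><body>Web mining applies data mining to the web.</body></html>",
    "/usage": "<html><body>Usage mining analyzes web server logs.</body></html>",
}
STOPWORDS = {"the", "to", "a", "of", "and"}


def find_resources(pages):
    """Resource finding: retrieve the documents to be mined."""
    return list(pages.values())


def select_and_preprocess(docs):
    """Information selection & pre-processing: strip markup, tokenize, drop stopwords."""
    cleaned = []
    for html in docs:
        text = re.sub(r"<[^>]+>", " ", html).lower()
        cleaned.append([t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS])
    return cleaned


def generalize(token_lists):
    """Generalization: find a general pattern, here term frequencies across all pages."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def analyze(counts, k=3):
    """Analysis: inspect the discovered pattern, here the k most frequent terms."""
    return counts.most_common(k)


top_terms = analyze(generalize(select_and_preprocess(find_resources(RAW_PAGES))))
print(top_terms)
```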

WEB MINING TAXONOMY

WEB MINING
- WEB CONTENT MINING
  - WEB PAGE CONTENT MINING
  - SEARCH RESULT MINING
- WEB STRUCTURE MINING
- WEB USAGE MINING
  - GENERAL ACCESS PATTERN TRACKING
  - CUSTOMIZED USAGE TRACKING

WEB CONTENT MINING

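Discovery of useful information from web contents, data, and documents.

Information Retrieval view: treats web documents as unstructured or semi-structured text (for example, as bags of words).

Database view: models web data as structured so that it can be queried like a database.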
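For the information retrieval view, one common approach (not specified on the slide) is to represent pages as TF-IDF vectors and rank them against a query by cosine similarity. A minimal sketch over a hypothetical three-page corpus; for simplicity the query terms are given weight 1.0 rather than TF-IDF weights.

```python
import math
from collections import Counter

# Hypothetical toy corpus: URL -> page text.
DOCS = {
    "/p1": "web content mining extracts knowledge from page text",
    "/p2": "web usage mining analyzes server logs",
    "/p3": "page rank and hits analyze link structure",
}


def tfidf_vectors(docs):
    """Build a TF-IDF vector (term -> weight) for every document."""
    tokenized = {url: text.lower().split() for url, text in docs.items()}
    n = len(tokenized)
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for url, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[url] = {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def search(query, docs):
    """Rank documents by cosine similarity to the query (query terms weighted 1.0)."""
    vectors = tfidf_vectors(docs)
    q = {t: 1.0 for t in query.lower().split()}
    return sorted(docs, key=lambda url: cosine(q, vectors[url]), reverse=True)


print(search("server logs", DOCS))
```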

WEB STRUCTURE MINING

Researchers have proposed using citations among journal articles to evaluate the quality of research papers.

Customer behavior: evaluate the quality of a product based on the opinions of other customers (instead of the product’s description or advertisement).

In the same way, hyperlinks between web pages can be used to judge the importance of pages.
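The citation idea carries over directly to hyperlinks: count how many distinct pages link to each page. The sketch below is a simple in-degree ranking over a hypothetical link graph; it ignores the quality of the linking pages, which is what the hub/authority methods in the later slides address.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to (like a paper citing other papers).
LINKS = {
    "A": ["C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["C"],
    "E": ["C"],
}


def in_degree_ranking(links):
    """Rank pages by how many distinct pages link to them (citation counting)."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts.most_common()


print(in_degree_ranking(LINKS))
```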

WEB USAGE MINING

DEFINITION

Also known as web log mining: the discovery of meaningful patterns from data generated by client-server transactions or from web server logs.

Typical sources of data:

- Automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
- User profiles
- Metadata: page attributes, content attributes, usage data
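Access, referrer, and agent information is often written as a single line per request. Assuming the Apache-style Combined Log Format (host, identity, user, timestamp, request, status, size, referrer, user agent), one log line could be parsed as follows; `LOG_PATTERN` and `parse_line` are illustrative names, not part of any library.

```python
import re
from datetime import datetime

# Assumed Combined Log Format; the trailing referrer/agent fields are optional (plain Common Log Format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of log fields, or None for lines that do not match (dropped during cleaning)."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


line = ('192.168.1.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 '
        '"http://example.com/start" "Mozilla/5.0"')
entry = parse_line(line)
print(entry["host"], entry["url"], entry["status"], entry["referrer"], entry["agent"])
```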

WEB USAGE MINING (CONT.)

Generate simple statistical reports:

- A summary report of hits and bytes transferred
- A list of top requested URLs
- A list of top referrers
- A list of most common browsers used
- Hits per hour/day/week/month reports
- Hits per domain reports
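Once log entries are parsed, most of these reports reduce to counting. A sketch with hypothetical sample entries; only hits per hour is shown, and per-day, per-week, per-month, and per-domain reports would follow the same `Counter` pattern over a different field.

```python
from collections import Counter

# Hypothetical, already-parsed log entries.
ENTRIES = [
    {"host": "10.0.0.1", "url": "/", "bytes": 1200, "referrer": "-", "agent": "Firefox", "hour": 9},
    {"host": "10.0.0.1", "url": "/products", "bytes": 3400, "referrer": "/", "agent": "Firefox", "hour": 9},
    {"host": "10.0.0.2", "url": "/", "bytes": 1200, "referrer": "search.example", "agent": "Chrome", "hour": 10},
    {"host": "10.0.0.3", "url": "/products", "bytes": 3400, "referrer": "/", "agent": "Chrome", "hour": 10},
    {"host": "10.0.0.3", "url": "/contact", "bytes": 800, "referrer": "/products", "agent": "Chrome", "hour": 11},
]


def summary_report(entries, top_n=3):
    """Hits, bytes, top URLs/referrers/browsers, and hits per hour."""
    hits_per_hour = Counter(e["hour"] for e in entries)
    return {
        "hits": len(entries),
        "bytes": sum(e["bytes"] for e in entries),
        "top_urls": Counter(e["url"] for e in entries).most_common(top_n),
        # "-" means no referrer was sent, so it is excluded.
        "top_referrers": Counter(e["referrer"] for e in entries if e["referrer"] != "-").most_common(top_n),
        "top_browsers": Counter(e["agent"] for e in entries).most_common(top_n),
        "hits_per_hour": dict(sorted(hits_per_hour.items())),
    }


report = summary_report(ENTRIES)
for key, value in report.items():
    print(f"{key}: {value}")
```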

Learn:

- Who is visiting your site
- The path visitors take through your pages
- How much time visitors spend on each page
- The most common starting page
- Where visitors are leaving your site
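Answering these questions requires grouping requests into visits. The sketch below assumes visitors are identified by IP address and that a gap of more than 30 minutes starts a new session; both are common heuristics rather than anything the slides prescribe, and the click data is hypothetical.

```python
from collections import Counter, defaultdict

SESSION_TIMEOUT = 30 * 60  # assumed inactivity timeout in seconds (a common heuristic)

# Hypothetical parsed clicks: (visitor IP, Unix time in seconds, URL).
CLICKS = [
    ("10.0.0.1", 1000, "/"),
    ("10.0.0.1", 1060, "/products"),
    ("10.0.0.1", 1300, "/cart"),
    ("10.0.0.2", 2000, "/products"),
    ("10.0.0.2", 2090, "/contact"),
    ("10.0.0.1", 9000, "/"),  # more than 30 minutes after /cart: a new session
]


def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Split each visitor's time-ordered clicks into sessions at long inactivity gaps."""
    by_visitor = defaultdict(list)
    for visitor, ts, url in sorted(clicks, key=lambda c: (c[0], c[1])):
        by_visitor[visitor].append((ts, url))
    sessions = []
    for visitor, hits in by_visitor.items():
        current = [hits[0]]
        for prev, cur in zip(hits, hits[1:]):
            if cur[0] - prev[0] > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(cur)
        sessions.append((visitor, current))
    return sessions


def usage_summary(sessions):
    """Paths, entry pages, exit pages, and average time spent on each page."""
    paths = [[url for _, url in hits] for _, hits in sessions]
    entry_pages = Counter(path[0] for path in paths)
    exit_pages = Counter(path[-1] for path in paths)
    durations = defaultdict(list)
    for _, hits in sessions:
        # Time on a page = gap until the next click; the last page of a session has no measurable time.
        for (t1, url), (t2, _next_url) in zip(hits, hits[1:]):
            durations[url].append(t2 - t1)
    avg_time = {url: sum(d) / len(d) for url, d in durations.items()}
    return paths, entry_pages, exit_pages, avg_time


sessions = sessionize(CLICKS)
paths, entry_pages, exit_pages, avg_time = usage_summary(sessions)
print(paths)
print("most common starting page:", entry_pages.most_common(1))
print("exit pages:", dict(exit_pages))
print("average seconds per page:", avg_time)
```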

DESIGN OF A WEB LOG MINER

The web log is filtered to generate a relational database.

A data cube is generated from the database.

OLAP is used to drill down and roll up in the cube.

Figure: Web log → (data cleaning) → database → (data cube creation) → data cube → (OLAP) → sliced and diced cube → (data mining) → patterns/knowledge
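The cube operations can be illustrated on a handful of cleaned log rows. This is a toy sketch with hypothetical dimensions (month, day, domain, URL), using plain `Counter` aggregation in place of a real OLAP engine: each call to `cuboid` computes one cuboid of the cube.

```python
from collections import Counter

# Hypothetical cleaned web log rows (the "relational database" step): one row per request.
ROWS = [
    {"month": "2023-09", "day": "2023-09-29", "domain": "edu", "url": "/"},
    {"month": "2023-09", "day": "2023-09-30", "domain": "com", "url": "/"},
    {"month": "2023-10", "day": "2023-10-01", "domain": "edu", "url": "/papers"},
    {"month": "2023-10", "day": "2023-10-01", "domain": "com", "url": "/"},
    {"month": "2023-10", "day": "2023-10-02", "domain": "edu", "url": "/papers"},
]


def cuboid(rows, dims):
    """Aggregate hit counts over the given dimensions (one cuboid of the data cube)."""
    return Counter(tuple(row[d] for d in dims) for row in rows)


def slice_rows(rows, **fixed):
    """Slice: keep only rows whose dimension values equal the fixed ones."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]


# Roll-up: aggregate days into months (coarser time granularity).
by_month = cuboid(ROWS, ["month"])
# Drill-down: go from months back to days, further split by domain.
by_day_domain = cuboid(ROWS, ["day", "domain"])
# Slice on domain == "edu", then view that slice by month and URL.
edu_by_month_url = cuboid(slice_rows(ROWS, domain="edu"), ["month", "url"])

print(dict(by_month))
print(dict(by_day_domain))
print(dict(edu_by_month_url))
```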

MINING THE WEB’S LINK STRUCTURES

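Hubs: pages that link to many good authorities on a topic.

Authorities: pages that are linked to by many good hubs.

Mutually reinforcing relationship: a good hub points to many good authorities, and a good authority is pointed to by many good hubs.

Finding authoritative web pages.

Hyperlinks can be used to infer the notion of authority.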

Figure: Hub-Authority Relations (panels: HUBS, AUTHORITIES)

STRUCTURES

HITS

HITS stands for Hyperlink-Induced Topic Search.

It explores the interactions between hubs and authoritative pages.

Expand the root set into a base set.

Apply weight propagation.

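Systems based on the HITS algorithm:

- e.g., Clever (IBM). Google is based on a similar link-analysis principle (PageRank) rather than on HITS itself.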

Difficulties from ignoring textual context:

- Drifting: when hubs contain multiple topics.

- Topic hijacking: when many pages from a single website point to the same single popular site.
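A compact Python sketch of HITS: expand a root set into a base set, then alternate authority and hub updates (weight propagation) with normalization. The graph and the root set (standing in for text-search results) are hypothetical, and a fixed iteration count is used instead of a convergence test.

```python
import math

# Hypothetical web graph: page -> pages it links to.
GRAPH = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2", "a3"],
    "h3": ["a1"],
    "a1": [],
    "a2": ["a1"],
    "a3": [],
    "x": ["h1"],
}


def base_set(graph, root):
    """Expand the root set with pages it links to and pages that link to it."""
    base = set(root)
    for page in root:
        base.update(graph.get(page, []))                             # out-links
        base.update(p for p, outs in graph.items() if page in outs)  # in-links
    return base


def normalize(scores):
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(graph, pages, iterations=50):
    """Weight propagation restricted to links between pages in the base set."""
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of p = sum of hub scores of base-set pages linking to p.
        auth = normalize({p: sum(hub[q] for q in pages if p in graph.get(q, [])) for p in pages})
        # Hub of p = sum of authority scores of base-set pages that p links to.
        hub = normalize({p: sum(auth[q] for q in graph.get(p, []) if q in pages) for p in pages})
    return hub, auth


root = {"h2", "a1"}  # hypothetical results of a text search
base = base_set(GRAPH, root)
hub, auth = hits(GRAPH, base)
best_hub = max(hub, key=hub.get)
best_authority = max(auth, key=auth.get)
print(sorted(base), best_hub, best_authority)
```

Page `x` links only to `h1`, which is not in the root set, so it never enters the base set and receives no score.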

APPLICATIONS OF WEB MINING

Improve web server system performance.

Improve site design.

Intrusion detection.

Predict users’ actions.

Enhance the quality and delivery of Internet information services to the end user.

Facilitate adaptive sites and personalization.
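As one concrete way to predict a user’s next action, a first-order Markov model counts page-to-page transitions in past sessions and predicts the most frequent follower of the current page. This technique is an illustrative choice, not something the slides specify, and the sessions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical navigation sessions (ordered page paths), e.g. produced by sessionizing a web log.
SESSIONS = [
    ["/", "/products", "/cart", "/checkout"],
    ["/", "/products", "/products/42"],
    ["/", "/about"],
    ["/products", "/cart"],
]


def train_markov(sessions):
    """Count page-to-page transitions (first-order Markov model of navigation)."""
    transitions = defaultdict(Counter)
    for path in sessions:
        for current, nxt in zip(path, path[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequently observed next page, or None if the page was never followed."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_markov(SESSIONS)
print(predict_next(model, "/"))
print(predict_next(model, "/products"))
```

A prediction like this could drive prefetching (server performance) or adaptive links (personalization), two of the applications listed above.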
