web mining

Post on 17-Jan-2015


04/10/2023 1

WEB MINING

K. INIYA

CSE FINAL YEAR

INTRODUCTION

Web mining is the application of data mining techniques to extract and uncover knowledge from web documents and services.

Its goals are to make the web more useful and more profitable, and to increase the efficiency of our interaction with the web.

WEB MINING SERVICES

WWW SPECIFIES…

Web: A huge, widely distributed, highly heterogeneous, semi-structured, hypertext/hypermedia, interconnected information repository.

The Web is a huge collection of documents plus:

- Hyperlink information
- Access and usage information

SUBTASKS

Resource Finding.

Information selection & Pre-processing.

Generalization.

Analysis.

WEB MINING TAXONOMY

WEB MINING

- WEB CONTENT MINING
    - WEB PAGE CONTENT MINING
    - SEARCH RESULT MINING
- WEB STRUCTURE MINING
- WEB USAGE MINING
    - GENERAL ACCESS PATTERN TRACKING
    - CUSTOMIZED USAGE TRACKING

WEB CONTENT MINING

Discovery of useful information from web contents, data, and documents.

Information Retrieval view.

Database View.

WEB STRUCTURE MINING

Researchers proposed methods of using citations among journal articles to evaluate the quality of research papers.

Customer behavior: evaluate the quality of a product based on the opinions of other customers (instead of the product's description or advertisement).

WEB USAGE MINING

It is also known as Web log mining.

DEFINITION

Discovery of meaningful patterns from data generated by client-server transactions or from Web server logs.

Typical sources of data:

- Automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies.
- User profiles.
- Metadata: page attributes, content attributes, usage data.

WEB USAGE MINING (Cont.)

Generate simple statistical reports:

- A summary report of hits and bytes transferred
- A list of top requested URLs
- A list of top referrers
- A list of most common browsers used
- Hits per hour/day/week/month reports
- Hits per domain reports

Learn:

- Who is visiting your site
- The path visitors take through your pages
- How much time visitors spend on each page
- The most common starting page
- Where visitors are leaving your site
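A few of the reports listed above can be sketched in a handful of lines. The example below is only an illustration: it assumes the server writes its access log in the Common Log Format, and the sample lines, the `LOG_PATTERN` regular expression, and the `summarize` function are hypothetical names made up for this sketch.

```python
import re
from collections import Counter

# Assumed input format: Common Log Format, e.g.
# host ident user [time] "METHOD url PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def summarize(lines):
    """Build a small usage report: hits, bytes, top URLs, hits per hour."""
    hits = 0
    total_bytes = 0
    urls = Counter()
    hours = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        hits += 1
        if m["size"] != "-":  # "-" means no body was sent
            total_bytes += int(m["size"])
        urls[m["url"]] += 1
        # Timestamp looks like 10/Oct/2000:13:55:36 -0700; the hour follows the first colon.
        hours[m["time"].split(":")[1]] += 1
    return {
        "hits": hits,
        "bytes": total_bytes,
        "top_urls": urls.most_common(3),
        "hits_per_hour": dict(hours),
    }

# Made-up sample data for illustration only.
sample_log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:58:02 -0700] "GET /products.html HTTP/1.0" 200 1200',
    '192.0.2.1 - - [10/Oct/2000:14:01:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.3 - - [10/Oct/2000:14:05:44 -0700] "GET /missing.html HTTP/1.0" 404 -',
]
report = summarize(sample_log)
print(report)
```

Referrer and browser reports would need the extended (combined) log format, which carries those two extra fields; the same counting approach applies.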

DESIGN OF WEB LOG MINER

The Web log is filtered to generate a relational database.

A data cube is generated from the database.

OLAP is used to drill down and roll up in the cube.

[Figure: Web log -> (data cleaning) -> Database -> (data cube creation) -> Data cube -> (OLAP) -> Sliced and diced cube -> (data mining) -> Patterns / Knowledge]
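The cube operations in this design can be illustrated with a toy example. This is a minimal sketch, not the Web log miner itself: the cleaned records, the dimension names in `DIMENSIONS`, and the helper functions `aggregate` and `slice_records` are all hypothetical, and a real system would use a database or OLAP engine rather than in-memory tuples.

```python
from collections import Counter

# Hypothetical cleaned log records: (day, hour, page, visitor domain).
DIMENSIONS = ("day", "hour", "page", "domain")
records = [
    ("Mon", 9, "/index.html", "edu"),
    ("Mon", 9, "/products.html", "com"),
    ("Mon", 14, "/index.html", "com"),
    ("Tue", 9, "/index.html", "edu"),
    ("Tue", 10, "/contact.html", "edu"),
]

def aggregate(rows, keep):
    """Count hits grouped by the dimensions in `keep`; all other dimensions are rolled up."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cells = Counter()
    for row in rows:
        cells[tuple(row[i] for i in idx)] += 1
    return dict(cells)

def slice_records(rows, dimension, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]

# Roll-up: hits per day, with hour, page, and domain aggregated away.
by_day = aggregate(records, ["day"])

# Drill-down: refine the same view to hits per day and hour.
by_day_hour = aggregate(records, ["day", "hour"])

# Slice on day = "Mon", then view hits per page within that slice.
monday_pages = aggregate(slice_records(records, "day", "Mon"), ["page"])

print(by_day, by_day_hour, monday_pages, sep="\n")
```

Data mining then runs on the sliced and diced views rather than on the raw log, which is the point of building the cube first.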

MINING THE WEB’S LINK STRUCTURES

Hubs.

Authority.

Mutual Reinforcing Relationship.

Finding Authoritative Web Pages.

Hyperlinks can be used to infer the notion of authority.

[Figure: Hub-Authority Relations (hubs and authorities)]

STRUCTURES

HITS

HITS stands for Hyperlink-Induced Topic Search.

It explores interactions between hubs and authoritative pages.

Expand the root set into a base set.

Apply weight propagation.

Systems based on the HITS algorithm:

- e.g. Google (whose PageRank is a related link-analysis algorithm rather than HITS itself).

Difficulties from ignoring textual contexts:

- Drifting: when hubs contain multiple topics.

- Topic hijacking: when many pages from a single web site point to the same single popular site.
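The weight-propagation step can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: the base set is a made-up five-page graph, the function name `hits` and the fixed iteration count are choices made for this sketch, and scores are normalized to unit length after each round.

```python
import math

def hits(links, iterations=50):
    """Return (hub, authority) scores for a graph given as {page: [pages it links to]}."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum the authority scores of all pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical base set: three hub pages pointing at two candidate authorities.
base_set = {
    "hub1": ["authA", "authB"],
    "hub2": ["authA", "authB"],
    "hub3": ["authA"],
}
hub, auth = hits(base_set)
best_authority = max(auth, key=auth.get)
print(best_authority, hub, auth)
```

The mutual reinforcement is visible in the two update steps: a page becomes a better authority when good hubs point to it, and a better hub when it points to good authorities.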

APPLICATIONS OF WEB MINING

Improve web server system performance.

Improve site design.

Intrusion detection.

Predict users' actions.

Enhance the quality and delivery of Internet information services to the end user.

Facilitate adaptive sites and personalization.
