Predicting Content Change On The Web BY : HITESH SONPURE GUIDED BY : PROF. M. WANJARI

Upload: malcolm-whitehead

Post on 12-Jan-2016

Predicting Content Change On The Web

BY : HITESH SONPURE

GUIDED BY : PROF. M. WANJARI

OUTLINE

Introduction
Related Work
Main Focus
Problem Formulation and Targets
Foundational Methodologies and Algorithms
Experimental Setup and Results
Application
Conclusions
Further Plans

INTRODUCTION

The ability to predict key types of changes can be used in a variety of settings.

In particular, the content of a page enables better prediction of its change.

Pages that are related to the page being predicted may also change in similar ways.

Related Work

Incremental web crawling setting: whether to recrawl a web page is tied to the probability that it has changed.

User-centric utility: a utility weights each page.

Several works use the past change frequency and change recency of a page.

Related Work

Prediction based on content-based features.

Modeling the type of correlation structure at the website level, using a sample of web pages from the website.

Extending the above idea by clustering pages based on static and dynamic content features.

Focus

1. The task of predicting significant changes, rather than any change, to a web page.

2. Develop a wide array of dynamic content-based features that may be useful for the more general temporal mining case beyond crawling. The goal is to predict dynamic content change on the web so that a variety of retrieval and web-related components can be improved.

Focus

3. Explore a wide variety of methods to identify related pages, including content, web graph distance, and temporal content similarity.

4. Derive a novel expert prediction framework that effectively leverages information from related pages without the need for sampling from the current time slice.

PROBLEM FORMULATION AND TARGETS

Types of Web Page Change, for a page o ∈ O at a given time:

1. Whether the page o ∈ O changes significantly.

2. Whether the change in page o ∈ O corresponds to a change from non-relevant previous content to relevant current content.

3. Whether there is a new outlink from page o ∈ O.

PROBLEM FORMULATION AND TARGETS (Continued)

Information Settings

1. 1D setting

2. 2D setting

3. 3D setting

PROBLEM FORMULATION AND TARGETS (Continued)

Information Observability

1. Partially Observed

2. Fully Observed

LEARNING ALGORITHMS

BASELINE ALGORITHM

Prediction is based on the probability that the page changes significantly, given its own recent change history, i.e.

$p\big(h(o_i, t_j) = 1 \mid h(o_i, t_k) \in E,\ t_k < t_j,\ (t_j - t_k) \le l\big)$

SINGLE EXPERT ALGORITHM

Represents each page with a set of features.

MULTIPLE EXPERT ALGORITHM

Considers both the page's own features and the features of other pages.
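As a rough illustration of the baseline, the probability of a significant change at time t_j can be estimated as the fraction of the page's past observations, within the last l time steps, in which it changed. This is a minimal sketch under assumptions: the function name, the dictionary representation of the history, and the fallback `prior` for pages with no usable history are all hypothetical and do not come from the slides.

```python
from typing import Mapping


def baseline_change_probability(
    history: Mapping[int, int],
    t_j: int,
    window: int,
    prior: float = 0.5,
) -> float:
    """Estimate p(h(o_i, t_j) = 1) from the page's own past observations.

    `history` maps a time step t_k to h(o_i, t_k): 1 if the page changed
    significantly at t_k, 0 otherwise. Only observations with
    t_k < t_j and t_j - t_k <= window are used.
    """
    recent = [
        changed
        for t_k, changed in history.items()
        if t_k < t_j and t_j - t_k <= window
    ]
    if not recent:
        # No usable history: fall back to an assumed prior (hypothetical choice).
        return prior
    return sum(recent) / len(recent)


# Hypothetical change history of one page over five time steps.
history = {1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
print(baseline_change_probability(history, t_j=6, window=4))  # 0.75
```

A plain frequency estimate like this uses no page content, which is the limitation the single and multiple expert algorithms are meant to address.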

EXPERIMENTAL SETUP AND RESULTS

APPLICATION

Application to Crawling

Maximising Freshness
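One simple way predicted change probabilities could feed a crawler is to spend a fixed recrawl budget on the pages most likely to have changed. The sketch below shows that greedy policy only as an illustration: the function name, the page identifiers, and the policy itself are assumptions, and the slides do not state which freshness-maximising policy was actually used.

```python
import heapq
from typing import Dict, List


def select_pages_to_recrawl(
    change_probability: Dict[str, float],
    budget: int,
) -> List[str]:
    """Return up to `budget` pages with the highest predicted change probability."""
    if budget <= 0:
        return []
    # Iterating a dict yields its keys; rank them by their predicted probability.
    return heapq.nlargest(
        budget, change_probability, key=change_probability.__getitem__
    )


# Hypothetical predictions for three pages.
predicted = {"a.html": 0.9, "b.html": 0.2, "c.html": 0.6}
print(select_pages_to_recrawl(predicted, budget=2))  # ['a.html', 'c.html']
```

A user-centric variant could multiply each probability by a per-page utility weight before ranking.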

CONCLUSIONS

We tackled the problem of predicting significant content change.

The work sheds light on how and why content changes on the web and how it can be predicted.

Adding the page content improves prediction compared with simple frequency-based prediction.

Additionally, adding information from the content of related pages improves over using the page's content alone.

FURTHER PLANS

To carry out the prediction and its analysis in a real-time scenario.

THANK YOU !