chapter 5 web crawler & search engine

13
CHAPTER 5 WEB CRAWLER & SEARCH ENGINE

Upload: gerry

Post on 15-Feb-2016

61 views

Category:

Documents


0 download

DESCRIPTION

Chapter 5 Web Crawler & Search Engine . I. What is Web Crawler. A  Web crawler  is an  Internet bot  that systematically browses the  World Wide Web , typically for the purpose of  Web indexing (store in database easy to search). - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Chapter 5   Web Crawler & Search Engine

CHAPTER 5

WEB CRAWLER & SEARCH ENGINE

Page 2: Chapter 5   Web Crawler & Search Engine

I. WHAT IS WEB CRAWLER A Web crawler is an Internet bot that

systematically browses the World Wide Web, typically for the purpose of Web indexing (store in database easy to search).

Web search engines and some other sites use Web crawling or spidering software to update their web content or indexes of others sites' web content. Web crawlers can copy all the pages they visit for later processing by a search engine that indexes the downloaded pages so that users can search them much more quickly.

Crawlers can validate hyperlinks and HTML code. They can also be used for web scraping (see also data-driven programming).

Page 3: Chapter 5   Web Crawler & Search Engine

1. WHAT IS WEB CRAWLER (CON.) A Web crawler starts with a list of URLs to visit, called

the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier.

URLs from the frontier are recursively visited according to a set of policies. If the crawler is performing archiving of websitesit copies and saves the information as it goes. Such archives are usually stored such that they can be viewed, read and navigated as they were on the live web, but are preserved as ‘snapshots'

Page 4: Chapter 5   Web Crawler & Search Engine

2. HOW CRAWLER WORKS? Tovisitpage : URL Queue visitedpage : Extract

Page 5: Chapter 5   Web Crawler & Search Engine

SOFTWARE OF CRAWLER

Page 6: Chapter 5   Web Crawler & Search Engine
Page 7: Chapter 5   Web Crawler & Search Engine

PROCESS OF WEB CRAWLER WITH SEARCH ENGINE

Page 8: Chapter 5   Web Crawler & Search Engine

BASIC WEBSITE LAYOUT

Page 9: Chapter 5   Web Crawler & Search Engine

3. TECHNOLOGY FOR WEB CRAWLER A

Page 10: Chapter 5   Web Crawler & Search Engine

4. WEB CRAWLER SEARCH ENGINE A

Page 11: Chapter 5   Web Crawler & Search Engine

5. HOW WEB CRAWLER SEARCH ENGINE WORKS?

Page 12: Chapter 5   Web Crawler & Search Engine

6. MONEY BEHIND SEARCH ENGINE A

Page 13: Chapter 5   Web Crawler & Search Engine