Chapter 19
Web Crawler
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 19-2
Chapter Objectives
• Provide a case study example from problem statement through implementation
• Demonstrate how hash tables and graphs can be used to solve a problem
Web Crawler
• A web crawler is a system that searches the web, beginning with a user-designated web page, looking for a designated target string
• A web crawler follows all of the links on each page that it encounters until there are no more pages or until it reaches a designated limit
Web Crawler
• For this case study, we will create a graphical web crawler with the following requirements:
– Enter a designated starting web page
– Enter a target string for which to search
– Limit the search to 50 pages
– Display the results when done
Web Crawler - Design
• Our web crawler system consists of three high-level components:
– The driver
– The graphical user interface
– The web crawler implementation
• Makes use of graphs and hash tables
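The three-component decomposition above can be sketched as a set of class skeletons. This is only an illustration of the structure, not the book's actual code: the class names (`WebCrawler`, `CrawlerGUI`, `Driver`) and method signatures are hypothetical, and the GUI is reduced to console output.

```java
import java.util.List;

// Crawl logic component; would use a graph and a hash set internally.
class WebCrawler {
    // Stub: a real version would implement the search algorithm.
    List<String> search(String start, String target, int limit) {
        return List.of();  // no results in this skeleton
    }
}

// User-interface component: collects inputs, displays results.
// (Console stand-in for the real graphical interface.)
class CrawlerGUI {
    private final WebCrawler crawler;
    CrawlerGUI(WebCrawler crawler) { this.crawler = crawler; }
    void run(String start, String target) {
        System.out.println(crawler.search(start, target, 50));
    }
}

// The driver simply wires the components together and starts the GUI.
public class Driver {
    public static void main(String[] args) {
        new CrawlerGUI(new WebCrawler()).run("http://example.com", "data");
    }
}
```

Keeping the crawl logic behind its own class lets the graph-and-hash-table machinery be tested independently of the interface.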
Web Crawler - Design
• The algorithm for the web crawler is as follows:
– Add the starting page to a HashSet of pages to be searched and to our graph
– Remove a page from the set of pages to be searched
– Search the page for the target string
• If the string exists, add the page to the list of results
– Search the page for links
• If the links have not already been searched, add them to the set of pages to be searched and to our graph
– Repeat the three previous steps until our limit is reached or the set is empty
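The steps above can be sketched as a small Java method. This is a minimal illustration, not the book's implementation: real HTTP fetching is replaced by a tiny hypothetical in-memory "web" (the `content` and `links` maps), and the graph bookkeeping is reduced to a visited set.

```java
import java.util.*;

public class CrawlerSketch {
    // Hypothetical in-memory "web" standing in for real page fetching.
    static Map<String, String> content = Map.of(
        "A", "hello target", "B", "nothing here", "C", "target again");
    static Map<String, List<String>> links = Map.of(
        "A", List.of("B", "C"), "B", List.of("A"), "C", List.of());

    static List<String> crawl(String start, String target, int limit) {
        Set<String> seen = new HashSet<>();           // pages already added
        Deque<String> toSearch = new ArrayDeque<>();  // pages waiting to be searched
        List<String> results = new ArrayList<>();
        seen.add(start);
        toSearch.add(start);
        int searched = 0;
        while (!toSearch.isEmpty() && searched < limit) {
            String page = toSearch.remove();          // take a page from the set
            searched++;
            if (content.getOrDefault(page, "").contains(target))
                results.add(page);                    // target found on this page
            for (String link : links.getOrDefault(page, List.of()))
                if (seen.add(link))                   // true only if not seen before
                    toSearch.add(link);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(crawl("A", "target", 50)); // pages A and C match
    }
}
```

The `seen.add(link)` idiom does the duplicate check and the insertion in one constant-time hash-set operation, which is exactly why a HashSet is the right structure for tracking visited pages.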
FIGURE 19.1 User interface design
FIGURE 19.2 UML description