![Page 1: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/1.jpg)
Web Searching Basics
Dr. Dania Bilal
IS 530
Fall 2009
![Page 2: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/2.jpg)
How the Web Came About?
• First, we had the Internet with text-based files and indexes to find information in these files
  – Static, no graphics or multimedia
  – No point and click using a mouse
  – No GUI (Graphical User Interface)
  – Menu-driven and subject categories for topics were hierarchical in nature
![Page 3: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/3.jpg)
How the Web Came About?
• Tim Berners-Lee
  – Proposed the Web in 1989 and went on to create HTTP
  – HTTP: Hypertext Transfer Protocol
  – Links various files and documents (text, sound, images, videos, etc.) available on various Internet host servers in a seamless way
• Beginning of the World Wide Web (WWW)
• WWW is part of the Internet
![Page 4: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/4.jpg)
How the Web Came About?
• Graphical Web browsers were developed for navigating through Web content
• Mosaic
  – First widely used graphical Web browser
  – Appeared in 1993
  – Revolutionized access to information
  – Made the Web much easier to use
• Other browsers appeared
![Page 5: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/5.jpg)
Searching the Web
• Search engines (general and subject-driven)
• Directories
• Meta-search engines
• Meta-directories
![Page 6: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/6.jpg)
Search Engines
• Engines are computer programs designed for searching the Web
• Components– Crawlers or spiders– Database – Search engine software – Search algorithms
![Page 7: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/7.jpg)
Crawlers or Spiders
• Traverse the Web and visit web pages that are not blocked
• Read the pages visited
• Follow links from pages to additional pages
• Return frequently to the pages for updates
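The crawler behavior listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PAGES` dictionary stands in for the Web (a real crawler fetches pages over HTTP), and `BLOCKED`, `LinkParser`, and `crawl` are hypothetical names chosen for this example.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real crawler fetches over HTTP.
PAGES = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="a.html">A</a> <a href="d.html">D</a>',
    "c.html": "no links here",
    "d.html": '<a href="a.html">A</a>',
}
BLOCKED = {"c.html"}  # pages assumed to be closed to crawlers


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start):
    """Breadth-first crawl from start; returns visited URLs in visit order."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in BLOCKED or url not in PAGES:
            continue  # skip blocked or missing pages
        visited.append(url)
        parser = LinkParser()
        parser.feed(PAGES[url])  # read the page
        for link in parser.links:  # follow links to additional pages
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("a.html"))  # ['a.html', 'b.html', 'd.html']
```

The sketch visits each page once; returning to pages for updates would mean re-running the crawl on a schedule.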
![Page 8: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/8.jpg)
Database Component
• Stores copies of the web pages the crawlers or spiders visited
• Database is organized based on a preset scheme
• Fields in each document or webpage are identified (e.g., URL, page title, header or section title, metadata supplied by the author of a page) → pages are indexed
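One common way to index pages by field is an inverted index. The sketch below is a minimal illustration, not how any particular engine stores its data; the field names, sample pages, and the `tokenize` and `build_index` helpers are assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map (field, word) -> set of URLs whose field contains that word."""
    index = defaultdict(set)
    for url, fields in pages.items():
        for field, text in fields.items():
            for word in tokenize(text):
                index[(field, word)].add(url)
    return index


# Hypothetical stored copies of two crawled pages, split into fields.
pages = {
    "example.org/cats": {"title": "Cat Care", "body": "Feeding and grooming cats"},
    "example.org/dogs": {"title": "Dog Care", "body": "Feeding dogs and walking dogs"},
}
index = build_index(pages)
print(sorted(index[("title", "care")]))  # both pages have "care" in the title
```

Keeping the field alongside each word is what later allows a match in the title to count for more than a match in the body.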
![Page 9: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/9.jpg)
Search Engine Software
• Program that sorts through the pages stored in the database
• Takes a user query entered in a search engine
• Matches the words in the query to the web pages stored in the database, according to the search criteria in the query
  – Matches each word and accounts for the operators appearing in the query (+, -, “ ”)
    • The + sign is assumed when no operators are used
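The matching step can be illustrated with a small sketch. It assumes straight double quotes for phrases and treats a term with no operator as required, as the slide describes; `parse_query`, `contains`, and `matches` are hypothetical helpers, and real engines are considerably more elaborate.

```python
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query):
    """Return (required, excluded) lists of terms; each term is a word list."""
    required, excluded = [], []
    pattern = r'([+-]?)(?:"([^"]+)"|(\S+))'
    for sign, phrase, word in re.findall(pattern, query):
        term = tokens(phrase or word)
        if not term:
            continue  # ignore stray punctuation
        # No operator is treated the same as "+": the term is required.
        (excluded if sign == "-" else required).append(term)
    return required, excluded


def contains(doc, term):
    """True if the words of term appear consecutively, in order, in doc."""
    n = len(term)
    return any(doc[i:i + n] == term for i in range(len(doc) - n + 1))


def matches(text, query):
    """True if text has every required term and no excluded term."""
    doc = tokens(text)
    required, excluded = parse_query(query)
    return (all(contains(doc, t) for t in required)
            and not any(contains(doc, t) for t in excluded))


print(matches("The jaguar is a big cat", 'jaguar "big cat" -car'))  # True
print(matches("Jaguar car dealer", "jaguar -car"))  # False
```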
![Page 10: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/10.jpg)
Search Engine Software
• Matching is performed by algorithms (computational rules)
• Relevance of what was matched is calculated using sophisticated algorithms
• Relevance ranking of pages returned to a user is based on rules set by the search engine company
![Page 11: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/11.jpg)
Search Engine Relevance Ranking
• Some criteria
  – Word frequency
  – Location of a word in the web page or document (page title, page URL, first heading, second heading, first sentence in a heading, etc.)
  – Number of links to a page by other pages
  – Number of clicks on a page when it appears in the results of a search
  – Meta-tags (metadata)
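To make a few of these criteria concrete, here is a toy scoring function that combines word frequency, word location (title), and the number of inbound links. The weights `TITLE_WEIGHT` and `LINK_WEIGHT` are arbitrary assumptions for illustration; actual engines keep their formulas proprietary and use many more signals.

```python
import re

TITLE_WEIGHT = 3.0  # assumed bonus for each query word found in the title
LINK_WEIGHT = 0.5   # assumed bonus for each inbound link


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, query, inbound_links):
    """Combine word frequency, title location, and link count into one number."""
    query_words = words(query)
    body = words(page["body"])
    title = words(page["title"])
    frequency = sum(body.count(w) for w in query_words)
    in_title = sum(1 for w in query_words if w in title)
    return frequency + TITLE_WEIGHT * in_title + LINK_WEIGHT * inbound_links


def rank(pages, query, link_counts):
    """Return page URLs ordered from highest to lowest score."""
    return sorted(
        pages,
        key=lambda url: score(pages[url], query, link_counts.get(url, 0)),
        reverse=True,
    )


# Hypothetical pages and inbound-link counts.
pages = {
    "a": {"title": "Web Searching Basics", "body": "searching the web with search engines"},
    "b": {"title": "Cooking", "body": "web web web recipes"},
}
link_counts = {"a": 4, "b": 0}
print(rank(pages, "web searching", link_counts))  # ['a', 'b']
```

Page "b" repeats the word "web" more often, but page "a" ranks first because the query words appear in its title and other pages link to it.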
![Page 12: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/12.jpg)
Basic Search Strategy
• Identify the information need
• Extract basic concepts from the information need (broad ideas)
• Choose possible keywords or terms related to the concepts
  – Think of broader, narrower, or related terms
• Determine the search logic and techniques most suitable for formulating a search using the keywords or terms
  – Boolean? Proximity? Combination of both? Nesting?
• Select an appropriate engine, directory, meta-engine, or meta-directory based on the topic
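As a rough illustration of turning concepts and keywords into a nested Boolean search statement, the sketch below joins synonyms with OR inside parentheses and joins concepts with AND. The `build_query` helper and the sample topic are hypothetical, and operator syntax varies by engine, so check the engine's Help file before relying on this form.

```python
def build_query(concepts):
    """Join synonyms with OR (nested in parentheses) and concepts with AND."""
    groups = []
    for terms in concepts:
        # Multi-word terms are quoted so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        group = " OR ".join(quoted)
        groups.append(f"({group})" if len(quoted) > 1 else group)
    return " AND ".join(groups)


# Hypothetical information need: effects of social media on teenagers' sleep.
concepts = [
    ["teenagers", "adolescents"],  # concept 1 with a related term
    ["social media"],              # concept 2
    ["sleep", "insomnia"],         # concept 3 with a narrower term
]
print(build_query(concepts))
# (teenagers OR adolescents) AND "social media" AND (sleep OR insomnia)
```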
![Page 13: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/13.jpg)
Basic Search Strategy
• Explore the features of the engine or directory if you’re unfamiliar with them
  – Visit the Advanced Search options, Help file, Search Tips, as applicable
• Conduct the search
• Examine the first page of returned results and visit the top five or more
  – Search engines rank results by their matching and ranking criteria, not by the context of the search topic
    • System relevance
• Identify the pages or documents that are the most relevant to your topic
  – User relevance judgment (also called pertinence)
• Use the most relevant document or page and explore the keywords, headings, phrases, etc. that you can use to find additional relevant pages or documents
  – “Seed” document or “pearl growing”
  – Follow the Cited by links, as applicable, to find additional documents relevant to the topic
• Revise your search if needed
• Try your search in another engine, specialized engine, meta-engine, directory, etc.
![Page 14: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/14.jpg)
The Question of Quality
• Criteria for evaluating information quality
  – Source domain (.com, .edu, .gov, etc.)
  – Authority
  – Purpose or motivation
  – Quality of writing
  – Balanced views
  – Currency of information
  – Sources cited
![Page 15: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/15.jpg)
The Question of Quality
• Accuracy
• Factual information (check against two or more authoritative sources)
• Use additional sources for evaluating the quality of information on the Internet
  http://www.virtualchase.com/quality
  http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Evaluate.html
![Page 16: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/16.jpg)
The Invisible Web
• Search engines don’t index all web pages
• Reasons:
  – Information stored in databases that require subscription
  – Pages or websites that are password-protected
  – Pages that are not linked to from other pages
  – Pages that are blocked to spiders or crawlers
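One common way a site blocks crawlers is a robots.txt file. The sketch below uses Python's standard `urllib.robotparser` module to check whether a crawler may fetch a URL; the robots.txt content, the `ExampleBot` user agent, the example.com URLs, and the `allowed` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that closes the /private/ folder to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without a network call


def allowed(url, agent="ExampleBot"):
    """True if the rules above let the given crawler fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.com/public/index.html"))
print(allowed("http://example.com/private/report.html"))
```

Pages under the disallowed folder are skipped by well-behaved crawlers, so they never reach the engine's database and stay part of the invisible Web.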
![Page 17: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/17.jpg)
Search Logic: Boolean Operators
Source: Google Images
![Page 18: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/18.jpg)
Boolean and Search Engines
• AND +
• OR
• NOT -
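The three operators correspond to set operations on the pages that match each term. A minimal sketch, using made-up page identifiers:

```python
# Hypothetical result sets: pages that contain each search term.
cats = {"page1", "page2", "page3"}
dogs = {"page2", "page3", "page4"}

both = cats & dogs       # cats AND dogs: pages containing both terms
either = cats | dogs     # cats OR dogs: pages containing at least one term
only_cats = cats - dogs  # cats NOT dogs: pages with cats but without dogs

print(sorted(both))       # ['page2', 'page3']
print(sorted(either))     # ['page1', 'page2', 'page3', 'page4']
print(sorted(only_cats))  # ['page1']
```

AND narrows a search, OR broadens it, and NOT excludes pages, which is why AND and NOT are the usual tools for cutting down a large result list.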
![Page 19: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/19.jpg)
Phrase Searching
• A form of proximity searching
• Quotation marks (“ ”) are used in search engines
• Provides more precise results
• Limits the results to pages in which the words appear next to each other, in the order typed
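The difference between an exact phrase match and a looser proximity match can be sketched by comparing word positions. The helpers `is_phrase` and `near` and the `max_distance` parameter are hypothetical; few web search engines expose a proximity operator in exactly this form.

```python
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_phrase(text, phrase):
    """True if the words of phrase appear consecutively and in order."""
    doc, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(doc[i:i + n] == target for i in range(len(doc) - n + 1))


def near(text, word1, word2, max_distance):
    """True if word1 and word2 occur within max_distance words of each other."""
    doc = tokens(text)
    first = [i for i, w in enumerate(doc) if w == word1.lower()]
    second = [i for i, w in enumerate(doc) if w == word2.lower()]
    return any(abs(i - j) <= max_distance for i in first for j in second)


text = "Search engines index the web"
print(is_phrase(text, "search engines"))  # True
print(is_phrase(text, "engines search"))  # False: order matters
print(near(text, "search", "web", 4))     # True: four words apart
print(near(text, "search", "web", 2))     # False
```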
![Page 20: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/20.jpg)
Demos
• Google Features
  – Basic
  – Advanced
  – I’m Feeling Lucky
  – Google Directory
  – About Google
  – More (from the menu option)
  – Show options/Hide options (from the results page)
![Page 21: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/21.jpg)
Google Advanced Searching
• Video on YouTube
http://www.youtube.com/watch?v=tk6vZiGiaiQ
![Page 22: Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009](https://reader035.vdocuments.site/reader035/viewer/2022062322/56649ea85503460f94bab367/html5/thumbnails/22.jpg)
Yahoo Demo
• Basic
• Advanced
• Directory
• Yahoo Answers
• Ask Earl
• Other features