web search engines

Web Search Engines, by Greg R. Notess, [email protected], imt.net/~notess/search

Upload: brett-malone

Post on 03-Jan-2016



TRANSCRIPT

Page 1: Web Search Engines

Web Search Engines
by Greg R. Notess
[email protected]
imt.net/~notess/search

Page 2: Web Search Engines

Overview:
Comparing the database content
  • Change
  • Comparative Size
  • Overlap
Looking towards future developments
  • Portal or Destination
  • Output sorting

Page 3: Web Search Engines

Results are limited by
Database content
  • The Web sites included
  • The depth to which they are indexed

Page 4: Web Search Engines

If it’s not in the database, the best search engine will not be able to find the Web page

Page 5: Web Search Engines

So what’re they like?
Very large databases
Most index all words on page
  • None index words in images
Let’s see how the databases compare to the real Web

Page 6: Web Search Engines

Change over time?

Page 7: Web Search Engines

Overall Size Change
Is the Web in general
  • Growing?
  • Shrinking?
  • Remaining the same?

Page 8: Web Search Engines

Excite
6 Searches 10/96-8/98

Page 9: Web Search Engines

What about the rest?
Who’s the biggest?
How to measure?
  • Actual search results
  • Verified hits

Page 10: Web Search Engines

[Bar chart: Total Hits from 15 Searches, August 29, 1998. Axis: 0 to 4500 hits. Engines shown: AltaVista, Northern Light, HotBot, Infoseek, Excite, Lycos, WebCrawler.]

Page 11: Web Search Engines

And over time?
8/98 -- AltaVista, Northern Light, HotBot
5/98 -- AltaVista, HotBot, Northern Light
2/98 -- HotBot, AltaVista, Northern Light
10/97 -- AltaVista, HotBot, Northern Light
9/97 -- Northern Light, Excite, HotBot
6/97 -- HotBot, AltaVista, Infoseek
10/96 -- HotBot, Excite, AltaVista

Page 12: Web Search Engines

Back to change in size
Let’s look at six search engines
Over the course of two years

Page 13: Web Search Engines

[Chart: Database Size Changes, Five Terms: Oct. 96 - Aug. 98. Axis: 0 to 1250. Engines shown: Northern Light, HotBot, AltaVista, Infoseek, Excite, Lycos. Series: Oct 96, June 97, Sept 97, Oct 97, Feb 98, May 98, Aug 98.]

Page 14: Web Search Engines

But at least
They have a high degree of duplication between them
Right?

Page 15: Web Search Engines

Try 4 small searches
Using five search engines
How many pages are found by all five or at least by four of them?

Page 16: Web Search Engines

ZERO
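The overlap test described on the previous slide can be sketched as follows. This is a minimal illustration, not the method actually used for the presentation: the engine names, URLs, and the helper names `overlap_counts` and `found_by_at_least` are all hypothetical, and it assumes each engine's results have already been collected as a list of URLs.

```python
from collections import Counter


def overlap_counts(results_by_engine):
    """Map each URL to the number of engines that returned it."""
    counts = Counter()
    for urls in results_by_engine.values():
        # set() so an engine listing a URL twice is still counted once
        counts.update(set(urls))
    return counts


def found_by_at_least(results_by_engine, k):
    """Return the sorted URLs returned by at least k engines."""
    counts = overlap_counts(results_by_engine)
    return sorted(url for url, n in counts.items() if n >= k)


# Hypothetical result sets for one small search on five engines.
results = {
    "engine_a": ["http://example.com/1", "http://example.com/2"],
    "engine_b": ["http://example.com/2", "http://example.com/3"],
    "engine_c": ["http://example.com/2", "http://example.com/4"],
    "engine_d": ["http://example.com/5"],
    "engine_e": ["http://example.com/2", "http://example.com/1"],
}

print(found_by_at_least(results, 4))
print(found_by_at_least(results, 5))
```

A real comparison would also need to normalize URLs (trailing slashes, case, mirrors) before counting, which this sketch leaves out.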

Page 17: Web Search Engines

Overlap

Page 18: Web Search Engines

And they exclude most:
Content of Adobe PDF and formatted files
The content in most sites requiring a log in
CGI output: data requested by a form
Other dynamically produced data
Pages protected by a robots.txt file
Intranets, pages not linked from anywhere else
Commercial resources with domain limitations
Non-Web resources
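To illustrate the robots.txt item in the list above, here is a small sketch of how a well-behaved crawler might decide whether a page is off limits, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the `allowed` helper are assumptions made for the example; the slides do not describe any particular engine's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that closes one directory to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text instead of fetching it
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/public/page.html"))
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/page.html"))
```

Pages under the disallowed path never enter the crawler's queue, so they cannot appear in that engine's database.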

Page 19: Web Search Engines

Scope Summary:
Inconsistent growth
Not full coverage
Surprisingly low duplication

Page 20: Web Search Engines

Positive Side?
Essential for searching the Net
Can be used effectively
  • Phrase search
  • Use more than one
  • Smart searching

Page 21: Web Search Engines

Incredibly popular
  • Even when they fail
    – But then, since when is finding information always easy?

Page 22: Web Search Engines

Overview:
Comparing the database content
  • Change
  • Comparative Size
  • Overlap
Looking towards future developments
  • Portal or Destination
  • Output sorting

Page 23: Web Search Engines

What is a search engine?
Portal?
Gateway?
Destination?

Page 24: Web Search Engines

Search Engine
the software that searches a database

Page 25: Web Search Engines

Development
Database of Web pages
adds Supplementary Database
  • Phone numbers, reference, businesses, news
then adds Subject directory
then Services
  • email, ISP, shopping, travel agent
now Communities

Page 26: Web Search Engines

Portal to Destination?
Driving force
  • advertising revenue
Keep users longer for more
Conflicts with portal and gateway principle

Page 27: Web Search Engines

Future possibilities?
Smaller databases
Less pointing to external pages
Paid advertising or sponsorship for visibility
Rise of search only sites?

Page 28: Web Search Engines

Output Development
Initially, “Relevance” ranking
  • Crude
  • Not site or URL based
Some site sorting from Excite
No date sorting

Page 29: Web Search Engines

Site Sorting
Infoseek, then Lycos, now HotBot
Group together by site
  • More relevant than prior algorithms
Northern Light includes it in
  • Custom Folders
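The idea of grouping results by site can be sketched as below. This is only an assumed illustration of the general technique; the slides do not say how Infoseek, Lycos, or HotBot implement it. The `group_by_site` helper and the URLs are hypothetical, and "site" is approximated here by the host name of each URL.

```python
from urllib.parse import urlsplit


def group_by_site(ranked_urls):
    """Group a relevance-ranked URL list by host, keeping rank order.

    Sites appear in the order of their best-ranked page, and pages keep
    their original relative order within each site.
    """
    groups = {}
    for url in ranked_urls:
        host = urlsplit(url).netloc.lower()
        groups.setdefault(host, []).append(url)
    return groups


# Hypothetical ranked result list.
ranked = [
    "http://a.example/x",
    "http://b.example/y",
    "http://a.example/z",
]

for site, pages in group_by_site(ranked).items():
    print(site, pages)
```

Grouping this way keeps one prolific site from filling the first screen of results, which is one reason it can feel more relevant than a flat ranked list.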

Page 30: Web Search Engines

Other Output
RealName on AltaVista
Direct Hit on HotBot
Subject Directory Categories
News
Books, CDs, etc. “about search term”

Page 31: Web Search Engines

Search Engine Showdown
imt.net/~notess/search
Search engine features
See also
  • www.searchenginewatch.com
See also
  • Rich Wiggins, Coming up next . . .

Page 32: Web Search Engines

Web Search Engines
by Greg R. Notess
[email protected]
imt.net/~notess/search