search engine 1 - copy

Upload: zatin-gupta

Post on 03-Apr-2018


  • 7/29/2019 Search Engine 1 - Copy

    1/19

    Tefko Saracevic 1

    Search Engines


    Definition

    Search: COMPUTING (transitive verb) to examine a computer file, disk, database, or network for particular information

    Engine: something that supplies the driving force or energy to a movement, system, or trend

    Search Engine: a computer program that searches for particular keywords and returns a list of documents in which they were found, especially a commercial service that scans documents on the Internet


    Brief History

    The very first tool used for searching was Archie, created in 1990.

    Aliweb came next, in 1993; it relied on index files submitted by site administrators rather than on a crawler.

    WebCrawler and Lycos followed in 1994.


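    How Search Engines Work

    [Diagram: a Crawler fetches pages (URL1, URL2, URL3, URL4) from The Web, such as "All About Eggs" by S. I. Am; an Indexer stores them in a Database; the Search Engine answers the query "Eggs?" from Your Browser with ranked matches: Eggs - 90%, Eggo - 81%, Ego - 40%, Huh? - 10%.]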
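
    The last stage of the diagram returns matches ordered by a percentage score. A minimal sketch of that ranking step follows. It assumes a plain string-similarity ratio from Python's difflib as a stand-in for whatever scoring a real engine uses, so its percentages will not reproduce the illustrative numbers on the slide. The function name rank_matches is hypothetical.

```python
from difflib import SequenceMatcher


def rank_matches(query, titles):
    """Score each title against the query and return the best matches first.

    The score is difflib's similarity ratio scaled to a percentage. This is
    an assumption made for illustration, not a real engine's ranking method.
    """
    scored = []
    for title in titles:
        ratio = SequenceMatcher(None, query.lower(), title.lower()).ratio()
        scored.append((title, round(ratio * 100)))
    # Highest percentage first, as in the ranked list on the slide.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


results = rank_matches("Eggs", ["Huh?", "Ego", "Eggs", "Eggo"])
for title, percent in results:
    print(f"{title} - {percent}%")
```

    An exact match scores highest and an unrelated string scores lowest, which is the behaviour the diagram illustrates.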


    Ways of Searching

    Keyword searching

    Refined Searching

    Relevancy Rankings

    Information on meta tags

    Concept-based Searching


    A Few Search Engines

    AltaVista (www.altavista.com)

    Excite (www.excite.com)

    Infoseek (www.go.com)

    Lycos (www.lycos.com)

    HotBot (www.hotbot.com)

    Yahoo (www.yahoo.com)

    Google (www.google.com)


    Web Crawler

    Creates a copy of all visited pages for later processing by a search engine.

    Can also be used for automating maintenance tasks on a website, such as checking links or validating HTML code.


    Can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).

    For a number of reasons, crawlers cover only a fraction of the Web; they do not cover the invisible web.
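
    The crawler's behaviour can be sketched as a breadth-first walk over links. The example below is a toy under stated assumptions: the "web" is a hypothetical in-memory dictionary (FAKE_WEB, with made-up example.test URLs) rather than real HTTP, and the names LinkExtractor and crawl are invented for illustration. It keeps a copy of each visited page, reports links that lead nowhere, and never reaches a page that nothing links to.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory web: URL -> HTML. No network access is involved.
FAKE_WEB = {
    "http://example.test/": '<a href="http://example.test/a">A</a> '
                            '<a href="http://example.test/b">B</a>',
    "http://example.test/a": '<a href="http://example.test/">home</a> '
                             '<a href="http://example.test/missing">gone</a>',
    "http://example.test/b": "<p>No links here.</p>",
    # Nothing links to this page, so the crawler cannot discover it.
    "http://example.test/hidden": "<p>Unlinked page.</p>",
}


def crawl(start_url, fetch):
    """Breadth-first crawl; fetch(url) returns HTML, or None if not found."""
    copies = {}   # copy of every visited page, for later indexing
    broken = []   # URLs that were linked to but could not be fetched
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            broken.append(url)
            continue
        copies[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return copies, broken


copies, broken = crawl("http://example.test/", FAKE_WEB.get)
print(sorted(copies))
print(broken)
```

    The unlinked page stays invisible to the crawl, which is one reason coverage is only partial.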


    Indexing

    Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval.

    The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query.

    Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power.
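
    The difference between looking up an index and scanning the corpus can be shown with a small inverted index, one common way to build such an index. This is a minimal sketch, assuming whitespace tokenization and a three-document corpus invented for illustration; the names DOCS, build_index, search_with_index, and search_by_scanning are hypothetical.

```python
from collections import defaultdict

# Hypothetical corpus: document id -> text.
DOCS = {
    1: "all about eggs",
    2: "eggs and bacon recipes",
    3: "history of search engines",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_with_index(index, word):
    """One dictionary lookup, regardless of how many documents exist."""
    return sorted(index.get(word.lower(), set()))


def search_by_scanning(docs, word):
    """Read every document on every query: the cost the index avoids."""
    word = word.lower()
    return sorted(
        doc_id for doc_id, text in docs.items() if word in text.lower().split()
    )


index = build_index(DOCS)
print(search_with_index(index, "eggs"))
print(search_by_scanning(DOCS, "eggs"))
```

    Both functions return the same documents. The index pays the parsing cost once, at build time, instead of on every query.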


    Elaboration: similarities, differences

    All search engines have these basic parts in common,

    BUT the actual processes and methods they use are based on various algorithms, and they differ.

    Most are proprietary, with details kept mostly secret, but they are based on well-known principles from information retrieval or classification.

    To some extent Google is an exception: they published their method.


    Case of Google

    Developed by Sergey Brin and Lawrence Page while students at Stanford; in the beginning it ran on Stanford computers.

    The basic approach is described in their famous paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (well written, simple language, has their pictures).

    In the acknowledgements they cite support from NSF's Digital Library Initiative, i.e. initially Google came out of government-sponsored research.

    They describe their method, PageRank, which is based on ranking hyperlinks as in citation indexing.

    "We chose our system name, Google, because it is a common spelling of googol, or 10^100", i.e. ten to the hundredth power.

    http://www-db.stanford.edu/~backrub/google.html
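
    The idea of ranking pages by their incoming hyperlinks can be sketched with the simple iterative formula given in the paper, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of links going out of T, and d is a damping factor (0.85 here). The four-page link graph and the function name pagerank are assumptions made for illustration. This is a toy, not Google's production method, and it assumes every page has at least one outgoing link.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterate the PageRank formula over a dict of page -> outgoing links.

    Assumes every page has at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each linking page passes on its rank, split over its out-links.
            incoming = sum(
                rank[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - damping) + damping * incoming
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked to by A, B, and D; nothing links to D.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

    The page with the most incoming links ends up ranked highest, and the page nobody links to keeps only the baseline value of 1 - d. This mirrors how a heavily cited paper ranks highly in citation indexing.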

    Coverage Differences

    No engine covers more than a fraction of the WWW.

    Estimates: none covers more than 16%.

    It is hard (even impossible) to discern and compare coverage, but engines differ substantially in what they cover.


    In addition:

    many national search engines
        own coverage, orientation, governance

    many specialized or domain search engines
        own coverage geared to subject of interest

    many comprehensive sources
        independent


    Advantages of Search Engines

    Search vast databases

    Very easy to use

    Sophisticated searching often available

    Normally global


    Limitations

    The automated method of collecting information is rather crude.

    Information may be out of context.

    May return out-of-date sites.


    Search engines are also often victims of spamdexing.

    Spamdexing is the use of techniques that push rankings higher than they deserve.

    Methods typically include textual as well as link-based techniques.


    Search Engine Optimization (SEO)

    SEO is one of the key Web marketing activities.

    It is a part of search engine marketing (SEM).

    SEM = SEO + PPC (pay per click)

    When a user searches on Google, ads appear on the right side of the results under the Sponsored Links section; these are called pay-per-click ads.


    Thank you