search engine 1 - copy
-
7/29/2019 Search Engine 1 - Copy
1/19
Tefko Saracevic 1
Search Engines
-
Definition
Search: (computing, transitive verb) to examine a computer file, disk, database, or network for particular information
Engine: something that supplies the driving force or energy to a movement, system, or trend
Search Engine: a computer program that searches for particular keywords and returns a list of documents in which they were found, especially a commercial service that scans documents on the Internet
-
Brief History
The very first tool used for searching was Archie, created in 1990.
Aliweb came next in 1993; it did not use a crawler but relied on index files submitted by site administrators.
WebCrawler and Lycos followed in 1994.
-
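How Search Engines Work
[Figure, labels recovered from the slide: Your Browser; The Web (URL1, URL2, URL3, URL4); Crawler; Indexer; Search Engine; Database. Example query "Eggs?" with ranked matches: Eggs 90%, Eggo 81%, Ego 40%, Huh? 10%. Example page: "All About Eggs" by S. I. Am.]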
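The flow suggested by the figure labels (crawler, indexer, database, ranked matches) can be sketched in a few lines of Python. This is only a toy illustration: the in-memory WEB dictionary, the page texts, and the scoring rule (share of a page's words that match the query) are all assumptions made for the example, not how any real engine works.

```python
from collections import Counter, deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
WEB = {
    "URL1": ("all about eggs eggs eggs", ["URL2", "URL3"]),
    "URL2": ("eggo waffles and eggs", ["URL4"]),
    "URL3": ("the ego and the id", []),
    "URL4": ("huh", ["URL1"]),
}

def crawl(start):
    """Crawler: follow links breadth-first and keep a copy of each page's text."""
    seen, queue, copies = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        copies[url] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return copies

def build_index(copies):
    """Indexer: map each word to the pages that contain it, with counts."""
    index = {}
    for url, text in copies.items():
        for word, count in Counter(text.split()).items():
            index.setdefault(word, {})[url] = count
    return index

def search(index, copies, query):
    """Search engine: rank pages by the share of their words matching the query."""
    hits = index.get(query.lower(), {})
    scored = [(count / len(copies[url].split()), url) for url, count in hits.items()]
    return [(url, round(score * 100)) for score, url in sorted(scored, reverse=True)]

copies = crawl("URL1")
index = build_index(copies)
results = search(index, copies, "eggs")
print(results)
```

The query is answered from the index alone; the crawler and indexer run beforehand, which is the division of labour the figure labels point to.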
-
Ways of Searching
Keyword searching
Refined searching
Relevancy rankings
Information in meta tags
Concept-based searching
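As one illustration of the meta tag item above, the sketch below reads the keywords and description meta tags from a page using Python's standard html.parser module. The sample page and the helper names (MetaTagParser, extract_meta) are made up for the example; how much weight an engine gives to such tags, if any, is not stated in these slides.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect the content of <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name, content = attributes.get("name"), attributes.get("content")
            if name and content:
                self.meta[name.lower()] = content

def extract_meta(html):
    """Return a dict of meta tag names to their content strings."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta

# Hypothetical page used only for this example.
PAGE = """<html><head>
<meta name="keywords" content="eggs, cooking, breakfast">
<meta name="description" content="All about eggs.">
</head><body>Recipes</body></html>"""

meta = extract_meta(PAGE)
keywords = [k.strip() for k in meta["keywords"].split(",")]
print(keywords)
print(meta["description"])
```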
-
A Few Search Engines
AltaVista (www.altavista.com)
Excite (www.excite.com)
Infoseek (www.go.com)
Lycos (www.lycos.com)
HotBot (www.hotbot.com)
Yahoo (www.yahoo.com)
Google (www.google.com)
-
Web Crawler
Creates a copy of all visited pages for later processing by a search engine.
Also used for automating maintenance tasks on a website, such as checking links or validating HTML code.
-
Can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).
For a number of reasons, crawlers cover only a fraction of the Web; they do not cover the invisible web.
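The two crawler uses described above (copying visited pages and checking links) can be sketched together. To keep the example runnable without network access, the site is a hypothetical in-memory dictionary of path to HTML; a real crawler would fetch pages over HTTP and would also need politeness rules that are out of scope here.

```python
from collections import deque
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links

# Hypothetical site held in memory (path -> HTML), for illustration only.
SITE = {
    "/index.html": '<a href="/about.html">About</a> <a href="/news.html">News</a>',
    "/about.html": '<a href="/index.html">Home</a> <a href="/missing.html">Old page</a>',
    "/news.html": "<p>No links here.</p>",
}

def crawl(site, start):
    """Visit pages breadth-first; return (copies of pages, broken links found)."""
    copies, broken = {}, []
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        copies[url] = site[url]              # keep a copy for later processing
        for link in extract_links(site[url]):
            if link not in site:
                broken.append((url, link))   # maintenance use: dead link report
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return copies, broken

copies, broken = crawl(SITE, "/index.html")
print(sorted(copies))
print(broken)
```

Pages that no visited page links to are never reached by this loop, which is one simple reason a crawler sees only part of what exists.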
http://en.wikipedia.org/wiki/Spamming
-
Indexing
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval.
The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query.
Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power.
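The contrast between scanning the corpus and consulting an index can be shown with a small inverted index. This is a minimal sketch under stated assumptions: the three-document corpus and the function names are invented for the example, and the tokenizer is a deliberately simple regular expression.

```python
import re
from collections import defaultdict

# Hypothetical corpus, for illustration only.
CORPUS = {
    "doc1": "Search engines index the Web.",
    "doc2": "A crawler copies pages for the indexer.",
    "doc3": "Without an index, every search scans the corpus.",
}

def parse(text):
    """Parse step: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def scan_corpus(corpus, term):
    """No index: every document is parsed again for every query."""
    return sorted(doc_id for doc_id, text in corpus.items() if term in parse(text))

def build_index(corpus):
    """Collect, parse, and store: map each token to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for token in parse(text):
            index[token].add(doc_id)
    return index

def lookup(index, term):
    """With an index: a single dictionary lookup answers the query."""
    return sorted(index.get(term, set()))

index = build_index(CORPUS)
print(scan_corpus(CORPUS, "index"))
print(lookup(index, "index"))
```

Both calls return the same documents; the difference is that the scan costs work proportional to the whole corpus on every query, while the index pays that cost once when it is built.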
-
Elaboration: similarities, differences
All search engines have these basic parts in common,
BUT the actual processes and methods they use are based on various algorithms, and they differ.
Most are proprietary, with details kept mostly secret, but based on well-known principles from information retrieval or classification.
To some extent Google is an exception: they published their method.
-
Case of Google
Developed by Sergey Brin and Lawrence Page while students at Stanford; in the beginning it ran on Stanford computers.
The basic approach is described in their famous paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine": well written, simple language, has their pictures.
In the acknowledgements they cite the support of NSF's Digital Library Initiative, i.e. initially, Google came out of government-sponsored research.
They describe their method, PageRank, which is based on ranking hyperlinks as in citation indexing.
"We chose our system name, Google, because it is a common spelling of googol, or ten to the hundredth power"
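The idea of ranking pages by their hyperlinks, as in citation indexing, can be illustrated with a short iterative computation. This is a sketch, not the method as Google implemented it: it uses a normalized variant in which ranks sum to 1, a damping factor of 0.85, a fixed number of iterations, and a made-up four-page link graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate a rank for each page from the pages linking to it.

    links maps each page to the list of pages it links to; every link target
    must also appear as a key. Ranks are normalized to sum to 1 (an assumption
    of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on in equal shares to the pages it cites.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: C is linked to by A, B, and D.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph the most-linked page, C, ends up with the highest rank, and D, which nothing links to, ends up with the lowest, which mirrors how a heavily cited paper stands out in citation indexing.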
http://www-db.stanford.edu/~backrub/google.html
-
Coverage Differences
No engine covers more than a fraction of the WWW.
Estimates: none more than 16%.
Hard (even impossible) to discern & compare coverage, but they differ substantially in what they cover.
-
In addition:
many national search engines: own coverage, orientation, governance
many specialized or domain search engines: own coverage geared to subject of interest
many comprehensive sources: independent
-
Advantages of Search Engines
Search vast databases
Very easy to use
Sophisticated searching often available
Normally global
-
Limitations
The automated method of collecting information is rather crude.
Information may be out of context.
May return out-of-date sites.
-
Search engines are also often victims of spamdexing.
Spamdexing is the use of techniques that push rankings higher than they belong.
Methods typically include textual as well as link-based techniques.
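One familiar textual technique is keyword stuffing, where a page repeats a term far more often than natural prose would. The check below is a crude heuristic sketch: the 0.3 density threshold and the sample texts are assumptions chosen for the example, and real engines do not publish the signals they use.

```python
import re
from collections import Counter

def keyword_density(text):
    """Return the most frequent word and its share of all words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return None, 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)

def looks_stuffed(text, threshold=0.3):
    """Flag text whose top word exceeds a (hypothetical) density threshold."""
    _, density = keyword_density(text)
    return density > threshold

# Sample texts invented for this example.
normal = "Fresh eggs can be boiled, fried, or scrambled for breakfast."
stuffed = "cheap eggs cheap eggs buy cheap eggs cheap cheap eggs"

print(keyword_density(stuffed))
print(looks_stuffed(normal), looks_stuffed(stuffed))
```

A single-word density check like this is easy to evade and would also misfire on very short pages, so it is best read as an illustration of the idea rather than a usable filter.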
-
Search Engine Optimization (SEO)
SEO is one of the key Web Marketing activities.
It is a part of search engine marketing (SEM).
SEM = SEO + PPC (pay per click)
When a user searches on Google, some ads are displayed on the right side under the Sponsored Links section; these are called pay-per-click ads.
http://en.wikipedia.org/wiki/Web_Marketing
-
Thank you