03 search engine

Upload: zhou-xuanming

Post on 06-Apr-2018


  • 8/3/2019 03 Search Engine

    1/15

    IC0102 Web-based Information Systems

    Web Search Engines

    A/ Prof. Yang Zhonghua

    Web Search Engine / 2 (59)  A/Prof. Yang, Zhonghua

    Web Search Engines

    The first law of e-commerce is that if users cannot find the product, they cannot buy it either.

    Jakob Nielsen

    Internet search engines

    Internet search engines are special sites on the Web that are designed to help people find information stored on other sites.

    There are differences in the ways various search engines work, but they all perform three basic tasks:

    1. They search the Internet, or select pieces of the Internet, based on important words.

    2. They keep an index of the words they find, and where they find them.

    3. They allow users to look for words or combinations of words found in that index.


    Google search is the world's most popular search engine

    Searches Per Day: Top 5 Engines, the United States in March 2006

    Engine    Searches Per Day (Millions)    Searches Per Month (Millions)
    Google    91                             2,733
    Yahoo     60                             1,792
    MSN       28                             845
    AOL       16                             486
    Ask       13                             378
    Others    6                              166
    Total     213                            6,400

    Top U.S. Search Providers by Searches, May 2007

    Provider             Searches (000)    Share of Total Searches (%)
    Google               4,033,277         56.3
    Yahoo                1,540,949         21.5
    MSN/Windows Live     605,400           8.4
    AOL                  381,961           5.3
    Ask.com              142,418           2.0
    My Web Search        61,784            0.9
    Comcast              34,908            0.5
    EarthLink            33,461            0.5
    My Way               30,122            0.4
    Dogpile.com          26,295            0.4
    Other                275,365           3.8
    All search           7,165,940         100.0

    Source: Nielsen//NetRatings, 2007

    Google search engine

    Google has one of the largest databases of Web pages, including many other types of web documents (blog posts, wiki pages, group discussion threads) and document formats (e.g., PDFs, Word or Excel documents, PowerPoints).

    Despite the presence of all these formats, Google's popularity ranking often makes pages worth looking at rise near the top of search results.

    Second opinion in searching

    Google alone is often not sufficient, however. Less than half the searchable Web is fully searchable in Google.

    Overlap studies show that about half of the pages in any search engine database exist only in that database.

    Getting a second opinion is therefore often worth your time.

    Try Ask.com or Yahoo! Search.

    Features in common

    Things You CAN Do in Google, Yahoo!, and Ask.com
        Phrase searching by enclosing terms in double quotes
        OR searching with capitalized OR
        - excludes, + requires exact form of word
        Limit results by language in Advanced Search

    Things NOT Supported in Google, Yahoo!, or Ask.com
        Truncation: use OR searches for variants (airline OR airlines)
        Case sensitivity: capitalization does not matter

    Features in which the search engines differ

    Search engine
        Google: www.google.com
        Yahoo! Search: search.yahoo.com
        Ask.com: www.ask.com

    Links to help
        Google: Google help pages
        Yahoo! Search: Yahoo! help pages
        Ask.com: Ask help pages

    Size, type
        Google: HUGE. Size not disclosed in any way that allows comparison. Probably the biggest.
        Yahoo! Search: HUGE. Claims over 20 billion total "web objects."
        Ask.com: LARGE. Claims to have 2 billion fully indexed, searchable pages.

    Noteworthy features and limitations
        Google: Popularity ranking using PageRank. Indexes the first 101KB of a Web page, and 120KB of PDFs. ~ before a word sometimes finds synonyms (~help > FAQ, tutorial, etc.)
        Yahoo! Search: Shortcuts give quick access to dictionary, synonyms, patents, traffic, stocks, encyclopedia, and more.
        Ask.com: Subject-Specific Popularity ranking. Suggests broader and narrower terms.

    Features in which the search engines differ

    Boolean logic
        Google: Partial. AND assumed between words. Capitalize OR. - excludes. No ( ) or nesting. In Advanced Search, partial Boolean available in boxes.
        Yahoo! Search: Accepts AND, OR, NOT or AND NOT, and ( ). Must be capitalized. You must enclose terms joined by OR in parentheses (classic Boolean).
        Ask.com: Partial. AND assumed between words. Capitalize OR. - excludes. No ( ) or nesting.

    +Requires / -Excludes
        Google: - excludes. + will allow you to retrieve "stop words" (e.g., +in)
        Yahoo! Search: - excludes. + will allow you to search common words: "+in truth"
        Ask.com: - excludes. + will allow you to retrieve "stop words" (e.g., +in)

    Sub-Searching
        Google: Sort of. At bottom of results page, click "Search within results" and enter more terms. Adds terms.
        Yahoo! Search: Add terms.
        Ask.com: Sort of. Add terms.

    Features in which the search engines differ

    Results Ranking
        Google: Based on page popularity measured in links to it from other pages: high rank if a lot of other pages link to it. Fuzzy AND also invoked. Matching and ranking based on "cached" version of pages that may not be the most recent version.
        Yahoo! Search: Automatic Fuzzy AND.
        Ask.com: Based on Subject-Specific Popularity, links to a page by related pages.

    Features in which the search engines differ

    Field limiting
        Google: link: site: intitle: inurl: Offers U.S. Gov't Search and other special searches. Patent search.
        Yahoo! Search: link: site: intitle: inurl: url: hostname:
        Ask.com: intitle: inurl: site:

    Truncation, Stemming
        Google: No truncation. Stems some words. Search variant endings and synonyms separately, separating with OR (capitalized): airline OR airlines
        Yahoo! Search: Neither. Search with OR as in Google.
        Ask.com: Neither. Search with OR as in Google.

    Features in which the search engines differ

    Language
        Google: Yes. Major Romanized and non-Romanized languages in Advanced Search.
        Yahoo! Search: Yes. Major Romanized and non-Romanized languages.
        Ask.com: Yes. Major Romanized languages. Use Advanced Search to limit.

    Translation
        Google: Yes, in "Translate this page" link following some pages. To and sometimes from English and major European languages and Chinese, Japanese, Korean.
        Yahoo! Search: Yes.
        Ask.com: No.

    Role of search engines for e-commerce

    80% of traffic determined by search
    60% would use search to research a purchase
    67% would choose a natural search result

    Examples (each month in the UK):
        500,000 search for shopping
        100,000 for clothes, shirts & shoes
        1,000,000 for mobile phone
        250,000 for furniture
        25,000 for bed linen

    Why it matters financially

    The terms "natural" and "organic" search engine listing describe the "editorial" search results on any particular engine. These results are professed to be non-biased, meaning that the engine will not accept money to influence the rankings of any individual sites.

    Shopper enters search. About 10,000 people looked for ski jackets in Dec 2004.

    Natural/organic search results: over 79,000 pages in the UK.

    Analysis suggests roughly 30% of searchers will click a top three result, another 20% on the rest of page one (top ten).

    Paid-for search results at a cost of 0.62p per click for this keyword. Rough click-through rate of 5-10%.

    Why it matters financially

    Search term        Number of searches    Click-thrus    Visitors    Conv. ratio    Orders    Value p.m.    Value p.a.
    DVD player         300,000               30%            90,000      0.05%          45        20,205        242,460
    Sony DVD player    10,000                30%            3,000       1%             30        13,470        161,640
    Sony RDR-GX7       1,000                 30%            300         5%             15        6,735         80,820

    Assumes a top three search result and a purchase price of 449.
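
    The arithmetic behind the table can be reproduced with a short sketch. The class and method names below are illustrative and not from the slides; the figures are the ones in the table, with the stated purchase price of 449 and the annual value taken as twelve times the monthly value.

```python
from dataclasses import dataclass


@dataclass
class KeywordFunnel:
    """One row of the table: searches -> visitors -> orders -> value."""
    term: str
    searches_per_month: int
    click_through_rate: float  # share of searchers who click the listing
    conversion_rate: float     # share of visitors who place an order

    def visitors(self) -> float:
        return self.searches_per_month * self.click_through_rate

    def orders(self) -> float:
        return self.visitors() * self.conversion_rate

    def monthly_value(self, price: float) -> float:
        return self.orders() * price

    def annual_value(self, price: float) -> float:
        return self.monthly_value(price) * 12


PRICE = 449  # purchase price assumed on the slide

funnels = [
    KeywordFunnel("DVD player", 300_000, 0.30, 0.0005),
    KeywordFunnel("Sony DVD player", 10_000, 0.30, 0.01),
    KeywordFunnel("Sony RDR-GX7", 1_000, 0.30, 0.05),
]

for f in funnels:
    print(
        f.term,
        round(f.visitors()),
        round(f.orders()),
        round(f.monthly_value(PRICE)),
        round(f.annual_value(PRICE)),
    )
```

    The more specific the search term, the fewer the searches but the higher the conversion ratio, which is why the three rows end up closer in value than their search volumes suggest.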

    What's the goal?

    All products and categories appear in Google & other major search engines: completely & elegantly

    Sites perform well for generic searches

    [Pie chart of search engine shares. Labels: Google, Yahoo, MSN, AOL, Lycos, Altavista, Ask Jeeves, Others. Values in extraction order: 55.8%, 9.6%, 3.8%, 2.6%, 2.2%, 1.5%, 21.7%, 2.8%.]

    How do you come top of Google?

    1) Indexability

    2) Relevance

    3) Link popularity

    Indexability

    The site must be navigable by robots and spiders

    Its content must be readable

    Robots don't like frames

    Robots don't like Flash

    Robots can't read into product catalogues

    What robots read and index

    page titles
    URLs
    links
    image names
    body copy
    description meta tag

    Relevance

    The content of your site must be relevant

    It must reflect the keywords

    Keywords are the words or phrases that web users use to search for information on the web

    Where and how you place and present these keywords in your site is vital

    Choosing keywords

    Keywords reflect:
        Your core business/product/service offering
        Your unique sales/value proposition
        What your customers are looking for on the Internet

    Other influencing factors
        Popularity
        Saturation
        Relevance
        Priorities (& quantity)

    Where to put keywords

    Page title (the single most important place)
    Description meta tag (appears in listings)
    Body headers (H1) and copy
    Image/file names
    Image alt tags
    URLs
    Keywords meta tag
    and Offsite descriptions (directories etc)

    What matters

    Meta tags
        Description (pay attention to size)
        Page title (pay attention to size)
    Category page (make it relevant)
    Product page (make it relevant)
    Offsite relevance (directories, links)

    Popularity

    Determined primarily by number of inbound & relevant links

    Influenced by frequency and recency of updates

    Visible in Google's PageRank

    How do I increase popularity?

    Get lots of people to link to your site (with the right keywords)

    Common approaches:
        Get in the important directories
        Self-managed affiliate programmes
        Develop valuable content
        Research, surveys and quizzes
        Weblogs (blogs)
        Social bookmarks (del.icio.us)

    The two most important links

    Open Directory Project
    Yahoo Directory


    How do Search Engines Work?

    Searching a database

    Search engines for the general web (like all those listed above) do not really search the World Wide Web directly.

    Each one searches a database of the full text of web pages selected from the billions of web pages out there residing on servers.

    When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page.

    When you click on links provided in a search engine's search results, you retrieve from the server the current version of the page.

    Robots: Spider

    Search engine databases are selected and built by computer robot programs called spiders (Web crawlers).

    Although it is said they "crawl" the web in their hunt for pages to include, in truth they stay in one place.

    They find the pages for potential inclusion by following the links in the pages they already have in their database (i.e., already "know about").

    They cannot think or type a URL or use judgment to "decide" to go look something up and see what's on the web about it.
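
    The link-following behaviour described above can be sketched as a breadth-first walk over a link graph. This is a minimal illustration, not how any particular engine is built: the `WEB` dictionary and its example URLs are hypothetical stand-ins for real pages, and no network access is performed.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
WEB = {
    "a.example/": ["a.example/products", "b.example/"],
    "a.example/products": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
    "orphan.example/": ["a.example/"],  # no other page links here
}


def crawl(seeds, web):
    """Discover pages by following links from pages that are already known."""
    known = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        for link in web.get(url, []):
            if link not in known:
                known.add(link)
                queue.append(link)
    return known


found = crawl(["a.example/"], WEB)
print(sorted(found))
print("orphan.example/" in found)
```

    A page that no known page links to is never reached from the seed; it only enters the set if it is supplied as a seed itself, for example `crawl(["a.example/", "orphan.example/"], WEB)`.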

    Page links submission

    If a web page is never linked to in any other page, search engine spiders cannot find it.

    The only way a brand new page (one that no other page has ever linked to) can get into a search engine is for its URL to be sent by some human to the search engine companies as a request that the new page be included.

    All search engine companies offer ways to do this.

    Indexing

    After spiders find pages, they pass them on to another computer program for "indexing."

    This program identifies the text, links, and other content in the page and stores it in the search engine database's files so that the database can be searched by keyword and whatever more advanced approaches are offered, and the page will be found if your search matches its content.
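
    One common way to make stored page text searchable by keyword is an inverted index, which maps each word to the pages that contain it. The sketch below is an assumption about the general technique, not a description of any specific engine; the page URLs and texts are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages standing in for a crawled database.
PAGES = {
    "shop.example/ski": "Ski jackets and ski trousers for winter",
    "shop.example/dvd": "Sony DVD player with recorder",
    "shop.example/tv": "Sony television and DVD bundles",
}

index = build_index(PAGES)
print(sorted(search(index, "sony dvd")))
```

    Because the search runs against the index rather than the live pages, results reflect the pages as they were when last indexed.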

    "Spiders" take a Web page's content and create key search words that enable online users to find pages they're looking for.

    What to look for

    When the Google spider looked at an HTML page, it took note of two things:
        The words within the page
        Where the words were found

    Words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search.

    The Google spider was built to index every significant word on a page, leaving out the articles "a," "an" and "the."

    Other spiders take different approaches.
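
    As a rough illustration of recording both the words and where they were found, the sketch below uses Python's standard `html.parser` to tag each word with a location and to skip the three articles. The class name, the location labels, and the sample HTML are assumptions made for this example; it is not Google's spider.

```python
import re
from html.parser import HTMLParser

ARTICLES = {"a", "an", "the"}
HEADING_TAGS = ("h1", "h2", "h3")


class WordLocator(HTMLParser):
    """Collect (word, location) pairs; location is 'title', 'heading' or 'body'."""

    def __init__(self):
        super().__init__()
        self.location = "body"
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.location = "title"
        elif tag in HEADING_TAGS:
            self.location = "heading"

    def handle_endtag(self, tag):
        if tag == "title" or tag in HEADING_TAGS:
            self.location = "body"

    def handle_data(self, data):
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            if word not in ARTICLES:
                self.words.append((word, self.location))


HTML = """<html><head><title>The Stamp Collectors Guide</title></head>
<body><h1>Collecting stamps</h1><p>A guide to the hobby.</p></body></html>"""

locator = WordLocator()
locator.feed(HTML)
print(locator.words)
```

    Keeping the location next to each word is what later allows a title match to count for more than a match in body text.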

    Meta Tags

    Meta tags allow the owner of a page to specify key words and concepts under which the page will be indexed.

    There is, however, a danger in over-reliance on meta tags, because a careless or unscrupulous page owner might add meta tags that fit very popular topics but have nothing to do with the actual contents of the page.

    To protect against this, spiders will correlate meta tags with page content, rejecting the meta tags that don't match the words on the page.
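
    The slides do not say how this correlation is done. One simple way to picture it, offered only as an assumed sketch, is to keep a meta keyword only if every word in it also occurs in the page text. The function name and sample data are hypothetical.

```python
import re


def accepted_meta_keywords(meta_keywords, body_text):
    """Keep only the meta keywords whose every word also appears in the page text."""
    body_words = set(re.findall(r"[a-z0-9]+", body_text.lower()))
    accepted = []
    for keyword in meta_keywords:
        words = re.findall(r"[a-z0-9]+", keyword.lower())
        if words and all(word in body_words for word in words):
            accepted.append(keyword)
    return accepted


body = "We sell rare stamps and albums for stamp collectors."
meta = ["stamps", "stamp collectors", "cheap flights", "lottery"]
print(accepted_meta_keywords(meta, body))
```

    Here the two stamp-related keywords survive, while the unrelated popular topics are rejected because nothing on the page supports them.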

    Meta tag (NTU)

    The Meta Description Tag

    The meta description tag allows you to influence the description of your page in the crawlers that support the tag.

    But Google may ignore the meta description tag and instead automatically generate its own description for the page.

    Meta Robots Tag

    The robots tag lets you specify that a particular page should NOT be indexed by a search engine.
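
    A page usually expresses this with a tag such as `<meta name="robots" content="noindex">` in its head. The sketch below shows how a well-behaved crawler might check for it using Python's standard `html.parser`; the class and function names are illustrative, and real crawlers handle more directives than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                directive = part.strip().lower()
                if directive:
                    self.directives.add(directive)


def may_index(html):
    """Return False when the page asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
allowed = "<html><head><title>Catalogue</title></head></html>"
print(may_index(blocked), may_index(allowed))
```

    The tag is a request, not an access control: it only works for crawlers that choose to honour it.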

    Indexing: weight

    To make for more useful results, most search engines store more than just the word and URL.

    An engine might store the number of times that the word appears on a page.

    The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page.

    Each commercial search engine has a different formula for assigning weight to the words in its index.
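
    Since each engine's formula is its own, the weights below are invented purely to show the idea: every occurrence of a word adds a value that depends on where it occurs, so the stored entry reflects both count and position. The function name and sample page are hypothetical.

```python
import re
from collections import defaultdict

# Illustrative weights only; they are assumptions, not any engine's real values.
LOCATION_WEIGHTS = {
    "title": 5.0,
    "heading": 3.0,
    "link": 2.0,
    "meta": 2.0,
    "body": 1.0,
}


def weighted_entries(fields):
    """Return {word: weight} for one page; fields maps a location to its text."""
    weights = defaultdict(float)
    for location, text in fields.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            # Each occurrence adds the weight of the place it was found in.
            weights[word] += LOCATION_WEIGHTS.get(location, 1.0)
    return dict(weights)


page = {
    "title": "Sony DVD player",
    "heading": "DVD players",
    "body": "This DVD player plays DVD discs.",
}

entry = weighted_entries(page)
print(entry["dvd"], entry["player"], entry["discs"])
```

    With these made-up weights, "dvd" scores highest because it appears in the title, a heading, and twice in the body.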

    How Search Engines Rank Web Pages

    How do crawler-based search engines go about determining relevancy? They follow a set of rules, known as an algorithm.

    Exactly how a particular search engine's algorithm works is a closely-kept trade secret.

    However, all major search engines follow the general rules below.

    How Search Engines Rank Web Pages

    One of the main rules in a ranking algorithm involves the location and frequency of keywords on a web page. Call it the location/frequency method, for short.

    Pages with the search terms appearing in the HTML title tag are often assumed to be more relevant than others to the topic.

    Search engines will also check to see if the search keywords appear near the top of a web page.

    Frequency is the other major factor in how search engines determine relevancy. A search engine will analyze how often keywords appear in relation to other words in a web page.

    How Search Engines Rank Web Pages

    "Off the page" ranking criteria.

    Off the page factors are those that webmasters cannot easily influence. Chief among these is link analysis.

    By analyzing how pages link to each other, a search engine can determine both what a page is about and whether that page is deemed to be "important."
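
    The slides do not spell out how link analysis is computed. As one well-known illustration, the sketch below runs a simplified PageRank-style iteration over a tiny, hypothetical link graph. The damping factor of 0.85 and the iteration count are conventional assumptions, and production systems are far more elaborate.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for a {page: [outgoing links]} graph."""
    pages = list(links)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            targets = [t for t in outgoing if t in links]
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for other in pages:
                    new_scores[other] += damping * scores[page] / n
        scores = new_scores
    return scores


# Hypothetical graph: three pages link to "hub"; nothing links to "c".
LINKS = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}

scores = link_scores(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

    The page with the most inbound links ends up with the highest score, and a page nobody links to keeps only the small baseline share.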

    How Search Engines Rank Web Pages

    In addition, sophisticated techniques are used to screen out attempts by webmasters to build "artificial" links designed to boost their rankings.

    Another off the page factor is click-through measurement.

    A search engine may watch what results someone selects for a particular search, then eventually drop high-ranking pages that aren't attracting clicks, while promoting lower-ranking pages that do pull in visitors.

    Placement Tips for most "relevant"

    Pick Your Target Keywords
        How do you think people will search for your web page? The words you imagine them typing into the search box are your target keywords.
        Your target keywords should always be at least two words long.

    Position Your Keywords
        Make sure your target keywords appear in the crucial locations on your web pages. The page's HTML title tag is most important.
        Build your titles around the top two or three phrases that you would like the page to be found for.

    Create Relevant Content

    Your keywords need to be reflected in the page content.

    Consider "expanding" your text references, where appropriate.

    For example, a stamp collecting page might have references to "collectors" and "collecting." Expanding these references to "stamp collectors" and "stamp collecting" reinforces your strategic keywords in a legitimate and natural manner.

    Build Inbound Links

    Every major search engine uses link analysis as part of its ranking algorithm.

    By building links, you can help improve how well your pages perform in link analysis systems.

    You want links from good web pages that are related to the topics you want to be found for.

    Build Inbound Links

    Here's one simple means to find those good links.

    Using a search engine, search for your target keywords. Look at the pages that appear in the top results.

    Now visit those pages and ask the site owners if they will link to you. Not everyone will, especially sites that are extremely competitive with yours.

    Submit Your Key Pages

    Most search engines will index the other pages from your web site by following links from a page you submit to them.

    Submit the top two or three pages that best summarize your web site.

    Verify and Maintain Your Listing

    Invisible Web pages

    Some types of pages and links are excluded from most search engines by policy.

    Others are excluded because search engine spiders cannot access them.

    Pages that are excluded are referred to as the "Invisible Web": what you don't see in search engine results.

    Submitting To Directories

    Submitting To Directories: Yahoo & The Open Directory

    The Open Directory Project (aka ODP or DMOZ) is a volunteer-built guide to the web.
        It is provided as an option at many major search engines, including Google. Given this, being listed with the Open Directory can add value to any site.
        Submission is absolutely free.

    Yahoo maintains its own independent "directory" of Web sites.
        Anyone can use Standard submission to submit for free to a non-commercial category.

    dmoz (from directory.mozilla.org, ODP's original domain name)

    Paid Search Advertising

    Paid Search Advertising: Google AdWords, Yahoo Search Marketing & Microsoft adCenter

    Every major search engine with significant market share accepts paid listings.

    This unique form of search engine advertising guarantees that your site will appear in the top results for the keyword terms you target within a day or less.

    Paid search listings are also called sponsored listings and/or Pay Per Click (PPC) listings.

    What Makes a Search Engine Good?

    Parts of Search Engines: Database of web documents

    Variables, and their implications for your searches:

    Size of database: How many documents does the search engine claim it has? How much of the total web are you able to search?

    Freshness ("up-to-dateness"): Search engine databases consist of copies of web pages and other documents that were made when their crawlers or spiders last visited each site. How often is the database refreshed to find new pages? How often do their crawlers update the copies of the web pages you are searching?

    Completeness of text: Is the database really "full" text, or only parts of the pages? Is every word indexed?

    Types of documents offered: All search engines offer web pages. Do they also have extensive PDF, Word, Excel, PowerPoint, and other formats like WordPerfect? Are they full-text searchable?

    Speed and consistency: How fast is it? How consistent is it? Do you get different results at different times?
