Basics of Search Engines and Algorithms

How Search Engines Work? Presented by Mohammed Azharuddin, Digital Marketing Trainer

Uploaded by kongara

Posted on 12-Aug-2015



History of Search

• 1990 – Archie Query Form – FTP-based file search engine

• Feb 1993 – Excite.com – Search based on statistical word relationships

• Oct 1993 – ALIWEB – Manual-submission search engine

• Dec 1995 – AltaVista – First natural-language search engine

• Jan 1996 – BackRub – Started by Larry Page and Sergey Brin

• Sep 15, 1997 – Google.com – First search engine with PageRank technology

• 1997 – Yandex.com – Russian search engine

• 1998 – MSN Search – Microsoft's rival to Google

• 2000 – Baidu.com – Chinese search engine

• 2008 – DuckDuckGo.com – Non-tracking search engine

• 2009 – Bing.com – Microsoft's rival to Google

• 2010 – Blekko.com – Spam- and virus-free search

Sources:

http://www.searchenginehistory.com/

http://www.google.co.in/about/company/history/

http://www.wordstream.com/articles/internet-search-engines-history

The Google Story

Search Engine Architecture

• Every search engine is built on the following:

– Crawling

– Indexing

– Algorithms

– Results

– Fighting spam

Google Architecture

http://infolab.stanford.edu/~backrub/google.html

Search Engine Architecture

[Diagram: WWW (60 trillion pages, or 60 lakh crore) → Crawler → Store → Indexer → Indexes (100 million GB) → Algorithms (programs) discard trash and sort pages based on content / factors → Search Interface]

Live Google Example
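The crawl, store, index and search stages in the diagram can be sketched as a toy program. This is a minimal sketch under stated assumptions, not how Google implements it: the `WEB` dictionary is a hypothetical stand-in for real pages fetched over HTTP, and the function names `crawl`, `build_index` and `search` are illustrative. Ranking by how often the query words appear is a simple stand-in for the real ranking algorithms and factors.

```python
import re
from collections import defaultdict, deque

# Hypothetical toy web: URL -> (page text, outbound links).
# A real crawler would fetch pages over HTTP and parse links out of the HTML.
WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("an index maps words to pages; search uses the index", ["a.example"]),
    "c.example": ("cooking recipes", []),
    "d.example": ("orphan page about search", []),  # nothing links here, so it is never crawled
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(seed):
    """Crawler + store: breadth-first walk over links, saving each page's text."""
    store, queue, seen = {}, deque([seed]), {seed}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        store[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store

def build_index(store):
    """Indexer: build an inverted index mapping each word to the URLs containing it."""
    index = defaultdict(set)
    for url, text in store.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def search(index, store, query):
    """Return stored URLs containing every query word, most word occurrences first."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))

    def score(url):
        words = tokenize(store[url])
        return sum(words.count(t) for t in terms)

    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(hits, key=lambda url: (-score(url), url))

store = crawl("a.example")
index = build_index(store)
print(search(index, store, "search"))        # ['a.example', 'b.example']
print(search(index, store, "index search"))  # ['b.example']
```

The inverted index is what makes lookup fast: answering a query touches only the URL sets for its words instead of rescanning every stored page.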

Algorithms

• Programs and formulas used to return relevant results:

– PageRank

– Spelling check

– Synonym check

– Autocomplete

– Query Understanding

– Safe Search

– User Context
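Two of the listed algorithms, spelling check and autocomplete, can be illustrated in a few lines. This is a minimal sketch assuming a small hypothetical vocabulary (`VOCAB`); it uses Levenshtein edit distance for spelling suggestions and a binary search over a sorted word list for prefix completion. Production systems also draw on query logs and context, so treat this only as an illustration of the idea.

```python
import bisect

# Hypothetical vocabulary, e.g. words seen in past queries (kept sorted for prefix search).
VOCAB = sorted(["google", "goggles", "search", "seattle", "season", "engine", "england"])

def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def spell_check(word, max_distance=2):
    """Suggest the closest vocabulary word, or None if nothing is close enough."""
    best = min(VOCAB, key=lambda w: (edit_distance(word, w), w))
    return best if edit_distance(word, best) <= max_distance else None

def autocomplete(prefix, limit=3):
    """Return up to `limit` vocabulary words that start with `prefix`."""
    start = bisect.bisect_left(VOCAB, prefix)  # first word >= prefix
    out = []
    for w in VOCAB[start:]:
        if not w.startswith(prefix) or len(out) == limit:
            break
        out.append(w)
    return out

print(spell_check("serch"))  # search
print(autocomplete("se"))    # ['search', 'season', 'seattle']
```

Because `VOCAB` is sorted, every word sharing a prefix sits in one contiguous run, so autocomplete only needs to find the start of that run and read forward.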

Page Rank Algorithm

• Google's first algorithm, which looks at the links between pages to estimate each page's importance.

• PR is a number calculated for every page in Google's index.

• The toolbar showed PR as N/A or 0 to 10 (10 is the best rank), on a logarithmic scale.

• Real PageRank is calculated over all the pages in the index and can range from 0.15 up to trillions.

Toolbar vs. Real PR

Toolbar PR    Real PR
0             0 – 10
1             100 – 1,000
2             1,000 – 10,000
3             10,000 – 100,000
4             100,000 – 1,000,000
5             1,000,000 – 10,000,000

http://www.webworkshop.net/pagerank_calculator.php3

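PR Formula

Updated formula:

$$PR(A) = \frac{1 - D}{T} + D \sum_{N \in B(A)} \frac{PR(N)}{L(N)}$$

Old formula:

$$PR(A) = (1 - D) + D \sum_{N \in B(A)} \frac{PR(N)}{L(N)}$$

D = damping factor (usually 0.85); PR(N) = PR of a linking site N; L(N) = number of outbound links on N; B(A) = set of pages linking to page A; T = total number of pages in the index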

Example

http://en.wikipedia.org/wiki/PageRank

http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
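The PR formula can be computed by applying it repeatedly until the values stop changing (power iteration). This is a minimal sketch assuming a hypothetical three-page web in which every page has at least one outbound link; the function name `pagerank` and its parameters are illustrative, `d` plays the role of the damping factor D, and `normalized` switches between the old and the updated formula. Google's production ranking is not public, so this only illustrates the published formula.

```python
def pagerank(links, d=0.85, normalized=False, tol=1e-10, max_iter=1000):
    """Iteratively apply the PR formula until the values settle.

    links maps each page to the pages it links to (its outbound links).
    normalized=False uses the old formula:     PR(A) = (1 - d) + d * sum(PR(N) / L(N))
    normalized=True uses the updated formula:  PR(A) = (1 - d) / T + d * sum(PR(N) / L(N))
    Assumes every page has at least one outbound link (no dangling pages).
    """
    pages = list(links)
    total = len(pages)  # T: number of pages in the toy index
    base = (1 - d) / total if normalized else (1 - d)
    pr = {p: (1 / total if normalized else 1.0) for p in pages}

    # B(A): the pages N that link to each page A.
    inbound = {p: [] for p in pages}
    for src, targets in links.items():
        for target in targets:
            inbound[target].append(src)

    for _ in range(max_iter):
        new_pr = {
            a: base + d * sum(pr[n] / len(links[n]) for n in inbound[a])
            for a in pages
        }
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print({p: round(v, 2) for p, v in ranks.items()})  # {'A': 1.16, 'B': 0.64, 'C': 1.19}
```

With the old formula the values sum to the number of pages (3 here) and no page can fall below 1 - d = 0.15, which is why real PR starts at 0.15 and can grow very large in an index of trillions of pages. The updated formula gives the same ranking, just divided by T so the values sum to 1.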

Fighting Spam

• Spam refers to websites that use unethical practices to gain search rankings

• To fight spam, Google frequently releases changes known as “Algorithm Updates”

• Google changes its search algorithm around 500 – 600 times every year.

• A few of these are major updates; the rest are minor

Major Updates

• Panda Update – February 23, 2011

– This algorithm targets sites with thin content, content farms, duplicate content, high ad-to-content ratios and a number of other quality issues.

– Affected about 12% of queries at launch

– Recent update: Panda 4 – May 19, 2014

• Penguin Update – April 24, 2012

– This algorithm targets sites that over-optimize, for example by using excessive links.

– Affected about 3% of queries at launch

– Recent update: Penguin 2.1 – Oct 4, 2013

Hummingbird Update – August 2013

• This algorithm tries to understand the context of a query by analyzing all the words in it, rather than each keyword in isolation

• It can automatically rewrite the query internally based on cue words such as “near”, “vs”, “how to”, “where” and “who is”

• Many queries are answered directly with “One Box” answers to give users a quick answer.

How It Works?

[Diagram: User Query → Query Translator → Modified Query → Index]
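The query-translation step can be sketched as a set of pattern rules that turn a raw query into a structured modified query. The rules, intent labels and the `translate_query` function below are hypothetical; Google has not published how Hummingbird rewrites queries, and a real system relies on language understanding rather than a handful of regular expressions.

```python
import re

# Hypothetical cue-word rules: (pattern, intent label). Not Google's actual rules.
RULES = [
    (re.compile(r"^(?P<subject>.+?)\s+near\s+(?P<place>.+)$"), "local"),
    (re.compile(r"^(?P<left>.+?)\s+(?:vs\.?|versus)\s+(?P<right>.+)$"), "comparison"),
    (re.compile(r"^how to\s+(?P<task>.+)$"), "how_to"),
    (re.compile(r"^where (?:is|are)\s+(?P<place>.+)$"), "location"),
    (re.compile(r"^who is\s+(?P<entity>.+)$"), "entity"),
]

def translate_query(query):
    """Rewrite a raw user query into a structured 'modified query' dict."""
    q = " ".join(query.lower().split())  # normalise case and whitespace
    for pattern, intent in RULES:
        m = pattern.match(q)
        if m:
            # The named groups become fields of the modified query.
            return {"intent": intent, **m.groupdict()}
    # No cue word found: fall back to a plain keyword search.
    return {"intent": "web", "terms": q.split()}

print(translate_query("Banana Vs. Apple"))
# {'intent': 'comparison', 'left': 'banana', 'right': 'apple'}
```

Rules are tried in order and the first match wins, so more specific patterns should be listed before more general ones.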

One Box Answer Queries

• When is Independence of India

• Time in India or Time in Toronto

• 1$ to INR

• 1Mile to Kms

• Banana Vs. Apple

• Who is wife of Bill Gates

• What is my IP

• who invented www

• Show me pictures of taj mahal
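A one-box answer for a unit query such as “1Mile to Kms” can be sketched as a small parser plus a table of conversion factors. `one_box_convert` and its unit tables are hypothetical and cover only a few units; the factors themselves are the exact international definitions (1 mile = 1.609344 km, 1 inch = 2.54 cm, 1 lb = 0.45359237 kg). Currency queries such as “1$ to INR” would need live exchange rates and are deliberately left out.

```python
import re

# Exact conversion factors (international definitions).
FACTORS = {
    ("mile", "km"): 1.609344,
    ("km", "mile"): 1 / 1.609344,
    ("inch", "cm"): 2.54,
    ("lb", "kg"): 0.45359237,
}

# Hypothetical spelling variants users type, mapped to a canonical unit name.
ALIASES = {
    "mile": "mile", "miles": "mile", "km": "km", "kms": "km",
    "kilometer": "km", "kilometers": "km", "inch": "inch", "inches": "inch",
    "cm": "cm", "lb": "lb", "lbs": "lb", "pound": "lb", "pounds": "lb", "kg": "kg",
}

# "<number> <unit> to|in <unit>", e.g. "1mile to kms" or "10 km in miles".
QUERY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s+(?:to|in)\s+([a-z]+)\s*$")

def one_box_convert(query):
    """Return a one-line answer for a unit-conversion query, or None if not handled."""
    m = QUERY.match(query.lower())
    if not m:
        return None
    value = float(m.group(1))
    src, dst = ALIASES.get(m.group(2)), ALIASES.get(m.group(3))
    factor = FACTORS.get((src, dst))
    if factor is None:
        return None  # unknown unit or unsupported pair: fall back to normal results
    return f"{value:g} {src} = {value * factor:.4g} {dst}"

print(one_box_convert("1Mile to Kms"))  # 1 mile = 1.609 km
```

Returning `None` for anything the parser does not recognise lets the search engine fall back to ordinary results instead of showing a wrong one-box answer.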

Search Engine Results Page (SERP)

Types of Results

Paid Results

PPC Ads

Comparison Ads

Shopping Ads

Non-Paid Results

Organic Web

News Results

Image Results

Local Results

Video Results

Site Links

Schema Data

Click Through Rate (CTR)

• CTR measures how often users who see a site's listing on the SERP click through to it

• CTR helps gauge how users respond to a listing

• The top four positions, which are “above the fold” for many desktop users, receive 83% of first-page organic clicks.

CTR = (Number of clicks / Number of impressions) × 100
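The CTR formula translates directly into code. A minimal sketch; the `ctr` function name and the example figures are hypothetical.

```python
def ctr(clicks, impressions):
    """Click-through rate in percent: (clicks / impressions) x 100."""
    if impressions <= 0:
        raise ValueError("impressions must be a positive number")
    # Multiply before dividing to avoid needless floating-point rounding.
    return clicks * 100 / impressions

# Hypothetical figures: a listing shown 2,000 times and clicked 90 times.
print(ctr(90, 2000))  # 4.5
```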

[Charts: 2011 vs. 2012 CTR results; branded vs. unbranded]

Thank you

Give us your feedback