Basics of Search Engines and Algorithms
DESCRIPTION
Web Trainings Academy presents Part 1 of the SEO Training Series. Learn about the concepts of search engines, their architecture, the SERP and search algorithm updates. Presented by Mohammed Azharuddin
TRANSCRIPT
Search Engine Optimization
How Search Engines Work
Presented by Mohammed Azharuddin
Contact Info
• Facebook: Md Azharuddin Barkati
• Twitter: mdazhar01
• Gmail: [email protected]
History of Search
• 1990 – Archie Query Form – FTP-based file search engine
• Feb 1993 – Excite.com – General word-relation-based search
• Oct 1993 – AliWeb – Manual submission engine
• Dec 1995 – AltaVista – First natural language search engine
• Jan 1996 – BackRub – Started by Larry Page and Sergey Brin
• Sep 15 1997 – Google.com – First search engine with Page Rank technology
• 1997 – Yandex.com – Russian search engine
• 1998 – MSN Search – Microsoft's rival to Google
• 2000 – Baidu.com – Chinese search engine
• 2008 – duckduckgo.com – Non-tracking search engine
• 2009 – Bing.com – Microsoft's rival to Google
• 2010 – Blekko.com – Spam- and virus-free search
http://www.searchenginehistory.com/
http://www.google.co.in/about/company/history/
http://www.wordstream.com/articles/internet-search-engines-history
The Google Story
Search Engine Architecture
• Every search engine is based on the following:
– Crawling
– Indexing
– Algorithms
– Results
– Fighting spam
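The crawl, index, query and rank stages listed above can be sketched in a few lines. This is a minimal illustration only, not how any real search engine is built: the `PAGES` dictionary stands in for crawled pages, and the function names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for crawled pages: URL -> page text.
PAGES = {
    "a.example/seo": "search engine optimization basics",
    "b.example/crawl": "how a search engine crawler works",
    "c.example/cook": "easy banana bread recipe",
}

def build_index(pages):
    """Indexing step: map each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Query step: rank URLs by how many query words they contain."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

index = build_index(PAGES)
print(search(index, "search engine crawler"))
```

Here the ranking "algorithm" is a simple count of matching words; a real engine combines many more factors.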
Google Architecture
http://infolab.stanford.edu/~backrub/google.html
Search Engine Architecture
[Diagram labels: WWW (60 trillion pages, or 60 lakh crore); Crawler; Store; Indexer; Indexes (100 million GB); Search Interface; Query; Algorithms (Programs); Results, sorted based on content / factors; Trash]
Live Google Example
Algorithms
• Programs and formulas to get relevant results
– Page Rank
– Spelling check
– Synonym check
– Auto complete
– Query understanding
– Safe Search
– User context
Page Rank Algorithm
• Google's first algorithm, which looks at the links between pages to determine their relevance.
• PR is a number generated for each page in the Google index
• PR Toolbar range – NA to 10 (best rank): this is based on a log scale of 0 – 10
• Real Page Rank is calculated based on the number of pages in the index, and can range from 0.15 to trillions
Toolbar vs. Real PR
Toolbar PR    Real PR
0             0 – 10
1             100 – 1,000
2             1,000 – 10,000
3             10,000 – 100,000
4             100,000 – 1,000,000
5             1,000,000 – 10,000,000
http://www.webworkshop.net/pagerank_calculator.php3
PR Formula
Updated Formula
Old Formula
D = damping factor; PR(N) = PR of the linking site N; L(N) = number of outbound links on N
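As a rough sketch of how these symbols fit together, the code below iterates one commonly cited form of the formula, PR(A) = (1 − D) + D × Σ PR(N) / L(N), summed over every page N that links to A. This form is an assumption taken from the original Brin and Page paper (it matches the 0.15 minimum mentioned earlier when D = 0.85). The three-page link graph and the function name `pagerank` are hypothetical.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(N) / L(N)) over pages N linking to A."""
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # start every page at 1.0
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page N passes on PR(N) split across its L(N) outbound links.
            incoming = sum(
                pr[src] / len(outs)
                for src, outs in links.items()
                if page in outs
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In this toy graph C ends up with the highest score, because it receives links from both A and B.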
Example
http://en.wikipedia.org/wiki/PageRank
http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
Fighting Spam
• Spam refers to websites that use unethical practices to gain search rankings
• To fight spam, Google frequently releases updates, called “Algorithm Updates”
• Google changes its search algorithm around 500 – 600 times every year.
• Some of these are major updates; the rest are minor
Major Updates
• Panda Update – February 23, 2011
– This algorithm targets sites with thin content, content farms, duplicate content, high ad-to-content ratios, and a number of other quality issues.
– Affected 12% of queries on launch
– Recent update: Panda 4 – May 19, 2014
• Penguin Update – April 24, 2012
– This algorithm targets sites that are over-optimized or use excessive links.
– Affected 3% of queries on launch
– Recent update: Penguin 2.1 – Oct 4, 2013
Hummingbird Update – August 2013
• This algorithm understands the context of the query by analyzing the words in the query
• It can automatically rewrite the query internally based on certain words such as “near”, “vs”, “how to”, “where”, “who is”, etc.
• Many queries are answered with “One Box Answers” that give quick answers.
How It Works
[Diagram: User Query → Query Translator → Modified Query → Index]
One Box Answers Queries
• When is Independence of India
• Time in India or Time in Toronto
• 1$ to INR
• 1Mile to Kms
• Banana Vs. Apple
• Who is wife of Bill Gates
• What is my IP
• who invented www
• Show me pictures of taj mahal
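The idea of spotting trigger words in a query can be illustrated with a toy classifier. This is only a sketch under stated assumptions: the rule list, the labels and the function name `classify_query` are all hypothetical, and real query understanding is far richer than a handful of patterns.

```python
import re

# Hypothetical rules: (label, pattern), checked in order.
RULES = [
    ("comparison", re.compile(r"\bvs\b", re.IGNORECASE)),
    ("time", re.compile(r"^time in\b", re.IGNORECASE)),
    # A value starting with a digit, then "to", then a target unit.
    ("conversion", re.compile(r"^\d\S*\s+to\s+\S+$", re.IGNORECASE)),
    ("images", re.compile(r"\b(pictures|images) of\b", re.IGNORECASE)),
    ("fact", re.compile(r"^(who|when|what|where)\b", re.IGNORECASE)),
]

def classify_query(query):
    """Return the label of the first matching rule, or "web" if none match."""
    text = query.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "web"

for q in ["Banana Vs. Apple", "Time in Toronto", "1Mile to Kms", "who invented www"]:
    print(q, "->", classify_query(q))
```

Queries that match no rule fall back to "web", meaning an ordinary list of results rather than a one box answer.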
Search Engine Results Page (SERP)
Types of Results
• Paid Results
– PPC Ads
– Comparison Ads
– Shopping Ads
• Non-Paid Results
– Organic Web
– News Results
– Image Results
– Local Results
– Video Results
– Site Links
– Schema Data
Click Through Rate (CTR)
• CTR measures how many users click through to a site from the SERP, relative to how often the site is shown
• CTR helps to understand user response
• The top four positions, which are “above the fold” for many desktop users, receive 83% of first-page organic clicks.
CTR = (Number of Clicks / Number of Impressions) × 100
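The CTR formula translates directly into code. A minimal sketch, with a hypothetical function name and made-up example numbers:

```python
def click_through_rate(clicks, impressions):
    """Return CTR as a percentage: (clicks / impressions) * 100."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks * 100 / impressions

# Example: 50 clicks out of 1,000 impressions.
print(click_through_rate(50, 1000))  # 5.0
```

A listing shown 1,000 times and clicked 50 times therefore has a CTR of 5%.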
2011 CTR Results
2012 CTR Results
Branded vs. Unbranded
Thank you
Give us your feedback