
Page 1: Seo and analytics basics

Sreekanth Narayanan

SEO and Analytics

Page 2: Seo and analytics basics

SEO and Analytics

SEO Introduction

Search Engine basics

Technology Considerations

Tweaking your Content

Promoting Web Pages

Analytics Introduction

Tools for Analytics

Tools for Web Masters

Analytics – methods

Some Key Terminologies

Page 3: Seo and analytics basics

SEO and Analytics

SEO Introduction

Page 4: Seo and analytics basics

Search Engine Optimization has been a buzzword since the advent of the major search engines.

SEO deals with best practices outlined to make it easier for search engines to crawl, index, and understand the content on your web pages.

SEO – what’s that???

Page 5: Seo and analytics basics

SEO and Analytics

Search Engine basics

Page 6: Seo and analytics basics

How do search engines work?

Spiders (also called robots) comb the web by following links.

The search engine formats the data it finds and stores it in its database.

All the major search engines maintain extensive, highly indexed databases.

Page 7: Seo and analytics basics

SEO – what’s that???

All trademarks belong to their respective owners

Page 8: Seo and analytics basics

Ranking and indexing of results is driven by complex algorithms that weigh a large number of parameters.

Thanks to years of experience analyzing the behavior of the major search engines, webmasters have built up a considerable knowledge base on what makes pages more search-engine friendly.

SEO – what’s that???

Page 9: Seo and analytics basics

Many search engines have launched paid services, such as Google AdWords.

Organic search results are the ones that are not influenced by paid or sponsored programs.

SEO applies to the organic results; it normally has no impact on the results shown as sponsored links.

Paid and Organic Search Results

Page 10: Seo and analytics basics

SEO and Analytics

Technology Considerations

Page 11: Seo and analytics basics

Many websites make heavy use of the User-Agent HTTP header to determine who is requesting a page.

Often the website's behavior is altered depending on what is passed in the user-agent field.

Typical applications include serving different CSS to IE and Firefox – the (in)famous browser incompatibility issues.

Another is forwarding a user to the mobile version of the website if the user agent happens to be a mobile device (see the sketch below).

User-Agent HTTP Header
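A minimal JSP sketch of user-agent detection, in the spirit of the deck's other JSP samples; the header name is standard, but the substring check and the mobile-site URL are illustrative assumptions only:

<%
    // Read the User-Agent header sent by the browser or robot
    String ua = request.getHeader("User-Agent");
    // Naive illustration: forward likely mobile devices to a mobile site
    if (ua != null && ua.toLowerCase().contains("mobile")) {
        response.sendRedirect("http://m.example.com/");
    }
%>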

Page 12: Seo and analytics basics

The best-known robot user-agent strings include Googlebot (Google), Bingbot (Bing), and Slurp (Yahoo!).

The common Robot user agents

Page 13: Seo and analytics basics

Cloaking was a very popular SEO technique in the earlier days.

It is a simple way of disguising your website as a different, text-based site (with keywords sprinkled all over) whenever the request comes from a web robot (spider).

Most spiders are identifiable by their User-Agent headers; for example, the Google robot identifies itself as "Googlebot".

As search engines strengthened their spam-detection technologies, they started penalizing "cloaked" websites by removing them altogether from their indices.

As of today, cloaking is not considered an acceptable practice and should be avoided in all scenarios.

Cloaking

Page 14: Seo and analytics basics

Simple-to-understand URLs convey content information easily.

They are easier for users as well as crawlers to organize.

Crawlers typically lower the indexing priority of URLs containing arbitrary numbers and characters.

The PageRank (TM – Google Inc.) algorithm gives significant weight to the number of pages that link to your page.

If your URLs are simpler, it is easier for users to link to your page.

If your URL contains relevant words, it provides users and search engines with more information about the page than an ID or an oddly named parameter would.

URL Structure

Page 15: Seo and analytics basics

Avoid using lengthy URLs with unnecessary parameters and session IDs

Avoid choosing generic page names like "page1.html"

Keep the directory nesting as simple as possible

Keep the directory names relevant to the content provided in the directory. Avoid using numbers for directory names

Avoid mixing cases in URLs – like CreateOrder.html – users prefer a single case (ideally always lower case). See the comparison below.

URL best practices
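For illustration (both URLs are hypothetical), compare:

http://www.example.com/ShowPage.jsp?page1=83745&sessionid=K2J3H
http://www.example.com/products/green-tea.html

The second URL tells both users and crawlers what the page is about; the first tells them nothing.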

Page 16: Seo and analytics basics

Websites should be as flat as possible, with content relating to highly competitive keywords implemented on pages high in the hierarchy.

Rewrite URLs on the server side to make them simpler and less nested.

Note that search engines tend to assign a lower relevance score to content nested deep inside the website; content in the top-level folders is considered much more relevant.

URL best practices

Page 17: Seo and analytics basics

More often than not, there are multiple ways to reach the same page on a website.

Canonicalization is the process of picking the best URL when there are several choices; it usually refers to the homepage of a website.

For example, consider http://www.google.com and http://google.com – both URLs serve the same content. Another example is "domain.com/aboutus.htm" versus "blog.domain.com/aboutus.htm".

More often than not, search engines are intelligent enough to recognize that the content on the pages is the same, and they will pick one of the URLs – which might not be our preferred one.

Canonical URL

Page 18: Seo and analytics basics

There are a few ways to ensure that the proper URL is indexed:

When linking to your homepage always point to the same URL

When requesting links from other sites, always point to the same URL

Redirect the non-www homepage to the www version of the homepage using a 301 permanent redirect. A 301 redirect example (JSP) is shown below.

Canonical URL – best practices

<%
    response.setStatus(301);
    response.setHeader("Location", "http://www.new-url.com/");
    response.setHeader("Connection", "close");
%>

Page 19: Seo and analytics basics

302 is a temporary redirect; 301 is the permanent redirect. As far as possible, use only 301 for redirection (explained on the previous slide), and always redirect from the server (sample on the previous slide).

302 redirects indicate that the content is temporary and will change in the near future. Popularity attained by the previous site or page will not be passed on to the new site.

301 permanent redirects should be used when the change is long-term or permanent, which allows PageRank and link popularity to transfer. This is taken care of by the indexing engines of all major search engines.

HTTP 301 & HTTP 302

Page 20: Seo and analytics basics

Name-value pairs are used in URLs to provide the information necessary to produce dynamic content.

URLs tend to become lengthy with name-value pairs, and they often contain numbers, which search engines typically treat as junk.

Further, a parameter like "prod_code" does not make any sense to a common user; a product name would have been better (see the comparison below).

Use valuable keywords in the name-value pairs whenever possible, and keep the quantity of pairs to no more than three.

Name Value pairs in URLs
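An illustrative comparison (both URLs are hypothetical):

Opaque: http://www.example.com/catalog?prod_code=1374&cat=7
Readable: http://www.example.com/catalog?product=green-tea&category=beverages

The readable version gives both users and search engines keywords to work with.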

Page 21: Seo and analytics basics

Many sites have a front page where you need to enter your location or other details before they will show you information about products.

Search engines cannot input information or make selections from form drop-downs. This means search-engine spiders are effectively locked out of the relevant content and cannot index or rank it.

Another problem is a splash screen with a country chooser that does not let people go beyond that page without selecting a country to set the locale.

It is better to land visitors on a default locale and then offer an option to change it; with such a design the robot will be able to index your pages.

User Input Fronting Screens

Page 22: Seo and analytics basics

A lot of sites use Flash or JavaScript for navigation. Search-engine spiders are unable to follow JavaScript or Flash navigation and are therefore unable to find pages accessible only through it.

Flash might not be supported on all browsers; the user might not have installed the plug-in, or could have disabled JavaScript.

Prefer HTML-based navigation. You might have noticed that most Web 2.0 sites include a full sitemap in the footer. This is done to make sure that all the Flash/script navigation links are replicated in HTML form for the spiders to make use of (see the example below).

Using mostly text for navigation
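A minimal sketch of plain HTML navigation that spiders can follow (the page names are illustrative):

<!-- Footer sitemap: every page reachable via Flash/script navigation
     is also linked here in plain HTML for crawlers -->
<ul>
  <li><a href="/products.html">Products</a></li>
  <li><a href="/pricing.html">Pricing</a></li>
  <li><a href="/support.html">Support</a></li>
  <li><a href="/contact.html">Contact</a></li>
</ul>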

Page 23: Seo and analytics basics

The Web 2.0 footer

(Screenshot: the mint.com footer sitemap. Page copyright mint.com.)

Page 24: Seo and analytics basics

Spiders cannot read Flash content, and links embedded in Flash are never navigated or indexed.

If you cannot do away with Flash for usability reasons, implement a version of the site with the same links in HTML.

Implement user-agent detection to deliver the HTML site to spiders and the Flash version to human visitors.

Provide alternative to flash content

Page 25: Seo and analytics basics

All web crawlers limit the amount of content they index from a page; typically this is limited to around 100 KB of data.

If you have too much in-page scripting, the only thing the search engine might see is the script on your page, and some of the content on your page will be ignored once the limit is reached. Crawlers ignore the <script> tag itself, but the total content read (100 KB) includes the scripts as well.

It is always sensible to keep your scripts in a separate file and include them on your page. This way you are not at risk of running into the crawler's content limit, and you can still write a lot of code for dynamic behavior.

Excessive In page Scripting

Page 26: Seo and analytics basics

The following example shows the right way of doing this: styles and scripts live in external files that are merely referenced from the page.

Excessive In page Scripting

<link href="${ctx}/content/css/style.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="${ctx}/js/jquery-1.4.2.js"></script>
<script type="text/javascript" src="${ctx}/js/jquery.ui.core.js"></script>
<script type="text/javascript" src="${ctx}/js/jquery.dataTables.js"></script>
<script type="text/javascript" src="${ctx}/js/highcharts.js"></script>
<script type="text/javascript" src="${ctx}/content/js/page.mypage.js"></script>

// Inside the external file (page.mypage.js), not inline on the page:
function setDefaults() {
    $('#genericError').hide();
    $('#catgErr').hide();
    $('#allCatgs').attr('checked', false);

    $.ajax({
        url: '../callsomething',
        type: 'POST',
        async: false,
        success: function (data) {
            var len = data.map.entry.length;
            for (var i = 0; i < len; i++) {
                // do something
            }
        }
    });
}

Page 27: Seo and analytics basics

A web server may assign a unique session ID within the URL on each visit, for tracking purposes.

A search-engine spider revisiting a URL will be assigned a different session ID on each visit, so every visit to the page appears as a unique URL, causing indexing inconsistencies and possibly duplicate-content penalties.

Implement user-agent detection to remove the session IDs for search-engine visits.

Session Ids on the URL

Page 28: Seo and analytics basics

Setting the "rel" attribute of a link to "nofollow" tells search-engine robots that certain links on your site shouldn't be followed or pass your page's reputation to the pages they link to.

This is especially true for pages that allow user comments. Say you are a famous company and allow people to post feedback on your blog: always set "nofollow" to avoid scenarios like the following!

Sample : <a href="http://www.cheapdrugs123.com" rel="nofollow">Comment by a spammer</a>

“nofollow” settings

Page 29: Seo and analytics basics

Pages or content that is moved, removed, or changed can result in errors, such as a 404 Page Not Found.

Having a custom 404 page that kindly guides users back to a working page on your site can greatly improve a user's experience

Your 404 page should probably have a link back to your root page and could also provide links to popular or related content on your site.

NEVER EVER allow your 404 pages to be indexed by search engines – make sure the server actually returns a 404 status code rather than a 200 (see the sketch below)

Do not use a design for your 404 pages that isn't consistent with the rest of your site

Repair all broken links as soon as possible

404 pages
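For Java web applications like the JSP examples in this deck, a custom 404 page can be wired up in web.xml; a minimal sketch (the JSP path is an illustrative assumption):

<error-page>
    <!-- Serve a friendly, site-consistent page while the container
         still returns the real 404 status code -->
    <error-code>404</error-code>
    <location>/notfound.jsp</location>
</error-page>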

Page 30: Seo and analytics basics

SEO and Analytics

Tweaking your Content

Page 31: Seo and analytics basics

Most search engines give a lot of weight to the content of the <title> HTML tag

A title tag tells both users and search engines what the topic of a particular page is.

The <title> tag should be placed within the <head> tag of the HTML document

Ideally, you should create a unique title for each page on your site.

The <title> tag


Page 33: Seo and analytics basics

Always put a sensible title on every page. Do not repeat the same text across all pages, or a group of pages, unless it makes sense.

Make sure all your important business terms are reflected in the title.

Never choose a title that has no relation to the content on the page.

Never use default or vague titles like "Untitled" or "New Page 1".

Google displays about 63 characters of the page title in its search results, which means the first 63 characters should contain all the relevant detail you need (see the example below).

<title> tag tips
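A short illustration of a unique, descriptive title (the company and page are hypothetical):

<head>
    <!-- Front-loads the key phrase within the ~63 characters shown in results -->
    <title>Low-Cost International Calls – Acme Telecom</title>
</head>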

Page 34: Seo and analytics basics

A page's description meta tag gives search engines a summary of what the page is about.

Limit descriptions to 250 characters.

Include all targeted key phrases.

Write the copy with users in mind (description copy appears in search results).

Create a unique meta description for every page (see the example below).

<meta> tags
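A minimal example of a description meta tag (the domain and copy are hypothetical):

<meta name="description" content="Compare Acme Telecom's international calling plans, rates, and coverage across 200+ countries." />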

Page 35: Seo and analytics basics

Keywords are mentioned in the head section of the HTML.

Google gives very little importance to this tag; Bing and Yahoo give it some importance, so it still makes sense to specify it.

Search engines normally do not display this content in the search results.

Use only relevant phrases in this tag, and use distinct phrases for each page (see the example below).

<meta> keywords tag
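An illustrative keywords tag (the phrases are hypothetical):

<meta name="keywords" content="international calls, cheap calling rates, prepaid calling cards" />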

Page 36: Seo and analytics basics

Search engines give a lot of importance to the content that appears inside the header tags.

Use strictly one <h1> tag per page, for the most important heading on the page.

<h2> and <h3> tags should also be used for the most relevant headings.

Always keep the natural hierarchy: first h1, then h2, then h3 (see the example below).

Header tags <h1>, <h2>, <h3>
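A small illustration of a natural heading hierarchy (the headings are hypothetical):

<h1>SEO and Analytics Basics</h1>
<h2>Technology Considerations</h2>
<h3>Canonical URLs</h3>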

Page 37: Seo and analytics basics

Anchor text is the clickable text that users will see as a result of a link, and is placed within the anchor tag <a href="..."></a>.

e.g. <a href="http://www.mydomain.com/articles/our-prices.htm">Lowest prices on earth for international calls</a>

This text tells search engines something about the page you're linking to.

Avoid writing generic anchor text like "page", "article", or "click here"

Avoid using text that is off-topic or has no relation to the content of the page linked to

Avoid using CSS or text styling that makes links look just like regular text

Importance of Anchor text

Page 38: Seo and analytics basics

Duplicate content exists when two or more pages within a website, or on different domains, share identical content.

Different domain names do not create distinct content: company.com/aboutus.html and blog.company.com/aboutus.html are the same page.

Major search engines consider duplicate content to be spam and are continually improving their spam filtering process to penalize and remove offenders.

Avoid duplication of content as far as possible. Use 301 permanent redirects to inform search engines of the proper URL to use.

Duplication of Content

Page 39: Seo and analytics basics

Images form an integral part of any website.

The "alt" attribute allows you to specify alternative text for an image in case it cannot be displayed for some reason.

This is a very important accessibility aspect, as the screen-reader programs used by blind users identify and read out the alt text.

Another reason is that if you use an image as a link, the alt text for that image is treated similarly to the anchor text of a text link.

Optimizing your image filenames and alt text makes it easier for image-search products like Google Image Search to understand and rank the images on your website (see the example below).

Optimizing image content
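A minimal example of a descriptive filename plus alt text (the product is hypothetical):

<img src="/images/green-tea-box.jpg" alt="Box of Acme green tea, 20 bags" />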

Page 40: Seo and analytics basics

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

A sample can be seen here: http://www.robotstxt.org/robotstxt.html

All major search engine robots scan this file to see what pages are relevant to be crawled.

The Disallow directives specify which pages the crawler should ignore.

The robots.txt file

A robots.txt typically contains entries such as the following (the User-agent line states which robots the rules apply to):

User-agent: *
Disallow: /residential/customerService/
Disallow: /residential/customerService/contacts.html
Disallow: /residential/customerService/contactus/billing.html

Page 41: Seo and analytics basics

There are some important considerations when using /robots.txt:

Robots can ignore your /robots.txt; in particular, malware robots that scan the web for security vulnerabilities, and email-address harvesters used by spammers, will pay no attention to it.

The /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

You could put all the files you don't want robots to visit in a separate subdirectory, make that directory un-listable on the web (by configuring your server), and list only the directory name in /robots.txt. An ill-willed robot then won't traverse that directory unless a direct link to one of your files is published somewhere on the web – and then it is not robots.txt's fault.

The robots.txt file

Page 42: Seo and analytics basics

SEO and Analytics

Promoting Web Pages

Page 43: Seo and analytics basics

Internal linking between pages within a website, such as navigational elements or a site map, plays an important role in how search engines perceive the relevancy and theme of your web pages.

Proper intra-site linking helps facilitate effective spidering, in addition to increasing the relevancy of pages.

Maintain a sitemap.

Keep sitemap pages to fewer than 100 links per page.

Sitemaps should be linked directly from the homepage and other major pages throughout the website.

Linking your websites

Page 44: Seo and analytics basics

Effectively promoting your new content leads to faster discovery by those interested in the same subject.

Increasing back-links to your site is one option, but it should be done properly.

Social media links (e.g. the Facebook Like) add to your link count. It is typically not advisable to link every small update this way, as search engines nowadays understand those patterns too.

You could include your updates in an RSS feed, or link them from blogs of people in the related community.

Today's search engines do not go by PageRank alone to determine relevance; it also depends on traffic and content.

Promotion through external channels

Page 45: Seo and analytics basics

SEO and Analytics

Tools for Web Masters

Page 46: Seo and analytics basics

Every major search engine has launched its own set of webmaster tools:

Google: http://www.google.com/webmasters/
Yahoo: http://siteexplorer.search.yahoo.com/
Bing: http://www.bing.com/toolbox/webmasters/

We will examine some of the most important tools which Google provides.

Webmaster tools

Page 47: Seo and analytics basics

Google provides the following services:

see which parts of a site Googlebot had problems crawling

notify Google of an XML Sitemap file

analyze and generate robots.txt files

remove URLs already crawled by Googlebot

specify your preferred domain

identify issues with title and description meta tags

understand the top searches used to reach a site

get a glimpse at how Googlebot sees pages

remove unwanted sitelinks that Google may use in results

receive notification of quality-guideline violations and request a site reconsideration

Webmaster tools

Page 48: Seo and analytics basics

SEO and Analytics

Analytics Introduction

Page 49: Seo and analytics basics

Web analytics is the measurement, collection, analysis, and reporting of internet data for the purposes of understanding and optimizing web usage.

It is a very important tool for business and market research.

Web analytics provides data on the number of visitors, page views, etc., to gauge traffic and popularity trends, which helps in doing market research.

There are predominantly two types: off-site and on-site.

Web Analytics - Introduction

Page 50: Seo and analytics basics

Off-site web analytics refers to web measurement and analysis regardless of whether you own or maintain a website. It includes the measurement of a website's potential audience (opportunity), share of voice (visibility), and buzz (comments) that is happening on the Internet as a whole

On-site web analytics measure a visitor's journey once on your website. This includes its drivers and conversions; for example, which pages encourage people to make a purchase. On-site web analytics measures the performance of your website in a commercial context.

Web Analytics - Introduction

Page 51: Seo and analytics basics

SEO and Analytics

Analytics – methods


Page 53: Seo and analytics basics

Log file analysis

All web servers record most of their transactions in a log file (the access log for Apache).

This was the most prominent method when the web evolved in the late 90s.

It involved running a tool to identify the hits to a page from the log file and derive statistics from them.

It became very inaccurate in later times, as there are thousands of "non-human" actors on the web today; Googlebot is an example.

Methods for measuring

Page 54: Seo and analytics basics

Log file analysis – contd..

The tools adapted to the robots by measuring hits based on cookie tracking and ignoring the known robots.

This is not fully practical, as robots are written not only by search engines but also by spammers.

Log-file analysis also failed when users enabled their browser caches: pages were cached by the browser, so when the user requested the same pages again, no hit reached the web server and the content was served from the cache.

Methods for measuring

Page 55: Seo and analytics basics

Page tagging

Developed during the later stages of the web, this method embeds a JavaScript code segment in the page.

When a tracking operation is triggered, data from the HTTP request, browser/system information, and cookies are collected by the script.

The script submits the data as parameters attached to an image request sent to the analytics server (a single-pixel image; see the sketch below).

For example, take a look at the Google Analytics data-collection request that gets sent out (shown below).

Methods for measuring

http://www.google-analytics.com/__utm.gif?utmwv=4&utmn=769876874&utmhn=example.com&utmcs=ISO-8859-1&utmsr=1280x1024&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=9.0%20%20r115&utmcn=1&utmdt=GATC012%20setting%20variables&utmhid=2059107202&utmr=0&utmp=/auto/GATC012.html?utm_source=www.gatc012.org&utm_campaign=campaign+gatc012&utm_term=keywords+gatc012& …..etc…..
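A minimal JavaScript sketch of the same image-beacon technique; the collector URL and parameter names here are illustrative assumptions, not Google's actual API:

<script type="text/javascript">
    // Collect some basic page data...
    var data = [
        'page=' + encodeURIComponent(location.pathname),
        'ref='  + encodeURIComponent(document.referrer),
        'res='  + screen.width + 'x' + screen.height
    ].join('&');
    // ...and ship it as query parameters on a 1x1 image request
    var beacon = new Image(1, 1);
    beacon.src = 'http://analytics.example.com/collect.gif?' + data;
</script>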

Page 56: Seo and analytics basics

Methods for measuring

Page tagging, contd.

After the advent of XHR (XMLHttpRequest), some page-tagging scripts used AJAX submission of the user data to the collection server.

This is often bound to fail due to the restrictions on XHR (same domain of origin) in most modern browsers.

As the page-tagging approach involves downloading a one-pixel image from another domain (like Google's), it adds an additional DNS (Domain Name System) lookup to your page, which is sometimes seen as obstructive to page loading.

Page 57: Seo and analytics basics

Page tagging is the de facto standard today.

It has a significant advantage: it works even for pages hosted in the cloud, meaning you do not need dedicated web servers whose logs you monitor.

Analytics today is mostly an outsourced service, with many specialist providers such as Google and Adobe – and page tagging is the only method supported there.

Page tagging is the new Analytics

Page 58: Seo and analytics basics

SEO and Analytics

Tools for Analytics

Page 59: Seo and analytics basics

Google Analytics

Free from Google (with a 5M page-view cap per month for non-AdWords advertisers).

Uses page tagging as the analytics method: the user embeds a script in the page, and the script collects information on page actions and submits it to the analytics server as parameters on an image fetch (see the snippet below).

Detailed reports are presented to the user on logging in to your Google account.

Major tools – Web Analytics
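For reference, the classic asynchronous ga.js page tag of this era looked like the following (UA-XXXXX-X is a placeholder account ID):

<script type="text/javascript">
    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-XXXXX-X']);   // your GA account ID
    _gaq.push(['_trackPageview']);
    (function() {
        // Load ga.js asynchronously so it does not block page rendering
        var ga = document.createElement('script');
        ga.type = 'text/javascript';
        ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0];
        s.parentNode.insertBefore(ga, s);
    })();
</script>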

Page 60: Seo and analytics basics

Google Analytics Results

Page 61: Seo and analytics basics

Omniture Fusion (Adobe)

Uses page tagging for information collection: you include a script snippet on all the pages to be tracked.

The information is submitted through a script call – almost the same as what Google does – as parameters on a 1px x 1px transparent image request.

Major tools – Web Analytics

<body>
<script language="javascript" src="INSERT-DOMAIN-AND-PATH-TO-CODE/s_code.js" type="text/javascript"></script>
<script language="javascript" type="text/javascript"><!--
/* Copyright 1997-2004 Omniture, Inc. */
s.pageName=""
var s_code=s.t();
if(s_code)document.write(s_code)
//--></script>
</body>
</html>

Page 62: Seo and analytics basics

Omniture Reports

Page 63: Seo and analytics basics

SEO and Analytics

Some Key Terminologies

Page 64: Seo and analytics basics

KPIs are the metrics that tell you which changes could make your website more effective.

All KPIs are metrics, but not all metrics are KPIs.

In Web Analytics it becomes very critical to measure the right things.

Web Analytics KPIs

Page 65: Seo and analytics basics

First-party cookies are cookies that are associated with the host domain.

Third-party cookies are cookies from any other domain.

Example: you go to the site http://yahoo.com, and there is a banner ad on this site for http://youbuy.com.

Both yahoo.com and youbuy.com place cookies in your browser.

So for you, the cookie from yahoo.com is a first-party cookie and the one from youbuy.com is a third-party cookie.

First and Third Party Cookies

Page 66: Seo and analytics basics

So if I had placed the Google Analytics script on our page http://mozvo.com and it had set a cookie for the domain "google.com", that would have been a third-party cookie.

Third-party cookies are widely discouraged, as quite a few sites plant tracker cookies this way.

A lot of users (about 40%) disable third-party cookies.

Consequently, all of the analytics providers have switched to using first-party cookies to track information.

This means the user will see only cookies from mozvo.com even though the Google Analytics code is embedded on the page.

First and Third Party Cookies

Page 67: Seo and analytics basics

The bounce rate: the bounce rate for the homepage, or any other page through which visitors enter your site, tells you how many people 'bounce' away (leave) from your site after viewing one page.

Hence a low bounce rate is preferred.

Click-through rate: the click-through rate (or click-thru rate) tells you how many people click through to your site from a third party – for example a link, search engine, banner, advertisement, or email campaign.

A higher click-through rate is preferred (see the worked examples below).

Bounce Rate and Click through rate
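For illustration (hypothetical numbers): if 1,000 visitors enter through a page and 400 of them leave without viewing a second page, the bounce rate is 400 / 1,000 = 40%. If a banner shown 10,000 times receives 150 clicks, its click-through rate is 150 / 10,000 = 1.5%.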

Page 68: Seo and analytics basics

Clickstreams, also known as clickpaths, are the routes that visitors take when clicking or navigating through a site.

A clickstream is a list of all the pages viewed by a visitor, presented in the order the pages were viewed, also defined as the ‘succession of mouse clicks’ that each visitor makes.

A clickstream will show you when and where a person came in to a site, all the pages viewed, the time spent on each page, and when and where they left.

The most obvious reason for examining clickstreams is to extract specific information about what people are doing on your site.

Click Stream Analysis

Page 69: Seo and analytics basics

http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en/us/webmasters/docs/search-engine-optimization-starter-guide.pdf

http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/09/03/search-engine-optimization-for-bing.aspx

http://help.yahoo.com/l/us/yahoo/search/indexing/ranking-02.html;_ylt=AiB.kJ7SxMRMNktmvnsyomX.YHhG

http://www.bivings.com/thelab/presentations/SEO_Basics.pdf

References

Page 70: Seo and analytics basics

Thank you!

http://nsreekanth.blogspot.com/