the professionals guide to pagerank optimization

Upload: jason-prance

Post on 30-May-2018


  • 8/9/2019 The Professionals Guide to Pagerank Optimization

    1/35

    The Professional's Guide to PageRank Optimization

    Contents

    Introduction

    PART I - Understanding the Theory behind PageRank Optimization

        What is PageRank?
            Who Invented PageRank?
            What is the purpose of PageRank?

        How does PageRank affect rankings?

        How is PageRank measured?
            Toolbar PageRank vs. Real PageRank

        Assumptions about PageRank
            PageRank is Still a Measurement of Importance
            PageRank Still Doesn't Measure Relevance
            PageRank is Still a Relative Measurement
            Pages Don't Vote for Themselves
            Each Page Can Only Vote for another Page Once
            The Damping Factor is Constant

        Calculating PageRank
            Introducing the PageRank Function
            Accumulating PageRank vs. Distributing PageRank
            Iterative Calculations and Convergence of PageRank

        PageRank Behavior
            Maximum PageRank per System
            Site PageRank vs. Page PageRank
            Add Links to Important Pages
            Subtract Links from Unimportant Pages
            Add Content
            Subtract Content

        Ideal PageRank Distribution
            Natural Distribution from a Hierarchical Structure
            Site Architecture Definitions
            Depth of Content on a Website

    PART II - Tools and Techniques for Sculpting PageRank

        Introduction to PageRank Sculpting
            Natural vs. Unnatural Linking

        Link Level PageRank Controls
            rel="nofollow"
            JavaScript
            Flash

        Summary Chart of Link Level PageRank Controls

        Page Level PageRank Controls
            Robots Meta Tag
            robots.txt
            301 Permanent Redirects
            302 Temporary Redirects

        Summary Chart of Page Level PageRank Controls

        Conclusion

        Resources


    Introduction

    The purpose of this guide is to provide experienced SEO consultants and web

    developers with a high-level understanding of PageRank optimization. The guide is

    broken down into two parts. Part 1 covers the theory behind PageRank and provides

    simple illustrations that will help you understand how PageRank "flows" throughout

    a website. Part 2 provides practical strategies and solutions for optimizing the

    PageRank on your site through linking and page controls, and it also offers tips on

    how to get the most out of your inbound links.

    PART I - Understanding the Theory behind

    PageRank Optimization

    What is PageRank?

    Most Web developers and SEOs think they know everything they need to about

    PageRank, but in reality, very few people have a solid understanding of how it is

    calculated or how it affects rankings. This section will discuss all of those important

    details.

    Who Invented PageRank?

    PageRank was invented by (and derives its name from) Larry Page. While at

    Stanford University in the late 1990s, Page and his fellow Google cofounder, Sergey

    Brin, wanted to create a search engine that could outperform the existing search

    engines at that time. The other search engines relied heavily on text analysis to

    calculate relevance, but Page and Brin were confident that Google could return

    higher-quality search results by calculating relevance and importance. PageRank

    made it possible for Google to calculate the relative importance of webpages.

    What is the purpose of PageRank?

    The purpose of PageRank is to help Google return search results that human visitors

    consider important. It does this by assigning a numerical value to every

    webpage/URL it finds. This value is often referred to as "real PageRank" or "internal

    PageRank," but this guide will refer to it simply as "PageRank." Every webpage

    starts with a small amount of PageRank, which increases as other pages link to it. It

    is assumed that each page on the internet is controlled by a human, and that

    humans link to important pages. Therefore, pages with the highest PageRank

    should represent what humans consider to be the most important.


    How does PageRank affect rankings?

    One of the most common misconceptions about PageRank is how it influences

    rankings. Many SEOs will have various opinions about the importance of PageRank,

    but the best answer would be one from Google itself:

    The heart of our software is PageRank, a system for ranking web pages

    developed by our founders Larry Page and Sergey Brin at Stanford University.

    And while we have dozens of engineers working to improve every aspect of

    Google on a daily basis, PageRank continues to play a central role in many of

    our web search tools. (Corporate Information: Technology Overview, Google)

    Here, Google officially states that PageRank is important, but neglects to explain

    how PageRank affects rankings. According to Google:

    Traditional search engines rely heavily on how often a word appears on a

    web page. We use more than 200 signals, including our patented PageRank

    algorithm, to examine the entire link structure of the web and determine

    which pages are most important. We then conduct hypertext-matching

    analysis to determine which pages are relevant to the specific search being

    conducted. By combining overall importance and query-specific relevance,

    we're able to put the most relevant and reliable results first. (Corporate

    Information: Technology Overview, Google)

    Webpages are ranked using 200+ signals, but the majority of these signals fall

    under two main categories:

    1. Relevance - when a user types in a query, Google finds all the documents in

    its index that contain the words from that query.

    2. Importance - after Google has fetched the relevant documents, it sorts them

    by importance.

    In other words, we can increase the importance of a page by increasing its

    PageRank, but if the page is NOT relevant to a user's query, it still won't rank for

    that query. This is a key point to remember when sculpting PageRank. The

    techniques described in this guide can increase importance--but not relevance.

    http://www.google.com/corporate/tech.html

    How is PageRank measured?

    Toolbar PageRank vs. Real PageRank

    Throughout this guide, we will use the term "PageRank" to mean real PageRank,

    the PageRank value that starts at 0.15 and can go into the millions. However, the

    most commonly used PageRank scale is the "toolbar PageRank." The toolbar

    PageRank value is shown on the Google Toolbar, which is an add-on feature of most

    modern Web browsers. Here is an example of the Google toolbar in Firefox 2.0.

    The toolbar PageRank scale goes from 0 to 10. Google takes all the real PageRank

    values of every page in its index and separates them into these eleven different

    ranges. It is important to note that the toolbar scale is not linear; it is logarithmic.

    So a page that has a toolbar PageRank of 6 does NOT actually have twice as much

    real PageRank as a page with a toolbar PageRank of 3. In reality, it would have

    many more times as much real PageRank. Here is a graph to help you visualize this

    concept.


    Only Google knows what the actual base value would be (or if it is indeed a

    logarithmic scale), but to keep things simple, we are using a base value of 2 in this

    example. If the base value really were 2, then these would be the ranges of real

    PageRank that each toolbar PageRank value represents:

    Toolbar PageRank    Real PageRank

    0                   1 - 2
    1                   2 - 4
    2                   4 - 8
    3                   8 - 16
    4                   16 - 32
    5                   32 - 64
    6                   64 - 128
    7                   128 - 256
    8                   256 - 512
    9                   512 - 1,024
    10                  1,024 +

    Some "PageRank enthusiasts" estimate that the actual base would be around 5 or

    6. Here is another example that uses 5.5 as a base:

    Toolbar PageRank    Real PageRank

    0                   1 - 6
    1                   6 - 30
    2                   30 - 166
    3                   166 - 915
    4                   915 - 5,033
    5                   5,033 - 27,681
    6                   27,681 - 152,244
    7                   152,244 - 837,339
    8                   837,339 - 4,605,367
    9                   4,605,367 - 25,329,516
    10                  25,329,516 +
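
The two tables above follow the same rule: toolbar value n covers real PageRank from base^n up to base^(n+1). As a rough sketch only, that rule could be written as the function below. The function name `toolbar_pagerank`, the handling of values below 1, and the cap at 10 are our own assumptions for illustration. Google has not published the real mapping or its base.

```python
import math


def toolbar_pagerank(real_pagerank, base=2.0):
    """Map a real PageRank value to a 0-10 toolbar value on a log scale.

    Assumption: toolbar value n covers real PageRank from base**n up to
    base**(n + 1), as in the tables above. The true base is not public.
    """
    if real_pagerank < 1:
        return 0  # assumption: anything below the first range shows as 0
    return min(10, math.floor(math.log(real_pagerank, base)))


# Compare the same real PageRank values under the two example bases.
for base in (2.0, 5.5):
    print(base, toolbar_pagerank(100, base), toolbar_pagerank(5000, base))
```

With a base of 2, a real PageRank of 100 falls in the 64 - 128 range (toolbar 6). With a base of 5.5, the same value falls in the 30 - 166 range (toolbar 2).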

    Regardless of what the base is, here are a few things you should know about the

    toolbar PageRank and real PageRank values:

    Each toolbar PageRank value corresponds to a wide range of real PageRank values. This means that even if two pages have the same toolbar

    PageRank value, their real PageRank values can vary greatly. Conversely, if

    one page has a toolbar PageRank of 5 and another page has a 6, then these

    two pages might have real PageRank values that are actually very close.

    Every toolbar PageRank value is exponentially greater than the one

    before it, and is thus more difficult to achieve. After a page is created,

    it can easily increase its toolbar PageRank from a 0 to a 1. However, an

    increase from a 6 to a 7 would be much more difficult and require

    significantly more inbound links.

    The toolbar PageRank value is not frequently updated. We speculate,

    with a high degree of certainty, that the real PageRank values are what

    Google actually uses in their algorithm, and these values are constantly

    changing. However, the toolbar PageRank values are only updated every 3 to

    4 months, and their value is based on the current real PageRank value at that

    time. In other words, if a page is created and it immediately acquires a large


    number of inbound links, the toolbar PageRank value isn't going to reflect

    that until the next time Google updates their toolbar PageRank numbers, but

    the page will still receive the ranking benefits from those links, whether the

    toolbar shows it yet or not.

    Assumptions about PageRank

    One of the challenges of writing a detailed PageRank sculpting guide is overcoming

    the limited availability of recent or reliable information on PageRank. The original

    paper about PageRank was written before Google was a highly profitable company,

    so we can assume the information it contains was at least true at that time, but

    Google has undergone many changes and modifications over the last decade.

    Therefore, in order to make this guide as complete and as accurate as possible, we

    need to establish a set of assumptions from which to build.

    PageRank is Still a Measurement of Importance

    Even though Google has publicly admitted to making changes to the PageRank

    algorithm over the years, one thing we will assume is that it still serves the same

    basic purpose. In other words, Google may have tweaked certain details about how

    PageRank is calculated, but we can be almost certain that PageRank still uses

    the linking structure of the web to measure the relative importance of every page.

    PageRank Still Doesn't Measure Relevance

    As we explained in the previous section, PageRank does not measure relevance; it

    measures importance. Therefore, the on-page content of a given webpage does not

    affect its PageRank value. In other words, you can increase the PageRank of

    your homepage by getting more people to link to it, but not by adding

    more keywords to the page.

    PageRank is Still a Relative Measurement

    The PageRank function doesn't just measure importance; it measures relative

    importance. The function literally "ranks" every page on the Web, by placing them

    in a specific order of most-important to least-important. Therefore, unless every

    page on the internet links to every other page on the internet*, they can't all be

    equally important. Some pages are inevitably going to be more important than

    others. PageRank may have changed since it was originally invented, but it can't

    stop performing its primary purpose of ranking each page, relative to the rest. It is

    for this reason that PageRank sculpting is possible. By telling Google which pages

    are NOT important, we can make the rest appear more important by comparison.

    *SEOmoz does not recommend placing several billion links on each of your pages.


    Pages Don't Vote for Themselves

    One way that Google describes PageRank is by comparing it to a system of votes.

    One of the original characteristics of PageRank that made it an appealing option for

    ranking webpages was that it relies on the democratic nature of the web. In other

    words, it is resistant to manipulation because it requires other pages to "vote" for a

    page by linking to it. In this analogy, it makes sense that Google would ignore

    instances where a page votes for itself by linking to its own URL. Even if Google

    does count links from a page to itself (i.e. even if our assumption is false), this

    would not have a significant effect on the outcome of PageRank optimization

    techniques described in this guide. For example, consider a page that links to 50

    other pages. Each link would receive 1/50th of the distributed PageRank. If the

    linking page links to itself (and we count that link as part of the PageRank

    calculation), this would only reduce the other pages' share of PageRank from 1/50

    to 1/51 - or a difference of 1/2550 of the linking page's PageRank. In any case, we

    will assume Google does not count them.
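
The arithmetic in the 50-link example can be checked with exact fractions. This is only a check of the numbers quoted above; the variable names are ours.

```python
from fractions import Fraction

links_to_other_pages = 50

# Share of the linking page's distributed PageRank that each link receives.
share_without_self_link = Fraction(1, links_to_other_pages)
share_with_self_link = Fraction(1, links_to_other_pages + 1)

difference = share_without_self_link - share_with_self_link

print(difference)         # 1/2550
print(float(difference))  # a very small fraction of the linking page's PageRank
```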

    Each Page Can Only Vote for another Page Once

    In this guide, we are going to assume that Google's graph of linked pages can be

    represented by a square graph, in which all pages are assigned to one column and

    one row. We are also going to assume that each square of the graph can only

    contain one of two values: Y or N. In other words, each square is an intersection of a

    row (linking page) and a column (linked page), and the square answers the

    question: "Does this page link to this page?" See the following illustration.
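
As a minimal sketch of this assumption, the square graph can be built as a grid of Y/N values. The function name `link_matrix` and the sample pages are hypothetical. Duplicate links collapse into a single Y, and links from a page to itself are ignored, in line with the previous assumption.

```python
def link_matrix(pages, raw_links):
    """Build a square Y/N grid: rows are linking pages, columns are linked pages."""
    index = {page: i for i, page in enumerate(pages)}
    matrix = [["N"] * len(pages) for _ in pages]
    for source, target in raw_links:
        if source == target:
            continue  # assumption from this guide: pages don't vote for themselves
        # A second link between the same two pages just sets the same cell again.
        matrix[index[source]][index[target]] = "Y"
    return matrix


pages = ["A", "B", "C"]
raw_links = [("A", "B"), ("A", "B"), ("A", "A"), ("B", "C")]
matrix = link_matrix(pages, raw_links)

for page, row in zip(pages, matrix):
    print(page, " ".join(row))
```

Page A links to page B twice in the sample data, but the row for A still contains only one Y.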


    The Damping Factor is Constant

    A common way to describe PageRank is the "random surfer" analogy. Imagine

    someone surfing the internet by clicking a link at random on every page they visit.

    PageRank represents the probability that the surfer will be on a certain page at any

    given point in time. In other words:

    More inbound links -> Higher PageRank -> Higher chance that random surfer lands on page

    Part of the PageRank function is the damping factor, d. In our random surfer

    analogy, the damping factor represents the probability that the surfer will keep

    clicking random links on the pages they visit, while (1 - d) represents the probability

    that the surfer will "get bored" with the current page and type in a new URL (that

    they somehow know exists), instead of clicking a link. By setting the damping factor

    as a constant, we are also setting (1 - d) as a constant, which means we're saying

    that when a surfer gets bored, the URL they type in is totally random and unbiased.

    Even though there have been papers written over the last several years that

    propose methods for weighting certain URLs, depending on the surfer's history and

    preferences (e.g. personalized search), we are still going to proceed with this guide

    under the assumption that URLs are not biased in this way. The illustrations and


    examples in this guide are going to use nonspecific webpages that are assumed to

    be equal in quality, trust, content, age, etc.
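
The random surfer analogy can be simulated directly. The sketch below is an illustration under our own assumptions, not Google's method: the function name `random_surfer`, the three-page sample site, the step count, and the fixed seed are all hypothetical. With probability d the surfer clicks a random link on the current page; otherwise the surfer jumps to a page chosen uniformly at random, which is what a constant, unbiased (1 - d) means.

```python
import random


def random_surfer(links, d=0.85, steps=50_000, seed=0):
    """Estimate the share of time a random surfer spends on each page.

    links maps each page to the set of pages it links to.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        outbound = sorted(links[current])
        if outbound and rng.random() < d:
            current = rng.choice(outbound)  # keep clicking links
        else:
            current = rng.choice(pages)     # "get bored" and type any URL
    return {page: visits[page] / steps for page in pages}


# Hypothetical site: C has two inbound links, B has only one.
links = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
shares = random_surfer(links)
for page, share in shares.items():
    print(page, round(share, 3))
```

In this sample, page C receives links from both A and B, so the surfer spends noticeably more time on C than on B.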

    Calculating PageRank

    The calculation of PageRank can quickly become a very complex topic. In the paper,

    The PageRank Citation Ranking: Bringing Order to the Web, Lawrence Page, Sergey

    Brin, Rajeev Motwani, and Terry Winograd wrote 17 pages about it, but hopefully we

    can cover the basics in fewer words.

    Introducing the PageRank Function

    According to The Anatomy of a Large-Scale Hypertextual Web Search Engine, the

    original paper written by Sergey Brin and Larry Page, the formula for determining

    PageRank is given as follows:

    We assume page A has pages T1...Tn which point to it (i.e., are citations). The

    parameter d is a damping factor which can be set between 0 and 1. We

    usually set d to 0.85. There are more details about d in the next section. Also

    C(A) is defined as the number of links going out of page A. The PageRank of a

    page A is given as follows:

    PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

    For most of us, that description probably isn't too helpful, and the function itself

    looks intimidating. However, it's actually much simpler than it looks. Let's break it

    down into smaller pieces, and analyze each piece, one at a time. You can also use

    the illustrations to help you visualize how each piece of the function fits together.

    http://dbpubs.stanford.edu:8090/pub/1999-66
    http://infolab.stanford.edu/~backrub/google.html

    PR(A): This basically means "the PageRank of page A."

    (1-d): This is the small amount of PageRank that every webpage starts with. The variable d represents a value determined by Google and is often referred to as the damping factor. The paper suggests a value of .85, so that's what we'll work with in this guide. (1-d) would then be equal to .15. In other words, every new webpage starts with an initial PageRank value of .15.

    d( ): This is the second appearance of d. First we subtracted it from 1 to get the initial PageRank value for page A. Now we are multiplying it by everything between the ( ). This multiplication is what causes the "damping." Assuming that d is .85, this means that if we add up all the PageRank coming to page A through inbound links, page A will only get 85 percent of it. Without this damping effect, PageRank calculations would create an infinite loop of increasing values.

    PR(T1)/C(T1): This whole thing can be understood to mean "the PageRank coming from webpage T1." The numerator, PR(T1), represents T1's PageRank, and the denominator, C(T1), represents the number of crawlable links on T1. So there are two distinct ways to increase the amount of PageRank that page A receives from page T1: increase the PageRank of T1, or decrease the number of links on T1. This is the foundation of PageRank sculpting! For a more in-depth look at how this piece of the PageRank function works, be sure to check out the illustration that follows.

    + ... + PR(Tn)/C(Tn): This fancy-looking string of characters basically means "plus the PageRank coming from all the other pages that link to you."

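The pieces above can be put together in a few lines. This is a minimal sketch of a single evaluation of the published function, assuming we already know the PageRank and link count of every page that links to page A. The function name `pagerank_of` and the sample numbers are ours.

```python
def pagerank_of(inbound, d=0.85):
    """Evaluate PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    inbound is a list of (PR(T), C(T)) pairs, one per page T that links to A.
    """
    return (1 - d) + d * sum(pr_t / c_t for pr_t, c_t in inbound)


# Hypothetical inputs: T1 has PageRank 1.0 and 2 links, T2 has 3.0 and 10 links.
original = pagerank_of([(1.0, 2), (3.0, 10)])

# Same pages, but T1 now has only 1 crawlable link, so A receives all of it.
fewer_links_on_t1 = pagerank_of([(1.0, 1), (3.0, 10)])

print(round(original, 4), round(fewer_links_on_t1, 4))
```

Reducing the number of links on T1 from 2 to 1 raises the result from about 0.83 to about 1.255, which is the sculpting effect described above. A page with no inbound links at all gets just (1-d), or .15.
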

    Accumulating PageRank vs. Distributing PageRank

    One of the key features of PageRank is that a page distributes it over its outbound

    links and accumulates it from inbound links. However, when a page "votes" for

    other pages through its outbound links, it isn't giving away its own PageRank. In

    other words, if a lot of pages link to page A, then page A is considered to be

    important and the pages page A links to are assumed to be important by

    association. But when page A votes for another page, page A isn't giving away its

    importance--it's simply passing it on. A page does not directly lose PageRank by

    linking out.


    Iterative Calculations and Convergence of PageRank

    Now that we understand the basic elements of the PageRank function, the next

    question is: how do we use this function to calculate actual PageRank numbers?

    After all, if the PageRank of every page depends on the PageRank of pages that link

    to it, wouldn't this create an infinite loop of calculations? In a way, yes, the

    calculation of PageRank is never-ending, but remember that the function has a


    damping factor built in. Again, the damping factor makes sure that every time a

    page distributes PageRank to another page, the other page only receives 85% of it.

    In a simple example of two pages linking to each other, they will pass PageRank

    back and forth, indefinitely. Each time Google makes a calculation of the new

    PageRank totals, it is called an iteration. As you can see in the following illustration,

    each iteration brings the PageRank totals closer and closer to a specific value or

    limit. When the totals for each page stabilize (i.e. they do not change significantly

    after each additional iteration), they are said to have converged. There are certain

    strategies for choosing starting PageRank values that converge quickly (i.e. after

    the fewest number of iterations), but for practical purposes, we can assume that

    Google's stored values of PageRank have all converged and are stable.
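
The two-page example can be reproduced with a short loop. This is a sketch under our own assumptions: every page starts at (1 - d), all pages are updated together on each iteration, and we stop when no value changes by more than a small tolerance. The function name `iterate_pagerank` and the tolerance are hypothetical, and Google's actual procedure is not public.

```python
def iterate_pagerank(links, d=0.85, tolerance=1e-6, max_iterations=1000):
    """Repeat the PageRank function until no value changes by more than tolerance.

    links maps each page to the set of pages it links to. Returns a list of
    PageRank dictionaries, one per iteration, starting with the initial values.
    """
    current = {page: 1 - d for page in links}  # every page starts at (1 - d)
    history = [current]
    for _ in range(max_iterations):
        updated = {}
        for page in links:
            inbound = sum(
                current[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            updated[page] = (1 - d) + d * inbound
        history.append(updated)
        converged = all(abs(updated[p] - current[p]) < tolerance for p in links)
        current = updated
        if converged:
            break
    return history


# Two pages that link only to each other.
history = iterate_pagerank({"A": {"B"}, "B": {"A"}})
for step, values in enumerate(history[:4]):
    print(step, round(values["A"], 4), round(values["B"], 4))
print("iterations:", len(history) - 1, "final A:", round(history[-1]["A"], 4))
```

Each pass moves both values closer to 1, and the size of the change shrinks by a factor of d every time, which is why the totals settle instead of growing forever.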


    In the example above, the first three systems are well-linked, meaning every page

    has at least one inbound link and one outbound link. This allows the PageRank to

    "flow" through all the pages and achieve the maximum system PageRank. (If you

    are puzzled to see a PageRank 11, remember that we are dealing with real

    PageRank values here--not Toolbar PageRank values.)

    The fourth system contains a dangling link, which means it links to a page that

    doesn't link out to anything (kind of like a dead-end). A dangling link prevents the

    system from achieving its maximum total PageRank because it stops the flow of

    PageRank. The gray page in the above example takes PageRank from the green

    page, but it doesn't distribute anything back into the system.
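
To see the effect of a dangling link in numbers, we can compare two small systems. The sketch below uses hypothetical three-page systems of our own, not the ones in the illustration, and a simple fixed number of iterations of the published function.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a closed system.

    links maps each page to the set of pages it links to.
    """
    pr = {page: 1 - d for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return pr


# Well-linked: every page has at least one inbound and one outbound link.
well_linked = {"A": {"B"}, "B": {"C"}, "C": {"A"}}

# Dangling: C receives a link from B but links to nothing.
dangling = {"A": {"B"}, "B": {"A", "C"}, "C": set()}

well_linked_total = sum(pagerank(well_linked).values())
dangling_total = sum(pagerank(dangling).values())
print(round(well_linked_total, 3), round(dangling_total, 3))
```

The well-linked system reaches a total of 3 (one per page), while the system with the dead-end page settles at a total of roughly 1.1.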

    Site PageRank vs. Page PageRank

    The examples above focus on total system PageRank. We assume that a system of

    pages is a single collection of pages that start with an initial PageRank of 0.15 on

    each page. We also assume that the system includes all existing pages, and

    therefore, all possible PageRank is accounted for. In other words, a system of pages

    can be thought of as a website that doesn't link out and doesn't have inbound links.


    In reality, you are not likely to encounter a site such as that on the Internet (and it

    probably wouldn't be indexed), but it's a good way to illustrate the fluctuation of

    page-level PageRank within a single system whose total PageRank remains

    constant. The following sections will briefly describe how certain changes to a

    website can affect page-level PageRank values, as well as the total site PageRank.

    Add Links to Important Pages

    Many SEOs and developers focus too heavily on inbound links from external sites

    and neglect the links that we have complete control over--our internal links! It is

    no coincidence that a website's home page usually has the highest PageRank value.

    The high PR is not just from external inbound links. It's also because every page on

    a given website usually links to the home page. However, most websites have

    several important pages that deserve high rankings for relevant terms--not just the

    home page. The following example shows how the PageRank value of a single page

    can be increased by adding more links to it.
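
As a rough numerical sketch of this idea, consider a hypothetical four-page site (our own example, not the one in the illustration). At first, pages A, B, and C are linked only from the home page. Then B and C each add a link to A.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a closed system."""
    pr = {page: 1 - d for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return pr


# Before: the home page links to A, B, and C, and each links back to Home.
before = pagerank({
    "Home": {"A", "B", "C"},
    "A": {"Home"},
    "B": {"Home"},
    "C": {"Home"},
})

# After: B and C each add an internal link to the important page A.
after = pagerank({
    "Home": {"A", "B", "C"},
    "A": {"Home"},
    "B": {"Home", "A"},
    "C": {"Home", "A"},
})

print("A before:", round(before["A"], 3), "A after:", round(after["A"], 3))
print("totals:", round(sum(before.values()), 3), round(sum(after.values()), 3))
```

Page A rises from roughly 0.69 to roughly 1.14, while the total for the site stays at 4. The extra PageRank for A comes out of the other pages' shares, not from outside the system.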

    Subtract Links from Unimportant Pages

    If you ask any website owner to show you the unimportant pages on his or her site,

    they will probably try to tell you that all their pages are important. Naturally, they

    are using a definition of "importance" that caters to their on-site users, but probably

    not to people who begin at search engines. For the purposes of sculpting PageRank,

    we must use a definition of usefulness that revolves around a search engine,

    specifically Google. Google wants to rank pages that are independently valuable to

    Google's users. In other words, when a user types in a query, they are searching for

    something specific and Google wants to satisfy that user after one click. For

    example, if a user searches for [iphone], Google wants to deliver Apple's page about

    iPhones--not the Apple home page or their sitemap or any other page that would


    require the user to click unnecessarily. So our definition of an important page is one

    that contains unique content that is the most relevant to one of our keywords. For

    our purposes here, unimportant pages are those that are of little use to search

    engine users, such as Contact pages, About pages, etc.

    The following example shows how removing links from unimportant pages can

    increase the PageRank going to important pages.
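
Here is a rough numerical sketch under our own assumptions. We read "subtracting links" as removing the crawlable links that point to the unimportant pages, and we use a hypothetical four-page site in which the home page links to a Product page, a Contact page, and an About page.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a closed system."""
    pr = {page: 1 - d for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return pr


# Before: Home splits its vote three ways.
before = pagerank({
    "Home": {"Product", "Contact", "About"},
    "Product": {"Home"},
    "Contact": {"Home"},
    "About": {"Home"},
})

# After: Home's crawlable links to Contact and About are removed.
after = pagerank({
    "Home": {"Product"},
    "Product": {"Home"},
    "Contact": {"Home"},
    "About": {"Home"},
})

print("Product before:", round(before["Product"], 3))
print("Product after:", round(after["Product"], 3))
print("Contact after:", round(after["Contact"], 3))
```

The Product page rises from roughly 0.69 to roughly 1.78, while Contact and About fall back to the starting value of 0.15 because nothing links to them anymore.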

    Add Content

    SEOs often say, "content is king," but what exactly qualifies as "content"? In terms

    of PageRank sculpting, content refers to pages. The more pages you have on your

    site, the higher the maximum total PageRank. Remember, creating new pages also

    creates new PageRank, and if your site is well-linked, then every new page adds a

    PageRank value of 1 to the site's total. (As a side note, don't forget that Google only

    gives PageRank to pages that it knows about and that meet certain criteria of

    quality. In other words, creating a million blank pages isn't going to increase your

    site's total PageRank, because Google isn't going to index a million blank pages.

    When we talk about "adding a page," it is assumed that the new page will be

    valuable and indexable, according to Google.) The example below shows how

    adding supporting content can increase the PageRank of a landing page.
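
    The claim that each new page adds 1 to the site's total can be checked with a small
    JavaScript sketch. It assumes the classic PageRank formula with a damping factor of
    0.85; the page names (home, landing, support) and the helper functions are
    hypothetical and exist only for this illustration.

```javascript
// Iterative PageRank using the classic formula (an assumption of this sketch):
// PR(p) = (1 - d) + d * sum of PR(q) / outlinks(q) for every page q linking to p.
function pageRank(links, d = 0.85, iterations = 200) {
  const pages = Object.keys(links);
  let rank = Object.fromEntries(pages.map((page) => [page, 1]));
  for (let i = 0; i < iterations; i++) {
    const next = Object.fromEntries(pages.map((page) => [page, 1 - d]));
    for (const page of pages) {
      for (const target of links[page]) {
        next[target] += (d * rank[page]) / links[page].length;
      }
    }
    rank = next;
  }
  return rank;
}

// Sum of PageRank over every page in the system.
const totalOf = (rank) =>
  Object.values(rank).reduce((sum, value) => sum + value, 0);

// A two-page site, then the same site with one supporting page added.
const twoPages = pageRank({ home: ["landing"], landing: ["home"] });
const threePages = pageRank({
  home: ["landing"],
  landing: ["home", "support"],
  support: ["landing"],
});

console.log(totalOf(twoPages).toFixed(3), twoPages.landing.toFixed(3)); // about 2.000 and 1.000
console.log(totalOf(threePages).toFixed(3), threePages.landing.toFixed(3)); // about 3.000 and 1.459
```

    Because the supporting page links only to the landing page, most of the newly
    created PageRank ends up on the landing page in this example.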

    Subtract Content

    Subtracting content is by far the most challenging PageRank sculpting concept to

    understand, but the following example should help. We have already learned that

    more pages mean more total PageRank, so what would be the benefit of subtracting

    content? To answer this question, we have to have a clear understanding of total

    site PageRank vs. page-level PageRank. Once you realize that Google ranks pages--

    not sites--then you can understand why we would be willing to sacrifice some site-

    level PageRank for the sake of increasing a single page's PageRank.

    When we remove a page from a system, as long as that system remains well-linked,

    then the total system PageRank will only decrease by 1. This is the same amount of

    PageRank that would be added to a system when we add a new page. In certain

    cases, the page we remove might have a PageRank value that's higher than 1, but

    it makes no difference: The rest of its PageRank will redistribute to the remaining

    pages. As long as there is an increase in PageRank to the remaining important

    pages, the total system decrease is justified.
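
    The trade-off between site-level and page-level PageRank can be sketched in
    JavaScript. The sketch assumes the classic PageRank formula with a damping factor
    of 0.85, and the four-page site (home, landing, about, contact) is hypothetical.

```javascript
// Iterative PageRank using the classic formula (an assumption of this sketch):
// PR(p) = (1 - d) + d * sum of PR(q) / outlinks(q) for every page q linking to p.
function pageRank(links, d = 0.85, iterations = 200) {
  const pages = Object.keys(links);
  let rank = Object.fromEntries(pages.map((page) => [page, 1]));
  for (let i = 0; i < iterations; i++) {
    const next = Object.fromEntries(pages.map((page) => [page, 1 - d]));
    for (const page of pages) {
      for (const target of links[page]) {
        next[target] += (d * rank[page]) / links[page].length;
      }
    }
    rank = next;
  }
  return rank;
}

// Sum of PageRank over every page in the system.
const totalOf = (rank) =>
  Object.values(rank).reduce((sum, value) => sum + value, 0);

// Site with an unimportant "contact" page, then the same site without it.
const withContact = pageRank({
  home: ["landing", "about", "contact"],
  landing: ["home"],
  about: ["home"],
  contact: ["home"],
});
const withoutContact = pageRank({
  home: ["landing", "about"],
  landing: ["home"],
  about: ["home"],
});

console.log(totalOf(withContact).toFixed(3), withContact.landing.toFixed(3)); // about 4.000 and 0.694
console.log(totalOf(withoutContact).toFixed(3), withoutContact.landing.toFixed(3)); // about 3.000 and 0.770
```

    The system total drops by 1, yet the landing page ends up with more PageRank than
    it had before the page was removed.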

    Ideal PageRank Distribution

    Natural Distribution from a Hierarchical Structure

    The Google Webmaster Guidelines include the following recommendation when

    creating a hierarchy for a website:

    Make a site with a clear hierarchy and text links. Every page should be

    reachable from at least one static text link.

    A hierarchical data structure is formed when the pages of a website are organized

    by topical categories. The home page typically targets broad, single-word keyword

    phrases, and the landing pages typically target more specific keyword phrases that

    are a subset of the main topic. Here is a simple example:

    In the example above, the PageRank distribution and the choice of categorization

    are both aligned with the search activity of users. In other words, we are assuming

    that "cats" would be more difficult to rank for than "Persian cats," and therefore,

    this structure naturally channels more PageRank to the "cats" page. Choosing how

    to categorize the content of your site is not an exact science, but sorting the pages

    by keyword-relevant topics is usually a good place to start.

    Site Architecture Definitions

    In order to effectively communicate the techniques of PageRank sculpting, we must

    first establish some definitions for related terms. The sections that follow will use

    these definitions:

    Landing Page - when we are talking about PageRank sculpting, a landing page is a

    page that we are trying to get ranked for certain keywords. This is the page that we

    want to show up in the results of search engines, and therefore, it is a page that we

    want to focus PageRank on.

    Supporting Page - a supporting page is a page that is somewhat optimized for

    certain keywords, but it isn't the landing page for those keywords. An example of

    this is the "Persian cats" page from the previous illustration. This page would

    inevitably discuss "cats" in its content, but we wouldn't expect it to rank for "cats."

    It would be the landing page for "Persian cats," but the supporting page for "cats."

    Global Navigation - this refers to the links that appear on every page of a website.

    These are usually located at the top and bottom of a page, but they can also be a

    left navigation or right navigation. Most sites manage the global navigation code

    through some kind of content management system or server includes, so the

    webmaster can change one file and that change will appear on every page of the

    site. Global links (those that appear on every page on a site) are extremely

    important for sculpting PageRank, as they are the quickest way to make significant

    changes to a site's linking structure.

    Secondary Navigation - for the purposes of PageRank sculpting, we will refer to a

    site's secondary navigation as the set of links that can be found on every page in a

    specific section of a website. This is similar to a global navigation, because the code

    is typically managed through a single file that populates every page in a certain

    section of the site. A secondary navigation may not affect every page on a website,

    but it should still have a considerable impact on PageRank sculpting.

    Depth of Content on a Website

    We already discussed the relationship between accumulated PageRank and

    distributed PageRank, in a previous section, but let's review. A page accumulates

    PageRank from its inbound links and distributes PageRank evenly across its

    outbound links. Due to the damping factor, every page only receives 85% of the

    PageRank that was sent to it. As you navigate from your home page to your

    important pages, imagine each new page's PageRank decreasing by 15% after each

    click. Because of this damping effect, it is in your best interest to make sure that

    your important pages are accessible after the fewest number of clicks possible.

    Here is an illustration to help you visualize this concept:

    Unless the yellow pages in this example contain important content, they should not

    stand between the home page and the important page in this site's linking

    structure.
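
    The damping effect described above can be put into numbers with a short JavaScript
    sketch. It uses this guide's simplification that each click passes on 85% of the
    PageRank sent along a link and ignores how a page splits its PageRank among several
    outbound links. The function name fractionAfterClicks is hypothetical.

```javascript
// Fraction of the PageRank sent from the home page that survives a chain of
// single links, assuming a 15% loss per click (damping factor d = 0.85).
function fractionAfterClicks(clicks, d = 0.85) {
  return Math.pow(d, clicks);
}

// Print the surviving fraction for pages one to four clicks from the home page.
for (let clicks = 1; clicks <= 4; clicks++) {
  console.log(clicks, fractionAfterClicks(clicks));
}
```

    At three clicks deep, only about 61% of the PageRank sent down the chain remains,
    which is why important pages should sit as close to the home page as possible.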

    PART II - Tools and Techniques for Sculpting PageRank

    Introduction to PageRank Sculpting

    Building a website from scratch makes it pretty easy to incorporate best practices

    like topical categories and a hierarchical structure, but most SEOs deal with sites

    that are already up and running. For these sites, we can use various techniques to

    redistribute PageRank to important pages, a process called PageRank sculpting. The

    goal of PageRank sculpting is to increase our current search engine rankings by

    simply altering the crawlable linking structure of our site.

    Natural vs. Unnatural Linking

    Before making any significant changes to a site's navigation structure, an SEO

    should consider how search engines might interpret those changes. Google

    continues to advance their algorithms, but the fact remains that your content is

    being interpreted by a machine and it needs to be machine-readable. However,

    there are certain ways to write code that a web browser understands, but a search

    engine doesn't. An example of this is JavaScript. Web browsers have JavaScript

    interpreters built into them, so they can read the code and create links from it, but

    search engines have a very limited understanding of JavaScript, so chances are they

    won't recognize JavaScript links. This gives us the opportunity to distinguish

    between user navigation and search engine navigation. For instance, we can rewrite

    the code for a link, using JavaScript instead of HTML. This would prevent that page

    from distributing PageRank through that link, but the link itself would still function

    for users.

    We will soon discuss several ways to code links that search engines can't crawl, but

    keep in mind that the only foolproof way to be sure a link isn't getting

    PageRank is to not have it there at all. The more natural your link code is, the

    less you have to worry about search engines crawling links you didn't want them to.

    So yes, we will be making assumptions about what types of links count towards

    PageRank distribution, but we don't know what the future holds. Google is

    constantly making improvements to their ability to crawl JavaScript URLs, forms,

    and even links in Flash files, but we may never fully understand how or if those links

    affect PageRank. The bottom line is: the only way to know for sure that a link isn't

    passing PageRank is to not put it on your page at all.

    Link Level PageRank Controls

    In order to control the flow of PageRank on a website, we must understand how to

    control the following:

    Which pages accumulate PageRank

    Which pages distribute PageRank

    Which links pass PageRank

    In the next section, we will list the most common options (i.e. tools) for sculpting

    PageRank, and we will discuss how they affect these important details.

    rel="nofollow"

    This is by far the most popular choice for sculpting PageRank, and for good reasons.

    First, it is officially supported and endorsed by Google as a way to prevent the flow

    of PageRank through links. Second, it is easy to implement: all you need to do is

    add the attribute to any anchor tag to stop the flow of PageRank through that link.

    Third, it allows precise control over PageRank flow, since the nofollow attribute can

    be applied at the single-link level (as opposed to affecting the entire page).

    Google has publicly informed the SEO community that all paid links should include

    the nofollow attribute, so we have good reason to believe that the attribute really

    does block the flow of PageRank. Additionally, Google has said that they do not use

    nofollowed links for discovery, and nofollowed links do not pass anchor text either.

    In other words, we will assume that nofollowed links are essentially invisible to

    Googlebot.

    Here is a simple example of a link, before and after adding the nofollow attribute to

    the HTML code. Both of these links would appear and function the same to a user,

    regardless of whether or not they contain the rel="nofollow" attribute.

    Distributes PageRank:

    <a href="http://www.seomoz.org">SEOmoz</a>

    Does NOT distribute PageRank:

    <a href="http://www.seomoz.org" rel="nofollow">SEOmoz</a>

    JavaScript

    For a long time, Google completely ignored JavaScript, because it was too difficult or

    too costly to interpret all the scripts on every webpage Google indexed. However,

    rumors from the webmaster community suggest that this may no longer be the

    case. There is a growing body of evidence that Google has developed at least a

    fundamental understanding of JavaScript code, and that it can interpret simple

    functions and parse (i.e. find and extract) URLs and file names, in an attempt to

    discover new pages and new content.

    This is great news for sites that are unknowingly preventing Googlebot from

    indexing their pages, but it is not-so-great for sites that intentionally use JavaScript

    links to prevent Googlebot from finding or indexing certain pages on the site. At the

    time of writing this document, Google is still telling webmasters that its spiders

    ignore JavaScript entirely. Since there is no way to know for sure, we recommend

    that you avoid using JavaScript as your sole mechanism for controlling Googlebot or

    PageRank. If you do use JavaScript to show your users links that you don't want

    Google to crawl or distribute PageRank to, then here are a couple of tips to keep in

    mind:

    1. Don't include complete URLs (or the HTML code for links) in your JavaScript

    code. If you do, Google will have a much easier time finding the URLs you

    don't want found.

    2. Externalize as much as possible. Putting your JavaScript code into an external

    .js file is a web design/SEO best practice and it also has the added benefit of

    removing "user-only" links and URLs from your on-page HTML code. This

    means that Google would have to fetch and read your external .js file, if it

    wanted to figure out which of your JavaScript functions insert links onto the

    page. As an additional safeguard, you can also disallow Googlebot from

    accessing that external .js file, using your robots.txt file.

    Here are 3 examples of links that rely on JavaScript. Each of the following examples

    represents a complete webpage that contains nothing more than a single link to

    SEOmoz.org. All 3 pages appear exactly the same to users (assuming they have

    JavaScript enabled), but the actual page code itself is different from one example to

    the next. The important sections of code have been highlighted, and each example

    includes a brief explanation.

    Example 1:

    This example is a single page of XHTML code. It doesn't rely on any external files,

    because the JavaScript is coded directly onto the page (between the <script> tags).

    The document.write() method is a built-in function of JavaScript that inserts

    additional code when the page is rendered by a web browser. The additional code in

    this example is highlighted. It is the plain HTML code that creates the link that users

    see. This example shows how easy it would be for Google to recognize that link,

    despite the fact that it's technically part of a JavaScript script. Therefore, we would

    assume that this example would not be an effective way to prevent PageRank from

    flowing through this link.

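    Link 1

    <script type="text/javascript">
    document.write('<a href="http://www.seomoz.org">SEOmoz</a>');
    </script>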

    Example 2:

    This example is another single page of XHTML code. Again, it doesn't rely on any

    external files, because the JavaScript is coded directly onto the page. However, this

    example doesn't use <script> tags like Example 1 did. Instead, it assigns JavaScript

    code to the href attribute of a regular HTML link. When a user clicks this link, their

    web browser executes the JavaScript code, instead of taking them to a new URL. In

    this example, the JavaScript code tells the browser to change the current window

    location to SEOmoz.org, so it essentially performs the same basic function that a

    pure-HTML link would. The only difference is this example requires users to have

    JavaScript enabled. So in theory, Google shouldn't recognize this link because it

    requires a JavaScript interpreter, but in reality, chances are that Google would have

    no trouble seeing the URL and understanding the intent of this code. Therefore, we

    would assume that this is not an effective technique for preventing the flow of

    PageRank through this link.

    Link 2

    <a href="javascript:location.href='http://www.seomoz.org'">SEOmoz</a>

    Example 3:

    This example is by far our best option for preventing the flow of PageRank through

    this link. This example is similar to example 2, except that it removes the JavaScript

    code that contains the link URL and places it in an external file, named

    javascript.js. The external file is then disallowed in the robots.txt file. This means

    that the only code Google ever sees is the reference to a function called

    homePage(), but Google has no way of knowing what that function does unless it can

    access the external JavaScript. Even though this example uses two more files than

    the other examples, it is the only way we can be confident that PageRank is not

    flowing through this link.

    Link 3

    <script type="text/javascript" src="javascript.js"></script>
    <a href="javascript:homePage()">SEOmoz</a>

    External JavaScript (javascript.js):

    function homePage() {location.href = 'http://www.seomoz.org';}

    robots.txt:

    # Assumes javascript.js sits at the site root; paths must begin with "/".
    User-agent: *
    Disallow: /javascript.js

    <iframe>

    Iframes can be a very convenient way to externalize large portions of HTML code

    into a separate file. In most practical applications of PageRank sculpting, iframes

    are used to display global content and navigations. An iframe element is usually just

    one or two lines of HTML code that allows you to view another webpage's content

    by referencing its URL in the iframe's src attribute. An iframe is basically a window

    that we can embed in our webpage and view another webpage through. We control

    the iframe's dimensions, border, and its ability to use a scrollbar or not, but the

    content that appears (to users) in the iframe "window" can only be changed by

    editing the content of the external page that we referenced in our src attribute.

    Since the iframe content (and HTML code) is not actually on our webpage, we don't

    have to worry about distributing PageRank through those links. When Google crawls

    our page, all it sees is our iframe element with a src attribute pointing to another

    webpage; it doesn't see the links that users see in their browser. Note the following

    example.

    Example - Before:

    As an example of using an iframe for PageRank sculpting, imagine that your website

    uses a global header navigation that links to a bunch of unimportant pages that are

    wasting PageRank. In this case, you could cut-and-paste the header's HTML code

    into its own separate webpage, and add an iframe element to your original

    webpage where the header code used to be.

    The highlighted div element here shows the links to unimportant pages, as they

    would normally appear in the HTML code:

    Before Iframe

    "badlinks" div: Contact Us | About Us | Privacy Policy | View Cart | Calculate Shipping | Returns Policy

    "goodlinks" div: Home | Landing Page 1 | Landing Page 2 | Landing Page 3

    Page Heading

    Page content.

    Example - After:

    This is the HTML code after the "badlinks" div has been moved into its own external

    file. Now this page is only distributing PageRank through "goodlinks," but the

    appearance and functionality would not change for users. (The HTML code for the

    newly-created "header.html" page follows.)

    After Iframe

    iframe (src: header.html)

    "goodlinks" div: Home | Landing Page 1 | Landing Page 2 | Landing Page 3

    Page Heading

    Page content.

    Header Page:

    The following HTML code represents the external file that we reference in our iframe

    tag. In other words, this is the content that we view through our iframe "window."

    The HTML code for the "badlinks" div element has basically been cut from the

    original example and pasted into a new page, but with some necessary tweaks

    added to it:

    The meta robots tag has been added to prevent Google from indexing this

    page.

    The body tag has been styled to remove the default spacing that would

    otherwise change the appearance for users.

    Each link now includes the target attribute to maintain the original

    functionality for users.

    Header

    "badlinks" div: Contact Us | About Us | Privacy Policy | View Cart | Calculate Shipping | Returns Policy

    Now, instead of Google seeing our unimportant links on every page, it only sees the

    iframe src URL. One thing to keep in mind is that Google will still parse the src URL

    from iframe tags, and it will add that URL to its list of pages to crawl. It is uncertain

    whether or not Google would treat the iframe URL reference as a link to the header

    page, but we do know that Google will crawl and index the header page, just as it

    would any other webpage. For this reason, you may want to consider taking some of

    the following precautions, depending on your specific needs:

    Add the rel="nofollow" attribute to the unimportant links in the header

    navigation.

    Add the Meta robots tag to your header page, and set it to "noindex" or

    "none".

    Disallow the header page URL in your robots.txt file.

    Flash

    Since Google has announced that they have improved their ability to read Flash

    content, using Flash for PageRank sculpting is not the reliable tool it once was.

    Chances are that Google's primary goal concerning Flash files is to find new content

    contained in them, effectively negating Flash's usefulness as a means by which to

    hide content. Since it is possible to build an entire site using Flash, and then embed

    the Flash file on a single URL, this raises questions about how Google would index

    such a site. Despite the fact that Google continues to improve its understanding of

    Flash content, we will still assume that Flash links do not distribute PageRank.

    <form>

    Just like JavaScript and Flash, forms are not the foolproof "spider blockers" that they

    once were. Google has announced that they are trying out new methods of crawling

    through forms. Presumably, Googlebot would test various combinations of input

    parameters and analyze the resulting pages for unique content. However, using the

    test URLs for discovery doesn't mean Google distributes PageRank to them. In this

    guide, we assume that PageRank does not flow through forms.

    Summary Chart of Link Level PageRank Controls

    The following chart summarizes the functionality of the PageRank controls

    discussed in the previous section. These controls can be used at the link level,

    meaning you can use them to block the flow of PageRank through specific links on a

    page, while still allowing the remaining links to function as usual.

                      Does Google see    Does Google use these     Does Google distribute
                      these links?       links for discovering     PageRank through
                                         new pages?                these links?

    rel="nofollow"    yes                no                        no
    JavaScript        maybe              maybe                     no
    Flash             maybe              maybe                     no
    <form>            yes                yes                       no
    <iframe>*         no                 no                        no

    *Assuming Google is blocked from the src file

    Page Level PageRank Controls

    Controlling PageRank distribution at the link level is fairly straightforward: either the

    link distributes PageRank or it doesn't. Page-level controls are a bit trickier, because

    we have to consider whether or not a given page accumulates PageRank, in

    addition to whether or not it distributes PageRank.

    Robots Meta Tag

    Most SEOs and webmasters should already be familiar with the robots meta tag. It is

    placed in the <head> section of a webpage, and it tells search engines whether or

    not they can crawl, index, or cache the content of that page. Before we explain how

    this tag can be used to sculpt PageRank, let's define these three search engine

    processes.

    Indexing - this is the process where Google "reads" the content of your

    webpage and transforms it into a representation of content--one that is easily

    sorted in Google's index and processed for search results. The process of

    indexing a webpage includes things like removing HTML code and stop

    words, reducing words to their root (stemming), determining term

    frequencies, and assigning weighted values to certain terms (depending on

    how the terms appeared in the content). In other words, when Google

    indexes a document, it determines the document's keyword relevancy. If you

    include the noindex attribute in your page's robots Meta tag, you are basically

    telling Google not to consider that page relevant to any query, and therefore,

    not to list it in any search results.

    Crawling - this is the process of a search engine identifying links in your

    content and recording them. Google uses this information to discover new

    pages and to calculate PageRank. Adding the nofollow attribute to the robots

    Meta tag will tell Google to ignore all the links on the page. This has the same

    effect as adding the rel="nofollow" attribute to every link on the page.

    Caching - this is when Google stores a local copy of your page on its own

    computers. This copy is essentially a snapshot of your page's code, from the

    time that Google last saw it. Webmasters who don't want Google to cache

    their page can add the noarchive attribute to the robots Meta tag. Because

    caching is for archival purposes only, we assume that this attribute does not

    have any applications in PageRank sculpting.

    Adding the nofollow attribute to a robots Meta tag would have obvious implications

    for PageRank sculpting. It completely prevents that page from distributing

    PageRank. However, the effect of adding the noindex attribute is not as intuitive.

    Most webmasters would assume that a page must be indexed in order to

    accumulate PageRank, but that isn't entirely true. Preventing Google from indexing

    your page doesn't prevent other pages from linking to it. We have to assume that

    when Google finds links pointing to your page, it records them for calculating

    PageRank. Therefore, as far as PageRank sculpting is concerned, the noindex

    attribute does not prevent a page from accumulating PageRank--it only prevents it

    from showing up in the search results.
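
    The assumptions in this section can be summarized in a small JavaScript sketch. The
    directive names (noindex, nofollow, none) are the ones discussed above, but the
    function name robotsMetaEffects and the shape of the object it returns are
    hypothetical, and the result reflects this guide's assumptions rather than any
    documented Google behavior.

```javascript
// Maps the content value of a robots meta tag to the effects this guide
// assumes for PageRank sculpting. "none" is shorthand for noindex + nofollow.
function robotsMetaEffects(content) {
  const directives = content
    .toLowerCase()
    .split(",")
    .map((directive) => directive.trim());
  const has = (name) => directives.includes(name);
  const noindex = has("noindex") || has("none");
  const nofollow = has("nofollow") || has("none");
  return {
    showsInSearchResults: !noindex,
    accumulatesPageRank: true, // assumed: inbound links are still recorded
    distributesPageRank: !nofollow,
  };
}

console.log(robotsMetaEffects("noindex"));
console.log(robotsMetaEffects("noindex, nofollow"));
```

    Note that accumulatesPageRank is always true here: none of the robots meta values
    stops other pages from linking to the page.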

    robots.txt

    In contrast to the noindex tag, exclusion via robots.txt does not prevent a page

    from showing up in Google's search results. The purpose of a website's robots.txt

    file is to block certain user-agents from accessing certain files or directories of your

    site. However, many SEOs and webmasters make the false assumption that

    disallowing a page in the robots.txt file will prevent it from accumulating PageRank.

    The truth is, a page that has been disallowed can still be linked to by other pages,

    and Google is still going to consider those links when calculating PageRank. This

    produces the same accumulation result as the robots Meta tag noindex attribute, except

    that a page excluded in a robots.txt file can still appear in rankings, and Google

    cannot read its links, so we assume it does not distribute PageRank.
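
    To illustrate what a Disallow rule actually does, here is a simplified JavaScript
    sketch of how a crawler might check a path against a robots.txt file. It only
    handles "User-agent: *" groups with plain prefix rules; real crawlers also support
    Allow lines, wildcards, and crawler-specific groups. The function name isDisallowed
    is hypothetical, and the file paths assume both files sit at the site root.

```javascript
// Returns true when `path` is blocked for all crawlers by the robots.txt text.
// Simplified: only "User-agent: *" groups and plain prefix Disallow rules.
function isDisallowed(robotsTxt, path) {
  let inWildcardGroup = false;
  for (const rawLine of robotsTxt.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const separator = line.indexOf(":");
    if (separator === -1) continue;
    const field = line.slice(0, separator).trim().toLowerCase();
    const value = line.slice(separator + 1).trim();
    if (field === "user-agent") {
      inWildcardGroup = value === "*";
    } else if (field === "disallow" && inWildcardGroup && value !== "") {
      if (path.startsWith(value)) return true;
    }
  }
  return false;
}

const exampleRobotsTxt = [
  "User-agent: *",
  "Disallow: /javascript.js",
  "Disallow: /header.html",
].join("\n");

console.log(isDisallowed(exampleRobotsTxt, "/javascript.js")); // true
console.log(isDisallowed(exampleRobotsTxt, "/index.html")); // false
```

    A blocked page is never fetched, which is why its content and outbound links stay
    invisible even though links pointing to it are still counted.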

    301 Permanent Redirects

    We should all be familiar by now with the 301 redirect and its role in preserving

    "link juice." This isn't so much a page-level PageRank control as it is a best practice.

    If you need to remove a page of content from your site for whatever reason, you

    should configure your server to return a 301 response that redirects to the new URL

    location or another page with similar content.
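
    How a 301 is configured depends on your server. As one possible sketch, here is a
    request handler for Node's built-in http module. The URL paths in movedPages and
    the function name handleRequest are hypothetical; only the 301 status code and the
    Location header are standard HTTP.

```javascript
// Hypothetical map of retired URLs to their replacement URLs.
const movedPages = {
  "/old-persian-cats.html": "/cats/persian.html",
};

// Request handler that can be passed to Node's http.createServer().
function handleRequest(req, res) {
  const target = movedPages[req.url];
  if (target) {
    // 301 tells clients and crawlers that the move is permanent.
    res.writeHead(301, { Location: target });
    res.end();
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Page content.");
}

// To serve it: require("http").createServer(handleRequest).listen(8080);
```

    Changing the status code to 302 would make the same redirect temporary, which is
    the case discussed in the next section.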

    302 Temporary Redirects

    These types of redirects will not forward the PageRank of the old URL to the new

    URL (like the 301 does). Therefore, we avoid using these for sculpting PageRank.

    302 redirects are useful in a number of situations, but not for the purpose of

    distributing PageRank.

    http://www.google.com/support/webmasters/bin/answer.py?answer=93633

    Summary Chart of Page Level PageRank Controls

                                 Can Google index    Does this page    Does this page    Does this page
                                 content from        show up in        accumulate        distribute
                                 this page?          search results?   PageRank?         PageRank?

    Meta robots "noindex"        no                  no                yes               yes
    Meta robots "nofollow"       yes                 yes               yes               no
    Disallowed in robots.txt     no                  yes               yes               no
    301 redirects                no                  no                no                no
    302 redirects                no                  yes               yes               no

    Conclusion

    We have found that there is a widespread misunderstanding of PageRank: how it is

    obtained, how it is distributed and how best to structure a website for optimal

    PageRank optimization. This guide has focused quite heavily on the PageRank

    model because there is no good way to effectively optimize something without

    having a good understanding of how it operates.

    Beware of the tendency to think about PageRank in terms of the green bar at the

    bottom of a web browser, as difficult as it can be to forget years of conditioning. By

    developing a good understanding of real PageRank, we are certain that you can

    improve your linking structure for maximum PageRank benefit.

    Resources

    Google's Webmaster Guidelines
    http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769

    Robotstxt.org
    http://www.robotstxt.org/

    The PageRank Citation Ranking: Bringing Order to the Web
    http://dbpubs.stanford.edu:8090/pub/1999-66

    The Anatomy of a Large-Scale Hypertextual Web Search Engine
    http://infolab.stanford.edu/~backrub/google.html