Log File Analysis: The most powerful tool in your SEO toolkit

LOG FILE ANALYSIS The most powerful tool in your SEO toolkit Tom Bennet Consultant, Builtvisible @tomcbennet

Upload: tom-bennet

Post on 05-Dec-2014

DESCRIPTION

Slide deck from Tom Bennet's presentation at Brighton SEO, September 2014. The accompanying guide can be found here: http://builtvisible.com/log-file-analysis/

Image credits:
https://www.flickr.com/photos/nullvalue/4188517246
https://www.flickr.com/photos/small_realm/11189803763/
https://www.flickr.com/photos/florianric/7263382550
http://fotojenix.wordpress.com/2011/07/08/weekly-photo-challenge-old-fashioned/

TRANSCRIPT

Page 1: Log File Analysis: The most powerful tool in your SEO toolkit

LOG FILE ANALYSIS The most powerful tool in your SEO toolkit

Tom Bennet

Consultant, Builtvisible

@tomcbennet

Page 3: Log File Analysis: The most powerful tool in your SEO toolkit

Getting Started

Page 4: Log File Analysis: The most powerful tool in your SEO toolkit

What is a log file?

A record of every hit that a server has received, from humans and robots alike.

http://www.brightonseo.com/about/

1. Protocol (http)

2. Host name (www.brightonseo.com)

3. File name (/about/)

Host name -> IP address via DNS -> Connection to server -> HTTP GET request via protocol for file -> HTML to browser
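The three parts of the example URL can be pulled apart with Python's standard library. This is a minimal sketch; the variable names are illustrative and simply mirror the labels on the slide.

```python
from urllib.parse import urlsplit

# Split the example URL from the slide into its labelled parts.
parts = urlsplit("http://www.brightonseo.com/about/")

protocol = parts.scheme     # 1. Protocol
host_name = parts.hostname  # 2. Host name (resolved to an IP address via DNS)
file_name = parts.path      # 3. File name (what the GET request asks for)

print(protocol, host_name, file_name)
```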

Page 5: Log File Analysis: The most powerful tool in your SEO toolkit

They’re not pretty…

Page 6: Log File Analysis: The most powerful tool in your SEO toolkit

…but they’re very powerful.

188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] "GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Server IP

Timestamp (date & time)

Method (GET / POST)

Request URI

HTTP status code

User-agent
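A rough sketch of how the example line could be broken into those fields with a regular expression. The pattern is an assumption that fits this exact line layout; real servers can be configured to log different fields in a different order, so treat it as a starting point. The first field (labelled "Server IP" on the slide) is stored under the neutral key `ip`.

```python
import re
from datetime import datetime

# Assumed layout: ip, two unused fields, [timestamp], "request", status,
# "referrer", "user-agent". Adjust the pattern to your own log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    hit["status"] = int(hit["status"])
    hit["timestamp"] = datetime.strptime(hit["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return hit


# The example line from the slide.
SAMPLE = (
    '188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] '
    '"GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 200 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

hit = parse_line(SAMPLE)
print(hit["method"], hit["uri"], hit["status"])
```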

Page 7: Log File Analysis: The most powerful tool in your SEO toolkit

Log Files & SEO

Page 8: Log File Analysis: The most powerful tool in your SEO toolkit

What is Crawl Budget?

Crawl Budget = The number of URLs crawled on each visit to your site.

Higher Authority = Higher Crawl Budget

Page 9: Log File Analysis: The most powerful tool in your SEO toolkit

Crawl Budget Utilisation

http://example.com/thin-product-page-1

http://example.com/category/thin-product-page-1

http://example.com/category/subcategory/thin-product-page-1

http://example.com/category/subcategory/thin-product-page-1?colour=blue

Etc…

Conservation of crawl budget is key.
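One way to spot this pattern in a list of crawled URLs is to group them by page and flag any page reachable at more than one address. The sketch below assumes the last path segment identifies the page, which is a simplification chosen for this example and not a rule from the talk; the `/about/` URL is added only as a contrast case.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# The URL variants from the slide, plus one unrelated page for contrast.
crawled = [
    "http://example.com/thin-product-page-1",
    "http://example.com/category/thin-product-page-1",
    "http://example.com/category/subcategory/thin-product-page-1",
    "http://example.com/category/subcategory/thin-product-page-1?colour=blue",
    "http://example.com/about/",
]


def page_slug(url):
    """Assumption: the final path segment (query string ignored) names the page."""
    path = urlsplit(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


variants = defaultdict(set)
for url in crawled:
    variants[page_slug(url)].add(url)

# Pages that are crawled at more than one URL.
duplicates = {slug: urls for slug, urls in variants.items() if len(urls) > 1}

for slug, urls in duplicates.items():
    print(slug, "is crawled at", len(urls), "different URLs")
```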

Page 10: Log File Analysis: The most powerful tool in your SEO toolkit

Working With Logs

Page 11: Log File Analysis: The most powerful tool in your SEO toolkit

Preparing Your Data

Extraction: Varies by server. See accompanying guide.

Filter: By Googlebot user-agent, then validate the IP range, since the user-agent string can be spoofed. https://support.google.com/webmasters/answer/80553?hl=en

Tools: Gamut and Splunk are great, but you can’t beat Excel.
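The filter step can also be scripted. The sketch below keeps lines whose user-agent mentions Googlebot, then checks the IP with a reverse DNS lookup followed by a forward lookup, which is the approach the linked Google help page describes. The function names and the accepted domain suffixes are assumptions for illustration, and the demo uses stub lookups so it runs without network access.

```python
import socket


def is_google_hostname(hostname):
    """Assumed rule: genuine crawler hosts end in googlebot.com or google.com."""
    return hostname.rstrip(".").endswith((".googlebot.com", ".google.com"))


def verify_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then confirm it resolves back."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        hostname = reverse_lookup(ip)
        return is_google_hostname(hostname) and forward_lookup(hostname) == ip
    except OSError:  # covers failed DNS lookups
        return False


def googlebot_lines(lines):
    """Cheap first pass: keep only lines that claim to be Googlebot."""
    return [line for line in lines if "Googlebot" in line]


# Offline demo with stub DNS data (hypothetical values, not real lookups).
FAKE_REVERSE = {"66.249.66.1": "crawl-66-249-66-1.googlebot.com",
                "203.0.113.9": "spoofer.example.com"}
FAKE_FORWARD = {"crawl-66-249-66-1.googlebot.com": "66.249.66.1"}

genuine = verify_googlebot("66.249.66.1", FAKE_REVERSE.__getitem__, FAKE_FORWARD.__getitem__)
spoofed = verify_googlebot("203.0.113.9", FAKE_REVERSE.__getitem__, FAKE_FORWARD.__getitem__)
print(genuine, spoofed)
```

Calling `verify_googlebot(ip)` without the stub arguments performs live DNS lookups, so cache the result per IP before running it over a large log.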

Page 12: Log File Analysis: The most powerful tool in your SEO toolkit

Working in Excel

1. Convert .log to .csv

(cool tip: just change the file extension)

Page 13: Log File Analysis: The most powerful tool in your SEO toolkit

Working in Excel

2. Sample size

(60-120k Googlebot requests / rows is a good size)

Page 14: Log File Analysis: The most powerful tool in your SEO toolkit

Working in Excel

3. Text-to-columns

(a space will usually be a suitable delimiter)

Page 15: Log File Analysis: The most powerful tool in your SEO toolkit

Working in Excel

4. Create a table

(Label your columns, sort by timestamp)
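For anyone who prefers a script to a spreadsheet, the same steps (split on spaces, label the columns, sort by timestamp) can be sketched with Python's `csv` module. This swaps Excel for Python and is not the method shown in the talk. The column names are illustrative, the second sample line is the one from the earlier slide, and the first is made up for the demo. As in Excel, the timestamp splits into two pieces because it contains a space, so the sketch joins them back together.

```python
import csv
import io
from datetime import datetime

# Sample data: the first line is invented, the second is the slide's example.
RAW_LOG = """\
188.65.114.122 - - [30/Sep/2013:08:09:41 -0400] "GET /about/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] "GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"""

# Illustrative column labels for this particular log layout.
COLUMNS = ["ip", "ident", "user", "timestamp", "request", "status", "referrer", "user_agent"]


def to_rows(log_text):
    """Text-to-columns: a space delimiter, with double quotes grouping fields."""
    reader = csv.reader(io.StringIO(log_text), delimiter=" ", quotechar='"')
    rows = []
    for fields in reader:
        if len(fields) != 9:  # skip lines that do not fit the assumed layout
            continue
        # Rejoin the two timestamp pieces and drop the square brackets.
        stamp = (fields[3] + " " + fields[4]).strip("[]")
        timestamp = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        rows.append(dict(zip(COLUMNS, fields[:3] + [timestamp] + fields[5:])))
    return sorted(rows, key=lambda row: row["timestamp"])


rows = to_rows(RAW_LOG)
for row in rows:
    print(row["timestamp"].isoformat(), row["request"], row["status"])
```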

Page 16: Log File Analysis: The most powerful tool in your SEO toolkit

Investigate

Page 17: Log File Analysis: The most powerful tool in your SEO toolkit

Most vs Least Crawled

Formula: Use COUNTIF on the Request URI column.

Tip: Extract the top-level category for crawl distribution by site section. For example, the top-level category of http://www.brightonseo.com/speakers/person-name/ is /speakers/.
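Outside Excel, the COUNTIF step maps naturally onto a counter. A minimal sketch, using a made-up list of requested URIs; `top_level_section` is a hypothetical helper that takes the first path segment as the site section.

```python
from collections import Counter

# Made-up request URIs standing in for the Request URI column.
requested_uris = [
    "/speakers/person-name/",
    "/about/",
    "/speakers/person-name/",
    "/speakers/another-person/",
    "/about/",
    "/speakers/person-name/",
    "/",
]

# Equivalent of COUNTIF on each URI.
hits_per_uri = Counter(requested_uris)


def top_level_section(uri):
    """Return the first path segment, e.g. /speakers/ for /speakers/person-name/."""
    path = uri.split("?", 1)[0]
    first = path.strip("/").split("/", 1)[0]
    return "/" + first + "/" if first else "/"


# Crawl distribution by site section.
hits_per_section = Counter(top_level_section(uri) for uri in requested_uris)

most_crawled = hits_per_uri.most_common(1)[0]
fewest_hits = min(hits_per_uri.values())
least_crawled = [uri for uri, count in hits_per_uri.items() if count == fewest_hits]

print(most_crawled, least_crawled, hits_per_section)
```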

Page 18: Log File Analysis: The most powerful tool in your SEO toolkit

Crawl Frequency Over Time

Formula: Pivot date against count of requests.

Tip: Segment by site section or by user-agent (Googlebot Mobile, Images, Video, etc.).
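The pivot can be approximated in a few lines of Python. This sketch counts requests per day and then per day and user-agent; the timestamps and user-agent strings are made-up sample values in the same format as the log line shown earlier.

```python
from collections import Counter
from datetime import datetime

# Made-up (timestamp, user-agent) pairs for illustration.
hits = [
    ("30/Sep/2013:08:07:05 -0400", "Googlebot/2.1"),
    ("30/Sep/2013:21:15:40 -0400", "Googlebot-Image/1.0"),
    ("01/Oct/2013:03:02:11 -0400", "Googlebot/2.1"),
    ("01/Oct/2013:09:45:00 -0400", "Googlebot/2.1"),
]


def crawl_date(timestamp):
    """Reduce a log timestamp to its calendar date (YYYY-MM-DD)."""
    return datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S %z").date().isoformat()


# Date against count of requests.
requests_per_day = Counter(crawl_date(ts) for ts, _ in hits)

# The same pivot, segmented by user-agent.
requests_per_day_and_agent = Counter((crawl_date(ts), agent) for ts, agent in hits)

for day in sorted(requests_per_day):
    print(day, requests_per_day[day])
```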

Page 19: Log File Analysis: The most powerful tool in your SEO toolkit

HTTP Response Codes

Formula: Total up HTTP Response Codes.

Tip: To find the most common 302s or 404s, filter by code and sort by URL occurrence.
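A short sketch of the same two steps in Python: total the response codes, then filter to one code and rank URLs by how often they returned it. The `hits` list is made-up sample data and `most_common_for` is a hypothetical helper.

```python
from collections import Counter

# Made-up (request URI, status code) pairs for illustration.
hits = [
    ("/about/", 200),
    ("/old-page/", 404),
    ("/promo/", 302),
    ("/old-page/", 404),
    ("/missing/", 404),
    ("/promo/", 302),
    ("/old-page/", 404),
]

# Total up HTTP response codes.
status_totals = Counter(status for _, status in hits)


def most_common_for(code, limit=10):
    """Filter by status code, then sort URLs by number of occurrences."""
    return Counter(uri for uri, status in hits if status == code).most_common(limit)


print(status_totals)
print(most_common_for(404))
print(most_common_for(302))
```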

Page 21: Log File Analysis: The most powerful tool in your SEO toolkit

Level Up

Robots.txt – Crawl all URLs with Screaming Frog to determine if they are blocked in robots.txt. Investigate the most frequently crawled.

Faceted Nav Issues – Dedupe a list of unique resources, sort by times requested.

Sitemap – Add your sitemap URLs into an Excel table, VLOOKUP against your logs. Which mapped URLs are crawl deficient?

CSS / JS – These resources should be crawlable, but are files unnecessary for render absorbing an inordinate amount of crawl budget?
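The robots.txt and sitemap checks above can also be sketched in Python. This uses the standard library's `urllib.robotparser` in place of Screaming Frog and a plain lookup in place of VLOOKUP, so it is a substitute technique and not the workflow from the talk. The robots.txt rules, crawled URIs and sitemap entries are all made up for the example.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and sample data.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

crawled_uris = ["/about/", "/search/?q=seo", "/speakers/", "/search/?q=logs", "/about/"]
sitemap_uris = ["/about/", "/speakers/", "/contact/"]

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

crawl_counts = Counter(crawled_uris)

# Robots.txt: for each crawled URI, most frequent first, is it allowed?
robots_status = {
    uri: robots.can_fetch("Googlebot", uri)
    for uri, _ in crawl_counts.most_common()
}

# Sitemap: which mapped URLs never appear in the logs?
crawl_deficient = [uri for uri in sitemap_uris if crawl_counts[uri] == 0]

print(robots_status)
print(crawl_deficient)
```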

Page 22: Log File Analysis: The most powerful tool in your SEO toolkit

Top Level Crawl Waste

Formula: Use IF statements to check for every cause of waste.
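As a rough illustration, the chain of IF statements could look like the function below. The three causes of waste it checks (error responses, redirects and parameterised URLs) are assumptions picked for the example; the actual list depends on what the earlier investigations turn up for your site. The `hits` data is made up.

```python
from collections import Counter
from urllib.parse import urlsplit

# Made-up (request URI, status code) pairs for illustration.
hits = [
    ("/about/", 200),
    ("/category/page?colour=blue", 200),
    ("/old-page/", 404),
    ("/promo/", 302),
    ("/speakers/", 200),
]


def waste_reason(uri, status):
    """Return a label for the first matching cause of waste, or None."""
    if status >= 400:
        return "error response"
    if 300 <= status < 400:
        return "redirect"
    if urlsplit(uri).query:
        return "parameterised URL"
    return None


reasons = Counter(waste_reason(uri, status) or "not waste" for uri, status in hits)
wasted_requests = len(hits) - reasons["not waste"]
waste_share = wasted_requests / len(hits)

print(reasons)
print(f"{waste_share:.0%} of requests flagged as crawl waste")
```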

Page 23: Log File Analysis: The most powerful tool in your SEO toolkit

Crime = Solved

Page 24: Log File Analysis: The most powerful tool in your SEO toolkit

All Brighton SEO attendees will receive the guide via email.

Page 25: Log File Analysis: The most powerful tool in your SEO toolkit

THANKS FOR LISTENING

Get in touch: @tomcbennet
