Getting started with Scrapy in Python
Web Scraping with Scrapy
Virendra Rajput
Hacker @Markitty
Agenda
● What is web scraping and why it's fun
● My experiments with web scraping
● Getting started with Scrapy
● How Scrapy works and a quick demo
● Why Scrapy
● Questions
What is Web Scraping?
● Extracting information from websites
● Problem:
○ Static websites
○ No access to APIs to extract the data you need
○ Need to extract data periodically
● Manual solution - go to the website and copy the required data
● Smarter solution: Web Scraping
My Experiments with Scraping
Web Scraping in Python
● Download the web page with urllib2 (urllib.request in Python 3) or requests
● Parse the page with BeautifulSoup/lxml
● Select with XPath or CSS selectors
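The parse-and-select steps above can be sketched with the standard library alone. This is a minimal sketch, not the talk's own code: xml.etree.ElementTree stands in for lxml and supports only a small XPath subset, it requires well-formed markup (real pages usually need lxml or BeautifulSoup), and the PAGE string, the "book" class and the extract_books helper are hypothetical. The download step is left out so the example runs offline; in practice the string would come from something like requests.get(url).text.

```python
import xml.etree.ElementTree as ET

# Hypothetical page content standing in for a downloaded web page.
PAGE = """
<html>
  <body>
    <ul id="books">
      <li class="book"><a href="/b/1">Book One</a></li>
      <li class="book"><a href="/b/2">Book Two</a></li>
      <li class="ad"><a href="/ads/9">Sponsored</a></li>
    </ul>
  </body>
</html>
"""


def extract_books(html_text):
    """Return (title, link) pairs for every <li class="book"> entry."""
    # Parse: build an element tree from the markup (must be well-formed).
    root = ET.fromstring(html_text)
    # Select: an XPath expression picks the links inside "book" items only.
    links = root.findall(".//li[@class='book']/a")
    return [(a.text, a.get("href")) for a in links]


books = extract_books(PAGE)
for title, href in books:
    print(title, href)
```

With lxml the same XPath expression would work unchanged, and lxml additionally copes with the broken HTML that ElementTree rejects.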
Scrapy - a fast, high-level screen scraping and web crawling framework
● Pick a website
● Define the data you want to scrape
● Write the spider to extract the data
● Run the spider
● Store the data
Demo
Why Scrapy
● Simple
● Fast
● Productive and extensible
● Portable
● Well documented, with a healthy community
● Commercial support
Advanced Features (built in)
● Interactive shell for trying XPaths (useful for debugging)
● Selecting and extracting data from HTML sources
● Cleaning and sanitizing the scraped data
● Generating feed exports (JSON, CSV)
● Media pipeline for downloading stuff
● Middlewares for cookies, HTTP compression, caching, user-agent spoofing, etc.
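Several of these built-ins are switched on through project settings rather than code. The fragment below is a sketch of what a settings.py could contain; the setting names are Scrapy's, but every value (the user-agent string, the cache lifetime, the output file names) is a hypothetical placeholder, and the FEEDS dictionary assumes a recent Scrapy release.

```python
# settings.py (fragment) - all values are illustrative placeholders

# User-agent sent with every request (hypothetical bot name and contact URL).
USER_AGENT = "example-bot/0.1 (+https://example.com/contact)"

# Cookie and HTTP compression middlewares.
COOKIES_ENABLED = True
COMPRESSION_ENABLED = True

# HTTP cache middleware: reuse stored responses for up to an hour.
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 3600

# Feed exports: write scraped items to both JSON and CSV files.
FEEDS = {
    "items.json": {"format": "json"},
    "items.csv": {"format": "csv"},
}
```

The interactive shell needs no configuration: `scrapy shell "https://example.com"` opens a session with a `response` object on which XPath and CSS expressions can be tried before they go into a spider.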
Questions?