python web interaction

Python for Web Interaction. Rob Sanderson, Dev8D, Feb 24-27 2010, London. Rob Sanderson ([email protected], [email protected], @azaroth42), Digital Library Prototyping Team, Los Alamos National Laboratory, USA. http://www.flickr.com/photos/42311564@N00/2355590274/

Upload: robert-sanderson

Post on 11-May-2015

DESCRIPTION

Dev8D presentation showing my top 10 Python libraries for interacting with the web.

TRANSCRIPT

Page 1: Python Web Interaction

Python for Web Interaction Rob Sanderson

Dev8D, Feb 24-27 2010, London

Rob Sanderson

[email protected], [email protected], @azaroth42

Digital Library Prototyping Team, Los Alamos National Laboratory, USA

http://www.flickr.com/photos/42311564@N00/2355590274/

Page 2: Python Web Interaction

Overview

Top 10 Libraries for Web Interaction

•  urllib
•  urllib2
•  urlparse
•  httplib
•  lxml
•  rdflib
•  json/simplejson
•  mod_python, mod_wsgi
•  bpython

Page 3: Python Web Interaction

urllib

>>> import urllib
>>> urllib.quote('~azaroth/s?q=http://foo.com/')
'%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/'

>>> urllib.unquote('%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/')
'~azaroth/s?q=http://foo.com/'

>>> fh = urllib.urlopen('http://www.google.com/')
>>> html = fh.read()
>>> fh.close()

>>> fh.getcode()
200
>>> fh.headers.dict['content-type']
'text/html; charset=ISO-8859-1'
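The session above is Python 2. As a hedged sketch of the same quoting round trip on Python 3, where these helpers live in `urllib.parse`: the variable names are illustrative, and Python 3.7 and later leave `~` unescaped, so the quoted string can differ from the output shown on the slide.

```python
from urllib.parse import quote, unquote, urlencode

original = '~azaroth/s?q=http://foo.com/'

# Percent-encode reserved characters; '/' is left alone by default.
quoted = quote(original)

# unquote() reverses the encoding, so the round trip is lossless.
restored = unquote(quoted)

# urlencode() builds a whole query string and escapes '/' and ':' too.
query = urlencode({'q': 'http://foo.com/'})

print(quoted)
print(restored)
print(query)
```

Passing `safe=''` to `quote()` would also escape the slashes, which is usually what is wanted when the value is going into a query parameter.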

Page 4: Python Web Interaction

urllib2

>>> import urllib2
>>> ph = urllib2.ProxyHandler(
...     {'http' : 'http://proxyout.lanl.gov:8080/'})
>>> opener = urllib2.build_opener(ph)
>>> urllib2.install_opener(opener)
>>> # From now on, all requests will go through proxy

>>> r = urllib2.Request('http://www.google.com/')
>>> r.add_header('Referer', 'http://www.somewhere.net')
>>> fh = urllib2.urlopen(r)
>>> html = fh.read()
>>> fh.close()

>>> # fh is the same as urllib's for headers/status
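A minimal sketch of the same request-building steps on Python 3, where `urllib2` became `urllib.request`. Nothing is sent over the network here; the proxy address and target URL are hypothetical placeholders, and the point is only to show that a `Request` can be inspected before it is opened.

```python
import urllib.request

# Hypothetical proxy address, used only to show the wiring.
proxy = urllib.request.ProxyHandler({'http': 'http://proxy.example.org:8080/'})
opener = urllib.request.build_opener(proxy)
# opener.open(req) would route a single request through the proxy;
# urllib.request.install_opener(opener) would make it the default.

req = urllib.request.Request('http://www.example.org/')
req.add_header('Referer', 'http://www.somewhere.net')  # HTTP spells it "Referer"
req.add_header('Accept', 'text/html')

# Inspect the request before anything is sent.
referer = req.get_header('Referer')
target_host = req.host
method = req.get_method()
print(method, target_host, referer)
```

Using `opener.open()` rather than `install_opener()` keeps the proxy setting local to the code that needs it.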

Page 5: Python Web Interaction

urlparse

>>> import urlparse
>>> pr = urlparse.urlparse(
...     'https://www.google.com/search?q=foo&bar=bz#frag')

>>> pr.scheme
'https'
>>> pr.hostname
'www.google.com'
>>> pr.path
'/search'
>>> pr.query
'q=foo&bar=bz'
>>> pr.fragment
'frag'
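Beyond splitting a URL, the same module can decode the query string and resolve relative references. The following is a sketch for Python 3, where `urlparse` was folded into `urllib.parse`; the relative reference `images` is just an illustrative value.

```python
from urllib.parse import urlparse, parse_qs, urljoin

pr = urlparse('https://www.google.com/search?q=foo&bar=bz#frag')

# parse_qs() turns the raw query string into a dict of lists,
# because a key may legally appear more than once.
params = parse_qs(pr.query)

# urljoin() resolves a relative reference against a base URL.
sibling = urljoin('https://www.google.com/search?q=foo', 'images')

# geturl() reassembles the parsed parts into a string.
rebuilt = pr.geturl()

print(params)
print(sibling)
print(rebuilt)
```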

Page 6: Python Web Interaction

httplib

>>> import httplib
>>> cxn = httplib.HTTPConnection('www.google.com')
>>> hdrs = {'Accept' : 'application/rdf+xml'}
>>> path = "/search?q=some+search+query"

>>> cxn.request("HEAD", path, headers=hdrs)
>>> resp = cxn.getresponse()

>>> resp.status
200
>>> resp_hdrs = dict(resp.getheaders())
>>> resp_hdrs['content-type'] # :(
'text/html; charset=ISO-8859-1'

>>> data = resp.read() # empty: a HEAD response has no body
>>> cxn.close()
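To try the HEAD request without touching a real site, one option is a throwaway server on localhost. This is a sketch for Python 3, where `httplib` became `http.client`; `DemoHandler` and the port-0 binding are assumptions made for the demonstration and are not part of the original slide.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers HEAD with headers only."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

cxn = http.client.HTTPConnection(host, port, timeout=5)
cxn.request("HEAD", "/search?q=some+search+query",
            headers={"Accept": "application/rdf+xml"})
resp = cxn.getresponse()
status = resp.status
content_type = resp.getheader("Content-Type")
body = resp.read()  # always empty for HEAD
cxn.close()

server.shutdown()
server.server_close()
print(status, content_type, repr(body))
```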

Page 7: Python Web Interaction

lxml

$ easy_install lxml

>>> from lxml import etree
>>> et = etree.XML('<a b="B"> A <c>C</c> </a>')
>>> et.text
' A '
>>> et.attrib['b']
'B'
>>> for elem in et.iterchildren():
...     print elem
...
<Element c at 16d1ed0>

>>> import StringIO
>>> html = etree.parse(StringIO.StringIO("<html><p>hi"),
...                    parser=etree.HTMLParser())
>>> html.xpath('/html/body/p')
[<Element p at 16e00f0>]
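Where lxml is not installed, the XML half of this slide can be approximated with the standard library's `xml.etree.ElementTree`, which exposes a similar element API. This is a swapped-in alternative, not lxml itself: it has no forgiving HTML parser and supports only a subset of XPath. A sketch for Python 3:

```python
import xml.etree.ElementTree as ET

et = ET.fromstring('<a b="B"> A <c>C</c> </a>')

# Text before the first child lives on the parent's .text ...
leading_text = et.text
attr_b = et.attrib['b']

# ... and iterating an element yields its direct children.
child_tags = [child.tag for child in et]

# Text after a child's closing tag is stored on that child's .tail.
c_text = et.find('c').text
c_tail = et.find('c').tail

print(repr(leading_text), attr_b, child_tags, c_text, repr(c_tail))
```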

Page 8: Python Web Interaction

rdflib

$ easy_install rdflib

>>> import rdflib as rdf
>>> inp = rdf.URLInputSource(
...     'http://xmlns.com/foaf/spec/20100101.rdf')
>>> inp2 = rdf.StringInputSource("<a> <b> <c> .")
>>> graph = rdf.ConjunctiveGraph()
>>> graph.parse(inp)

>>> sparql = "SELECT ?l WHERE {?w rdfs:label ?l . }"
>>> res = graph.query(sparql, initNs={'rdfs':rdf.RDFS.RDFSNS})
>>> res.selected[0]
rdf.Literal(u'Given name')

>>> nt = graph.serialize(format='nt')

Page 9: Python Web Interaction

json / simplejson

>>> try:
...     import simplejson as json
... except ImportError:
...     import json

>>> data = {'o' : (True, None, 1.0), "ints" : [1,2,3]}
>>> json.dumps(data)
'{"o": [true, null, 1.0], "ints": [1, 2, 3]}'

>>> json.dumps(data, separators=(',', ':')) # compact
'{"o":[true,null,1.0],"ints":[1,2,3]}'

>>> json.loads('[1,2,"foo",null]')
[1, 2, u'foo', None]
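One detail worth seeing in a round trip: JSON has no tuple type, so the tuple in `data` comes back as a list. A small sketch with the standard library `json` module on Python 3; `sort_keys=True` is added here only to make the output order predictable.

```python
import json

data = {'o': (True, None, 1.0), 'ints': [1, 2, 3]}

# sort_keys gives a stable key order regardless of dict ordering.
text = json.dumps(data, sort_keys=True)

# Decoding yields lists, never tuples.
restored = json.loads(text)
tuple_survived = isinstance(restored['o'], tuple)

print(text)
print(restored, tuple_survived)
```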

Page 10: Python Web Interaction

mod_python, mod_wsgi

import cgitb
from mod_python import apache
from mod_python.util import FieldStorage

def handler(req):
    try:
        form = FieldStorage(req)  # dict-like object for query
        path = req.uri
        req.status = 200
        req.content_type = "text/plain"
        req.send_http_header()
        req.write(path)
    except:
        req.content_type = "text/html"
        cgitb.Hook(file=req).handle()
    return apache.OK
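The slide title also names mod_wsgi, for which no code is shown. As a hedged sketch, a roughly equivalent handler written as a plain WSGI callable for Python 3 could look like the following; mod_wsgi looks for a callable named `application` by default, and the fake request at the bottom is an assumption added so the example can be run without a web server.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Echo the request path as plain text, like the handler above."""
    form = parse_qs(environ.get('QUERY_STRING', ''))  # dict of lists
    path = environ.get('PATH_INFO', '/')
    body = path.encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


# Call the app directly with a fake request; no server is involved.
environ = {'PATH_INFO': '/hello', 'QUERY_STRING': 'q=foo'}
setup_testing_defaults(environ)  # fills in the remaining required keys
captured = {}


def start_response(status, headers):
    captured['status'] = status
    captured['headers'] = dict(headers)


response_body = b''.join(application(environ, start_response))
print(captured['status'], response_body)
```

Because a WSGI application is just a function, it can be exercised this way in tests and then deployed unchanged under mod_wsgi or any other WSGI server.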

Page 11: Python Web Interaction

bpython

$ easy_install bpython
$ bpython