Relational Database Access with Python ‘sans’ ORM
Slides from my PyCon APAC 2012 talk in Singapore
Mark Rees
CTO
Century Software (M) Sdn. Bhd.
Your Current Relational Database Access Style?
# Django ORM
>>> from ip2country.models import Ip2Country
>>> Ip2Country.objects.all()
[<Ip2Country: Ip2Country object>, <Ip2Country: Ip2Country object>, '...(remaining elements truncated)...']
>>> sgp = Ip2Country.objects.filter(assigned__year=2012)\
...     .filter(countrycode2='SG')
>>> sgp[0].ipfrom
1729580032.0
Your Current Relational Database Access Style?
# SQLAlchemy ORM
>>> from sqlalchemy import create_engine, extract
>>> from sqlalchemy.orm import sessionmaker
>>> from models import Ip2Country
>>> engine = create_engine('postgresql://ip2country_rw:secret@localhost/ip2country')
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> all_data = session.query(Ip2Country).all()
>>> sgp = session.query(Ip2Country).\
...     filter(extract('year',Ip2Country.assigned) == 2012).\
...     filter(Ip2Country.countrycode2 == 'SG')
>>> print sgp[0].ipfrom
1729580032.0
SQL Relational Database Access
SELECT * FROM ip2country;
"ipfrom";"ipto";"registry";"assigned";"countrycode2";"countrycode3";"countryname"
1729522688;1729523711;"apnic";"2011-08-05";"CN";"CHN";"China"
1729523712;1729524735;"apnic";"2011-08-05";"CN";"CHN";"China"
. . .
SELECT * FROM ip2country
WHERE date_part('year', assigned) = 2012
AND countrycode2 = 'SG';
"ipfrom";"ipto";"registry";"assigned";"countrycode2";"countrycode3";"countryname"
1729580032;1729581055;"apnic";"2012-01-16";"SG";"SGP";"Singapore"
1729941504;1729942527;"apnic";"2012-01-10";"SG";"SGP";"Singapore"
. . .
SELECT ipfrom FROM ip2country
WHERE date_part('year', assigned) = 2012
AND countrycode2 = 'SG';
"ipfrom"
1729580032
1729941504
. . .
Python + SQL == Python DB-API 2.0
• The Python standard for a consistent interface to relational databases is the Python DB-API (PEP 249)
• The majority of Python database interfaces adhere to this standard
Python DB-API UML Diagram
Python DB-API Connection Object
Access the database via the connection object
• Use connect constructor to create a connection with database
    conn = psycopg2.connect(parameters…)
• Create cursor via the connection
    cur = conn.cursor()
• Transaction management (implicit begin)
    conn.commit()
    conn.rollback()
• Close connection (will rollback current transaction)
    conn.close()
• Check module capabilities by globals
    psycopg2.apilevel
    psycopg2.threadsafety
    psycopg2.paramstyle
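The connection lifecycle on this slide can be sketched with the standard-library sqlite3 module, chosen here only because it needs no server. The in-memory database and the `demo` table are assumptions made for illustration; with psycopg2 only the `connect()` arguments and the SQL dialect would differ.

```python
import sqlite3

# Module-level globals that every DB-API 2.0 (PEP 249) module exposes.
print(sqlite3.apilevel, sqlite3.threadsafety, sqlite3.paramstyle)

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE demo (code TEXT)")
conn.commit()

# The INSERT implicitly begins a transaction ...
cur.execute("INSERT INTO demo VALUES ('SG')")
conn.rollback()  # ... which rollback() discards.
cur.execute("SELECT COUNT(*) FROM demo")
rows_after_rollback = cur.fetchone()[0]

cur.execute("INSERT INTO demo VALUES ('SG')")
conn.commit()  # commit() makes the row permanent.
cur.execute("SELECT COUNT(*) FROM demo")
rows_after_commit = cur.fetchone()[0]

cur.close()
conn.close()
```

There is no explicit "begin" call anywhere: the transaction starts with the first data-modifying statement and ends with `commit()` or `rollback()`.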
Python DB-API Cursor Object
A cursor object is used to represent a database cursor, which is used to manage the context of fetch operations.
• Cursors created from the same connection are not isolated
    cur = conn.cursor()
    cur2 = conn.cursor()
• Cursor methods
    cur.execute(operation, parameters)
    cur.executemany(op,seq_of_parameters)
    cur.fetchone()
    cur.fetchmany([size=cursor.arraysize])
    cur.fetchall()
    cur.close()
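As a rough illustration of how the fetch methods consume a result set, here is a minimal sketch using sqlite3 as a stand-in driver. The table layout and the five rows are hypothetical sample data, not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
cur = conn.cursor()
cur.execute("CREATE TABLE ip2country (ipfrom INTEGER, countrycode2 TEXT)")

# executemany() runs one statement once per parameter tuple.
cur.executemany(
    "INSERT INTO ip2country VALUES (?, ?)",
    [(1, "CN"), (2, "CN"), (3, "SG"), (4, "SG"), (5, "SG")],
)

cur.execute("SELECT ipfrom FROM ip2country ORDER BY ipfrom")
first = cur.fetchone()    # the next row as a tuple
batch = cur.fetchmany(2)  # up to two more rows
rest = cur.fetchall()     # everything that remains
done = cur.fetchone()     # None once the result set is exhausted

cur.close()
conn.close()
```

Each call picks up where the previous one stopped, so the three fetch styles can be mixed on one result set.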
Python DB-API Cursor Object
• Optional cursor methods
    cur.scroll(value[,mode='relative'])
    cur.next()
    cur.callproc(procname[,parameters])
    cur.__iter__()
• Results of an operation
    cur.description
    cur.rowcount
    cur.lastrowid
• DB adaptor specific “proprietary” cursor methods
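The result attributes and cursor iteration can be tried out with sqlite3, which happens to support the optional `lastrowid` and `__iter__` extensions. This is only a sketch: the table with an `id` primary key is an assumption for the example, and other drivers may report `rowcount` or `lastrowid` differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
cur = conn.cursor()
cur.execute("CREATE TABLE ip2country (id INTEGER PRIMARY KEY, countrycode2 TEXT)")

cur.execute("INSERT INTO ip2country (countrycode2) VALUES (?)", ("SG",))
inserted_id = cur.lastrowid  # optional extension: key of the row just inserted

cur.execute("UPDATE ip2country SET countrycode2 = ? WHERE id = ?", ("CN", inserted_id))
updated = cur.rowcount       # number of rows the last statement affected

cur.execute("SELECT id, countrycode2 FROM ip2country")
# description holds one 7-item sequence per column; item 0 is the column name.
column_names = [col[0] for col in cur.description]
codes = [row[1] for row in cur]  # optional extension: iterate over the cursor

cur.close()
conn.close()
```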
Python DB-API Parameter Styles
Allows you to keep SQL separate from parameters
Improves performance & security
Warning: Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
From http://initd.org/psycopg/docs/usage.html#query-parameters
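To make the warning concrete, the following sketch runs the same lookup twice against an in-memory sqlite3 database: once with string interpolation and once with a bound parameter. The `hostile` value and the two sample rows are hypothetical; sqlite3 uses `?` placeholders where psycopg2 would use `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
cur = conn.cursor()
cur.execute("CREATE TABLE ip2country (ipfrom INTEGER, countrycode2 TEXT)")
cur.executemany("INSERT INTO ip2country VALUES (?, ?)", [(1, "CN"), (2, "SG")])

hostile = "SG' OR '1'='1"  # hypothetical attacker-controlled input

# Unsafe: interpolation lets the input rewrite the WHERE clause.
unsafe_sql = "SELECT ipfrom FROM ip2country WHERE countrycode2 = '%s'" % hostile
cur.execute(unsafe_sql)
unsafe_rows = cur.fetchall()  # every row in the table comes back

# Safe: the value travels separately from the SQL text.
cur.execute("SELECT ipfrom FROM ip2country WHERE countrycode2 = ?", (hostile,))
safe_rows = cur.fetchall()    # no country code matches the literal string

cur.close()
conn.close()
```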
Python DB-API Parameter Styles
Global paramstyle gives supported style for the adaptor
qmark      Question mark style           WHERE countrycode2 = ?
numeric    Numeric positional style      WHERE countrycode2 = :1
named      Named style                   WHERE countrycode2 = :code
format     ANSI C printf format style    WHERE countrycode2 = %s
pyformat   Python format style           WHERE countrycode2 = %(name)s
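A small sketch of two of these styles, using sqlite3 because it reports `qmark` as its paramstyle and also accepts `named` placeholders. The single sample row is taken from the query output shown earlier; the reduced table layout is an assumption. psycopg2, by comparison, reports `pyformat`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
cur = conn.cursor()
cur.execute("CREATE TABLE ip2country (ipfrom INTEGER, countrycode2 TEXT)")
cur.execute("INSERT INTO ip2country VALUES (?, ?)", (1729580032, "SG"))

style = sqlite3.paramstyle  # the module-level global described above

# qmark: positional values supplied as a sequence.
cur.execute("SELECT ipfrom FROM ip2country WHERE countrycode2 = ?", ("SG",))
by_qmark = cur.fetchall()

# named: values supplied as a mapping keyed by placeholder name.
cur.execute(
    "SELECT ipfrom FROM ip2country WHERE countrycode2 = :code", {"code": "SG"}
)
by_named = cur.fetchall()

cur.close()
conn.close()
```

Code that must run against several drivers usually has to check `paramstyle` and build its placeholders accordingly.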
Python + SQL: INSERT
import csv, datetime, psycopg2
conn = psycopg2.connect("dbname=ip2country user=ip2country_rw password=secret")
cur = conn.cursor()
with open("IpToCountry.csv", "rb") as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            print row
            if row[0][0] != "#":
                row[3] = datetime.datetime.utcfromtimestamp(float(row[3]))
                cur.execute("""INSERT INTO ip2country(
                    ipfrom, ipto, registry, assigned,
                    countrycode2, countrycode3, countryname)
                    VALUES (%s, %s, %s, %s, %s, %s, %s)""", row)
    except:
        conn.rollback()
    else:
        conn.commit()
    finally:
        cur.close()
        conn.close()
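For readers on Python 3, the same load-then-commit pattern might look roughly like the sketch below. It is a variation, not the slide's code: it swaps psycopg2 for sqlite3 so it runs without a server, uses `executemany()` instead of a per-row `execute()`, re-raises after the rollback, and reads two made-up lines in the IpToCountry.csv layout from a string instead of the real file.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Hypothetical sample in the same column layout as IpToCountry.csv.
SAMPLE = io.StringIO(
    '# comment line\n'
    '"1729522688","1729523711","apnic","1312502400","CN","CHN","China"\n'
    '"1729580032","1729581055","apnic","1326672000","SG","SGP","Singapore"\n'
)

def load_rows(lines):
    """Yield rows ready for insertion, skipping blank and comment lines."""
    for row in csv.reader(lines):
        if not row or row[0].startswith("#"):
            continue
        # Column 3 holds the assignment date as a Unix timestamp.
        assigned = datetime.fromtimestamp(float(row[3]), tz=timezone.utc)
        row[3] = assigned.date().isoformat()
        yield row

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
cur = conn.cursor()
cur.execute("""CREATE TABLE ip2country (
    ipfrom INTEGER, ipto INTEGER, registry TEXT, assigned TEXT,
    countrycode2 TEXT, countrycode3 TEXT, countryname TEXT)""")
try:
    cur.executemany(
        "INSERT INTO ip2country VALUES (?, ?, ?, ?, ?, ?, ?)", load_rows(SAMPLE)
    )
except Exception:
    conn.rollback()  # discard the partial load, then surface the error
    raise
else:
    conn.commit()

cur.execute("SELECT ipfrom, assigned, countrycode2 FROM ip2country ORDER BY ipfrom")
loaded = cur.fetchall()
cur.close()
conn.close()
```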
Python + SQL: SELECT
# Find ipv4 address ranges assigned to Singapore
import psycopg2, socket, struct

def num_to_dotted_quad(n):
    """convert long int to dotted quad string
    http://code.activestate.com/recipes/66517/"""
    return socket.inet_ntoa(struct.pack('!L',n))

conn = psycopg2.connect("dbname=ip2country user=ip2country_rw password=secret")
cur = conn.cursor()
cur.execute("""SELECT * FROM ip2country
    WHERE countrycode2 = 'SG'
    ORDER BY ipfrom""")
for row in cur:
    print "%s - %s" % (num_to_dotted_quad(int(row[0])), num_to_dotted_quad(int(row[1])))
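A self-contained Python 3 sketch of the same query follows, again with sqlite3 standing in for psycopg2. The two rows are copied from the query output shown earlier in the talk; the three-column table is a simplification assumed for the example.

```python
import socket
import sqlite3
import struct

def num_to_dotted_quad(n):
    """Convert a 32-bit integer to a dotted-quad IPv4 string."""
    return socket.inet_ntoa(struct.pack("!L", n))

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
cur = conn.cursor()
cur.execute("CREATE TABLE ip2country (ipfrom INTEGER, ipto INTEGER, countrycode2 TEXT)")
cur.executemany(
    "INSERT INTO ip2country VALUES (?, ?, ?)",
    [(1729522688, 1729523711, "CN"), (1729580032, 1729581055, "SG")],
)

cur.execute(
    "SELECT ipfrom, ipto FROM ip2country WHERE countrycode2 = ? ORDER BY ipfrom",
    ("SG",),
)
ranges = [
    "%s - %s" % (num_to_dotted_quad(int(lo)), num_to_dotted_quad(int(hi)))
    for lo, hi in cur
]
print("\n".join(ranges))

cur.close()
conn.close()
```

The country code is passed as a bound parameter here rather than written into the SQL text, which keeps the query reusable for other countries.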
SQLite
• sqlite3
  • CPython 2.5 & 3
  • DB-API 2.0
  • Part of CPython distribution since 2.5
PostgreSQL
• psycopg
  • CPython 2 & 3
  • DB-API 2.0, level 2 thread safe
  • Appears to be most popular
  • http://initd.org/psycopg/
• py-postgresql
  • CPython 3
  • DB-API 2.0
  • Written in Python with optional C optimizations
  • pg_python - console
  • http://python.projects.postgresql.org/
PostgreSQL
• PyGreSQL
  • CPython 2.3+
  • Classic & DB-API 2.0 interfaces
  • http://www.pygresql.org/
  • Last release 2009
• pyPgSQL
  • CPython 2
  • Classic & DB-API 2.0 interfaces
  • http://www.pygresql.org/
  • Last release 2006
PostgreSQL
• pypq
  • CPython 2.7 & pypy 1.7+
  • Uses ctypes
  • DB-API 2.0 interface
  • psycopg2-like extension API
  • https://bitbucket.org/descent/pypq
• psycopg2ct
  • CPython 2.6+ & pypy 1.6+
  • Uses ctypes
  • DB-API 2.0 interface
  • psycopg2 compat layer
  • http://github.com/mvantellingen/psycopg2-ctypes
MySQL
• MySQL-python
  • CPython 2.3+
  • DB-API 2.0 interface
  • http://sourceforge.net/projects/mysql-python/
• PyMySQL
  • CPython 2.4+ & 3
  • Pure Python DB-API 2.0 interface
  • http://www.pymysql.org/
• MySQL-Connector
  • CPython 2.4+ & 3
  • Pure Python DB-API 2.0 interface
  • https://launchpad.net/myconnpy
Other “Enterprise” Databases
• cx_Oracle
  • CPython 2 & 3
  • DB-API 2.0 interface
  • http://cx-oracle.sourceforge.net/
• informixdb
  • CPython 2
  • DB-API 2.0 interface
  • http://informixdb.sourceforge.net/
  • Last release 2007
• ibm-db
  • CPython 2
  • DB-API 2.0 for DB2 & Informix
  • http://code.google.com/p/ibm-db/
ODBC
• mxODBC
  • CPython 2.3+
  • DB-API 2.0 interfaces
  • http://www.egenix.com/products/python/mxODBC/doc
  • Commercial product
• PyODBC
  • CPython 2 & 3
  • DB-API 2.0 interfaces with extensions
  • http://code.google.com/p/pyodbc/
• ODBC interfaces not limited to Windows thanks to iODBC and unixODBC
Jython + SQL
• zxJDBC
  • DB-API 2.0
  • Written in Java using JDBC API so can utilize JDBC drivers
  • Support for connection pools and JNDI lookup
  • Included with standard Jython installation http://www.jython.org/
• jyjdbc
  • DB-API 2.0 compliant
  • Written in Python/Jython so can utilize JDBC drivers
  • Decimal data type support
  • http://code.google.com/p/jyjdbc/
IronPython + SQL
• adodbapi
  • IronPython 2+
  • Also works with CPython 2.3+ with pywin32
  • http://adodbapi.sourceforge.net/
Gerald, the half a schema
import gerald
s1 = gerald.PostgresSchema('public', 'postgres://ip2country_rw:secret@localhost/ip2country')
s2 = gerald.PostgresSchema('public', 'postgres://ip2country_rw:secret@localhost/ip2countryv4')
print s1.schema['ip2country'].compare(s2.schema['ip2country'])
DIFF: Definition of assigned is different
DIFF: Column countryname not in ip2country
DIFF: Definition of registry is different
DIFF: Column countrycode3 not in ip2country
DIFF: Definition of countrycode2 is different
• Database schema toolkit
• via DB-API currently supports
  • PostgreSQL
  • MySQL
  • Oracle
• http://halfcooked.com/code/gerald/
SQLPython
$ sqlpython --postgresql ip2country ip2country_rw
Password:
0:ip2country_rw@ip2country> select * from ip2country where countrycode2='SG';
...
1728830464.0 1728830719.0 apnic 2011-11-02 SG SGP Singapore
551 rows selected.
0:ip2country_rw@ip2country> select * from ip2country where countrycode2='SG'\j
[...
{"ipfrom": 1728830464.0, "ipto": 1728830719.0, "registry": "apnic", "assigned": "2011-11-02", "countrycode2": "SG", "countrycode3": "SGP", "countryname": "Singapore"}]
• A command-line interface to relational databases
• via DB-API currently supports
  • PostgreSQL
  • MySQL
  • Oracle
• http://packages.python.org/sqlpython/
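JSON output similar to what the `\j` terminator produces can be approximated with nothing but the DB-API and the json module. The sketch below is not SQLPython's implementation; `rows_as_json` is a hypothetical helper, and sqlite3 with a reduced three-column table stands in for the PostgreSQL database.

```python
import json
import sqlite3

def rows_as_json(cur):
    """Serialise the cursor's pending result set as a JSON array of objects."""
    names = [col[0] for col in cur.description]
    return json.dumps([dict(zip(names, row)) for row in cur.fetchall()])

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
cur = conn.cursor()
cur.execute(
    "CREATE TABLE ip2country (ipfrom INTEGER, countrycode2 TEXT, countryname TEXT)"
)
cur.execute("INSERT INTO ip2country VALUES (?, ?, ?)", (1728830464, "SG", "Singapore"))

cur.execute("SELECT * FROM ip2country WHERE countrycode2 = ?", ("SG",))
as_json = rows_as_json(cur)
print(as_json)

cur.close()
conn.close()
```

Column types such as dates would need a custom `default=` function for `json.dumps`; the sample sticks to integers and strings to avoid that.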
SQLPython, batteries included
0:ip2country_rw@ip2country> select * from ip2country where countrycode2 ='SG';
...
1728830464.0 1728830719.0 apnic 2011-11-02 SG SGP Singapore
551 rows selected.
0:ip2country_rw@ip2country> py
Python 2.6.6 (r266:84292, May 20 2011, 16:42:25)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
    py <command>: Executes a Python command.
    py: Enters interactive Python mode.
    End with `Ctrl-D` (Unix) / `Ctrl-Z` (Windows), `quit()`, `exit()`.
    Past SELECT results are exposed as list `r`;
    most recent resultset is `r[-1]`.
    SQL bind, substitution variables are exposed as `binds`, `substs`.
    Run python code from external files with ``run("filename.py")``
>>> r[-1][-1]
(1728830464.0, 1728830719.0, 'apnic', datetime.date(2011, 11, 2), 'SG', 'SGP', 'Singapore')
>>> import socket, struct
>>> def num_to_dotted_quad(n):
...     return socket.inet_ntoa(struct.pack('!L',n))
...
>>> num_to_dotted_quad(int(r[-1][-1].ipfrom))
'103.11.220.0'
SpringPython – Database Templates
# Find ipv4 address ranges assigned to Singapore
# using SpringPython DatabaseTemplate & DictionaryRowMapper
from springpython.database.core import *
from springpython.database.factory import *
conn_factory = PgdbConnectionFactory(
    user="ip2country_rw", password="secret",
    host="localhost", database="ip2country")
dt = DatabaseTemplate(conn_factory)
results = dt.query(
    "SELECT * FROM ip2country WHERE countrycode2=%s",
    ("SG",), DictionaryRowMapper())
for row in results:
    print "%s - %s" % (num_to_dotted_quad(int(row['ipfrom'])),
                       num_to_dotted_quad(int(row['ipto'])))
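The template idea (one call that opens a cursor, runs the query, maps each row, and cleans up) can be imitated in a few lines of plain DB-API code. The sketch below is not SpringPython's API: `query` and `dictionary_row_mapper` are hypothetical names, and sqlite3 stands in for the PostgreSQL connection factory.

```python
import sqlite3

def dictionary_row_mapper(cur, row):
    """Map one result row to a dict keyed by column name."""
    return {col[0]: value for col, value in zip(cur.description, row)}

def query(conn, sql, params=(), row_mapper=dictionary_row_mapper):
    """Run a query and map every row, always closing the cursor afterwards."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return [row_mapper(cur, row) for row in cur.fetchall()]
    finally:
        cur.close()

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
setup = conn.cursor()
setup.execute("CREATE TABLE ip2country (ipfrom INTEGER, ipto INTEGER, countrycode2 TEXT)")
setup.execute("INSERT INTO ip2country VALUES (?, ?, ?)", (1729580032, 1729581055, "SG"))
setup.close()
conn.commit()

results = query(
    conn, "SELECT ipfrom, ipto FROM ip2country WHERE countrycode2 = ?", ("SG",)
)
conn.close()
```

Passing a different `row_mapper` callable (for example one that builds a named tuple) changes the row type without touching the query code.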
Attributions
• DB-API 2.0 PEP: http://www.python.org/dev/peps/pep-0249/
• Travis Spencer’s DB-API UML Diagram: http://travisspencer.com/
• Andrew Kuchling's introduction to the DB-API: http://www.amk.ca/python/writing/DB-API.html
• Andy Todd’s OSDC paper: http://halfcooked.com/presentations/osdc2006/python_databases.html
• Source of csv data used in examples: WebNet77, licensed under GPLv3, http://software77.net/geo-ip/
Contact Details
Mark Rees
mark at centurysoftware dot com dot my
+Mark Rees
@hexdump42
hex-dump.blogspot.com