mnh csv python
DESCRIPTION
PyGotham 09:45 AM - 10:45 AM on August 17, 2014. If you're new to Python, you might find that you're using Python as if it were C. This talk will demonstrate how to take advantage of Python's special data structures to build tools for analyzing and creating nicely formatted reports from CSV data.
TRANSCRIPT
![Page 1: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/1.jpg)
Building flexible tools to store sums and report on CSV data
Presented by
Margery Harrison
Audience level: Novice
09:45 AM - 10:45 AM
August 17, 2014
Room 704
![Page 2: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/2.jpg)
Python Flexibility
● Basic, Fortran, C, Pascal, JavaScript, ...
● At some point, there's a tendency to think the same way in every language and just translate.
● You can write Python as if it were C.
● Or you can take advantage of Python's special data structures.
● The second option is a lot more fun.
![Page 3: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/3.jpg)
Using Python data structures to report on CSV data
● Lists
● Sets
● Tuples
● Dictionaries
● CSV Reader
● DictReader
● Counter
![Page 4: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/4.jpg)
Also,
● Using tuples as dictionary keys
● Using enumerate() to count how many times you've looped
– See “Loop like a Native”: http://nedbatchelder.com/text/iter.html
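A minimal sketch of both ideas, written for Python 3 (the slides use Python 2's print statement). The records are (color, shape, number) triples copied from the sample CSV later in the talk; the variable names are illustrative only.

```python
# (color, shape, number) records, copied from the sample CSV in the talk.
records = [
    ("red", "square", 3),
    ("blue", "triangle", 5),
    ("green", "square", 2),
    ("blue", "triangle", 1),
    ("red", "square", 7),
    ("blue", "triangle", 3),
]

totals = {}  # (color, shape) tuple -> running total of "number"
count = 0    # stays 0 if records is empty
# enumerate() hands us the loop count, so there is no manual "count += 1".
for count, (color, shape, number) in enumerate(records, start=1):
    key = (color, shape)  # tuples are hashable, so they work as dict keys
    totals[key] = totals.get(key, 0) + number

print(count)                         # 6
print(totals[("red", "square")])     # 10
print(totals[("blue", "triangle")])  # 9
```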
![Page 5: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/5.jpg)
Code Development Method
● Start with simplest possible version
● Test and validate
● Iterative improvements
– Make it prettier
– Make it do more
– Make it more general
![Page 6: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/6.jpg)
This is a CSV file
color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
![Page 7: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/7.jpg)
https://c1.staticflickr.com/3/2201/2469586703_cfdaf88195.jpg
![Page 8: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/8.jpg)
http://i239.photobucket.com/albums/ff263/peacelovebones/two-pandas-rolling-1.jpg
![Page 9: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/9.jpg)
CSV DictReader
>>> import csv
>>> with open("simpleCSV.txt") as f:
...     r = csv.DictReader(f)
...     for row in r:
...         print(row)
...
![Page 10: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/10.jpg)
Running DictReader
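The result of running DictReader appears only as a slide image. As a self-contained sketch (Python 3), the sample CSV is kept in a string and wrapped in io.StringIO instead of opening simpleCSV.txt; each row comes back keyed by the header names.

```python
import csv
import io

# The sample CSV from the talk, kept in a string instead of simpleCSV.txt.
SAMPLE = """color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
print(reader.fieldnames)  # ['color', 'size', 'shape', 'number']

rows = []
for row in reader:
    # Each row maps a header name to that row's value; every value is a string.
    print(row["color"], row["size"], row["shape"], row["number"])
    rows.append(row)
```

Printing `row` itself shows a dict (an OrderedDict on Python 3.6 and 3.7). Note that "number" is still the string "3", not the integer 3.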
![Page 11: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/11.jpg)
DictReader is sequential
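One consequence of being sequential: a DictReader can be looped over only once. This Python 3 sketch shows that a second loop gets nothing, and that rereading means rewinding the file and making a new reader (or saving the rows with `list(reader)` on the first pass).

```python
import csv
import io

f = io.StringIO("color,size\nred,big\nblue,small\n")
reader = csv.DictReader(f)

first_pass = [row["color"] for row in reader]
second_pass = [row["color"] for row in reader]  # the reader is already used up

# To read again, rewind the file and build a fresh reader (which rereads the header).
f.seek(0)
third_pass = [row["color"] for row in csv.DictReader(f)]

print(first_pass, second_pass, third_pass)
# ['red', 'blue'] [] ['red', 'blue']
```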
![Page 12: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/12.jpg)
Tabulate All Possible Values
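The tabulation code is shown only as an image. A minimal Python 3 sketch, assuming a dict of sets keyed by column name (the name `possible_values` matches the snippet a few slides later; using `defaultdict(set)` to create the sets is this sketch's choice):

```python
import csv
import io
from collections import defaultdict

SAMPLE = """color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
possible_values = defaultdict(set)  # column name -> set of distinct values
for row in reader:
    for head in reader.fieldnames:
        possible_values[head].add(row[head])

for head in reader.fieldnames:
    print(head, sorted(possible_values[head]))
# color ['blue', 'green', 'red']
# size ['big', 'small']
# shape ['square', 'triangle']
# number ['1', '2', '3', '5', '7']
```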
![Page 13: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/13.jpg)
How many of each?
● It's nice to have a listing that shows the variety of objects that can appear in each column.
● Next, we'd like to count how many of each.
● And guess what? Python has a special data structure for that.
![Page 14: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/14.jpg)
collections.Counter
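A short sketch of the basics: `Counter` takes any iterable and counts each item. The list here is the color column of the sample CSV.

```python
from collections import Counter

# The color column from the sample CSV.
colors = ["red", "blue", "green", "blue", "red", "blue"]
color_counts = Counter(colors)  # counts each item of the iterable

print(color_counts["blue"])         # 3
print(color_counts.most_common(1))  # [('blue', 3)]
```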
![Page 15: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/15.jpg)
Playing with Counters
![Page 16: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/16.jpg)
Index into Counters
![Page 17: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/17.jpg)
Counter + DictReader
Let's use counters to tell us how many of each value was in each column.
![Page 18: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/18.jpg)
Print number of each value
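The counting and printing code is in the slide images. Here is a Python 3 sketch that reuses the names from the snippet on the next slide (`possible_values`, `count_of_values`) but counts each field value as a single item. The single shared Counter is an assumption inferred from that snippet; the print format is this sketch's own.

```python
import csv
import io
from collections import Counter, defaultdict

SAMPLE = """color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
"""

possible_values = defaultdict(set)  # column name -> values seen in that column
count_of_values = Counter()         # value -> how many times it appeared

reader = csv.DictReader(io.StringIO(SAMPLE))
for row in reader:
    for head in reader.fieldnames:
        field_value = row[head]
        possible_values[head].add(field_value)
        count_of_values[field_value] += 1  # count the whole string as one item

for head in reader.fieldnames:
    print(head)
    for value in sorted(possible_values[head]):
        print("{0:9s}: {1:d}".format(value, count_of_values[value]))
```

One Counter shared by all columns works here because no value appears in two columns. If that can happen in your data, keep one Counter per column instead (for example `defaultdict(Counter)` indexed as `counts[head][value]`).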
![Page 19: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/19.jpg)
Output
color
blue : 3
green : 1
red : 2
shape
square : 3
triangle: 3
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 3
big : 3
![Page 20: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/20.jpg)
You might ask, why not this?
for row in r:
    for head in r.fieldnames:
        field_value = row[head]
        possible_values[head].add(field_value)
        #count_of_values.update(row[head])
        count_of_values.update(field_value)
print(count_of_values)
![Page 21: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/21.jpg)
Because
Counter({'e': 13, 'l': 12, 'a': 9, 'r': 9, 'g': 7, 'b': 6, 'i': 6, 's': 6, 'u': 6, 'n': 4, 'm': 3, 'q': 3, 't': 3, 'd': 2, '3': 2, '1': 1, '2': 1, '7': 1, '5': 1})
color
blue : 0
green : 0
red : 0
shape
square : 0
triangle: 0
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 0
big : 0
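The reason: `Counter.update()` iterates over its argument, and iterating over a string yields its characters. So "blue" was counted as b, l, u, e, and the lookup for "blue" found 0. The number column only looked right because its values happen to be single characters. A minimal comparison:

```python
from collections import Counter

by_char = Counter()
by_char.update("blue")      # a string is iterable, so each character is counted
print(by_char["b"], by_char["blue"])  # 1 0

by_value = Counter()
by_value.update(["blue"])   # wrap the string in a list to count it as one item
by_value["blue"] += 1       # or increment the key directly
print(by_value["blue"])     # 2
```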
![Page 22: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/22.jpg)
![Page 23: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/23.jpg)
How many red squares?
● We can use tuples as an index into the counter
– (red,square)
– (big,red,square)
– (small,blue,triangle)
– (small,square)
![Page 24: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/24.jpg)
Let's use a simpler CSV
color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
![Page 25: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/25.jpg)
Counting Tuples: trying to use magic update()
>>> import collections
>>> c=collections.Counter([('a,b'),('c,d,e')])
>>> c
Counter({'a,b': 1, 'c,d,e': 1})
>>> c.update(('a','b'))
>>> c
Counter({'a': 1, 'b': 1, 'a,b': 1, 'c,d,e': 1})
>>> c.update((('a','b'),))
>>> c
Counter({'a': 1, ('a', 'b'): 1, 'b': 1, 'a,b': 1, 'c,d,e': 1})
![Page 26: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/26.jpg)
Oh well
>>> c.update([(('a','b'),)])
>>> c
Counter({'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e': 1, 'a,b': 1, ('a', 'b'): 1})
>>> c[('a','b')]
1
>>> c[('a','b')]+=5
>>> c
Counter({('a', 'b'): 6, 'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e': 1, 'a,b': 1})
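Putting the two working forms together: index the Counter with the tuple directly, or pass `update()` an iterable whose single item is the tuple. This Python 3 sketch counts the combinations listed on the "How many red squares?" slide from the simpler CSV; which combinations to count is this sketch's choice.

```python
import csv
import io
from collections import Counter

# The simpler CSV from the talk.
SIMPLE = """color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
"""

combos = Counter()
for row in csv.DictReader(io.StringIO(SIMPLE)):
    color, size, shape = row["color"], row["size"], row["shape"]
    # Index with the tuple itself...
    combos[(color, shape)] += 1
    combos[(size, shape)] += 1
    # ...or give update() an iterable whose single item is the tuple.
    combos.update([(size, color, shape)])

print(combos[("red", "square")])              # 2
print(combos[("big", "red", "square")])       # 2
print(combos[("small", "blue", "triangle")])  # 2
print(combos[("small", "square")])            # 1
```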
![Page 27: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/27.jpg)
Combo Count Part 1: Initialize
![Page 28: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/28.jpg)
Combo Count 2: Counting
![Page 29: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/29.jpg)
Combo Count 3: Printing
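The Initialize, Counting, and Printing code appears only in the slide images. The block below is a hypothetical Python 3 reconstruction that prints output in the same shape as the next slide. It assumes the combinations are each row's leading tuples, (color, size) and (color, size, shape), which is what the tuples in that output look like, and it uses one Counter indexed by both values and tuples. The helper `combos_containing` is invented for this sketch.

```python
import csv
import io
from collections import Counter, defaultdict

# The simpler CSV from the talk.
SIMPLE = """color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
"""

# Part 1, initialize: a set of values per column, plus one Counter
# indexed by both single values and tuples.
reader = csv.DictReader(io.StringIO(SIMPLE))
fields = reader.fieldnames  # ['color', 'size', 'shape']
possible_values = defaultdict(set)
counts = Counter()

# Part 2, count: each value, plus the row's leading combinations
# (color, size) and (color, size, shape).
for row in reader:
    values = tuple(row[head] for head in fields)
    for head, value in zip(fields, values):
        possible_values[head].add(value)
        counts[value] += 1
    for length in range(2, len(values) + 1):
        counts[values[:length]] += 1


def combos_containing(value, length):
    """Counted tuples of the given length that include value."""
    return sorted(key for key in counts
                  if isinstance(key, tuple) and len(key) == length
                  and value in key)


# Part 3, print: each value, then the combinations it appears in.
for head in fields:
    print(head)
    for value in sorted(possible_values[head]):
        print("{0:s} : {1:d}".format(value, counts[value]))
        for length in range(2, len(fields) + 1):
            print("{0:d} {1:s} in {2:d} combinations:".format(
                counts[value], value, length - 1))
            for key in combos_containing(value, length):
                print("{0}: {1:d}".format(key, counts[key]))
```

The `isinstance` check matters: without it, a three-letter string such as "big" would pass the length test for 3-tuples.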
![Page 30: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/30.jpg)
Combo Count Output
color
blue : 3
3 blue in 1 combinations:
('blue', 'big'): 1
('blue', 'small'): 2
3 blue in 2 combinations:
('blue', 'big', 'triangle'): 1
('blue', 'small', 'triangle'): 2
green : 1
1 green in 1 combinations:
('green', 'small'): 1
1 green in 2 combinations:
('green', 'small', 'square'): 1
red : 2
2 red in 1 combinations:
('red', 'big'): 2
2 red in 2 combinations:
('red', 'big', 'square'): 2
shape
square : 3
3 square in 1 combinations:
3 square in 2 combinations:
('red', 'big', 'square'): 2
('green', 'small', 'square'): 1
triangle: 3
3 triangle in 1 combinations:
3 triangle in 2 combinations:
('blue', 'big', 'triangle'): 1
('blue', 'small', 'triangle'): 2
size
small : 3
3 small in 1 combinations:
('blue', 'small'): 2
('green', 'small'): 1
3 small in 2 combinations:
('green', 'small', 'square'): 1
('blue', 'small', 'triangle'): 2
big : 3
3 big in 1 combinations:
('blue', 'big'): 1
('red', 'big'): 2
3 big in 2 combinations:
('red', 'big', 'square'): 2
('blue', 'big', 'triangle'): 1
![Page 31: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/31.jpg)
Well, that's ugly
● We need to make it prettier
● We need to write out to a file
● We need to break things up into Classes
![Page 32: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/32.jpg)
Printing Combination Levels
Number of Squares
Number of Red Squares
Number of Blue Squares
Number of Triangles
Number of Red Triangles
Number of Blue Triangles
Total Red
Total Blue
![Page 33: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/33.jpg)
Indentation per level
● If we're indexing by tuple, then the indentation level could correspond to the number of items in the tuple.
● Let's have general methods that format the indentation level, given either the number of items in the tuple or an integer 'level' input.
![Page 34: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/34.jpg)
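A class write_indent() method
If it's part of a class with a counter and a msgs dict, just pass in the tuple:
def write_indent(self, tup_index):
    '''Return the message for tup_index, indented by its length, with its count.

    :param tup_index: tuple index into self.counts and self.msgs
    '''
    indent = ' ' * len(tup_index)  # one space per item in the tuple
    msg = self.msgs[tup_index]
    total = self.counts[tup_index]  # avoid shadowing the built-in sum()
    indented_msg = '{0:s}{1:s}:{2:d}'.format(indent, msg, total)
    return indented_msg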
![Page 35: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/35.jpg)
class-less indent_message()
def indent_message(level, msg, total, space_per_indent=2, space=' '):
    # No self here: this is a plain function, so use the parameter directly.
    num_spaces = space_per_indent * level
    indent = space * num_spaces
    # We'll want to tune the formatting.
    indented_msg = '{0:s}{1:s}:{2:d}'.format(indent, msg, total)
    return indented_msg
![Page 36: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/36.jpg)
Adjustable field widths
Depending on data, we'll want different field widths
red squares 5
Blue squares 21
Large Red Squares in the Bronx 987654321
![Page 37: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/37.jpg)
Using format to format a format string
>>> f='{{0:{0:d}s}}'.format(3)
>>> f
'{0:3s}'
>>> f='{{0:{0:d}s}}{{1:{1:d}d}}'.format(3,5)
>>> f
'{0:3s}{1:5d}'
>>> f='{{0:s}}{{1:{0:d}s}}{{2:{1:d}d}}'.format(3,5)
>>> f
'{0:s}{1:3s}{2:5d}'
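An alternative worth knowing: `str.format` also accepts nested replacement fields inside a format spec, so the widths can be passed as ordinary arguments in a single call. This sketch checks that both approaches produce the same line.

```python
# Two-step approach from the slide: format() builds the format string...
fmt = '{{0:s}}{{1:{0:d}s}}{{2:{1:d}d}}'.format(12, 5)
print(fmt)                # {0:s}{1:12s}{2:5d}
line_a = fmt.format('  ', 'red squares', 5)

# ...versus one step: the widths go in as arguments 3 and 4 and are
# substituted into the format specs of fields 1 and 2.
line_b = '{0:s}{1:{3}s}{2:{4}d}'.format('  ', 'red squares', 5, 12, 5)

print(repr(line_a))       # '  red squares     5'
print(line_a == line_b)   # True
```

The two-step version is handy when one format string is reused for many lines; the nested version avoids the doubled braces.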
![Page 38: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/38.jpg)
Format 3 values
● Our formatting string will print 3 values:
– String of space chars: {0:s}
– Message: {1:[msg_width]s}
– Sum, right-justified: {2:-[sum_width]d} (integers are right-justified by default; the "-" is the sign option, which is also the default)
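A sketch (Python 3) of sizing the two widths from the data, using the rows from the "Adjustable field widths" slide and the double-brace technique above. The one extra space after the longest message and the single shared indent are this sketch's choices.

```python
# Rows from the "Adjustable field widths" example.
rows = [("red squares", 5),
        ("Blue squares", 21),
        ("Large Red Squares in the Bronx", 987654321)]

# Size each column from the data: longest message plus one space,
# and the number of digits in the largest sum.
msg_width = max(len(msg) for msg, _ in rows) + 1
sum_width = max(len(str(total)) for _, total in rows)

fmt = '{{0:s}}{{1:{0:d}s}}{{2:-{1:d}d}}'.format(msg_width, sum_width)
print(fmt)  # {0:s}{1:31s}{2:-9d}

indent = '  '  # one level of indentation, kept the same for every row here
for msg, total in rows:
    print(fmt.format(indent, msg, total))
```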
![Page 39: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/39.jpg)
Class For Flexible Indentation
![Page 40: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/40.jpg)
Flexible Indent Class Variables
![Page 41: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/41.jpg)
Flexible Indent Method
![Page 42: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/42.jpg)
Testing IndentMessages class
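The IndentMessages class itself is shown only in images. Here is a hypothetical Python 3 sketch of a class in that spirit: it borrows `space_per_indent` and `space` from indent_message(), `msg_width` and `sum_width` from the "Format 3 values" slide, and builds the three-value format string once in the constructor. The author's actual attributes and method names may differ.

```python
class IndentMessages(object):
    """Hypothetical sketch: format (level, message, sum) lines using the
    three-value format string from the earlier slides."""

    def __init__(self, msg_width=30, sum_width=6, space_per_indent=2, space=' '):
        self.space_per_indent = space_per_indent
        self.space = space
        # Build the format string once: indent, message, right-justified sum.
        self.format_string = '{{0:s}}{{1:{0:d}s}}{{2:-{1:d}d}}'.format(
            msg_width, sum_width)

    def indent_message(self, level, msg, total):
        """Return msg and total, indented by `level` steps."""
        indent = self.space * (self.space_per_indent * level)
        return self.format_string.format(indent, msg, total)


messages = IndentMessages(msg_width=16, sum_width=4)
for level, msg, total in [(0, "Total Red", 2), (1, "Red Squares", 2)]:
    print(messages.indent_message(level, msg, total))
```

Because the indent is a separate field, deeper levels push the sum column to the right; subtracting the indent from `msg_width` per level would keep the sums aligned instead.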
![Page 43: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/43.jpg)
SimpleCSVReporter
● Open a CSV File
● Create
– Set of possible values
– Set of possible tuples
– Counter indexed by each value & tuple
● Use IndentMessages to format output lines
![Page 44: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/44.jpg)
SimpleCSVReporter class vars
![Page 45: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/45.jpg)
readCSV() begins: initialize sets..
![Page 46: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/46.jpg)
readCSV() continued: Loop to collect & sum
![Page 47: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/47.jpg)
Write to Report File
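The class appears only as images. The block below is a hypothetical, condensed Python 3 sketch of the reporter outlined above: readCSV() (name from the slides) fills a set of possible values, a set of possible tuples, and one Counter indexed by both. The name `write_report` is invented here, and lines are formatted inline rather than through IndentMessages so the block stands alone.

```python
import csv
import io
from collections import Counter, defaultdict


class SimpleCSVReporter(object):
    """Hypothetical sketch of the reporter outlined above."""

    def __init__(self, space_per_indent=2):
        self.space_per_indent = space_per_indent
        self.fieldnames = []
        self.possible_values = defaultdict(set)  # column name -> values seen
        self.possible_tuples = set()             # leading combinations seen
        self.counts = Counter()                  # value or tuple -> count

    def readCSV(self, f):
        """Collect values and leading combinations from an open CSV file."""
        reader = csv.DictReader(f)
        self.fieldnames = reader.fieldnames
        for row in reader:
            values = tuple(row[head] for head in self.fieldnames)
            for head, value in zip(self.fieldnames, values):
                self.possible_values[head].add(value)
                self.counts[value] += 1
            for length in range(2, len(values) + 1):
                self.possible_tuples.add(values[:length])
                self.counts[values[:length]] += 1

    def write_report(self, out):
        """Write each first-column value, then its combinations indented by
        tuple length. Sorting puts each tuple right after the one it extends."""
        first = self.fieldnames[0]
        for value in sorted(self.possible_values[first]):
            out.write('{0:s}: {1:d}\n'.format(value, self.counts[value]))
            for tup in sorted(t for t in self.possible_tuples if t[0] == value):
                indent = ' ' * (self.space_per_indent * (len(tup) - 1))
                out.write('{0:s}{1:s}: {2:d}\n'.format(
                    indent, ' '.join(tup), self.counts[tup]))


SIMPLE = """color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
"""

reporter = SimpleCSVReporter()
reporter.readCSV(io.StringIO(SIMPLE))
report = io.StringIO()  # or a file opened with open("report.txt", "w")
reporter.write_report(report)
print(report.getvalue())
```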
![Page 48: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/48.jpg)
Using recursion for limitless indentation
![Page 49: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/49.jpg)
Recursive print sub-levels
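A sketch of the recursive idea in Python 3, under the assumption that every leading tuple of each row is counted (including 1-tuples such as ("red",)). The function handles one level and then calls itself for every tuple that extends the current one by a single item, so no depth is hard-coded; the real limit is Python's recursion limit, far beyond any CSV's column count. The name `add_sub_levels` is hypothetical, and it collects lines in a list rather than printing so the result is easy to reuse.

```python
from collections import Counter


def add_sub_levels(counts, prefix, lines, space_per_indent=2):
    """Add a line for `prefix`, then recurse into every counted tuple that
    extends it by one item, so there is no fixed limit on depth."""
    indent = ' ' * (space_per_indent * (len(prefix) - 1))
    lines.append('{0:s}{1:s}: {2:d}'.format(indent, prefix[-1], counts[prefix]))
    children = sorted(key for key in counts
                      if len(key) == len(prefix) + 1 and key[:len(prefix)] == prefix)
    for child in children:
        add_sub_levels(counts, child, lines, space_per_indent)


# (color, size, shape) rows from the simpler CSV.
rows = [("red", "big", "square"), ("blue", "big", "triangle"),
        ("green", "small", "square"), ("blue", "small", "triangle"),
        ("red", "big", "square"), ("blue", "small", "triangle")]

# Count every leading tuple, including 1-tuples such as ("red",).
counts = Counter()
for row in rows:
    for length in range(1, len(row) + 1):
        counts[row[:length]] += 1

lines = []
for top in sorted(key for key in counts if len(key) == 1):
    add_sub_levels(counts, top, lines)
print('\n'.join(lines))
```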
![Page 50: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/50.jpg)
Word transform stubs
![Page 51: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/51.jpg)
General method to test
![Page 52: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/52.jpg)
Test with simpler CSV
![Page 53: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/53.jpg)
Output for simpler CSV
![Page 54: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/54.jpg)
A bigger CSV file
"CCN","REPORTDATETIME","SHIFT","OFFENSE","METHOD","BLOCKSITEADDRESS","WARD","ANC","DISTRICT","PSA","NEIGHBORHOODCLUSTER","BUSINESSIMPROVEMENTDISTRICT","VOTING_PRECINCT","START_DATE","END_DATE"
4104147,"4/16/2013 12:00:00 AM","MIDNIGHT","HOMICIDE","KNIFE","1500 - 1599 BLOCK OF 1ST STREET SW",6,"6D","FIRST",105,9,,"Precinct 127","7/27/2004 8:30:00 PM","7/27/2004 8:30:00 PM"
5047867,"6/5/2013 12:00:00 AM","MIDNIGHT","SEX ABUSE","KNIFE","6500 - 6599 BLOCK OF PINEY BRANCH ROAD NW",4,"4B","FOURTH",402,17,,"Precinct 59","4/15/2005 12:30:00 PM",
● From http://data.octo.dc.gov/
![Page 55: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/55.jpg)
Deleted all but 4 columns
"SHIFT","OFFENSE","METHOD","DISTRICT"
"MIDNIGHT","HOMICIDE","KNIFE","FIRST"
"MIDNIGHT","SEX ABUSE","KNIFE","FOURTH"
...
"DAY","THEFT/OTHER","OTHERS","SECOND"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"EVENING","BURGLARY","OTHERS","FIFTH"
...
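The columns were deleted from the file before running the report. If you'd rather do that step in code, here is a Python 3 sketch using `csv.DictWriter` with `extrasaction="ignore"`, which silently drops keys not listed in `fieldnames`. The helper name `keep_columns` is hypothetical, and the input is the header and two rows shown on the previous slide.

```python
import csv
import io

# Header and the two rows shown on the "A bigger CSV file" slide.
RAW_LINES = [
    '"CCN","REPORTDATETIME","SHIFT","OFFENSE","METHOD","BLOCKSITEADDRESS","WARD",'
    '"ANC","DISTRICT","PSA","NEIGHBORHOODCLUSTER","BUSINESSIMPROVEMENTDISTRICT",'
    '"VOTING_PRECINCT","START_DATE","END_DATE"',
    '4104147,"4/16/2013 12:00:00 AM","MIDNIGHT","HOMICIDE","KNIFE",'
    '"1500 - 1599 BLOCK OF 1ST STREET SW",6,"6D","FIRST",105,9,,"Precinct 127",'
    '"7/27/2004 8:30:00 PM","7/27/2004 8:30:00 PM"',
    '5047867,"6/5/2013 12:00:00 AM","MIDNIGHT","SEX ABUSE","KNIFE",'
    '"6500 - 6599 BLOCK OF PINEY BRANCH ROAD NW",4,"4B","FOURTH",402,17,,'
    '"Precinct 59","4/15/2005 12:30:00 PM",',
]
RAW = "\n".join(RAW_LINES) + "\n"

KEEP = ["SHIFT", "OFFENSE", "METHOD", "DISTRICT"]


def keep_columns(src, dst, keep):
    """Copy only the `keep` columns from CSV file src to CSV file dst."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore",
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `keep` are ignored


out = io.StringIO()
keep_columns(io.StringIO(RAW), out, KEEP)
print(out.getvalue())
```

With real files, open both with `newline=""`, as the csv module documentation recommends.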
![Page 56: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/56.jpg)
Method to run crime report
![Page 57: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/57.jpg)
Output - top
![Page 58: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/58.jpg)
Output - bottom
![Page 59: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/59.jpg)
Improvements
● Allow user-specified order for values, e.g. FIRST, SECOND, THIRD
● Other means of tabulating
● Keeping track of blank values
● Summing counts in columns
● ...
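For the first improvement, one possible approach (a sketch, not part of the talk's code) is a sort key built from a user-supplied list: listed values sort by their position, and anything unlisted, including blanks, goes last in alphabetical order. The function name `ordered` is hypothetical.

```python
def ordered(values, preferred):
    """Sort values by their position in `preferred`; values not listed
    (including blanks) come last, in alphabetical order."""
    rank = {value: i for i, value in enumerate(preferred)}
    return sorted(values, key=lambda value: (rank.get(value, len(rank)), value))


districts = {"THIRD", "FIRST", "FIFTH", "", "SECOND", "FOURTH"}
print(ordered(districts, ["FIRST", "SECOND", "THIRD", "FOURTH", "FIFTH"]))
# ['FIRST', 'SECOND', 'THIRD', 'FOURTH', 'FIFTH', '']
```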
![Page 60: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/60.jpg)
![Page 61: Mnh csv python](https://reader033.vdocuments.site/reader033/viewer/2022052400/559b1c311a28ab47128b47a7/html5/thumbnails/61.jpg)
Links
This talk: http://www.slideshare.net/pargery/mnh-csv-python
● https://github.com/pargery/csv_utils2
● Also some notes in http://margerytech.blogspot.com/
Info on Data Structures
● http://rhodesmill.org/brandon/slides/2014-04-pycon/data-structures/
● http://nedbatchelder.com/text/iter.html
DC crime stats
● http://data.octo.dc.gov/
“The data made available here has been modified for use from its original source, which is the Government of the District of Columbia. Neither the District of Columbia Government nor the Office of the Chief Technology Officer (OCTO) makes any claims as to the completeness, accuracy or content of any data contained in this application; makes any representation of any kind, including, but not limited to, warranty of the accuracy or fitness for a particular use; nor are any such warranties to be implied or inferred with respect to the information or data furnished herein. The data is subject to change as modifications and updates are complete. It is understood that the information contained in the web feed is being used at one's own risk.”