demystifying digital humanities: winter 2014 workshop #2: programming on the whiteboard

Winter 2014: Session #2 Programming on the Whiteboard (Paige Morgan, Sarah Kremen-Hicks, Brian Gutierrez)

Upload: paige-morgan

Post on 20-Dec-2014


Category:

Education



DESCRIPTION

Slides for the second workshop on programming in digital humanities through the University of Washington's Demystifying Digital Humanities project.

TRANSCRIPT

Page 1: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Winter 2014: Session #2 Programming on the Whiteboard

(Paige Morgan, Sarah Kremen-Hicks, Brian Gutierrez)

Page 2: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Previously, at DMDH...

•The work of creating usable data

•Forms that this data might take:

•markup language

•spreadsheets

Page 3: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Workshop #2

•Caveat Curator (challenges of working with data)

•Programming on the whiteboard, i.e., conceptualizing the specific steps that you need to take to accomplish your goals

Page 4: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Why this focus on data?

•Understanding your data, and your intended actions, is a key skill for working with any programming language or platform.

•This is true whether you are the programmer or whether you are working with professional programmers.

Page 5: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Programming languages are like human languages in that they both have phrases, patterns, and

rules.

Page 6: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Programming languages are unlike human languages in

that they aren’t for

communicating with people.

Page 7: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

They are also unlike human

languages in that every programming utterance does something, i.e.,

causes an action to occur.

Page 8: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

You can get used to patterns – even unfamiliar

ones.

Page 9: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

The shift is in getting used to thinking in

terms of every single action.

Page 10: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Our subject matter today is all actions that you’ll need to think about before you work with...

Page 11: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Image: Josh Lee, @wtrsld, via Twitter, January 2014.

Page 12: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Even when you’re just experimenting, you need to prep your

data.

Page 13: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

You may know your dataset in detail already, from your research -- but your

computer is concerned with

different levels of detail.

Page 14: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Becoming aware of those levels of

detail is not only helpful for your project ideas...

Page 15: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

...it’s also a useful skill for working with programming

languages (where a stray /> or ; can break your program/website)

Page 16: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Caveat Curator

Page 17: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Data only works if your computer can

read it.

Page 18: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

But my data is just text!

(Isn’t that easy?)

Page 19: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

(Remember, your computer is fairly

stupid).

Page 20: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Formatted text is

often full of text your

computer can’t parse correctly.

Page 21: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

The┘re┘sÜlt ís that yoÜr te┘xt

might come┘ oÜt looking

like┘this

whe┘n yoÜ ope┘n it in a

programming e┘nvironme┘nt.

Page 22: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

So you need to

convert it to plain text.

(without any of the fancy details encoded in MS Word fonts.)
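As a minimal sketch of what "convert it to plain text" can involve, the Python below swaps common word-processor characters for plain equivalents. The function name `to_plain_text` and the replacement table are assumptions made for illustration; a real project would extend the table to fit its own sources.

```python
import unicodedata

# Hypothetical table of word-processor characters and plain replacements.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}


def to_plain_text(text: str) -> str:
    """Return text with typographic characters replaced by plain ones."""
    # NFKC folds compatibility characters (such as ligatures) into standard forms.
    text = unicodedata.normalize("NFKC", text)
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    return text
```

Running the function on a copy of the file, and keeping the original, makes it possible to check what was changed.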

Page 23: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

But even that can produce unexpected

errors.

Page 24: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Maybe you want to work with sailing data and ports of

call:

Page 25: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

The ship you’re interested in leaves the Ivory Coast for

St. Helena...

Page 26: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard
Page 27: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

But when you create your map, you get

this:

Page 28: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard
Page 29: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

The latitude/longitude coordinate is the significant datum.

Page 30: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

The city name is just the human-readable

component.

Page 31: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Each datum needs to be unique.
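A small sketch of why the place name alone is not enough: the check below flags any name that points to more than one coordinate pair. The records are illustrative assumptions (the slides do not say where the ship ended up on the map), and the coordinates are approximate.

```python
from collections import defaultdict

# Illustrative records only; coordinates are approximate, and the second
# "St. Helena" is an assumed example of a name collision.
PLACES = [
    {"name": "St. Helena", "note": "South Atlantic island", "lat": -15.95, "lon": -5.72},
    {"name": "St. Helena", "note": "town in California", "lat": 38.50, "lon": -122.47},
]


def ambiguous_names(places):
    """Return the names that map to more than one distinct coordinate pair."""
    coords_by_name = defaultdict(set)
    for place in places:
        coords_by_name[place["name"]].add((place["lat"], place["lon"]))
    return sorted(name for name, coords in coords_by_name.items() if len(coords) > 1)
```

Any name this check returns needs something more than the name (here, the coordinates) to identify the place.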

Page 32: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Figuring out what sort of

unique configuration will work best involves at least some

experimentation.

Page 33: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

To experiment effectively, you’ll want to keep careful

records.

Page 34: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

If you develop categories of

information, you’ll want to keep a

record of what each category means, and what its limits

are.

Page 35: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Cleaning and structuring your

data is a foundation issue that changes, depending on the

available format of your data.

Page 36: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

What if your data is crowdsourced?

Page 37: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

You can require a particular format for

submissions

Page 38: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

You can even put programmatic limits

on the formats available for submission

Page 39: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

But in the end, you’re still going to need to scrub and/or

format.
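One way to picture the scrubbing step is a function that accepts a few known date formats and rejects everything else. This is only a sketch: the function name `clean_date` and the list of accepted formats are assumptions, not something the workshop prescribes.

```python
from datetime import datetime

# Hypothetical list of formats a project might decide to accept.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y")


def clean_date(raw: str):
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed."""
    candidate = raw.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(candidate, fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    return None
```

Submissions that come back as `None` are the ones a person still has to look at by hand.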

Page 40: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

This is true even for data from supposedly

reputable sources, like government or

media organizations.

Page 41: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Example: Doctor Who Villains dataset

http://tinyurl.com/doctorwhovillains

Page 42: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

This step is no fun!

Page 43: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

But it’s absolutely necessary.

Page 44: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

What does a baby computer call his father? “Data.”

Break!

Page 45: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Working with “little data”:

GIS and the Spatial Turn

Page 46: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

GIS technology has paved the way for analyzing qualitative data associated with cultural experiences

Page 47: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

“A good map is worth a thousand words, cartographers say, and

they are right: because it produces a thousand words: it

raises doubts, ideas. It poses new questions, and forces you

to look for new answers.”

(Moretti 1998, 3–4)

Page 48: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Literary texts are filled with

subjective spatial data: an author or

character's articulation of geographically

located dwellings, urban and rural

landscapes, as well as performance spaces

Page 49: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Project: Mapping William Wordsworth's

Conspicuous Consumption in The

Prelude

(Brian R. Gutierrez)

Page 50: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Objective: to map the visual culture events referenced in Wordsworth’s autobiographical poem The Prelude (as well as the ones not referenced)

Page 51: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Problem to solve: Prove that literary galleries, specifically John Boydell’s “Shakespeare Gallery,” shaped the dramaturgical choices in the only play written by Wordsworth. He reads Shakespeare not through a personal copy of the play, but through the visual and performative texts of the time.

Page 52: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Data: place-names, indirect references,

and all non-referenced visual cultural events

Page 53: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Access to data: Project Gutenberg, digital archive of British newspapers and periodicals

Page 54: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

What to do with that data?

Map it!!

Page 55: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

First data set: Literary spatial articulations

Page 56: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Wordsworth mentions the following place names and references:

"Oh wondrous power of words, how sweet they are / According to the meaning which they bring-- / Vauxhall and Ranelagh, I then had heard / Of your green groves and wilderness of lamps, / Your gorgeous ladies, fairy cataracts, / And pageant fireworks" (119-125)

"Half-rural Sadler's Wells" (267)

Page 57: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

First, I need to know what and where these places were in order to identify them as

spatial data

Ex: Vauxhall and Ranelagh

Page 58: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Second, if I'm interested in visual cultural experiences, I need to identify what kind of event occurred there: gallery, play, etc.

Page 59: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Third, how would I access the data? Answer: place-names in a book are not under any copyright.

However, if I wanted to display sections of the text when a viewer clicks on a place name, then I would have to think about copyright. The text is on Project Gutenberg, so that's covered.

Page 60: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Fourth, I would have to locate any indirect reference to visual cultural phenomena.

Ex: Wordsworth mentions two actresses by name: Mary Robinson and Sarah Siddons.

Since I cannot map a person, I need to investigate which plays they were in and at which theaters during that moment of his life (it's an autobiography).

Page 61: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Fifth, I need to research what special events were occurring at other places he mentions. For that, I

look to The Times (newspapers) and various

periodicals.

Page 62: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Sixth, because I am going to create a map using ArcGIS, I need to put my data in an Excel spreadsheet so that it can be read by the program.

Page 63: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard
Page 64: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

What is the relationship between

the data?

Page 65: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Analyze the qualitative data

Humanist skill = DHumanist skill

Page 66: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Programming on the whiteboard involves

looking at the categories of

information, and thinking about how they interact.

Page 67: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Categories

•Place names

•Poetic lines

•Genre of visual/cultural event

•Spatial data (latitude/longitude)
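The four categories above can be sketched as the columns of a spreadsheet-style file. The Python below is an illustration under assumptions: the class name, the column names, the "theatre" genre label, and the approximate coordinates for Sadler's Wells are placeholders, and CSV is used here only because it opens in Excel.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class PlaceReference:
    """One row: a place mentioned in the poem plus its categories."""
    place_name: str
    poetic_line: str
    event_genre: str
    latitude: float
    longitude: float


def to_csv(records):
    """Serialize records to CSV text with one column per category."""
    buffer = io.StringIO()
    column_names = [field.name for field in fields(PlaceReference)]
    writer = csv.DictWriter(buffer, fieldnames=column_names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder row; the genre label and coordinates are assumptions.
SAMPLE = PlaceReference(
    "Sadler's Wells", "Half-rural Sadler's Wells", "theatre", 51.53, -0.11
)
```

Calling `to_csv([SAMPLE])` gives a header row followed by one data row, which can be saved to a file and opened as a spreadsheet.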

Page 68: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Return to the source of original data—the

literary text—to examine how the

author is describing these phenomena

Page 69: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Why use ArcGIS?

Page 70: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Benefits of ArcGIS

•It allows the overlay of historical maps

•Training was available and accessible (through DHSI and UW courses)

•As a software program, ArcGIS is established enough to be considered robust

•Available through the UW software suite

Page 71: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Disadvantages of ArcGIS

•Available only for PCs

•Proprietary file format (even if input data is open-access, the end result is not)

•Available only on an annual subscription model (and prohibitively expensive for scholars without campus-granted access)

Page 72: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

In Franco Moretti’s Atlas of the European

Novel 1800-1900 (1998), he calls for

a “literary geography,”

predicated on the creation of “readerly maps” and the use of

those maps as analytical tools.

Page 73: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Caveats?

The pursuit of mapping data may exclude complex

social spaces (e.g., gendered domestic environments)

Page 74: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Caveats?

Cartographical representations should not be

divorced from their primary texts

Page 75: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Project: Visualizing Prosody

(Sarah Kremen-Hicks)

x / |x /|xx / | x / |x /

Sir Walter Vivian all a summer's day

/ x | / x | x / | x / | x /

Gave his broad lawns until the set of sun

Page 76: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Marking up a poem for metrical scansion is encoding it with

data.

What can a computer do with that data?

Page 77: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Computers are good at counting things –

like iambs.

Page 78: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Is it possible to predict deviations from a metrical norm based on author or

lyric classification?

Page 79: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Will authors show a tendency for

particular types of metrical

substitution?

Page 80: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Prepping the Data

•For proof of concept, start with one author (Alfred, Lord Tennyson)

•Get Tennyson’s poems from Project Gutenberg

•Hand-mark representative poems for prosody

Page 81: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Programming on the Whiteboard

What should the computer do?

Page 82: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Computer tasks

•Count feet per line

•Recognize | as a foot boundary

•Recognize carriage return as a line boundary

•Supply foot boundaries at beginning/end of lines

•Count the number of areas contained within foot boundaries for each line
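The tasks above can be sketched in a few lines of Python. This is one possible approach, not the project's actual code: the function names are assumptions, and the line start and end are treated as foot boundaries simply by splitting on `|`.

```python
def split_feet(scansion_line: str):
    """Split one line of scansion marks into feet, using | as the boundary."""
    # Removing spaces makes "x /" and "x/" compare as the same foot later on.
    return [foot.replace(" ", "") for foot in scansion_line.strip().split("|")]


def feet_per_line(scansion_text: str):
    """Return the number of feet in each non-empty line of scansion."""
    return [
        len(split_feet(line))
        for line in scansion_text.splitlines()
        if line.strip()
    ]
```

For the two scansion lines from the Tennyson example, `feet_per_line` returns `[5, 5]`.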

Page 83: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

These steps involve recognizing each metrical foot as a unit that contains

particular accentual-syllabic data.

x / |x /|xx / | x / |x /

Sir Walter Vivian all a summer's day

Page 84: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Computer tasks, cont’d.

•Identify the most common number of feet per line

•Supply a report on lines (by number) that deviate

•Calculate rate of deviation/adherence

•Mode = paradigm
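Here is a sketch of how "mode = paradigm" could be computed from a list of feet-per-line counts. The function name and the shape of the returned report are assumptions made for illustration.

```python
from collections import Counter


def line_length_report(feet_counts):
    """Summarize how many lines match the most common feet-per-line count."""
    # The mode (most common count) is treated as the metrical paradigm.
    mode, _ = Counter(feet_counts).most_common(1)[0]
    deviating = [
        line_number
        for line_number, count in enumerate(feet_counts, start=1)
        if count != mode
    ]
    adherence = 1 - len(deviating) / len(feet_counts)
    return {"mode": mode, "deviating_lines": deviating, "adherence": adherence}
```

With counts of `[5, 5, 4, 5]`, the report names 5 as the mode, flags line 3, and gives an adherence rate of 0.75.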

Page 85: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

After recognizing the foot as a unit, the

computer can calculate what patterns of data each foot contains.

Page 86: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Computer tasks, cont’d.

•Identify the most common foot type

•Identify markings within foot boundaries

•Compare markings to foot dictionary to identify type
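A foot dictionary can be as simple as a lookup table from markings to names. The sketch below assumes the x and / notation used in these slides; the table covers only six common feet, and the function name is a placeholder.

```python
# Lookup table from stress markings (x = unstressed, / = stressed) to foot names.
FOOT_TYPES = {
    "x/": "iamb",
    "/x": "trochee",
    "//": "spondee",
    "xx": "pyrrhic",
    "xx/": "anapest",
    "/xx": "dactyl",
}


def identify_feet(scansion_line: str):
    """Name each foot in a line of scansion; unlisted patterns are 'unknown'."""
    feet = [foot.replace(" ", "") for foot in scansion_line.strip().split("|")]
    return [FOOT_TYPES.get(foot, "unknown") for foot in feet]
```

Applied to the scansion of "Sir Walter Vivian all a summer's day", this names four iambs with an anapest in third position.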

Page 87: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

These tasks identify each line as a unit composed of one or

more feet.

x / |x /|xx / | x / |x /

Sir Walter Vivian all a summer's day

(iambic pentameter with third foot anapestic substitution)

Page 88: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Still more computing tasks!

•Identify the most common foot type within a poem

•Supply a report on feet (by line and foot number) that deviate

•Calculate rate of deviation/adherence

•Mode = paradigm

Page 89: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Just as the feet contain patterns, the

lines contain patterns that can be analyzed as well.

Page 90: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Still more computing tasks!

•Report on types of deviations arranged by most to least common

•Information should include location (line/foot number), as well as prevalence of substitution type
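The report described above might look like the sketch below. It assumes each line has already been turned into a list of foot names; the function name and the output shape are illustrative choices, not the project's design.

```python
from collections import Counter


def substitution_report(poem_feet):
    """Rank substituted foot types by frequency, with their locations.

    poem_feet is a list of lines, each a list of foot-type names.
    """
    all_feet = [foot for line in poem_feet for foot in line]
    # The most common foot type in the poem is treated as the norm.
    norm, _ = Counter(all_feet).most_common(1)[0]
    locations = {}
    for line_number, line in enumerate(poem_feet, start=1):
        for foot_number, foot in enumerate(line, start=1):
            if foot != norm:
                locations.setdefault(foot, []).append((line_number, foot_number))
    # Most common substitution first.
    ranked = sorted(locations.items(), key=lambda item: len(item[1]), reverse=True)
    return norm, ranked
```

Each entry in the ranked list pairs a substitution type with its (line, foot) positions, so prevalence is just the length of that list.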

Page 91: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Deviations and their placement within each line and each poem should display certain patterns

unique to each author (I hope!)

Page 92: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Current status: I’m investigating using the Natural Language Toolkit to tokenize each foot, and to establish syllables, feet, and lines as a unique hierarchy.

Page 93: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Applicable Values

•Iterative development

•Failure as valuable

•Collaboration

Page 94: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

If you are thinking about your data, and the tasks that you need to accomplish, then it’s easier to determine what sort

of language or platform your project

needs.

Page 95: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

There are countless tutorials, online courses, etc., for

almost any programming language or platform.

(We’re giving you a cheat sheet, too; and http://www.dmdh.org is

your friend. So is Google.)

Page 96: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Learning them can be a slow process,

especially at first.

Page 97: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

However, knowing what tasks you’re working towards makes it

easier to understand the purpose of the

introductory lessons.

Page 98: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

It’s also easier to think about how the first rules you learn for any language or platform might affect

your goals.

Page 99: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

And now, it’s your turn...

Page 100: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

For this activity, we recommend that you pair up, or form

small groups to work together.

Page 101: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Group Activity

•What do you need to do with your data?

•What units might that data exist in?

•What categories do you need to create?

•What relationships need to exist between the units and categories?

Page 102: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

Spring Workshops!

•Project Ideation and Development

•April 5th and April 26th (advance registration for DMDH participants at the end of Winter Quarter)

Page 103: Demystifying Digital Humanities: Winter 2014 Workshop #2: Programming on the Whiteboard

DMDH content is developed by Paige Morgan, Sarah Kremen-Hicks, and Brian Gutierrez, with generous support from the Simpson Center

for the Humanities at the University of Washington.

Content is available under a Creative Commons Attribution-NonCommercial 3.0 Unported

License.

Please contact Paige at [email protected] with questions.