Introduction to Humanities Computing Spring 1999 Lecture Six

Upload: felix-gordon

Post on 13-Dec-2015


Page 1: Introduction to Humanities Computing Spring 1999 Lecture Six

Introduction to Humanities Computing

Spring 1999

Lecture Six

Page 2: Introduction to Humanities Computing Spring 1999 Lecture Six

Passport to Tour

What is the important concept from Computer Confluence, Chapter 6 (devoted to spreadsheets), which is transferable to the world of text analysis?

Malleable Matrix

Page 3: Introduction to Humanities Computing Spring 1999 Lecture Six

A tour

The Dartmouth Dante Database Project (DDP) is still best accessed via Telnet. The address remains:

library.Dartmouth.EDU

At the prompt, type:

connect dante

Page 4: Introduction to Humanities Computing Spring 1999 Lecture Six

What is an electronic text?

Can you provide examples?

What type of electronic text will survive?

Page 5: Introduction to Humanities Computing Spring 1999 Lecture Six

What is an electronic text?

Any string of characters

Any file or document that can be read

A word processing file

A text file

Page 6: Introduction to Humanities Computing Spring 1999 Lecture Six

Types of Electronic Texts

Literary text

Linguistic corpus

Hypermedia work

Page 7: Introduction to Humanities Computing Spring 1999 Lecture Six

A variety of forms

WWW site (Hypermedia)

Myst, Macbeth (Software, Text, and Media)

MS Word formatted file (Word processing)

ASCII Text file (aka “Flat File”)

Page 8: Introduction to Humanities Computing Spring 1999 Lecture Six

ASCII

American Standard Code for Information Interchange

0 Null

3 ETX (end of text)

13 CR (carriage return)

32 SP (space)

48 0

49 1

65 A

97 a, 98 b, 99 c ...
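The code table above can be checked directly. This is a minimal sketch using Python's built-in `ord` and `chr`; the variable names and the sample codes 72 and 105 are illustrative assumptions, not part of the lecture.

```python
# Look up the decimal code of a few characters from the table above.
table = {ch: ord(ch) for ch in ["0", "1", "A", "a", "b", "c", " "]}
for ch, code in table.items():
    print(repr(ch), code)

# chr() goes the other way: from a code to its character.
codes = [72, 105]  # hypothetical sample codes
word = "".join(chr(n) for n in codes)
print(word)
```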

Page 9: Introduction to Humanities Computing Spring 1999 Lecture Six

Coding Standards

ASCII = 7 bits per character

  128 characters

  The first 32 codes (0-31) are reserved for control and printing information (e.g. carriage return)

  Most of the rest are printing characters

Extended ASCII = 8 bits or 1 byte, 256 characters; the upper 128 characters are used for special characters, characters with diacritical marks, and ligatures

Unicode = 16-bit character set (as originally designed), about 65,000 characters, covering most known languages
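The difference between the three standards shows up as soon as a text contains an accented character. The sketch below is an illustration under assumptions: Latin-1 stands in for "extended ASCII" (there are many 8-bit extensions), UTF-16 stands in for 16-bit Unicode, and the helper `is_ascii` is a made-up name.

```python
text = "Fran\u00e7ais"  # "Français", with a c-cedilla

def is_ascii(s):
    """True if every character has a code below 128, i.e. fits in 7 bits."""
    return all(ord(c) < 128 for c in s)

print(is_ascii("Francais"))  # plain letters only
print(is_ascii(text))        # the cedilla falls outside 7-bit ASCII

# Assumption: Latin-1 as one example of an 8-bit extended set (1 byte per character).
latin1_bytes = text.encode("latin-1")
# Assumption: UTF-16 as the 16-bit encoding (2 bytes per character for this text).
utf16_bytes = text.encode("utf-16-be")
print(len(latin1_bytes), len(utf16_bytes))
```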

Page 10: Introduction to Humanities Computing Spring 1999 Lecture Six

Why?

Cross-platform

Long-term survival of data

You can use it to encode more complex documents using markup (SGML)

ASCII Text + Markup = Electronic Representation of Literary Text

Page 11: Introduction to Humanities Computing Spring 1999 Lecture Six

Encoding

<html>

<Head><Title>Welcome</Title></Head>

<Body><H1>Welcome to 3F03</h1>

This is the home page for 3F03<P>

<B>Quantitative Methods in the Humanities

</B> Fran&ccedil;ais

</Body></html>

In HTML, all formatting is provided by codes written in ASCII characters
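The code `&ccedil;` in the example is one such ASCII-only code: it stands for a character that 7-bit ASCII cannot hold. As a small sketch, Python's standard `html.unescape` turns it back into the character; the variable names are illustrative.

```python
import html

encoded = "Fran&ccedil;ais"       # as written in the HTML source, pure ASCII
decoded = html.unescape(encoded)  # the entity becomes a c-cedilla
print(decoded)
```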

Page 12: Introduction to Humanities Computing Spring 1999 Lecture Six

Content Model

Text

  Head

    Title

  Body

    Heading

    Paragraph

Page 13: Introduction to Humanities Computing Spring 1999 Lecture Six

Limits of HTML

No codes for many of the features: Character, Author, Text type, Sonnet, Lines

Text analysis software can’t handle it

Languages other than English

Page 14: Introduction to Humanities Computing Spring 1999 Lecture Six

COCOA Markup

Continuous tags

  Do not require a closing </tag>; a value stays in effect until a later tag changes it

Format:

<variable value>

(angled brackets < > are delimiters)

Example:

<speaker Romeo>
<scene 1>
<L 1>
<text-type frontmatter>
<<Comments not meant to be indexed>>
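To see what "continuous" means in practice, here is a rough sketch of a reader for this tag style. It is not how TACT or OCP actually parse COCOA; the function name `parse_cocoa`, the regular expression, and the sample string are assumptions made for illustration. Each stretch of text is paired with the variable values in force at that point, and `<<comments>>` are skipped.

```python
import re

# Matches either a <<comment>> or a <variable value> tag.
TOKEN = re.compile(r"<<(.*?)>>|<(\w[\w-]*)\s+([^<>]*)>", re.DOTALL)

def parse_cocoa(source):
    """Return (state, text) pairs; state maps each variable to its current value."""
    state = {}
    segments = []
    pos = 0
    for m in TOKEN.finditer(source):
        text = source[pos:m.start()].strip()
        if text:
            segments.append((dict(state), text))  # snapshot of the values in force
        if m.group(2) is not None:                # a tag, not a comment
            state[m.group(2)] = m.group(3).strip()
        pos = m.end()
    tail = source[pos:].strip()
    if tail:
        segments.append((dict(state), tail))
    return segments

# Hypothetical sample in the same style as the slide.
sample = """<act 1><<Comments are skipped>>
<p mother>He'll come back.
<p martha>Did he tell you so?"""

for state, text in parse_cocoa(sample):
    print(state, text)
```

Note that `act` is set once and carries through both speeches, while `p` changes value with each new tag.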

Page 15: Introduction to Humanities Computing Spring 1999 Lecture Six

COCOA example

<Title Misunderstanding>

<<Example for Demonstration, 1997>>

<t titlepage>THE MISUNDERSTANDING

A PLAY IN THREE ACTS

<t dedication>To my friends of the THEATRE DE L'EQUIPE

<t characters>CHARACTERS IN THE PLAY:

THE OLD MANSERVANT ...

MARIA

<t information>LE MALENTENDU (THE MISUNDERSTANDING) was presented for the first time at the Theatre des Mathurins, Paris, in 1944

Page 16: Introduction to Humanities Computing Spring 1999 Lecture Six

Example continued

<act 1>

<t stagedir>Noon. The clean, brightly lit public room of an inn. Everything is very spick and span.

<t play>

<p mother>He'll come back.

<p martha>Did he tell you so?

Page 17: Introduction to Humanities Computing Spring 1999 Lecture Six

Brief History: Text Analysis Tools

Text-analysis tools grew out of concordances:

1247, Concordance to the Vulgate Bible, Paris

1949, Father Busa Index Thomisticus

1970s, Batch Concordancers like OCP

1989, TACT - Interactive Concordancers

1990s, Textual Visualization

Page 18: Introduction to Humanities Computing Spring 1999 Lecture Six

What can be done...

Text-analysis tools provide Speed

Complex Searches

Reconfigured Views

Statistics

Researchers can generate custom concordances interactively

Page 19: Introduction to Humanities Computing Spring 1999 Lecture Six

Concordances and Interpretation

Concordances provide an alternative arrangement of the text that brings passages together into a concordantia.

Interpretative strategy where answers are drawn from the text by assembling passages on the subject in question and reading this rearranged text as a meaningful whole.

Concordance facilitates this rearrangement providing alternative views.
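The rearrangement described here can be sketched as a keyword-in-context (KWIC) listing. This is a minimal illustration, not the method of any particular tool; the function name `kwic`, the `width` parameter, and the sample sentence are assumptions.

```python
import re

def kwic(text, keyword, width=3):
    """List every hit of `keyword` with up to `width` words of context on each side."""
    words = re.findall(r"[\w']+", text)
    lines = []
    for i, w in enumerate(words):
        if w.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{w}] {right}".strip())
    return lines

# Made-up sample text for demonstration.
sample = "He'll come back. Did he tell you so? He said he would come."
for line in kwic(sample, "come", width=2):
    print(line)
```

How wide the context window should be is exactly the kind of choice the interpreter has to defend: the listing is a new text assembled from fragments of the original.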

Page 20: Introduction to Humanities Computing Spring 1999 Lecture Six

Types of Text-Analysis

Stylistic

  Describing author’s style and comparing it

  Authorship studies

Linguistic

  Create representative corpus

  Describe linguistic use (diachronic or synchronic)

Thematic

  Finding patterns (words & phrases) in a text

  Following themes through a work

  Comparing themes

Demands a reiterative reading

Page 21: Introduction to Humanities Computing Spring 1999 Lecture Six

Problematic equations

That a theme is the passages where a set of words appears

  Can themes be identified by key words?

  What about ambiguous words?

That concording passages into a new text is an acceptable interpretative strategy

  Where does the passage start and end around a word?

  Is reading a rearranged text appropriate?

That the distribution of words indicates the progress of a theme

  Does the number of hits indicate the intensity of a theme?
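The last equation is easy to compute, which is part of why it is tempting. As a sketch, the function below splits a text into equal segments and counts the hits of one word in each; the name `hits_per_segment`, the segment count, and the sample word list are assumptions for illustration only.

```python
import re

def hits_per_segment(text, keyword, segments=4):
    """Count occurrences of `keyword` in each of `segments` equal slices of the text."""
    words = [w.lower() for w in re.findall(r"[\w']+", text)]
    counts = [0] * segments
    for i, w in enumerate(words):
        if w == keyword.lower():
            counts[i * segments // len(words)] += 1  # which slice word i falls in
    return counts

# Made-up sample: eight words, so each of the four segments holds two words.
sample = "inn road inn inn road road road inn"
print(hits_per_segment(sample, "inn"))
```

The counts are a surface measurement; whether a cluster of hits means the theme is more intense at that point remains a question of interpretation.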

Page 22: Introduction to Humanities Computing Spring 1999 Lecture Six

What’s the connection?

Interpretation (Understanding)

Surface Measurement (Quantification)

Page 23: Introduction to Humanities Computing Spring 1999 Lecture Six

Two Views

Text-analysis

is about proving things about texts

Stylistic analysis provides reproducible descriptions of an author’s style

Measurement of surface features allows us to prove more interesting points

Reaction to impressionistic reader oriented literary theory

Text-analysis

is the rereading of a text in ways that help one better understand it

Text-analysis is only one of many strategies

Text-analysis reveals anomalies to be researched

Text-analysis is useful precisely because the computer can’t do well what human readers do well, and can do other things well

Page 24: Introduction to Humanities Computing Spring 1999 Lecture Six

E-Text Research Project

Planning

Prototyping

Scanning or Buying

Proofing

Traditional Research

Markup

Interactive Study

Planning Phase

Implementation Phase

Research Phase

Publication

Page 25: Introduction to Humanities Computing Spring 1999 Lecture Six

Obtaining an E-text

Acquire one from someone else

  Oxford Text Archive

  Search the Internet using WWW

  Commercial vendors

Create it yourself

  Scan it using OCR software (OCR = Optical Character Recognition)

  Type it in or hire services for input

  Markup

  Validate