Integrating Greek and English Digital Resources
Sean Boisen ([email protected])
Computer Assisted Research Section, S19-108
Slides at: http://semanticbible.org/other/presentations/2007-sbl-integrating/
Outline
• Motivation and thesis
• Overview of cross-lingual resources
• Cross-lingual semantic mapping and applications
• Conclusions
Motivation
• Expand the utility of existing Greek resources
• Open new possibilities for English-oriented students
• Thesis: integrating English resources can provide additional tools, resources, and insights
Overview of Resources
• ESV English-Greek Reverse Interlinear New Testament
• OpenText.Org Syntactically Analyzed Greek New Testament (“OpenText”)
• Greek-English Lexicon of the New Testament Based on Semantic Domains (“Louw-Nida”)
• WordNet
ESV Reverse Interlinear
• Designed to aid English readers in accessing the Greek NT
• Careful attention to word-level correspondence
Acts 21:1
Reverse Interlinear Information Structure
• Bi-directional lexical mapping
• Preserves word order in both languages
And when we had parted from them and set sail
Ὡς δὲ ἐγένετο ἀναχθῆναι ἡμᾶς ἀποσπασθέντας ἀπ’ αὐτῶν
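A bi-directional, order-preserving mapping of this kind can be sketched as two token lists plus a set of index links. This is a minimal illustration only: the names `links`, `english_for`, and `greek_for` are hypothetical, and the links themselves are hand-made guesses for this verse, not data taken from the ESV Reverse Interlinear.

```python
from collections import defaultdict

# Tokens in their original order for each language (Acts 21:1, first clause).
greek = ["Ὡς", "δὲ", "ἐγένετο", "ἀναχθῆναι", "ἡμᾶς", "ἀποσπασθέντας", "ἀπ’", "αὐτῶν"]
english = ["And", "when", "we", "had", "parted", "from", "them", "and", "set", "sail"]

# (greek_index, english_index) pairs. Illustrative guesses only, not the
# published alignment; unlinked tokens simply have no pair.
links = [(0, 1), (1, 0), (3, 8), (3, 9), (4, 2), (5, 3), (5, 4), (6, 5), (7, 6)]

def build_maps(pairs):
    """Build lookup tables in both directions from one list of links."""
    g2e, e2g = defaultdict(list), defaultdict(list)
    for g, e in pairs:
        g2e[g].append(e)
        e2g[e].append(g)
    return g2e, e2g

g2e, e2g = build_maps(links)

def english_for(g_index):
    """English tokens linked to the Greek token at g_index."""
    return [english[e] for e in sorted(g2e.get(g_index, []))]

def greek_for(e_index):
    """Greek tokens linked to the English token at e_index."""
    return [greek[g] for g in sorted(e2g.get(e_index, []))]
```

Because the links are stored as index pairs, neither token list has to be reordered, so word order survives in both languages.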
Reverse Interlinear Applications
• Lexicographic distribution
– γίνομαι and English translational equivalents
– ἔρχομαι: coming vs. going
• Part-of-speech
– εὐθυδρομήσαντες (VPNPMAA) vs. “by a straight course” (Adj)
– Distributional analysis
• Integration of other English resources
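The lexicographic distribution idea can be sketched as a count of English renderings per Greek lemma. The pairs in `sample` are invented for illustration and do not reflect actual ESV counts; `equivalents_by_lemma` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def equivalents_by_lemma(aligned_pairs):
    """Count English renderings for each Greek lemma.

    aligned_pairs: iterable of (greek_lemma, english_rendering) tuples,
    as might be read off a word-level alignment.
    """
    table = defaultdict(Counter)
    for lemma, rendering in aligned_pairs:
        table[lemma][rendering.lower()] += 1
    return table

# Invented sample data, for illustration only.
sample = [
    ("γίνομαι", "became"),
    ("γίνομαι", "happened"),
    ("γίνομαι", "Became"),
    ("γίνομαι", "was"),
    ("λόγος", "word"),
    ("λόγος", "saying"),
    ("λόγος", "word"),
]

table = equivalents_by_lemma(sample)
for lemma, counts in table.items():
    print(lemma, counts.most_common())
```

The same table, keyed by part-of-speech tags instead of lemmas, would give the part-of-speech comparison.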
Overview: OpenText.org
• Syntactic annotation of the Greek New Testament
– Syntactic groups up to the clause level
Acts 21:1
OpenText Applications
• Word-level alignment to ESV Reverse Interlinear enables integration with English tools and resources
– Numerous automated English analytical tools are available
– Enables cross-lingual comparison: part-of-speech distribution, syntactic analysis, etc.
Overview: Louw-Nida
• Domain grouping:
– “Object referents” (entities): 1-12
– Events: 13-57
– “Abstracts”: 58-91
– Discourse referentials: 92
– Names of persons and places: 93
• Sub-domains with hypernyms (“is-a”) and hyponyms
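The domain ranges above are enough to classify any Louw-Nida reference by its top-level grouping. A small sketch, assuming references are written as `domain.entry` strings such as "6.23" (the function name `ln_group` is hypothetical):

```python
# Domain number ranges as listed on the slide (upper bounds are inclusive
# there, so each range() stops one past the listed bound).
LN_GROUPS = [
    (range(1, 13), "Object referents (entities)"),
    (range(13, 58), "Events"),
    (range(58, 92), "Abstracts"),
    (range(92, 93), "Discourse referentials"),
    (range(93, 94), "Names of persons and places"),
]

def ln_group(reference):
    """Return the top-level grouping for a reference like '6.23' or '93'."""
    domain = int(reference.split(".")[0])
    for domains, label in LN_GROUPS:
        if domain in domains:
            return label
    raise ValueError(f"domain {domain} is outside 1-93")

print(ln_group("6.23"))   # Object referents (entities)
print(ln_group("25.15"))  # Events
print(ln_group("68.19"))  # Abstracts
```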
Overview: Louw-Nida (2)
• Organized semantically into “meaning entries”
– groups of terms with a shared sense that are semantically distinguishable from others
– Meanings within a sub-domain are ordered:
  • “those meanings which are treated first tend to be of a more generic nature, while more specific meanings follow” (LN Introduction, p. vi)
• Index of Greek terms to meaning entries
• Partial index of English terms to sense groups
Louw-Nida Information Structure
6: Artifacts
  6.A: General Artifacts
  6.B: Agriculture & Husbandry
  6.C: Fishing
  6.D: Binding and Fastening
  6.E: Traps, Snares
    6.23 παγίς: “an object used for trapping or snaring, principally of birds”
      • ‘trap’
      • ‘snare’
    6.24 θήρα: “an instrument used for trapping, especially of animals other than birds”
      • ‘trap’
      • ‘snare’
    6.25 σκάνδαλον: “a trap, probably of the type which has a stick which when touched by an animal causes the trap to shut”
      • ‘trap’
  Also: Weapons, Boats, Vehicles, For Writing, Money, For Music, Images, Lights, Furniture, and others …
Polysemy in Louw-Nida
• 6978 meanings
• Greek index
– 6805 terms
– 8428 term-meaning pairs
• English index
– 4622 terms
– 9586 term-meaning pairs
[Chart: term count (logarithmic scale, 1 to 10000) by number of senses (0 to 25), Greek vs. English]
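The index counts give a quick measure of average polysemy: term-meaning pairs divided by terms. A short calculation using only the figures on the slide (the function name `mean_senses` is illustrative):

```python
def mean_senses(term_meaning_pairs, terms):
    """Average number of Louw-Nida meanings each indexed term points to."""
    return term_meaning_pairs / terms

greek_mean = mean_senses(8428, 6805)
english_mean = mean_senses(9586, 4622)

print(f"Greek index:   {greek_mean:.2f} meanings per term")    # 1.24
print(f"English index: {english_mean:.2f} meanings per term")  # 2.07
```

On these figures, an English index term points to roughly two meanings on average, against about one and a quarter for a Greek term.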
Applications of Louw-Nida
• Semantic concordance
• Identifying semantic coherence and lexical chains
• Text similarity assessment
• Collocation analysis (O’Donnell 2005)
• Challenges
– Shallow hierarchy
– Coverage limited to NT only
Overview: WordNet
• Rich hierarchy of English meanings
• Organized into synonym sets (synsets)
• Additional relationships beyond hypernyms
– Part/whole (holonym/meronym)
– Derivational relationships
• On-line browser at http://wordnet.princeton.edu
• Using version 3.0
– Python and Natural Language Toolkit (NLTK, http://nltk.org)
WordNet Information Structure
artifact
  instrumentality, instrument
    device
      trap
        net
          fishnet, fishing net
        snare, gin, noose
trap, Related-to: trap (verb)
trap, Has-part: bait, decoy, lure
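One reading of the hierarchy in this diagram can be sketched as a child-to-parent table, with a walk up the table giving a hypernym chain. This is a toy stand-in built from the terms on the slide, with the nesting order assumed; it is not the NLTK WordNet interface, and the names `HYPERNYM`, `hypernym_chain`, and `shared_ancestor` are hypothetical.

```python
# Child -> parent ("is-a") links, using one term per synset for brevity.
# The nesting is an assumed reading of the slide's diagram.
HYPERNYM = {
    "fishnet": "net",
    "net": "trap",
    "snare": "trap",
    "trap": "device",
    "device": "instrumentality",
    "instrumentality": "artifact",
}

def hypernym_chain(term):
    """List the term followed by each successively more general term."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def shared_ancestor(a, b):
    """Most specific term that both a and b fall under, or None."""
    ancestors = set(hypernym_chain(a))
    for node in hypernym_chain(b):
        if node in ancestors:
            return node
    return None

print(hypernym_chain("fishnet"))
print(shared_ancestor("fishnet", "snare"))
```

A deeper chain like this is what distinguishes WordNet from the shallower Louw-Nida domain/sub-domain structure.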
Mapping Louw-Nida to WordNet
• Extract meaning entries and their hierarchy
• Extract the term-to-meaning indexes
• Invert the English index to map LN meanings to English terms
• Use the English terms to identify a WordNet synset (or cluster of synsets)
– More refined approach: use Logos’ disambiguated annotation of the Greek NT with Louw-Nida data (forthcoming)
– Refine this with mappings from the ESV Reverse Interlinear
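The index inversion step can be sketched as follows. The `english_index` sample is an assumption: it mirrors the glosses shown earlier for 6.23 to 6.25 rather than the real Louw-Nida English index, and `invert_index` is a hypothetical helper name.

```python
from collections import defaultdict

# English term -> LN meaning entries. Sample built from the glosses shown
# for the Traps, Snares sub-domain; the real index may differ.
english_index = {
    "trap": ["6.23", "6.24", "6.25"],
    "snare": ["6.23", "6.24"],
}

def invert_index(index):
    """Turn term -> meanings into meaning -> sorted list of terms."""
    inverted = defaultdict(set)
    for term, meanings in index.items():
        for meaning in meanings:
            inverted[meaning].add(term)
    return {meaning: sorted(terms) for meaning, terms in inverted.items()}

meaning_to_terms = invert_index(english_index)
for meaning, terms in sorted(meaning_to_terms.items()):
    print(meaning, terms)
```

The English terms collected for each meaning are then the candidates to look up in WordNet. Where several terms for one meaning share a synset, that synset is a natural first choice.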
Application: Lexical Chains
• ὁ (R): “The”
• λόγος (N): “saying”
– LN 33: Communication; 33.F: Speak, Talk
– WordNet: Saying, expression, locution → Speech, speech communication → Auditory communication → Communication → (others)
• “is”
• πιστός (J): “trustworthy:”
– LN 31: Hold a View; 31.I: Trust, Rely; 31.87
– WordNet: Trustworthy, trusty
• εἰ (T): “If”
• τὶς (P): “anyone”
• ὀρέγομαι (V): “aspires”
– LN 25: Attitudes and Emotions; 25.B: Desire Strongly; 25.15
– WordNet: Aspire, aim, shoot for → Plan, be after → Intend, mean, think → Will, wish → (others)
• ἐπισκοπή (N): “to the office of overseer,”
– LN 53: Religious Activities; 53.I: Roles and Functions; 53.69
– WordNet: Overseer, superintendent → Supervisor → Superior, higher-up, superordinate → Leader → (others)
• “he”
• ἐπιθυμέω (V): “desires”
– LN 25: Attitudes and Emotions; 25.B: Desire Strongly; 25.12
– WordNet: Desire, want
• “a”
• καλός (J): “noble”
– LN 65: Value; 65.C: Good, Bad; 65.22
– WordNet: Noble (vs. ignoble) → Nobility, nobleness
• ἔργον (N): “task.”
– LN 42: Work, Do; 42.D: Work, Toil; 42.42
– WordNet: Undertaking, project, task, labor → Work → Activity → Act, human action, human activity → (others)
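One simple way to surface a lexical chain from this data is to group the words of a passage by shared Louw-Nida sub-domain and keep the groups with more than one member. A sketch using the sub-domain tags from the slide; grouping by sub-domain is an assumed criterion here, not necessarily the one used in the talk, and `lexical_chains` is a hypothetical name.

```python
from collections import defaultdict

# (Greek lemma, LN sub-domain) in text order, taken from the slide.
tagged = [
    ("λόγος", "33.F"),
    ("πιστός", "31.I"),
    ("ὀρέγομαι", "25.B"),
    ("ἐπισκοπή", "53.I"),
    ("ἐπιθυμέω", "25.B"),
    ("καλός", "65.C"),
    ("ἔργον", "42.D"),
]

def lexical_chains(tagged_words, min_length=2):
    """Group words by sub-domain; keep groups with at least min_length words."""
    groups = defaultdict(list)
    for word, subdomain in tagged_words:
        groups[subdomain].append(word)
    return {s: words for s, words in groups.items() if len(words) >= min_length}

chains = lexical_chains(tagged)
print(chains)
```

On this passage the only chain found is in 25.B (Desire Strongly), linking ὀρέγομαι and ἐπιθυμέω.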
Application: Semantic Indexing for Search
• Use the more refined WordNet hierarchy to provide a richer search interface
• Example: “addiction”
– ESV uses a related adjective in 1Tim.3.8, “not addicted to much wine”
– Relevant LN meaning for προσέχω is LN.68.19
  • Domain Aspect, Subdomain Continue
  • Gloss: ‘to continue to give oneself to, to continue to apply oneself to.’
– “addiction”, “addict”, “addicted” not in the LN English index
– Nothing leads directly back to LN.25.A (Desire, Want, Wish) or LN.25.B (Desire Strongly)
English Applications: Search (2)
• WordNet hierarchy:
– Addiction (an abnormally strong craving)
  • Craving (an intense desire for some particular thing)
    – Desire (the feeling that accompanies an unsatisfied state)
– “crave”, “craving” also not in LN English index
  • Though two related terms, νοσέω and ὀρέγομαι, also occur in 1Tim and are translated by ESV as “craving”
• Richer semantic hierarchy “fills in the gaps”
– Connects with user interest
– Leads back to relevant semantic groups
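The gap-filling search can be sketched as a walk up the hypernym chain until a term that the Louw-Nida English index does cover is reached. The hypernym links follow the slide. The index entry for "desire" is an assumption made for illustration, as are the names `HYPERNYM`, `LN_ENGLISH_INDEX`, and `resolve`.

```python
# Hypernym links from the slide: addiction -> craving -> desire.
HYPERNYM = {"addiction": "craving", "craving": "desire"}

# Assumed index entry: that "desire" is indexed to these sub-domains.
LN_ENGLISH_INDEX = {"desire": ["25.A", "25.B"]}

def resolve(term):
    """Climb hypernyms from term until an indexed term is found.

    Returns (indexed_term, sub_domains), or (None, []) if the chain runs
    out without reaching the index.
    """
    current, seen = term, set()
    while current is not None and current not in seen:
        if current in LN_ENGLISH_INDEX:
            return current, LN_ENGLISH_INDEX[current]
        seen.add(current)  # guards against accidental cycles in the table
        current = HYPERNYM.get(current)
    return None, []

print(resolve("addiction"))
```

A search for "addiction" finds nothing in the index directly, but two steps up the chain it arrives at "desire" and from there at the relevant semantic groups.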
Integration as a Research Strategy
• General benefits of Greek and English resource integration
– Evaluate Greek results against a larger background
– Provide the benefits of Greek scholarship to a wider audience
– Extend narrow resources to a broader corpus
Conclusions
• Cross-lingual integration opens up new possibilities
• Valuable data resources for empirically-based analysis
References
• Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database.
• Louw, J. P. and Nida, E. A., editors (1989). Greek-English Lexicon of the New Testament: Based on Semantic Domains.
• O'Donnell, M. B. (2005). Corpus Linguistics and the Greek of the New Testament.