Mining and mapping places with multiple names
James Butler & Christopher Donaldson
Lancaster University
Corpus of Lake District Literature
Timeline markers: 1688, 1789, 1837, 1901
• 80 texts, comprising more than 1,500,000 words
• Mixture of canonical and non-canonical literature about the Lake District, mainly from c18 and c19 (78 out of 80 works)
• Mixture of genres, including guidebooks, travelogues, novels, poems, journals, and private letters
34 texts, 650K words
22 texts, 250K words
22 texts, 613K words
Sample sentence collocation: beautiful
‘Again entering the boat, we passed up the channel between Lord’s Island [and] the shore, from whence beautiful prospects are obtained of the majestic form of Skiddaw, with the woods of Castlehead and Cockshot Park in the foreground.’ (Edward Baines, A Companion to the Lakes [1829] 121.)
±5 tokens: No place-names identified
±10 tokens: 2 place-names identified – Lord’s Island & Skiddaw
Within sentence: 4 place-names identified – Lord’s Island, Skiddaw, Castlehead & Cockshot Park.
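The effect of window size on the sample sentence can be sketched in a few lines of Python. This is only an illustration, not the project's actual pipeline: the whitespace tokenizer, the four-name list, and the rule that a multi-word name counts only when it falls wholly inside the window are all assumptions made for the example.

```python
import string

# Sample sentence from Baines (1829), as quoted on the slide.
SENTENCE = (
    "Again entering the boat, we passed up the channel between "
    "Lord's Island and the shore, from whence beautiful prospects "
    "are obtained of the majestic form of Skiddaw, with the woods of "
    "Castlehead and Cockshot Park in the foreground."
)

# Hypothetical mini-gazetteer: just the four names in the sentence.
PLACE_NAMES = ["Lord's Island", "Skiddaw", "Castlehead", "Cockshot Park"]


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, lowercase."""
    tokens = (w.strip(string.punctuation).lower() for w in text.split())
    return [t for t in tokens if t]


def find_spans(tokens, name):
    """Return (start, end) token positions of every occurrence of name."""
    target = tokenize(name)
    n = len(target)
    return [(i, i + n - 1)
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == target]


def places_near(text, node, names, window=None):
    """Place-names lying wholly within +/- window tokens of the node word.

    window=None means the whole sentence is searched.
    """
    tokens = tokenize(text)
    hits = []
    for pos, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = 0 if window is None else pos - window
        hi = len(tokens) - 1 if window is None else pos + window
        for name in names:
            for start, end in find_spans(tokens, name):
                if lo <= start and end <= hi and name not in hits:
                    hits.append(name)
    return hits


print(places_near(SENTENCE, "beautiful", PLACE_NAMES, window=5))
# []
print(places_near(SENTENCE, "beautiful", PLACE_NAMES, window=10))
# ["Lord's Island", 'Skiddaw']
print(places_near(SENTENCE, "beautiful", PLACE_NAMES))
# ["Lord's Island", 'Skiddaw', 'Castlehead', 'Cockshot Park']
```

Under these assumptions the sketch reproduces the three counts above: nothing at ±5 tokens, two names at ±10, and all four within the sentence.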
Average sentence length
Lake District corpus = 29.8 words
British National Corpus (BNC) = 16 words
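A figure of this kind can be obtained with a very simple calculation. The sketch below is an assumption about method, not a record of how the corpus figures were produced: it splits sentences naively on terminal punctuation, which will miscount abbreviations such as "Mr." and is shown on a made-up two-sentence sample.

```python
import re


def mean_sentence_length(text):
    """Mean number of whitespace-separated words per sentence.

    Sentences are split naively after '.', '!' or '?' followed by
    whitespace; this is a simplification for illustration only.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


# Made-up sample text (not from the corpus): 4 words + 6 words.
sample = "We saw the lake. It was calm and very still."
print(mean_sentence_length(sample))  # 5.0
```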
from C. Grover, et al., ‘Use of the Edinburgh Geoparser for Georeferencing Digitized Historical Collections’, Phil. Trans. R. Soc. A 368 (2010) 3875–89.
Diagram of the Edinburgh Geoparser System
Example of input/output from the Edinburgh Geoparser System
Geo-referenced Data from the Edinburgh Geoparser
Geo-referenced Data, Corrected
Bowness: ‘the curved headland’, from ON bogi/OE boga ‘bow’ and ON nes/OE naess ‘headland’
*Variant Historical Spellings: Bownus, Bawnas, Bonas, Bonus, Boulness
cf. D. Whaley, A Dictionary of Lake District Place Names (Nottingham: English Place-Name Society, 2006), 42.
Some common issues with geo-referencing from generic gazetteers…
Spatial misattribution
Onomastic misassumption
Incorrect weighting
Just for the items that are found!
An extract of our custom manually-collected gazetteer for the corpus
Columns: Unique ID | Topog. Cat. | Primary Name | Secondary Names | Regional Placement
CONISTON (lake):
Thurstan, Coniston Lake, Coniston Water, Thurston, Conistone, Conistone Lake, Cunnistone Lake, Thurston Lake, Coniston Mere, Lake of Coniston, Conis- ton, Conyngs Tun, Conyngeston, Thorstane's watter, Turstinus.
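One way to use an entry like this is to index every variant so that any attested spelling resolves to the same record. The sketch below is a guess at a workable structure, not the project's actual schema: the class and field names, the ID value "LD-0001", and the normalisation rule (lowercase, collapsed whitespace) are hypothetical. The Regional Placement column is left out because no value for it appears in the extract.

```python
from dataclasses import dataclass, field


@dataclass
class GazetteerEntry:
    unique_id: str          # hypothetical ID format
    category: str           # topographic category, e.g. "lake"
    primary_name: str
    secondary_names: list = field(default_factory=list)


def normalise(name):
    """Lowercase and collapse whitespace so spellings compare consistently."""
    return " ".join(name.lower().split())


def build_index(entries):
    """Map every primary and secondary name to the entries that use it.

    Values are lists because one name may belong to several places
    (for instance a lake and a village sharing a name).
    """
    index = {}
    for entry in entries:
        for name in [entry.primary_name, *entry.secondary_names]:
            bucket = index.setdefault(normalise(name), [])
            if entry not in bucket:
                bucket.append(entry)
    return index


def lookup(index, name):
    """Return all entries matching a name as found in a text."""
    return index.get(normalise(name), [])


# Secondary names copied from the extract; the ID is made up.
CONISTON = GazetteerEntry(
    unique_id="LD-0001",
    category="lake",
    primary_name="Coniston",
    secondary_names=[
        "Thurstan", "Coniston Lake", "Coniston Water", "Thurston",
        "Conistone", "Conistone Lake", "Cunnistone Lake", "Thurston Lake",
        "Coniston Mere", "Lake of Coniston", "Conis- ton", "Conyngs Tun",
        "Conyngeston", "Thorstane's watter", "Turstinus",
    ],
)

index = build_index([CONISTON])
for variant in ["Thurston Lake", "thorstane's  watter", "Windermere"]:
    matches = lookup(index, variant)
    print(variant, "->", [(m.primary_name, m.category) for m in matches])
```

Keeping the topographic category on each record is what lets a shared name be told apart once more entries are added.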
Geospatial categories chosen for flexibility and degree of universal referential specificity
An extract from the latest iteration of the corpus, which allows referential relationships to be analysed in much finer detail.
Lake, Vale, Specific - Farm, Waterfall