Post on 22-Jan-2018
Page 1: Mining and mapping places with multiple names

Mining and mapping places with multiple names

James Butler & Christopher Donaldson

Lancaster University

Page 2

Corpus of Lake District Literature

Timeline markers: 1688, 1789, 1837, 1901

• 80 texts, comprising more than 1,500,000 words

• Mixture of canonical and non-canonical literature about the Lake District, mainly from c18 and c19 (78 out of 80 works)

• Mixture of genres, including guidebooks, travelogues, novels, poems, journals, and private letters

34 texts, 650K words

22 texts, 250K words

22 texts, 613K words

Page 3

Sample sentence collocation: beautiful

‘Again entering the boat, we passed up the channel between Lord’s Island and the shore, from whence beautiful prospects are obtained of the majestic form of Skiddaw, with the woods of Castlehead and Cockshot Park in the foreground.’ (Edward Baines, A Companion to the Lakes [1829] 121.)

±5 tokens: No place-names identified

±10 tokens: 2 place-names identified – Lord’s Island & Skiddaw

Within sentence: 4 place-names identified – Lord’s Island, Skiddaw, Castlehead & Cockshot Park.
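The effect of window size can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the tokenizer, the function names, and the rule that a multi-word name counts only when it lies wholly inside the window are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Split text into word tokens, keeping internal apostrophes (e.g. Lord's)."""
    return re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)

def names_in_window(tokens, node, names, span):
    """Return the names that fall wholly within +/- span tokens of the node word."""
    i = tokens.index(node)
    window = tokens[max(0, i - span): i + span + 1]
    found = []
    for name in names:
        parts = tokenize(name)
        n = len(parts)
        # A multi-word name counts only if every one of its tokens is in the window.
        if any(window[j:j + n] == parts for j in range(len(window) - n + 1)):
            found.append(name)
    return found

# The Baines sentence from the slide; the "and" after "Lord's Island" is assumed here.
SENTENCE = (
    "Again entering the boat, we passed up the channel between Lord's Island "
    "and the shore, from whence beautiful prospects are obtained of the majestic "
    "form of Skiddaw, with the woods of Castlehead and Cockshot Park in the foreground."
)
PLACE_NAMES = ["Lord's Island", "Skiddaw", "Castlehead", "Cockshot Park"]
TOKENS = tokenize(SENTENCE)

for span in (5, 10, len(TOKENS)):  # the last span covers the whole sentence
    print(span, names_in_window(TOKENS, "beautiful", PLACE_NAMES, span))
```

With sentences averaging nearly 30 words, a fixed window of five or ten tokens will often cut off the place-names a collocate refers to, which is why a sentence-level span is used in the third case.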

Average sentence length

Lake District corpus = 29.8 words

British National Corpus (BNC) = 16 words
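A figure of this kind could be computed roughly as follows. This is a minimal sketch and not the method behind the numbers above: the sentence splitter is a naive assumption (it breaks on ., ! or ? followed by whitespace), so abbreviations such as "Mr." would be split incorrectly.

```python
import re

def average_sentence_length(text):
    """Mean number of word tokens per sentence, using a naive sentence splitter."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    word_counts = [len(re.findall(r"\w+(?:['’]\w+)*", s)) for s in sentences]
    return sum(word_counts) / len(sentences)

# Tiny made-up sample, for illustration only.
sample = "We rowed out. The lake was calm and very still."
print(average_sentence_length(sample))
```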

Page 4

Diagram of the Edinburgh Geoparser System, from C. Grover, et al., ‘Use of the Edinburgh Geoparser for Georeferencing Digitized Historical Collections’, Phil. Trans. R. Soc. A 368 (2010) 3875–89.

Page 5

Example of input/output from the Edinburgh Geoparser System

Page 6

Geo-referenced Data from the Edinburgh Geoparser

Page 7

Geo-referenced Data, Corrected

Page 8

Bowness: ‘the curved headland’, from ON bogi/OE boga ‘bow’ and ON nes/OE næss ‘headland’

*Variant Historical Spellings: Bownus, Bawnas, Bonas, Bonus, Boulness

cf. D. Whaley, A Dictionary of Lake District Place Names (Nottingham: English Place-Name Society, 2006), 42.
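One simple way to handle variant spellings like these is an explicit lookup from each attested form to the modern name. The sketch below assumes this approach for illustration. The function names are hypothetical and the only data used are the spellings listed above.

```python
# Variant spellings from the slide, keyed by the modern form.
VARIANTS = {
    "Bowness": ["Bownus", "Bawnas", "Bonas", "Bonus", "Boulness"],
}

def build_lookup(variants):
    """Map every spelling (case-insensitively) to its modern form."""
    lookup = {}
    for standard, spellings in variants.items():
        for spelling in [standard, *spellings]:
            lookup[spelling.casefold()] = standard
    return lookup

LOOKUP = build_lookup(VARIANTS)

def normalise(name):
    """Return the modern form of a place-name, or None if it is not listed."""
    return LOOKUP.get(name.strip().casefold())

print(normalise("Bonas"))
print(normalise("Keswick"))
```

Note that a form such as "Bonus" is also an ordinary English word, so a lookup alone cannot decide whether a given token is a place-name. Context is still needed.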

Page 9

Some common geo-referencing issues with generic gazetteers…

• Spatial misattribution

• Onomastic misassumption

• Incorrect weighting

And that is just for the items that are found!

Page 10

An extract from our custom, manually collected gazetteer for the corpus

Columns: Unique ID | Topog. Cat. | Primary Name | Secondary Names | Regional Placement

CONISTON (lake):

Thurstan, Coniston Lake, Coniston Water, Thurston, Conistone, Conistone Lake, Cunnistone Lake, Thurston Lake, Coniston Mere, Lake of Coniston, Conis- ton, Conyngs Tun, Conyngeston, Thorstane's watter, Turstinus.
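A gazetteer row with these columns might be represented as below. This is a sketch under stated assumptions: the field names mirror the column headings, but the ID value "L-0001" is hypothetical, the regional placement is left empty because it is not shown in the extract, and only some of the secondary names are included.

```python
from dataclasses import dataclass, field

@dataclass
class GazetteerEntry:
    unique_id: str                 # hypothetical ID format
    topog_cat: str                 # topographic category, e.g. "lake"
    primary_name: str
    secondary_names: list = field(default_factory=list)
    regional_placement: str = ""   # not shown in the extract

    def all_names(self):
        return [self.primary_name, *self.secondary_names]

coniston = GazetteerEntry(
    unique_id="L-0001",
    topog_cat="lake",
    primary_name="Coniston",
    secondary_names=["Thurstan", "Coniston Water", "Thurston Lake",
                     "Conyngeston", "Thorstane's watter"],
)

def build_index(entries):
    """Map each name (case-insensitively) to the IDs of every entry that uses it."""
    index = {}
    for entry in entries:
        for name in entry.all_names():
            index.setdefault(name.casefold(), []).append(entry.unique_id)
    return index

INDEX = build_index([coniston])
print(INDEX["thurston lake"])
```

The index maps each name to a list of IDs instead of a single ID, because one name can refer to more than one place, just as one place can carry many names.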

Page 11

Geospatial categories chosen for flexibility and degree of universal referential specificity

Page 12

An extract from the latest iteration of the corpus, allowing referential relationships to be analysed on a whole new level.

Lake, Vale, Specific - Farm, Waterfall
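Tagging place-name mentions with a category could look something like the following sketch. The inline tag format, the attribute names, and the mini-gazetteer are all hypothetical and are not taken from the corpus itself. The example only illustrates matching longer names before shorter ones, so that "Thurston Lake" is tagged whole and not as "Thurston" plus a stray word.

```python
import re

# Hypothetical mini-gazetteer: surface form -> (primary name, category).
NAMES = {
    "Thurston": ("Coniston", "Lake"),
    "Thurston Lake": ("Coniston", "Lake"),
    "Coniston Water": ("Coniston", "Lake"),
}

# Longest names first, so the regex alternation prefers the fullest match.
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(n) for n in sorted(NAMES, key=len, reverse=True))
    + r")\b"
)

def annotate(text):
    """Wrap each known place-name in a tag carrying its primary name and category."""
    def tag(match):
        primary, category = NAMES[match.group(0)]
        return f'<place name="{primary}" cat="{category}">{match.group(0)}</place>'
    return PATTERN.sub(tag, text)

print(annotate("We reached Thurston Lake at dusk."))
```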