Bottom-up dialectometry using the GeoLing package

Simon Pickl, Aaron Spettl, Simon Pröll, Stephan Elspaß, Werner König, Volker Schmidt

Methods in Dialectology XV, Groningen, Friday, August 15, 2014



A statistical software package for geolinguistic data

• developed in cooperation by statisticians (Ulm University) and dialectologists (Universities of Augsburg and Salzburg)

• funded by the Deutsche Forschungsgemeinschaft (DFG)

• multi-platform (written in Java)

• open source (GPLv3)

• tried and tested with data from the Sprachatlas von Bayerisch-Schwaben (SBS) and other geolinguistic corpora

• www.geoling.net


A tool for bottom-up dialectometry (cf. Pickl/Rumpf 2012)

With GeoLing, you can

• produce probabilistic area-class maps of linguistic variables using intensity estimation

• find groups of maps that are spatially similar using cluster analysis

• identify and plot recurring spatial patterns using factor analysis


Outline

• What can you do with GeoLing? → Simon Pickl

  • intensity estimation

  • factor analysis

• How do you use GeoLing? → Aaron Spettl

  • installing GeoLing

  • performing analyses

  • importing your data


Testbed: Sprachatlas von Bayerisch-Schwaben (SBS)

• compiled 1984‒2009 at the University of Augsburg under the direction of Werner König

• approximately 2,700 maps in 14 volumes

• 272 sites

• for each map 0–3 records per site

[Figure: map labels “Germany” and “Bavaria”; photo by Stefan Puchner]


Intensity estimation Cf. Rumpf/Pickl/Elspaß/König/Schmidt 2009; Pickl/Rumpf 2011; 2012

• Method for estimating the probabilistic distribution underlying the records

• Motivation: Individual records are not necessarily representative

• Records are treated as statistical samples from an underlying distribution

• Intensity estimation uses the geographical or linguistic relatedness between sites to infer local probabilities
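The idea in the bullets above can be illustrated with a minimal sketch. It is not GeoLing's implementation: the Gaussian kernel, the bandwidth value, the equal weighting of multiple records at a site, and the toy data are all assumptions made for illustration. Each site's variant probabilities are estimated from the records at all sites, weighted by how close those sites are.

```python
import math
from collections import defaultdict


def gaussian_kernel(distance, bandwidth):
    """Weight that decays smoothly with distance (assumed kernel shape)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def estimate_intensities(records, distances, bandwidth):
    """Estimate variant probabilities at every site.

    records:   dict site -> list of variants recorded there (may be empty)
    distances: dict (site_a, site_b) -> distance, including (site, site) = 0
    bandwidth: kernel bandwidth in the same unit as the distances
    """
    estimates = {}
    for target in records:
        weighted_counts = defaultdict(float)
        for source, variants in records.items():
            weight = gaussian_kernel(distances[(target, source)], bandwidth)
            # Assumption: each site contributes one unit, split among its records.
            for variant in variants:
                weighted_counts[variant] += weight / len(variants)
        total = sum(weighted_counts.values())
        if total == 0:
            estimates[target] = {}  # no records anywhere
        else:
            estimates[target] = {v: c / total for v, c in weighted_counts.items()}
    return estimates


# Invented toy data: four sites on a line (positions in km); site D has no record.
positions = {"A": 0.0, "B": 10.0, "C": 20.0, "D": 30.0}
records = {"A": ["variant_1"], "B": ["variant_2"], "C": ["variant_1"], "D": []}
distances = {
    (a, b): abs(pa - pb)
    for a, pa in positions.items()
    for b, pb in positions.items()
}

estimates = estimate_intensities(records, distances, bandwidth=10.0)
for site in sorted(estimates):
    print(site, {v: round(p, 3) for v, p in sorted(estimates[site].items())})
```

In this toy setting the single record of variant_2 at site B is outweighed by the two neighbouring records of variant_1, and site D receives an estimate although it has no record of its own.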


Intensity estimation Cf. Rumpf/Pickl/Elspaß/König/Schmidt 2009; Pickl/Rumpf 2011; 2012

[Figure: maps of words for ‘woodlouse’; panels: intensity estimation, continuous intensity estimation]


Linguistic distances in intensity estimation Cf. Pickl/Spettl/Pröll/Elspaß/König/Schmidt 2014

[Figure: maps of words for ‘woodlouse’; panels: intensity estimation based on geographical distances, intensity estimation based on linguistic (in this case: lexical) distances]


Linguistic distances in intensity estimation Cf. Pickl/Spettl/Pröll/Elspaß/König/Schmidt 2014

• Intensity estimation with linguistic distances:

  • less “smooth” isoglosses and areas

  • more detail

  • preservation of language islands (e.g. towns) and dialect borders

  • continuous plot not possible with linguistic distances
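As a rough illustration of what a linguistic distance between two sites could look like, the sketch below uses the share of maps on which the sites have no variant in common. This measure is an assumption chosen for simplicity and is not necessarily the one used in Pickl et al. 2014; the function name and the toy data are hypothetical.

```python
def lexical_distance(site_a, site_b, maps):
    """Share of maps on which two sites share no variant.

    maps: list of dicts, each mapping site -> set of variants on that map.
    Maps without records for either site are skipped.
    Returns a value between 0 (always a shared variant) and 1 (never),
    or None if the sites cannot be compared on any map.
    """
    compared = 0
    differing = 0
    for records in maps:
        variants_a = records.get(site_a, set())
        variants_b = records.get(site_b, set())
        if not variants_a or not variants_b:
            continue
        compared += 1
        if not (variants_a & variants_b):
            differing += 1
    return differing / compared if compared else None


# Invented toy corpus of three maps and three sites.
toy_maps = [
    {"town": {"a"}, "village": {"b"}, "hamlet": {"b"}},
    {"town": {"c"}, "village": {"d"}, "hamlet": {"d", "e"}},
    {"town": {"f"}, "village": {"f"}, "hamlet": {"g"}},
]

for pair in [("town", "village"), ("village", "hamlet"), ("town", "hamlet")]:
    print(pair, lexical_distance(pair[0], pair[1], toy_maps))
```

A distance of this kind is defined only between surveyed sites, not between arbitrary points on the map, which is presumably why a continuous plot is not available in this mode.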


Intensity estimation Cf. Rumpf/Pickl/Elspaß/König/Schmidt 2009; Pickl/Rumpf 2011; 2012; Pickl/Spettl/Pröll/Elspaß/König/Schmidt 2014

• Further analysis:

  • statistical analysis of spatial characteristics (homogeneity, complexity) (Rumpf/Pickl/Elspaß/König/Schmidt 2010)

  • cluster analysis to obtain groups of maps with similar spatial structure (Rumpf/Pickl/Elspaß/König/Schmidt 2010; Meschenmoser/Pröll 2012)


Factor Analysis Cf. Pröll/Pickl/Spettl (to appear)

• statistical tool for dimensionality reduction

  • applications in dialectometry: Clopper/Paolillo 2006; Nerbonne 2006; Leinonen 2010; Grieve/Speelman/Geeraerts 2011

• condenses large numbers of variants with similar distributions into so-called “factors”

• provides a “summary” of predominant spatial patterns in the data


Factor Analysis Cf. Pröll/Pickl/Spettl (to appear)

Combined Factor Map: Dominant factors in the SBS (all 2,160 maps, 28,315 variants)

• summarize 59.9 % of the data (equivalent of 16,961 variants)

• areas of similar variant distributions

• only the 10 locally dominant factors visible in this map

• in total: 15 factors (Kaiser criterion)

• non-dominant factors are hidden but ‘latently’ present
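The Kaiser criterion mentioned above keeps only those factors whose eigenvalue exceeds 1, i.e. factors that account for more variance than a single standardized variable. The sketch below shows the rule on invented eigenvalues; how GeoLing computes the eigenvalues is not shown in the slides, and the function names are hypothetical.

```python
def kaiser_selection(eigenvalues):
    """Return the eigenvalues of factors retained by the Kaiser criterion (> 1)."""
    return [value for value in eigenvalues if value > 1.0]


def explained_shares(eigenvalues, n_variables):
    """Share of total variance per factor, assuming standardized variables."""
    return [value / n_variables for value in eigenvalues]


# Invented eigenvalues for six standardized variables (they sum to 6).
toy_eigenvalues = [3.2, 1.5, 0.6, 0.4, 0.2, 0.1]

retained = kaiser_selection(toy_eigenvalues)
shares = explained_shares(retained, n_variables=len(toy_eigenvalues))

print("retained factors:", len(retained))
print("share summarized by retained factors:", round(sum(shares), 3))
```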


Factor Analysis Cf. Pröll/Pickl/Spettl (to appear)

Example: Factor 1

• summarizes 14.58 % of the data (equivalent of 4,128 variant distributions)

• area of tendential co-occurrence of variants

• fuzzy distribution


Factor Analysis Cf. Pröll/Pickl/Spettl (to appear)

Example: Factor 2

• summarizes 12.40 % of the data (equivalent of 3,511 variant distributions)

• area of tendential co-occurrence of variants

• fuzzy distribution


Factor Analysis Cf. Pröll/Pickl/Spettl (to appear)

Example: Factor 10

• summarizes 0.71 % of the data (equivalent of 201 variant distributions)

• area of tendential co-occurrence of variants

• fuzzy, discontinuous distribution


Factor Analysis Cf. Pröll/Pickl/Spettl (to appear)

Example: Factor 11

• summarizes 0.62 % of the data (equivalent of 176 variant distributions)

• area of tendential co-occurrence of variants

• fuzzy distribution
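The “equivalent of … variants” figures on the factor slides above appear to be the stated percentage applied to the 28,315 variants of the full corpus. The short check below reproduces them under that reading; the function name is hypothetical, and the interpretation is inferred from the numbers rather than stated in the slides.

```python
TOTAL_VARIANTS = 28_315  # from the combined factor map slide


def variant_equivalent(share_percent, total=TOTAL_VARIANTS):
    """Number of variants corresponding to a percentage share of the corpus."""
    return round(total * share_percent / 100)


# (label, percentage, equivalent reported on the slides)
reported = [
    ("dominant factors combined", 59.9, 16_961),
    ("Factor 1", 14.58, 4_128),
    ("Factor 2", 12.40, 3_511),
    ("Factor 10", 0.71, 201),
    ("Factor 11", 0.62, 176),
]

for label, percent, slide_value in reported:
    computed = variant_equivalent(percent)
    print(f"{label}: computed {computed}, slide {slide_value}")
```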


Factor Analysis Cf. Pröll/Pickl/Spettl (to appear)

[Figure: Factor 11; catchment area of market town Lauingen]


Factor Analysis Cf. Pröll/Pickl/Spettl (to appear)

• nuanced and detailed account of overall spatial patterns in the data

• useful for

  • a quick overview of major spatial structures

  • a differentiated division into graded, fuzzy dialect areas

  • an exploratory look into recurring spatial structures (even weak ones) that are hitherto unknown


Installing GeoLing

Simple installation:

• download: www.geoling.net

• GeoLing is ready to use after unzipping a single file; no installation is required.

• Sprachatlas von Bayerisch-Schwaben (SBS) is included for demonstration purposes

Live demonstration of GeoLing – some screenshots are supplied at the end of this presentation!


Installing GeoLing

Requirements:

• Java 7 (or higher) must be installed

• a 64-bit Java is recommended on 64-bit operating systems

• processor and memory requirements depend on the database, e.g. number of locations

• SBS database: dual-core CPU and 2 GB RAM recommended
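Whether an installed Java meets the “Java 7 (or higher)” requirement can be read off the version string printed by `java -version`. The helper below is a small sketch with hypothetical function names; it only interprets a version string passed to it and does not call Java itself.

```python
import re


def java_major_version(version_string):
    """Extract the major Java version from strings such as '1.7.0_80' or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized Java version string: {version_string!r}")
    first = int(match.group(1))
    second = match.group(2)
    # Before Java 9, versions were written as 1.x (e.g. 1.7 for Java 7).
    if first == 1 and second is not None:
        return int(second)
    return first


def meets_requirement(version_string, minimum=7):
    """True if the given Java version is at least the required major version."""
    return java_major_version(version_string) >= minimum


for candidate in ["1.6.0_45", "1.7.0_80", "1.8.0_292", "11.0.2", "17"]:
    print(candidate, meets_requirement(candidate))
```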


Performing analyses

Main window of GeoLing:

• maps are hierarchically organized for easy navigation

• individual maps can be investigated directly

• but: most analyses are performed on ‘groups’ of maps

• example: groups in SBS database

  • full corpus

  • lexical sub-corpus

  • phonetic sub-corpus

  • morphological sub-corpus


Performing analyses

With a group of maps, you can

• perform intensity estimations, plot maps to image files, calculate characteristics etc.

• perform factor analyses and cluster analyses

For factor and cluster analysis:

• results are visualized immediately

• results can be saved to CSV or XML files for further processing
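Saved results can then be processed with any scripting language. The slides do not show the layout of GeoLing's CSV export, so the column names below (`location`, `factor`, `value`) and the sample rows are purely hypothetical; the sketch only illustrates the kind of post-processing that becomes possible, here picking the locally dominant factor per location.

```python
import csv
import io

# Hypothetical export layout and invented values, for illustration only.
SAMPLE_EXPORT = """location,factor,value
site_1,1,0.82
site_1,2,0.31
site_2,1,0.12
site_2,2,0.67
"""


def dominant_factor_per_location(csv_text):
    """Return, for each location, the factor with the highest value."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        location = row["location"]
        value = float(row["value"])
        if location not in best or value > best[location][1]:
            best[location] = (row["factor"], value)
    return {location: factor for location, (factor, _) in best.items()}


dominant = dominant_factor_per_location(SAMPLE_EXPORT)
for location in sorted(dominant):
    print(location, "-> factor", dominant[location])
```

For a real export, the column names would need to be adapted to the format described in the GeoLing user guide.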


Importing data

• “Create new database” / “Edit existing database”

• “Database management” dialog:

  • import your own data from simple text files (CSV), whose format is described in the user guide

  • import custom distances between locations

  • export/import database e.g. for backup and exchange of data

  • computation of linguistic distances

  • computation of bandwidths for intensity estimations


Bottom line

GeoLing provides

• several methods for the detection of spatial patterns in geolinguistic data

• easy installation and import of your own data

• open-source license allows modifications and custom extensions

You can start using it on your own data right away!


References

• Clopper, C. G. / Paolillo, J. C. (2006): “North American English Vowels: A Factor-analytic Perspective”. Literary and Linguistic Computing 21/4, 445–462.

• Leinonen, T. (2010): An Acoustic Analysis of Vowel Pronunciation in Swedish Dialects. Groningen: Rijksuniversiteit Groningen.

• Meschenmoser, D. / Pröll, S. (2012): “Using fuzzy clustering to reveal recurring spatial patterns in corpora of dialect maps”. International Journal of Corpus Linguistics 17/2, 176–197.

• Nerbonne, J. (2006): “Identifying linguistic structure in aggregate comparison”. Literary and Linguistic Computing 21/4, 463–475.

• Pickl, S. / Rumpf, J. (2011): “Automatische Strukturanalyse von Sprachkarten. Ein neues statistisches Verfahren”. In: Glaser, E. / Schmidt, J. E. / Frey, N. (eds.): Dynamik des Dialekts – Wandel und Variation. Akten des 3. Kongresses der Internationalen Gesellschaft für Dialektologie des Deutschen (IGDD). Stuttgart: Steiner, 267–285.

• Pickl, S. / Rumpf, J. (2012): “Dialectometric Concepts of Space: Towards a Variant-Based Dialectometry”. In: Hansen, S. / Schwarz, C. / Stoeckle, P. / Streck, T. (eds.): Dialectological and folk dialectological concepts of space. Berlin: Walter de Gruyter, 199–214.

• Pickl, S. / Spettl, A. / Pröll, S. / Elspaß, S. / König, W. / Schmidt, V. (2014): “Linguistic distances in dialectometric intensity estimation”. Journal of Linguistic Geography 2, 25–40.

• Pröll, S. / Pickl, S. / Spettl, A. (to appear): “Latente Strukturen in geolinguistischen Korpora”. In: Elmentaler, M. / Hundt, M. / Schmidt, J. E. (eds.): Deutsche Dialekte. Konzepte, Probleme, Handlungsfelder. Akten des 4. Kongresses der Internationalen Gesellschaft für Dialektologie des Deutschen (IGDD) in Kiel. Stuttgart: Steiner.

• Rumpf, J. / Pickl, S. / Elspaß, S. / König, W. / Schmidt, V. (2009): “Structural analysis of dialect maps using methods from spatial statistics”. Zeitschrift für Dialektologie und Linguistik 76/3, 280–308.

• Rumpf, J. / Pickl, S. / Elspaß, S. / König, W. / Schmidt, V. (2010): “Quantification and statistical analysis of structural similarities in dialectological area-class maps”. Dialectologia et Geolinguistica 18, 73–98.


Appendix: Screenshots

• www.geoling.net

• contents of extracted ZIP archive

• starting GeoLing


www.geoling.net


double-click to start GeoLing


Appendix: Screenshots

• main window after startup

• navigation by hierarchical categories to individual maps

• graded area-class maps by intensity estimation


Appendix: Screenshots

• “groups” for operations/analyses on many maps

• export function to generate e.g. graded area-class maps for all maps of a group

• factor analysis


Appendix: Screenshots

• importing data

• example of file format required


choose file name of database
