optimizing (socio-)linguistic analysis: language variation

79
Introduction Data Preparation Language Variation Suite Working with Data Visual Analytics Inferential Analysis Mixed Effects Appendix References Optimizing (Socio-)linguistic Analysis: Language Variation Suite Toolkit Dr. Olga Scrivner Research Scientist, CNS, SICE, IU Corporate Faculty, Data Analytics Graduate Program, HU CEWIT Faculty Fellow April 12, 2018 1 / 72

Upload: others

Post on 12-Apr-2022

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Optimizing (Socio-)linguistic Analysis:Language Variation Suite Toolkit

Dr. Olga Scrivner

Research Scientist, CNS, SICE, IUCorporate Faculty, Data Analytics Graduate Program, HU

CEWIT Faculty Fellow

April 12, 20181 / 72

Page 2: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Goal

Provide researchers with a variety of quantitative methods toadvance language variation studies.

PositionSentencep < 0.001

1

ind, pre post

Heavinessp = 0.003

2

≤ 1 > 1

Periodp < 0.001

3

≤ 1 > 1

Node 4 (n = 81)

VO

OV

00.20.40.60.81

Node 5 (n = 119)

VO

OV

00.20.40.60.81

Node 6 (n = 181)

VO

OV

00.20.40.60.81

Periodp < 0.001

7

≤ 2 > 2

Node 8 (n = 221)

VO

OV

00.20.40.60.81

Focusp < 0.001

9

cf nf

Node 10 (n = 66)

VO

OV

00.20.40.60.81

Main_Verb_Structurep < 0.001

11

ACIOther, Restructuring

Node 12 (n = 43)

VO

OV

00.20.40.60.81

Node 13 (n = 265)

VO

OV

00.20.40.60.81

2 / 72

Page 3: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Objectives

1 Introduce a novel (socio)linguistic toolkit

2 Develop practical skills

3 Understand and interpret advanced statistical models

3 / 72

Page 4: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

What is LVS?

Language Variation Suite

It is a Shiny web application designed for data analysis insociolinguistic research.

It can be used for:

Processing spreadsheet data

Reporting in tables and graphs

Analyzing means, regression, conditional trees ...(and much more)

4 / 72

Page 5: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Background

LVS is built in R using Shiny package:

1 R - a free programming language for statistical computingand graphics

2 Shiny App - a web application framework for R

Computational power of R + Web interactivity

5 / 72

Page 6: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Background

http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/

6 / 72

Page 7: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Data Preparation

Important things to consider before data entry:

File format:

Comma separated value (CSV) - faster processing

Excel format will slow processing

Column names should not contain spaces

Permitted: non-accented characters, numbers, underscore,hyphen, and period

One column must contain your dependent variable

The rest of the columns contain independent variables

7 / 72

Page 8: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Workspace

Browser

Chrome, Firefox, Safari - recommendable

Explorer may cause instability issues

Accessibility

PC, Mac, Linux

Data files will be uploaded from any location on yourcomputer

Smart Phone

Data files must be on a cloud platform connected to yourphone account (e.g. dropbox)

8 / 72

Page 9: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Server

Since LVS is hosted on a server, Shiny idle time-out settingsmay stop application when it is left inactive (it will grey out).

Solution: Click reload and re-upload your csv file

9 / 72

Page 10: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Terminology Review

a. Categorical - non-numerical data with two values

yes - no; deletion - retention; perfective - imperfective

b. Continuous - numerical data

duration, age, chronological period

c. Multinomial - non-numerical data with three or morevalues

deletion - aspiration - retention

d. Ordinal - scale: currently not supported

10 / 72

Page 11: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Terminology Review

a. Categorical - non-numerical data with two values

yes - no; deletion - retention; perfective - imperfective

b. Continuous - numerical data

duration, age, chronological period

c. Multinomial - non-numerical data with three or morevalues

deletion - aspiration - retention

d. Ordinal - scale: currently not supported

10 / 72

Page 12: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Workshop Files

https://languagevariationsuite.wordpress.com/

1 categoricaldata.csv: categorical dependent - Labov NewYork 1966 study

2 continuousdata.csv: continuous dependent - Intervocalic/d/ in Caracas corpus (Dıaz-Campos et al.)

3 LVS web site: www.languagevariationsuite.com

11 / 72

Page 13: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential Statistics

Modeling, regression, conditional trees, random forest

12 / 72

Page 14: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential Statistics

Modeling, regression, conditional trees, random forest

12 / 72

Page 15: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Upload File

Upload movie metadata.csv

13 / 72

Page 16: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Excel Format

1 Slow processing

2 Requires the name of your excel sheet

14 / 72

Page 17: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Save Excel as CSV Format

To optimize speed - Save as CSV prior upload

15 / 72

Page 18: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Upload File

Upload categoricaldata.csv

16 / 72

Page 19: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Uploaded Dataset

The data content is imported as a table and allows for sortingcolumns.

17 / 72

Page 20: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Summary

Summary provides a quantitative summary for each variable,e.g. frequency count, mean, median.

18 / 72

Page 21: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Data Structure

1 Total number of observations (rows)

2 Number of variables (columns)

3 Variable types

Factor - categorical values

Num - numeric values (0.95, 1.05)

Int - integer values (1, 2, 3)19 / 72

Page 22: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Cross Tabulation

Cross-tabulation examines the relationship between variables.

20 / 72

Page 23: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Cross-Tabulation: One Dependent and OneIndependent Variables

21 / 72

Page 24: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Cross-Tabulation Output

Raw frequency / Proportion by column / Proportion across row

22 / 72

Page 25: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential statistics

Modeling, regression, varbrul analysis, conditional trees,random forest

23 / 72

Page 26: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Adjusting Browser - Layout

Shiny pages are fluid and reactive.

24 / 72

Page 27: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

One Variable Plot

25 / 72

Page 28: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Two Variables Plot

26 / 72

Page 29: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Saving Plot

27 / 72

Page 30: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Three Variables Plot

28 / 72

Page 31: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Cluster Plot

Classification of data into sub-groups is based onpairwise similarities

Groups are clustered in the form of a tree-likedendrogram

29 / 72

Page 32: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Cluster Plot

30 / 72

Page 33: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Cluster Plot

Saks (upper middle-class store), Macy’s (middle-class store), Kleins(working-class)

31 / 72

Page 34: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Inferential Statistics

32 / 72

Page 35: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential statistics

Modeling, regression, varbrul analysis, conditional trees,random forest

33 / 72

Page 36: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

How to Create a Regression Model

Step 1 Modeling - create a model with dependent andindependent variables

Step 2 Regression - specify the type of regression (fixed,mixed) and type of dependent variable (binary,continuous, multinomial)

Step 3 Stepwise Regression - compare models(Log-likelihood, AIC, BIC)

Step 4 Conditional Trees - apply non-parametric teststo the model

34 / 72

Page 37: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Modeling

35 / 72

Page 38: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Modeling

35 / 72

We are interested in RETENTION= Application

Page 39: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Regression Types

Model

a.) Fixed effect

b.) Mixed effect - individual speaker/token variation (withingroup)

Type of Dependent Variable

a.) Binary/categorical (only two values)

b.) Continuous (numeric)

c.) Multinomial - categorical with more than two values

36 / 72

Page 40: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Regression

37 / 72

Page 41: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Model Output

38 / 72

Page 42: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Model Output

38 / 72

Page 43: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Interpretation

Deletion is the reference value

Positive coefficient - positive effect

Negative coefficient - negative effect

39 / 72

Page 44: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Interpretation

Lexical item Fourth has a negative effect on retentioncompared to Floor and is significant

Normal style has a slightly negative effect on retention but itscoefficient is not significant

Macy’s and Saks have a positive and significant effect onretention. Saks (upper middle class store) is more significantthan Macy’s (middle class store)

http://www.free-online-calculator-use.com/scientific-notation-converter.html40 / 72

Page 45: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Interpretation

Lexical item Fourth has a negative effect on retentioncompared to Floor and is significant

Normal style has a slightly negative effect on retention but itscoefficient is not significant

Macy’s and Saks have a positive and significant effect onretention. Saks (upper middle class store) is more significantthan Macy’s (middle class store)

http://www.free-online-calculator-use.com/scientific-notation-converter.html40 / 72

exponential notation:

1.48e-8

0.0000000148

Page 46: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Conditional Tree

Conditional tree: a simple non-parametric regression analysis,commonly used in social and psychological studies

Linear regression: all information is combined linearly

Conditional tree regression: visual splitting to captureinteraction between variables

Recursive splitting (tree branches)41 / 72

Page 47: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Conditional Tree - Tagliamonte and Baayen 2012

1 The distribution of was/were is split in two groups byindividuals.

2 The variant were occurs significantly more frequently with thefirst group.

42 / 72

Page 48: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Conditional Tree - Tagliamonte and Baayen (2012)

1 Polarity is relevant to the second group of individuals.

2 The variant were occurs significantly more often with negativepolarity

43 / 72

Page 49: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Conditional Tree - Tagliamonte and Baayen (2012)

1 Affirmative Polarity is conditioned by Age.

2 The variant was is produced significantly more often byIndividuals of 46 and younger.

44 / 72

Page 50: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Conditional Tree

45 / 72

Page 51: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Conditional Tree

1 Store is the most significant factor for R-use

Kleins (working class store) - more R-deletion

2 R-use in Macy’s and Saks is conditioned by lexical item:

Floor shows more R-retention than Fourth

3 Style is not significant46 / 72

Page 52: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Random Forest

1 Variable importance for predictors

2 Robust technique with small n large p data

3 All predictors considered jointly (allows for inclusion ofcorrelated factors)

47 / 72

Page 53: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Random Forest

48 / 72

Page 54: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Random Forest

1 Store is the most important predictor

2 Lexical Item is the second predictor

3 Style is irrelevant: close to zero and red dotted line (cut-offvalue).

49 / 72

Page 55: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Fixed and Mixed Models

Fixed Effects Model : All predictors are treated independent.

Underlying assumption - no group-internalvariation between speakers or tokens

Mixed Effects Model : Allows for evaluation of individual- andgroup-level variation

50 / 72

Page 56: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Fixed and Mixed Models

Fixed Regression Model - ignoring individual variations(speakers or words) may lead to Type I Error:“a chance effect is mistaken for a real differencebetween the populations”

Mixed Regression Model - prone to Type II Error:“if speaker variation is at a high level, we cannotdiscern small population effects without a largenumber of speakers” (Johnson 2009, 2015)

51 / 72

Page 57: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Mixed Effect Regression

Mixed Model = fixed effects + random effects

Fixed-effect factor - “repeatable and a small number of levels”

Random-effect factor - “a non-repeatable random sample froma larger population” (Wieling 2012)

walk, sleep, study, finish, eat, etc

event verb, stative verb

speaker1, speaker3, speaker3, etc

male, female

52 / 72

Page 58: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Mixed Effect Regression

Mixed Model = fixed effects + random effects

Fixed-effect factor - “repeatable and a small number of levels”

Random-effect factor - “a non-repeatable random samplefrom a larger population” (Wieling 2012)

walk, sleep, study, finish, eat, etc

event verb, stative verb

speaker1, speaker3, speaker3, etc

male, female

52 / 72

Page 59: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Preparing for Mixed Model

1 Download continuousdata.csv

2 Upload this file on LVS

53 / 72

Page 60: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Data - Uploaded Dataset

54 / 72

Page 61: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Mixed Effect Modeling

55 / 72

NULL when the dependent variable is continuous

Fixed Effects - independent variables

Page 62: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Mixed Effect Modeling

56 / 72

Mixed Effects - group-internal variation

Page 63: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Regression Results

57 / 72

Page 64: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Regression Results

57 / 72

Page 65: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Interpretation - Random Effects

1 Standard Deviation: a measure of the variability for eachrandom effect (speakers and tokens)

2 Residual: random variation that is not due to speakers ortokens (residual error)

58 / 72

Page 66: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Interpretation - Fixed Effects

1 Estimate/coefficient: reported in log-odds (negative orpositive)

2 P-value: tells you if the level is significant

59 / 72

Page 67: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Bonus - Word Clouds

60 / 72

Page 68: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Frequency Plot

61 / 72

Page 69: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Frequency Plot

62 / 72

Page 70: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Appendix 1: Density

63 / 72

Number of bins can have adisproportionate effect on visualization

Page 71: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Histogram

Density: a non-parametric model of the distribution of points basedon a smooth density estimate

http://scikit-learn.org/stable/modules/density.html

64 / 72

Page 72: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Appenix 2 - Data Modification

65 / 72

Page 73: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Adjust Data

Retain: Select data subset

Exclude: Exclude variables from a factor group

Recode: Combine and rename variables

Change class: Numeric → factor; factor → numeric

Transform: Apply log transformation to a specific column

ADJUSTED DATASET:

Run - to apply all above changes

Reset - to reset to the original dataset

66 / 72

Page 74: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Exclude: Emphatic Style

67 / 72

Page 75: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Adjusted Dataset

68 / 72

Page 76: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Adjusting Dataset

To revert to the original data, select RESET:

69 / 72

Page 77: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Appendix 3 - Model Comparison

70 / 72

Page 78: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

Thank you!

[email protected]

What features/analyses would you like to see in LVS?

71 / 72

Page 79: Optimizing (Socio-)linguistic Analysis: Language Variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

Mixed Effects

Appendix

References

References I

[1] Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge:Cambridge University Press

[2] Bentivoglio, Paola and Mercedes Sedano. 1993. Investigacion sociolinguıstica: sus metodos aplicados auna experiencia venezolana. Boletın de Linguıstica 8. 3-35

[3] Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber RandiReppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: CambridgeUniversity Press

[4] Labov, W. 1966. The Social Stratification of English in New York City. Washington: Center for AppliedLinguistics

[5] Schnapp, Jeffrey, and Peter Presner. 2009. Digital Humanities Manifesto 2.0.

[6] http://gifsanimados.espaciolatino.com/x bob esponja 8.gif

[7] https://daniellestolt.files.wordpress.com/2013/01/are-you-ready1.gif

[8] http://www.martijnwieling.nl/R/sheets.pdf

72 / 72