workshop on quantitative analytics using interactive on-line tool

53
Introduction Data Preparation LVS Working with Data Visual Analytics Inferential Analysis References Interactive Visual Data Analysis Part One Language Variation Suite Olga Scrivner Indiana University Workshop in Methods 1 / 44

Upload: olga-scrivner

Post on 13-Apr-2017

55 views

Category:

Data & Analytics


2 download

TRANSCRIPT

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Interactive Visual Data AnalysisPart One

Language Variation Suite

Olga Scrivner

Indiana University

Workshop in Methods1 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Outline

1 Introduce a web application for quantitativeanalysis: LVS

2 Develop practical skills

3 Understand and interpret advanced statistical modelsPositionSentence

p < 0.001

1

ind, pre post

Heavinessp = 0.003

2

≤ 1 > 1

Periodp < 0.001

3

≤ 1 > 1

Node 4 (n = 81)

VO

OV

00.20.40.60.81

Node 5 (n = 119)

VO

OV

00.20.40.60.81

Node 6 (n = 181)

VO

OV

00.20.40.60.81

Periodp < 0.001

7

≤ 2 > 2

Node 8 (n = 221)

VO

OV

00.20.40.60.81

Focusp < 0.001

9

cf nf

Node 10 (n = 66)

VO

OV

00.20.40.60.81

Main_Verb_Structurep < 0.001

11

ACIOther, Restructuring

Node 12 (n = 43)

VO

OV

00.20.40.60.81

Node 13 (n = 265)

VO

OV

00.20.40.60.81

2 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

What is LVS?

Language Variation Suite

It is a Shiny web application originally designed for dataanalysis in sociolinguistic research.

It can be used for:

Processing spreadsheet data

Reporting in tables and graphs

Analyzing means, regression, conditional trees ...(and much more)

3 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Background

LVS is built in R using Shiny package:

1 R - a free programming language for statistical computingand graphics

2 Shiny App - a web application framework for R

Computational power of R + Web interactivity

4 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Background

http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/

5 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Workspace

Browser

Chrome, Firefox, Safari - recommendable

Explorer may cause instability issues

Accessibility

PC, Mac, Linux

Data files will be uploaded from any location on yourcomputer

Smart Phone

Data files must be on a cloud platform connected to yourphone account (e.g. dropbox)

6 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Server

Since LVS is hosted on a server, Shiny idle time-out settingsmay stop application when it is left inactive (it will grey out).

Solution: Click reload and re-upload your csv file

7 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Data Preparation

Important things to consider before data entry:

File format:

Comma separated value (CSV) - faster processing

Excel format will slow processing

Column names should not contain spaces

Permitted: non-accented characters, numbers, underscore,hyphen, and period

One column must contain your dependent variable

The rest of the columns contain independent variables

8 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Terminology Review

a. Categorical - non-numerical data with two values

yes - no; male - female

b. Continuous - numerical data

duration, age, year

c. Multinomial - non-numerical data with three or morevalues

regions, nationalities

d. Ordinal - scale: currently not supported

9 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Terminology Review

a. Categorical - non-numerical data with two values

yes - no; male - female

b. Continuous - numerical data

duration, age, year

c. Multinomial - non-numerical data with three or morevalues

regions, nationalities

d. Ordinal - scale: currently not supported

9 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Workshop Files

http://ssrc.indiana.edu/seminars/wim.shtml

1 movie metadata.csvSimplified set from https://www.kaggle.com/deepmatrix/

imdb-5000-movie-dataset

2 LVS web site: https://languagevariationsuite.com

10 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Movie Data

Budget

Director

Actor 1

Director facebook likes

Actor 1 facebook likes

Genre

Year

11 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential Statistics

Modeling, regression, conditional trees, random forest

12 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential Statistics

Modeling, regression, conditional trees, random forest

12 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Upload File

Upload movie metadata.csv

13 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Uploaded Dataset

The data content is imported as a table and allows for sortingcolumns.

14 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Summary

Summary provides a quantitative summary for each variable,e.g. frequency count, mean, median.

15 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Data Structure

1 Total number of observations (rows)

2 Number of variables (columns)

3 Variable types

Factor - categorical values

Num - numeric values (0.95, 1.05)

Int - integer values (1, 2, 3)16 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cross Tabulation

Cross-tabulation examines the relationship between variables.

17 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cross Tabulation Plot

18 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cross Tabulation Plot

18 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Adjusting Browser - Layout

Shiny pages are fluid and reactive.

19 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Adjusting Browser - Layout

20 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential statistics

Modeling, regression, varbrul analysis, conditional trees,random forest

21 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Visual Analytics

Visual Analytics: “The science of analytical reasoningfacilitated by visual interactive interfaces”

(Thomas et al. 2005)

22 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

One Variable Plot

23 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

One Variable Plot

23 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Customizing Plot

24 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Customizing Plot

24 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Saving Plot

25 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cluster Plot

Classification of data into sub-groups is based onpairwise similarities

Groups are clustered in the form of a tree-likedendrogram

26 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cluster Plot

27 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cluster Plot

Group 1 Animation, Biography and Group 2 Action, Drama,Comedy

28 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Inferential Statistics

29 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential statistics

Modeling, regression, varbrul analysis, conditional trees,random forest

30 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

How to Create a Regression Model

Step 1 Modeling - create a model with dependent andindependent variables

Step 2 Regression - specify the type of regression (fixed,mixed) and type of dependent variable (binary,continuous, multinomial)

Step 3 Stepwise Regression - compare models(Log-likelihood, AIC, BIC)

Step 4 Conditional Trees - apply non-parametric teststo the model

31 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Modeling

32 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Regression Types

Model

a.) Fixed effect

b.) Mixed effect - individual speaker/token variation (withingroup)

Type of Dependent Variable

a.) Binary/categorical (only two values)

b.) Continuous (numeric)

c.) Multinomial - categorical with more than two values

33 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Regression

34 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Model Output

35 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Model Output

35 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Interpretation: Budget and Genre

Genre Action is the reference value

Positive coefficient - positive effect

Negative coefficient - negative effect

http://www.free-online-calculator-use.com/scientific-notation-converter.html

36 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Interpretation: Budget and Genre

Genre Action is the reference value

Positive coefficient - positive effect

Negative coefficient - negative effect

http://www.free-online-calculator-use.com/scientific-notation-converter.html

36 / 44

exponential notation:

1.46e-7

0.000000146

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Conditional Tree

Conditional tree: a simple non-parametric regression analysis,commonly used in social and psychological studies

Linear regression: all information is combined linearly

Conditional tree regression: visual splitting to captureinteraction between variables

Recursive splitting (tree branches)37 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Conditional Tree

38 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Conditional Tree

38 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Conditional Tree

1 Genre is the significant factor for budget

2 Budget distribution is split in two groups:

Action and Animation

Biography, Comedy and Drama

3 Budget is significantly higher for Animation and Action

39 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Random Forest

1 Variable importance for predictors

2 Robust technique with small n large p data

3 All predictors considered jointly (allows for inclusion ofcorrelated factors)

40 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Random Forest

Let’s add more factors!

Return to Modeling

Add independent factors: director facebook likes, actor1 facebook likes, title year

41 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Random Forest

Genre is the most important predictor for this model.

Close to zero or red-dotted line (cut off values) - irrelevantfor this model

42 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Random Forest

Genre is the most important predictor for this model.

Close to zero or red-dotted line (cut off values) - irrelevantfor this model

42 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Let’s Have a Short Break

43 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

References I

[1] Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge:Cambridge University Press

[2] Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber RandiReppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: CambridgeUniversity Press

[3] Schnapp, Jeffrey, and Peter Presner. 2009. Digital Humanities Manifesto 2.0.

[4] http://gifsanimados.espaciolatino.com/x bob esponja 8.gif

[5] https://daniellestolt.files.wordpress.com/2013/01/are-you-ready1.gif

[6] http://www.martijnwieling.nl/R/sheets.pdf

44 / 44