


Topics, Summer 2008

Day 1. Introduction
• models and the world (probability & frequency)
• types of data (nominal, count, interval, ratio)
• some R basics (read.table, barplot, hist, etc.)

Day 2. Central limit theorem, sampling, evaluating differences between samples (& between populations)

Day 3. Evaluating relationships – scatterplots, correlation, Principal Components Analysis

Day 4. Regression and Analysis of Variance

Day 5. Logistic regression – log odds, maximum likelihood, relationship to GoldVarb


Goals

• Understand and appreciate “four main goals of quantitative analysis” (Johnson, 2008, p. 3):

1. data reduction, summary

2. inference (generalization to larger population)

3. discovery of (potentially causal) relationships

4. exploration of processes that may have a basis in probability

• Also see the “justification for course” written for the grant proposal to the National Science Foundation for support of the mini-Institute

• Ancillary goal: assuage any fear of tools such as R


What is a model?

• “a simplified description, especially a mathematical one, of a system or process, to assist calculations and predictions” (Concise Oxford English Dictionary, 11th edition)

• “A model is any simplification, substitute or stand-in for what you are actually studying or trying to predict. Models are used because they are convenient substitutes, the way that a recipe is a convenient aid in cooking.” (Craig M. Pease & James J. Bull)

• Types (Pease & Bull): abstract, physical, sampling

• Goal: acquire some tools for using sampling models and relating them to abstract models


Data and models

• Each datum is, at some level, a model

• Relationship of data to larger model – via sequence of pivotal questions:

  • What is the question that I’m trying to answer?

  • What are the relevant assumptions in the linguistic model(s) that I am adopting that provide the context for this question?

  • What are the simplifying assumptions about the world that provide the observational “instrument”?

  • What are the mathematical models that let me relate the observations to the linguistic model(s)?


Types of variable

1. nominal - unordered, named

2. ordinal - ordered, named

3. interval - measured on a scale without an absolute zero

4. ratio - measured on a scale with an absolute zero
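
As a rough illustration of how these four types might be represented in R, the sketch below uses invented toy values; the variable names (dialect, rating, temp_c, dur_ms) are hypothetical and do not come from the course materials.

```r
# Toy values, invented purely for illustration (not course data).

# 1. nominal: unordered, named categories -> plain factor
dialect <- factor(c("north", "south", "south", "west"))

# 2. ordinal: ordered, named categories -> ordered factor
rating <- factor(c("low", "high", "mid", "low"),
                 levels = c("low", "mid", "high"), ordered = TRUE)

# 3. interval: numeric, but zero is an arbitrary point (degrees Celsius)
temp_c <- c(18.5, 21.0, 19.2, 23.4)

# 4. ratio: numeric with an absolute zero (duration in milliseconds)
dur_ms <- c(120, 85, 240, 60)

levels(dialect)          # listed alphabetically, but the order carries no meaning
rating[1] < rating[2]    # comparisons are defined for ordered factors
diff(temp_c)             # differences are meaningful on an interval scale
dur_ms[3] / dur_ms[1]    # ratios are meaningful only on a ratio scale
```

Storing nominal and ordinal variables as factors rather than character strings or numbers lets R treat them as categories in functions such as table and barplot.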


Distributions for nominal variables

• Counts
  how many Xs do I have?

• Proportions
  how many Xs do I have out of the total number of observations?

Example:

• How many of the clauses tagged in the Switchboard portion of the Bresnan et al. (2007) dataset show the PP realization of the recipient?

• What proportion of the Switchboard observations …?
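
Counts and proportions for a nominal variable can be sketched in R with table and prop.table. The vector below is a made-up toy sample of eight clauses, not the Bresnan et al. (2007) data, and the name realization is a hypothetical stand-in for whatever column holds the recipient realization.

```r
# Toy vector, invented for illustration: realization of the recipient
# ("NP" or "PP") in eight hypothetical clauses. These are NOT values
# from the Bresnan et al. (2007) dataset.
realization <- factor(c("NP", "PP", "NP", "NP", "PP", "NP", "NP", "NP"))

# Counts: how many observations fall in each category?
counts <- table(realization)

# Proportions: each count divided by the total number of observations
props <- prop.table(counts)

counts["PP"]   # number of PP realizations in the toy sample
props["PP"]    # PP realizations as a share of all eight clauses
```

With real data read in via read.table, the same two calls would apply to the relevant column, and barplot(counts) would draw the counts as a bar chart.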