Topics, Summer 2008
Day 1. Introduction
• models and the world (probability & frequency)
• types of data (nominal, count, interval, ratio)
• some R basics (read.table, barplot, hist, etc.)
Day 2. Central limit theorem, sampling, evaluating differences between samples (& between populations)
Day 3. Evaluating relationships – scatterplots, correlation, Principal Components Analysis
Day 4. Regression and Analysis of Variance
Day 5. Logistic regression – log odds, maximum likelihood, relationship to GoldVarb
Goals
• Understand and appreciate “four main goals of quantitative analysis” (Johnson, 2008, p. 3):
1. data reduction, summary
2. inference (generalization to larger population)
3. discovery of (potentially causal) relationships
4. exploration of processes that may have a basis in probability
• Also see the “justification for course” written for the grant proposal to the National Science Foundation for support of the mini-Institute
• Ancillary goal: assuage any fear of tools such as R
What is a model?
• “a simplified description, especially a mathematical one, of a system or process, to assist calculations and predictions” (Concise Oxford English Dictionary, 11th edition)
• “A model is any simplification, substitute or stand-in for what you are actually studying or trying to predict. Models are used because they are convenient substitutes, the way that a recipe is a convenient aid in cooking.” (Craig M. Pease & James J. Bull)
• Types (Pease & Bull): abstract, physical, sampling
• Goal: acquire some tools for using sampling models and relating them to abstract models
Data and models
• Each datum is, at some level, a model
• Relationship of data to larger model – via sequence of pivotal questions:
• What is the question that I’m trying to answer?
• What are the relevant assumptions in the linguistic model(s) that I am adopting that provide the context for this question?
• What are the simplifying assumptions about the world that provide the observational “instrument”?
• What are the mathematical models that let me relate the observations to the linguistic model(s)?
Types of variable
1. nominal - unordered, named
2. ordinal - ordered, named
3. interval - measured on a scale without an absolute zero
4. ratio - measured on a scale with an absolute zero
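In R, these four types map roughly onto unordered factors, ordered factors, and numeric vectors. The following is a minimal sketch; the variable names and all values are made up for illustration and do not come from the course materials.

```r
# Nominal: unordered, named categories -> unordered factor
realization <- factor(c("NP", "PP", "NP", "NP"))

# Ordinal: ordered, named categories -> ordered factor
rating <- factor(c("low", "high", "mid", "low"),
                 levels = c("low", "mid", "high"), ordered = TRUE)

# Interval: numeric scale without an absolute zero
# (hypothetical temperatures in degrees Celsius)
temp_c <- c(10, 20, 15, 5)

# Ratio: numeric scale with an absolute zero
# (hypothetical durations in milliseconds)
dur_ms <- c(120, 240, 180, 60)

is.ordered(realization)   # FALSE: the levels have no inherent order
is.ordered(rating)        # TRUE
rating[1] < rating[2]     # TRUE: "low" < "high" is meaningful for ordinal data
dur_ms[2] / dur_ms[1]     # 2: "twice as long" is meaningful on a ratio scale
```

R itself does not distinguish interval from ratio scales: both are plain numeric vectors, so it is up to the analyst to remember that a statement like "twice as much" is only interpretable for the ratio case.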
Distributions for nominal variables
• Counts: how many Xs do I have?
• Proportions: how many Xs do I have out of the total number of observations?
Example:
• How many of the clauses tagged in the Switchboard portion of the Bresnan et al. (2007) dataset show the PP realization of the recipient?
• What proportion of the Switchboard observations …?
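Counts and proportions of this kind can be computed in R with table() and prop.table(). The sketch below runs on a tiny invented data frame: the column names `corpus` and `realization` and every value in it are hypothetical stand-ins, not the actual Bresnan et al. (2007) data.

```r
# Toy data: NOT the real dataset; all values invented for illustration only
d <- data.frame(
  corpus      = c("switchboard", "switchboard", "switchboard",
                  "switchboard", "wsj", "wsj"),
  realization = c("NP", "PP", "NP", "NP", "PP", "NP")
)

# Keep only the (hypothetical) Switchboard rows
swbd <- d[d$corpus == "switchboard", ]

# Counts: how many observations of each realization?
counts <- table(swbd$realization)
counts

# Proportions: each count divided by the total number of observations
props <- prop.table(counts)
props
```

With this toy input the counts are 3 NP and 1 PP, so the proportions are 0.75 and 0.25. Passing `counts` to barplot() would draw the corresponding bar chart.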