Stats 245.3 Introduction to Statistical Methods

Upload: kieu

Post on 05-Jan-2016



Page 1: Stats 245.3

Stats 245.3

Introduction to Statistical Methods

Page 2: Stats 245.3

Instructor: W.H.Laverty

Office: 235 McLean Hall

Phone: 966-6096

Lectures: M W F 11:30am - 12:20pm, Thorv 271

Lab: M 3:30 - 4:20, Thorv 271

Evaluation:

Assignments, Labs, Term tests - 40%

Term Test: every 2nd week (approx)

Final Examination - 60%

Page 3: Stats 245.3

Dates for midterm tests:

1. Monday, Jan 23 (in the lab, 3:30pm)

2. Monday, Feb 06 (in the lab, 3:30pm)

3. Monday, Feb 27 (in the lab, 3:30pm)

4. Monday, Mar 13 (in the lab, 3:30pm)

5. Monday, Mar 27 (in the lab, 3:30pm)

6. Monday, Apr 03 (in the lab, 3:30pm)

Each test and the Final Exam are Open Book

Students are allowed to take in Notes, texts, formula sheets, calculators (No laptop computers.)

The tests and the Final Exam are multiple choice and computer marked – Students need an HB pencil and to identify their paper with their student number.

Page 4: Stats 245.3

Computer Assignments – due dates and time

1. Wednesday, February 8

2. Wednesday, March 8

3. Wednesday, March 22

4. Wednesday, April 5

Computer Assignments

It is important to learn to use at least one of the powerful statistical Packages – SPSS, Minitab, S-plus, SAS, R

Statistical computations very quickly exceed what simple computing devices (hand-held calculators, computer spreadsheets) can feasibly handle.

These assignments are designed to give some initial experience with these packages.

Page 5: Stats 245.3

Computer Assignments will be accepted and given a mark if they are submitted after the due date and time; however, assignments that are submitted late will not be returned.

Page 6: Stats 245.3

Text

• The lectures will be given in PowerPoint

• These will be posted on the Stats 245 website

• Tables that are required will be posted on the Stats 245 website

• A text is not required

• I will post a list of books in the library that can be consulted

Page 7: Stats 245.3

Alternative Texts (Available in Library), listed as Title (Author(s))

1. Statistics: Informed Decisions Using Data (Sullivan)

2. Introductory Statistics (Mann)

3. Modern Elementary Statistics (Freund)

4. Elementary Statistics: A Brief Version (Bluman)

5. Elementary Statistics (Hoel)

6. Statistics: The Exploration and Analysis of Data (Devore and Peck)

7. Statistics - A First Course (Freund)

8. Statistics - A First Course (Saunders, Smit, Adatia & Larson)

9. Basic Statistical Concepts (Bartz)

10. An Introduction to Statistical Methods and Data Analysis (Ott)

11. Introductory Statistics (Wonnacott & Wonnacott)

Page 8: Stats 245.3

To download lectures

1. Go to the Stats 245 website

a) through PAWS, or

b) by going to the website of the Department of Mathematics and Statistics -> people -> faculty -> W.H. Laverty -> Stats 245 -> Lectures.

2. Then

a) select the lecture

b) right-click and choose Save as

Page 9: Stats 245.3

To print lectures

1. Open the lecture using MS PowerPoint

2. Select the menu item File -> Print


Page 10: Stats 245.3

The following dialogue box appears

Page 11: Stats 245.3

In the Print what box, select handouts

Page 12: Stats 245.3

Set Slides per page to 6 or 3.

Page 13: Stats 245.3

6 slides per page will result in the least amount of paper being printed

1 2

3 4

5 6

Page 14: Stats 245.3

3 slides per page leaves room for notes.

1

2

3

Page 15: Stats 245.3

Course Outline

Page 16: Stats 245.3

Introduction

• Populations, samples

• Variables

• Data Collection

Page 17: Stats 245.3

Exploratory Statistics

• Organizing and displaying Data

• Numerical measures of Central Tendency and Variability

• Describing Bivariate Data

Page 18: Stats 245.3

Probability Theory

• Concepts of Probability

• Random variables and their distributions

• Binomial distribution, Normal distribution

Page 19: Stats 245.3

Inferential Statistics

• Estimation, Hypothesis testing

• Comparing Samples

• Analyzing count data

• Regression and Correlation

• Non-parametric Statistics

Page 20: Stats 245.3

End – Lecture 1

Page 21: Stats 245.3

Introduction

Page 22: Stats 245.3

The circular process of research:

Questions arise about a phenomenon

A decision is made to collect data

A decision is made as to how to collect the data

The data is collected

The data is summarized and analyzed

Conclusions are drawn from the analysis

Page 23: Stats 245.3

What is Statistics?

It is the major mathematical tool of scientific inference (research) – with an interest in drawing conclusions from data.

Data that is to some extent corrupted by some component of random variation (random noise)

Page 24: Stats 245.3

Random variation (or random noise) can be defined as the variation in the data that is not accounted for by the factors considered in the analysis.

Page 25: Stats 245.3

Example

Suppose we are collecting data on

• Blood Pressure

• Height

• Weight

• Age

Page 26: Stats 245.3

Suppose we are interested in how

• Blood Pressure

is influenced by the following factors

• Height

• Weight

• Age

Page 27: Stats 245.3

Blood Pressure will not be perfectly predictable from:

• Height

• Weight

• Age

There will be departures (random variation) from a perfect prediction because of other factors that could affect Blood Pressure (diet, exercise, hereditary factors)
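The idea of departures from a perfect prediction can be illustrated with a small simulation. This is only a sketch and is not part of the lecture: the coefficients, the noise level and the function names `predictable_part` and `simulate_blood_pressure` are made-up assumptions chosen for illustration.

```python
import random

def predictable_part(height_cm, weight_kg, age):
    """Part of blood pressure explained by the three factors.
    The coefficients are invented for illustration only."""
    return 90.0 + 0.05 * height_cm + 0.2 * weight_kg + 0.5 * age

def simulate_blood_pressure(height_cm, weight_kg, age, rng, noise_sd=8.0):
    """Predictable part plus random noise.
    The noise stands in for factors not considered (diet, exercise, heredity)."""
    noise = rng.gauss(0.0, noise_sd)
    return predictable_part(height_cm, weight_kg, age) + noise

rng = random.Random(245)  # fixed seed so the run is repeatable

# Five hypothetical people with identical height, weight and age.
predicted = predictable_part(170, 70, 40)
readings = [simulate_blood_pressure(170, 70, 40, rng) for _ in range(5)]
departures = [r - predicted for r in readings]

for r, d in zip(readings, departures):
    print(f"observed {r:6.1f}   departure from prediction {d:+6.1f}")
```

Even though all five simulated people share the same height, weight and age, their readings differ, and the differences are exactly the part of the data not accounted for by the factors in the analysis.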

Page 28: Stats 245.3

Another Example

In this example we are interested in the use of:

1. antidepressants,

2. mood stabilizing medication,

3. anxiety medication,

4. stimulants and

5. sleeping pills.

The data were collected for n = 16383 cases

Page 29: Stats 245.3

In addition we are interested in how the use of these medications is affected by:

1. Age

– 20-29, 30-39, 40-49, 50-59, 60-69, 70+

2. Gender

– Male, female

3. Education

– < Secondary,

– Secondary Grad.,

– some Post-Sec.,

– Post-Sec. Grad.

Page 30: Stats 245.3

4. Income

– Low, Low Mid, Up Mid, High

5. Role

– parent, partner, worker

– parent, partner

– parent, worker

– partner, worker

– worker only

– parent only

– partner only

– no roles

Page 31: Stats 245.3

Some questions of interest

1. How are the dependent variables (antidepressant use, mood stabilizing medication use, anxiety medication use, stimulants use, sleeping pill use) interrelated?

2. How are the dependent variables (drug use) related to the independent variables (age, gender, income, education and role)?
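One simple way to start on the second question is to cross-tabulate a categorical dependent variable against a categorical independent variable. The sketch below uses eight invented records, not the study's data, and the helper name `proportion_using` is hypothetical.

```python
from collections import Counter

# Made-up records of (gender, uses sleeping pills); NOT the study's data.
records = [
    ("male", "yes"), ("male", "no"), ("male", "no"), ("male", "no"),
    ("female", "yes"), ("female", "yes"), ("female", "no"), ("female", "no"),
]

counts = Counter(records)                     # cell counts of the two-way table
row_totals = Counter(g for g, _ in records)   # number of cases per gender

def proportion_using(gender):
    """Proportion of cases of the given gender that report use."""
    return counts[(gender, "yes")] / row_totals[gender]

print("gender   yes  no  proportion using")
for gender in ("male", "female"):
    print(f"{gender:8} {counts[(gender, 'yes')]:3} {counts[(gender, 'no')]:3}"
          f"  {proportion_using(gender):.2f}")
```

Comparing the proportions across the rows gives a first description of how the dependent variable varies with the independent variable. Whether such a difference is larger than random variation would produce is a question for the inferential methods listed in the course outline.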

Page 32: Stats 245.3

• Again the relationships will not be perfect

• Because of the effects of other factors (variables) that have not been considered in the experiment

• If the data is collected again, the patterns observed at the second collection will not be exactly the same as those observed at the first collection

Page 33: Stats 245.3

The data appears in the following Excel file

Drug data

Page 34: Stats 245.3

In Statistics

• Questions

– About some scientific, sociological, medical or economic phenomena

• Data

– The purpose of the data is to find answers to the questions

• Answers

– Because of the random variation in the data (the noise), conclusions based on the data will be subject to error.

Page 35: Stats 245.3

The circular process of research:

Questions arise about a phenomenon

A decision is made to collect data

A decision is made as to how to collect the data

The data is collected

The data is summarized and analyzed

Conclusions are drawn from the analysis

In what part of this process does statistics play a role?

(Diagram labels: Experimental Design, Statistics, Statistics)

Page 36: Stats 245.3

Statistical Theory is interested in

1. The design of the data collection procedures. (Experimental designs, Survey designs). The experiment can be totally lost if it is not designed correctly.

2. The techniques for analyzing the data.

Page 37: Stats 245.3

In any statistical analysis it is important to assess the magnitude of the error made by the conclusions of the analysis.

Page 38: Stats 245.3

Consider the following statement:

You can prove anything with Statistics.

Page 39: Stats 245.3

In fact:

One is unable to “prove” anything with Statistics.

Page 40: Stats 245.3

At the end of any statistical analysis there always is a possibility of an error in any of the decisions that it makes.

Page 41: Stats 245.3

The success of a research project does not depend on its conclusions

The success of a research project depends on the accuracy of its conclusions

Page 42: Stats 245.3

If one is testing the effectiveness of a drug

There are two possible conclusions:

1. The drug is effective

2. The drug is not effective

Page 43: Stats 245.3

The success of this project does not depend on its conclusions

The success depends on the accuracy of its conclusions

Page 44: Stats 245.3

For this reason:

It is extremely important in any study to assess the accuracy of its conclusions

Page 45: Stats 245.3

Some definitions important to Statistics

Page 46: Stats 245.3

A population:

This is the complete collection of subjects (objects) that are of interest in the study.

There may be (and frequently is) more than one population, in which case a major objective is comparison.

Page 47: Stats 245.3

A case (elementary sampling unit):

This is an individual unit (subject) of the population.

Page 48: Stats 245.3

A variable:

a measurement or type of measurement that is made on each individual case in the population.

Page 49: Stats 245.3

Types of variables

Some variables may be measured on a numerical scale while others are measured on a categorical scale.

The nature of the variables has a great influence on which analysis will be used.

Page 50: Stats 245.3

For Variables measured on a numerical scale the measurements will be numbers.

Ex: Age, Weight, Systolic Blood Pressure

For Variables measured on a categorical scale the measurements will be categories.

Ex: Sex, Religion, Heart Disease

Page 51: Stats 245.3

Note

Sometimes variables can be measured on both a numerical scale and a categorical scale.

In fact, variables measured on a numerical scale can always be converted to measurements on a categorical scale.
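The conversion from a numerical to a categorical scale can be sketched as a simple binning function. The sketch below reuses the age groups from the medication example (20-29, 30-39, ..., 70+). The function name `age_group` and the "under 20" category are assumptions added for illustration.

```python
def age_group(age):
    """Map a numerical age (in years) to an age category."""
    if age < 20:
        return "under 20"   # assumption: the listed groups start at 20
    if age >= 70:
        return "70+"
    lower = (int(age) // 10) * 10   # 20, 30, 40, 50 or 60
    return f"{lower}-{lower + 9}"

ages = [23, 37, 41, 59, 64, 70, 88]
print([age_group(a) for a in ages])
```

The conversion only works in one direction: once an age has been recorded as "30-39", the exact age cannot be recovered from the category.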

Page 52: Stats 245.3

Example

The following variables were evaluated for a study of individuals receiving head injuries in Saskatchewan.

1. Cause of the injury (categorical)

• Motor vehicle accident

• Fall

• Violence

• Other

Page 53: Stats 245.3

2. Time of year (date) (numerical or categorical)

• summer

• fall

• winter

• spring

3. Sex of injured individual (categorical)

• male

• female

Page 54: Stats 245.3

4. Age (numerical or categorical)

• < 10

• 10-19

• 20-29

• 30-49

• 50-65

• 65+

5. Mortality (categorical)

• Died from injury

• Alive

Page 55: Stats 245.3

Types of variables

In addition some variables are labeled as dependent variables and some variables are labeled as independent variables.

Page 56: Stats 245.3

This usually depends on the objectives of the analysis.

Dependent variables are output or response variables while the independent variables are the input variables or factors.

Page 57: Stats 245.3

Usually one is interested in determining equations that describe how the dependent variables are affected by the independent variables

Page 58: Stats 245.3

Example

Suppose we are collecting data on

• Blood Pressure

• Height

• Weight

• Age

Page 59: Stats 245.3

Suppose we are interested in how

• Blood Pressure

is influenced by the following factors

• Height

• Weight

• Age

Page 60: Stats 245.3

Then

• Blood Pressure

is the dependent variable

and

• Height

• Weight

• Age

are the independent variables
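As a minimal sketch of an equation that describes how a dependent variable is affected by an independent variable, the code below fits a straight line by least squares. The choice of least squares with a single predictor (Age), the function name `least_squares_line` and the five data pairs are all assumptions made for illustration; they are not taken from the lecture.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up (age, blood pressure) pairs, for illustration only.
ages = [25, 35, 45, 55, 65]
blood_pressure = [118, 121, 127, 131, 138]

intercept, slope = least_squares_line(ages, blood_pressure)
print(f"Blood Pressure = {intercept:.1f} + {slope:.2f} * Age")

# Departures of the observations from the fitted line (the random variation).
residuals = [bp - (intercept + slope * a) for a, bp in zip(ages, blood_pressure)]
print(residuals)
```

Blood Pressure (the dependent variable) is on the left of the equation and Age (an independent variable) is on the right. Including Height and Weight as well would require a multiple regression, which this sketch does not attempt.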

Page 61: Stats 245.3

Example – Head Injury study

Suppose we are interested in how

• Mortality

is influenced by the following factors

• Cause of head injury

• Time of year

• Sex

• Age

Page 62: Stats 245.3

Then

• Mortality

is the dependent variable

and

• Cause of head injury

• Time of year

• Sex

• Age

are the independent variables

Page 63: Stats 245.3

Dependent variable: response variable

Independent variable: predictor variable

Page 64: Stats 245.3

A population:

This is the complete collection of subjects (objects) that are of interest in the study.

There may be (and frequently is) more than one population, in which case a major objective is comparison.

Page 65: Stats 245.3

A case (elementary sampling unit):

This is an individual unit (subject) of the population.

Page 66: Stats 245.3

A variable: a measurement or type of measurement that is made on each individual case in the population.

Some variables may be measured on a numerical scale while others are measured on a categorical scale.

Some variables are labeled as dependent variables and others are labeled as independent variables.

Page 67: Stats 245.3

Dependent

Dependent variables are output or response variables while the independent variables are the input variables or factors.

Independent

Page 68: Stats 245.3

A sample:

Is a subset of the population

Page 69: Stats 245.3

In statistics:

One draws conclusions about the population based on data collected from a sample

Page 70: Stats 245.3

Reasons:

Cost

It is less costly to collect data from a sample than from the entire population

Accuracy

Page 71: Stats 245.3

Accuracy

Data from a sample sometimes leads to more accurate conclusions than data from the entire population

Costs saved from using a sample can be directed to obtaining more accurate observations on each case in the sample

Page 72: Stats 245.3

Types of Samples

Different types of samples are determined by how the sample is selected.

Page 73: Stats 245.3

Convenience Samples

In a convenience sample the subjects that are most convenient to the researcher are selected as objects in the sample.

This is not a very good procedure for inferential Statistical Analysis but is useful for exploratory preliminary work.

Page 74: Stats 245.3

Quota samples

In quota samples subjects are chosen conveniently until quotas are met for different subgroups of the population.

This also is useful for exploratory preliminary work.
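The quota procedure can be sketched as a loop that accepts subjects in the order they happen to be encountered until every subgroup's quota is filled. The function name `quota_sample`, the subject labels and the quotas are hypothetical.

```python
def quota_sample(subjects, quotas):
    """Take subjects in the order encountered until each subgroup's quota is met.

    subjects: list of (name, subgroup) pairs in order of convenience
    quotas:   dict mapping subgroup -> number of subjects wanted
    """
    remaining = dict(quotas)
    chosen = []
    for name, group in subjects:
        if remaining.get(group, 0) > 0:
            chosen.append((name, group))
            remaining[group] -= 1
        if all(v == 0 for v in remaining.values()):
            break   # every quota is filled
    return chosen

# Hypothetical subjects, listed in the order the researcher meets them.
subjects = [("A", "male"), ("B", "male"), ("C", "male"), ("D", "female"),
            ("E", "male"), ("F", "female"), ("G", "female")]

sample = quota_sample(subjects, {"male": 2, "female": 2})
print(sample)
```

Subjects C and E are skipped because the male quota is already full, and G is never reached. No chance mechanism is involved, so which subjects end up in the sample depends entirely on the order of convenience.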

Page 75: Stats 245.3

Random Samples

Random samples of a given size are selected in such a way that all possible samples of that size have the same probability of being selected.
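The definition can be checked empirically on a tiny population. The sketch below is an illustration under assumed values (a population of 5 labelled cases, samples of size 2, 20000 repetitions). It lists all possible samples and then draws many random samples to see how often each one occurs.

```python
import itertools
import random
from collections import Counter

population = ["A", "B", "C", "D", "E"]   # hypothetical cases
n = 2                                     # sample size

# All possible samples of size 2: there are 10 of them.
all_samples = list(itertools.combinations(population, n))
print(len(all_samples), "possible samples")

rng = random.Random(1)   # fixed seed so the run is repeatable
draws = 20000
counts = Counter(
    tuple(sorted(rng.sample(population, n)))   # draw without replacement
    for _ in range(draws)
)

# Each possible sample should turn up in roughly 1/10 of the draws.
for s in all_samples:
    print(s, round(counts[s] / draws, 3))
```

Each of the 10 possible samples appears with relative frequency close to 0.1, which is what "the same probability of being selected" means for this population and sample size.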

Page 76: Stats 245.3

Convenience Samples and Quota samples are useful for preliminary studies. It is however difficult to assess the accuracy of estimates based on these types of sampling scheme.

Sometimes however one has to be satisfied with a convenience sample and assume that it is equivalent to a random sampling procedure.

Page 77: Stats 245.3

(Diagram labels: Population, Sample, Case (×), Variables X, Y, Z)

Page 78: Stats 245.3

Some other definitions

Page 79: Stats 245.3

A population statistic (parameter):

Any quantity computed from the values of variables for the entire population.

Page 80: Stats 245.3

A sample statistic:

Any quantity computed from the values of variables for the cases in the sample.

Page 81: Stats 245.3

Since only cases from the sample are observed

– only sample statistics are computed

– These are used to make inferences about population statistics

– It is important to be able to assess the accuracy of these inferences
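The distinction between a population statistic and a sample statistic, and the need to assess accuracy, can be sketched with a simulated population. Everything below is an assumption made for illustration: the population of 10000 values, the sample size of 50 and the use of the mean as the statistic. In practice the population statistic is not known, which is why it has to be inferred.

```python
import random
import statistics

rng = random.Random(82)   # fixed seed so the run is repeatable

# A hypothetical population of 10000 cases with one numerical variable.
population = [rng.gauss(170, 10) for _ in range(10000)]
parameter = statistics.mean(population)   # population statistic (parameter)

# One random sample of 50 cases and its sample statistic.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

print(f"population mean {parameter:.2f}")
print(f"sample mean     {statistic:.2f}")

# Repeating the sampling shows how much the sample mean varies from sample
# to sample, which is one way to describe the accuracy of the inference.
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(1000)]
spread = statistics.stdev(sample_means)
print(f"typical distance of a sample mean from the population mean: about {spread:.2f}")
```

The sample mean is close to, but not equal to, the population mean, and the spread of the repeated sample means indicates how large the error of a single sample is likely to be.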

Page 82: Stats 245.3

Organizing Data

the next topic