9/15/2015330 lecture 11 stats 330: lecture 1. 9/15/2015330 lecture 12 today’s agenda: introductory...

23
06/23/22 330 Lecture 1 1 STATS 330: Lecture 1

Upload: joel-lamb

Post on 16-Jan-2016

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 1

STATS 330: Lecture 1

Page 2: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 2

Today’s agenda:• Introductory Comments:

– Housekeeping– Computer details– Plan of the course

• Statistical Modelling: an overview– Our Analysis strategy

– Goals for the course

• Role of Graphics• Data cleaning

Page 3: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 3

HousekeepingContact details….

Plus much else on course web page

www.stat.auckland.ac.nz/~lee/330/

Or via Cecil

Page 4: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 4

Page 5: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 5

Assignments• There will be five assignments

• (20% of total grade)

• The due dates are listed in the course summary

• Assignment 1 is due on August 2.

• No paper will be issued in class – download assignments and data from the Web or Cecil

Page 6: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 6

Tutorials• These will cover computing details

• Held in basement tutorial lab, 303S Extension (Rm 303S-B75)– Tutorial 1: 11-12 Wed– Tutorial 2: 10-11 Fri– Tutorial 3: 2-3 Fri

• Start second week of semester

Page 7: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 7

Computer details• All analyses will be done using R• I require homework to be typed, use Word, cut

and paste R text and graphics into Word• Use home/laptop computer or basement lab

(303S)• See web page for info on downloading R330

package• URL is www.stat.auckland.ac.nz/~lee/330

Page 8: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 8

Course Plan• There will be 33 lectures, divided into

chapters as follows• Chapter 1: Introduction (1 lecture, this one!)• Chapter 2: Graphics (3 lectures)• Chapter 3: Multiple Regression Model (12 lectures)• Chapter 4: Factors (4 lectures)• Chapter 5: Models for categorical and count

responses (12 lectures)• Revision (1 lecture)

Page 9: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 9

Course book• This covers most of the material we will

discuss in the lectures

• Has 5 chapters corresponding to the division on the previous slide

• On-line version available on the course web site

• Paper copies available from the Statistics Department, Commerce A

Page 10: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 10

Overview of Statistical Modelling

• Statistical models summarize relationships between variables

• Regression models focus on one variable (the response) and how its distribution can be modelled by one or more explanatory variables

Page 11: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 11

Example: Response is BPD

BPD: Bi-Parietal Diameter

For an unborn baby, depends on several factors,

including Gestational Age.

Page 12: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

Ultrasound image

04/21/23 330 Lecture 1 12

Page 13: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 13

Points to take into account:

• Not all babies of the same age have the same BPD.

• In general, the older the baby, the bigger the BPD.

Page 14: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 14

Model relating BPD and GA

Gestational Age

BPD

Mean BPD for 30 weeks = + 30

Mean BPD for 40 weeks = a + 40

Mean BPD for GA weeks = a + GA

Line is BPD =a + b GA

Each bell curve has sd

10 mm

30 40

Page 15: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 15

Regression model features

• Model specifies the distribution of the response

• Also says how the distribution of the response is affected by the covariates (explanatory variables)

• In this case the covariate GA determines the mean response:

mean BPD = a + b GAie a straight-line relationship.

Page 16: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 16

In general….• Response has a distribution depending on

(conditional on) the explanatory variables

• Mean of the distribution given by some function of the explanatory variables

• Need to – describe the function– describe the variability

• Balance accuracy with simplicity.

Page 17: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 17

Our Analysis strategy:

• Explore the data using graphics and summary statistics

• Construct a useful model (trial and error)

• Use the model to gain knowledge about the system under study (How big should a 40 week baby’s BPD be? )

• Communicate findings (be able to write a report!!)

Page 18: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 18

Goals for 330• Get more practice in exploring data• Expand knowledge of regression• Get better at fitting models• Improve your diagnostic skills (how to

recognize when a model doesn’t describe the data properly)

• Improve your interpretation and communication skills, in a more flexible way.

Page 19: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 19

Role of Graphics• Data Cleaning: Are there gross errors, outliers,

special codings (eg 999 for missing), missing data, absolute rubbish in the data?

• Exploratory analysis: what sort of model might be appropriate?

• Diagnostics: having tentatively selected a model, is it any good? (Residual plots, etc)

Page 20: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 20

Exploratory analysis for BPD data

25 30 35 40

70

75

80

85

90

95

10

0

GA

BP

D Outlier

Linear relationship

(straight line)appropriate?

Page 21: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 21

Diagnostic Graphics: residuals versus fitted values

70 75 80 85 90-20

-15

-10

-50

5

Fitted Values

Re

sid

ua

lsSmoothing enhances

interpretation.Conclusion?

Outlier stands out

Page 22: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 22

Data Cleaning• All real-life data sets are likely to contain errors• These will usually be revealed by suitable plots,

such as the one on the previous slide• Before starting any analysis, data should always

be carefully checked• To encourage this practice, I will occasionally

introduce errors into the assignment data supplied on the web.

Page 23: 9/15/2015330 Lecture 11 STATS 330: Lecture 1. 9/15/2015330 Lecture 12 Today’s agenda: Introductory Comments: –Housekeeping –Computer details –Plan of

04/21/23 330 Lecture 1 23

Data Cleaning (2)It is your responsibility to check all data supplied on the web, against the assignment sheets . Failure to so will cost marks.

YOU HAVE

BEEN WARNED!!!