
TMA 4255 Applied Statistics

Spring 2010

2

About the course

Learning outcome: The objective of the course is to give students a solid foundation in the use of basic statistical methods in science and technology. In addition, students should be able to plan data collection and to use statistical software to analyse data.

Learning methods and activities: Lectures and exercises with the use of a computer (the statistical package MINITAB). The lectures may be given in English. Portfolio assessment is the basis for the grade awarded in the course. The portfolio comprises a written final examination (80%) and selected parts of the exercises (20%). The results for the constituent parts are given in percentage points, while the grade for the whole portfolio (the course grade) is given as a letter grade. A retake of the examination may be given as an oral examination.

Compulsory assignments: Exercises

Recommended previous knowledge: The course is based on ST0103 Statistics with Applications, TMA4240 Statistics or TMA4245 Statistics, or equivalent.

3

Contents (preliminary list): Hypothesis testing, simple and multiple linear regression, residual plots and selection of variables, transformations, design of experiments, 2^k experiments and fractions of these. Special designs. Graphical methods. Error propagation formula. Analysis of variance, statistical process control, contingency tables and nonparametric methods. Use of the statistical computer package MINITAB.

Lecturer: Professor Bo Lindqvist, Room 1129, Sentralbygg II, NTNU Gløshaugen
Telephone: (735) 93532
Email: [email protected]

Teaching assistant: PhD research fellow (stipendiat) Håkon Toftaker, Room 1036, Sentralbygg II, NTNU Gløshaugen
Telephone: (735) 91681
Email: [email protected]

4

Teaching material

Main book: Walpole, Myers, Myers and Ye: "Probability and Statistics for Engineers and Scientists". Eighth Edition. Pearson International Edition.

Tables: ”Tabeller og formler i statistikk” (Tables and Formulas in Statistics), 2nd edition. Tapir, 2009.

MINITAB: Information is available at http://www.ntnu.no/adm/it/brukerstotte/programvare/minitab.

5

Weekly meetings

Lectures:
Tuesdays 12.15 – 14.00 in H3
Thursdays 12.15 – 14.00 in S4

Exercises:
Mondays 17.15 – 18.00 in H3 or a computer lab

6

Preliminary curriculum, lecturing and progress plan

Week | Topic | Chapter (WMMY) | Exercise
2 | Introduction, motivation and repetition. Descriptive measures and graphs. Normal plot. | (1-10), particularly 8.1-8.7 | 1
3 | Two-sample case. Comparing variances. F-distribution. Simple linear regression. | 8.8, 9.8, 9.13, 10.8, 10.13, 10.18. (11.1-11.5) 11.6-11.12 | 2-3
4-5 | Multiple linear regression | 12.1-12.7, 12.9-12.11 | 4-5
6-8 | 2^k experiments and fractions thereof | BHH 10 and 12; alternatively, WMMY 15 | 6-8
9-10 | Analysis of variance | 13.1-13.4, 13.6, 13.8-13.10, 13.13, 13.15, 14.1-14.4 | 9-10
11 | Statistical process control | 17.1-17.5 | 11
12 | Chi-square tests and contingency tables | 10.14-10.16 | 12
13-14 (Tuesday) | Easter vacation | |
14 (Thursday)-15 | Nonparametric statistics | 16.1-16.3 | 12
16 | Approximation of expectation and variance | |
17 (Tuesday) | Repetition | |

7

8

9

10

11

12

13

14

15

The compulsory project: Example

16

17

18

Introduction to course

TMA 4240/45 Statistics and ST 0103: Probability theory + simple statistics

TMA 4255 (this course): A little probability + APPLICABLE and APPLIED statistics

The ”classical” statistical methods:

• Regression analysis
• Design of experiments
• Analysis of variance (ANOVA)
• Analysis of discrete data (contingency tables)
• Nonparametric methods

19

Why is statistics important in science and industry?

The book emphasizes ”the Japanese industrial miracle”:

•Use of statistical methods in design and production
•Statistical thinking in all parts of the production

In 2000, the highly regarded international medical journal New England Journal of Medicine named

•Use of statistical methods

as one of the 11 most important medical advances throughout time.

20

21

Originally: Statistics = ”collection and presentation of data”

Today much more:

•Design and collection of data from statistical investigations

•Modeling of the stochastic mechanisms behind the data

•Drawing conclusions about these mechanisms, based on the data

•Evaluation of the strength of the conclusions (variance, confidence interval, test power)

•Basic tool: Probability theory

22

Statistical investigations can be divided into two main types:

•Experimental studies based on design of experiments (DOE): Experiments are done under controlled conditions.

•Observational studies: Used when control of the conditions is not possible.

23

Statistical studies

Experimental studies: Clinical trials
Comparison of drugs A and B
Trial group of n persons

r drawn at random are given A

s drawn at random are given B

n-r-s (the rest) get placebo

Blind test: The patient does not know which drug is given

Double blind: The examining doctor does not know either

Observational studies: Epidemiological studies
• Smoking and cancer
• Diet and coronary diseases

Cannot control the conditions; e.g. cannot force people to smoke/not smoke.

Difficulty in interpretation: There may be unknown underlying causes which bias the results (”confounding”). For example: a gene which increases the need for smoking and at the same time influences the chance of getting cancer.
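The random allocation used in the clinical trial above can be sketched in a few lines of Python. This is only an illustration: the function name `allocate`, the group labels and the numbers n = 12, r = 4, s = 4 are assumptions made for the example and do not come from the slides.

```python
import random


def allocate(n, r, s, seed=None):
    """Randomly split persons 0..n-1 into groups A (r persons),
    B (s persons) and placebo (the remaining n - r - s persons)."""
    if r < 0 or s < 0 or r + s > n:
        raise ValueError("need r >= 0, s >= 0 and r + s <= n")
    rng = random.Random(seed)   # seed only makes the example reproducible
    persons = list(range(n))
    rng.shuffle(persons)        # every ordering is equally likely
    return {
        "A": sorted(persons[:r]),
        "B": sorted(persons[r:r + s]),
        "placebo": sorted(persons[r + s:]),
    }


# Hypothetical trial: 12 persons, 4 get drug A, 4 get drug B, 4 get placebo.
groups = allocate(n=12, r=4, s=4, seed=1)
for name, members in groups.items():
    print(name, members)
```

Because the groups are formed by chance alone, an unknown underlying factor (such as the gene in the confounding example) is expected to be spread evenly over the groups, which is what an observational study cannot guarantee.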

24

Statistics in scientific investigations:

•Generate hypotheses

•Derive consequences

•See whether these are fulfilled in observations

•Generate new hypotheses

•Etc.

25

Particular matters for statistics in industry:

Product and process engineer:

Off-line: Controlled experiments aiming at optimizing production

Production: Register production data; use them to control production.

26

MINITAB 15: Rocket fuel example

Stat > Basic Statistics > Display Descriptive Statistics

27

Stat > Basic Statistics > Graphical Summary

Summary for X

[Figure: graphical summary of X, with a "95% Confidence Intervals" panel for Mean and Median]

Anderson-Darling Normality Test
A-Squared 0,59
P-Value 0,092

Mean 40,560
StDev 1,991
Variance 3,963
Skewness -1,55360
Kurtosis 2,72795
N 10

Minimum 35,900
1st Quartile 39,275
Median 41,000
3rd Quartile 42,000
Maximum 42,600

95% Confidence Interval for Mean: 39,136 41,984
95% Confidence Interval for Median: 39,266 42,037
95% Confidence Interval for StDev: 1,369 3,634
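The 95% confidence interval for StDev in the summary above appears consistent with the standard chi-square interval for a normal sample, sqrt((n-1)s^2 / chi2_upper) to sqrt((n-1)s^2 / chi2_lower). The sketch below checks this from the printed summary values only, since the raw observations are not part of this transcript. The two chi-square quantiles for 9 degrees of freedom are taken from a standard table and are an assumption of the example, as is the claim that MINITAB uses this method. Python uses decimal points where the MINITAB output uses decimal commas.

```python
import math

# Summary values as printed in the MINITAB output.
n = 10
variance = 3.963            # sample variance s^2

# Chi-square table values for n - 1 = 9 degrees of freedom (assumed, from a table).
chi2_upper = 19.023         # upper 2.5% point
chi2_lower = 2.700          # lower 2.5% point

# 95% confidence interval for sigma under a normality assumption.
lower = math.sqrt((n - 1) * variance / chi2_upper)
upper = math.sqrt((n - 1) * variance / chi2_lower)

print(f"95% CI for StDev: ({lower:.3f}; {upper:.3f})")
```

The result agrees with the printed interval to about two decimals; small differences in the third decimal come from using rounded table values and a rounded variance.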

28

More plots: Rocket fuel data

[Figure: Dotplot of X]

[Figure: Probability Plot of X, Normal - 95% CI. Axes: X, Percent. Mean 40,56; StDev 1,991; N 10; AD 0,589; P-Value 0,092]

[Figure: Empirical CDF of X, Normal. Axes: X, Percent. Mean 40,56; StDev 1,991; N 10]

29

30

31

Statistical inference with MINITAB: Rocket fuel data

Stat > Basic Statistics > 1-Sample Z
Assume known standard deviation sigma=2

One-Sample Z: X

Test of mu = 40 vs not = 40
The assumed standard deviation = 2

Variable   N    Mean  StDev  SE Mean            95% CI     Z      P
X         10  40,560  1,991    0,632  (39,320; 41,800)  0,89  0,376

One-Sample Z: X

Test of mu = 40 vs > 40
The assumed standard deviation = 2

                                      95% Lower
Variable   N    Mean  StDev  SE Mean      Bound     Z      P
X         10  40,560  1,991    0,632     39,520  0,89  0,188
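The numbers in the two Z outputs above can be reproduced by hand from the summary values. The following is a minimal sketch using only the Python standard library; the variable names are chosen for this example, and it works from the printed mean rather than the raw data, which is not included in the transcript.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the MINITAB output (decimal points instead of commas).
n, xbar, sigma, mu0 = 10, 40.560, 2.0, 40.0

std_normal = NormalDist()               # standard normal distribution
se = sigma / sqrt(n)                    # SE Mean with known sigma
z = (xbar - mu0) / se                   # test statistic

# Two-sided alternative: mu not equal to 40.
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
z975 = std_normal.inv_cdf(0.975)
ci = (xbar - z975 * se, xbar + z975 * se)

# One-sided alternative: mu greater than 40.
p_greater = 1 - std_normal.cdf(z)
lower_bound = xbar - std_normal.inv_cdf(0.95) * se

print(f"SE Mean = {se:.3f}, Z = {z:.2f}")
print(f"two-sided P = {p_two_sided:.3f}, 95% CI = ({ci[0]:.3f}; {ci[1]:.3f})")
print(f"one-sided P = {p_greater:.3f}, 95% lower bound = {lower_bound:.3f}")
```

Note that the one-sided P-value is half the two-sided one, because the observed mean lies on the side of the alternative mu > 40.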

32

Stat > Basic Statistics > 1-Sample t
Assume unknown standard deviation sigma

One-Sample T: X

Test of mu = 40 vs not = 40

Variable   N    Mean  StDev  SE Mean            95% CI     T      P
X         10  40,560  1,991    0,629  (39,136; 41,984)  0,89  0,397

One-Sample T: X

Test of mu = 40 vs > 40

                                      95% Lower
Variable   N    Mean  StDev  SE Mean      Bound     T      P
X         10  40,560  1,991    0,629     39,406  0,89  0,198
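The t outputs above can be checked in the same way, replacing the known sigma by the sample standard deviation and the normal distribution by the t distribution with n - 1 = 9 degrees of freedom. The sketch below is an illustration only: to stay within the Python standard library it evaluates the t distribution by numerical integration (Simpson's rule) and finds quantiles by bisection. The helper names `t_pdf`, `t_cdf` and `t_quantile` are hypothetical and say nothing about how MINITAB computes these values.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry around 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile by bisection on the CDF (sufficient for this example)."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Values taken from the MINITAB output (decimal points instead of commas).
n, xbar, s, mu0 = 10, 40.560, 1.991, 40.0
df = n - 1
se = s / sqrt(n)                        # SE Mean with estimated sigma
t_stat = (xbar - mu0) / se

p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))
p_greater = 1 - t_cdf(t_stat, df)
t975 = t_quantile(0.975, df)
ci = (xbar - t975 * se, xbar + t975 * se)
lower_bound = xbar - t_quantile(0.95, df) * se

print(f"SE Mean = {se:.3f}, T = {t_stat:.2f}")
print(f"two-sided P = {p_two_sided:.3f}, 95% CI = ({ci[0]:.3f}; {ci[1]:.3f})")
print(f"one-sided P = {p_greater:.3f}, 95% lower bound = {lower_bound:.3f}")
```

Compared with the Z analysis, the confidence interval is wider and the P-values are larger, reflecting the extra uncertainty from estimating sigma with only 10 observations.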

33