Analyzing Measurement Data

Engineering 1811.01, College of Engineering, Engineering Education Innovation Center. Analyzing Measurement Data. Rev: 20130604, MC.

Upload: sean-francis

Post on 02-Jan-2016



Page 1: Analyzing Measurement Data

• Engineering 1811.01

Analyzing Data 1

College of Engineering, Engineering Education Innovation Center

Analyzing Measurement Data

Rev: 20130604, MC

Page 2: Analyzing Measurement Data


Example

Prediction: If the spring on the slingshot is pulled back 1 m, the softball will land 17 m downrange.

To confirm the prediction, data is collected from 20 trials.

Rev: 20120103, AM

1. Brockman, Jay B. "Data Analysis and Empirical Models." Introduction to Engineering: Modeling and Problem Solving. Hoboken, NJ: John Wiley & Sons, Inc., 2009. 226-228. Print.

Page 3: Analyzing Measurement Data

Example

• Most values fall between 14 and 20 m.

• This data contains an outlier of 45.2 m.


Page 4: Analyzing Measurement Data

Represent the Data with a Histogram

• First, determine an appropriate bin size.

• The bin size [k] can be assigned directly or can be calculated from a suggested number of bins [h]:

Number of data points [n]    Suggested number of bins [h]
Less than 50                 5 to 7
50 to 99                     6 to 10
100 to 250                   7 to 12
More than 250                10 to 20

• Let's try the most commonly used formula first:

$k = \dfrac{x_{\max} - x_{\min}}{h} = 4.43 \approx 5$
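The bin-size step can be sketched in Python, assuming the bin size is taken as the data range divided by the chosen number of bins and then rounded up. The function name `bin_width` and the choice of `h = 5` are illustrative assumptions; the data are the simple 11-point set used in the central tendency slides, not the slingshot measurements.

```python
import math

def bin_width(values, num_bins):
    """Bin size k from the data range and a chosen number of bins h."""
    data_range = max(values) - min(values)
    return data_range / num_bins

# Simple 11-point data set from the lecture (fewer than 50 points, so 5 to 7 bins).
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]
h = 5                        # assumed choice from the "less than 50" row
k_raw = bin_width(data, h)   # (44 - 3) / 5
k = math.ceil(k_raw)         # round up so h bins cover the whole range
print(f"raw bin size = {k_raw:.2f}, rounded bin size = {k}")
```

Rounding up rather than to the nearest integer is one possible convention; it guarantees that the chosen number of bins spans every data point.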

Page 5: Analyzing Measurement Data


Histogram - Example

[Figure: histogram titled "Slingshot Data"; horizontal axis: Bin (0-19, 19-24, 24-29, 29-34, 34-39, 39-44, 44-49); vertical axis: Frequency (0 to 18)]

Is this the best way to represent this data? By changing our bin size [k], we can improve the representation.

Bin [m]    Frequency
0-19       17
19-24      2
24-29      0
29-34      0
34-39      0
39-44      0
44-49      1
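As a rough illustration of why this binning is not very descriptive, the frequency table can be drawn as a text histogram. This is only a sketch; the helper name `text_histogram` is an assumption, and the bin labels and counts are copied from the table.

```python
# Bin labels and frequencies copied from the slingshot frequency table.
bins = ["0-19", "19-24", "24-29", "29-34", "34-39", "39-44", "44-49"]
frequencies = [17, 2, 0, 0, 0, 0, 1]

def text_histogram(labels, counts):
    """Return one line per bin, with one '#' per counted data point."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {'#' * count} ({count})"
            for label, count in zip(labels, counts)]

lines = text_histogram(bins, frequencies)
print("\n".join(lines))
print("total data points:", sum(frequencies))
```

Nearly all of the trials land in the first bin, so the shape of the data between 14 and 20 m is hidden.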

Page 6: Analyzing Measurement Data


Histogram - Example

[Figure: histogram titled "Slingshot Data"; horizontal axis: Bin (1 m steps from 14 to 46); vertical axis: Frequency (0 to 7)]

All 3 histograms represent the exact same data set, but the bin width and number of bins for the two shown above were selected manually.

Which one is most descriptive?

[Figure: histogram titled "Slingshot Data"; horizontal axis: Bin (14-15, 15-16, 16-17, 17-18, 18-19, 19-20, >20); vertical axis: Frequency (0 to 7)]

Page 7: Analyzing Measurement Data


Dealing with outliers

• Engineers must carefully consider any outliers when analyzing data.

• It is up to the engineer to determine whether the outlier is a valid data point or if it is invalid and should be discarded.

• Invalid data points can result from measurement errors or recording the data incorrectly.


Page 8: Analyzing Measurement Data


Characterizing the data

• Statistics allows us to characterize the data numerically as well as graphically.

• We characterize data in two ways:
  – Central Tendency
  – Variation

Page 9: Analyzing Measurement Data


Central Tendency (Expected Value)

• Central tendency is a single value that best represents the data.

• But which number do we choose?
  – Mean
  – Median
  – Mode

• Note: For most engineering applications, the mean and median are the most relevant.

Page 10: Analyzing Measurement Data


Central Tendency - Mean

$\text{Mean} = \dfrac{\sum x}{n} = \dfrac{369.3}{20} = 18.47$

Is the mean value a good depiction of the data? How does the outlier affect the mean?

Page 11: Analyzing Measurement Data

Central Tendency - Mean

Problem: Outliers may decrease the usefulness of the mean as a central value. Observe how outliers can affect the mean for this simple data set:

• Without outliers: 3 7 12 17 21 21 23 27 32 36 44

• Changing 3 to -112 (outlier: -112): -112 7 12 17 21 21 23 27 32 36 44

• Changing 44 to 212 (outlier: 212): 3 7 12 17 21 21 23 27 32 36 212

Solution: Look at the median.
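To see how large the effect on the mean is, the three versions of the simple data set can be averaged directly. A minimal Python sketch follows; the function and variable names are illustrative and not part of the lecture.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

base = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]
low_outlier = [-112] + base[1:]    # 3 replaced by -112
high_outlier = base[:-1] + [212]   # 44 replaced by 212

for label, data in [("without outliers", base),
                    ("3 changed to -112", low_outlier),
                    ("44 changed to 212", high_outlier)]:
    print(f"{label:>18}: mean = {mean(data):.2f}")
```

Changing a single value out of eleven moves the mean from about 22.09 to about 11.64 or 37.36, which is why the mean alone can mislead when outliers are present.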

Page 12: Analyzing Measurement Data

Central Tendency - Median

n = 20 is an even number of data points, so the median is the average of the 2 middle values.

In this case, the 2 middle values are both 17.4, so the median is 17.4 m.

Which value looks like a better representation of the data? Mean (18.47) or median (17.4)? Why?

Page 13: Analyzing Measurement Data

Central Tendency - Median

Using the simple data set, observe how the median reduces the impact of outliers on the central tendency.

• 3 7 12 17 21 21 23 27 32 36 44: Median = 21

• -112 7 12 17 21 21 23 27 32 36 212: Median = 21
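The median rule described in these slides (take the middle value of the sorted data, or average the two middle values when n is even) can be sketched as follows. The function name and the four-value list are illustrative assumptions; the two 11-point lists are the ones shown in the slides.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

base = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]
with_outliers = [-112, 7, 12, 17, 21, 21, 23, 27, 32, 36, 212]

print(median(base))            # 21
print(median(with_outliers))   # 21
print(median([1, 2, 3, 4]))    # 2.5 (hypothetical even-length example)
```

Because only the ordering of the values matters, replacing the smallest and largest values with extreme outliers leaves the median unchanged.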

Page 14: Analyzing Measurement Data

Central Tendency – Mean and Median

[Figure: scatter plot titled "Slingshot Distance Testing"; horizontal axis: Trial Number (0 to 25); vertical axis: Distance Traveled [m] (10 to 50)]

Which value, the mean (18.47 m) or the median (17.4 m), is a better representation of the data?

Page 15: Analyzing Measurement Data


Characterizing the data

• We can select a value of central tendency to represent the data, but is one number enough?

• It is also important to know how much variation there is in the data set.

• Variation refers to how the data is distributed around the central tendency value.


Page 16: Analyzing Measurement Data


Variation

• As with central tendency, there are multiple ways to represent the variation of a set of data.
  – ± ("Plus, Minus") gives the range of the values.
  – Standard Deviation provides a more sophisticated look at how the data is distributed around the central value.

Page 17: Analyzing Measurement Data


Variation - Standard Deviation

Definition: how closely the values cluster around the mean; how much variation there is in the data

Equation:

Page 18: Analyzing Measurement Data

Standard Deviation Example

$\text{mean} = \dfrac{\sum x}{n} = \dfrac{369.3}{20} = 18.47$

$\sigma = \sqrt{41.32} = 6.4281$
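The definition of standard deviation can be sketched in Python as the square root of the average squared deviation from the mean. Two conventions are in common use, dividing by n (population form) or by n - 1 (sample form); which one produced the 41.32 above cannot be told from the numbers shown here, so both are given. The function name `std_dev` and the use of the simple 11-point data set are assumptions for illustration.

```python
import math
import statistics

data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

def std_dev(values, sample=False):
    """Square root of the average squared deviation from the mean.

    sample=False divides by n (population form);
    sample=True divides by n - 1 (sample form).
    """
    m = sum(values) / len(values)
    squared_devs = sum((x - m) ** 2 for x in values)
    divisor = len(values) - 1 if sample else len(values)
    return math.sqrt(squared_devs / divisor)

population_sd = std_dev(data)
sample_sd = std_dev(data, sample=True)
print(f"population SD = {population_sd:.2f}")
print(f"sample SD     = {sample_sd:.2f}")

# Cross-check against the standard library implementations.
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```

The two forms differ noticeably for small data sets and converge as n grows, so it is worth stating which one is used when reporting a result.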

Page 19: Analyzing Measurement Data

Standard Deviation: Interpretation

These curves describe the distribution of students' exam grades. The average value is 83%.

Which class would you rather be in?

[Figure: two distribution curves, labeled Curve A and Curve B]

Page 20: Analyzing Measurement Data

Normal Distribution

• Data that is normally distributed occurs with greatest frequency around the mean.

• Normal distributions are also frequently referred to as Gaussian distributions or bell curves.

[Figure: bell curve with the mean marked; horizontal axis: Bins (-5 to 5); vertical axis: Frequency]

Page 21: Analyzing Measurement Data

Normal Distribution

Mean = Median = Mode

- About 68% of values fall within 1 standard deviation (SD) of the mean

- About 95% of values fall within 2 SDs of the mean
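The 68% and 95% figures can be checked with the standard library's `statistics.NormalDist` (Python 3.8 or later). This is a small sketch; the helper name `fraction_within` is an assumption, and the result is the same for any normal distribution because the distances are measured in standard deviations.

```python
from statistics import NormalDist

def fraction_within(num_sd):
    """Fraction of a normal distribution within num_sd standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(num_sd) - standard_normal.cdf(-num_sd)

for num_sd in (1, 2):
    print(f"within {num_sd} SD: {fraction_within(num_sd):.1%}")
```

The exact values are roughly 68.3% and 95.4%, which the slide rounds to 68% and 95%.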

Page 22: Analyzing Measurement Data

Other Distributions

• Skewed distributions

• Multimodal distribution

• Uniform distribution

Page 23: Analyzing Measurement Data


What we’ve learned

• This lecture has introduced some basic statistical tools that engineers use to analyze data.

• Histograms are used to represent data graphically.

• Engineers use both central tendency and variation to numerically describe data.
