statistics in science


Upload: eric-gould

Post on 30-Dec-2015


TRANSCRIPT

Page 1: Statistics in Science

Statistics in Science

Data can be collected about a population (surveys)

Data can be collected about a process (experimentation)

Page 2: Statistics in Science

STATISTICS!!!

The science of data

Page 3: Statistics in Science

2 types of Data

Qualitative

Quantitative

Page 4: Statistics in Science

Qualitative Data

Information that relates to characteristics or description (observable qualities)

Information is often grouped by a descriptive category

Examples
– Species of plant
– Type of insect
– Shades of color
– Rank of flavor in taste testing

Remember: qualitative data can be “scored” and evaluated numerically

Page 5: Statistics in Science

Qualitative data, manipulated numerically

Survey results, teens and need for environmental action

Page 6: Statistics in Science

Quantitative data

Quantitative – measured using a naturally occurring numerical scale

Examples
– Chemical concentration
– Temperature
– Length
– Weight…etc.

Page 7: Statistics in Science

Quantitation

Measurements are often displayed graphically

Page 8: Statistics in Science

Quantitation = Measurement

In data collection for Biology, data must be measured carefully, using laboratory equipment (e.g., timers, metersticks, pH meters, balances, pipettes)

The limits of the equipment used add some uncertainty to the data collected. All equipment has a certain magnitude of uncertainty. For example, is a ruler that is mass-produced a good measure of 1 cm? 1 mm? 0.1 mm?

For quantitative testing, you must indicate the level of uncertainty of the tool that you are using for measurement!!

Page 9: Statistics in Science

How to determine uncertainty?

Usually the instrument manufacturer will indicate this – read what is provided by the manufacturer.

Be sure that the number of significant digits in the data table/graph reflects the precision of the instrument used (for example, if the manufacturer states that the accuracy of a balance is to 0.1 g and your average mass is 2.06 g, be sure to round the average to 2.1 g). Your data must be consistent with your measurement tool regarding significant figures.
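The rounding step can be sketched in Python. This is only an illustration: the helper name `round_to_precision` is made up for this example, and passing the values as strings is a choice made here to avoid binary floating-point surprises.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_precision(value: str, precision: str) -> Decimal:
    """Round a measurement to the precision stated for the instrument.

    Both arguments are strings (for example "2.06" and "0.1") so that
    Decimal sees exactly the digits that were written down.
    """
    return Decimal(value).quantize(Decimal(precision), rounding=ROUND_HALF_UP)


# Balance stated to 0.1 g, average mass 2.06 g (the example from the slide).
average_mass = round_to_precision("2.06", "0.1")
print(average_mass)  # 2.1
```

The same helper works for any instrument: pass the smallest increment the manufacturer states as `precision`.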

Page 10: Statistics in Science

Any lab you design for AP/IB Biology must have both quantitative and qualitative data

Page 11: Statistics in Science

Quick Review – 3 measures of “Central Tendency”

Quantitative data
– mean: sum of data points divided by the number of points

Quantitative or qualitative data
– mode: value that appears most frequently
– median: when all data are listed from least to greatest, the value at which half of the observations are greater and half are lesser
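As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The numbers and color labels below are made up for this sketch and are not from the slides.

```python
import statistics

# Hypothetical quantitative data (for example, leaf counts on five plants).
values = [2, 3, 3, 5, 7]

mean_value = statistics.mean(values)      # sum of points / number of points
median_value = statistics.median(values)  # middle value of the sorted list
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)  # 4 3 3

# The mode also works for qualitative (category) data.
colors = ["green", "yellow", "green", "red"]
color_mode = statistics.mode(colors)
print(color_mode)  # green
```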

Page 12: Statistics in Science

Comparing Means

Once the means are calculated for each set of data, the average values can be plotted together on a graph, to visualize the relationship between each set of data.

Page 13: Statistics in Science

[Figure: bar chart, "The Average Rate of Growth On Various Types of Trees". Horizontal axis: Type of Trees Measured (beech, maple, hickory, oak). Vertical axis: Growth in meters (0 to 16).]

Page 14: Statistics in Science

Error Bars

Are a graphical representation of the variability of data.

Page 15: Statistics in Science

Drawing error bars

The simplest way to draw an error bar is to use the mean as the central point, and to use the distance of the measurement that is furthest from the average as the endpoints of the data bar
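The rule above can be sketched as a small Python function. The function name and the growth values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import statistics


def max_deviation_error_bar(measurements):
    """Return (center, half_length) for a simple error bar.

    center is the mean; half_length is the distance from the mean to the
    measurement that lies farthest from it.
    """
    center = statistics.mean(measurements)
    half_length = max(abs(x - center) for x in measurements)
    return center, half_length


# Hypothetical growth measurements in meters for one type of tree.
growth = [10, 12, 14, 9, 15]
center, half_length = max_deviation_error_bar(growth)
print(center, half_length)  # 12 3
```

The bar is then drawn from `center - half_length` to `center + half_length`, here from 9 to 15.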

Page 16: Statistics in Science

[Figure: error bar diagram with three labels: average value, value farthest from average, and calculated distance.]

Page 17: Statistics in Science

[Figure: bar chart, "The Average Rate of Growth On Various Types of Trees". Horizontal axis: Type of Trees Measured (beech, maple, hickory, oak). Vertical axis: Growth in meters (0 to 16).]

Page 18: Statistics in Science

What do error bars suggest?

If the bars show extensive overlap, it is likely that there is not a significant difference between those values

Error bars present evidence so readers can verify that the authors' reasoning is correct.

Page 19: Statistics in Science
Page 20: Statistics in Science

How can leaf lengths be displayed graphically?

Page 21: Statistics in Science

Simply measure the lengths of each and plot how many are of each length
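A minimal sketch of that counting step in Python, using made-up leaf lengths. It prints a rough text histogram rather than drawing a chart.

```python
from collections import Counter

# Hypothetical leaf lengths in cm, rounded to the nearest centimeter.
leaf_lengths_cm = [5, 6, 6, 7, 7, 7, 8, 8, 9]

# Count how many leaves have each length.
counts = Counter(leaf_lengths_cm)

# One row per length, one '#' per leaf of that length.
for length in sorted(counts):
    print(f"{length} cm | {'#' * counts[length]}")
```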

Page 22: Statistics in Science

If smoothed, the histogram data assumes this shape

Page 23: Statistics in Science

This Shape?

Is a classic bell-shaped curve, AKA Gaussian Distribution Curve, AKA a Normal Distribution curve.

Essentially it means that in all studies with an adequate number of data points (>30) a significant number of results tend to be near the mean. Fewer results are found farther from the mean

Page 24: Statistics in Science

The standard deviation is a statistic that tells you how tightly all the various examples are clustered around the mean in a set of data

Page 25: Statistics in Science

Standard deviation

The STANDARD DEVIATION is a more sophisticated indicator of the precision of a set of a given number of measurements
– The standard deviation is like an average deviation of measurement values from the mean. The standard deviation can be used to draw error bars, instead of the maximum deviation.

Page 26: Statistics in Science

A typical normal distribution curve

Page 27: Statistics in Science

According to this curve:

One standard deviation away from the mean in either direction on the horizontal axis (the red area on the preceding graph) accounts for somewhere around 68 percent of the data in this group.

Two standard deviations away from the mean (the red and green areas) account for roughly 95 percent of the data.

Page 28: Statistics in Science

Three Standard Deviations?

Three standard deviations (the red, green and blue areas) account for about 99.7 percent of the data

[Figure: horizontal axis of the curve, marked -3 SD, -2 SD, ±1 SD, +2 SD, +3 SD.]
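The percentages quoted for one, two and three standard deviations can be checked against an ideal normal curve. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `fraction_within` is made up for the example.

```python
from statistics import NormalDist

# Ideal bell curve with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

Real data sets only approximate these fractions, and only when they are roughly bell-shaped.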

Page 29: Statistics in Science

How is Standard Deviation calculated?

With this formula!
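The slide's formula is not reproduced in this transcript. Assuming it is the usual sample ("n-1") standard deviation, which is the method named with the Excel instructions on slide 31, it would read as follows, where the x_i are the individual measurements, x-bar is their mean and n is the number of measurements:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```

In words: subtract the mean from each measurement, square each difference, add the squares, divide by one less than the number of measurements, and take the square root.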

Page 30: Statistics in Science

AGHHH!

DO I NEED TO KNOW THIS FOR THE TEST?????

Page 31: Statistics in Science

Not the formula!

This can be calculated on a scientific calculator

OR…. In Microsoft Excel, type the following code into the cell where you want the Standard Deviation result, using the "unbiased," or "n-1" method: =STDEV(A1:A30) (substitute the cell name of the first value in your dataset for A1, and the cell name of the last value for A30.)

Page 32: Statistics in Science

You DO need to know the concept & use it in your lab reports!

Standard deviation is a statistic that tells how tightly all the various data points are clustered around the mean in a set of data.

When the data points are tightly bunched together and the bell-shaped curve is steep, the standard deviation is small. (precise results, smaller sd)

When the data points are spread apart and the bell curve is relatively flat, a large standard deviation value suggests less precise results

Page 33: Statistics in Science

Usefulness of SD

Look at the data given for bean plants

Height of bean plants in cm (±0.01 cm)

Sunlight      Shade
124           131
120           60
153           160
98            212
123           117
142           65
156           155
128           160
139           145
117           95
Total 1300    Total 1300

What is the mean for each sample?

Both are 130 cm

Now look at the variations of each sample.

The plants in the shade are more variable than the ones in the sunlight. What does this suggest?

Other factors may be influencing the growth in addition to sunlight and shade.

SD allows you to mathematically quantify the variation observed.

Sunlight SD: 17.68 cm
Shade SD: 47.02 cm
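The means and standard deviations for the bean plants can be reproduced in Python. This is a sketch using `statistics.stdev`, which uses the "n-1" method, the same method the slides name for Excel's STDEV; the variable names are chosen for this example.

```python
import statistics

# Bean plant heights in cm, copied from the slide.
sunlight = [124, 120, 153, 98, 123, 142, 156, 128, 139, 117]
shade = [131, 60, 160, 212, 117, 65, 155, 160, 145, 95]

for label, heights in (("sunlight", sunlight), ("shade", shade)):
    mean = statistics.mean(heights)
    sd = statistics.stdev(heights)  # sample ("n-1") standard deviation
    print(f"{label}: mean = {mean} cm, SD = {sd:.2f} cm")

# sunlight: mean = 130 cm, SD = 17.68 cm
# shade: mean = 130 cm, SD = 47.02 cm
```

Both samples have the same mean, so only the SD reveals how much more the shade plants vary.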

Page 34: Statistics in Science

The high SD of the bean plants in the shade indicates a very wide spread of data around the mean.
– This should make you question the experimental design.
– EX: The plants in the shade are growing in different soil types.

So…don’t just look at the means; they don’t offer the full picture

Page 35: Statistics in Science

Try this question…

The lengths of a sample of tiger canines were measured. 68% of the lengths fell within a range between 15 mm and 45 mm. The mean was 30 mm. What is the standard deviation of this sample?

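Answer: 15 mm. About 68% of the data lies within one standard deviation of the mean, so the range from 15 mm to 45 mm is 30 mm plus or minus one standard deviation.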

Page 36: Statistics in Science

Let’s do this…

Page 37: Statistics in Science
Page 38: Statistics in Science

The t-test

Used to determine whether or not the difference between 2 sets of data is a significant (real) difference.

Used to test the statistical significance of the difference between the means of two samples

When given the calculated value of t, you can use a table of t values (handout).

On the left hand column is “Degrees of Freedom”.
– This is the sum of sample sizes of each group minus 2.
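The slides give the value of t rather than showing how to compute it. As a sketch only, and assuming the equal-variance (Student's) form of the test, which matches the degrees-of-freedom rule above, t can be computed like this. The function name `pooled_t` and the sample numbers are hypothetical.

```python
import math
import statistics


def pooled_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for two independent samples.

    Assumes both groups share the same underlying variance
    (Student's two-sample t-test).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2  # sum of sample sizes minus 2

    # Combine the two sample variances, weighted by their own df.
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / df

    # Standard error of the difference between the two means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t, df


# Hypothetical measurements for two small groups.
t_value, df = pooled_t([1, 2, 3], [2, 3, 4])
print(round(t_value, 2), df)
```

The sign of t only shows which mean is larger; its absolute value is what gets compared with the table.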

Page 39: Statistics in Science

If the degrees of freedom is 9 and the given value of t is 2.60, the table indicates that the t value is greater than the critical value of 2.26.

WHAT DOES THIS MEAN???

When you look at the bottom of the table, you will see that the probability that chance alone could produce the result is less than 5% (p < 0.05).

This means that the difference is significant at the 95% confidence level.

Page 40: Statistics in Science

SO…

Large t-values mean little overlap between two sets of data and probably a real difference between them

Small t-values mean much overlap and probably no difference

Calculated t<critical t value = differences between data are not significant = null hypothesis not rejected

Calculated t>critical t value = differences are significant = null hypothesis rejected.

Page 41: Statistics in Science

Compare 2 groups of barnacles living on a rocky shore.

You are measuring the width of their shells to see if a significant size difference is found depending on how close they live to the water.
– One group lives 0-10 meters from water
– The other group lives 10-20 meters.
– 15 shells from each group were measured.

The mean shell width was larger in the group closer to the water, suggesting that barnacles living closer to the water have larger shells.

If the value of t is 2.25, is that a significant difference?

The degrees of freedom are 15 + 15 - 2 = 28. For 28 degrees of freedom the critical t value at p = 0.05 is about 2.05. Since 2.25 is greater, the probability that chance alone could produce this result is less than 5%.

The confidence level is 95%. So, barnacles living nearer the water have a significantly larger shell than those living 10 meters or more away from the water.
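The decision rule can be sketched in Python. The two critical values below are standard two-tailed p = 0.05 values; that the class handout is a two-tailed table is an assumption, although it agrees with the 2.26 quoted for 9 degrees of freedom. The names `CRITICAL_T_05`, `degrees_of_freedom` and `is_significant` are made up for this example.

```python
# Two-tailed critical t values at p = 0.05, keyed by degrees of freedom.
# Only the two rows used in the slides are included in this sketch.
CRITICAL_T_05 = {9: 2.262, 28: 2.048}


def degrees_of_freedom(n_a, n_b):
    """Sum of the two sample sizes minus 2."""
    return n_a + n_b - 2


def is_significant(t_value, df):
    """True when the calculated t exceeds the critical t for this df."""
    return abs(t_value) > CRITICAL_T_05[df]


# Barnacles: 15 shells per group, calculated t = 2.25.
barnacle_df = degrees_of_freedom(15, 15)
print(barnacle_df, is_significant(2.25, barnacle_df))  # 28 True

# Earlier example: 9 degrees of freedom, calculated t = 2.60.
print(is_significant(2.60, 9))  # True
```

A result of True corresponds to rejecting the null hypothesis; False means the difference is not significant at this level.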

Page 42: Statistics in Science

CORRELATION AND CAUSATION

Page 43: Statistics in Science

EX: Africanized Honey Bees (AHBs)

These bees have migrated to the southwestern states of the US.

They have not migrated to the southeastern states.

The edge of the areas where AHBs are found coincides with the point where there is an annual rainfall of 55 inches.

This seems to be a barrier to the migration of the bees.

This is an example of a mathematical correlation & is not evidence of a cause.

Page 44: Statistics in Science

Correlation and cause

Observations without experimentation show correlation

Experimentation is necessary to show cause

Page 45: Statistics in Science

Using A Mathematical Correlation Test

r value is the correlation

Value of r can vary:
– r=1 means completely positive correlation
– r=-1 means completely negative correlation
– r=0 means no correlation
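The slides do not say which correlation test is meant. Assuming the common Pearson correlation coefficient, r can be computed as in this sketch; the function name and the small data sets are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # How the two variables move together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # How much each variable moves on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))

    return co_variation / (spread_x * spread_y)


print(round(pearson_r([1, 2, 3], [2, 4, 6]), 2))  # completely positive
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 2))  # completely negative
print(round(pearson_r([1, 2, 3], [1, 2, 1]), 2))  # no correlation
```

Values between these extremes, such as 0.88, indicate a strong but imperfect positive relationship.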

Page 46: Statistics in Science

Say we were trying to determine, among cormorant birds, if there is a correlation between the sizes of males & females which breed together.

Data is collected and an r value of 0.88 is determined. What does this mean?

It shows a positive correlation between the sizes of the 2 sexes.
– In other words, large females tend to mate with large males.

Page 47: Statistics in Science
Page 48: Statistics in Science

Remember: correlation is not causation

How would cause be determined?