Topic 1- Statistical Analysis
Why?
O The scientific method involves making observations and collecting measurable data.
O When data are measured from a sample, the sample must be representative of the entire population from which it is drawn
O Statistics allows us to study small samples and draw conclusions about the larger population
Why?
O It allows us to measure differences and relationships between sets of data.
O All conclusions drawn from an experiment have a certain level of confidence, but nothing in science is 100% certain.
What is a representative sample?
O A small group whose characteristics accurately reflect those of the larger population from which it is drawn.
O A representative sample is needed in order to make more accurate generalizations about the larger population
O Example: If approximately 15% of the United States’ population is of Hispanic descent, a sample of 100 Americans also ought to include around 15 Hispanic people to be representative.
How do we get a representative sample?
O Avoiding selection bias- when sampling is not representative as a result of convenience sampling (using just mpsj students), undercoverage (leaving a specific group of the population out of the sample), judgement sampling (targeting individuals you pre-assume to fit a criterion) and non-response (people choose not to complete the experiment)
O Larger sample sizes- make the sample more likely to resemble the original population
O Random Sampling- selecting individuals from random areas, at random times or with different methods
O This results in better quality data collection and reduces experimenter bias or the placebo effect
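Random sampling can be sketched in a few lines of Python. This is only an illustration: the list of 500 student ID numbers and the helper name draw_random_sample are hypothetical and do not come from the slides.

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Return sample_size distinct members chosen uniformly at random."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical sampling frame: ID numbers for 500 students.
student_ids = list(range(1, 501))

# Every student has the same chance of being picked, which avoids
# convenience or judgement sampling.
sample = draw_random_sample(student_ids, 50, seed=1)
```

Because every individual has an equal chance of selection, the sample is less likely to over-represent one convenient group.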
Reliable and valid data
O Reliability describes the overall consistency of a measure. A measure is said to have high reliability if it produces similar results under consistent conditions. For example, measurements of people’s height and weight are often extremely reliable.
O Validity is the extent to which a concept, conclusion or measurement is well-founded and corresponds accurately to the real world. “You are measuring what you’re supposed to measure”
Range
O Measures the spread of data
O The difference between the largest and smallest observed values
O If one data point is unusually large/small, it has a great effect on the range and is called an outlier (outliers can often indicate an error in the experiment and are often eliminated).
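The effect of a single outlier on the range can be seen in a short Python sketch. The measurements below are hypothetical values chosen only for illustration.

```python
def data_range(values):
    """Range = largest observed value minus smallest observed value."""
    return max(values) - min(values)


# Hypothetical measurements, chosen only for illustration.
measurements = [5, 6, 7, 7, 8]
with_outlier = measurements + [25]  # one unusually large data point

range_without = data_range(measurements)
range_with = data_range(with_outlier)
```

One extra data point changes the range from 3 to 20, even though the other five values are unchanged.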
Averages
O Averages are the central tendencies of the data. There are three types:
O Mean- sum of all the results divided by the number of results
O Median- the middle value when the results are arranged in order
O Mode- the value that appears the greatest number of times
Example
O Find the mean, median and mode of the following data set
O 1, 2, 2, 5, 6, 7, 11, 11, 11, 12
O Mean- 68/10 = 6.8
O Median- (6 + 7)/2 = 6.5
O Mode- 11
O When no numbers repeat then you do not have a mode
O If the mean, median and mode are all approximately the same then the data are roughly symmetric, which is consistent with a normal distribution
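The three averages for the example data set can be checked with Python's standard library. The helper mode_or_none is a hypothetical name; it follows the slide's rule that a data set with no repeated value has no mode. If several values tie for most frequent, it simply returns the first one counted.

```python
from collections import Counter
from statistics import mean, median


def mode_or_none(values):
    """Return the most frequent value, or None when no value repeats."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > 1 else None


data = [1, 2, 2, 5, 6, 7, 11, 11, 11, 12]

data_mean = mean(data)          # sum of results / number of results
data_median = median(data)      # average of the two middle values, 6 and 7
data_mode = mode_or_none(data)  # 11 appears three times
```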
Averages
O Averages do not tell us everything about a sample.
O May not be representative of the entire population
O Two samples of a population could be different from one another. There is bound to be natural variation
Sample 1: 5 round, 1 oval
Sample 2: 2 round, 3 oval
Standard Deviation
O Samples can be very uniform (bunched around the mean) or spread out a long way from the mean
O The statistic that measures this spread is called the standard deviation
Standard deviation
O A measure of how the individual data points are distributed around the mean
O Allows us to compare the means/spread of data between two or more samples
O Tells us how tightly the data points are clustered around the mean and therefore how variable the data are (outliers increase the SD).
O When the data points are clustered, the SD is very small and when they are spread apart the SD is large
Standard deviation and error bars
O A graphical representation of variability
O Can be used to show range of data or SD
O In design labs, students often use their SD to represent their error bars on their graphs
O A large SD indicates high variability, which may point to large error or unreliable results
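Example
O Calculate the SD of a sample- Four children are aged 5, 6, 8 and 9.
O Step 1: Find the mean $\bar{x}$
O $x_1 = 5$, $x_2 = 6$, $x_3 = 8$, $x_4 = 9$ and $N$ (population) $= 4$
O $\bar{x} = (x_1 + x_2 + x_3 + x_4)/N = 28/4$
O $\bar{x} = 7$
O Step 2: Find the SD $\sigma$
O $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}}$
O $\sigma = \sqrt{\frac{(5-7)^2 + (6-7)^2 + (8-7)^2 + (9-7)^2}{4}}$
O $\sigma = \sqrt{10/4}$
O $\sigma = 1.58$
O Therefore the average age of the children is 7, with an SD of 1.58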
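The worked example can be reproduced with Python's statistics module. Note the choice of function: pstdev divides by N, as the slide does, while stdev divides by N - 1 and is the usual choice when the four children are treated as a sample of a larger population.

```python
from statistics import mean, pstdev, stdev

ages = [5, 6, 8, 9]

mean_age = mean(ages)         # (5 + 6 + 8 + 9) / 4
population_sd = pstdev(ages)  # divides by N, matching the slide
sample_sd = stdev(ages)       # divides by N - 1, slightly larger
```

Rounded to two decimal places, the population SD is 1.58 and the sample SD is 1.83.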
Distribution
O Consider a population of bean plants with a mean height of 7cm
O Normal Distribution- A spread of data that is symmetrically distributed on either side of the mean
O A flat bell curve- data widely spread
O A tall and narrow curve- data is very close to the mean
O Standard normal curve- about 68% of all values lie within +/- 1 SD of the mean and about 95% of all values lie within +/- 2 SD of the mean
O As the shape of the bell curve changes, the SD value changes with it, so that +/- 1 SD still covers about 68% of the data set and +/- 2 SD about 95%.
[Figure: bell curve, region labelled = 68% or +/- 1 SD]
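The 68% and 95% figures can be checked with statistics.NormalDist from the Python standard library. The bean plant SD of 1.5 cm is a hypothetical value used only to show that the percentages do not depend on the particular mean or SD.

```python
from statistics import NormalDist


def fraction_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


within_1_sd = fraction_within(1)  # roughly 0.68
within_2_sd = fraction_within(2)  # roughly 0.95

# Bean plants with mean 7 cm and an assumed (hypothetical) SD of 1.5 cm.
beans_within_1_sd = fraction_within(1, mu=7.0, sigma=1.5)
```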
The t-test
O To assess whether the means of two groups are statistically different from each other
O Used when you want to compare the means of two groups
O Ex. Is there a statistically significant difference in mean height between a group of boys and a group of girls at the age of 12?
The t-test
• Notice that all three examples below have the same difference between means
• Yet they all tell different stories. They all have different variability.
• The two groups with low variability around their means are visibly most different from each other, and the groups with high variability are most similar to each other
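T-test
O We can judge the difference between means relative to their spread or variability using the t-test
O The formula is a ratio: the difference between the group means divided by the variability of the groups
The formula
$t = \frac{M_X - M_Y}{\sqrt{\frac{SD_X^2}{n_X} + \frac{SD_Y^2}{n_Y}}}$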
Example
O Problem: Sam Sleepresearcher hypothesizes that people who are allowed to sleep for only four hours will score significantly lower than people who are allowed to sleep for eight hours on a cognitive skills test. He brings 16 participants into his sleep lab and randomly assigns them to one of two groups of 8. In one group he has participants sleep for eight hours and in the other group he has them sleep for four. The morning after, he administers the SCAT (Sam's Cognitive Ability Test) to the participants. (Scores on the SCAT range from 1-9 with high scores representing better performance).
SCAT scores
8 hours sleep group (X)
5 7 5 3 5 3 3 9
4 hours sleep group (Y)
8 1 4 6 6 4 1 2
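Step 1- calculate degrees of freedom
Df (paired t-test) = sample size - 1
Df (unpaired) = n1 + n2 - 2
Calculating the t-value:
M_X (8 hours) = 5, M_Y (4 hours) = 4
Variance SD²_X (8 hours) = 4.571, SD²_Y (4 hours) = 6.571
n_X (8 hours) = 8 and n_Y (4 hours) = 8
Step 1: Find the means for both groups and subtract: 5 - 4 = 1
Step 2: Calculate the variance (SD²) of each group: 4.571 and 6.571
Step 3: Divide each variance by the sample size and add: 4.571/8 + 6.571/8 = 1.393
Step 4: Square root the denominator: √1.393 = 1.180, so t = 1/1.180 = 0.847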
Step 3- use t-table
O Once the t-value is calculated, you look it up in a table of significance to see whether the ratio (t-value) is large enough to say that the difference between the groups is not likely due to chance
O Statisticians like to be 95% confident that their conclusions are significant, so we use the risk value or p-value of p<0.05. At p = 0.05, a difference this large would arise by chance about 5% of the time, vs. p = 0.1, where it would arise by chance 10% of the time
O If p>0.05, the means are not considered statistically different
O According to the t significance/probability table (one-tailed, p = 0.05) with df = n1 + n2 - 2 = 14 for these two independent groups, t must be at least 1.761 to be significant
O Since our t = 0.847 is below this value, p>0.05 (it would fall at a lower confidence level, between 0.25 and 0.1), so this difference is not statistically significant
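The SCAT example can be checked numerically with a short Python sketch. It assumes the unpaired form of the t-test (two separate groups), using the ratio described in the steps: difference between means over the square root of the summed variance/sample-size terms. The function name unpaired_t is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(group_x, group_y):
    """Return (t, df) for two independent groups."""
    mean_diff = mean(group_x) - mean(group_y)
    # variance() divides by n - 1, giving 4.571 and 6.571 for these groups.
    standard_error = sqrt(variance(group_x) / len(group_x)
                          + variance(group_y) / len(group_y))
    df = len(group_x) + len(group_y) - 2  # unpaired degrees of freedom
    return mean_diff / standard_error, df


eight_hours = [5, 7, 5, 3, 5, 3, 3, 9]  # group X
four_hours = [8, 1, 4, 6, 6, 4, 1, 2]   # group Y

t_value, df = unpaired_t(eight_hours, four_hours)
```

The calculated t of about 0.847 is then compared against the critical value from a t-table for the chosen significance level.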
Correlation vs. Causation
O “correlation does not imply causation”- means that correlation alone cannot be used to infer a causal relationship; the causes underlying the correlation may be indirect or unknown
O Cause: a carefully designed experiment and its evidence can determine that A causes B
O Correlation: observations, without a controlled experiment, can only show that A and B are related
Fallacy Examples
O Ice cream sales correlate with the number of people who drown at sea. Therefore ice cream causes people to drown.
O Children who sleep with a light on are more likely to develop myopia (nearsightedness)
O Does light cause myopia?
O Atmospheric CO2 has been climbing in conjunction with increased crime
O Does CO2 cause crime?
O A mathematical correlation test produces a value r, which signifies the strength and direction of the correlation between two variables and ranges from -1 to +1
O r = +1 positive correlation (as X increases so does Y)
O r = 0 no correlation
O r = -1 negative correlation (as X increases Y decreases)
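The slides do not name a specific correlation test. As one common choice, the sketch below assumes the Pearson correlation coefficient, with small hypothetical data sets picked to give the three cases listed above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]  # hypothetical data
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # Y rises with X
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # Y falls as X rises
r_none = pearson_r(x, [2, 1, 3, 1, 2])       # no linear trend
```

Even a value of r close to +1 or -1 only shows that the two variables are related, not that one causes the other.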
Accuracy & Precision
O Accuracy: how close a measured value is to the true value
O Precision: how close the measured values are to each other
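One way to put numbers on these two ideas, offered here only as an assumption and not as the slides' method, is to use the distance of the mean from the true value for accuracy and the SD of repeated readings for precision. All readings below are hypothetical.

```python
from statistics import mean, pstdev

TRUE_VALUE = 100  # hypothetical known true value


def accuracy_error(readings, true_value):
    """How far the mean of the readings lies from the true value."""
    return abs(mean(readings) - true_value)


def precision_spread(readings):
    """How far the readings lie from each other, measured as their SD."""
    return pstdev(readings)


accurate_but_imprecise = [95, 105, 98, 102]   # centred on 100, scattered
precise_but_inaccurate = [110, 111, 110, 109]  # tightly grouped, off target
```

The first set has no accuracy error but a larger spread; the second set has a small spread but sits 10 units away from the true value.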
Errors and Uncertainties
O Examples:
O Human errors- can occur when tools or instruments are used or read incorrectly. (E.g. a thermometer reading must be taken after stirring, with the bulb still in the liquid but not touching the bottom)
O Systematic- experimenter does not know how to use the equipment or something is wrong with the equipment.
O Random – unknown or unpredictable changes
Systematic
O Note that systematic and random errors refer to problems associated with making measurements. Mistakes made in the calculations or in reading the instrument are not considered in error analysis. It is assumed that the experimenters are careful and competent! (Such mistakes are not acceptable in your design lab)
O Can be reduced if equipment is regularly checked or calibrated to ensure proper function
O Procedural systematic errors are acceptable, i.e. identifying a problem with your procedure/controls.
Random
O Random errors are statistical fluctuations (in either direction) in the measured data due to the precision limitations of the measurement device. Random errors usually result from the experimenter's inability to take the same measurement in exactly the same way to get exactly the same number.
O In biology this can be a result of changes in the materials used or changes in conditions
O Controlled by carefully selecting material, carefully controlling variables and repeating trials
Uncertainties & Significant Figures
O Uncertainties – used in biology since they are the best choice for quantitative lab work
O Sig Figs- are useful when doing calculations from a textbook and you do not know the accuracy of the measuring device.
O They are mutually exclusive systems…you use one or the other!
Things to Remember
O When adding or subtracting, add the absolute uncertainties
O When dividing, convert to percent uncertainty, then add the percent uncertainties
O If units are, for example, g/ml, convert back to absolute uncertainty
O If units are percent change, then convert back and multiply by 100 to get back to % units
O When taking an average, add the uncertainties of the values and divide the total by N
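These rules can be sketched in Python. The function names and the density and temperature readings are hypothetical. The averaging rule is read here as "sum the uncertainties, then divide by N", which is an interpretation of the slide and not something it states in full.

```python
def add_or_subtract_unc(unc_a, unc_b):
    """Absolute uncertainty of a + b or a - b: the uncertainties add."""
    return unc_a + unc_b


def percent_unc(value, unc):
    """Convert an absolute uncertainty to a percent uncertainty."""
    return unc / value * 100


def divide_with_unc(a, unc_a, b, unc_b):
    """Return (a / b, absolute uncertainty) by adding percent uncertainties."""
    quotient = a / b
    total_percent = percent_unc(a, unc_a) + percent_unc(b, unc_b)
    return quotient, quotient * total_percent / 100  # back to absolute units


def average_with_unc(values, uncs):
    """Return (mean, uncertainty): summed uncertainties divided by N."""
    n = len(values)
    return sum(values) / n, sum(uncs) / n


# Hypothetical density: 10.0 g +/- 0.1 g divided by 5.0 ml +/- 0.5 ml.
# Percent uncertainties are 1% and 10%, so the result carries 11%.
density, density_unc = divide_with_unc(10.0, 0.1, 5.0, 0.5)

# Hypothetical temperatures, each read to +/- 0.5 degrees.
avg_temp, avg_temp_unc = average_with_unc([20, 21, 22], [0.5, 0.5, 0.5])
```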
The act of measuring
O When a measurement is taken, this can affect the environment of the experiment.
O Ex. When a cold thermometer is used to measure warm water, the thermometer may cool the water
O Ex. The presence of the experimenter influences the behaviour of the animal being observed
Replicates and Samples
O Biological systems, because of their complexity and variability, require replicate observations and multiple samples of material.
O In IB you can choose to do a 5X5 or a 2X10
O 5X5- 5 changes to the independent variable, each measured 5 times
O 2X10- 2 changes to the independent variable, each measured 10 times
Degrees of precision
O If it is digital, then use the value of the least known digit (e.g. the mass on the scale says 1.01g, then your uncertainty is +/- 0.01g)
O If it is analog, as in the case of a thermometer, then use the least known digit divided by 2
O Always include your degrees of precision for every measuring device in your lab (especially in your tables)
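The digital/analog rule above can be written as a small Python helper. The function name and the thermometer with 1 degree markings are hypothetical. The balance reading of 1.01 g comes from the slide.

```python
def instrument_uncertainty(least_known_digit, digital):
    """Digital: the least known digit. Analog: half of the least known digit."""
    return least_known_digit if digital else least_known_digit / 2


# Digital balance reading 1.01 g: least known digit is 0.01 g.
balance_unc = instrument_uncertainty(0.01, digital=True)

# Hypothetical analog thermometer marked every 1 degree.
thermometer_unc = instrument_uncertainty(1.0, digital=False)

# A reading reported with its degree of precision, as it would go in a table.
mass_label = f"1.01 g +/- {balance_unc} g"
```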