lab 1 intro


Uploaded by erik-d-davenport on 20-Feb-2017


TRANSCRIPT

Page 1: Lab 1 intro

Labs & assignments

Lab activities will parallel lecture material (to the extent possible), and handout materials will be used as appropriate.

All lab assignments must be submitted via Blackboard one week after the assigned date unless otherwise noted by the instructor.

No duplicated labs!

Page 2: Lab 1 intro

Biol 205: Lab 1

Ecological Data & Descriptive Statistics

Dr. Davenport

Page 3: Lab 1 intro

Objectives

• Why statistics, and what is statistics?

• What is data?

• Basic principle of statistics: the relationship between a (statistical) population and a sample

• Descriptive statistics

• Assignment and questions

Page 4: Lab 1 intro

Why statistics?

How do we draw intelligent judgments in the presence of uncertainty?

Page 5: Lab 1 intro

Statistics is a branch of applied mathematics that helps us to make intelligent judgements and informed decisions in the presence of uncertainty and variation.

• Useful in the planning of experiments and studies that will result in meaningful data.

• Provides a set of tools to extract and understand information resulting from experiments.

Page 6: Lab 1 intro

Data is:

• a collection of facts from which conclusions may be drawn;

• a representation of facts, concepts, or instructions in a formal manner suitable for communication, interpretation, or processing by human beings or by computers;

• a formal representation of raw material from which information is constructed via processing or interpretation.

Page 7: Lab 1 intro

Why do you need data? Basic principle of statistics

Data is essential for presenting, summarizing, and interpreting ecological phenomena.

However, it is usually impossible or impractical to monitor an entire habitat or to measure every organism in a given area.

So, most of the time, only part of the population is sampled when you acquire a set of data.

Page 8: Lab 1 intro


Population

The entire group of individuals is called the population. For example, a researcher may be interested in the relation between class size (variable 1) and academic performance (variable 2) for a population of third-grade children.

Page 9: Lab 1 intro


Sample

Populations are usually so large that a researcher cannot examine the entire group. Therefore, a sample (a subset of the population) is selected to represent the population in a research study. The goal is to use the results obtained from the sample to infer information about the population.

Page 10: Lab 1 intro

Basic principle of statistics

Page 11: Lab 1 intro

Summary

Population: the set of all measurements of interest.

Sample: a subset of measurements of interest to the investigator.

[Figure: diagram linking Population, Sample, and Statistics]

Page 12: Lab 1 intro

Selecting Samples

Samples should be taken at random. Why?

Random sampling implies that each measurement in the population has an equal opportunity of being selected as part of your sample.

Otherwise, your samples could be biased.
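The following is a minimal Python sketch of simple random sampling. It assumes the population is available as a list of measurements. For illustration, it reuses this lab's example data set as a small stand-in "population" (an assumption). `random.Random.sample` draws without replacement, so every measurement has the same chance of being picked. The seed value is arbitrary.

```python
import random

# Stand-in "population": the lab's example measurements (an assumption for illustration).
population = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

def simple_random_sample(values, n, seed=None):
    """Draw n measurements without replacement; each one has an equal chance of selection."""
    rng = random.Random(seed)  # private generator so the draw is reproducible
    return rng.sample(values, n)

sample = simple_random_sample(population, 5, seed=1)
print(sample)
```

Picking "convenient" values by hand, such as the first five or only the largest fish, would break the equal-chance property and can bias the sample.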

Page 13: Lab 1 intro

Sampling Replication

Why do we need replication?

A single measurement generally is insufficient to draw a conclusion about a population.
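The sketch below illustrates why replication helps. It uses a hypothetical population: normally distributed measurements with mean 50 and standard deviation 10 (assumed values, not lab data). It repeatedly estimates the population mean from a single measurement and from the average of 10 replicates, then compares how widely the two kinds of estimates scatter.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the demonstration is reproducible

# Hypothetical population: normal, mean 50, SD 10 (assumed values for illustration).
TRUE_MEAN, TRUE_SD = 50.0, 10.0

def estimate(n_replicates):
    """Estimate the population mean from n_replicates independent measurements."""
    return statistics.mean(rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n_replicates))

# Repeat each sampling design 1000 times and measure how much the estimates scatter.
single = [estimate(1) for _ in range(1000)]
replicated = [estimate(10) for _ in range(1000)]

print(round(statistics.stdev(single), 2))      # scatter of one-measurement estimates
print(round(statistics.stdev(replicated), 2))  # scatter of 10-replicate estimates
```

Estimates based on 10 replicates cluster much more tightly around the true mean than single measurements do.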

Page 14: Lab 1 intro

Definitions

Descriptive Statistics: basic tools for summarizing and presenting numerical data.

Page 15: Lab 1 intro

Low Birth Weight Data

Abbreviation   Variable
ID             Identification Code
LOW            Low Birth Weight (0 = Birth Weight >= 2500 g, 1 = Birth Weight < 2500 g)
AGE            Age of the Mother in Years
LWT            Weight in Pounds at the Last Menstrual Period
RACE           Race (1 = White, 2 = Black, 3 = Other)
SMOKE          Smoking Status During Pregnancy (1 = Yes, 0 = No)
PTL            History of Premature Labor (0 = None, 1 = One, etc.)
HT             History of Hypertension (1 = Yes, 0 = No)
UI             Presence of Uterine Irritability (1 = Yes, 0 = No)
FTV            Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.)
BWT            Birth Weight in Grams

Page 16: Lab 1 intro

Low Birth Weight Data

ID  LOW  AGE  LWT  RACE  SMOKE  PTL  HT  UI  FTV  BWT
85  0    19   182  2     0      0    0   1   0    2523
86  0    33   155  3     0      0    0   0   3    2551
87  0    20   105  1     1      0    0   0   1    2557
88  0    21   108  1     1      0    0   1   2    2594
89  0    18   107  1     1      0    0   1   0    2600
91  0    21   124  3     0      0    0   0   0    2622
92  0    22   118  1     0      0    0   0   1    2637
76  1    20   105  3     0      0    0   0   3    2450
77  1    26   190  1     1      0    0   0   0    2466
78  1    14   101  3     1      1    0   0   0    2466
79  1    28   95   1     1      0    0   0   2    2466
81  1    14   100  3     0      0    0   0   2    2495
82  1    23   94   3     1      0    0   0   0    2495
83  1    17   142  2     0      0    1   0   0    2495
84  1    21   130  1     1      0    1   0   3    2495

Source: Hosmer and Lemeshow (2000), Applied Logistic Regression, 2nd edition, John Wiley & Sons.

N = 189 (only 15 of the 189 records are shown above)

Page 17: Lab 1 intro

Data Presentation

Three ways to summarize, or describe data:

1. Tables

2. Graphics

3. Basic Summary Statistics

Page 18: Lab 1 intro

Tabulations

Tables are used to describe qualitative data. The tables simply present the counts, or frequencies, observed in each category of a variable of interest.

Race    Count   %
White   96      51
Black   26      14
Other   67      35

Page 19: Lab 1 intro

Tabulations

Physician Visits During the 1st Trimester

Visits         Count   Percent
None           100     52.9
One            47      24.9
Two            30      15.9
Three          7       3.7
Four or More   5       2.6
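The percentages in both tables are each count divided by the total (N = 189), times 100. The following is a minimal Python sketch that rebuilds them from the counts. `frequency_table` is a helper written here for illustration, not a library function.

```python
# Category counts copied from the two tables above (N = 189).
race_counts = {"White": 96, "Black": 26, "Other": 67}
visit_counts = {"None": 100, "One": 47, "Two": 30, "Three": 7, "Four or More": 5}

def frequency_table(counts, decimals=0):
    """Return (category, count, percent) rows, where percent = 100 * count / total."""
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, decimals)) for cat, n in counts.items()]

for cat, n, pct in frequency_table(race_counts):
    print(f"{cat:<13}{n:>6}{pct:>7.0f}")
for cat, n, pct in frequency_table(visit_counts, decimals=1):
    print(f"{cat:<13}{n:>6}{pct:>7.1f}")
```

The `decimals` parameter matches each table's rounding: whole percents for race and one decimal place for physician visits.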

Page 20: Lab 1 intro

Bar Chart: Physician Visits During First Trimester

[Figure: bar chart of the counts for No Visits, One Visit, Two Visits, Three Visits, and Four or More; y-axis from 0 to 100]
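A plotting library such as matplotlib (`pyplot.bar`) is the usual tool for bar charts. As a dependency-free sketch, the same chart can be approximated in plain text: each bar's length is proportional to its count, scaled so the tallest bar is 50 characters wide. The `text_bar_chart` helper and its `width` parameter are illustrative assumptions.

```python
# Physician-visit counts from the tabulation of the 1st trimester.
visits = {"No Visits": 100, "One Visit": 47, "Two Visits": 30,
          "Three Visits": 7, "Four or More": 5}

def text_bar_chart(counts, width=50):
    """Return one line per category, with bar length proportional to its count."""
    largest = max(counts.values())
    lines = []
    for label, count in counts.items():
        bar = "#" * round(width * count / largest)  # largest bar is `width` characters
        lines.append(f"{label:<13}|{bar} {count}")
    return lines

print("\n".join(text_bar_chart(visits)))
```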

Page 21: Lab 1 intro

Pie Chart: Physician Visits During First Trimester

[Figure: pie chart of the number of physician visits, with slices for No Visits, One Visit, Two Visits, Three Visits, and Four or More]
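In a pie chart, each slice's angle is its share of the total times 360 degrees. The following is a minimal Python sketch of that arithmetic for the physician-visit counts. Plotting libraries (for example matplotlib's `pyplot.pie`) do this conversion for you, and `pie_slice_angles` is a helper written here for illustration.

```python
# Physician-visit counts from the tabulation of the 1st trimester.
visits = {"No Visits": 100, "One Visit": 47, "Two Visits": 30,
          "Three Visits": 7, "Four or More": 5}

def pie_slice_angles(counts):
    """Return each category's slice angle in degrees: 360 * count / total."""
    total = sum(counts.values())
    return {label: 360 * count / total for label, count in counts.items()}

for label, angle in pie_slice_angles(visits).items():
    print(f"{label:<13}{angle:6.1f} degrees")
```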

Page 22: Lab 1 intro

Summary StatisticsMeasures of Center (Central Tendency)

MeanMedianModeMeasures of Spread (Variability)RangeVarianceStandard Deviation

Page 23: Lab 1 intro

Mean

The mean of a data set is the average of all the data values. If the data are from a sample, the mean is denoted by x̄ ("x-bar"):

$$\bar{x} = \frac{\sum x_i}{n}$$

If the data are from a population, the mean is denoted by μ (“mu”):

$$\mu = \frac{\sum x_i}{N}$$

Page 24: Lab 1 intro

Measures of Center

Mean (average): the sum of the sampled values divided by the number of values sampled.

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

n = sample size
X_i = sampled value
Σ = symbol for summation
μ = population mean
X̄ = sample mean

Page 25: Lab 1 intro

Measures of Center

Example:

30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{15}(30 + 26 + \cdots + 37) = 32.47$$

Note: The mean is sensitive to extreme values.

30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37, 113

$$\bar{X} = 37.50$$

How do extreme values affect the mean?
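Both means on this slide can be checked with a few lines of Python. This is a minimal sketch; `mean` is a helper written here, though the standard library's `statistics.mean` gives the same result.

```python
# Example data from the slide; the second call adds the extreme value 113.
data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

def mean(values):
    """Sample mean: the sum of the values divided by n."""
    return sum(values) / len(values)

print(round(mean(data), 2))          # 32.47
print(round(mean(data + [113]), 2))  # 37.5
```

A single extreme value shifts the mean by about 5 units, because every value, including 113, contributes directly to the sum.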

Page 26: Lab 1 intro

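Measures of Center

Median: the value of a set of measurements that falls in the middle position when the data are ordered from smallest to largest. Writing x_[k] for the k-th value in the ordered data:

$$\tilde{x} = \begin{cases} x_{[(n+1)/2]} & \text{if } n \text{ is odd} \\ \dfrac{x_{[n/2]} + x_{[n/2+1]}}{2} & \text{if } n \text{ is even} \end{cases}$$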

Page 27: Lab 1 intro

Measures of Center

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52

n = 15 is odd, so the median is the 8th value: 30.

Why the 8th? (15 + 1)/2 = 8

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52, 113

How do extreme values affect the median?

Now n = 16 is even, so the median is the average of the 8th and 9th values: (30 + 31)/2 = 30.5, not much different from the median of the original data (30)!
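The following is a minimal Python sketch of the median rule above, with its odd and even cases. `median` is written here for illustration; the standard library's `statistics.median` follows the same rule. Python lists are indexed from 0, so 1-based position k is index k - 1.

```python
data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # 1-based position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # 1-based positions n/2 and n/2 + 1

print(median(data))          # 30
print(median(data + [113]))  # 30.5
```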

Page 28: Lab 1 intro

Measures of Center

Mode: the value in a set of measurements that occurs most frequently. In our example data, the mode is 26.

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52

26 is the mode (it occurs twice).

Fact: For data that are symmetric and unimodal, the mean, median, and mode are similar.
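The following is a minimal Python sketch for finding the mode by counting how often each value occurs with `collections.Counter`. It returns a list because a data set can have ties; the standard library's `statistics.multimode` (Python 3.8+) behaves the same way. `modes` is a helper written here for illustration.

```python
from collections import Counter

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

def modes(values):
    """Return every value that occurs with the highest frequency (there may be ties)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes(data))  # [26]
```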

Page 29: Lab 1 intro

Measures of Spread

Range: the difference between the largest and smallest sample measurements. In our example, the range is 36.

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52

Note: Two data sets may have the same range, but very different shape and variability.

R = 52-16 = 36
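The range uses only the two extreme values, which is why it says little about the shape of the data. The following is a minimal Python sketch (`value_range` is a helper written here for illustration):

```python
data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

def value_range(values):
    """Range: largest measurement minus smallest measurement."""
    return max(values) - min(values)

print(value_range(data))                    # 36
# Only the extremes matter: dropping every interior value leaves the range unchanged.
print(value_range([min(data), max(data)]))  # 36
```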

Page 30: Lab 1 intro

Measures of Spread

Sum of squared deviations from the mean, which is referred to simply as the sum of squares (SS):

$$SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$$
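The following is a minimal Python sketch of the sum of squares for the example data: subtract the mean from each value, square, and add up. `sum_of_squares` is a helper written here for illustration.

```python
data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

def sum_of_squares(values):
    """SS: the sum of squared deviations of each value from the sample mean."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values)

print(round(sum_of_squares(data), 2))  # 1581.73
```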

Page 31: Lab 1 intro

Measures of Spread

Variance (s²): the sum of the squares of the deviations divided by the sample size minus one.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Standard Deviation (s): the square root of the variance.

$$s = \sqrt{s^2}$$

Page 32: Lab 1 intro

Measures of Spread

Degrees of freedom (DF):

DF = n - 1, the divisor used in the sample variance s².
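The following is a minimal Python sketch of the two definitions above: the variance divides SS by the degrees of freedom (n - 1), and the standard deviation is its square root. Both helper names are written here for illustration.

```python
import math

data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

def sample_variance(values):
    """s^2 = SS / (n - 1), where SS is the sum of squared deviations from the mean."""
    n = len(values)
    xbar = sum(values) / n
    ss = sum((x - xbar) ** 2 for x in values)
    return ss / (n - 1)  # divide by the degrees of freedom, DF = n - 1

def sample_sd(values):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(values))

print(round(sample_variance(data), 2))  # 112.98
print(round(sample_sd(data), 2))        # 10.63
```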

Page 33: Lab 1 intro

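Measures of Spread

A computationally more convenient formula to calculate the variance:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$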
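The shortcut formula needs only two running totals, Σx and Σx², rather than a second pass over the data to compute deviations. The following is a minimal Python sketch comparing it with the definitional formula. Both helper names are written here for illustration.

```python
data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

def variance_shortcut(values):
    """s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1), using only running sums."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def variance_definition(values):
    """s^2 = sum of (x - mean)^2 divided by (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

print(sum(data), sum(x * x for x in data))  # 487 17393
print(round(variance_shortcut(data), 2))    # 112.98
```

On a computer, the shortcut can lose precision when the values are large but their spread is small, because it subtracts two nearly equal numbers. For hand calculation on data like these, it is convenient and exact.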

Page 34: Lab 1 intro

Measures of Spread

The variance and standard deviation for our example are:

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52

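$$s^2 \approx 112.98, \quad s \approx 10.63$$

With the extreme value 113 added (n = 16), the variance rises to s² = 510.8 and the standard deviation to s = 22.6. Like the mean, the variance and standard deviation are sensitive to extreme values.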
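Python's standard `statistics` module provides `variance` and `stdev`, which use the same n - 1 divisor as the formulas above. They offer a quick way to check both data sets on this slide.

```python
import statistics

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]
with_outlier = data + [113]

# Original 15 values
print(round(statistics.variance(data), 2), round(statistics.stdev(data), 2))  # 112.98 10.63
# Same values plus the extreme value 113 (n = 16)
print(round(statistics.variance(with_outlier), 1), round(statistics.stdev(with_outlier), 1))  # 510.8 22.6
```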

Page 35: Lab 1 intro

Normal Distribution

[Figure: normal distribution probability density curves. Source: https://en.wikipedia.org/wiki/Normal_distribution#/media/File:Normal_Distribution_PDF.svg]

Page 36: Lab 1 intro

Lab 1: Assignment

As a fishery scientist working for NOAA, you have done extensive research on the striped bass (rockfish) population in the Chesapeake Bay. In one of your studies, you gathered data on the age structure of the rockfish population in the Chesapeake Bay, and you need to do some statistical analysis before you can present your data to the public.

The fish you sampled fall into three age groups: age 1 (1 year old), age 2 (2 years old), and age 3 (3 years old).

Page 37: Lab 1 intro

Lab 1: Questions

1. What is a statistical population (N)? What is a sample (n)? What is the relationship between a statistical population and a sample? What information can the sample (n) tell us about the statistical population (N)?

2. Write the definitions (formulas) for variance and standard deviation.

3. Draw a bar chart and a pie chart of the number of fish in each age group (the age structure of your sample).

Page 38: Lab 1 intro

Lab 1: Questions (continued)

4. What is the average weight of the fish in your entire sample?

5. What are the average weights of the fish in each age group (age 1, age 2, and age 3)?

6. What is the median weight of the fish in the age 1 group? What is the median weight of the fish in the age 3 group?

7. What is the range of weights for the fish in the age 2 group?

Page 39: Lab 1 intro

Lab 1: Questions (continued)

8. Calculate the variance of the weights of the fish in the age 2 group.

9. Calculate the standard deviation of the weights of the fish in the age 1 group.