
Post on 17-May-2020


Page 1: Welcome to STAT201 – Statistics for Life Sciences Today’s …jackd/Stat201/Lecture_Wk01.pdf · 2017-01-22 · By the end of this course, successful students will be able to: 1

Welcome to STAT201 – Statistics for Life Sciences

Today’s agenda:

- Introduction

- Policies

- Data types


Contact:

E-mail: [email protected]

Course website: http://www.sfu.ca/~jackd/Stat201

Office Hours: TBA, but probably...

Tues 12-1pm in the learning commons,

Thursday 3-4pm in my office TBA


By the end of this course, successful students will be able to:

1. Interpret the results and graphs of a range of popular statistical methods.

2. Determine which of these methods, if any, is appropriate for a given data-based problem.

3. Evaluate the validity of key assumptions for each of these methods.

4. Communicate their needs when problems are beyond their statistical expertise.


These goals could describe many statistical courses. The differences between each course are the methods in question.

For Stat 201 and 203, these methods are

One Sample T-tests,

Two Sample T-tests,

THREE+ Sample T-tests (In other words ANOVA),

FOUR Samp Regression,

And Contingencies.


Also, there are some fundamental concepts that are common to the methods we’re looking at (and almost every other statistical method outside of this course too)

These are…

Descriptive statistics,

Probability,

Sampling,

Hypothesis testing.


Grading Scheme

- 5 assignments, worth 3x5 = 15%

- 2 midterms, worth 2x20 = 40%

- Final Exam worth 40%

- Tophat participation worth 5%
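As a rough illustration of how these weights combine, here is a minimal Python sketch (Python is an assumption made for illustration; the course software is R). The student scores are invented, and the function name `course_grade` is hypothetical.

```python
# Weights taken from the grading scheme above; they add up to 100%.
WEIGHTS = {"assignments": 0.15, "midterms": 0.40, "final": 0.40, "tophat": 0.05}


def course_grade(scores):
    """Weighted course grade in percent; each score is a percent from 0 to 100."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


# Hypothetical student, invented for illustration only.
example = {"assignments": 90, "midterms": 70, "final": 65, "tophat": 100}
print(round(course_grade(example), 2))
```

Because the final is worth 40%, a weak result in the earlier components is hard to recover with the final exam alone.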


My assumption is at the beginning of the semester you are…

- Fresh from the break, but probably not super enthusiastic about class.

- Possibly apprehensive about doing a quantitative class away from your major.

- Mildly interested in statistics, but not as much as in your own field.


My hope is at the end of the semester you are…

- Less intimidated by stats than at the beginning of the semester.

- Able to handle the most common kinds of statistical problems, and know what kinds of questions to ask of a specialist when something more complex comes up.

- 3 credits wiser.


But what are YOUR hopes?

Are my assumptions correct? What are you hoping to get out of this course? Is there a particular topic you want to see? Are you just looking to pass the dreaded mandatory stats class and don’t want extra complications?

Over the next week, I’ll be giving you access to a survey with questions like these.

The answers to these questions will affect the theme of the examples that I use in class and on assignments.


Regarding TopHat

I’ll be using a student feedback / learning management system called Tophat.

It’s similar to an iClicker in that it can be used to poll the entire class, but open answers like numbers and writing can also be done because it uses responses from your phones/tablets/laptops instead of a separate device.

…it does, however, cost $24/semester.


About the textbook and other information sources.

You’re much better off with the textbook than without it. Statistics is a vast field, and looking particular topics up online isn’t always effective.

For example, if you look up ‘two-sample t-test’, you’ll find all sorts of mathematical proofs, as well as examples done painstakingly by hand. Neither of these will help for this course.

The focus of this course is NOT mathematics.


Having a copy of The Basic Practice of Statistics (BPS) is highly recommended for this course because it does a very good job in delivering what you need to know about the concepts and little else.

Also, since I will be including the wording of any assignment questions you need, older versions (6th and 5th editions) of the textbook will be viable for this course.

BPS is used in several courses, and has been for a while, so you may be able to find such a copy at a thrift store.


There is a statistics workshop in K9501, Shrum Science Center K, on the 9000 (main) level. It is on the way towards the Bus Stop / Club Ilia / Cornerstone Mews from here.

- The workshop has R-ready computers and on-site tutors who can help you Mon-Fri. Use the workshop early, and use it often.


- Stay ahead. Read the material before class so that you’re seeing it a second time here. This saves more time than you think it does. (Also gives you a buffer for papers in other classes).

- STAY AHEAD. This course gets MUCH harder after the first midterm, so if you have the spare time to read ahead to t-tests in your textbook now, do it.

- DON'T FALL BEHIND. In previous years, I’ve had people ask for help saying they need 80-100% on the final to pass the course. So far, all of them have failed.


About the assignments

The whole point of assignments is to give a means of practicing DOING statistics instead of just reading about them. That’s where the learning happens. The assignments are worth a lot less than exams so that you have a chance to make your own mistakes without being heavily penalized.

If you’re just going to copy someone else’s assignment, don’t bother.

The value of doing assignments isn’t for the 15% they are worth. The value is in the higher exam scores you will get.


Regarding software

For assignments, you will need to interpret output from the statistical software R.

I will also be providing the data sets and R code in order to produce this output.

This way if you want to explore the data to a greater depth or get better acquainted with R, you can, but for the assignments, you need only copy and paste R code.


You may also use SAS, JMP, SPSS, or Excel to get these results if you are already proficient in one of these programs.

R is open source and freely available for Windows, Mac, and Linux at

https://cran.r-project.org/

It is also available on the Play Store for Android phones.


Regarding collaboration, honesty, and plagiarism

None of the assignments or exams for this course are recycled from previous sources. Anyone claiming to have a test bank for this offering of this course is lying.

Please include the names of your collaborators on your assignments. This way, the markers will understand when some solutions look very similar that there wasn’t blind copying.


You are encouraged to work together to do the computational and analytical portions of the assignments. However, all written work is expected to be solely yours.

Copying the writing of another student, or using services to write assignments on your behalf, will be considered academically dishonest and will be dealt with under SFU’s academic dishonesty policy.

The use of proofreading and essay skills services, such as those in the Student Learning Commons, is perfectly fine.


Course Schedule

Tuesdays 1:30 – 2:20 in Surrey 5280

Thursdays 12:30 – 2:20 in Surrey 5280

Wk 1 – Hr 2-3 (Thur, Jan 5)

Schedule and policies.

Types/levels of data.

Descriptive statistics:

- Measures of centrality

(Mean, median, mode, trimmed mean)


Wk 2 – Hr 1 (Tue, Jan 10)

Wk 2 - Hr 2 and 3 (Thur, Jan 12)

Descriptive statistics:

- Measures of centrality

(Mean, median, mode, trimmed mean)

- Measures of spread

(MAD, Standard deviation, variance)

- Other measures

(Quantiles, skewness, shape parameters)


Wk 3 - Hr 1 (Tue, Jan 17)

Graphs:

- Bar chart, histogram, time plot, pie chart.

- Scatterplots

Correlation

Wk 3 - Hr 2 and 3 (Thur, Jan 19) ASSIGN 1 DUE at NOON

Probability:

- What is it?

- Basic rules, complement rule

- Weather example.


Wk 4 - Hr 1 (Tue, Jan 24)

Probability:

- Independence and mutual exclusion

- Multiplication rule

- Addition rule

Wk 4 - Hr 2 and 3 (Thur, Jan 26)

Probability:

- Conditional probability

- Binary trees

- Bayes Rule (as time permits)


Wk 5 – Hr 1 (Tue, Jan 31)

Sampling

- Simple random sample

- Stratified sample

- Non-random / convenience sample

- Snowball / recruitment sample (as time permits)

Wk 5 – Hr 2 and 3 (Thur, Feb 2) ASSIGN 2 DUE at NOON

Experiments and observational studies

- Examples and exam prep


Wk 6 - Hr 1 Midterm 1 (Tue, Feb 7)

Wk 6 - Hr 2 and 3 (Thur, Feb 9)

Overflow from previous weeks

Causality (as time permits)

Wk 7 Family day and reading week (Feb 11-19)


Wk 8 – Hr 1 (Tue, Feb 21)

Normal distribution

T-distribution

Degrees of freedom

Wk 8 – Hr 2 and 3 (Thur, Feb 23) ASSIGNMENT 3 DUE at NOON

Hypothesis testing

P-values


Wk 9 (Tue Feb 28, Thur Mar 2)

One-sample t-test (T-test of a single mean)

One-sided and two-sided tests

Confidence intervals

T-test for correlation


Wk 10, Hr 1 (Tue, Mar 7)

Two-sample t-test (unpaired)

- Equal variance vs unequal variance

- The sources of variance

- n=infinity, the connection to one-sample T

Wk 10, Hr 2 and 3 (Thur, Mar 9) ASSIGNMENT 4 DUE at NOON

More examples of the two-sample T-test


Wk 11 - Hr 1 Midterm 2 (Tue, Mar 14)

Wk 11 – Hr 2 and 3 (Thur, Mar 16)

Paired vs. Unpaired T-test

Exam Prep


Wk 12 – (Tue, Mar 21; Thur, Mar 23)

ANOVA (Analysis of Variance)

- … is a 3+ sample t-test

- The problem of multiple testing

- ANOVA tables

- The equal variance assumption

- Extra examples

- Tukey’s test (as time permits)


Wk 13, Hr 1 (Tue, Mar 28)

Contingency tables

- The test for independence

- Expected vs observed counts

- The chi-squared test

- Limitations

Wk 13, Hr 2 and 3 (Thur, Mar 30) ASSIGN 5 DUE at NOON

Regression

- Slopes and intercepts

- Making predictions

- Interpolation vs. Extrapolation


Wk 14 (Tue, Apr 4; Thur, Apr 6)

Regression

- Non-linearity

- Outliers

- Case study: Anscombe’s Quartet.

Extra examples and overflow.

Wrap up.

FINAL EXAM APRIL 9, SUNDAY, 3:30 PM.


This schedule is a plan,

Do not assume it is set in stone.


DATA TYPES

Nominal Data

- Nominal means ‘name’, as in the name is the most important part.

- Example: Sex – Male, Female, Other.

- Example: Favourite Ice Cream – Chocolate, Vanilla, Pistachio, Toenail, Anthrax, Rum-Raisin.


Nominal data can be expressed as a pie chart because we’re most interested in the relative frequency of each response (i.e. the relative size of each group).

[Pie chart: Gender. Man 48%, Woman 48%, Other 4%]

[Pie chart: Favourite Ice Cream. Chocolate 42%, Vanilla 25%, Pistachio 15%, Toenail 9%, Anthrax 6%, Rum-Raisin 3%]
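The relative frequencies that a pie chart displays can be computed by dividing each category's count by the total. The sketch below is in Python (an assumption for illustration; the course software is R), and the survey responses are invented.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration only.
responses = ["Chocolate", "Vanilla", "Chocolate", "Pistachio",
             "Chocolate", "Vanilla", "Rum-Raisin", "Chocolate"]

counts = Counter(responses)          # how often each name appears
total = sum(counts.values())         # number of responses overall

# Relative frequency = count in the category / total count.
relative_frequency = {flavour: count / total for flavour, count in counts.items()}

for flavour, share in sorted(relative_frequency.items(), key=lambda kv: -kv[1]):
    print(f"{flavour}: {share:.1%}")
```

Note that nothing here depends on the order of the categories, which is what makes the data nominal.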


Ordinal Data

- Means ‘order’, because the order of the data is the most important.

- Example: Opinion - Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree

- Example: How much did you drink over the break? – None at all, A little, moderate amount, enough to drop a grizzly bear.


- Both ordinal and nominal data can be expressed as bar charts, but for ordinal data, the order of the categories is implied in the placement of the bars.


Interval Data

- Like ordinal data, but the different categories are evenly spaced.

- Example: Grades as percent. The 83% category could include anything in the interval from 82.5% to 83.5%, or from 82.1% to 83%, depending on how grades are rounded.

- Example: Number of bearded dragons owned. (0, 1, 2, 3, 4, …) The numbers are discrete, meaning separated, but the difference between each category is still one dragon.


- Interval data will be our first focus, because many classic summary statistics can be done on them like the…

- mean,

- median,

- standard deviation,

- interquartile range, and

- skewness.


Ratio data is very similar to interval data, except that since it’s usually a representation of a ratio between two positive numbers, it’s usually positive itself.

Examples of ratio data include anything in a ‘per-capita’ format, like

Live births per 1000 people (Ratio between births and people)

Murders per 100000 people (Ratio between murders and people)


In business, ratio data could be…

- Cost per unit (ratio between dollars spent and number of objects made)

- Earnings per share (ratio between dollars earned and existing shares)

For our purposes, we won’t make any distinction between interval and ratio data. The same methods work on both of them.

Often interval/ratio data are called NUMERIC data.


Histogram

- Unlike a bar chart, a histogram is drawn with no gaps between the bars.

- The lack of gaps emphasizes the evenly spaced categories that cover all the values in a range.
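To make the idea of evenly spaced, gap-free categories concrete, here is a small Python sketch (Python is an assumption for illustration; the course software is R). The grades and the helper name `histogram_counts` are hypothetical.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width bins.

    Bin i covers the range from start + i*width (included)
    up to start + (i+1)*width (excluded), so the bins touch with no gaps.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:      # values outside the range are ignored
            counts[index] += 1
    return counts


# Hypothetical grades in percent, invented for illustration only.
grades = [52, 58, 61, 67, 68, 71, 73, 74, 79, 85, 91]
counts = histogram_counts(grades, start=50, width=10, n_bins=5)

# A crude text histogram: one '#' per observation in each bin.
for i, c in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low}-{low + 10}: {'#' * c}")
```

Every value between 50 and 100 falls into exactly one bin, which is the property that drawing the bars without gaps is meant to emphasize.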


Blurred line between ordinal and interval

- If the distance between categories is constant or makes numerical sense then ordinal data can be treated like interval data.

- Example (either Ordinal OR Interval) Distance: 0-200km, 200-400km, 400-600km, 600-800km.

- Example (Ordinal but NOT Interval) Distance: 0-20km, 20-50km, 50-200km, more than 200km.
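One simple way to check the "constant distance" condition is to compare the widths of the categories. This is only a sketch in Python (an assumption for illustration; the course software is R), with a hypothetical helper name, and it leaves out the open-ended "more than 200km" category because that category has no upper boundary.

```python
def evenly_spaced(boundaries):
    """True if consecutive category boundaries are all the same distance apart."""
    widths = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return len(set(widths)) == 1


# Boundaries of the first example: 0-200km, 200-400km, 400-600km, 600-800km.
print(evenly_spaced([0, 200, 400, 600, 800]))

# Boundaries of the second example: 0-20km, 20-50km, 50-200km (open-ended top category omitted).
print(evenly_spaced([0, 20, 50, 200]))
```

The first set of categories passes the check and could be treated as interval data; the second does not.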


Word Cloud (for interest)

- Recently more creative graphs like word clouds are used to show frequencies in many categories at once. (thanks to http://www.tocloud.com/ )

- Next: A cloud of the word frequencies of http://en.wikipedia.org/wiki/British_Columbia_history


Word Cloud (for interest)

- The larger a word is, the more often it appears. This graph is dominated by

- British (used 116 times),

- Columbia (97 times), and

- the phrase “British Columbia” (86 times) in red.


Word Cloud (for interest)

- We can see subtler patterns by ignoring “British” and “Columbia”.
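The counts behind a word cloud are plain word frequencies. The sketch below is in Python (an assumption for illustration; it is not how the tool credited above works internally), and the sample sentence is invented rather than taken from the Wikipedia article.

```python
import re
from collections import Counter


def word_frequencies(text, ignore=()):
    """Count how often each word appears, skipping any words listed in `ignore`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in ignore)


# Hypothetical sample text, invented for illustration only.
sample = ("British Columbia is a province. "
          "The history of British Columbia is long.")

freq = word_frequencies(sample)
print(freq.most_common(3))

# Ignoring the dominant words lets the less frequent ones stand out.
trimmed = word_frequencies(sample, ignore={"british", "columbia"})
print(trimmed.most_common(3))
```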

(Also for interest)

Visualization is one of the more active topics within Statistics.

Florence Nightingale [Joy of Stats 23:40 – 27:00]


Hour break


Frequency

A frequency distribution, like a histogram, shows the number of observations in a particular range or of a particular value.

Frequency means ‘how often’.

In this age histogram, about 2.5 million Canadians are between 45 and 54 years old, inclusive. That bump represents the baby boom.


Modes

- A local high point or maximum in a distribution is called a mode.

- Distributions with one mode are called unimodal.

- ...with two modes are called bimodal, and with more modes are called multimodal (these are rare).
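Treating a mode as a local high point, the modes of a frequency distribution can be found by comparing each bar with its neighbours. This is a minimal Python sketch (an assumption for illustration; the course software is R) with invented counts. It uses a strict comparison, so flat-topped peaks (ties between neighbouring bars) are not detected.

```python
def local_modes(counts):
    """Return the positions of bars that are taller than both of their neighbours."""
    modes = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < len(counts) - 1 else float("-inf")
        if c > left and c > right:
            modes.append(i)
    return modes


# Hypothetical bar heights, invented for illustration only.
unimodal = [1, 3, 7, 4, 2]
bimodal = [2, 6, 3, 1, 5, 8, 2]

print(local_modes(unimodal))   # one local high point
print(local_modes(bimodal))    # two local high points
```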


A lot of distributions are naturally unimodal, so seeing a bimodal distribution often implies there are two distinct populations being measured. (Weight of people? Running speeds of novice and pro joggers?)


The ‘skew’ describes the side of the distribution on which the extreme values lie.

A distribution is positively skewed if the mass of observations is at the low end of the scale, with a few extreme values at the high end. Examples: Income, Drug use, word frequency.


Most of the observations from a negatively skewed distribution are near the top of the distribution, with a few low exceptions.

Examples: Birth Weight, Olympic Running Speeds.
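A common rule of thumb for spotting the direction of skew is to compare the mean with the median, since extreme values pull the mean towards them. The Python sketch below (an assumption for illustration; the course software is R) applies that rule of thumb to invented data. It is a heuristic, not a definition of skewness, and it can mislead for unusual distributions.

```python
from statistics import mean, median


def skew_direction(values):
    """Rule of thumb: the mean is pulled towards the side with the extreme values."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "roughly symmetric"


# Hypothetical data, invented for illustration only.
incomes = [20, 25, 30, 35, 40, 300]   # most values low, one very high
scores = [1, 50, 55, 58, 60, 62]      # most values high, one very low

print(skew_direction(incomes))
print(skew_direction(scores))
```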


Mean

- The mean is generally referred to as the “average”.

- It is calculated by adding up all the values you observe and dividing by how many observations there are:

- (Total of all observed values) / (number of values observed)


The mean is:

$$\bar{x} = \frac{\sum x}{n}$$

(Note: The capital letter Sigma, ∑, means ‘add up all the...’, x refers to the observed value, and n is the number of observations.)


Mean

- You can only take the mean of interval/ratio data. (There’s no such thing as the average gender, or the average flavour of ice cream)

- You CAN however, have a mean age, mean income, mean height, or mean mg/L concentration. All of THESE are numeric quantities.


- Example 1: The mean of 4, 5, 6, 7, 30 is

∑x = 4 + 5 + 6 + 7 + 30 = 52

mean = 52/5 = 10.4

(Note that the mean is higher than 4/5 of the numbers. It's being 'pulled' towards the value '30' because it's so much higher than the others)


- Example 2: The mean of -2, 0, 3, 3, 8, 10, 10, 10 is

∑x = (-2) + 0 + 3 + 3 + 8 + 10 + 10 + 10 = 42

mean = 42/8 = 5.25

(Note that the value 'zero' is included in the count of values)
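The two worked examples can be checked with a few lines of Python (an assumption for illustration; the course software is R). The function simply follows the recipe above: total of the values divided by the number of values.

```python
def mean(values):
    """Total of all observed values divided by the number of values observed."""
    return sum(values) / len(values)


example_1 = [4, 5, 6, 7, 30]
example_2 = [-2, 0, 3, 3, 8, 10, 10, 10]

print(sum(example_1), len(example_1), mean(example_1))   # 52 5 10.4
print(sum(example_2), len(example_2), mean(example_2))   # 42 8 5.25
```

In the second example `len` counts the zero as one of the eight values, which is why the total is divided by 8 and not 7.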


- If you could make a sculpture of a distribution, you could balance the sculpture on your finger if your finger was at the mean.
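The balance-point picture corresponds to a checkable fact: the distances of the values below the mean exactly cancel the distances of the values above it. The Python sketch below (an assumption for illustration; the course software is R) checks this on the values 4, 5, 6, 7, 30, allowing a tiny tolerance for floating-point rounding.

```python
import math

values = [4, 5, 6, 7, 30]
centre = sum(values) / len(values)            # the mean, 10.4

# Signed distance of each value from the mean (negative = left of the balance point).
deviations = [v - centre for v in values]
print(deviations)

# The deviations cancel out, up to floating-point rounding.
print(math.isclose(sum(deviations), 0.0, abs_tol=1e-9))
```

Four values sit a moderate distance to the left of the mean and one value sits far to the right; the single far value balances the other four.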


Median

- The median is the middle value. There are as many observations above the median as there are below it.

- This does NOT mean that the median is in the middle of the range.

(Example: If your numbers range from 0 to 10, the median is not automatically 5)


To find the median, arrange the observations in order and take the middle value (if there are an odd number of values).

If there are an even number of values, take the mean of the middle two values. The mean of these two will be halfway between the middle two values.

IMPORTANT NOTE! The values MUST be sorted before taking the middle value. To see why, try the following examples without sorting.


Example for an odd number of values:

Find the median of 15,20,17,14,16

- Start with 15,20,17,14,16

- Sorted: 14,15,16,17,20

The middle value is the 3rd smallest out of 5, which is the value 16.


Example for an even number of values:

Find the median of -2,999,51,-5,10,20,4,50

- Start with -2,999,51,-5,10,20,4,50

- Sorted: -5,-2,4,10,20,50,51,999

The middle values are the 4th and 5th out of 8.

These values are 10 and 20.

The mean of these is 15.

The median of this dataset is 15.
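The sort-then-take-the-middle procedure can be written out as a short Python function (an assumption for illustration; the course software is R). The sketch assumes the list is not empty, and it is checked against the two examples above.

```python
def median(values):
    """Sort the values, then take the middle one (or the mean of the middle two)."""
    ordered = sorted(values)          # sorting first is essential
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[middle]
    # Even count: mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([15, 20, 17, 14, 16]))                  # 16
print(median([-2, 999, 51, -5, 10, 20, 4, 50]))      # 15.0
```

Skipping the `sorted` step would pick 17 as the "middle" of the first list as written, which is the mistake the note about sorting warns against.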


Mean vs. Median – Which is better?

That depends on what you want.

If you're interested in the total, then the mean is better because it involves the total.

If you're interested in the 'typical' value, both measures work for symmetric data. The median works better for skewed data, or data with a lot of extreme values.

By default, the mean is used because a wider range of analysis tools apply to the mean.


- Example 1: The height of women is typically symmetric, so by default we use the mean.

- Example 2: You find the amount of cocaine people use has a strong positive skew. For the typical amount used, the median is best, which will be at zero (or near zero if only drug users are considered). In other words, people typically don't use cocaine.


- Example 3: Income also has a strong positive skew. If you want to assess the general health of the labour force, you would look at the MEDIAN household income, not the mean.

- Example 4: If you’re the one SELLING the cocaine, the mean is more interesting because you’ll want to know the total demand, not the amount that the casual user is taking.


The mechanical difference between the mean and median is how they react to extreme values (e.g. skew).

If the data is skewed, the mean will be influenced, or ‘pulled’ by the extreme values. In other words, the mean is SENSITIVE to extreme values.


The median is not pulled like this.

Because the median only cares about how many values are above or below it, a value far above the median affects it just as much as one slightly above it. In other terms, the median is ROBUST (not sensitive) to extreme values.
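A quick way to see the difference between sensitive and robust is to make one value far more extreme and watch what each measure does. The Python sketch below (an assumption for illustration; the course software is R) reuses the values 4, 5, 6, 7, 30 and replaces the 30 with an invented value of 3000.

```python
from statistics import mean, median

original = [4, 5, 6, 7, 30]
stretched = [4, 5, 6, 7, 3000]   # same data, but the largest value is made far more extreme

print(mean(original), median(original))
print(mean(stretched), median(stretched))
```

The mean jumps from 10.4 to several hundred, while the median stays at 6 because 3000 is still just "one value above the middle".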


Trimmed Mean (advanced)

One centrality measure that compromises between the mean and the median is the trimmed mean.

Compared to the mean, the trimmed mean has less sensitivity to extreme values, but still has some.

The trimmed mean adds up the values like the mean does, but first it discards some of the data on either end of a dataset.


- Example: A 10% trimmed mean ignores the lowest 10% and the highest 10% of the values and THEN takes the mean of what is left.

- This method is not very common because it tosses away potentially good data. It also can't be used for every analysis method that the mean can.
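A trimmed mean can be sketched in a few lines of Python (an assumption for illustration; the course software is R). The data are invented, and the choice to round the number of discarded values down is also an assumption: statistical packages differ in how they handle this, so results may not match a given program exactly.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the lowest and highest `proportion` of the sorted values.

    Assumes proportion is below 0.5 so that at least one value is kept.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)        # values dropped from EACH end, rounded down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value at each end, invented for illustration only.
data = [1, 12, 13, 14, 15, 15, 16, 17, 18, 99]

print(sum(data) / len(data))       # ordinary mean: 22.0
print(trimmed_mean(data, 0.10))    # 10% trimmed mean: 15.0
```

With ten values, a 10% trim drops one value from each end (the 1 and the 99), and the mean of the remaining eight is much closer to where most of the data sit.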