Anna Bombak, Chuck Humphrey, Angie Mandeville, Leah Vanderjagt and Amanda Wakaruk Winter Institute on Statistical Literacy for Librarians, February 23-25, 2011 The Winter Institute on Statistical Literacy for Librarians Demystifying statistics for the practitioner

Anna Bombak, Chuck Humphrey, Angie Mandeville, Leah Vanderjagt and Amanda Wakaruk
Winter Institute on Statistical Literacy for Librarians, February 23-25, 2011

The Winter Institute on Statistical Literacy for Librarians

Demystifying statistics for the practitioner

Outline

• Introductions
• A framework for understanding statistics
• Statistics shaped by geography
• Official statistics: national
• Official statistics: international
• Non-official statistics
• Applying what you have learned

Introductions: your backgrounds

Please introduce yourself
• Your name
• Your institutional affiliation
• Your job responsibilities

Introductions: your backgrounds

A little over two-thirds are from academic libraries. In the past, the split has been closer to 50/50.

The largest group, with 16, is from universities other than the U of A.

The second largest group, with 6, is from the U of A.

Introductions: your backgrounds

Geographically, 18 of you are from Alberta and 12 are from other provinces.

Seven are from B.C., which is the largest participation we have ever had from B.C.

Thirteen are from the Edmonton region.

Uses of quantitative evidence

• To provide a description: This typically entails answering the question about the scale or scope of something observable and its characteristics.
• To make a comparison: This usually involves establishing the degree of similarity or dissimilarity among observables.
• To identify a relationship: This method looks at the correlation among characteristics of observables, that is, how are things related?

Statistics are ubiquitous

“Statistics are generated today about nearly every activity on the planet. Never before have we had so much statistical information about the world in which we live. Why is this type of information so abundant? For one thing, statistics have become a form of currency in today’s information society. Through computing technology, society has become very proficient in calculating statistics from the vast quantities of data that are collected. As a result, our lives involve daily transactions revolving around some use of statistical information.”

Data Basics, page 1.1

Statistics: what are we talking about?

Statistics and data are related but different

How statistics and data differ

Statistics
• numeric facts & figures
• derived from data, i.e., already processed
• needs definitions & classifications
• presentation-ready
• published

Data
• numeric files created and organized for analysis or processing
• requires processing
• needs detailed documentation
• not display-ready
• disseminated, not published
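
The distinction can be illustrated with a minimal Python sketch. The records and field names below are invented for illustration and are not drawn from any real survey: the list of records plays the role of data (organized for processing, not display-ready), and the single percentage computed from it plays the role of a statistic (already processed, presentation-ready).

```python
# Hypothetical microdata: one record per respondent (invented values).
records = [
    {"id": 1, "sex": "F", "smoker": True},
    {"id": 2, "sex": "M", "smoker": False},
    {"id": 3, "sex": "F", "smoker": False},
    {"id": 4, "sex": "M", "smoker": True},
    {"id": 5, "sex": "F", "smoker": False},
]


def percent_smokers(rows):
    """Process the data into one presentation-ready figure."""
    smokers = sum(1 for row in rows if row["smoker"])
    return round(100 * smokers / len(rows), 1)


statistic = percent_smokers(records)
print(f"Smokers: {statistic}% of {len(records)} respondents")
```

The printed sentence is what a reader of a publication would see; the list of records is what a data service would disseminate along with its documentation.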

A statistic can’t be real without data

A ‘real’ statistic requires a data source. If the publisher of a statistic can’t tell you the data source behind it, you should question whether the statistic is ‘real.’ After all, people do make up statistics.

Recent example: On the December 19, 2010 episode of CBS’ 60 Minutes, Meredith Whitney claimed in an interview that 50 to 100 “sizable” cities and counties in the U.S. would default on billions of dollars of municipal bonds. Her estimate sparked a mini-panic on the bond market. She refused to release the report behind these predictions on the grounds that her research is proprietary. Bloomberg revealed on February 1, 2011 that she “doesn’t have any numbers to back up her assertions -- she pulled the numbers out of thin air.”

A statistic can’t be real without data

A statistic may have been derived from poor quality data and, consequently, may be of questionable value. Nevertheless, it remains a ‘real’ statistic.

The desire is to have quality statistics that are derived from quality data.

Recent example: A long-standing debate erupted over a Lancet article published in 2004 that estimated the number of civilian deaths in Iraq in the 18 months following the invasion to be around 98,000. The Iraq Body Count project compiled a database of reported civilian deaths showing between 11,000 and 13,000 deaths in this same period. The UK government embraced statistics from the Iraq Ministry of Health, which reported 3,853 civilian deaths and 15,517 injuries over six months in 2004.

Statistics Canada’s quality criteria

Statistics Canada uses the following criteria to define quality statistics or statistics “fit for use”:
• Relevance: addresses issues of importance to users
• Accuracy: the degree to which it describes what it was designed to measure
• Timeliness: the delay between when the information was collected and when it is made available
• Accessibility: the ease with which the information can be obtained by users
• Interpretability: access to metadata that facilitates interpretation and use
• Coherence: the fit with other statistical information through the use of standard concepts, classifications and target populations

Statistics are about definitions

Concepts and definitions

Six dimensions or variables in this table. The cells in the table are the number of estimated smokers.

• Geography: Region
• Time: Periods
• Social Content: Smokers, Education, Age, Sex

Statistics are about definitions!

Statistics are dependent on definitions. You may think of statistics as numbers, but the numbers represent measurements or observations based on specific definitions.

Tables are structured around geography, time and social content based on attributes of the unit of observation. These properties all need definitions.

Statistics are about definitions!

Statistics are dependent on definitions. You may think of statistics as numbers, but the numbers represent measurements or observations based on specific definitions.

Tables, a common tool for displaying statistics, are structured around geography, time and content based on the attributes of the unit of observation. These properties all depend on definitions.
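
As a rough illustration of how a table is built from attributes of the unit of observation, the sketch below counts smokers in each cell defined by a geography, a time period and one content variable. The unit of observation (a person), the field names and every value are invented for illustration.

```python
from collections import Counter

# Hypothetical unit of observation: a person, with invented attributes.
people = [
    {"region": "Alberta", "period": "1994-1995", "sex": "Female", "smoker": True},
    {"region": "Alberta", "period": "1994-1995", "sex": "Male", "smoker": True},
    {"region": "Alberta", "period": "1996-1997", "sex": "Female", "smoker": False},
    {"region": "Ontario", "period": "1994-1995", "sex": "Male", "smoker": True},
    {"region": "Ontario", "period": "1996-1997", "sex": "Male", "smoker": True},
]


def tabulate(rows, dimensions):
    """Count smokers in each cell defined by the chosen dimensions."""
    cells = Counter()
    for row in rows:
        if row["smoker"]:
            cells[tuple(row[d] for d in dimensions)] += 1
    return cells


table = tabulate(people, ["region", "period", "sex"])
for cell, count in sorted(table.items()):
    print(cell, count)
```

Calling `tabulate(people, ["region"])` on the same records gives a different, coarser table. Each choice of dimensions depends on how region, period, sex and "smoker" were defined when the data were collected.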

Statistics are about definitions!

Consider the following example from the Canadian Census on the data behind statistics about visible minorities. This table displays the size of the visible minority population in Canada from the 2006 Census.

Visible Minority Groups (15), Generation Status (4), Age Groups (9) and Sex (3) for the Population 15 Years and Over of Canada, Provinces, Territories, Census Metropolitan Areas and Census Agglomerations, 2006 Census - 20% Sample Data

Statistics are about definitions!

• How is visible minority status identified in the Census?
• Are aboriginals among the visible minority in Canada?
• What is the definition of visible minority?

Statistics involve classifications

Classifications
• Sex: Total, Male, Female
• Periods: 1994-1995, 1996-1997

Statistics involve classifications

Some classifications are based on standards while others are based on convention or practice.

For example, Standard Geography classifications

Statistics involve classifications

The definitions that shape statistics specify the metric of the data they summarize (for example, Canadian dollars) or the categories used to classify things if a statistic represents counts or frequencies. In this latter case, classification systems are used to identify categories of membership in a concept’s definition.

Examples of standard classifications include the North American Industry Classification System (NAICS), the National Occupational Classification for Statistics (NOC-S) and the International Classification of Diseases (ICD). Look at these examples and describe the coding systems used.
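
One feature worth noticing in such coding systems is hierarchy: NAICS codes, for instance, run from a two-digit sector down to a six-digit national industry, so a count at a detailed level can be rolled up to a broader level by shortening the code. The sketch below shows that prefix logic only. The codes are treated purely as strings and the counts are invented.

```python
from collections import defaultdict

# Six-digit codes used only as strings to show the prefix logic;
# the counts attached to them are invented for illustration.
counts_by_code = {
    "519120": 40,
    "519130": 25,
    "511110": 10,
    "611310": 5,
}


def roll_up(counts, digits):
    """Aggregate counts to a coarser level by truncating each code."""
    totals = defaultdict(int)
    for code, n in counts.items():
        totals[code[:digits]] += n
    return dict(totals)


print(roll_up(counts_by_code, 2))
print(roll_up(counts_by_code, 3))
```

The same records produce different statistics depending on the level of the classification at which they are reported, which is one reason a table should state the classification and level it uses.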

Statistics are presentation ready

Tables and charts (or graphs) are typically used to display many statistics at once. You will find statistics sprinkled in text as part of a narrative describing some phenomenon, but tables and charts are the primary methods of organizing and presenting statistics.

A quick review

To this point, we have established that:
• Statistics are ‘real’ only if they are derived from data;
• Statistics are dependent on definitions of the concepts they summarize;
• Statistics that represent counts of things in the data employ classification systems, which are based either on standards or convention; and
• Statistics are typically organized for display using tables or charts.

Pre-institute crime statistics exercise

Questions 1 & 5 deal with criticisms by Scott Newark about crime statistics published in a report by the Macdonald-Laurier Institute.

Questions 2 & 6 address Statistics Canada’s response to these criticisms.

Questions 1 & 2 are from the Globe & Mail article; Questions 5 & 6 are from CTV News.

Questions 3 & 4 are from the Edmonton Journal article and deal with unreported crimes.

Questions 7 & 8 deal with the data sources behind the statistics in Juristat.

Pre-institute crime statistics exercise

Criticisms listed in the Globe & Mail and CTV articles:
1a. Revisions mess with rates based on initial releases
1b. Crime categories change year after year
1c. Unreported crime isn’t factored into Juristat
1d. Don’t know from Juristat rates of crime committed by offenders out on bail or parole, though data are available
5a. Crime under-reported because only “most serious offense” is counted
5b. Statistics once reported in Juristat are no longer published
5c. The Crime Severity Index is a subjective measurement influenced by lenient judges
5d. Upward revisions to annual statistics tend to exaggerate falling crime rates from year to year (close to 1a above)

Pre-institute crime statistics exercise

Responses listed in the Globe & Mail and CTV articles:
2a. Crime counts are revised but the results are insignificant over time
2b. Crime data are available allowing comparisons before and after categories are changed
6a. The “most serious offense” is problematic but used by other jurisdictions and reduces variations in charging practices between police forces
6b. All 30 years of data are available but can’t go in one publication
6c. The Crime Severity Index uses national sentencing averages to avoid local judicial variations and was developed in conjunction with the Canadian Association of Chiefs of Police
6d. Annual revisions in past 10 years have resulted in increases six times and in decreases four times; revisions cut both ways

Pre-institute crime statistics exercise

Issues reflected in these criticisms and rebuttals
• Statistics are published and revisions are part of the release of official statistics. A user must pay attention to the production cycle of the statistics being released. [1a, 2a, 5d and 6d]
• Statistics released in tables present one view of the data. Other views exist in the data. [1d, 5b and 6b]
• Definitions of the unit of observation, measurement concepts and categories of classification are fundamental to understanding statistics. [1b, 1c, 2b, 5a, 5c, 6a and 6c]
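
The arithmetic behind the revision argument [1a, 5d] can be sketched in a few lines. All counts below are invented and do not reproduce any Juristat figure. The point is only that a year-over-year change depends on whether the earlier year is taken as first released or as later revised.

```python
def percent_change(previous, current):
    """Year-over-year change, in percent, rounded to one decimal."""
    return round(100 * (current - previous) / previous, 1)


# Invented counts of police-reported incidents, for illustration only.
year1_initial = 1000   # Year 1 as first released
year1_revised = 1020   # Year 1 after an upward revision
year2_initial = 980    # Year 2 as first released

change_vs_initial = percent_change(year1_initial, year2_initial)
change_vs_revised = percent_change(year1_revised, year2_initial)
print(change_vs_initial, change_vs_revised)
```

With these made-up numbers the decline looks larger against the revised base than against the initial one. A downward revision would have the opposite effect, which is the sense of "revisions cut both ways" in response 6d.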

Pre-institute crime statistics exercise

“Chief among Newark’s concerns” reported in the Edmonton Journal article
3. Juristat only reports crimes from police records to determine how many crimes are reported each year, despite another Statistics Canada survey that investigates unreported crimes, typically property related
4. The General Social Survey (GSS)

Bonus answer: every five cycles of the GSS (Cycles 3-1988, 8-1993, 13-1999, 18-2004 and 23-2009)

Pre-institute crime statistics exercise

Issues reflected in the questions from the Edmonton Journal article:
• Knowing the unit of observation is essential to interpreting statistics. Data from police records show crimes that have been reported, while the General Social Survey provides an indicator of crimes that have and have not been reported. The data source from police reflects crimes that have gone through the police system, just as health statistics based on the health care system reflect the performance of Canada’s health system. Similarly, the GSS victimization survey reflects population experiences with crime, just as the Canadian Community Health Survey reflects population experiences with health conditions. [3 & 4]

Pre-institute crime statistics exercise

Concepts and surveys discussed in the Uniform Crime Reporting (UCR) surveys documentation
7. The incident is the underlying observational unit for counting reported crime in both surveys
8. The two UCR surveys are: (a) the aggregate survey, where the incident is used with the “most serious offense” rule to form aggregate counts; (b) the incident-based survey (UCR2), which counts information individually for each incident and has more detailed offense categories compared to the aggregate survey

Pre-institute crime statistics exercise

Issues arising from the documentation from the Uniform Crime Reporting surveys
• The observational unit serves as the entity about which statistics are produced. With the UCR surveys, two units of observation are used. UCR is based on the “most serious offense” identified for each crime incident. As a consequence, other offenses within an incident are not counted. This is similar to cause of death statistics where one primary cause is identified from possibly multiple causes. UCR2 is based on all offenses within an incident. Offenses become the observable unit counted. Similarly, with multiple causes of death, each cause is counted, resulting in more causes of death than actual deaths. [7 & 8]
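
The effect of the two counting rules can be seen in a small sketch. The incidents and the seriousness ranking below are hypothetical and are not the actual UCR rules. They only show how the same incidents yield different totals when one offense per incident is counted versus every offense.

```python
from collections import Counter

# Hypothetical seriousness ranking: lower number = more serious.
SERIOUSNESS = {"assault": 1, "break and enter": 2, "theft": 3, "mischief": 4}

# Invented incidents, each listing every offense recorded in it.
incidents = [
    ["theft", "assault"],
    ["mischief"],
    ["break and enter", "theft", "mischief"],
]


def count_most_serious(incident_list):
    """Aggregate-style count: one offense (the most serious) per incident."""
    return Counter(
        min(offenses, key=SERIOUSNESS.__getitem__) for offenses in incident_list
    )


def count_all_offenses(incident_list):
    """Incident-based-style count: every offense in every incident."""
    return Counter(o for offenses in incident_list for o in offenses)


aggregate = count_most_serious(incidents)
detailed = count_all_offenses(incidents)
print(sum(aggregate.values()), dict(aggregate))
print(sum(detailed.values()), dict(detailed))
```

Under the first rule the total equals the number of incidents and theft does not appear at all. Under the second rule the total exceeds the number of incidents, just as counting multiple causes of death yields more causes than deaths.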

Being a critical user of statistics

• Who published this statistic?
  Can you name the producer or distributor of the data?
  Does the publisher identify a data source for this statistic?
  Do you have enough information to cite this statistic?

• What view of the data is shown in this statistic?
  What level of geography is shown?
  What time period is shown?
  What social characteristics are shown?
  Why was this view shown?

Being a critical user of statistics

• What concepts are represented in this statistic?
  Are definitions provided with the statistic for geography, time or the social characteristics?
  Was a standard classification system used for the categories of the statistic?

• Can you identify a data source for the statistic?
  Is there enough information provided with the statistic to find its data source?
  Is there a name for the data source?
  Is there a distributor for the data source?

Critique a statistical table

To practice critiquing statistics, we will use a table published by Statistics Canada about the average undergraduate tuition fees for full-time students by field of study.

Refer to the handout entitled, “Tips for Reading a Statistical Table,” to find a full list of the information that I expect to find in a statistical table.

Data as a focus

Statistics
• numeric facts & figures
• derived from data, i.e., already processed
• needs definitions & classifications
• presentation-ready
• published

Data
• numeric files created and organized for analysis or processing
• requires processing
• needs detailed documentation
• not display-ready
• disseminated, not published

WHERE ARE THE DATA!

Microdata

Microdata record layout

Microdata data dictionary

What about data?

While we are not focusing our attention on data in this workshop, it is helpful to understand some basics about the origins of data, especially since statistics are derived from data. As we will see later, having a good understanding of data can greatly help in the search for statistics.

There are three generic methods by which data are produced. One will find statistics generated from the data arising out of all of these methods.

Methods producing data

Observational Methods
• Focus is on developing observational instruments to collect data
• Correlation
• Replicate the analysis (same data or similar)
• Statistics summarize observations

Experimental Methods
• Focus is on manipulating causal agents to measure change in a response agent
• Causation
• Replicate the experiment
• Statistics summarize experiment results

Computational Methods
• Focus is on modeling phenomena through mathematical equations
• Prediction
• Replicate the simulation
• Statistics summarize simulation results

Methods producing data

A particular discipline or field of study will tend to be dominated by one of these three methods, although outputs may also exist from the other two methods. Consequently, the knowledge disseminated within a field is often fairly homogeneous in the way statistical information is used and reported.

We will see later how knowing the method from which data are derived and the life cycle in which statistics are produced can help in the search for statistics.

Life cycle of survey statistics

1 Program objective

2 Survey unit organized

3 Questionnaire & sample

4 Data collection

5 Data production & release

6 Analysis

7 Findings released

8 Popularizing findings

9 Needs & gaps evaluation

[Life cycle diagram of stages 1 to 9, labelled “Access to Information”]

Life cycle of survey statistics

1 Program objective

2 Survey unit organized

3 Questionnaire & sample

4 Data collection

5 Data production & release

6 Analysis

7 Official findings released

8 Popularizing findings

9 Needs & gaps evaluation

[Life cycle diagram of stages 1 to 9, labelled “Preserving Information”]

Life cycle applied to health statistics

1 Program objectives

increased emphasis on health promotion and disease prevention;

decentralization of accountability and decision-making;

shift from hospital to community-based services;

integration of agencies, programs and services; and

increased efficiency and effectiveness in service delivery.

[Life cycle diagram of stages 1 to 9, labelled “Health Information Roadmap Initiative”]

Life cycle applied to health statistics

[Life cycle diagram of stages 1 to 9, labelled “Health Information Roadmap Initiative”]

2 Survey unit organized

3 Questionnaire & sample

4 Data collection

5 Data production & release

6 Analysis

7 Official findings released

Reconstructing statistics

One way to see the relationship between statistics and the data upon which they were derived is to reconstruct statistics that someone else has produced from data that are publicly accessible.

Reconstructing statistics

[Life cycle diagram of stages 1 to 9, labelled “Health Information Roadmap Initiative”]

1 Program objective

2 Survey unit organized

3 Questionnaire & sample

4 Data collection

5 Data production & release

6 Analysis

7 Official findings released

8 Popularizing findings

9 Needs & gaps evaluation

Reconstructing statistics

The statistics that we will reconstruct are reported in “Health Facts from the 1994 National Population Health Survey,” Canadian Social Trends, Spring 1996, pp. 24-27.

The steps we will follow are:
• identify the characteristics of the respondents in the article;
• identify the data source;
• locate these characteristics in the data documentation;
• find the original questions used to collect the data;
• retrieve the data; and
• run an analysis to reproduce the statistics.

The findings to be replicated

Page 26 of the article

Summary of variables identified

• Findings apply to Canadian adults: Likely need age of respondents
• Men and women: Look for the sex of respondents
• Type of drinkers: Look for frequency of drinking or a variable categorizing types of drinkers
• Age: Look for actual age or age in categories
• Smokers: Look for smoking status

Identify the data source

Survey title is identified: National Population Health Survey, 1994-95

Public-use microdata file is announced

Page 25 of the article

Locate the variables

Examine the data documentation for the National Population Health Survey, 1994-95

• PDF version is on-line
• Use TOC and link to “Data Dictionary for Health”
• Identify the variables from their content
  NOTE: check how missing data were handled
• Trace the variables back to the questionnaire
• Did sampling method require weighting cases?
  NOTE: in addition to the other variables, is a weight variable needed to adjust for the sampling method?
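
The two notes above matter because both choices change the result. The following sketch uses invented records, with a made-up missing-data code and field names that are assumptions rather than anything taken from the NPHS documentation. It drops records coded as missing and then compares an unweighted percentage with a weighted one, where each weight stands for the number of people in the population a respondent is taken to represent.

```python
# Invented respondent records; the field names and the missing-data code
# are assumptions for illustration, not taken from the NPHS documentation.
MISSING = 9
respondents = [
    {"smoker": 1, "weight": 1500.0},
    {"smoker": 0, "weight": 500.0},
    {"smoker": 0, "weight": 500.0},
    {"smoker": 1, "weight": 1500.0},
    {"smoker": MISSING, "weight": 1000.0},
]


def valid(rows):
    """Drop records whose answer is coded as missing."""
    return [r for r in rows if r["smoker"] != MISSING]


def unweighted_percent(rows):
    """Percentage of smokers among valid sample records."""
    rows = valid(rows)
    return round(100 * sum(r["smoker"] for r in rows) / len(rows), 1)


def weighted_percent(rows):
    """Percentage of smokers after applying the sampling weights."""
    rows = valid(rows)
    total = sum(r["weight"] for r in rows)
    smokers = sum(r["weight"] for r in rows if r["smoker"] == 1)
    return round(100 * smokers / total, 1)


print(unweighted_percent(respondents), weighted_percent(respondents))
```

If a reconstructed figure does not match the published one, the treatment of missing values and the use of the weight variable are two of the first things to check against the documentation.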

Retrieve and analyze the data

For universities subscribed to the Statistics Canada Data Liberation Initiative (DLI), the public use microdata from the NPHS can be downloaded without additional cost. See the Statistics Canada Online Catalogue for further cost details.

Make use of local data services to retrieve data from the NPHS.

Lessons from the NPHS example

This example demonstrates the distinction between producing statistics and interpreting statistics that have been published by others.

This is an important distinction because:
• Choices are made in creating statistics.
• Interpreting statistics requires an ability to understand the choices that were made.
• Searching for statistics that others have published can be facilitated by understanding these points.

Search strategies for statistics

Over the next two days, we will talk about two general search strategies for finding statistics.

The government publications strategy is to identify an agency that would produce and publish such a statistic. This approach relies on knowledge of governmental structure and on the content for which agencies are responsible.

The data strategy is to identify a data source from which the statistics were derived. This approach relies on knowledge of data sources produced by agencies or organizations.

The data strategy

• What data source or sources could produce such statistics?
• What observational unit would be needed to produce such statistics?
• What would the structure of the table look like given time, geography and attributes of the observational unit?
• Who would produce such data? Would the source be an official agency?
• Use the literature trail and its indexes to see if a data source can be found (official and non-official publications)

Framework

STATISTICAL INFORMATION
• OFFICIAL STATISTICS: AGENCY PUBLICATIONS, DATA SOURCES
• NON-OFFICIAL STATISTICS: ORGANIZATION PUBLICATIONS, DATA SOURCES