Misinterpretation of data, the importance of metadata and STC math

DLI Atlantic Training, April 2005
Vicki Crompton and Mike Sivyer


Page 1

Misinterpretation of data, the importance of metadata and STC math

DLI Atlantic Training

April 2005
Vicki Crompton and Mike Sivyer

Page 2

Data Misinterpretation: Crime Rates

Ebert & Roeper review of Michael Wilson movie “Michael Moore hates America”
Ebert doubted claim that Canadian crime rate 2X the USA rate
Moorelies.com | News: Whoa; Stuart Didn't See That One Coming
Ebert conceded that the statistics supported claim - figures were right
BUT - comparison of STC and US Bureau of Justice website shows how statistics misinterpreted

Crimes per 100,000 population - 2003

                     Canada      USA
All Crimes            8,530    4,267
Violent crimes          958      523
Property crimes       4,275    3,744
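The “2X” claim can be checked directly against the 2003 figures in the table above. The short Python sketch below simply divides the Canadian rate by the US rate for each headline category; the dictionary and variable names are illustrative and are not part of the original slides.

```python
# Rates per 100,000 population, 2003, copied from the slide's table.
# Each entry is (Canada, USA).
rates_2003 = {
    "All crimes": (8530, 4267),
    "Violent crimes": (958, 523),
    "Property crimes": (4275, 3744),
}

# Ratio of the Canadian rate to the US rate for each category.
ratios_2003 = {
    category: canada / usa
    for category, (canada, usa) in rates_2003.items()
}

for category, ratio in ratios_2003.items():
    print(f"{category}: Canada/USA = {ratio:.2f}")
```

The “All crimes” ratio comes out very close to 2, which is why the figures “were right”. The arithmetic says nothing about whether the categories are defined the same way in both countries.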

Page 3

Comparative Crime Rates

Simplistic comparison
– Similar category titles on violent and property crimes but different definitions
– Violent crime 2-3 times higher in US, property crimes close
– Bureau of Justice Statistics Crime & Justice Data Online
– Canadian Statistics - Crimes by type of offence

Crimes per 100,000 population - 2002

                                          Canada      USA
Violent crime
  homicide                                   1.9      5.6
  robbery                                     85      146
  (comparison of US (rape and aggravated assault) difficult with Cdn (sexual assault and assaults))
Property Crime
  B & E (Cdn) – Burglary (US)                879      746
  Theft (Cdn) - Larceny & Theft (US)       2,191    2,446
  Motor Vehicle theft                        516      432
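For the offences the slide treats as roughly comparable, the same kind of ratio can be computed from the 2002 figures. This is a minimal sketch: the names are illustrative, and the pairing of Canadian and US categories simply follows the table above.

```python
# Rates per 100,000 population, 2002, copied from the slide's table.
# Each entry is (Canada, USA); labels follow the slide's pairing of categories.
rates_2002 = {
    "homicide": (1.9, 5.6),
    "robbery": (85, 146),
    "B & E / burglary": (879, 746),
    "theft / larceny & theft": (2191, 2446),
    "motor vehicle theft": (516, 432),
}


def us_to_canada_ratio(canada: float, usa: float) -> float:
    """Return the US rate expressed as a multiple of the Canadian rate."""
    return usa / canada


for offence, (canada, usa) in rates_2002.items():
    print(f"{offence}: USA/Canada = {us_to_canada_ratio(canada, usa):.2f}")
```

On these figures the US rate is roughly three times the Canadian rate for homicide and about 1.7 times for robbery, while the property offences are within about 20 percent of each other.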

Page 4

US Crime Data

Page 5

Canadian Crime Data

Page 6

Data Misinterpretation: Drinking Habits of Canadians

Initial analysis of the 1990 Health Promotion Survey indicated Canadians enjoyed an average of 60 drinks per day….

Page 7

Data Misinterpretation: Importance of Metadata

In the 1990 Health Promotion Survey there was a series of questions about alcohol consumption.

First they asked if the respondent EVER drank alcohol,
and if YES asked if they drank within the last 12 months
and if YES asked for number of drinks for each day for the past 7 days.

The code book showed number of drinks per day as:

                                          Raw   Weighted
81 F4MON 2 0096‑0097 HOW MANY DRINKS DID YOU HAVE ON: MONDAY
     00     NONE                         4651    7334907
     01:40  NUMBER OF DRINKS              403    2585080
     41     MORE THAN 40 DRINKS             1        106
     98     QUESTION NOT ASKED           7648   10567910
     99     NOT STATED                     89     155377
82 F4TUE 2 0098‑0099 HOW MANY DRINKS DID YOU HAVE ON: TUESDAY
     00     NONE                         4608    7306101
     01:40  NUMBER OF DRINKS             1447    2613991
     98     QUESTION NOT ASKED           7648   10567910
     99     NOT STATED                     89     155377
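One plausible way an average of this size can arise is if the “question not asked” (98) and “not stated” (99) codes are averaged as though they were drink counts. The Python sketch below uses the raw F4TUE counts from the code book above. Because the code book gives only a 01 to 40 range, the value of 2 drinks for each of those respondents is purely an assumption made for illustration, and the function names are hypothetical.

```python
# Raw (unweighted) respondent counts for F4TUE, from the code book above.
count_none = 4608        # code 00: no drinks
count_drinkers = 1447    # codes 01-40: number of drinks reported
count_not_asked = 7648   # code 98: question not asked
count_not_stated = 89    # code 99: not stated

# ASSUMPTION for illustration only: every respondent in the 01-40 range
# reported 2 drinks. The code book does not give the real distribution.
assumed_drinks = 2

MISSING_CODES = {98, 99}

# Rebuild one stored value per respondent, as it would sit in the data file.
values = (
    [0] * count_none
    + [assumed_drinks] * count_drinkers
    + [98] * count_not_asked
    + [99] * count_not_stated
)


def naive_mean(data):
    """Average every stored value, treating 98 and 99 as drink counts."""
    return sum(data) / len(data)


def cleaned_mean(data, missing=MISSING_CODES):
    """Average only real responses, dropping the missing-value codes."""
    valid = [v for v in data if v not in missing]
    return sum(valid) / len(valid)


print(f"naive mean:   {naive_mean(values):.1f} drinks")
print(f"cleaned mean: {cleaned_mean(values):.1f} drinks")
```

Under that assumption the naive average for a single day lands in the mid-50s, while the average over respondents who actually answered is below one drink.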

Page 8

Metadata for PUMFs

With Public Use Microdata Files, the code book is very important
– Gives questions asked and codes used for responses
– “Missing values”, “refusals”, “don’t know” and “not applicable” numeric codes are often assigned
– Not consistent in the numeric codes used
– Numeric codes that to most software would seem to be valid responses
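Because the missing-value codes are not consistent from variable to variable, one defensive pattern is to keep an explicit table of codes per variable, taken from the code book, and convert them to a real missing value before any analysis. The sketch below is plain Python. F4MON and its codes 98 and 99 come from the 1990 Health Promotion Survey code book; the second variable, its codes, and the function name are hypothetical.

```python
from typing import Optional

# Missing-value codes per variable, as they would be read off the code book.
# "HYPO_VAR" and its codes are made up to show that another variable
# may use different numbers for the same idea.
MISSING_CODES = {
    "F4MON": {98, 99},
    "HYPO_VAR": {8, 9},   # hypothetical variable and codes
}


def recode_missing(variable: str, value: int) -> Optional[int]:
    """Return None for a missing-value code, otherwise the value unchanged."""
    if value in MISSING_CODES.get(variable, set()):
        return None
    return value


# The same number can be a real answer for one variable and "missing" for another.
print(recode_missing("F4MON", 8))      # a real count of 8 drinks
print(recode_missing("HYPO_VAR", 8))   # a missing-value code
print(recode_missing("F4MON", 98))     # question not asked
```

Keeping the codes in one table, rather than scattering comparisons through the analysis, makes it easier to check them against the code book.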

Page 9

Metadata

STC Policy on Informing Users of Data Quality

In place since 1978

Tightened up in 2000 in response to 1999 AG report

Recognition that “All statistics are to some extent estimates”

Statistics to be used with awareness of strengths and weaknesses – “fitness for use”

Key tool is the Integrated Meta Database

(Definitions, data sources and methods)

Page 10

Metadata

Important to find STC metadata and use it

Definitions, Data Sources and Methods
– Questionnaire and reporting guides

Survey Description

Data sources and methodology

Data Accuracy

Documentation

Contact us

Page 11

Definitions, Data Sources and Methods

Page 12

Online Catalogue
Canadian Community Health Survey: public use microdata file: Product main page

Page 13

DLI Website
DLI - Canadian Community Health Survey Cycle 1.1

DLI listserv: Ask and we will find out from the Division!

Page 14

Data Quality Symbols

. not available for any reference period

.. not available for a specific reference period

... not applicable

p preliminary

r revised

x suppressed to meet the confidentiality requirements of the Statistics Act

A, B, C, D specific levels of data quality*

E use with caution

F too unreliable to be published

0 true zero or a value rounded to zero

0s value rounded to 0 (zero) where there is a meaningful distinction between true zero and the value that was rounded
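When published tables are read by a program, these symbols have to be separated from the numbers they annotate. The following Python sketch assumes, purely for illustration, that a cell is either a bare symbol (such as x or ..) or a number followed by an optional letter flag (such as 12.5E). Real table layouts vary, so the function name and parsing rules are hypothetical, and the 0s symbol is left out for brevity.

```python
from typing import Optional, Tuple

# Meanings copied from the symbol legend above.
SYMBOLS = {
    ".": "not available for any reference period",
    "..": "not available for a specific reference period",
    "...": "not applicable",
    "p": "preliminary",
    "r": "revised",
    "x": "suppressed to meet the confidentiality requirements of the Statistics Act",
    "A": "specific level of data quality",
    "B": "specific level of data quality",
    "C": "specific level of data quality",
    "D": "specific level of data quality",
    "E": "use with caution",
    "F": "too unreliable to be published",
}


def parse_cell(cell: str) -> Tuple[Optional[float], Optional[str]]:
    """Split a table cell into (numeric value or None, symbol or None)."""
    text = cell.strip().replace(",", "")
    if text in SYMBOLS:
        # Bare symbol: no number was published for this cell.
        return None, text
    if text and text[-1].isalpha() and text[-1] in SYMBOLS:
        # Number followed by a one-letter flag (assumed layout).
        return float(text[:-1]), text[-1]
    return float(text), None


for cell in ["1,234", "12.5E", "x", "..", "F"]:
    value, symbol = parse_cell(cell)
    print(cell, "->", value, SYMBOLS.get(symbol, ""))
```

Returning the symbol alongside the value, instead of discarding it, lets later steps exclude suppressed or unreliable cells explicitly rather than treating them as zeros.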

Page 15

Use metadata to avoid key pitfalls

Collection methodology
Questionnaire
Data quality: sample size, response rates
Definitions
Conceptual changes
Survey coverage
Reweighting/rebasing

Page 16

STC Math

Random rounding
Percentages and percentage points
Central tendencies (mean, median and mode)
Current vs constant dollars
Raw vs seasonally adjusted
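Several of these items are easiest to see with small numbers. The Python sketch below illustrates four of them under stated assumptions: random rounding is shown as unbiased rounding to a multiple of 5 (the base and the probability rule are assumptions here, not taken from the slides), and the price index values used for constant dollars are invented. All function names are hypothetical.

```python
import random
import statistics


def random_round(count, base=5, rng=random):
    """Round count up or down to a multiple of base, at random.

    Assumed rule: round up with probability remainder/base, so the
    expected value of the rounded count equals the original count.
    """
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower


def percentage_point_change(old_rate, new_rate):
    """Difference between two rates that are already percentages."""
    return new_rate - old_rate


def percent_change(old_rate, new_rate):
    """Relative change of the new rate with respect to the old one, in percent."""
    return (new_rate - old_rate) * 100 / old_rate


def to_constant_dollars(current, index_year, index_base):
    """Express a current-dollar amount in base-year dollars using a price index."""
    return current * index_base / index_year


# Random rounding: 12 becomes either 10 or 15.
print(random_round(12, rng=random.Random(0)))

# A rate moving from 5% to 6% rises 1 percentage point but 20 percent.
print(percentage_point_change(5.0, 6.0), percent_change(5.0, 6.0))

# Mean, median and mode can differ a lot when one value is extreme.
incomes = [20, 25, 25, 30, 200]   # invented values
print(statistics.mean(incomes), statistics.median(incomes), statistics.mode(incomes))

# Invented index values: 100 in the base year, 125 in the current year.
print(to_constant_dollars(50_000, index_year=125.0, index_base=100.0))
```

In the invented income list the mean is 60 while the median and mode are both 25, which is the kind of gap that makes the choice of central tendency matter.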