(Policy) research with confidential micro data
Eric J. Bartelsman, Vrije Universiteit Amsterdam, Tinbergen Institute
Expertenworkshop Ondernemingsdata in België, Brussels, September 25 2009


Page 1: (Policy) research with confidential micro data

Eric J. Bartelsman
Vrije Universiteit Amsterdam
Tinbergen Institute

Expertenworkshop Ondernemingsdata in België
Brussels, September 25 2009

Page 2: Overview

• Benefits of using linked longitudinal firm-level datasets
• International experience
• Modes of access to confidential firm-level datasets

Page 3: Benefits of using firm-level data

• Improving quality of statistics
• Testing of theories at firm-level
• Providing ‘moments’ for modelling
• Policy evaluation

Page 4: Benefits of using firm-level data

• Improving quality of statistics
  • Assessing quality of published stats
  • New uses for old data
  • Uncovering new collection methods and new data needs
• Testing of theories at firm-level
• Providing ‘moments’ for modelling
• Policy evaluation

Page 5: Data Quality

• In-house use at National Stats Office (NSO):
  • Consistency in cross-section and longitudinal dimensions
  • Integration: top-down vs bottom-up
• External users:
  • Quality improvement criteria
  • Systematic learning from external users

Page 6: New uses for ‘old’ data

• Linking of multiple sources
  • Link NSO surveys to Business Register
  • Cross-linking with other registers
    • Housing, transport, labor, tax
  • Linking with external surveys
• Creation of new indicators from linked data
  • Gross flows
  • Higher moments; correlations
  • New disaggregations
    • Subsamples: region, industry, size, type
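
As an illustration of the ‘gross flows’ indicator, here is a minimal Python sketch that computes job creation and job destruction from a longitudinally linked firm panel. The function name, the input layout (a dict keyed by firm id and year) and the choice to treat a firm missing in a year as having zero employment (entry or exit) are assumptions made for this example, not part of the slides.

```python
def gross_job_flows(employment):
    """Compute gross job flows between consecutive observed years.

    employment: dict mapping (firm_id, year) -> number of employees.
    A firm that is absent in a year is treated as having zero
    employment (an assumption of this sketch).
    Returns: dict year -> (job_creation, job_destruction, net_change).
    """
    years = sorted({year for _, year in employment})
    firms = {firm for firm, _ in employment}
    flows = {}
    for prev, curr in zip(years, years[1:]):
        creation = 0
        destruction = 0
        for firm in firms:
            change = employment.get((firm, curr), 0) - employment.get((firm, prev), 0)
            if change > 0:
                creation += change      # expanding or entering firm
            elif change < 0:
                destruction += -change  # shrinking or exiting firm
        flows[curr] = (creation, destruction, creation - destruction)
    return flows


# Illustrative (made-up) panel: firm C enters in 2001.
panel = {
    ("A", 2000): 10, ("A", 2001): 15,
    ("B", 2000): 20, ("B", 2001): 12,
    ("C", 2001): 5,
}
flows = gross_job_flows(panel)
```

The net change alone (here +2 jobs) hides the larger gross movements underneath (10 created, 8 destroyed), which is why such indicators need linked firm-level data.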

Page 7: New collection methods

• Links to registers allow for mass imputation of small samples
• Collection of data at ‘transactions’ site
• New types of info from linking disparate sources
  • Example: linked geographic info for disaster planning.

Page 8: Uncovering data needs

• Micro-level research reveals useful indicators
  • Employment gross flows (US/BLS)
  • Firm demographics (Eurostat)
• Interactions with external researchers improve understanding of users’ needs at NSOs
• Gaps in available data are identified through research

Page 9: Benefits of using firm-level data

• Improving quality of statistics
• Testing of theories at firm-level
  • Firm-level data now used in many fields: IO, Trade, Labor, Finance, Management, Organization, Macro
  • Recent improvements in modelling heterogeneous firms
    • Variation in costs (… of learning, transport, etc)
    • Usually representative consumer, constant mark-up
  • Application of econometric techniques (GMM, clever instruments) to cope with endogeneity
• Providing ‘moments’ for modelling
• Policy evaluation

Page 10: Benefits of using firm-level data

• Improving quality of statistics
• Testing of theories at firm-level
• Providing ‘moments’ for modelling
  • Information drawn from linked longitudinal firm-level distributions can be used to calibrate models.
  • Especially the ability to do cross-country comparisons is promising
• Policy evaluation
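
To make the idea of ‘moments’ concrete, the following Python sketch computes the count, mean, standard deviation and skewness of a firm-level variable within each cell (for example, an industry). The function name and the input layout are hypothetical, and population (not sample) formulas are used for simplicity.

```python
import math
from collections import defaultdict


def cell_moments(records):
    """Summarise a firm-level variable by cell.

    records: iterable of (cell, value) pairs, one per firm.
    Returns: dict cell -> {"n", "mean", "sd", "skew"} using
    population formulas (an assumption of this sketch).
    """
    by_cell = defaultdict(list)
    for cell, value in records:
        by_cell[cell].append(value)

    moments = {}
    for cell, xs in by_cell.items():
        n = len(xs)
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / n
        sd = math.sqrt(var)
        # Skewness is undefined when there is no dispersion; report 0.0.
        skew = (sum((x - mean) ** 3 for x in xs) / n) / sd ** 3 if sd > 0 else 0.0
        moments[cell] = {"n": n, "mean": mean, "sd": sd, "skew": skew}
    return moments


# Illustrative (made-up) firm-level values by industry.
m = cell_moments([("MFG", 1.0), ("MFG", 2.0), ("MFG", 3.0), ("AG", 4.0)])
```

A model of heterogeneous firms could then be calibrated so that its simulated distribution matches these cell-level moments. In practice each cell would also need to pass disclosure checks before leaving the statistical office.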

Page 11: Benefits of using firm-level data

• Improving quality of statistics
• Testing of theories at firm-level
• Providing ‘moments’ for modelling
• Policy evaluation
  • Individual decision making units respond to policy
    • Track decisions and outcomes from longitudinal micro data
    • No need to infer result from movement in aggregate
  • Identification requires a control group
    • Implementation of policy differs across cells (locations, between types of units, or over time)
    • Effect of policy differs across cells (e.g. highways affect transport-intensive firms)
  • Cross-country comparisons for identification
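
One common way to use a control group for identification is a difference-in-differences comparison of cell means. The slides do not name a specific estimator, so the Python sketch below is only one possible reading; the function name, the group and period labels, and the numbers are all illustrative assumptions.

```python
def diff_in_diff(cell_means):
    """Difference-in-differences on four cell means.

    cell_means: dict mapping (group, period) -> mean outcome, with
    group in {"treated", "control"} and period in {"pre", "post"}.
    Returns the change in the treated cell minus the change in the
    control cell.
    """
    treated_change = cell_means[("treated", "post")] - cell_means[("treated", "pre")]
    control_change = cell_means[("control", "post")] - cell_means[("control", "pre")]
    return treated_change - control_change


# Made-up example: mean output of transport-intensive firms ("treated")
# and other firms ("control") before and after a new highway opens.
means = {
    ("treated", "pre"): 100, ("treated", "post"): 112,
    ("control", "pre"): 100, ("control", "post"): 105,
}
effect = diff_in_diff(means)
```

The estimate is only credible if the two groups would have followed similar trends without the policy, which is exactly why the choice of cells matters.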

Page 12: International Experience

History of micro data access:
– Stats Norway: early 1970s
– US Census: late 1980s

Typical attitude of NSO before allowing access:
– Micro data is too difficult; you can’t really do that with data; we don’t trust you to use the data; absolute security is required
– Well, maybe we can think of how to allow access…

Now: at least 25 NSOs have facilities for micro data research
– Also, they use the backbone as basis of statistical process: enormous gains in productivity

Page 13: International Experience

Situation in EU countries
– Business Register, VAT register, SS register, Business Surveys
– Some have on-site, others have remote access: Fin, Swe, Dnk, UK, Nld, Slo, Est
– Some have excellent in-house research: Fra

In other countries a variety of situations (ad-hoc sharing of data, on-site, trusted third party)

Page 14: Modes of access to confidential micro data

• Research shop within stats agency
• On-site facility with access rules for external researchers
• Secure remote-access for external researchers
• Remote execution
• Distributed micro data analysis
  • How to share unsharable data

Page 15: Issues to consider

• Absolute certainty about confidentiality of data
• Uniqueness of published official statistics
• Requirements for access
• Resource cost sharing

Page 16: Confidentiality

Must weigh costs and benefits
– What is ‘cost’ of confidential data being released
  – Relate to costs of not allowing access to data: increasing irrelevance of stats agency and hopefully extreme budget cuts
– Don’t just look at technical side of disclosure
  – What is likelihood of malice or fraud
  – Look at ease of getting same or better confidential data elsewhere

Page 17: Uniqueness

The ‘one published number’ view of stats agencies conflicts with reliability
– We all know numbers don’t add up and that different assumptions generate different stats. So, openness, replicability, review, robustness testing by others will enhance reputation of stats agency publications

Research output can be labelled as such with a disclaimer

Page 18: Requirements for access

Create (legal) framework for allowing access by external researchers
– Screening of projects and research teams
– Special employee status

Create technical facilities
– Database architecture
– Meta data
– On-site laboratory
– Remote-access facilities

Page 19: Distributed Micro Data Research

• Distributed Micro Data (DMD) research was developed to allow cross-country research using confidential firm-level data that could not be combined
• The key is to ‘micro-aggregate’ underlying micro data into cells that pass disclosure and
  • Provide enough information for further analysis, and/or
  • Can be merged at cell-level with other sources
• DMD can be viewed as system to allow customer-driven publication of statistics
• ‘Moments’ are useful for economic modelling
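
The micro-aggregation step can be sketched in Python as follows: firm records are grouped into cells, each cell is summarised by a count, a sum and a sum of squares (enough to recover means and variances later), and cells with too few firms are dropped. The function name, the record layout and the threshold of three firms are assumptions for illustration; actual disclosure rules are set by each statistical office.

```python
from collections import defaultdict

MIN_FIRMS = 3  # hypothetical minimum cell count, not taken from the slides


def micro_aggregate(firms, keys, variable, min_firms=MIN_FIRMS):
    """Aggregate firm-level records into cell-level rows.

    firms: list of dicts, one per firm-year.
    keys: names of the fields that define a cell (e.g. industry, year).
    variable: name of the numeric field to summarise.
    Cells with fewer than min_firms observations are suppressed.
    """
    cells = defaultdict(list)
    for firm in firms:
        cells[tuple(firm[k] for k in keys)].append(firm[variable])

    table = []
    for cell, values in sorted(cells.items()):
        if len(values) < min_firms:
            continue  # too few firms: cell would not pass disclosure
        row = dict(zip(keys, cell))
        row.update(n=len(values), total=sum(values),
                   sum_sq=sum(v * v for v in values))
        table.append(row)
    return table


# Made-up micro data: three manufacturing firms and one farm.
micro = [
    {"industry": "MFG", "year": 2000, "sales": 10},
    {"industry": "MFG", "year": 2000, "sales": 20},
    {"industry": "MFG", "year": 2000, "sales": 30},
    {"industry": "AG", "year": 2000, "sales": 40},
]
dmd_table = micro_aggregate(micro, keys=("industry", "year"), variable="sales")
```

Because every country would run the same code on its own data and export only rows like these, the resulting tables can be stacked across countries or merged at cell level with other sources without the micro data ever being combined.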

Page 20: Data for Cross-country Firm-level Analysis

[Figure: grid of data sources by coverage (single country, multiple countries) and by type (macro and sectoral time series from National Accounts industry data; longitudinal micro data from surveys and business registers). Labels in the grid: N.A., EUKLEMS, SC LMD, DMD, EUKLEMS+.]

Page 21: Distributed Micro Data Analysis

[Figure: workflow diagram with three actors.
Researcher: Policy Question, Research Design, Program Code, Publication.
Research Network: Metadata, Network members, DMD Tables.
NSOs: Provision of metadata. Approval of access. Execution of code. Disclosure analysis of DMD tables. Disclosure analysis of publication.]

Page 22: DMD Projects

• OECD 2000-2003
• World Bank 2006
  – Followup 2009-2011
• EU/NL 2007
• Eurostat ICT Impacts 2008-2009
  – Followup 2010

Page 23: Analytical uses of DMD datasets

• Creation of new indicators from linked data
  • Definition of cells based on complex longitudinal characteristics
    • e.g. employer-employee matched
  • ‘Event’ studies (tracking sub-populations based on prior characteristics)
  • Indicators may be higher moments, correlations, regression coefficients, etc.
    • e.g. correlation of profitability and employee gender-ratio, by industry, region and time
• Linking of outside data sources at cell-level
  • Generate custom tabulations of data to match cells of other published or DMD datasets
    • e.g. labor force gender-ratio by region and time
• Cross-country analysis with panels with the same cell-level definitions

Page 24: Uses of DMD for Policy Evaluation

• Individual decision making units respond to policy
  • Track decisions and outcomes from longitudinal micro data
  • No need to infer result from movement in aggregate
• Identification requires a control group
  • Implementation of policy differs across cells (locations, between types of units, or over time)
  • Effect of policy differs across cells (e.g. highways affect transport-intensive firms)

Page 25: Implementing efficient firm-level data analysis

• Technical facilities
• Meta-data libraries
• Disclosure analysis and rules for re-use of extracted datasets

Page 26: Technical Facilities

• Back-bones for universe of statistical units
  • Firms, Households, Dwellings, etc
• Relational database organisation of data and meta-data
• Statistical tools inside relational database programming environment
• Remote access or remote execution
  • Remote access allows data visualisation, interactive data checking

Page 27: Meta-data

Ideal application of meta-data
– Be able to write generic code remotely
– Convert code to run locally, using meta-data

Meta-data set up to describe
– available datasets
– unique record identifiers
– classifications
– ‘economic variables’

Page 28: Necessary meta-data

• list of available forms and schedules
• info on record identifiers (Firm_id, person_id)
• info on ‘economic variables’
• info on classifications
• concordances between units
• concordances between variables
• concordances to standard classifications

Page 29: Underlying Metadata: datasources

Survey Type   Name        Unique keys   Location
BR            GenBusReg   FID, year     G:\dirx
PS            SBS_yyyy    FID           G:\diry
EC            ECS_yyyy    FID           G:\dirz
IS            InvS_yyyy   FID           G:\dirz
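
A datasource table like this is what lets generic, remotely written code be converted to run locally. The Python sketch below shows one hypothetical way to do that lookup: the dictionary mirrors the rows of the table, while the function name, the substitution of `yyyy` by the year, and the way location and name are joined into a path are assumptions of this example.

```python
# Datasource metadata, copied from the table on this slide.
DATASOURCES = {
    "BR": {"name": "GenBusReg", "keys": ("FID", "year"), "location": r"G:\dirx"},
    "PS": {"name": "SBS_yyyy", "keys": ("FID",), "location": r"G:\diry"},
    "EC": {"name": "ECS_yyyy", "keys": ("FID",), "location": r"G:\dirz"},
    "IS": {"name": "InvS_yyyy", "keys": ("FID",), "location": r"G:\dirz"},
}


def resolve_dataset(survey_type, year):
    """Turn a generic (survey type, year) request into a local path.

    Assumes 'yyyy' in the dataset name is a placeholder for the year
    and that datasets sit directly under the listed location.
    """
    meta = DATASOURCES[survey_type]
    name = meta["name"].replace("yyyy", str(year))
    return meta["location"] + "\\" + name


path = resolve_dataset("EC", 1999)
```

Generic code would ask only for "EC, 1999"; each statistical office supplies its own version of the metadata table, so the same program runs unchanged at every site.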

Page 30: Underlying Metadata: variables in survey (ECS_1999)

Name     Description              Units        Domain
FID      Unique FirmID            string       GBR
IndC     Detailed industry code   string       ISIC r3
Q1       Use of IT                integer      YNM
PurchS   Software Exp             Eur (1000)

Page 31: Underlying Metadata: classifications of domains (ISICr3)

IndC    Description
TOT     Total Economy
AG      Agriculture, Fishing, Forestry
01      Farms
MFG     Manufacturing
27t35   Durables
27      Basic Metals

Page 32: Underlying Metadata: Concordances (IndC_ICTind)

IndC   ICTind
01     Other
…
12     Other
…
27     27a8
28     27a8
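
Applying a concordance amounts to a lookup from one classification to another. The Python sketch below uses only the four rows visible on the slide (the elided rows are left out, not guessed); the function name, the record layout and the decision to return unmapped records separately are assumptions for illustration.

```python
# Concordance IndC -> ICTind, limited to the rows shown on the slide.
IndC_ICTind = {"01": "Other", "12": "Other", "27": "27a8", "28": "27a8"}


def apply_concordance(records, concordance, source="IndC", target="ICTind"):
    """Add a target classification code to each record.

    Records whose source code is missing from the concordance are
    returned separately so that gaps are visible instead of being
    silently dropped.
    """
    mapped, unmapped = [], []
    for rec in records:
        code = rec[source]
        if code in concordance:
            mapped.append({**rec, target: concordance[code]})
        else:
            unmapped.append(rec)
    return mapped, unmapped


# Made-up firm records; code "99" has no entry in the concordance.
firms = [
    {"FID": "f1", "IndC": "27"},
    {"FID": "f2", "IndC": "01"},
    {"FID": "f3", "IndC": "99"},
]
mapped, unmapped = apply_concordance(firms, IndC_ICTind)
```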

Page 33: Disclosure Analysis

• Can be fairly automated, based on cell-count and ‘concentration’
• Further, rules may be instated about further use of DMD dataset. For example, requirement that dataset be erased after use will reduce worries about secondary disclosure.
• Checking may also be required on final publication
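
An automated check on cell-count and ‘concentration’ could look like the Python sketch below. It combines a minimum number of firms with a simple dominance rule on the largest contributor. The function name and both thresholds are hypothetical; the slides do not state specific values, and real rules differ between statistical offices.

```python
def passes_disclosure(values, min_count=3, max_top_share=0.7):
    """Check one cell against a count rule and a concentration rule.

    values: non-negative contributions of the firms in the cell.
    min_count: minimum number of firms (hypothetical threshold).
    max_top_share: maximum share of the cell total that the largest
        firm may hold (hypothetical threshold).
    """
    if len(values) < min_count:
        return False
    total = sum(values)
    if total <= 0:
        # An all-zero cell reveals every firm's value; a conservative
        # choice in this sketch is to fail it.
        return False
    return max(values) / total <= max_top_share


balanced = passes_disclosure([10, 10, 10])    # enough firms, no dominant one
dominated = passes_disclosure([100, 1, 1])    # one firm holds almost everything
too_small = passes_disclosure([5, 5])         # fewer than three firms
```

The same function can be run once on the DMD tables before they leave the statistical office and again on the tables in the final publication.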