The Challenge of Integrating New Surveys into an Existing Business Survey Infrastructure
Éric Pelletier
Statistics Canada
ICES-III Montréal, Québec, Canada
June 18-21, 2007
2
Outline
Introduction to the Unified Enterprise Survey (UES)
Culture surveys environment
Integration steps to UES
From Culture to UES: Frame, sampling, etc.
Special case: Film Production survey
UES Estimation process
Back-casting for the previous two years
Conclusion and future work
3
Unified Enterprise Survey (UES)
UES comprises many business surveys which use unified concepts and processes
1997: 7 surveys
…
2005: 45 surveys
2006: 54 surveys
2007: 62 surveys
The goal of UES: produce reliable estimates at the provincial and industrial levels
4
Objectives of the UES
Promote an increasing use of tax data
Reduce the cost of the surveys
Reduce the response burden
Produce estimates for the financial variables (revenue, expenses, salaries and wages, etc.) and non-financial variables for all UES industrial sectors
5
UES Sampling Process
Sampling frame: Business Register of Statistics Canada (list of establishments)
Sampling unit: Within a given enterprise, a cluster of establishments within the same province and industrial group
For example: establishments A and B of an enterprise, in the same province and industry, form one sampling unit
Simple units (activity in one province and one industry) and complex units
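As a rough illustration of how establishments roll up into sampling units, the sketch below groups a small, invented establishment list by enterprise, province and industry. The record layout, the identifiers and the `is_simple` helper are assumptions made for the example; they do not reflect the actual Business Register structure.

```python
from collections import defaultdict

# Hypothetical records: (establishment, enterprise, province, industry)
establishments = [
    ("A", "E1", "QC", "IND1"),
    ("B", "E1", "QC", "IND1"),
    ("C", "E1", "ON", "IND1"),
    ("D", "E2", "QC", "IND2"),
]

def build_sampling_units(records):
    """Cluster establishments by (enterprise, province, industry)."""
    units = defaultdict(list)
    for est, enterprise, province, industry in records:
        units[(enterprise, province, industry)].append(est)
    return dict(units)

def is_simple(enterprise, records):
    """Treat an enterprise as simple if it is active in exactly one
    province-by-industry cell (one reading of the definition above)."""
    cells = {(p, i) for _, e, p, i in records if e == enterprise}
    return len(cells) == 1

units = build_sampling_units(establishments)
```

Here establishments A and B fall into the same sampling unit, while enterprise E1 as a whole is complex because it is also active in a second province.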
6
UES Sampling Process
Stratification: Province, Industry, Revenue
Strata:
1 take-all stratum
2 take-some strata
1 take-none stratum (below the exclusion thresholds; tax data are used)
Exclusion thresholds: delimit the take-none units from the take-some units (no questionnaire is sent to the take-none units)
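The revenue stratification within one province-by-industry cell can be sketched as follows. The three boundary values and the stratum labels are hypothetical; the slides do not give the actual thresholds or say how the boundaries are derived.

```python
def assign_stratum(revenue, exclusion_threshold, take_some_boundary, take_all_threshold):
    """Return a stratum label for one unit inside a province-by-industry cell.

    All three boundaries are illustrative parameters, not UES values.
    """
    if revenue < exclusion_threshold:
        return "take-none"          # no questionnaire; tax data are used
    if revenue >= take_all_threshold:
        return "take-all"           # selected with certainty
    if revenue >= take_some_boundary:
        return "take-some (upper)"  # sampled
    return "take-some (lower)"      # sampled


# Example with made-up boundaries
labels = [
    assign_stratum(r, 50_000, 500_000, 5_000_000)
    for r in (10_000, 100_000, 1_000_000, 10_000_000)
]
```

With these made-up boundaries, the four example revenues land in the take-none, lower take-some, upper take-some and take-all strata respectively.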
7
UES Sample Design
[Figure: UES sample design. Columns: T2 (corporations) and T1 (unincorporated). Rows: take-alls, take-some, take-none. Data source labels: Survey, Tax. Stratum labels: Stratum=2, Stratum=1.]
8
UES schedule
For example, for reference year 2006 (RY2006):
Sampling: October 2006
Collection: February to October 2007
Edit & Imputation: July 2007 to December 2007
Estimation: November 2007 to March 2008
The estimates are produced within 15 months (January 2007 to March 2008)
The estimation is done one year after the selection of the sample
9
Culture surveys environment
‘Activity’ based frames (e.g. list of books)
Census surveys
Occasional surveys (annual surveys, not necessarily every year)
Maintained by Culture Division
The Culture Streamlining Initiative was put in place to reduce the duplication in annual survey processes while promoting the use of the business survey infrastructure
10
Culture environment versus UES environment
In the UES, the frame is based on industrial structure (economic survey) rather than activity (e.g. list of books, list of films, etc.)
For the analysts, this changes the way they analyse the data
More flexibility in the UES environment
All the steps of a survey were compared to facilitate the integration
11
Advantages of the integration to UES
Common methodologies for all annual enterprise surveys
Possible to adapt some of the parameters to the needs of each survey (at the sampling, imputation or estimation stage)
Infrastructure was established in 1997 with the Enterprise Statistics Division
Relatively easy to integrate new surveys
12
Integration of surveys into UES
Two sets of surveys:
“Wave 1” surveys in RY2006 (Book Publishers, Heritage Institutions and Performing Arts)
“Wave 2” surveys in RY2007 (Film Distribution, Film Production, Film Post-Production, Movie Theatres and Sound Recording)
Integration in two steps:
Step 1: From culture environment to industry-based survey, the years before UES (called “UES_lite”)
Step 2: Integration to UES
13
Integration schedule
                   RY2004    RY2005    RY2006    RY2007
“Wave 1” surveys   UES_lite  UES_lite  UES       UES
“Wave 2” surveys   Culture   UES_lite  UES_lite  UES
14
“UES_lite” environment
Concepts are similar to the UES surveys
The processing is done outside the UES infrastructure
The surveys are processed by the subject matter division and the methodology division, whereas UES processing is primarily handled by another Statistics Canada division, the Enterprise Statistics Division
15
From Culture to UES
Sampling, Frame:
1. Culture: Census - ‘Activity’ based
2. “UES_lite”: Sample - Establishments
3. UES: Sample - Establishments within the same enterprise, same province, same industry code
The analysts were able to create reconciliation files between the frames
Some other minor differences
16
Special case: Film Production survey
Collection: Special case with the Film Production survey for RY2005
The Business Register (BR) is not up-to-date enough for this survey
Links were discovered between the sampled establishments and establishments outside the sampling frame
17
Special case: Film Production survey
Pre-contact was done for all the units
Approximately 400 units were added to the sample (these units were not on the Business Register)
Indirect sampling was used to address this problem
A different estimation program was created for this survey
18
UES and “UES_lite” Estimation Process
Total estimate = Survey portion + Tax portion
Survey portion:
Horvitz-Thompson estimator
Outlier detection and treatment
Final weight calculation
Tax portion (take-none portion):
Below the exclusion thresholds: Tax data
Domain estimations: Industry, Province, etc.
Variance and coefficient of variation (CV)
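The decomposition of the total estimate can be sketched as below. This is a minimal illustration: it assumes the final weights already reflect outlier treatment, that each take-none unit contributes its tax value with a weight of one, and that a variance estimate is supplied from elsewhere. The function names are invented for the example.

```python
import math

def ht_total(values, weights):
    """Horvitz-Thompson total: sum of weight times reported value,
    where each weight is the inverse of the unit's selection probability."""
    return sum(w * y for y, w in zip(values, weights))

def total_estimate(survey_values, survey_weights, tax_values):
    """Total estimate = survey portion + tax (take-none) portion."""
    survey_portion = ht_total(survey_values, survey_weights)
    tax_portion = sum(tax_values)  # assumption: take-none units enter with weight 1
    return survey_portion + tax_portion

def coefficient_of_variation(estimate, variance):
    """CV = standard error divided by the estimate."""
    return math.sqrt(variance) / estimate


# Made-up numbers: one take-all unit (weight 1) and one take-some unit (weight 4)
estimate = total_estimate([100.0, 200.0], [1.0, 4.0], [10.0, 20.0, 30.0])
```

Domain estimates follow the same pattern, restricting both portions to the units that belong to the province or industry of interest.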
19
Special case: Film Production survey
Estimation: The Film Production survey RY2005 was a special case
Due to the application of indirect sampling, the inverse probability method was implemented (see Choudhry (2006))
Without going into all the details, the inverse probability method determines the probability that at least one of the frame sampling units leading to a given reporting unit is sampled
The base weight is computed as the inverse of the selection probability
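A minimal sketch of the base weight computation is given below. It assumes, purely for illustration, that the frame units linked to a reporting unit are selected independently of one another, so that the probability of selecting at least one is one minus the product of the non-selection probabilities. The slides do not state how the joint probability was actually derived, and the function names are hypothetical.

```python
from math import prod

def prob_at_least_one(selection_probs):
    """Probability that at least one linked frame unit is sampled,
    ASSUMING the linked units are selected independently."""
    return 1.0 - prod(1.0 - p for p in selection_probs)

def base_weight(selection_probs):
    """Base weight = inverse of the selection probability of the reporting unit."""
    p = prob_at_least_one(selection_probs)
    if p <= 0.0:
        raise ValueError("reporting unit has no chance of selection")
    return 1.0 / p


# A reporting unit reachable through two frame units, each sampled with probability 0.5
w = base_weight([0.5, 0.5])
```

In this example the reporting unit is reached with probability 0.75, so its base weight is 1 / 0.75.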
20
Special case: Film Production survey
The complex weighting procedure led to the use of replicates in estimating the variance of the estimates
More precisely, the jackknife replication method is used to calculate the variance
The estimates will be produced within the next few weeks: the release date for RY2005 is July 2007 (same release date as the other Wave 2 surveys), a little bit behind schedule…
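The idea behind replicate-based variance estimation can be illustrated with a delete-one jackknife for a weighted total. This is a simplified, unstratified sketch: the production method would work within strata and recompute the full weighting for every replicate, and those details are not given in the slides.

```python
def weighted_total(values, weights):
    """Weighted total of the reported values."""
    return sum(w * y for y, w in zip(values, weights))

def jackknife_variance(values, weights):
    """Delete-one jackknife variance of a weighted total (single stratum).

    Each replicate drops one unit and inflates the remaining weights by n / (n - 1).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two units")
    full = weighted_total(values, weights)
    factor = n / (n - 1)
    squared_devs = 0.0
    for r in range(n):
        rep_values = values[:r] + values[r + 1:]
        rep_weights = [w * factor for w in weights[:r] + weights[r + 1:]]
        squared_devs += (weighted_total(rep_values, rep_weights) - full) ** 2
    return (n - 1) / n * squared_devs


variance = jackknife_variance([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

For a linear estimator with equal weights, as in this example, the result coincides with the usual variance formula for a total without a finite population correction.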
21
Special case: Film Production survey
The Film Production survey for RY2007 (integration year in UES) could not be put into the UES process because:
Cost of the post-selection additions
Timeliness
Different processes, like the jackknife replication method for the variance calculations
Instead of the inverse probability method, the weight share method will be used
With this method, we assign an average weight based on the sampled units and the number of links
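The averaging idea can be sketched as follows for the simplest case: one reporting unit, each link counted once, and no clustering of reporting units. The input layout and function name are assumptions made for the example, not the form used in the UES programs.

```python
def weight_share(linked_frame_units):
    """Weight of a reporting unit under a simple weight share rule.

    linked_frame_units: list of (design_weight, was_sampled) pairs, one per
    frame unit linked to the reporting unit. The design weights of the sampled
    linked units are summed and divided by the total number of links.
    """
    if not linked_frame_units:
        raise ValueError("reporting unit has no link to the frame")
    sampled_weight = sum(w for w, was_sampled in linked_frame_units if was_sampled)
    return sampled_weight / len(linked_frame_units)


# Two links, only one of the linked frame units was sampled
w_one = weight_share([(4.0, True), (10.0, False)])
# Two links, both linked frame units were sampled
w_both = weight_share([(4.0, True), (6.0, True)])
```

Unlike the inverse probability approach, this rule only needs the design weights of the units that were actually sampled and the number of links, not the selection probabilities of the unsampled linked units.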
22
Special case: Film Production survey
The weight share method cannot be integrated directly into the UES process
A way to integrate the weight share method into the UES process was derived (see Beaumont (2007))
With this, it will be adaptable to the regular UES estimation program
The difference from the inverse probability method is that with the weight share method, we expect a slight increase in the variance
This “special” integration will be done at the end of 2007 / beginning of 2008
23
Estimation – Back-casting
For the “Wave 2” surveys, RY2005 is the first year in “UES_lite”; the previous estimates were produced in the Culture environment
As was previously shown, the frame is different for RY2005 (Business Register)
Potential break in the series
Back-casting procedure is used to reproduce historical estimates using the Business Register
24
Estimation – Back-casting
Back-casting is done for the two previous reference years (for example, RY2003 and RY2002)
A match between the units from the RY2005 sample and the units from the previous culture files is done using the reconciliation files
If the unit is not matched to the previous year’s culture files, the data is imputed
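The match-then-impute step can be sketched as below. The dictionary layout, the identifiers and the `impute` callback are hypothetical; the slides do not describe the reconciliation file format or the imputation method.

```python
def backcast_values(sample_units, reconciliation, culture_data, impute):
    """Attach a historical value to each unit of the current sample.

    sample_units:   ids of the units in the RY2005 sample
    reconciliation: maps a Business Register id to a culture-file id
    culture_data:   historical values keyed by culture-file id
    impute:         fallback function for units without a match (placeholder)
    """
    result = {}
    for unit in sample_units:
        old_id = reconciliation.get(unit)
        if old_id is not None and old_id in culture_data:
            result[unit] = (culture_data[old_id], "matched")
        else:
            result[unit] = (impute(unit), "imputed")
    return result


# Made-up example: BR2 has no counterpart in the culture files
history = backcast_values(
    ["BR1", "BR2"],
    {"BR1": "C17"},
    {"C17": 250.0},
    impute=lambda unit: 100.0,  # stand-in for a real imputation method
)
```

Keeping a matched/imputed flag with each value makes it possible to report how much of a back-cast estimate rests on imputed data.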
25
Estimation – Back-casting
Adjustments to the weights will be done based on the population counts from the Business Register for the two back-casting years (for example, RY2003 and RY2002)
Estimates are produced by domain, and the CVs are calculated, for the two back-casting years for the “Wave 2” surveys (release date: July 2007)
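One simple way to adjust weights to population counts is a ratio adjustment within cells, sketched below. The slides only say that the weights are adjusted using Business Register counts for the back-casting years; the ratio form, the cell definition and the function name are assumptions.

```python
def adjust_weights(weights, cells, population_counts):
    """Scale weights so that, within each cell, they sum to the population count.

    weights:           current weight of each sampled unit
    cells:             cell label (e.g. a stratum) of each sampled unit
    population_counts: population count per cell for the back-casting year
    """
    weight_sums = {}
    for w, c in zip(weights, cells):
        weight_sums[c] = weight_sums.get(c, 0.0) + w
    return [w * population_counts[c] / weight_sums[c] for w, c in zip(weights, cells)]


# Made-up example: cell "a" had 6 units on the register, cell "b" had 10
adjusted = adjust_weights([2.0, 2.0, 5.0], ["a", "a", "b"], {"a": 6, "b": 10})
```

After the adjustment, the weights in each cell add up to that cell's population count for the year being back-cast.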
26
Infrastructure - Processing
One of the main challenges in the integration of those surveys is the communication between the three parties:
Methodology division (responsible for the survey methods)
Subject matter division (responsible for the content, the analysis and the publication)
Enterprise Statistics Division (responsible for the business survey infrastructure)
Started in October 2006, the process will be completed in March 2009
27
Conclusion and future work
Presently, three “Wave 1” surveys are being integrated into UES for RY2006 (sample was selected in October 2006, estimation is being prepared)
Next year, for RY2007, the “Wave 2” surveys will be integrated
Because of the infrastructure, some modifications will be made to the UES estimation program for the Film Production survey, in order to integrate this survey into UES
28
Thanks
Special thanks to everyone who worked on those surveys, and who helped me in the preparation of this presentation
For more information, please contact
Éric Pelletier, (613) 951-5213
Visit our web site at www.statcan.ca