

Catalogue no. 12-206-X ISSN 1705-0820

Statistical Methodology Research and Development Program 2013-2014 Achievements



Standard table symbols

The following symbols are used in Statistics Canada publications:

. not available for any reference period

.. not available for a specific reference period

... not applicable

0 true zero or a value rounded to zero

0s value rounded to 0 (zero) where there is a meaningful distinction between true zero and the value that was rounded

p preliminary

r revised

x suppressed to meet the confidentiality requirements of the Statistics Act

E use with caution

F too unreliable to be published

* significantly different from reference category (p < 0.05)

How to obtain more information

For information about this product or the wide range of services and data available from Statistics Canada, visit our website, www.statcan.gc.ca. You can also contact us by email at [email protected] or by telephone, from Monday to Friday, 8:30 a.m. to 4:30 p.m., at the following numbers:

• Statistical Information Service 1-800-263-1136
• National telecommunications device for the hearing impaired 1-800-363-7629
• Fax line 1-514-283-9350

Depository Services Program

• Inquiries line 1-800-635-7943
• Fax line 1-800-565-7757

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, Statistics Canada has developed standards of service that its employees observe. To obtain a copy of these service standards, please contact Statistics Canada toll-free at 1-800-263-1136. The service standards are also published on www.statcan.gc.ca under "Contact us" > "Standards of service to the public."

Published by authority of the Minister responsible for Statistics Canada

© Minister of Industry, 2014

All rights reserved. Use of this publication is governed by the Statistics Canada Open Licence Agreement.

An HTML version is also available.

This publication is also available in French.


Statistical Methodology Research and Development Program
2013-2014 Achievements

This report summarizes the 2013-2014 achievements of the Methodology Research and Development Program (MRDP) sponsored by the Methodology Branch at Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the Agency's survey programs; these activities would not otherwise be carried out during the provision of methodology services to those survey programs. The MRDP also includes activities that provide client support in the application of past successful developments, in order to promote the use of the results of research and development work. Contact names are provided for obtaining more information on any of the projects described. For further information on the MRDP as a whole, contact

Mike Hidiroglou (613-951-0251, [email protected]).


Research Projects

Research, development and consultation in SRID (Statistical Research and Innovation Division)

The Statistical Research and Innovation Division (SRID) was created within the Methodology Branch on June 21, 2006. SRID is responsible for researching, developing, promoting, monitoring and guiding the adoption of new and innovative techniques in statistical methodology in support of Statistics Canada's statistical programs. Its mandate also includes the provision of technical leadership, advice and guidance to employees elsewhere in the Methodology Branch. This assistance takes the form of advice on methodological problems that arise in existing projects or during the development of new projects. SRID is also jointly involved with other employees in research projects sponsored by the MRDP.

SRID was involved in many research, development and consultation projects in 2013-2014. In particular, SRID staff made significant contributions in estimation, small area estimation and time series techniques. Details of the progress made are incorporated into the reviews of the research topics reported later.

Mike Hidiroglou, Victor Estevao and Yong You completed the specifications for the development of small area estimation (SAE) software under Hierarchical Bayes (HB) methods. Based on these specifications, two new HB methods of estimation (an unmatched log census undercount model and a log-linear model with known variances) were programmed and tested. These procedures were added to the suite of procedures included in the small area prototype being developed by SRID. The documentation supporting small area estimation was updated by adding the methodology of these two new HB methods and creating a user guide for them with examples.

Francois Laflamme continued his work on data collection along several fronts that included: i. the use of responsive collection design for CATI surveys; ii. the monitoring of Electronic Questionnaire (EQ) based surveys in a multi-mode context; iii. building a new dashboard that summarizes survey progress and performance using key indicators broken out into categories (e.g., regional office, response rates, budgeted system time, system time spent, percentage of budgeted system time spent); and iv. carrying out an analysis using the Address Register to measure the impact of using the new frame on response rates for GSS27SI (General Social Survey - Social Identity, 2013 main survey) and GVP (General Social Survey - Giving, Volunteering and Participating). A number of presentations were made at conferences and to various committees, and a number of articles were written and presented on the resulting research.

Takis Merkouris worked on estimation along several fronts that included: i. optimal linear estimation in two-phase sampling, where an estimation method of optimal efficiency was developed, involving a single-step calibration of the first-phase and second-phase sampling weights; ii. data integration, where a calibration procedure for efficiently "integrating" data across several files was developed; and iii. pseudo-optimal regression estimation, the objective of which is to provide a practical alternative to the intractable optimal regression estimation through a calibration procedure that is more efficient than the customary GREG calibration. This work is done in collaboration with Mike Hidiroglou.

Susana Rubin-Bleuer was involved in several small area estimation (SAE) research and consultation projects, including a feasibility study to produce estimates for 212 small domains required by the System of National Accounts with data from the survey of Research and Development in Canadian Industry. For this study, Susana developed and implemented an SAE-specific method to remove outliers and then used the SAE system to obtain the estimates. Susana developed and delivered a one-day intensive course on small area estimation, with the objective of helping survey methodologists understand basic issues in SAE and learn how to implement SAE in their respective workplaces. Susana also worked, with co-authors, on a cross-sectional and time-series model for business surveys and (see the section on Small Area Estimation) on extensions of the pseudo-EBLUP estimator with application to the Survey of Employment, Payrolls and Hours (SEPH) and on ensuring a positive variance estimate for the Fay-Herriot area-level model.

Harold Mantel and Mike Hidiroglou developed ideas for using one-stage sampling in the Canadian Labour Force Survey. They put forth proposals for one-stage sampling designs that addressed efficiency problems without disturbing estimates of change.

Jean-François Beaumont was involved in a number of research projects. In particular, with co-authors, he studied a simplified variance estimator for two-phase sampling and developed a Winsorization method. The latter can be viewed as a way of implementing the estimator of Beaumont, Haziza and Ruiz-Gazen (2013), which is used to limit the impact of influential values. Jean-François was also involved in two bootstrap projects and a project on weighting late respondents (see the section on Sampling and Estimation), as well as a project on an adaptive call prioritization procedure (see the section on Data Collection).

Benoit Quenneville (2013) developed a procedure for estimating the variance of the de-seasonalized series that are produced by X-12-ARIMA. This variance is both model based (it accounts for the X-12-ARIMA processes) and design based (it incorporates the sampling aspect of the series).

In addition to participating in the research activities of the Methodology Research and Development Program (MRDP) as research project managers and active researchers, SRID staff was involved in the following activities:

Staff advised members of other methodology divisions on technical matters on the following topics:
• Mike and his staff continued to provide advice on the sampling and estimation procedures of the annual business surveys. This advice was given at the regular meetings of the Integrated Business Statistics Program (IBSP) Steering Committee as well as at meetings of the Technical Committee on Business Surveys.
• IBSP: several consultations (J.-F. Beaumont) were made on how to use SEVANI/G-EST for IBSP. A short document was prepared on the issue of estimating the sampling variance with a two-phase sample when imputation has been used to fill in missing values.
• Canadian Community Health Survey (CCHS)/health surveys: SRID staff was consulted on how to account for the sample design when carrying out complex analyses.
• Many consultations and much methodological support were provided (François Laflamme) to senior management in Field 7, to Collection Planning and Management Division (CPMD) project managers (requests requiring immediate attention), to special and important corporate projects (ICOS), and to methodologists and subject-matter specialists on paradata research (e.g., the LFS EQ pilot) or data collection analysis (on average 4-5 consultation requests each month).
• Victor Estevao and Yong You were consulted on small area estimation by a Chinese delegation from the Central Bureau of Statistics. They provided advice and made a number of presentations to the delegation. The SAE prototype developed in SRID was demonstrated to the delegation, and a copy of the prototype was sent to them so that they could experiment with it on their own data sets.
• Jean-François Beaumont (as well as Christian Nadeau, Johanne Tremblay and Wisner Jocelyn) was consulted by a Chinese delegation from the Central Bureau of Statistics on the use of tax data for calibration purposes.
• Jean-François Beaumont provided advice on the generalized bootstrap for CCHS and postcensal surveys.
• Harold Mantel was consulted on how to account for the sample design when carrying out complex analyses in the CCHS and other health surveys.
• Harold Mantel co-operated with members of the other methodology divisions on how best to produce confidence intervals for proportions. This cooperation resulted in a document that reviewed methods and empirical evaluations and comparisons in the literature, and discussed the suitability of the methods for use with complex survey data. A tentative recommendation was to use the bootstrap percentile method where feasible, and to use a modified Clopper-Pearson method when the observed proportion is 0 or 1.
• Mike Hidiroglou and Victor Estevao collaborated with Christian Nambeu (Census and Household Survey Division) to adjust the 2011 Census counts using the Reverse Record Check results. They used the SAE prototype to produce the required adjustments.

SRID participated in three presentations at the Advisory Committee on Statistical Methods:
• Mike Hidiroglou and Victor Estevao presented a three-phase procedure for dealing with nonresponse using follow-up in April 2013. This paper was stimulated by the handling of non-response to the National Household Survey.
• Francois Laflamme presented a unified Responsive Collection Design framework for multi-mode surveys in November 2013.
• Jean-François Beaumont, Cynthia Bocci and Mike Hidiroglou studied a number of weighting procedures for late respondents. This paper was stimulated by the handling of late respondents to the National Household Survey. It was presented at the Advisory Committee on Statistical Methods in early May 2014.

SRID consulted with members of the Methods and Standards Committee, as well as with a number of other Statistics Canada managers, in determining the priorities for the research program.

• Staff continued activities in different Branch committees, such as the Branch Learning and Development Committee and the Methodology Branch Informatics Committee. In particular, they participated actively in finding and discussing papers of the month.
• Mike Hidiroglou participated as a member of the Census Bureau Scientific Advisory Committee (CSAC) in the fall of 2013 and the spring of 2014.
• Jean-François Beaumont made two presentations on the analysis of survey data using the bootstrap. The first presentation was given at the Italian Conference on Survey Methodology in Milan and the second at the Joint Statistical Meetings in Montréal.
• Susana Rubin-Bleuer presented an invited paper on the micro-strata methodology to control sample overlap at the 2013 Statistical Society of Canada Annual Conference.


• Jean-François Beaumont co-presented two workshops at the ISI conference in Hong Kong. The first workshop was on business survey methods, whereas the second was on edit and imputation of survey data.
• SRID continued to actively support the Survey Methodology Journal. Mike Hidiroglou has been the editor of the journal since January 2010. Five people within SRID contribute to the journal: one is an associate editor and three others are assistant editors.
• Jean-François Beaumont is Chair of the Scientific Committee of Symposium 2014. A preliminary program has been established.
• Mike Hidiroglou and Jean-François Beaumont co-authored a paper on the contributions of Statistics Canada to survey methodology that was published in a Chapman & Hall book edited by J.F. Lawless.
• Staff authored or co-authored several papers that are listed at the end of this report (RESEARCH PAPERS SPONSORED BY THE MRDP).

For further information, contact: Mike Hidiroglou (613 951-0251, [email protected]).

Edit and Imputation

The main goals of the edit and imputation research are to: i) develop new edit and imputation methods to handle issues encountered in statistical programs; ii) compare existing methods theoretically or empirically; and iii) develop computer tools that make it possible to implement the best methods in statistical programs. Two projects were conducted this fiscal year. The first project consisted of improving a computer tool for outlier detection. The second project studied re-sampling variance estimation when missing values are filled in through nearest-neighbour imputation.

1. Tool for comparing various outlier detection methods

We have continued discussions in the past year on possible developments of our tool for comparing outlier detection methods, and we have continued to provide support to users. Among other things, we discussed adapting the tool to the needs of the Consumer Price Index (CPI), for which we are interested in targeting the data that are most influential on the basket weights. Several methods were identified that could be useful to the CPI. There is also interest in detecting outliers in time series. Consequently, we are planning to develop a detection method for time series in the near future. It will be developed jointly with the Time Series Section. In addition to the CPI, other users at Statistics Canada have shown interest in this method. We will start with a generic method and improve it based on user feedback. The "generalized M-estimation" method was also developed to compare results with the "M-estimation" method already available in the tool. Furthermore, a new graphic function is being tested for the M-estimation method.

We also presented the tool to the individuals responsible for the International Travel Survey, which is especially interested in the Sigma-Gap method. We also plan on giving presentations for two other surveys. The quick reference guide was also updated, but it is only available in French for the time being.
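
As a rough illustration of the type of rule involved, the sketch below implements a simple sigma-gap style screen in Python; the function name, the robust spread measure and the thresholds are illustrative assumptions and do not reflect the tool's actual implementation.

```python
import numpy as np

def sigma_gap_outliers(values, k=2.0, start_quantile=0.75):
    """Flag large values using a simple sigma-gap style rule (illustrative only).

    Sort the data, then scan the upper tail: the first gap between consecutive
    ordered values that exceeds k times a robust spread measure marks that value
    and everything above it as outliers.
    """
    x = np.sort(np.asarray(values, dtype=float))
    # Robust spread: interquartile range scaled to approximate a standard deviation.
    q1, q3 = np.percentile(x, [25, 75])
    sigma = (q3 - q1) / 1.349 or np.std(x)
    threshold = k * sigma
    start = int(len(x) * start_quantile)        # only scan the upper tail
    gaps = np.diff(x)
    flagged = np.zeros(len(x), dtype=bool)
    for i in range(start, len(x) - 1):
        if gaps[i] > threshold:                 # first "large" gap found
            flagged[i + 1:] = True              # flag everything above the gap
            break
    return x, flagged

# Example: a handful of revenue-type values with two suspicious points.
vals = [12, 15, 14, 16, 13, 18, 17, 15, 95, 110]
ordered, flags = sigma_gap_outliers(vals)
print([v for v, f in zip(ordered, flags) if f])  # -> [95.0, 110.0]
```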

Finally, New Zealand’s statistical agency has shown interest in our outlier detection tool, and they have been sent a licence.

2. Calculation of non-response variance using re-sampling methods

The goal of this study was to propose a bootstrap method for estimating the non-response variance of an imputed estimator of the total when the missing data are imputed using the nearest-neighbour imputation methodology (NIM).

NIM is a non-parametric imputation method because the form of the imputation model is not explicitly specified. This feature of NIM is highly advantageous during imputation because the method is more resistant to poor model specification, contrary to an imputation method such as regression imputation, for which the first two moments (mean and variance) must be correctly specified. However, NIM complicates variance estimation, and quite often, additional assumptions must be made to yield a valid variance estimate.

We proposed using the bootstrap to estimate the variance in order to avoid any additional assumptions. The inference framework used for variance estimation is based on the imputation model (e.g., Särndal, 1992). Bootstrap replicates of the initial sample size are selected from the sample of respondents. The non-response error is estimated in each replicate, and the bootstrap variance estimate can then be calculated. The proposed method tends to overestimate the non-response variance; we are therefore trying to identify the cause of the problem so that a solution can be found.
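
The sketch below gives a minimal, generic with-replacement bootstrap of a nearest-neighbour-imputed total in Python; it only illustrates the mechanics and does not reproduce the model-based procedure studied here (the toy data, weights and replication scheme are assumptions).

```python
import numpy as np

rng = np.random.default_rng(2014)

def nn_impute(x, y, respondent):
    """Fill in missing y's with the y of the nearest respondent on x."""
    y_imp = y.copy()
    xr, yr = x[respondent], y[respondent]
    for i in np.where(~respondent)[0]:
        y_imp[i] = yr[np.argmin(np.abs(xr - x[i]))]
    return y_imp

def bootstrap_variance(x, y, respondent, weights, B=500):
    """Generic with-replacement bootstrap of the NN-imputed estimator of a total."""
    n = len(x)
    totals = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)          # resample n units with replacement
        xb, yb, rb, wb = x[idx], y[idx], respondent[idx], weights[idx]
        if rb.sum() == 0:                    # degenerate replicate: no donors
            totals[b] = np.nan
            continue
        totals[b] = np.sum(wb * nn_impute(xb, yb, rb))
    return np.nanvar(totals, ddof=1)

# Toy sample: y roughly proportional to x, about 30% nonresponse.
n = 200
x = rng.uniform(1, 100, n)
y = 3 * x + rng.normal(0, 10, n)
respondent = rng.uniform(size=n) > 0.3
y[~respondent] = np.nan                      # nonrespondents have missing y
w = np.full(n, 50.0)                         # equal design weights

print("Imputed estimate of the total:", round(np.sum(w * nn_impute(x, y, respondent))))
print("Bootstrap variance estimate:", round(bootstrap_variance(x, y, respondent, w)))
```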

The results of this study could be used in many Statistics Canada surveys, such as the unincorporated enterprises program (T1), that use nearest-neighbour imputation to impute missing values.

For further information, contact: Jean-François Beaumont (613 951-1479, [email protected]).

Reference

Särndal, C.-E. (1992). Methods for estimating the precision of survey estimates when imputation has been used. Survey Methodology, 18, 2, 241-252.

Sampling and Estimation

This progress report covers four research projects on sampling and estimation:

1. Generalized bootstrap method for three-phase designs.
2. Weighting late respondents when a non-response follow-up subsample is taken.
3. Incomplete calibration (or calibration with an estimated control total).
4. Variance estimation and disclosure control using the bootstrap method.

1. Generalized bootstrap method for three-phase designs


The aim of this research is to develop a bootstrap variance estimation method for the 2012 Aboriginal Peoples Survey (APS). The sample of individuals used in the APS comes from a three-phase sample design. The first two phases are the same as those used to produce the household sample for the National Household Survey (NHS). The second phase of NHS sampling is required for the subsampling of non-respondents for non-response follow-up. Simple random sampling without replacement is used in each phase. There is currently no application of the bootstrap method for such a sample design. We therefore developed an extension of Langlet, Beaumont and Lavallée's (2008) generalized bootstrap method for a two-phase design. This method is itself based on Beaumont and Patak's (2012) methodology for single-phase designs.

The method being considered requires the calculation of simple and joint inclusion probabilities in each phase. Given the non-response in the NHS non-respondent subsample, the NHS was deemed to have a three-phase design. Inclusion probabilities were calculated for each of the three phases, and those probabilities were combined to produce inclusion probabilities for all three phases combined. By adding the APS sampling as a second phase, we can then use the generalized bootstrap method for two-phase designs.
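
In rough terms (generic notation, not the exact derivation used in the project), the overall inclusion probability of a unit over nested phases is the product of the conditional phase probabilities, and a generalized bootstrap replicate simply rescales the design weight by a mean-one random adjustment:

```latex
% Illustrative only: nested-phase inclusion probability and a generalized
% bootstrap replicate weight with a mean-one random adjustment a_i^{(b)}
\pi_i = \pi_i^{(1)}\, \pi_i^{(2 \mid 1)}\, \pi_i^{(3 \mid 2)}, \qquad
w_i = \frac{1}{\pi_i}, \qquad
w_i^{*(b)} = w_i\, a_i^{(b)}, \qquad \mathrm{E}_*\!\left[ a_i^{(b)} \right] = 1 .
```

The adjustments are generated so that the variability of the replicate weighted totals reproduces the design-based variance; the actual factorization used for the APS also has to account for the NHS non-response phase described above.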

A prototype (SAS macro) for calculating the initial bootstrap weights was developed, and final bootstrap weights were produced. The prototype could be used for any similar survey with a design of three (or fewer) phases, such as the post-NHS surveys. In fact, the method was also used for the Canadian Survey on Disability (CSD). The method could also be used on the NHS itself.

It was also established that it is not always possible to find positive bootstrap weights while meeting the constraints on the first two moments.

A way was found to eliminate dependence of the second-phase adjustment on the first phase adjustment. This can prevent extreme bootstrap weightsand could be used in a future post-census survey.

Within the framework of the APS and the CSD, all weighting adjustments were applied to the initial bootstrap weights to obtain the final bootstrap weights, taking into account all adjustments to the survey weights. In particular, the first-phase random adjustment factors (NHS) associated with the two-phase generalized bootstrap method were used to calculate the variable control totals (NHS estimates) during post-stratification.

The method was documented in the section on variance estimation of the APS methodological report. The research project is now completed.

2. Weighting late respondents when a nonresponse follow-up subsample is taken

Nonresponse is frequent in surveys and is expected to lead to biased estimates. A useful way to control nonresponse bias is to follow up a random subsample of nonrespondents after a certain point in time during data collection. Nonresponse bias can be eliminated through a proper weighting strategy, assuming that all the units selected in the subsample respond. Selecting a subsample of nonrespondents is sometimes done in practice because it can be less costly than following up with all the nonrespondents.

Nonresponse bias cannot be completely eliminated in practice because it is unlikely that all the units selected in the subsample will respond. However, nonresponse in the follow-up subsample can be treated using standard techniques such as nonresponse weighting or imputation.

The goal of this project is to focus on a more delicate weighting issue resulting from the occurrence of late respondents. Late respondents are those who eventually respond to the survey, but after the point in time at which the subsample was selected. These late respondents may or may not have been selected in the follow-up sample. We develop and investigate nonresponse weighting strategies that can be used to handle late respondents without simply discarding them.

We have developed two weighting approaches that lead to consistent estimators of totals, provided that we can find consistent estimators of the unknown probabilities of being a late respondent. The theory has been developed and documented. We have performed a simulation study to evaluate these two approaches. A two-page summary and a 10-page background paper needed for the Advisory Committee on Survey Methods (ACSM) have also been drafted.
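
As a purely illustrative sketch (not either of the two approaches developed in this project, and ignoring the follow-up subsampling step), the Python fragment below weights late respondents by the inverse of an estimated probability of responding late; the logistic model and all variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Toy sample: design weight w, auxiliary x, and response status by timing:
# "early" (before the subsampling point), "late" (after it), or "never".
n = 5000
x = rng.normal(size=(n, 1))
p_late = 1 / (1 + np.exp(-(-1.0 + 0.8 * x[:, 0])))   # drives the chance of a late response
u = rng.uniform(size=n)
status = np.where(u < 0.5, "early", np.where(u < 0.5 + 0.3 * p_late, "late", "never"))
w = np.full(n, 10.0)
y = 5 + 2 * x[:, 0] + rng.normal(size=n)             # observed only for respondents

# Estimate P(late | not an early respondent, x), then weight late respondents
# by the inverse of that estimated probability:
#   total_hat = sum_early w*y + sum_late w*y / p_hat_late
non_early = status != "early"
model = LogisticRegression().fit(x[non_early], status[non_early] == "late")
phat_late = model.predict_proba(x)[:, 1]

adj = np.where(status == "early", 1.0,
      np.where(status == "late", 1.0 / phat_late, 0.0))   # never-respondents get 0
total_hat = np.sum(w * adj * y)
print("Weighted estimate of the total:", round(total_hat))
```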

3. Incomplete calibration (or calibration with an estimated control total)

Singh and Raghunath (2011) proposed an estimator based on a regression model in which one of the totals, in this case the population size, is estimated using the sample weights of the survey. Under certain conditions, this estimator can be more efficient than a regression estimator that uses the true total. We compared this estimator, in the specific case of two auxiliary variables (the constant variable and one independent variable), to the simple linear regression estimator, the ratio estimator and two optimal estimators (the first using only the independent variable and the second using both variables).

A simulation study was conducted using different sample designs (fixed and random size): the Midzuno design (Midzuno, 1952), the Sampford design and the Poisson design. It showed that, depending on the sample design, the estimator examined could perform better than its competitors. However, especially under the Midzuno design, the performance of the optimal estimator using both auxiliary variables was below expectations: the estimator was highly unstable. Furthermore, under the Poisson design, the estimator examined by Singh and Raghunath (2011) performed well below the other estimators.
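
The following Python sketch shows the general shape of such a simulation under a Poisson design only, comparing the Horvitz-Thompson, ratio and a GREG-type regression estimator; the population, inclusion probabilities and estimators are illustrative and do not correspond to the Singh-Raghunath estimator or to the Midzuno and Sampford runs reported above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Finite population: y roughly proportional to an auxiliary variable x.
N = 2000
x = rng.gamma(shape=2.0, scale=50.0, size=N)
y = 4.0 * x + rng.normal(0.0, 60.0, N)
Ty, Tx = y.sum(), x.sum()

# Poisson sampling with inclusion probabilities proportional to x (expected n = 200).
pi = np.clip(200 * x / Tx, 0.0, 1.0)

def one_draw():
    s = rng.uniform(size=N) < pi                     # independent Bernoulli draws
    w, xs, ys = 1.0 / pi[s], x[s], y[s]
    ht = np.sum(w * ys)                              # Horvitz-Thompson
    ratio = Tx * np.sum(w * ys) / np.sum(w * xs)     # ratio estimator on x
    b = np.sum(w * xs * ys) / np.sum(w * xs * xs)    # weighted slope (no intercept)
    greg = ht + b * (Tx - np.sum(w * xs))            # GREG-type adjustment
    return ht, ratio, greg

sims = np.array([one_draw() for _ in range(2000)])
for name, col in zip(["HT", "Ratio", "GREG"], sims.T):
    print(f"{name:5s} relative RMSE: {np.sqrt(np.mean((col - Ty) ** 2)) / Ty:.4f}")
```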

We gave two presentations during the current period: the first at the 2013 international symposium and the second at the 2013 Statistical Society ofCanada (SSC) annual conference. We are currently considering writing an article that would be submitted to a scientific journal.

4. Variance estimation and disclosure control using the bootstrap method

A large number of household surveys use the bootstrap method to estimate the variance of estimates produced from their survey data. The Rao-Wu-Yue (1992) method is typically used to replicate the sampling process when generating bootstrap weights. The Rao-Wu-Yue method produces series of bootstrap weights from which the underlying sample design variables can naturally be derived, which creates a disclosure problem for these weights. In fact, such bootstrap weights cannot readily accompany public use microdata files, since the characteristics of the sample design can be tracked via the bootstrap weights.
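
For reference, the sketch below generates Rao-Wu-Yue style bootstrap weights for a small stratified design in Python, following the usual textbook presentation (resample n_h - 1 PSUs with replacement in each stratum and rescale the weights); it is not the production implementation used by household surveys.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1992)

def rao_wu_yue_weights(df, B=500):
    """Rao-Wu-Yue style bootstrap weights for a stratified (PSU) design.

    For each replicate, n_h - 1 PSUs are drawn with replacement within each
    stratum h, and unit weights are rescaled by (n_h / (n_h - 1)) * m_hi,
    where m_hi is the number of times PSU i of stratum h was drawn.
    """
    boot = np.empty((len(df), B))
    for h, g in df.groupby("stratum"):
        psus = g["psu"].unique()
        nh = len(psus)
        for b in range(B):
            draws = rng.choice(psus, size=nh - 1, replace=True)
            m = pd.Series(draws).value_counts()
            mult = g["psu"].map(m).fillna(0.0).to_numpy()
            boot[g.index, b] = g["weight"].to_numpy() * (nh / (nh - 1)) * mult
    return boot

# Tiny example: 2 strata, 3 PSUs each, 2 units per PSU.
df = pd.DataFrame({
    "stratum": [1] * 6 + [2] * 6,
    "psu":     [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "weight":  10.0,
    "y":       rng.normal(100, 20, 12),
})
bw = rao_wu_yue_weights(df, B=1000)
est = bw.T @ df["y"].to_numpy()              # replicate estimates of the total
print("Point estimate:", round((df["weight"] * df["y"]).sum()))
print("Bootstrap variance:", round(est.var(ddof=1)))
```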

The aim of this project is to examine different bootstrap variance methods against three criteria, in the hope that one will produce quality variance estimates while protecting the sample design information. The three criteria are:

1. Produce a quality estimate of the true variance;
2. Perform well even when the sampling fraction is high;
3. Protect against disclosure of the underlying sample design variables.

An analysis of the four variance estimation methods was conducted from the perspective of the accuracy of the different methods in estimating the true variance. Since the previous semi-annual review, the small-area variance estimation results have been analyzed. The complete results were presented to the Household Survey Methods Division (HSMD) Technical Committee in November 2013.

Another project stage consists of applying the methods retained to existing survey data. Hence, the generalized method is to be put in place in the context of the Canadian Community Health Survey. This survey already uses the Rao-Wu-Yue method and recently tested the Poisson method. These methods will be compared in terms of accuracy and then in terms of their capacity to control disclosure of confidential information (more specifically, sample design information).

Documentation of the results and some recommendations will be incorporated into the research project report.

A research proposal was submitted that follows on somewhat from the idea of this project, namely producing bootstrap weights that do not disclose confidential information. It involves exploring the method suggested by Kim and Wu (2013) and determining whether this method meets the required criteria and keeps information on the sample design confidential. The proposal was selected, and work will begin in fiscal year 2014-2015.

For further information, contact: François Brisebois (613 951-5338, [email protected]).

References

Kim, J.K., and Wu, C. (2013). Sparse and efficient replication variance estimation for complex surveys. Survey Methodology, 39, 1, 91-120.

Midzuno, H. (1952). On the sampling system with probability proportional to sum of size. Annals of the Institute of Statistical Mathematics, 3, 99-107.

Singh, S., and Raghunath, A. (2011). On calibration of design weights. METRON International.

Small Area Estimation

Small area estimation (SAE) is more relevant now due to the increasing costs of data collection, the growing demand for reliable small area statistics and the need to reduce response burden. Direct estimators for an area use only the data from the sample in that area and do not provide adequate precision for small areas because the sample size is small. On the other hand, indirect estimators (small area estimators) borrow data from related areas to increase the effective sample size. Data from other areas are borrowed through a series of assumptions, or "model", and model-based estimates are produced. Our main research objectives are to help answer the following questions: 1) Is there an SAE method that provides estimates of good enough quality for publication, and can we make available a design-based measure of quality (based on sampling properties)? 2) Can we implement this method for production (development and operational costs, timeliness, reputation of the Bureau versus client demand, burden, collection costs)? 3) How can the outcomes and issues arising from this project help shape a Statistics Canada strategy on the production of model-based estimates? In this period we advanced work and presented papers on the following projects, with applications to social and business surveys.

Extensions of the Pseudo-EBLUP estimator with application to the Survey of Employment, Payrolls and Hours (SEPH)

We conducted a comparison of various cross-sectional small area estimators for possible application to SEPH. In addition to the extended pseudo-EBLUP estimator necessary for the application to SEPH, we added the EBLUP estimator to the comparison and used graphs to show that the estimators of the model-based MSE of the various EBLUP estimators underestimate the design-based MSE. We completed the analysis of the results, and they were included in the Small Area Workshop held at Statistics Canada on October 15, 2013.

A positive variance estimator for the Fay-Herriot small area model

The Empirical Best Linear Unbiased Predictor (EBLUP) obtained from fitting the Fay-Herriot model (Fay and Herriot, 1979) is a weighted average of the direct survey estimator and the regression-synthetic estimator. The weights depend on the variance of the random area effects. Standard methods of variance estimation often yield negative estimates, which are set to zero; the EBLUP then becomes a regression-synthetic estimator. However, most practitioners are reluctant to use synthetic estimators for small area means, since these ignore the survey-based information and are often significantly biased. This problem gave rise to a series of variance estimation methods that always yield positive estimates. We propose the MIX variance estimator, which is not only positive but whose bias also has a faster convergence rate than that of other positive variance estimators. In 2013-2014 we investigated theoretical and empirical properties of the Li and Lahiri (2011) and Yoshimori and Lahiri (2013) variance estimators and compared them with the MIX variance estimator (jointly with Yong You).
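
For reference, the Fay-Herriot area-level model and the weighted-average form of the EBLUP can be written in standard notation (with sampling variances treated as known) as:

```latex
% Fay-Herriot area-level model and the EBLUP as a weighted average
\hat{\theta}_i = \theta_i + e_i, \qquad
\theta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i, \qquad
v_i \sim N(0, \sigma_v^2), \quad e_i \sim N(0, \psi_i)\ \text{(known)},

\tilde{\theta}_i^{\mathrm{EBLUP}} = \hat{\gamma}_i\,\hat{\theta}_i
  + (1 - \hat{\gamma}_i)\,\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}, \qquad
\hat{\gamma}_i = \frac{\hat{\sigma}_v^2}{\hat{\sigma}_v^2 + \psi_i} .
```

A variance estimate of zero forces the weight to zero and reduces the EBLUP to the regression-synthetic estimator, which is precisely the situation a positive variance estimator such as MIX is designed to avoid.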

Small area estimation for RDCI

The survey of Research and Development in Canadian Industry (RDCI) uses administrative data together with a sample of 2,000 "firms" or collection entities. The RDCI is required to produce a fully imputed microdata file and to produce estimates for 212 North American Industry Classification System (NAICS) groups for the System of National Accounts (SNA). In the Integrated Business Statistics Program (IBSP), the current objectives and sample size of the RDCI do not support either of these. We conducted a feasibility study on the production of estimates for 212 small domains using small area estimation techniques. We investigated a variety of models and used the Small Area Estimation system to obtain the estimates. A report detailing the methodology and discussing the results was written.


Small area estimation with unit-level models under an informative sample design

The aim of the project is to develop and study a simple method of small area estimation that is based on unit-level models and produces reliable estimates when the sample design is informative. The method consists of adding design variables, such as the sample weight or the size measure when dealing with a design with selection probabilities proportional to size, to the model as explanatory variables.
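
In generic notation (not taken from the article), the idea can be sketched as augmenting a basic nested-error unit-level model with a function of a design variable such as the sampling weight:

```latex
% Basic nested-error (unit-level) model and a design-augmented version
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + v_i + e_{ij}
\quad\longrightarrow\quad
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \lambda\, g(w_{ij}) + v_i + e_{ij},
\qquad v_i \sim N(0, \sigma_v^2), \quad e_{ij} \sim N(0, \sigma_e^2) .
```

Here g(w_ij) is some function of the weight or size measure; if selection depends on the response only through such a design variable, the augmented model can absorb the informativeness of the design. The exact augmentation used in the article may differ.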

Simulations have shown that fitting the augmented model yielded substantial precision gains in the point estimates under informative designs, with respect to both bias and mean square error. An article was submitted to the Survey Methodology journal at the end of the 2011-2012 fiscal year and was accepted conditional on revisions. Additional simulations were conducted in this fiscal year. The article was revised and resubmitted (Verret, Rao and Hidiroglou, 2014).

The project will address some shortcomings in small area estimation with unit-level models that are due to the very strong assumption that the sample design is not informative. In direct estimation, this assumption is not made, in order to avoid large biases.

Small area estimation on the census and the National Household Survey

The aim of this project is to apply small area estimation methods to the 2011 Census of Population and the 2011 National Household Survey (NHS). On the one hand, small area estimation has been used for several census cycles to estimate undercoverage by age group for individuals living on native reserves and for Aboriginal identity individuals. In this research project, the SRID small area estimation system was used to estimate undercoverage in 2011 (Nambeu and Hidiroglou, 2013).

On the other hand, before studying small area estimation using data from the NHS, we attempted to fit unit-level models to the 2006 long-form questionnaire data. The goal was to obtain more accurate estimates of income totals by dissemination area and by census tract. There were several challenges in fitting small area estimation models. For example, it is difficult to take the cluster structure of the data into account in the definition of the unit under study and of the explanatory variables of the model. Despite these difficulties, small area estimation yielded significant precision gains compared with classical direct estimators.

For further information, contact: Susana Rubin-Bleuer (613 951-6941, [email protected]).

References

Fay, R.E., and Herriot, R.A. (1979). Estimation of income from small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74, 269-277.

Li, H., and Lahiri, P. (2011). An adjusted maximum likelihood method for solving small area estimation problems. Journal of Multivariate Analysis, 101,882-892.

Yoshimori, M., and Lahiri, P. (2013). A new adjusted maximum likelihood method for the Fay-Herriot small area model. To be published in the Journal of Multivariate Analysis.

Data Analysis Research (DAR)

The resources assigned to Data Analysis Research are used to conduct research on current analysis-related methodological problems that have been identified by analysts and methodologists; the resources are also used for research on problems that are anticipated to be of strategic importance in the foreseeable future. People doing this research also gain experience in knowledge transfer through the writing of technical papers and by giving seminars, presentations and courses.

Deoxyribonucleic acid (DNA) analysis for surveys

The main objective of this research was to prepare for analytical questions related to the analysis of DNA collected in complex surveys. Statistics Canada now collects DNA from respondents to the Canadian Health Measures Survey (CHMS). The CHMS has begun collecting DNA samples from consenting participants 20 years of age and older. DNA samples are frozen and stored anonymously to protect participant confidentiality. The Biobank contains DNA samples from about 6,500 participants 20 years or older. More information on the Biobank is available at http://www.statcan.gc.ca/survey-enquete/household-menages/5071g-eng.htm. The availability of such biospecimens offers new opportunities and opens up new avenues for exploration. Such genetic data may be used to correlate genetic traits related to certain health outcomes with certain cities or regions, to identify regions where certain types of health care services will be used heavily or lightly, to link demographic or socio-economic factors to certain genetic traits, and to feed the microsimulation models built at Statistics Canada. To analyse this new type of data, a good knowledge of DNA analysis techniques is essential, especially since these techniques will have to be adapted to survey design-based analyses.

In order to pursue research on DNA analysis for complex samples, we first undertook a review of classical methods applied to epigenetics. This review is being used to guide further research in the area of design-based methods for DNA analysis. The reader may wonder how epigenetic analysis relates to the survey methodologies with which Statistics Canada is primarily concerned. Consider that the analysis methods mentioned above are really just data mining techniques, and gene expression data are really just an example of big data. Big data analyses are becoming increasingly important as more and more information is collected digitally, providing a wealth of information with the potential to change the way statistical agencies collect and analyse data. The conversion of surveys to a web-based format means that a plethora of paradata may be collected with each questionnaire which, if mined, could reveal information that may help improve the collection process to obtain more complete and accurate data. Some or all of the methods applied to analyse microarray data may also be applied to other types of big data analyses.

Extending Classical DNA analysis to Complex Surveys


DNA analysis is classically based, that is, it does not account for a complex survey design. This research began to look at the issues that arise when this type of analysis is applied to survey data. The two-stage superpopulation and two-stage design model (by Susana Rubin-Bleuer) looks like a useful match for dealing with some of the issues associated with epigenetic data analysis.

Selected topics in design-based methods for analysis of survey data: Bayesian Data Analysis for Complex Surveys

Each year, a number of topics related to design-based methods of analysis of survey data are identified, but each is too small to be proposed as a singleproject. However, each has an impact on the advice given by DARC with respect to appropriate methods for analysing survey data.

The purpose of this research is to review Bayesian analysis methods and compare them with classical and design-based methods on a variety of data sources. Bayesian methods have not been used in the analysis of survey data at Statistics Canada; frequentist approaches are generally applied. This work first describes the two different statistical approaches to inference. Then, different applications of Bayesian methods to survey data found in the literature are reviewed.

Bayesian methods may provide useful tools for the analysis of administrative or big data.

However, there is not a great deal of expertise in or focus on these methods, whereas much work and emphasis are placed on design-based estimation and analysis for complex surveys. Additionally, there is little literature comparing these areas and their utility in analyzing different data, such as administrative data or complex survey data.

For further information, contact: Karla Fox (613 951-4624, [email protected]).

Data Collection

The aim of collection research is to further our knowledge and to put in place collection processes that are more cost-effective and of higher quality.

Collection research projects cover three main aspects. Three projects aimed at improving collection vehicles: 1) the establishment of guidelines for the development of electronic questionnaires; 2) the development of an innovative approach using graph theory to analyze questionnaire complexity, potentially reduce response burden and facilitate post-collection processing; and 3) the development of an appropriate measurement instrument to reduce potential response error for proxy collection, taking the collection mode into account. Three projects to develop mitigation measures in response to corporate risk #6: 1) a review of tracing methods with a view to improving them; 2) an assessment of the potential impact of incentives in household surveys; and 3) the development of a theoretical framework to prioritize follow-up under new collection strategies. Finally, one project to define and evaluate the effects of collection modes on quality, by developing a mathematical framework to assess the impacts of mode.

Guiding principles for electronic questionnaires

As more and more Statistics Canada surveys transition to electronic questionnaires, survey-specific data requirements and associated functionalities continue to emerge and evolve. One of the ongoing objectives of this project is to summarize and document the knowledge gained so far, as well as to identify specific e-questionnaire design issues that need to be evaluated further.

The Electronic Questionnaire (EQ) Standards Committee was reconvened in May 2013. The Questionnaire Design Resource Centre (QDRC) contributes to this interdisciplinary/interdivisional committee on a weekly basis. In addition, we have started writing a research paper focusing on current methods used to pretest electronic questionnaires.

The paper will propose possible guidelines. The QDRC continues to be directly involved in the development and implementation of usability and cognitive interview test strategies for business and social surveys that are moving to the EQ environment. Findings from this end-user testing are instrumental in helping guide design and development decisions for existing and new EQ applications. The intent of these efforts is to improve data quality by reducing possible measurement error and respondent burden.

Since the electronic questionnaire project is still developing, we will continue to summarize and document the knowledge gained so far based on Statistics Canada's experiences with Internet collection, as well as compare these findings with the design efforts and research being carried out by others.

Graph Theory Approaches for Questionnaire Design

The objective of this research is to automate and understand graph theory approaches for questionnaire design and development. Previous research by Schiopu-Kratina has demonstrated that graph theory can be used to successfully explain the complexity of a questionnaire. Additionally, it was demonstrated that this method could be used to simplify a questionnaire. The current work looks at extending that research to address issues with automating the method of calculating complexity scores and methods of simplifying graphs. In addition to the feasibility of automation, it looks at the utility of quality indicators of respondent burden based on the graph complexity measures. Work has been done to eliminate double counting in the complexity score, as well as to develop a test questionnaire using the Questionnaire Development Tool (QDT) to help assess the feasibility of automation. Work is ongoing on informatics approaches to automation and complexity. Presentations to and discussions with the Questionnaire Design Tool team have taken place to help aid in the automation of this approach.
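
As a simple illustration of the kind of representation involved, the Python sketch below encodes a small questionnaire with skip patterns as a directed graph and computes two crude complexity indicators; the measures shown (cyclomatic complexity and the number of distinct paths) are illustrative and are not the Schiopu-Kratina complexity score.

```python
import networkx as nx

# Represent a questionnaire as a directed graph: nodes are questions,
# edges are possible transitions (including skip patterns).
G = nx.DiGraph()
G.add_edges_from([
    ("Q1", "Q2"), ("Q1", "Q4"),          # Q1 has a skip straight to Q4
    ("Q2", "Q3"), ("Q3", "Q4"),
    ("Q4", "Q5"), ("Q4", "END"),         # Q4 can end the interview early
    ("Q5", "END"),
])

# Crude complexity indicators: cyclomatic complexity E - N + 2P, and the
# number of distinct start-to-end paths through the questionnaire.
E, N = G.number_of_edges(), G.number_of_nodes()
P = nx.number_weakly_connected_components(G)
cyclomatic = E - N + 2 * P
n_paths = sum(1 for _ in nx.all_simple_paths(G, "Q1", "END"))

print(f"questions/nodes={N}, transitions/edges={E}")
print(f"cyclomatic complexity={cyclomatic}, distinct paths={n_paths}")
```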

Future work will focus on understanding the overall distribution of complexity scores at Statistics Canada as well as on a few case studies, and will continue to explore software options for automating the graph. The applications of this research can be used to reduce costs and/or improve data quality. This approach could potentially create a useful indicator of respondent burden. It could also be used to assist in questionnaire design, testing, edit and imputation, and analysis.

Understanding proxy responses


The objective of this research is to understand the factors that are associated with proxy response quality in a household survey context. Without a theoretical understanding of the psychological, contextual and structural factors underlying the quality of proxy responding, we are unlikely to be able to understand which circumstances and collection strategies are likely to elicit high quality proxy responses, and which ones are not.

In addition to the paper presented at the Symposium in the fall, additional funds were obtained to conduct further research in collaboration with a cognitive psychologist. An environmental scan was completed: 29 articles were identified and selected from the many available in the literature and used as input to the pilot conceptual model. The factors identified in the review were used to define a pilot theoretical framework for proxy response.

Factors proposed in the literature as being related to the quality of proxy response were grouped into classes, characteristics, operationalizations and outcomes. The literature reviewed suggested five broad classes of characteristics related to response quality, namely characteristics of:

• the Target (i.e., the person whose response was desired);
• the Proxy (i.e., the person providing the information);
• the Target-Proxy relationship (e.g., how close the two are);
• the Question (e.g., sensitivity of questions); and
• the Context (e.g., the extent to which the situation encourages socially desirable responses).

The framework provides the starting point for developing an instrument that increases the overall quality of response, correlates well with self-reports and has minimal bias. It can be used to determine which characteristics should be measured when proxy responses are used, so as to determine the final responses' fitness for use.

Future work should include a scoping review of the literature to complete the framework. There could be future cognitive testing sessions with a real survey, with a larger sample of participants with different degrees of relationship and with participants with more chronic health problems.

Additionally, there are a number of secondary data analyses that could be done on existing data. Finally, a Total Decision Framework (TDF) approach to developing interventions could be designed to improve survey responses. The results could be used in survey collection to help understand the potential quality of proxy responses.

The use of incentives in household surveys

This research project follows on from the 2012-2013 project on response rates funded by the Quality Secretariat, which permitted a literature review on the use of incentives.

A literature review was initiated during the period. The results were discussed during a presentation to the Technical Committee on Household Surveys and were used by the Statistics Canada working group on incentives for its presentation to the Policy Committee. Several new relevant articles were identified and will be used to complete the review of the literature and of international practices.

An Adaptive Data Collection Procedure for Call Prioritization

The objective of this research was to propose an adaptive data collection procedure for call prioritization in the context of computer-assisted telephone interview surveys. The procedure is adaptive in the sense that the effort assigned to a sample unit may vary from one unit to another and may also vary during data collection. The goal of an adaptive procedure is usually to increase quality for a given cost or, alternatively, to reduce cost for a given quality. The quality criterion often considered in the literature is the nonresponse bias of an estimator that is not adjusted for nonresponse. Although the reduction of the nonresponse bias is a desirable goal, it is not a useful criterion to use at the data collection stage of a survey, because the bias that can be removed at this stage through an adaptive collection procedure can also be removed at the estimation stage through appropriate nonresponse weight adjustments. Instead, a call prioritization procedure that attempts to minimize the nonresponse variance of an estimator that is adjusted for nonresponse, subject to an overall budget constraint, was developed and evaluated in a simulation study.
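
As a toy illustration only (not the procedure developed and evaluated here), the Python sketch below ranks cases by an approximate marginal reduction in nonresponse variance per additional call and allocates a fixed call budget greedily; the propensity values and the variance proxy are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy sample: design weight w, current response propensity p, and the
# estimated propensity p_new if one more call is made to the case.
n = 1000
w = rng.uniform(5, 50, n)
p = rng.uniform(0.05, 0.9, n)
p_new = np.minimum(p + rng.uniform(0.02, 0.15, n), 0.99)

def variance_contribution(w, p):
    # Rough proxy for a case's contribution to the nonresponse variance of a
    # propensity-adjusted estimator: grows with the weight and with 1/p.
    return w ** 2 * (1 - p) / p

# Marginal variance reduction per additional call, used as a priority score.
gain = variance_contribution(w, p) - variance_contribution(w, p_new)

budget = 300                                  # total number of extra calls allowed
priority = np.argsort(-gain)[:budget]         # greedy: take the largest gains first
plan = np.zeros(n, dtype=bool)
plan[priority] = True

print("Cases prioritized:", plan.sum())
print("Share of total potential variance reduction captured:",
      round(gain[plan].sum() / gain.sum(), 3))
```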

A research article was submitted to a journal for publication during the last fiscal year. In this fiscal year, encouraging reports from the associate editorand referees were received. As a result, a few more exploratory simulations were performed and responses to reports were prepared. A revised versionof the article was submitted.

Extending to a Mathematical Understanding of Mode Effects

Previous work at Statistics Canada on mode effects has established a framework for the factors that define mode effects and for which factors in the survey process are affected by changes in mode. The objective of this research was to create a simulation tool to understand the effect that changes in mode, design and collection have on estimation.

The simulation has been created in SAS and documented in a report. Design approaches similar to the ones used by the General Social Survey (GSS) were simulated. The old and new sampling plans were linked theoretically to a model super-population through different mechanisms; together with the super-population assumptions, the simulation creates the finite populations of interest.

Differences in household phone lines and cell phones were taken into account to create a synthetic frame of households and of access to households. The population has several estimated parameters that can be altered to see the impact of planning assumptions on a single categorical estimate. Further details on the construction of the simulation can be found in the report.

The simulation could be easily modified for other studies or the parameters can be used within the Modgen collection simulation.

For further information, contact: Michelle Simard (613 951-6910, [email protected]).

Disclosure Control Resource Centre (DCRC)


As part of its mandate, the DCRC provides advice and assistance to Statistics Canada programs on methods of disclosure risk assessment and control. It also shares information and provides advice on disclosure control practices with other departments and agencies. Continuing support on disclosure control methods is also given to the Agency's data centres programs, mostly in the form of assistance with the implementation and interpretation of disclosure control rules for data centre holdings, including survey, census, administrative and linked data.

Information and advice were provided internally as well as to the Treasury Board of Canada Secretariat, the Alberta Office of Statistics and Information and l'Institut de la statistique du Québec. A technical review was done for the book Anonymizing Health Data: Case Studies and Methods to Get You Started (El Emam and Arbuckle, 2013).

Development of disclosure control rules for administrative data

The aim of the project is to develop rules for controlling the disclosure of aggregate personal administrative data (tables and analytical results). We divided administrative data into two groups, type A (health, justice, education, etc.) and type B (tax data), to better adapt the rules to the needs and challenges specific to each type. The target was disclosure of administrative data controlled by the Agency (with our G-Tab system) rather than disclosure through the Real Time Remote Access (RTRA) program. Different approaches were examined and discussed (suppression, barnardisation, the score method, data swapping, noise methods, etc.). Our approach tries to preserve existing knowledge (rules already in place for RTRA and G-Tab) and is generally post-tabular (processing of outputs).

A new approach with a quality indicator was developed for proportions (percentages); a rule was developed for percentiles; options were evaluated for adding primary suppressions to the controlled rounding of counts; disclosure control rules were developed for totals; theoretical calculations were performed and the exact impact of our rounding charts on original data was determined empirically; and a G-Tab "companion" was developed that analyzes the output table, calculates a score, informs the user of the disclosure risk level by means of a scale and a variety of messages (e.g., presence of a high number of small cells, full cells, etc.), and proposes manual and automated solutions. The companion is also a tool for learning about and raising awareness of various disclosure control issues. The approaches were presented to the internal client, and system specifications were prepared.
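
As one simple example of the family of techniques mentioned above, the Python sketch below applies unbiased random rounding of counts to base 5; it is a standard textbook device and not the specific rounding charts or rules developed for G-Tab.

```python
import numpy as np

rng = np.random.default_rng(5)

def random_round(counts, base=5):
    """Unbiased random rounding of counts to a multiple of `base`.

    Each count is rounded down with probability (base - r) / base and up with
    probability r / base, where r is the remainder, so the expected value of
    the rounded count equals the original count.
    """
    counts = np.asarray(counts)
    lower = (counts // base) * base
    r = counts - lower
    round_up = rng.uniform(size=counts.shape) < r / base
    return lower + base * round_up

table = np.array([[1, 7, 23], [2, 0, 48]])    # small cells are the disclosure risk
print(random_round(table))
# Repeating the rounding many times and averaging recovers the original counts,
# which is what "unbiased" means here.
```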

Strategies for Processing Tabular Data Using the G-Confid Software

Statistics Canada's G-Confid system uses cell suppression to protect the values of confidential cells in tables of magnitude. Users would like it to handle special situations such as the treatment of weighted survey data, of negative values and of waivers. Waivers are used when, in an attempt to disseminate more data, a statistical agency obtains from certain large respondents permission to release information that may disclose their values. Approaches that can be used to address these and other needs using G-Confid have been developed. They were presented at the 2013 Joint Statistical Meetings (Tambay and Fillion, 2013). The approaches could be implemented in other cell suppression programs.

Data Perturbation to Reduce Cell Suppression

For tabular magnitude data (e.g., aggregated sales by province and industry), the project explored alternatives to complementary cell suppression. These included noise addition at both the microdata and aggregate levels, as well as various methods to maintain table margin totals and reduce the overall noise effects on non-sensitive cells. The use of these alternative methods was explored in a variety of primary suppression scenarios, including cases with only one sensitive cell and cases with one province completely sensitive. Finally, the project explored the pros and cons of each technique and made recommendations as to the suitability of their use under different circumstances. Results will be documented in the next fiscal year.
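
The Python sketch below illustrates microdata-level multiplicative noise for a magnitude table; the noise distribution and its parameters are arbitrary choices for illustration and are not the methods evaluated in this project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)

# Toy microdata: establishment revenues by province and industry.
micro = pd.DataFrame({
    "province": rng.choice(["ON", "QC", "MB"], size=300),
    "industry": rng.choice(["A", "B"], size=300),
    "revenue":  rng.lognormal(mean=12, sigma=1.2, size=300),
})

# Multiplicative noise at the microdata level: each establishment's value is
# multiplied by a factor pushed away from 1 (here by 5% to 15% in either
# direction), so sensitive aggregates are perturbed without suppression.
delta = rng.uniform(0.05, 0.15, size=len(micro))
sign = rng.choice([-1.0, 1.0], size=len(micro))
micro["revenue_noisy"] = micro["revenue"] * (1.0 + sign * delta)

table = micro.groupby(["province", "industry"])[["revenue", "revenue_noisy"]].sum()
table["relative_change_%"] = 100 * (table["revenue_noisy"] / table["revenue"] - 1)
print(table.round(0))
```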

For further information, contact: Jean-Louis Tambay (613 951-6959, [email protected]).

Reference

El Emam, K., and Arbuckle, L. (2013). Anonymizing Health Data – Case Studies and Methods to Get You Started. O'Reilly Media, Inc., Sebastopol, CA.

Record Linkage Research

Record linkage brings together records from different files. It is an important tool for exploiting administrative data and may serve many purposes, e.g., creating a frame or collecting data. The research has focused on exact record linkage and it has developed in three directions. The first direction is the exploration of new methods for linking data, including the spectral covariance method and the use of new statistical models for linked data. The second direction focuses on the measurement of linkage errors and objective criteria for deciding the fitness for use of linked data. It includes the use of new models and the development of comprehensive quality indicators, measures and guidelines. The third direction is the development of a comprehensive approach for record linkage including deterministic solutions such as MixMatch.

Spectral tokenization of names for ethno-linguistic clustering with application to probabilistic record linkage

The performance of the spectral covariance method and its application as a blocking strategy were discussed (Dasylva, 2013). Large-scale applications are currently limited by practical issues such as the relatively large number of clusters produced by the spectral method and its prohibitive processing requirements.

Deterministic Record Linkage Prototype

MixMatch v5.1 is a solution for deterministic record linkage with an existing client base. It was successfully migrated to SAS, and new features were also added, such as tabular reports about the linkage performance (Lachance, 2014). Accompanying documentation was also produced, including a data dictionary and a user's guide.

Nonparametric Methods for Probabilistic Record Linkage

Exact probabilistic record linkage brings together records from one or more datasets, which originate from the same individuals, by explicitly computing the probability that selected record pairs are either matched pairs containing related records or unmatched pairs. It involves estimating the record-pair distributions from a sample and classifying selected pairs as matched pairs or as unmatched pairs, according to Fellegi-Sunter's optimal decision rule. In practical applications, exact probabilistic linkage is mostly based on parametric models such as Fellegi-Sunter's original model of conditional independence, which does not consider correlations. New statistical models have been proposed for record linkage (Dasylva, 2014). They allow interactions, and include nonparametric models, i.e., models where the correlation structure among linkage variables can be arbitrary. The theoretical properties of the proposed models have been studied, including the important question of their identifiability (Fienberg et al., 2007). Expectation Maximization (EM) algorithms have been proposed for their estimation. The new models enable the accurate estimation of linkage errors, a key requirement for the accurate analysis of linked data. They are evaluated and compared to previous models, e.g., latent class log-linear models with interactions (Thibaudeau, 1993), through simulations.
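For readers unfamiliar with the Fellegi-Sunter decision rule referred to above, the following Python sketch computes a total agreement weight for a record pair under conditional independence and classifies the pair. The m- and u-probabilities and the thresholds are assumed values for illustration; they are not estimates produced by the models described in this section.

```python
import math

# Illustrative m- and u-probabilities for three linkage variables (assumed values).
m_probs = {"surname": 0.95, "birth_year": 0.90, "postal_code": 0.85}
u_probs = {"surname": 0.01, "birth_year": 0.05, "postal_code": 0.02}

def pair_weight(agreements):
    """Total log2 match weight for a record pair under conditional independence.

    `agreements` maps each linkage variable to True (agree) or False (disagree).
    """
    w = 0.0
    for var, agree in agreements.items():
        m, u = m_probs[var], u_probs[var]
        w += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return w

# Classify with assumed upper and lower thresholds (Fellegi-Sunter decision rule).
UPPER, LOWER = 6.0, 0.0

def classify(agreements):
    w = pair_weight(agreements)
    if w >= UPPER:
        return "link"
    if w <= LOWER:
        return "non-link"
    return "possible link (clerical review)"

# Two of three variables agree: total weight is about 8, so the pair is linked.
print(classify({"surname": True, "birth_year": True, "postal_code": False}))
```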

Quality Indicators for Linked Data

Comprehensive quality indicators, measures, guidelines and best practices have been identified for linked data and the different steps of the linkage process, whether probabilistic or deterministic (Dasylva and Haddou, 2014). These guidelines are useful not only for maximizing the linkage quality, but also for assessing the fitness for use of linked data and the impact of record linkage on other survey steps, including estimation and analysis. They complement existing guidelines and checklists on record linkage, such as the ones found in the Statistics Canada Quality Guidelines document.

For further information, contact: Abel Dasylva (613 951-7618, [email protected]).

References

Abeysundera, M., Field, C. and Gu, H. (2012). Phylogenetic analysis of multiple genes using spectral methods. Mol. Biol. Evol., 29(2), 579-597.

Fellegi, I.P., and Sunter, A.B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64(328), 1183-1210.

Fienberg, S., Rinaldo, A., Hersh, P. and Zhou, Y. (2007). Maximum likelihood estimation in latent class models for contingency table data. Report, available at http://www.ml.cmu.edu/research/dap-papers/yizhou-kdd.pdf.

Thibaudeau, Y. (1993). The discrimination power of dependency structures in record linkage. Survey Methodology, 19, 1, 31-38.

Support Activities

Time Series

The objective of the time series research is to maintain high-level expertise and offer needed consultation in the area, to develop and maintain tools to apply solutions to real-life time series problems, as well as to explore current problems without known or acceptable solutions.

The projects can be split into 7 sub-topics:

Consultation in Time Series (including course development)
Time Series Processing and Seasonal Adjustment
Support to G-Series (Benchmarking and Reconciliation)
Calendarisation
Modeling and Forecasting
Trend estimation
Other R&D time series projects

Consultation

As part of the Time Series Research and Analysis Centre (TSRAC) mandate, consultation was offered as requested by various clients. Topics most frequently covered in the period under review continued to centre on modeling, forecasting and the use of seasonal adjustment tools for the purpose of data validation and quality assurance.

TSRAC members continued their participation in various analytical and dissemination groups, such as the Forum for the Daily analysts and the new Forum on seasonal adjustment and economic signals. An extended review of direct and indirect methods was presented at the Forum on seasonal adjustment and economic signals. TSRAC members also met with various international visitors to discuss time series issues and refereed papers for external journals.

An overview paper on seasonal adjustment for non-specialists was completed (Fortier and Gellatly, 2014).

Time Series Processing and Seasonal Adjustment

This project monitors high-level activities related to the support and development of the Time Series Processing System. Seasonal adjustment is done using X-12-ARIMA (for analysis and development or production) or SAS Proc X12 (for production).

The Time Series Processing System was updated to make it more robust with respect to the installation platform, to add diagnostics on the risks of seasonal outliers and extreme values, and to provide extra flexibility for labelling the various naming modules. Other small adjustments and minor improvements were also added and documented in monthly reports.

The new guidelines for the selection of seasonal adjustment options in the context of deflation exercises were further tested and proven to be valid.

Various innovative approaches to deal with volatile economic situations were illustrated and documented in Matthews, Ferland, Patak and Fortier (2013). This paper includes recent examples of seasonal adjustments where interesting results were obtained by going beyond the usual options.

To answer questions related to the apparent increased volatility of important indicators such as the Labour Force Survey, various smoothness measures were computed on the Labour Force Survey and presented (Le Petit and Fortier, 2014).

Support to G-Series (Benchmarking and Reconciliation)

Benchmarking refers to techniques used to ensure coherence between time series data of the same target variable measured at different frequencies, for example, sub-annually and annually. Reconciliation is a method to impose contemporaneous aggregation constraints on tables of time series, so that the "cell" series add up to the appropriate "marginal total" series. SAS procedures to implement benchmarking and reconciliation solutions are developed under the G-Series generalized software.

Support to the recently released G-Series 1.04.001 was provided as needed. Plans were established for the next version of the system, to be developed in 2014-2016. The objective is to add a new balancing procedure or solution to the set of available procedures in G-Series. The balancing procedure will satisfy both generalized linear constraints and non-negativity constraints. It will then allow processing of more complex reconciliation rules and facilitate calendarisation with benchmarking. The literature review related to the future development is underway, with emphasis on the Bureau of Economic Analysis approach as well as Statistics Netherlands' experiences.
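As a simple illustration of the coherence constraint that benchmarking enforces, the Python sketch below prorates quarterly values so that each year adds up to its annual benchmark. It is a toy example with assumed figures; the benchmarking procedures available in G-Series are more sophisticated and are designed to preserve period-to-period movement.

```python
# A minimal pro-rata benchmarking sketch (illustration only, with toy figures;
# not the G-Series implementation): scale each year's sub-annual values so
# they sum exactly to the annual benchmark.
def prorate_benchmark(sub_annual, benchmarks, periods_per_year=4):
    adjusted = []
    for year, bench in enumerate(benchmarks):
        block = sub_annual[year * periods_per_year:(year + 1) * periods_per_year]
        factor = bench / sum(block)            # one scaling factor per year
        adjusted.extend(v * factor for v in block)
    return adjusted

quarterly = [24.0, 26.0, 25.0, 27.0, 25.0, 28.0, 27.0, 30.0]
annual = [104.0, 112.0]
print(prorate_benchmark(quarterly, annual))    # each year now sums to its benchmark
```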

Calendarisation

This sub-topic deals with support and development of calendarisation methods, both from a benchmarking point of view and through the more recent spline interpolation techniques. Recent experiences were presented at Statistics Netherlands and at the ESSnet Admin Data workshop (Fortier, 2013). Illustrative examples from Quenneville, Picard and Fortier (2013) were also presented at the 2013 Joint Statistical Meetings.

The links between the various methods (including the relationship between the Hodrick-Prescott filter, the spline and the state space (SSM) models) were further established (Picard, 2014).

The benchmarking module in the Time Series Processing System was adapted to perform calendarisation.

Trend Estimation

A new project was started to review the possibility of reintroducing trend lines in our official publications (especially in The Daily). Various methods were reviewed, first in the literature, and then a smaller set of methods was used in a simulation study. The results, which favor the use of a trend-cycle line in our published graphs based on a variant of the Dagum and Luati (2009) filter, were presented to various internal committees (Fortier, Picard and Matthews, 2013). Publication approaches will be tested in focus groups with our readers in the next year.

Other R&D Time Series Projects

The most important sub-project in this category is the construction of a confidence interval around seasonally adjusted data; many approaches have been investigated in the past, but they have rarely, if ever, been used in official production. The promising approach currently under investigation involves the use of state space models to mimic the long-term smoother representing the X-11 algorithm, with an extension to take into consideration a sampling error component. The key ideas and their relations to the existing literature were summarized in Quenneville (2013).

For further information, contact: Susie Fortier (613 951-4751, [email protected]).

Reference

Dagum, E.B., and Luati, A. (2009). A cascade linear filter to reduce revisions and false turning points for real time trend-cycle estimation. Econometric Reviews, 28, 1-3, 40-59.

Record Linkage Resource Centre (RLRC)

The objectives of the Record Linkage Resource Centre (RLRC) are to provide consulting services to both internal and external users of record linkage methods, including recommendations on software and methodology and collaborative work on record linkage applications. Our mandate is to evaluate alternative record linkage methods and software packages for record linkage and, where necessary, develop prototype versions of software incorporating methods not available in existing packages. We also assist in the dissemination of information concerning record linkage methods, software and applications to interested persons both within and outside Statistics Canada.

Consultation

We continued to provide consultation services on record linkage projects for internal users and external inquiries as needed. We also provided support work to the development team of G-Link, including participating in Methodology-System Engineering Division (SED) Record Linkage Working Group meetings and Record Linkage User Group meetings. The team helped resolve some issues in G-Link version 3.0. We also supported the branch record linkage efforts as a resource on G-Link and updated the inventory of the linkages done at the branch level. We supported record linkages with health data and with justice data, and helped raise G-Link performance and GRLS-to-G-Link migration issues with the health division and SED. We worked on Social Domain Record Linkage Environment (SDRLE) linkages and used these linkages as an opportunity to field test G-Link 3.0 features and to develop more systematic and theoretically coherent approaches to defining and adjusting record linkages.

Adjustment technique for unlinked records

We worked on developing an adjustment technique by reweighting to address the issue of unlinked records. Unlinked records are due to missing pairs of records that do belong to the same unit. An empirical study was conducted between the Canadian Community Health Survey (CCHS) 3.1 and the Discharge Abstract Database (DAD) 2001-2009. We also started to develop a method for adjusting the error due to incorrect links in the case of logistic regression. Incorrect links are due to erroneous acceptance of false links. An empirical study will be conducted between CCHS 4.1 and DAD 2001-2009.
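The sketch below illustrates one common form of reweighting for unlinked records, in which the weights of linked records are inflated by the inverse of the weighted linkage rate within adjustment classes. The classes, weights and linkage indicators are toy values, and this is not necessarily the adjustment developed in the CCHS-DAD study.

```python
# A minimal reweighting sketch (assumed approach, toy values): linked records
# are inflated by the inverse of the weighted linkage rate in their class so
# that they also represent unlinked records from the same class.
from collections import defaultdict

# (adjustment class, survey weight, linked?) -- illustrative records only
records = [
    ("A", 10.0, True), ("A", 12.0, True), ("A", 8.0, False),
    ("B", 20.0, True), ("B", 15.0, False), ("B", 25.0, True),
]

tot = defaultdict(float)
linked_tot = defaultdict(float)
for cls, w, linked in records:
    tot[cls] += w
    if linked:
        linked_tot[cls] += w

# Weighted linkage rate per class, then adjusted weights for linked records.
rate = {cls: linked_tot[cls] / tot[cls] for cls in tot}
adjusted = [(cls, w / rate[cls] if linked else 0.0, linked)
            for cls, w, linked in records]
print(adjusted)
```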

For further information, contact: Abdelnasser Saidi (613 951-0328, [email protected]).

Data Analysis Resource Centre (DARC) Support

The Data Analysis Resource Centre (DARC) is a centre of data analysis expertise within Statistics Canada whose purpose is to encourage, suggest and provide good analytical methods and tools for use with data of all types. DARC (Data Analysis Resource Centre) provides services to Statistics Canada employees (analysts and methodologists) as well as to analysts from Research Data Centres and, on occasion, to researchers and external clients. The Centre investigates methodological problems which have been identified by analysts, methodologists or external clients. The Centre is also involved in the transfer and exchange of knowledge and experience through reviewing and publishing technical papers and by giving seminars, presentations and courses. This year we are also providing administrative support to BSMD to manage consultations on survey methods provided by Claude Girard.

The Centre has been active in numerous consultations with internal analysts, external analysts and methodologists. Internally, we have consulted with staff from Tourism and the Centre for Education Statistics Division, Health Statistics Division, Social Analysis Division, Economic Analysis Division, Health Analysis Division, Investment, Science and Technology Division and Consumer Prices Division. These various consultations covered topics on the use of weights within a linked cohort study, how linkage affects analysis, how to do an impact assessment study, how to assess for multicollinearity in a regression, Hosmer and Lemeshow tests, Satterthwaite adjustments, STATA, and issues with stepwise regression.

In addition to internal analytical consultations, we have had external consultations with analysts at Cancer Care Ontario and the Canadian Nuclear Safety Commission, researchers at Queen's University, and analysts with the Public Health Agency of Canada. These consultations concerned comparing estimators, understanding design-based sampling, data quality, Bayesian small area estimation (and confidentiality), using weights in pair-wise estimation and Satterthwaite-type adjustments.

Additionally, our client base has continued to include methodologists. We have had consultations with Business Survey Methods, Household Survey Methods and Social Survey Methods. These consultations included questions on propensity score analysis for mode effect studies, the design and analysis of mode effect studies, Satterthwaite adjustments, logistic regression, comparing proportions in domains, hierarchical linear models, comparing different weighted estimators, rare event analysis (classical case), stepwise regression analysis, simulation studies of bootstrap variance under multi-phase sampling, entropy, non-response and logistic regression, SUDAAN bootstrap estimates, and Poisson regression.

In addition, we have reviewed and provided written comments on technical papers for Statistical Research and Innovation Division and International Cooperation Division. These papers were both on issues with rare events. We have reviewed two technical articles, one on bootstrap and one on variance estimation under linearization. We have also reviewed two Analytical Projects Initiative (API) reports that were completed last fiscal year.

Two half-day courses were presented as part of the new Data Interpretation Workshop. A course on collection methods for methodologists was developed. Several new communication strategies, such as instructional videos for Research Data Centres (RDC) clients, have been explored; these may become a useful tool in the future for addressing frequently asked questions from all staff. A reorganization of the materials already on our drives, to facilitate their retrieval and distribution, has begun.

For further information, contact: Karla Fox (613 951-4624, [email protected]).

Generalized Systems

The main goal of this research is to explore new methods that we think have good potential for being included in one of the Generalized Systems. Research activities could include literature review, building and testing of prototypes, and comparison of the performance and functionality of different methods. This year the work was focused on disclosure control and automated coding.

Disclosure control: Score function and sequential method

Occasionally, the suppression macro in G-Confid is unable to find a solution. In this case, the macro stops processing immediately and gives the user an error message. A score function and a sequential method were both analyzed with the goal of breaking the suppression problem into smaller pieces so that a solution could be found.

The current method (breaking the initial problem into several tables) maintains the connection between the dimensions and takes the hierarchical structure into account. However, direct application of the Hitas approach can yield infeasible solutions. As such, some modifications had to be made. Given that we cannot apply the suppression pattern from one table to another (for publishable cells), several rounds are required to arrive at a final suppression pattern. This change of strategy can occasionally result in exact disclosures and/or incomplete protections. To avoid these problems, we looked at the following two options:

Option C

1. Suppression for each table.
2. Suppression for all tables combined, taking the suppressions in step 1 into account.

Option D

1. Suppression for each table.
2. Validation of all tables together, with an audit to check whether there are exact disclosures or incomplete protections for sensitive data.
3. Suppression taking into account the results of step 2, i.e., only for the sensitive cells without good protection.

Option D gives a run time lower than or equal to that of Option C, depending on the problem to be solved. For very complex problems, neither of these two options can be used because the suppression/validation does not bring all of the tables together.

Therefore, three other approaches were considered:

1. Suppression for each table, but iteratively adding the constraints associated with the suppressed cells from other tables. This method gives a final suppression pattern without exact disclosure. However, several iterations are required, which can result in a run time two to three times higher than that of the initial problem.

2. The Hypercube method consists of verifying all possible cell combinations, defining the vertices of a cube that will protect sensitive cells. The linear programming (LP) optimization system does not have to be used with this approach.

3. Taking into consideration how much each combination of cells has moved in a table, and using that information in another table for the same combination of cells.

Approaches 2 and 3 will be evaluated over the course of the next fiscal year. A document describing the different approaches examined and the results will also be produced.

Disclosure control: Weighted data

The current G-Confid system, consistent with the methods used worldwide to calculate the sensitivity of a cell, does not incorporate the information supplied by survey weights. Intuitively, as the weight of an enterprise increases, the probability of that particular enterprise having been selected into the survey decreases. Any method should therefore ensure that the contributions of enterprises with higher weights (i.e., take-some units) decrease the sensitivity measure of the cell.

Current practice is to encourage G-Confid users to treat an enterprise as an anonymous contributor if its survey weight is more than 3; its unweighted contribution is still applied, though. Work in the past fiscal year established a set of alternatives, including: (1) using the weighted contribution of anonymous enterprises in the calculation of the sensitivity measure; (2) apportioning the contribution of an enterprise beyond self-representation, (wi − 1)yi, to reduce the sensitivity measure; (3) apportioning the contribution only when the weight is between 1 and 3, and then reducing the sensitivity measure with the entire contribution if the survey weight exceeds 3; and (4) creating a synthetic population in which each unit has a value that corresponds to a weighted mean (Gagné, 2013). For the last alternative, a prototype SAS macro was created that computes the weighted sensitivity. Given that the data include only non-negative contributions and no waivers, strong arguments were identified in favour of each of these options over the status quo. Further investigation of the use of survey weights is needed in the presence of negative values and/or waivers.
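To make the discussion concrete, the following Python sketch computes a cell sensitivity under a common dominance-type rule (the p% rule), along with a hypothetical weighted variant corresponding to one reading of alternative (1). The p value, the weight threshold and the data are assumptions for illustration and do not reproduce the exact rules configured in G-Confid.

```python
# A minimal sketch of a p% sensitivity rule and a hypothetical weighted variant
# (assumed parameters and toy data; not the G-Confid rules).
def p_percent_sensitivity(contributions, p=10.0):
    """Positive sensitivity means the contributors other than the two largest
    cannot hide the largest contribution to within p percent."""
    x = sorted(contributions, reverse=True)
    total = sum(x)
    x1 = x[0] if x else 0.0
    x2 = x[1] if len(x) > 1 else 0.0
    return (p / 100.0) * x1 - (total - x1 - x2)

def weighted_sensitivity(units, p=10.0, anon_weight=3.0):
    """`units` is a list of (weight, value) pairs. Enterprises whose weight
    exceeds anon_weight are treated as anonymous and enter the cell total
    through their weighted contribution w*y, which reduces the sensitivity."""
    identified = [y for w, y in units if w <= anon_weight]
    anonymous = [w * y for w, y in units if w > anon_weight]
    x = sorted(identified, reverse=True)
    total = sum(identified) + sum(anonymous)
    x1 = x[0] if x else 0.0
    x2 = x[1] if len(x) > 1 else 0.0
    return (p / 100.0) * x1 - (total - x1 - x2)

cell = [(1.0, 800.0), (1.2, 150.0), (4.0, 30.0), (5.0, 20.0)]  # (weight, value)
print(p_percent_sensitivity([y for _, y in cell]))  # 30.0  -> sensitive
print(weighted_sensitivity(cell))                   # -140.0 -> no longer sensitive
```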

Disclosure control: Waivers

Some business survey respondents have signed agreements called waivers, thereby allowing Statistics Canada to publish their data even though they are confidential. In the case of data tables, waivers permit the publication of cells that would otherwise be suppressed. However, it is necessary to ensure that confidentiality is protected for all other respondents in the published cells. A major advance during the fiscal year demonstrated a feasible solution for using waivers effectively. The approach involved checking for waivers among the dominant enterprises that contribute to a cell that is considered sensitive. The solution, a set of SAS procedures and data steps, was developed and tested using the data set of a G-Confid client at Statistics Canada. Future work will involve the identification of a more generalized solution. Two alternatives are also under consideration: (1) to set the contribution to zero (Tambay and Fillion, 2013), and (2) to reduce the contribution to equal the next highest contribution for which no waiver is available. Future work involves assessing the merits and shortcomings of these proposals and testing their applicability using realistic data sets.
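As an illustration of alternative (1), the sketch below zeroes out the contribution of waiver-holders before applying the same assumed p% rule as in the previous example; the data and the p value are again toy assumptions.

```python
# A minimal sketch of alternative (1): zero out waiver-holders' contributions
# before measuring sensitivity (assumed p% rule and toy data).
def sensitivity_with_waivers(contributions, has_waiver, p=10.0):
    adjusted = [0.0 if w else y for y, w in zip(contributions, has_waiver)]
    x = sorted(adjusted, reverse=True)
    total = sum(adjusted)
    x1 = x[0] if x else 0.0
    x2 = x[1] if len(x) > 1 else 0.0
    return (p / 100.0) * x1 - (total - x1 - x2)

# The dominant enterprise has a waiver: the cell is no longer sensitive.
print(sensitivity_with_waivers([800.0, 150.0, 30.0, 20.0],
                               [True, False, False, False]))  # -5.0
```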

Disclosure control: Negative values

G-Confid is designed only to calculate the cell sensitivity using non-negative values, and yet some economic variables (e.g., transfers, losses) include negative values. Moreover, an enterprise may be dominant in a cell by contributing a "large" negative value. Current practice is to make use of absolute values when measuring the sensitivity; this heuristic method has been advocated worldwide for many years (see, for example, Willenborg and de Waal, 2001). During this fiscal year, a new method was proposed that provided a theoretically sound basis by which to evaluate variables with negative values (Gagné, 2013). Following a review of both in-house documents and published articles by external researchers, a discussion document was prepared that identified the strengths and weaknesses of the methods under consideration to treat negative values (Wright, 2014). A subset of methods was identified as superior to the others. Future work involves testing each of the superior methods using realistic data sets. More work is needed to determine the optimal approach in the presence of waivers and in connection with the use of survey weights.

Disclosure control: Additive controlled rounding

The current version of G-Confid performs only cell suppression to prevent disclosure of confidential data. Another method is to round certain values. A prototype of additive controlled rounding has been developed at Statistics Canada and has been used in several social surveys. The method should, however, be incorporated into G-Confid so that the functionality can be used by anyone at Statistics Canada and be corporately supported.

Several methods and implementation options were tested, and the most appropriate method and its implementation were selected based on execution speed, maximum problem size and quality of the results.

The general approach stays essentially the same as the one used by ACROUND, but some aspects were improved while others were simplified. The new method was implemented in a SAS macro, the results were tested and the interface was documented. The prototype obtained is therefore fully functional and can be used.
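For illustration, the Python sketch below shows one simple way to round a one-way table of counts to base 5 while keeping the rounded cells additive to the rounded total, by rounding cumulative totals. It is a minimal example and is neither the ACROUND algorithm nor the new prototype.

```python
# A minimal additive rounding sketch (illustration only): round the running
# totals to base 5 and difference them, so the rounded cells still add up
# exactly to the rounded grand total.
BASE = 5

def additive_round(values, base=BASE):
    rounded, cum, cum_rounded_prev = [], 0.0, 0
    for v in values:
        cum += v
        cum_rounded = base * round(cum / base)   # round the running total
        rounded.append(cum_rounded - cum_rounded_prev)
        cum_rounded_prev = cum_rounded
    return rounded

cells = [3, 7, 12, 4, 9]           # original counts (total 35)
print(additive_round(cells))        # [5, 5, 10, 5, 10], which sums to 35
```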

Technical specifications were produced and communicated to System Engineering Division (SED) following this work, for implementation in a G-Confid module.

Automated Coding

The generalized coding system G-Code has as its default method word matching based on the Hellerman algorithm. This year the programming was done to also include a string or phrase matching method based on the Levenshtein algorithm. Detailed specifications were provided and extensive testing was undertaken. The software is expected to be released in July 2014. Guidelines will be written to provide assistance to G-Code users in deciding how to make the most effective use of the two methods. Testing to date indicates that the two methods are complementary, and that the string matching is most effective when the strings are short, for example in coding city or country names.
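For reference, the Python sketch below computes the Levenshtein (edit) distance that underlies the new string matching method; the G-Code implementation and its scoring rules are naturally more elaborate than this minimal example.

```python
# A minimal Levenshtein distance sketch (illustration only; not the G-Code code).
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Short strings such as city names tolerate only small distances.
print(levenshtein("Otawa", "Ottawa"))      # 1
print(levenshtein("Gatineau", "Gaineau"))  # 1
```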

The methodology of the Levenshtein method was documented for inclusion in the G-Code suite of supporting documents.

A course on automated coding was developed and a pilot version was presented in French in March 2013. Many improvements were made to the course content following the pilot, and the material was translated. The course was presented in English in September 2013. More iterations of the course are planned for the future.

Support was provided to many G-Code clients both internal and external to Statistics Canada.

For further information, contact: Laurie Reedman (613 951-7301, [email protected]).

Reference

Willenborg, L., and de Waal, T. (2001). Elements of Statistical Disclosure Control. Springer.

Quality Assurance

Training in quality control methodology and its application is offered to both methodologists and non-methodologists. Training and advice on measures and practices related to quality control and assurance are provided to individuals both internal and external to Statistics Canada.

Statistical training in quality control and consultation

The quality courses were offered four times this year. There is significant demand for the content to be expanded to cover quality management in a broader sense. This will be done next year. Some information on quality control processes in household surveys was provided to International Cooperation Division for inclusion in one of their missions to China. A module on quality assurance was presented in the Survey Skills Development Course, in both French and English.

Administrative data

Statistics Canada is exploring ways to exploit data from administrative sources. The goal of this project is to produce an evaluation framework for assessing the quality of administrative data, as part of the process of deciding whether or not we should proceed with the acquisition of the data. This project was jointly funded by the methodology research fund, the Quality Secretariat and the Administrative Data Secretariat. The project was divided into three parts.

Part 1 of this project was a literature review. Documents were reviewed from several other national statistical organizations. Similarities and key features were noted. Inspiration was drawn from these sources to form the basis for our own evaluation framework.

Part 2 of this project was a proof of concept. The Manitoba Centre for Health Policy has developed a set of quality indicators and SAS programs to produce them, for the purpose of assessing the quality of their data files. An analysis of their methods and tools was undertaken to see how they work and which techniques could be used in our own evaluation framework.

The VIMO indicator (valid, invalid, missing, outlier) used by the Centre for Health Policy was tested on a number of variables in the 2011 T1 file and the results were documented. Its computation is specific to each variable evaluated and its implementation requires good knowledge of these variables. The VIMO indicator is considered pertinent and will be included in the tools of the administrative data quality evaluation framework developed under Part 3 of the administrative data quality indicator project.
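The sketch below illustrates a VIMO-style breakdown for a single numeric variable. The validity range and outlier bounds are assumed values for illustration; as noted above, they would have to be set from subject-matter knowledge of each variable on the file.

```python
# A minimal VIMO-style sketch (valid, invalid, missing, outlier) for one numeric
# variable; the validity range and outlier bounds below are assumptions.
def vimo(values, valid_range=(0, 500_000), outlier_range=(0, 250_000)):
    counts = {"valid": 0, "invalid": 0, "missing": 0, "outlier": 0}
    for v in values:
        if v is None:
            counts["missing"] += 1
        elif not (valid_range[0] <= v <= valid_range[1]):
            counts["invalid"] += 1
        elif not (outlier_range[0] <= v <= outlier_range[1]):
            counts["outlier"] += 1
        else:
            counts["valid"] += 1
    n = len(values)
    return {k: round(100.0 * c / n, 1) for k, c in counts.items()}

incomes = [42_000, None, 61_500, -10, 300_000, 55_000]   # toy values
print(vimo(incomes))  # {'valid': 50.0, 'invalid': 16.7, 'missing': 16.7, 'outlier': 16.7}
```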

Part 3 of this project is the production of an evaluation framework. The framework will have three components. One component will be a document describing the other two components. A draft version of this document has been produced, detailing the three phases of evaluation: the discovery phase, the exploratory phase and the acquisition phase. The second component will be a template. The program manager considering a particular administrative dataset will use this template to objectively assess the dataset according to pre-set criteria, covering such things as the concepts and definitions of variables, the frequency and timeliness with which the file could be expected, and the file format. The third component will be guidelines recommending how to validate the variables and content of the dataset once a file has been received.

Knowledge gained in the first two parts of this project was applied in the development of a prototype template. The prototype was given to several program managers for their feedback. Adjustments are being made. The final version of the template will be delivered in the fall of 2014.

The Manitoba Centre for Health Policy SAS programs served as a starting point for validations that could be performed on administrative files being considered here at Statistics Canada. We applied a similar validation strategy to an administrative file already in use here (2011 T1 data). The next step will be to generalize these validations into recommended practices and to formulate them as evaluation guidelines.

For further information, contact: Laurie Reedman (613 951-7301, [email protected]).

Statistical Training

The Statistical Training Committee (CFS) coordinates the development and delivery of 27 regularly scheduled courses in survey methods, sampling theory and practice, questionnaire design, time series methods and statistical methods for data analysis. During the year 2013-2014, 33 scheduled sessions (96 days of training) were given, in either English or French. Note that the total number of training days increased by 39% compared to 2012-2013 (69 days of training).

The suite of courses continues to expand, as the following courses were offered for the first time in 2013-2014:

“0460: An Introduction to Collection Methods: A Total Survey Error Perspective”
“0494: Introduction to Repeated Survey Data Analysis”
“04381: Disclosure control methods for tabular data using G-Confid”

New courses on “Outlier values” and on “Data Mining” are currently being developed.

For further information, contact: François Gagnon (613 951-1463, [email protected]).

Symposium 2013

The 2013 International Symposium on methodology issues at Statistics Canada took place October 16 to 18, 2013, at the Ottawa Conference Centre. The Symposium, entitled “Producing reliable estimates from imperfect frames”, invited members of the statistical community, whether from private research organizations, governments or universities, to attend, particularly if they had a special interest in methodological issues resulting from the use of imperfect frames. In total, there were 264 participants from Methodology, 15 from Statistics Canada outside of Methodology and 137 from outside Statistics Canada.

Olivier Sautory from the Institut national de la statistique et des études économiques (INSEE) (France) gave the welcoming speech, and the Waksberg Award was presented to Ken Brewer. The Scientific Committee developed a program of over 60 presentations from a dozen countries. The presentations focused on the following topics: multiple frames; data collection methods; two-phase sampling; modelling and estimation; alternative designs; use of administrative data to create sampling frames and for estimation; census alternatives; indirect/adaptive sampling; web surveys; challenges with imperfect sampling frames; surveys among hard-to-reach populations; and development of sampling frames for censuses and large surveys.

The Logistics Committee looked after tasks such as translation, facilities and audiovisual components. The Registrar's Committee managed registration. The Symposium Compendium should be published in the summer of 2014.

For further information, contact: Pierre Lavallée (613 951-2892, [email protected]).

Survey Methodology Journal

Survey Methodology is an international journal that publishes articles in both official languages on various aspects of statistical development relevant to a statistical agency. Its editorial board includes world-renowned leaders in survey methods from the government, academic and private sectors. It is one of very few major journals in the world dealing with methodology for official statistics and is available at the following URL:

www.statcan.gc.ca\SurveyMethodology.

This year was an important one for the journal, as publication switched from a print and PDF version to an electronic-only version. Survey Methodology is now available online for free in both PDF and fully accessible HTML versions. The scientific nature of the journal, with its high content of mathematical formulae, presented extra challenges for the production of a fully accessible HTML version.

The PDF version of the June 2013 issue, SM 39-1, was released on June 28, 2013. The issue contains 9 papers.

The December 2013 issue, SM 39-2, was released on January 15, 2014. It contains 9 papers, including the thirteenth in the annual invited paper series in honour of Joseph Waksberg. The recipient of the 2013 Waksberg Award is Ken Brewer.

In 2013, the journal received 73 submissions of papers by various authors.

For further information, contact: Susie Fortier (613 951-4751, [email protected]).

Research papers sponsored by the Methodology Research and Development Program

Abeysundera, M. (2013). Epigenetics: A review of the literature. Internal document, Statistics Canada.

Beaumont, J.-F. (2014). The Analysis of Survey Data Using the Bootstrap. In Contributions to Sampling Statistics, (Eds., F. Mecatti et al.), Springer International Publishing, Switzerland (to appear).

Beaumont, J.-F., Béliveau, A. and Haziza, D. (2014). Clarifying some aspects of variance estimation in two-phase sampling. Scandinavian Journal of Statistics (submitted).

Beaumont, J.-F., Bocci, C. and Hidiroglou, M. (2014). On weighting late respondents when a follow-up subsample of nonrespondents is taken. Paper presented at the Advisory Committee on Statistical Methods, Statistics Canada, May 2014, Ottawa.

Beaumont, J.-F., Fortier, S., Gambino, J., Hidiroglou, M. and Lavallée, P. (2014). Some of Statistics Canada's Contributions to Survey Methodology. In Statistics in Action, A Canadian Outlook, (Ed., J.F. Lawless), Chapter 2, The Statistical Society of Canada, Chapman & Hall.

Beaumont, J.-F., Haziza, D. and Bocci, C. (2014). An Adaptive Data Collection Procedure for Call Prioritization. Journal of Official Statistics (under review).

Beaumont, J.-F., Haziza, D. and Ruiz-Gazen, A. (2013). A unified approach to robust estimation in finite population sampling. Biometrika, 100, 555-569.

Bocci, C., Beaumont, J.-F. and Hidiroglou, M.A. (2014). On weighting late respondents when a follow-up subsample of nonrespondents is taken. Presentation made at the Advisory Committee on Statistical Methods held in May 2014.

Dasylva, A. (2013). Applying the Spectral Covariance (SC) Method to Probabilistic Record Linkage. Internal report, October 2013.

Dasylva, A. (2014). Reducing the Structure of Statistical Models for Probabilistic Record Linkage. Internal report, submitted to JSM, May 2014.

Dasylva, A., and Haddou, M. (2014). Quality Guidelines for Record Linkage, Estimation and Analysis with Linked Data at Statistics Canada. Internal report, May 2014.

Estevao, V., Hidiroglou, M.A. and You, Y. (2014). Small Area Estimation for the area level model with hierarchical Bayes estimation. Methodology specifications. Internal document, Statistics Canada.

Favre-Martinoz, C., Haziza, D. and Beaumont, J.-F. (2014). A method of determining the winsorization threshold, with an application to domain estimation. Survey Methodology (submitted).

Fortier, S. (2013). Benchmarking methods for calendarisation, presented at the ESSnet Admin Data workshop, Estonia, May 2013.

Fortier, S., and Gellatly, G. (2014). Behind the Data: Frequently Asked Questions about Seasonally Adjusted Data, in Economic Insights, Statistics Canada Catalogue 11-626-X, accepted for publication.

Fortier, S., Picard, F. and Matthews, S. (2013). Trend estimates for sub-annual surveys, presented to Statistics Canada's Methods and Standards Committee, October 2013.

Fox, K., and Brehaut, J. (2014). Understanding proxy responses: A pilot cognitive test of factors potentially related to proxy quality. Internal document, Statistics Canada.

Gagné, C. (2013). Sur la sensibilité des données pouvant contenir des valeurs négatives. Internal document, Statistics Canada.

Gagné, C. (2013). Usage des poids pour déterminer la sensibilité. Internal document, Statistics Canada.

Haddou, M. (2014). Bootstrap Variance Estimation Specifications—Aboriginal Peoples Survey. Internal document, March 2014.

Hidiroglou, M.A., and Nambeu, C.O. (2013). Calibration with an estimated calibration total. SSC Annual Meeting, May 2013. Proceedings of the Survey Methods Section.

Hidiroglou, M.A., and Nambeu, C.O. (2013). Calibration with an estimated calibration total. Proceedings: Symposium 2013, Producing reliable estimates from imperfect frames.

Lachance, M. (2014). MixMatch 1.0 for SAS User Guide, May 2014.

Laflamme, F. (2013). Responsive Collection Design (RCD) Framework for Multi-Mode Surveys. International Nonresponse Workshop (London, UK) and Advisory Committee on Statistical Methods.

Laflamme, F. (2013). Using Paradata to Manage Responsive Collection Design. Advances in Adaptive and Responsive Survey Design, Heerlen, Netherlands.

Le Petit, C., and Fortier, S. (2014). The Labour Force Survey - Trend Analysis in a Volatile Labour Market, presented to the Toronto Association of Business Economists meeting, Toronto, February 2014.

Mach, L. (2014). Bayesian Methods with Survey Data. Internal document, Statistics Canada.

Mantel, H. (2013). Confidence Intervals for Small Proportions Estimated from Complex Surveys (draft). Statistical Research and Innovation Division.

Mantel, H., and Hidiroglou, M. (2014). Options for a One-Stage Design for the Canadian Labour Force Survey. Technical note, Statistical Research and Innovation Division.

Matthews, S., Ferland, M., Patak, Z. and Fortier, S. (2013). Seasonal adjustment in volatile economic situations: Statistics Canada's experience. Proceedings of the Business and Economic Section of the American Statistical Association, Montréal.

Merkouris, T. (2014). An efficient estimation method for matrix survey sampling. Accepted for publication in Survey Methodology.

Millville, M.-H., Abeysundera, M. and Fox, K. (2014). Evaluation of the Effect of a Change of Sampling Frame in Estimated Statistics through Simulations. Technical report. Internal document, Statistics Canada.

Moussa, S., and Fox, K. (2013). The use of proxy response and mixed mode data collection. Proceedings: Symposium 2013, Producing reliable estimates from imperfect frames.

Nambeu, C.O., and Hidiroglou, M. (2013). Programme des estimations de la population et utilisation du système d'estimation pour petits domaines. Internal document, Statistics Canada.

Picard, F. (2014). Relationship between Hodrick Prescott filter, spline and SSM models. Working paper in development.

Quenneville, B. (2013). Estimating Variance in X-12-ARIMA. Submitted for publication to Survey Methodology.

Quenneville, B., Picard, F. and Fortier, S. (2013). State Space Models for Temporal Distribution and Benchmarking. Presented at the Joint Statistical Meeting of the American Statistical Association, Montréal.

Rivière, P., and Rubin-Bleuer, S. (2013). Microstrata. Proceedings of the Survey Methods Section, 2013 Statistical Society of Canada Annual Conference.

Rubin-Bleuer, S., Godbout, S. and Jang, L. (2014). The Pseudo-EBLUP estimator for a weighted average with an application to the Canadian Survey of Employment, Payrolls and Hours. SRID-2014-001E.

Rubin-Bleuer, S., and Jang, L. (2013). Workshop on Small Area Estimation, Statistics Canada booklet with slides.

Rubin-Bleuer, S., Julien, P.O. and Pelletier, E. (2014). Feasibility study on small area estimation for RDCI. Internal BSMD and SRID report, March 2014.

Saidi, A. (2014). Overview of record linkage at Statistics Canada. Presentation made at the Advisory Committee on Statistical Methods held in May 2014.

Schiopu-Kratina, I., Fillion, J.-M., Mach, L. and Reiss, P.T. (2014). Maximizing the conditional overlap in business surveys. Journal of Statistical Planning and Inference, 98-115.

Tambay, J.-L., and Fillion, J.-M. (2013). Strategies for Processing Tabular Data Using the G-Confid Cell Suppression Software. Presented at the 2013 Joint Statistical Meetings, Montreal, August 4-8, 2013.

Verret, F., Rao, J.N.K. and Hidiroglou, M.A. (2014). Model-based small area estimation under informative sampling. Paper submitted to Survey Methodology.

Wright, P. (2014). Options pour traiter les valeurs négatives. Internal document, Statistics Canada.

Xu, C., Chen, J. and Mantel, H. (2013). Pseudo-likelihood-based Bayesian information criterion for variable selection in survey data. Survey Methodology, 39, 2, 303-321.

You, Y. (2014). Small area estimation using unmatched spatial models with application in the Canadian Community Health Survey. Methodology Branch Working Paper, SRID-2014-02E, Statistics Canada, Ottawa, Canada.

You, Y., Rao, J.N.K. and Hidiroglou, M. (2013). On the performance of self-benchmarked small area estimators under the Fay-Herriot area level model. Survey Methodology, 39, 1, 217-229.
