1
A workshop on using R to select a sample for
EHES
Susie Cooper & Johan Heldal
Statistics Norway
2
Overview
• What is R and why use it?
• Practical Exercises
  1. Installing and loading R and packages
  2. Reading external files
  3. Calculating sample sizes
  4. Stage 1 - Selecting Primary Sampling Units (PSU)
  5. Stage 2 - Selecting Secondary Sampling Units (SSU)
• Where to get more information
3
Why use R for EHES?
• It has been agreed with the EU because:
  • It’s free, and therefore available to all countries involved.
  • It is very flexible.
  • It is a very powerful and fast tool for sampling and analyses.
However…
• There can be a steep learning curve to using the program.
• There is no user-friendly interface.
4
What is EHESsampling?
• A tool for planning the sampling design
• Can be used to find good stratifications
• Can calculate cost-variance optimal sample sizes within PSUs.
• Can calculate costs and variances of alternatives.
• A tool for taking a probability sample from a sampling frame.
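As an illustration of what "cost-variance optimal" means, here is a minimal sketch in base R. It uses the standard textbook result for two-stage sampling, which may differ from what EHESsampling computes. The function name `optimal_m` and the cost and correlation values are assumptions made for this example: `c1` is the cost of adding one PSU, `c2` the cost of adding one individual, and `rho` the intra-cluster correlation.

```r
# Textbook optimum for the number of individuals per PSU
# (hypothetical helper, not part of EHESsampling).
optimal_m <- function(c1, c2, rho) {
  stopifnot(c1 > 0, c2 > 0, rho > 0, rho < 1)
  sqrt((c1 / c2) * (1 - rho) / rho)
}

# Assumed values: opening a PSU costs 40 times as much as one extra
# individual, and the intra-cluster correlation is 0.05.
m <- optimal_m(c1 = 400, c2 = 10, rho = 0.05)
round(m)  # suggested number of individuals per PSU
```

Higher PSU costs or a lower intra-cluster correlation both push the optimum towards more individuals per PSU.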
5
Using EHESsampling
• The EHESsampling manual
• Before using EHESsampling you have to prepare some input datasets from the main sampling frame.
For sampling at stage 1 you need
• A dataset describing the PSUs
• A dataset describing the strata
For stage 2 you need
• The main sampling frame describing the individual units
6
1. Loading Packages
• Load the EHESsampling package and other necessary packages each time you re-open R:
library(EHESsampling)
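If the package has not been installed, library() stops with an error. A minimal sketch, using only base R, that checks first. How EHESsampling itself is installed is described in its manual and is not assumed here; the variable names `pkg` and `loaded` are made up for this example.

```r
# Check whether a package is installed before trying to load it.
pkg <- "EHESsampling"

if (requireNamespace(pkg, quietly = TRUE)) {
  # Installed: attach it for this session.
  library(pkg, character.only = TRUE)
  loaded <- TRUE
} else {
  # Not installed: report it instead of stopping with an error.
  message("Package '", pkg, "' is not installed; see the EHESsampling manual.")
  loaded <- FALSE
}
```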
7
2. Reading External Files
• Open a new script by selecting File, then New script
8
2. Reading External Files
• Set the working directory where data files are stored by typing into the new script:
setwd("X:/120/EHES/R/Data")
Location on your computer where the data files are stored
• Then press Ctrl + R to send the line to the console
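To check that the working directory is set as intended, a few base R calls help. A minimal sketch that uses a temporary folder as a stand-in for the real data directory (the names `data_dir` and `old_dir` are made up for this example):

```r
# Use a temporary folder as a stand-in for the real data directory.
data_dir <- tempdir()

old_dir <- setwd(data_dir)   # setwd() returns the previous directory
getwd()                      # confirm where R is now looking for files
list.files()                 # list the files R can see in that folder
file.exists("post1000.csv")  # TRUE only if the file is in the working directory

setwd(old_dir)               # restore the original working directory
```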
9
2. Reading External Files
• Read in the chosen file and save it in the working environment.
PSUs.df <- read.table("post1000.csv", sep = ";", dec = ",", header = TRUE)
• The file is now stored as PSUs.df for this session.
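The arguments sep=";" and dec="," match the CSV convention common in much of Europe: semicolons between fields and a comma as the decimal mark. A minimal sketch with a small made-up file (the column names and values are invented for this example) shows the effect:

```r
# Write a tiny semicolon-separated file with decimal commas (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("postcode;population;share",
             "1000;2500;0,25",
             "1001;4100;0,41"), tmp)

# Same arguments as in the call above.
demo.df <- read.table(tmp, sep = ";", dec = ",", header = TRUE)

# read.csv2() uses these separator and decimal settings by default.
demo2.df <- read.csv2(tmp)

str(demo.df)  # 'share' should be numeric, not character
```

If the decimal column comes back as character or factor, the dec argument most likely does not match the file.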
10
2. Reading External Files
• To see the start of the data set, type:
head(PSUs.df)
Prints the first 6 lines of PSUs.df
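A few related base R functions are useful for a first look at a data set. A minimal sketch on a made-up data frame standing in for PSUs.df (the column names are invented for this example):

```r
# Made-up stand-in for PSUs.df: 10 PSUs with a population count.
demo.df <- data.frame(psu = 1:10, population = seq(100, 1000, by = 100))

head(demo.df)         # first 6 rows (the default)
head(demo.df, n = 3)  # first 3 rows
tail(demo.df, n = 2)  # last 2 rows
dim(demo.df)          # number of rows and columns
str(demo.df)          # column names and types
```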
24
Further Sampling Steps
• Read in the strata dataset
• Calculate the PSU sample sizes
• Take a sample of PSUs – stage 1
• Merge the selected PSUs with the main sampling frame containing individual units.
• Sample individual units – stage 2
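The flow of these steps can be sketched in base R. This is an illustration only and does not use the EHESsampling functions: it draws simple random samples at both stages, all data are made up, and the sample sizes `n_psu` and `n_ssu` are assumed rather than calculated.

```r
set.seed(1)  # make the random draws reproducible

# Made-up PSU dataset: 8 PSUs in 2 strata.
psus <- data.frame(psu = 1:8,
                   stratum = rep(c("A", "B"), each = 4))

# Made-up sampling frame: 50 individuals in each PSU.
frame <- data.frame(id = 1:400,
                    psu = rep(1:8, each = 50))

n_psu <- 2   # PSUs to select per stratum (assumed)
n_ssu <- 10  # individuals to select per selected PSU (assumed)

# Stage 1: simple random sample of PSUs within each stratum.
selected_psus <- do.call(rbind, lapply(split(psus, psus$stratum), function(s) {
  s[sample(nrow(s), n_psu), ]
}))

# Merge the selected PSUs with the frame of individuals.
stage2_frame <- merge(selected_psus, frame, by = "psu")

# Stage 2: simple random sample of individuals within each selected PSU.
selected <- do.call(rbind, lapply(split(stage2_frame, stage2_frame$psu), function(p) {
  p[sample(nrow(p), n_ssu), ]
}))

table(selected$stratum, selected$psu)  # individuals per stratum and PSU
```

Simple random sampling is used here only to keep the sketch short; a real design may select PSUs with unequal probabilities, for example proportional to size.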
25
Selected Individuals
26
Help!
• EHESsampling manual available at: www.ehes.info
• EHES participant manual – Part 1: Chapter 05
• R websites:
  • R official site: www.r-project.org
  • Quick R: www.statmethods.net
• Us:
  • [email protected]
  • [email protected]