Copernicus Institute of Sustainable Development
(I Can’t Get No) Saturation: A Simulation and Guidelines for Minimum Sample Sizes in Qualitative Research
Frank van Rijnsoever
A random conversation…
• Question: How many interviews do I need to do?
• Answer: It depends…
• Question: Depends on what?
• Answer: It depends on who you ask.
• Answer: But since you asked me, I will give you my version of events.
Introduction (1)
• Inductive qualitative research
  - Is becoming more popular (Bluhm, Harman, Lee, & Mitchell, 2011)
  - Innovation policy, transition studies
  - Useful for exploring new concepts, theories, and processes of change in an in-depth manner, among other things…
• Increased attention to methodology (Suddaby, 2006)
• Sample size is a debated topic.
  - Laborious process, so don’t oversample too much.
  - Typical recommended sizes: 15–25.
  - Few rules (Patton, 1990), apart from the ‘experience’ and ‘judgement of the researcher’ (Sandelowski, 1995).
Introduction (2)
• Aim: “this paper explores the sample size that is required to reach theoretical saturation in various scenarios and to use these insights to formulate guidelines about purposive sampling.”
• Simulation: insights into the mechanisms behind purposive sampling
• Contributions
  - Theoretical basis for sample size
  - Guidelines for practitioners
My way of thinking
Theoretical concepts
• A population is the “universe of units of analysis” from which a sample can be drawn.
  - Does not have to be the same as the unit from which information is gathered.
  - Population size = N
• Codes emerge from information sources that are part of a population.
  - Informants for interviews, existing documents, etc.
  - Denoted as i
• At each sampling step an information source is sampled from the population.
  - Part of an iterative process that includes data collection, analysis, and interpretation
  - Number of sampling steps = n
Theoretical concepts
• Codes represent information.
  - “Tags” or “labels” on unique pieces of information (Bryman, 2013), e.g. concepts, properties, and relationships between codes.
  - Each code represents only one piece of information; there are no synonyms.
  - Denoted as c
• Theoretical saturation is reached when each code in the population is observed at least once.
  - The number of sampling steps needed to reach it is denoted as $n_s$.
  - Two factors influence $n_s$: the number of codes and the mean probability of observing codes.
• Purposive sampling implies an informed estimation of these factors, taking into account:
  - The complexity of the research question
  - The likelihood of an information source actually containing the code
  - The willingness and ability of the source to let the code be uncovered
  - The ability of the researcher to observe the code
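To make the saturation definition concrete, here is a minimal Python sketch, assuming each sampled information source is stored as a tuple of 0s and 1s over the k codes. The function name `saturation_step` and the example interviews are hypothetical; the function walks through the sources in the order they were sampled and returns $n_s$, the first step at which every code has been seen at least once.

```python
def saturation_step(sampled_sources, k):
    """Return n_s, the 1-based sampling step at which all k codes have been
    observed at least once, or None if the sampled sources never cover them all."""
    seen = set()
    for n, source in enumerate(sampled_sources, start=1):
        # Add the indices of the codes present (value 1) in this source
        seen.update(c for c, value in enumerate(source) if value)
        if len(seen) == k:
            return n
    return None


# Hypothetical interviews coded against k = 7 codes
interviews = [
    (0, 1, 1, 1, 0, 0, 1),
    (1, 0, 0, 0, 0, 0, 0),
    (0, 0, 0, 1, 1, 0, 0),
    (0, 0, 0, 0, 0, 1, 0),
]
print(saturation_step(interviews, k=7))  # 4
```

The fourth interview is the first to contribute the last unseen code, so saturation is reached at step 4; with only the first three interviews it is not reached at all.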
Theoretical concepts
• In this paper I test the number of sampling steps required for saturation under three typical theoretical ‘sampling scenarios’:
  - Random chance: random sampling.
  - Minimal information: each sampling step yields an information source with at least one new code.
  - Maximal information: each sampling step yields an information source with the largest possible number of new codes.
• I simulate hypothetical populations in which I vary the number of codes (k) and the mean probability of observing codes ($\bar{p}$).
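The three scenarios can be read as rules for choosing the next information source. The sketch below is one possible Python implementation, not the simulation code behind the paper: it assumes sampling without replacement, draws randomly among eligible sources, and breaks ties randomly under maximal information. `run_scenario`, `count_new` and `SCENARIOS` are hypothetical names.

```python
import random

SCENARIOS = ("random", "minimal", "maximal")


def count_new(source, observed):
    """Number of codes present in `source` that have not been observed yet."""
    return sum(1 for c, value in enumerate(source) if value and not observed[c])


def run_scenario(population, scenario, rng):
    """Sample sources without replacement under one scenario and return n_s,
    or None if the population cannot produce saturation."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    observed = [False] * len(population[0])
    remaining = list(range(len(population)))
    n = 0
    while not all(observed):
        if not remaining:
            return None  # every source sampled, some code never seen
        if scenario == "random":
            candidates = remaining  # random chance: any unsampled source
        else:
            gains = {i: count_new(population[i], observed) for i in remaining}
            best = max(gains.values())
            if best == 0:
                return None  # no unsampled source holds a new code
            if scenario == "minimal":
                # minimal information: any source with at least one new code
                candidates = [i for i in remaining if gains[i] >= 1]
            else:
                # maximal information: a source with the most new codes
                candidates = [i for i in remaining if gains[i] == best]
        chosen = rng.choice(candidates)
        remaining.remove(chosen)
        n += 1
        for c, value in enumerate(population[chosen]):
            if value:
                observed[c] = True
    return n


population = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
rng = random.Random(42)
for scenario in SCENARIOS:
    print(scenario, run_scenario(population, scenario, rng))
```

Because every step under minimal information adds at least one new code, that scenario can never need more than k steps; random chance has no such cap.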
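Some mathematical notation
• Each information source i is represented by a vector of 0s and 1s of length k that records which codes it contains, for example: (0,1,1,1,0,0,1).
• Whether code c is present in a source is a random Bernoulli trial Φ with probability $p_c$. The probabilities of all codes together form a vector $\mathbf{p} = (p_1, \dots, p_k)$ of length k.
• The probability that theoretical saturation is reached ($P_s$) based on random chance is given by
  $P_s = \prod_{c=1}^{k} \left(1 - (1 - p_c)^{n}\right)$,
  where n is the number of sampling steps.
• If all values of $\mathbf{p}$ are the same ($p_c = \bar{p}$ for every c), this becomes:
  $P_s = \left(1 - (1 - \bar{p})^{n}\right)^{k}$
• When $n_s$ is the number of sampling steps needed to reach theoretical saturation given $P_s$, k and $\bar{p}$, this can be rewritten to:
  $n_s = \dfrac{\ln\left(1 - P_s^{1/k}\right)}{\ln(1 - \bar{p})}$
• If we require each code to be observed a minimum number of times (v), the formulas become:
  $P_s = \prod_{c=1}^{k} \left(1 - \sum_{j=0}^{v-1} \binom{n}{j} p_c^{j} (1 - p_c)^{n-j}\right)$
  and
  $P_s = \left(1 - \sum_{j=0}^{v-1} \binom{n}{j} \bar{p}^{j} (1 - \bar{p})^{n-j}\right)^{k}$
• Only under very specific assumptions can we calculate theoretical saturation analytically.
  - Useful for calibrating my simulation!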
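These formulas can be checked numerically. This Python sketch, with hypothetical function names, evaluates $P_s$ for arbitrary code probabilities and a repetition requirement v (using the binomial probability of seeing a code fewer than v times), and the closed-form $n_s$ for equal probabilities, rounded up to a whole number of sampling steps.

```python
import math


def p_saturation(p, n, v=1):
    """P_s: probability that n random sampling steps observe every code at least
    v times, where p[c] is the probability that a source contains code c."""
    total = 1.0
    for pc in p:
        # Binomial probability that code c is observed fewer than v times in n steps
        below_v = sum(math.comb(n, j) * pc**j * (1 - pc) ** (n - j) for j in range(v))
        total *= 1 - below_v
    return total


def n_saturation(p_bar, k, p_s=0.95):
    """Closed-form n_s for k codes with equal probability p_bar (v = 1),
    rounded up to a whole number of sampling steps."""
    return math.ceil(math.log(1 - p_s ** (1 / k)) / math.log(1 - p_bar))


n = n_saturation(0.1, 10)
print(n, p_saturation([0.1] * 10, n), p_saturation([0.1] * 10, n - 1))
```

Here `n_saturation(0.1, 10)` returns the smallest n at which `p_saturation([0.1] * 10, n)` reaches 0.95, one step earlier it falls just short, which is the kind of calibration check the last bullet refers to.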
Methods
• The distribution of the probabilities in vector $\mathbf{p}$ can be represented by the beta distribution.
• Input for simulations
  - Simulate hypothetical populations: N by k matrices with values 0 and 1
  - Systematically vary the beta shape parameters α and β, and k:
    - α and β are 1, 2, 3, … 10
    - k = 1, 11, 21, 31, … 101
    - N = 5000
    - 1100 hypothetical populations
• For all three scenarios
  - Set $P_s$ to 0.95 (the probability of reaching theoretical saturation at $n_s$)
  - 500 trials per population
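A scaled-down version of this design can be sketched in pure Python. It assumes that each code’s probability $p_c$ is drawn from Beta(α, β) and that $n_s$ is estimated as the smallest step count reached in at least a share $P_s$ of the trials. Only the random-chance scenario is simulated, and the result is compared with the analytic product formula using the population’s empirical code frequencies. All function names and the small sizes in the example are assumptions for illustration.

```python
import math
import random


def make_population(n_sources, k, alpha, beta, rng):
    """Hypothetical N-by-k 0/1 matrix: each code's probability p_c is drawn from
    Beta(alpha, beta), and each source contains code c with probability p_c."""
    p = [rng.betavariate(alpha, beta) for _ in range(k)]
    return [[1 if rng.random() < pc else 0 for pc in p] for _ in range(n_sources)]


def steps_random(population, rng):
    """Random-chance scenario: sampling steps (without replacement) until every
    code has been observed, or None if some code never occurs."""
    k = len(population[0])
    observed = [False] * k
    order = list(range(len(population)))
    rng.shuffle(order)
    for n, i in enumerate(order, start=1):
        for c in range(k):
            if population[i][c]:
                observed[c] = True
        if all(observed):
            return n
    return None


def estimate_ns(population, trials, p_s, rng):
    """Smallest step count that at least a share p_s of the trials reached."""
    steps = [steps_random(population, rng) for _ in range(trials)]
    if None in steps:
        raise ValueError("some code never occurs in the population")
    steps.sort()
    return steps[math.ceil(p_s * trials) - 1]


def analytic_ns(population, p_s):
    """Smallest n with prod_c (1 - (1 - p_c)^n) >= p_s, using empirical p_c."""
    n_sources, k = len(population), len(population[0])
    p = [sum(row[c] for row in population) / n_sources for c in range(k)]
    if min(p) == 0:
        return None  # a code that never occurs can never be observed
    n = 1
    while math.prod(1 - (1 - pc) ** n for pc in p) < p_s:
        n += 1
    return n


if __name__ == "__main__":
    rng = random.Random(2024)
    # Scaled-down, hypothetical settings (the slides use N = 5000, k up to 101, 500 trials)
    population = make_population(1000, 11, alpha=2, beta=8, rng=rng)
    print("simulated n_s:", estimate_ns(population, 500, 0.95, rng))
    print("analytic n_s: ", analytic_ns(population, 0.95))
```

The two estimates should be close for a large population, since drawing a few sources without replacement from thousands behaves almost like the independent Bernoulli trials the formula assumes.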
Scenarios
Results: sample size at $n_s$
Main findings
• $\bar{p}$ is more important than k for reaching theoretical saturation.
• Purposive sampling typically requires fewer than 50 sampling steps. A common value is around 20, which falls within the range recommended in the literature (15–25).
• There are few differences between minimal and maximal information.
  - Minimal information gives more repetitive codes: a trade-off between efficiency and repetition.
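The simplest analytic case (random chance, all codes sharing probability $\bar{p}$) already hints at why $\bar{p}$ matters more than k. This Python sketch covers only that special case, not the simulation results above: the closed-form $n_s$ grows roughly with the logarithm of k, but roughly in proportion to $1/\bar{p}$. The function name is hypothetical.

```python
import math


def n_saturation(p_bar, k, p_s=0.95):
    """Closed-form random-chance n_s for k codes that all have probability p_bar."""
    return math.ceil(math.log(1 - p_s ** (1 / k)) / math.log(1 - p_bar))


# Rows: number of codes k; columns: mean probability of observing a code
for k in (11, 51, 101):
    row = [n_saturation(p_bar, k) for p_bar in (0.1, 0.2, 0.4)]
    print(f"k = {k:>3}: n_s for p_bar = 0.1, 0.2, 0.4 -> {row}")
```

Going from k = 11 to k = 101 at $\bar{p}$ = 0.1 raises $n_s$ by well under half, whereas halving $\bar{p}$ from 0.2 to 0.1 roughly doubles it.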
Guidelines for purposive sampling
1. Identify a population of information sources, and subpopulations.
2. Estimate the number of codes per subpopulation.
3. Estimate the mean probability of a code being observed.
4. Set a degree of certainty for reaching theoretical saturation.
5. Assess which scenario is most applicable to each subpopulation.
6. Choose a fitting sampling strategy.
7. Account for these steps when reporting the research.
In general: working under the assumptions of minimal information seems reasonable.
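Guideline steps 1 to 5 can be recorded per subpopulation. The Python sketch below uses hypothetical subpopulations, estimates, and names (`Subpopulation`, `random_chance_ns`, `minimal_information_cap`); it reports the closed-form random-chance $n_s$, assuming equal code probabilities, next to the cap of k steps that follows from the definition of minimal information, where every step adds at least one new code.

```python
import math
from dataclasses import dataclass


@dataclass
class Subpopulation:
    """A researcher's estimates for one subpopulation (hypothetical structure)."""
    name: str
    k: int               # estimated number of codes
    p_bar: float         # estimated mean probability of observing a code
    p_s: float = 0.95    # desired certainty of reaching theoretical saturation


def random_chance_ns(sub):
    """Closed-form n_s under random chance, assuming every code has probability p_bar."""
    return math.ceil(math.log(1 - sub.p_s ** (1 / sub.k)) / math.log(1 - sub.p_bar))


def minimal_information_cap(sub):
    """Every step adds at least one new code, so saturation takes at most k steps."""
    return sub.k


# Hypothetical subpopulations and estimates, for illustration only
plan = [
    Subpopulation("firms", k=20, p_bar=0.2),
    Subpopulation("policy makers", k=10, p_bar=0.3),
]
for sub in plan:
    print(f"{sub.name}: random chance n_s = {random_chance_ns(sub)}, "
          f"minimal information at most {minimal_information_cap(sub)} steps")
```

With these made-up numbers, minimal information caps the “firms” subpopulation at 20 steps, while random chance would need more.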
Limitations
• Not empirical
  - Not possible. Not required.
• Mechanistic approach
  - But in line with the assumptions of qualitative research.
  - Everyone is free to apply the results as he or she wishes.
  - A mixture of scenarios is possible.
• Not all possibilities are simulated
  - But there is enough variation to capture plausible conditions.