
Matching PLASC and ALSPAC

PLASC/NPD User Group Workshop, 13th September 2006

Andy Boyd (a.w.boyd@bristol.ac.uk)

David Herrick (david.herrick@bristol.ac.uk)

What is ALSPAC?

“Avon Longitudinal Study of Parents and Children”

Cohort study of children and their parents, based in south-west England

Designed to determine ways in which the individual’s genotype combines with environmental pressures to influence health and development

Study design

Eligibility criteria: Mothers had to be resident in Avon and have an expected date of delivery between 1st April 1991 and 31st December 1992

Avon was broadly representative of the UK as a whole and has a relatively stable population

Enrolled sample of 14,541 pregnancies resulting in 14,062 live born children

Data

• Self Completion Questionnaires
• Hands on Measurements
• Biological Samples
• Health Records
• Education Records
• Direct School Contact

ALSPAC at School

Educational Data - Primary

Contact with ~350 primary schools in the four local LEAs:
• Bristol
• South Gloucestershire
• North Somerset
• Bath and North East Somerset

Private & special schools included

Parental contact for out-of-area cases

Educational Data - Primary

Questionnaires in Year 3 & Year 6:
• School (Head teacher)
• Class (Class teacher)
• Child (Class teacher)

Year 4 test: Maths

Year 6 tests: Maths, Spelling, Science

Educational Data - Secondary

Questionnaire for maths teachers in 2002/3 (Year 7) & 2004/5 (Years 7, 8 & 9) and associated class lists

Year 6 maths test repeated in Year 8

Moving away from direct school contact

Educational Data - SATS

Entry Assessment & KS1 data on eligible children at local schools acquired directly from the LEAs

Linkage to NPD:
• Increased coverage
• Easier linking (UPN)
• PLASC as well

Study Approval & Cohort Matching

• Ethics & study approval
• The Fischer Trust
• Validating the cohort match
• Anonymizing the data set
• Issues encountered

Ethics & Study Approval

• ALSPAC Ethics & Law committee
• LREC (NHS research ethics committee)
• ‘Eligible’ vs. ‘Enrolled’ cohort
• Final research file to be anonymous
• DfES commissioned a third party, The Fischer Trust, to conduct the cohort/data match

The Fischer Trust

An intermediary between ALSPAC and the DfES

FT received both ALSPAC and NPD datasets and conducted the cohort match.

FT created its own ID (however, we were also provided with the UPN)

Cohort match variables

Details for 20,551 children provided:
• Child Surname
• Child Forename
• Child Date of Birth
• Home Postcode
• School Indicator (name & address) from ALSPAC schools data collection
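
The slides do not describe how FT actually performed the match, so the following is only a minimal sketch of one way a tiered exact match on these variables could assign a 'match level'. The tier definitions, field names and example records are all hypothetical.

```python
from typing import Optional

def normalise(value: str) -> str:
    """Drop whitespace and upper-case, so 'bs8 1tq' and 'BS8 1TQ' compare equal."""
    return "".join(value.split()).upper()

# Hypothetical tiers (not FT's): a lower level means a stricter match.
TIERS = [
    (1, ("surname", "forename", "dob", "postcode")),
    (2, ("surname", "forename", "dob")),
    (3, ("forename", "dob", "postcode")),  # tolerates a changed surname
]

def match_level(cohort: dict, npd: dict) -> Optional[int]:
    """Return the strictest tier on which the two records agree, else None."""
    for level, fields in TIERS:
        if all(normalise(cohort[f]) == normalise(npd[f]) for f in fields):
            return level
    return None

# Invented example records, for illustration only.
cohort_rec = {"surname": "Smith", "forename": "Alex",
              "dob": "1991-06-15", "postcode": "bs8 1tq"}
npd_rec = {"surname": "Jones", "forename": "ALEX",
           "dob": "1991-06-15", "postcode": "BS8 1TQ"}
level = match_level(cohort_rec, npd_rec)
```

A tier that leaves out the surname is one way to cope with surnames that change over time; familiar forenames and out-of-date postcodes would need similar relaxed tiers or manual review.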

Validating the cohort match

To meet our methodological and study requirements, we wanted to reverse-check the match

FT matched 86% of the cases provided (17,671 cases)

Very few errors found (<0.5%)

FT matches by variable

FT 'match level' variable

Problems with the match variables

• Child Surname (change over time)
• Child Forename (familiar names)
• Child Date of Birth
• Home Postcode (out of date and lost cases)
• School Indicator (name & address) from ALSPAC schools data collection (depended on school participation and out of date information)

Anonymizing the data set

UPN transferred to new internal ID and then to new collaborator ID

Personal variables dropped (DoB, names, postcode, age at census)

Identifying variables dropped (care authority)

Variables recoded (ethnicity, SEN)

LEA & Estab IDs recoded into our own unique ALSPSCHL_ID
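
As a rough illustration of the two-step ID transfer and the dropping of personal variables, a sketch might look like the following. The field names, the ID format and the use of random hex tokens are assumptions, not the procedure actually used.

```python
import secrets

# Hypothetical field names for the variables the slide says were dropped.
DROPPED_FIELDS = {"upn", "surname", "forename", "dob", "postcode",
                  "age_at_census", "care_authority"}

def new_id(used: set) -> str:
    """Draw a random ID that has not been issued before."""
    while True:
        candidate = secrets.token_hex(4)
        if candidate not in used:
            used.add(candidate)
            return candidate

def anonymise(records, upn_to_internal, internal_to_collab):
    """Map UPN -> internal ID -> collaborator ID and drop identifying fields.

    The two lookup dicts are updated in place; they would be stored
    securely and never released with the research file.
    """
    used_internal = set(upn_to_internal.values())
    used_collab = set(internal_to_collab.values())
    released = []
    for rec in records:
        upn = rec["upn"]
        if upn not in upn_to_internal:
            upn_to_internal[upn] = new_id(used_internal)
        internal = upn_to_internal[upn]
        if internal not in internal_to_collab:
            internal_to_collab[internal] = new_id(used_collab)
        clean = {k: v for k, v in rec.items() if k not in DROPPED_FIELDS}
        clean["collab_id"] = internal_to_collab[internal]
        released.append(clean)
    return released
```

Keeping the lookups as two separate tables means a collaborator ID cannot be traced back to a UPN without access to both.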

Issues encountered

• Cases not covered by NPD
• REE – not including old schools
• Primary to junior succession
• Children who resit years or are in a non-natural school year
• Historical records of school movement

Issues - UPN

We discovered that the U in UPN isn’t that unique!

215 ALSPAC cases have multiple UPNs (with no clear pattern as to why)

PLASC 2004 has two ALSPAC children with the same UPN
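
Both kinds of UPN problem (one child with several UPNs, one UPN shared by several children) can be found with a simple cross-tabulation. This is a minimal sketch; the `(case_id, upn)` pair layout is an assumption about how the linked data might be held.

```python
from collections import defaultdict

def upn_conflicts(pairs):
    """Given (case_id, upn) pairs, return two dicts:
    cases that have more than one UPN, and UPNs shared by more than one case."""
    upns_by_case = defaultdict(set)
    cases_by_upn = defaultdict(set)
    for case_id, upn in pairs:
        upns_by_case[case_id].add(upn)
        cases_by_upn[upn].add(case_id)
    multi_upn_cases = {c: u for c, u in upns_by_case.items() if len(u) > 1}
    shared_upns = {u: c for u, c in cases_by_upn.items() if len(c) > 1}
    return multi_upn_cases, shared_upns

# Invented IDs, for illustration only.
pairs = [("case1", "U001"), ("case1", "U002"),
         ("case2", "U003"), ("case3", "U003"),
         ("case4", "U004")]
multi, shared = upn_conflicts(pairs)
```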

Sample

At least 1 PLASC return identified for 11,997 (85%) of the 14,062 enrolled live births:
• 2002 - 11,850 (84%)
• 2003 - 11,731 (83%)
• 2004 - 11,473 (82%)

Balance:
• Private schools
• Home educated
• Outside England
• Not identified

Supplied Documentation

ALSPAC & PLASC

Editing (1)

Convert string variables to numeric, label and sort missing values and write documentation.

Calculate age at census.

From date of entry derive age on starting at current school and length of time at current school.

Derive expected NCYG (National Curriculum Year Group).
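
The derivations above might be sketched as follows. This assumes the usual English convention that the school year runs from 1 September and that a child aged 4 on the preceding 31 August is in Reception (coded here as 0); the census date in the example is a placeholder and the function names are hypothetical.

```python
from datetime import date

def age_in_years(dob: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))

def expected_ncyg(dob: date, census: date) -> int:
    """Expected National Curriculum Year Group on the census date.

    Assumption: year group = age on the 31 August before the school year
    started, minus 4 (so Reception is 0, Year 1 is 1, and so on).
    """
    start_year = census.year if census.month >= 9 else census.year - 1
    return age_in_years(dob, date(start_year, 8, 31)) - 4

def years_at_school(entry: date, census: date) -> int:
    """Completed years between date of entry and the census."""
    return age_in_years(entry, census)

census = date(2003, 1, 16)   # placeholder census date
dob = date(1991, 6, 15)      # invented date of birth
entry = date(2002, 9, 4)     # invented date of entry

age_at_census = age_in_years(dob, census)
age_on_starting = age_in_years(dob, entry)
ncyg = expected_ncyg(dob, census)
```

Comparing `expected_ncyg` with the recorded NCYG is what flags children who are outside their natural school year.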

Editing (2)

Ethnicity: 39 cases had new ethnicity codes in 2002 – these were mapped back to old codes and an equivalent to main category derived. Also derive white/non-white indicators.

Care: In 2003, 17 of the 34 cases marked as currently in care were marked as N for ever in care. This did not occur in 2004.

Unanswered questions

6.6% of children were not in the expected NCYG in 2002 compared with 0.7% in 2003 and 2004.

Large increase in use of code T for ethnicity source between 2003 & 2004, even if restricted to Year 7 only.

Ethnicity Source (1)

Code  Source             2003 N  2003 %  2004 N  2004 %
C     Child                 625     3.8     968     6.1
P     Parent              12418    76.1    9998    62.5
S     Current school       3178    19.5    2530    15.8
T     Previous school        17     0.1    2354    14.7
O     Other                  89     0.5     150     0.9
      Total               16327           16000

Ethnicity Source (2): Year 7 only

Code  Source             2003 N  2003 %  2004 N  2004 %
C     Child                 534    13.8     472     5.2
P     Parent               2517    65.1    4956    54.6
S     Current school        796    20.6    1229    13.5
T     Previous school         5     0.1    2326    25.6
O     Other                  17     0.4      89     1.0
      Total                3869            9072

Age on starting current school, PLASC 2002

[Bar chart: number of children (axis 0 to 8,000) by age on starting current school (0 to 10)]

Illegal Values (1)

Numeric codes in Boarder field (should be only ‘B’ or ‘N’) – 2 cases in 2002, 7 in 2003 and 13 in 2004.

Code ‘1’ recorded for NCYG in 2003 for a child in secondary school who was expected to be in Year 7 and who was recorded as in Year 6 in 2002 and Year 8 in 2004.

Illegal Values (2)

X in NCYG in 2004 – 2 cases.

A small number of cases are missing important fields like date of entry, NCYG.

3 cases had the same code for primary and secondary SEN types.
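
Checks of this kind are easy to automate. The sketch below flags the problems listed on these two slides; the field names are hypothetical and the SEN codes in the example are illustrative only.

```python
def find_problems(rec: dict) -> list:
    """Return a list of data-quality problems found in one census record."""
    problems = []
    # Boarder should only ever be 'B' or 'N'.
    if rec.get("boarder") not in {"B", "N"}:
        problems.append("boarder: illegal code")
    # Important fields that should never be empty.
    for field in ("date_of_entry", "ncyg"):
        if not rec.get(field):
            problems.append(f"{field}: missing")
    # Primary and secondary SEN types should differ when both are present.
    primary, secondary = rec.get("sen_primary"), rec.get("sen_secondary")
    if primary and primary == secondary:
        problems.append("sen: primary and secondary types identical")
    return problems

# Invented records, for illustration only.
good = {"boarder": "N", "date_of_entry": "2002-09-04", "ncyg": "7",
        "sen_primary": "ASD", "sen_secondary": "SLCN"}
bad = {"boarder": "3", "date_of_entry": "", "ncyg": "7",
       "sen_primary": "ASD", "sen_secondary": "ASD"}
good_problems = find_problems(good)
bad_problems = find_problems(bad)
```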

Uses

Identifying Developmental Impairments:

• Investigating the use of early life parental questionnaires to predict later problems.

• SEN types used to identify autism, speech/language problems and possible learning difficulties.

• Twin approach with medical database searches.

Autism project.

Ethnicity.

Wish List

Detailed documentation describing how different fields relate (especially for SATs).

Numeric fields supplied as numeric rather than string.
