the rapid registrations cancer dataset
TRANSCRIPT
The Rapid Registrations Cancer Dataset
Sean McPhail
Principal Analyst, National Disease Registration Service
Intended Audience & Learning Goals
Technical brief for researchers, ‘cancer data people’, information leads/
managers interested in the capabilities of the Rapid Registration (RR) Data.
Not presenting finalised 2020 time series data.
• What is the Rapid Registration Data?
• Survey of data features and quality issues to be aware of
• How do I use/access the data?
• Q&A
2 The Rapid Registrations Cancer Dataset
Please also refer to the data quality document
Learning Goals
• What is the Rapid Registration Data
• Data features and quality issues to be aware of
• 2020 data sneak peak
• How do I access/use the data?
• Q&A
What is the Rapid Registration Data
• Background – ‘normal’ cancer registration
• Rationale for creating the RR dataset
• Build process for the RR dataset
• RR Structure: Tables & field lists
4 The Rapid Registrations Cancer Dataset
Encore
2018 ONS FIGURES517,604
quality >98%
timeliness~18 months
5 The Rapid Registrations Cancer Dataset
Rapid Registrations Rationale
6 The Rapid Registrations Cancer Dataset
Build process for the RR dataset
CWT
HES
COSD RTDS
SACTRapid
Registrations
Arriving at NDRS within 4 months:
• Cancer Outcomes & Services Dataset
• Hospital Episode Statistics
• Cancer Waiting Times
• Systemic Anti-Cancer Therapy
• Radiotherapy DataSet
7 The Rapid Registrations Cancer Dataset
Build process for the RR dataset
1. Summarise everything useful into an unified events ‘pathway’ table.
2. Refine, combine and filter those events into rapid “proxy” tumour diagnoses
– the best performance so far from case-finding using COSD data.
3. Add auxiliary information to bare tumour & patient detail.
4. (Note - patients in pathway table >> patients in tumour table.)
8 The Rapid Registrations Cancer Dataset
Rapid Registration tumour table inclusion
1. Created for 2018-01-01 to most recently available data
2. Covers all malignant cancers (excl. NMSC) and selected non-malignant
cancers: Breast, Bladder & Brain (ICD-10 C00-C97 excl. C44, plus D05,
D09, D32, D33, D35, D41, D42-D44)
3. Persons appearing in English COSD data – includes cross-border cases.
9 The Rapid Registrations Cancer Dataset
Rapid Registration structureRAPID_TUMOURTUMOUR_AVPID
INDIVIDUALID
PATIENTID
NHSNUMBER
DIAGNOSIS DATE
BASIS OF DIAGNOSIS
TUMOUR SITE
TUMOUR MORPHOLOGY
ROUTE TO DIAGNOSIS (HES)
CHARLSON COMORBIDITY
STAGE
BIRTH DATE
SEX
POSTCODE
SURNAME
FORENAME
ETHNICITY
RAPID_PATHWAYAVPID
INDIVIDUALID
PATIENTID
NHSNUMBER
SOURCE_TABLE
SOURCE ID
EVENT_TYPE
EVENT_PROPERTY_1
EVENT_PROPERTY_2
EVENT_PROPERTY_3
EVENT_DATE
EVENT_END
TRUST_CODE
ID fields
Patient fields
Tumour fields
“What, where & when”
34 event types summarising 5 data types
10 The Rapid Registrations Cancer Dataset
Aside: Encore, CAS & monthly snapshots
So… a monthly system and “CAS2010” is the 2020 October snapshot (CASyymm)
Cancer Analysis
System (CAS)
inc. COSD,
SACT, & RTDS
Encore
monthly
External (NHSD /
NHSE)
monthly(-ish)
CAS-reference
(CASREF) inc.
HES and CWTNHSnumberlinkage+Rapid Reg.
Learning Goals
• What is the Rapid Registration Data
• Data features and quality issues to be aware of
• 2020 data sneak peak
• How do I access/use the data?
• Q&A
Data features and quality issues
• Comparing to normal registration data
• False positives and false negatives
• Stage at diagnosis
• “snapshot creep”
• Summary
12 The Rapid Registrations Cancer Dataset
Comparing to normal cancer registration data, 2018 cases
CR
Lung
Br Pr
Mel. Bladder
C00 C97D05 D45
Cancer type by ICD-10 code
13 The Rapid Registrations Cancer Dataset
Comparing to normal cancer registration data, 2018 cases
14 The Rapid Registrations Cancer Dataset
Comparing to normal cancer registration data, 2018 cases
15 The Rapid Registrations Cancer Dataset
Further defining quality – False Positives, False Negatives
Compare rapid registration data to normal registrations, Apr 2018 to Sept 2018.
Define a matched/successful proxy registration as:
1. Matching in a broad cancer group (e.g., C18-C21, or C50 & D05)
2. Matching diagnosis date to within 3 months
Errors:
False negative (FNE) – real registration but no proxy registration (“missing data”)
False positive (FPE) – no real registration but proxy registration (“bad data”)
Note assumption – COSD data behaviour in 2020 ~ that in 2018
16 The Rapid Registrations Cancer Dataset
False Positives (5%), False Negatives (18%)
17 The Rapid Registrations Cancer Dataset
False Positives, False Negatives
18 The Rapid Registrations Cancer Dataset
What are the False negatives?
Cancers not represented in
COSD…
Age dependence is explained
by more clinical pathways in
older people (plus DCOs).
19 The Rapid Registrations Cancer Dataset
What are the False Positives?
Not so clear cut, but approximately:
30% ‘Tricky’ tumour sites – (Unknown/Sarcoma)
20% D-code / C-code mismatch in COSD and normal registration
12.5% COSD tumours proved mistaken and deleted
5% Melanoma / NMSC confusion
32.5% Tumours diagnosed in previous years reoccuring?
Proportion of FNEs & FPEs treated
20 Presentation title - edit in Header and Footer
FNE FPE
SACT 13% 22%
RTDS 9% 14%
Surgery 26% 14%
Any 39% 40%
21 The Rapid Registrations Cancer Dataset
Relaxing the matching criteria
22 The Rapid Registrations Cancer Dataset
Stage at diagnosis (Big 4 cancers) – TNM stage
Proxy Stage
Normal stage 1 2 3 4 Unk
1 89% 9% 2% 1% 30%
2 6% 83% 5% 1% 16%
3 3% 5% 85% 3% 12%
4 1% 2% 6% 95% 16%
Unknown 2% 2% 2% 1% 26%
100% 100% 100% 100% 100%
0%
5%
10%
15%
20%
25%
30%
35%
1 2 3 4 Unknown
Pro
po
rtio
n o
f tu
mo
urs
Big 4 cancers, stage completeness, TNM Group
Proxy Normal
“~80% of (Big 4) cancers are staged, of those 80-90% are correct.”
23 Presentation title - edit in Header and Footer
0%
5%
10%
15%
20%
25%
30%
35%
40%
45%
1 2 3 4 Unknown
Pro
port
ion o
f tu
mours
Breast cancer
Proxy Normal
0%
5%
10%
15%
20%
25%
30%
35%
1 2 3 4 Unknown
Pro
port
ion o
f tu
mours
Colorectal cancer
Proxy Normal
0%
5%
10%
15%
20%
25%
30%
35%
40%
45%
50%
1 2 3 4 Unknown
Pro
port
ion o
f tu
mours
Lung cancer
Proxy Normal
0%
5%
10%
15%
20%
25%
30%
35%
40%
1 2 3 4 UnknownP
roport
ion o
f tu
mours
Prostate cancer
Proxy Normal
24 The Rapid Registrations Cancer Dataset
Stage at diagnosis – Early/Late stage
Proxy Stage
Normal stage Early Late UnkEarly 93% 5% 46%
Late 5% 94% 28%
Unknown 2% 2% 26%
100% 100% 100%0%
10%
20%
30%
40%
50%
60%
Early Late Unknown
Pro
po
rtio
n o
f tu
mo
urs
Big 4 cancers, stage completeness, Early vs Late
Proxy Normal
25 The Rapid Registrations Cancer Dataset
‘snapshot creep’
26 The Rapid Registrations Cancer Dataset
‘snapshot creep’
27 The Rapid Registrations Cancer Dataset
Data Summary
• Overall error rate:
• False Positive: 5%
• False Negative: 18%
• Older persons and more clinical pathways are under-represented
• Strong variation in quality with cancer type
• “creep”: completeness in the most recent months increases over >1 month
Please also refer to the data quality document
Learning Goals
• What is the Rapid Registration Data
• Data features and quality issues to be aware of
• 2020 data sneak peak
• How do I access/use the data?
• Q&A
How do I access / use the data?
• Which datasets?
• Capabilities of the RR data
• Accessing the data
• Ongoing development
29 The Rapid Registrations Cancer Dataset
Which datasets?
Cancer Analysis
System (CAS)
inc. COSD,
SACT, & RTDS
CAS-reference
(CASREF) inc.
HES and CWT
RAPID_TUMOURTUMOUR_AVPID
INDIVIDUALID
PATIENTID
NHSNUMBER
RAPID_PATHWAYAVPID
INDIVIDUALID
PATIENTID
NHSNUMBER
SOURCE_TABLE
SOURCE ID
30 The Rapid Registrations Cancer Dataset
Capabilities of the Rapid Reg. dataset
The Good
Reasonable proxy for cancer registrations (in 2018 overlap).
Typically 4-5 month delay (varying with data type).
Strong potential to link to other datasets.
The less good
Some systematic biases in FNE (missing registrations) in particular.
2018 data ~ 2020 data?
31 The Rapid Registrations Cancer Dataset
Accessing the data
General principles:
• Not trying to replace gold standard cancer registration data.
• Use where normal registration data doesn’t cover period of interest…
& where question can be adequately answered given limitations.
• Generally restricting patient data availability to those in tumour table.
Available through PHE’s Office for Data Release (from late October):
https://www.gov.uk/government/publications/accessing-public-health-england-data
http://www.ncin.org.uk/cancer_type_and_topic_specific_work/topic_specific_work/covid19
32 The Rapid Registrations Cancer Dataset
Ongoing development
• Monthly builds incorporating data as it comes in.
• Extend staging to new tumour types.
• Attempt to lower FNE without raising FPE.
• Add other events/fields, depending on demand?
User feedback is very welcome.
Learning Goals
• What is the Rapid Registration Data
• Data features and quality issues to be aware of
• 2020 data sneak peak
• How do I access/use the data?
• Q&A
Any Questions?
Many thanks to:
Tom Bacon
Jackie Charman
Cong Chen
Carolynn Gildea
Sophie Jose
Ravneet Sandhu
Sally Vernon
This work uses data that has been provided by
patients and collected by the NHS as part of their
care and support. The data are collated, maintained
and quality assured by the National Cancer
Registration and Analysis Service, which is part of
Public Health England (PHE).