Page 1: Discovering Patterns in Adverse Drug Reactions

Discovering Patterns in Adverse Drug Reactions

Student: Ernst Joham

Supervisor: Associate Prof Jiuyong Li

Associate Supervisor: Dr. Jan Stanek

Page 2: Discovering Patterns in Adverse Drug Reactions

Outline

• Background
• Motivation
• Research questions
• Literature review
• Data mining process
• Results
• Conclusion

Page 3: Discovering Patterns in Adverse Drug Reactions

Background

• What is data mining?
  Data mining is used to discover unexpected, interesting and valuable information in datasets.
• A high percentage of hospital admissions and prolonged hospital stays are due to adverse drug reactions (ADRs).
• What can cause ADRs?
  • The dosage given to the patient
  • More than one drug taken at the same time
  • Ingredients in drugs that can result in an adverse reaction

Page 4: Discovering Patterns in Adverse Drug Reactions

Background

• Problems with medical datasets
  • Medical data is more diverse and complex
  • Ethical and legal issues
  • Data quality
    • Missing values
    • Noise
  • Ownership
  • Lack of information

Page 5: Discovering Patterns in Adverse Drug Reactions

Motivation

• To have a successful outcome in discovering patterns in medical datasets
• To find the most suitable algorithms for handling noise and missing values in medical datasets
• To better handle the complexity and diversity of medical datasets

Page 6: Discovering Patterns in Adverse Drug Reactions

Research Questions

• The aim of the research was to use data mining methods to produce relevant results from real-world medical data.
• The following research questions were addressed:
  (1) Is it possible to discover patterns in sparse datasets?
  (2) What patterns can be identified through data mining for ADRs?

Page 7: Discovering Patterns in Adverse Drug Reactions

Literature review (techniques)

• Decision tree, inductive logic programming, k-nearest neighbour and Bayesian classifier techniques have been applied to medical datasets (Lavrač 1999).
• Lee et al. (2000) state that techniques which easily extract specific knowledge are the key to medical decision-making.
• A study on drug discovery showed that neural networks performed better than logistic regression, but decision trees performed better at identifying active compounds (Obenshain 2004).

Page 8: Discovering Patterns in Adverse Drug Reactions

Literature review (process model)

• Medical data mining applications that are expected to discover new knowledge should follow a five-stage process model (Wang & Wang 2008):
  • planning tasks
  • developing data mining hypotheses
  • preparing data
  • selecting data mining tools
  • evaluating data mining results
• Cios & Moore (2002) state that success depends on following the DMKD process model, which adds several steps to the CRISP-DM model and has been applied to several medical problem domains.

Page 9: Discovering Patterns in Adverse Drug Reactions

Literature review (problems with medical datasets)

• Brown & Kros (2003) focused on the impact of missing data and how existing methods can help. They categorise methods for dealing with missing data into:
  • Use complete data only
  • Delete selected cases or variables
  • Data imputation
  • Model-based approaches
• Some researchers have focused on data cleansing tools to help eliminate noise, but these achieve only reasonable results (Zhu & Wu 2004).
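The first three categories can be illustrated with a minimal Python sketch. This is not the project's own code (the project used R and Rattle); the records, field values and function names below are hypothetical, and mode imputation is only one of many possible imputation rules. Model-based approaches are not shown.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
records = [
    {"ROUTE": "Oral", "RECOV": "Rec"},
    {"ROUTE": None, "RECOV": "Rec"},
    {"ROUTE": "Oral", "RECOV": None},
    {"ROUTE": "Topical", "RECOV": "Rec"},
]


def complete_cases(rows):
    """Use complete data only: keep rows with no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_variable(rows, name):
    """Delete a selected variable from every row."""
    return [{k: v for k, v in r.items() if k != name} for r in rows]


def impute_mode(rows, name):
    """Data imputation: replace missing values with the most common observed value."""
    observed = [r[name] for r in rows if r[name] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, name: mode if r[name] is None else r[name]} for r in rows]


print(len(complete_cases(records)))        # rows that survive complete-case analysis
print(impute_mode(records, "ROUTE")[1])    # second row after imputing ROUTE
```

Complete-case analysis discards half of this toy dataset, which shows why it is costly on sparse data.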

Page 10: Discovering Patterns in Adverse Drug Reactions

Literature review

• Attribute noise is more difficult to handle and includes (Zhu & Wu 2004):
  (1) Incorrect attribute values
  (2) Missing or "don't know" attribute values
  (3) Incomplete attributes or "don't care" values

Page 11: Discovering Patterns in Adverse Drug Reactions

Data Mining Process

• The project followed the six-step CRISP-DM data mining process
  • Understand the main aim of the project
  • Understand the dataset

Sample records (blank cells are missing values):

| ADRDATE    | Agedays | BRAND       | DRUG          | ID  | Prob | ROUTE   | Recov | Severity | URNO    | ATC     |
| 31/01/2007 |         | Lyclear     | Permethrin    | 707 | Cert | Topical | Rec   | Minor    | unknown | P03AC04 |
| 9/06/2003  | 14367   | Tegretol CR | Carbamazepine | 4   | Cert | Oral    | Rec   |          | ax6cx8z | N03AF01 |
| 11/06/2003 | 14173   | Zoloft      | Sertraline    | 5   | Unc  | Oral    |       |          | ax66486 | N06AB06 |

Page 12: Discovering Patterns in Adverse Drug Reactions

Data Mining Process

Summary of missing values (1286 records in total)

Columns: ADRDATE, AGEDAYS, ROUTE, RECOV, ATC
Row labels: Missing values, Unknown, NR, REC
Counts as extracted from the slide: 0, 1, 570, 344, 188, 191, 82657
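A per-column count of this kind could be produced along the following lines. This is a Python sketch rather than the R code used in the project. The three-row extract is hypothetical (modelled on the sample records), and treating the text "unknown" as missing is an assumption.

```python
import csv
import io

# Hypothetical extract modelled on the sample records; not the real dataset.
SAMPLE = """ADRDATE,Agedays,ROUTE,Recov,ATC
31/01/2007,,Topical,Rec,P03AC04
9/06/2003,14367,Oral,Rec,N03AF01
11/06/2003,14173,Oral,,N06AB06
"""

# Assumption: empty cells and the literal text "unknown" both count as missing.
MISSING_TOKENS = {"", "unknown"}


def count_missing(rows, columns):
    """Return {column: number of rows whose value is missing}."""
    counts = {c: 0 for c in columns}
    for row in rows:
        for c in columns:
            value = (row.get(c) or "").strip().lower()
            if value in MISSING_TOKENS:
                counts[c] += 1
    return counts


reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
missing = count_missing(rows, reader.fieldnames)
print(missing)
```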

Page 13: Discovering Patterns in Adverse Drug Reactions

Data Mining Process

• Data in .csv format
• R programming language
• Rattle tool for data mining
• Data preparation
  • Remove duplicates
  • Correct misspelled words
  • Correct meanings of values
  • Find missing ATC (Anatomical Therapeutic Chemical) values
  • Leave missing values for the rest of the dataset
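The preparation steps can be sketched as follows. This is an illustrative Python version, not the project's R code; the spelling table, field names and rows are hypothetical. Spelling is corrected before duplicates are removed so that records differing only by a typo collapse into one, which is a design choice rather than something the slides specify. Missing values (None) are left untouched.

```python
# Hypothetical spelling corrections, keyed by lower-cased raw value.
SPELLING_FIXES = {"orla": "Oral", "oral": "Oral", "topical": "Topical"}


def prepare(rows):
    """Correct ROUTE spellings, then drop exact duplicate rows; keep None as-is."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = dict(row)
        route = fixed.get("ROUTE")
        if route is not None:
            fixed["ROUTE"] = SPELLING_FIXES.get(route.strip().lower(), route.strip())
        # A row's identity is the full set of its (field, value) pairs.
        key = tuple(sorted(fixed.items(), key=lambda kv: kv[0]))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


rows = [
    {"ID": "4", "ROUTE": "Oral", "RECOV": "Rec"},
    {"ID": "4", "ROUTE": "orla", "RECOV": "Rec"},  # duplicate once the typo is fixed
    {"ID": "5", "ROUTE": "Oral", "RECOV": None},   # missing value is kept
]
prepared = prepare(rows)
print(prepared)
```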

Page 14: Discovering Patterns in Adverse Drug Reactions

Data Mining Process

• Data transformation
  • Date when the patient was admitted to hospital for ADRs (October-March = 1, April-September = 0)
  • Age of the patient, categorised into groups with an equal number of records (0-2 years old = 1, 2-5 years old = 2, 5-11 years old = 3, 11-16 years old = 4, above 16 years of age = 5)
  • Administration route of the medication that caused the ADR, either oral or intravenous (Oral = 1, Intravenous = 0)
  • Recovered from ADRs or not (Recovered = 0, Not recovered = 1)
  • Whether the drugs given to the patient are antibiotics (Antibiotics = 1, Not antibiotics = 0)
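Three of these encodings can be sketched in Python as below. Several details are assumptions rather than facts from the slides: dates are read as day/month/year, age is converted from days using 365.25 days per year, each age band includes its lower bound and excludes its upper bound, and any route other than oral or intravenous is treated as missing. The recovery and antibiotic flags are omitted because the slides do not show the raw values they were derived from.

```python
from datetime import datetime

DAYS_PER_YEAR = 365.25  # assumption: conversion factor from age in days to years


def encode_adrdate(text):
    """October-March = 1, April-September = 0; None when the date is missing."""
    if not text:
        return None
    month = datetime.strptime(text, "%d/%m/%Y").month
    return 1 if month >= 10 or month <= 3 else 0


def encode_age(agedays):
    """Map age in days to the five age bands (1..5); None when missing."""
    if agedays is None:
        return None
    years = agedays / DAYS_PER_YEAR
    # Assumption: each band includes its lower bound and excludes its upper bound.
    for code, upper in enumerate((2, 5, 11, 16), start=1):
        if years < upper:
            return code
    return 5


def encode_route(route):
    """Oral = 1, Intravenous = 0; anything else is treated as missing (assumption)."""
    mapping = {"oral": 1, "intravenous": 0}
    if route is None:
        return None
    return mapping.get(route.strip().lower())


print(encode_adrdate("31/01/2007"), encode_age(14367), encode_route("Oral"))
```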

Page 15: Discovering Patterns in Adverse Drug Reactions

Data Mining Process

[Figure: panels labelled ADRDATE, AGE, RECOV, ATC and ROUTE]

Page 16: Discovering Patterns in Adverse Drug Reactions

Data Mining Process

• Modelling phase
  • Logistic regression
  • Decision tree
  • Risk pattern algorithm
• Evaluation phase
• Deployment

Page 17: Discovering Patterns in Adverse Drug Reactions

Results

• Results for the logistic regression technique

Coefficients:
             Estimate   Std. Error  z value  Pr(>|z|)
(Intercept)  -1.901353  0.466304    -4.077   4.55e-05 ***
ADRDATE       0.136312  0.285722     0.477   0.633
AGEDAYS       0.002067  0.115482     0.018   0.986
ROUTE         0.059532  0.290016     0.205   0.837
ANTIBIOTICS  -0.181255  0.300150    -0.604   0.546
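The columns of this table are related in a standard way: the z value is the estimate divided by its standard error, and Pr(>|z|) is the two-sided tail probability of a standard normal distribution. The Python sketch below recomputes both from the reported estimates and standard errors, and also shows the odds ratio exp(estimate). It only re-derives numbers from the table; it does not refit the model. None of the four predictors comes close to the conventional 5% significance level.

```python
import math

# (estimate, standard error, reported z, reported p), copied from the table.
COEFFICIENTS = {
    "(Intercept)": (-1.901353, 0.466304, -4.077, 4.55e-05),
    "ADRDATE": (0.136312, 0.285722, 0.477, 0.633),
    "AGEDAYS": (0.002067, 0.115482, 0.018, 0.986),
    "ROUTE": (0.059532, 0.290016, 0.205, 0.837),
    "ANTIBIOTICS": (-0.181255, 0.300150, -0.604, 0.546),
}


def wald_z(estimate, std_error):
    """Wald statistic: the estimate measured in standard errors."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided standard normal tail probability for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def odds_ratio(estimate):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(estimate)


for name, (est, se, _z_reported, _p_reported) in COEFFICIENTS.items():
    z = wald_z(est, se)
    print(f"{name:12s} z={z:7.3f}  p={two_sided_p(z):.3g}  OR={odds_ratio(est):.3f}")
```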

Page 18: Discovering Patterns in Adverse Drug Reactions

Results

• Decision Tree Result

 1) root 1035 473 1 (0.4570048 0.5429952)
   2) AGE>=3.5 407 140 0 (0.6560197 0.3439803)
     4) ADRDATE< 0.5 203 61 0 (0.6995074 0.3004926) *
     5) ADRDATE>=0.5 204 79 0 (0.6127451 0.3872549)
      10) AGE>=4.5 100 35 0 (0.6500000 0.3500000)
        20) ROUTE>=0.5 79 27 0 (0.6582278 0.3417722) *
        21) ROUTE< 0.5 21 8 0 (0.6190476 0.3809524)
          42) RECOV=Yes 18 6 0 (0.6666667 0.3333333) *
          43) RECOV=NO 3 1 1 (0.3333333 0.6666667) *

Page 19: Discovering Patterns in Adverse Drug Reactions

Results

• Decision Tree Result (continued)

      11) AGE< 4.5 104 44 0 (0.5769231 0.4230769)
        22) ROUTE< 0.5 77 30 0 (0.6103896 0.3896104) *
        23) ROUTE>=0.5 27 13 1 (0.4814815 0.5185185) *
   3) AGE< 3.5 628 206 1 (0.3280255 0.6719745)
     6) ROUTE< 0.5 236 109 1 (0.4618644 0.5381356)
      12) RECOV=NO 24 6 0 (0.7500000 0.2500000)
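The tree listing follows the usual rpart text layout: node number, split, number of records (n), number misclassified (loss), predicted class (yval), class probabilities, and a trailing * for terminal nodes. The children of node k are numbered 2k and 2k+1. The Python sketch below parses a few of the lines above and checks two consistency properties: the children's record counts sum to the parent's, and loss/n equals one minus the probability of the predicted class. The regular expression and function names are illustrative, not part of rpart.

```python
import re

# A few node lines copied from the decision tree output.
NODE_LINES = [
    "1) root 1035 473 1 (0.4570048 0.5429952)",
    "2) AGE>=3.5 407 140 0 (0.6560197 0.3439803)",
    "3) AGE< 3.5 628 206 1 (0.3280255 0.6719745)",
    "4) ADRDATE< 0.5 203 61 0 (0.6995074 0.3004926) *",
    "5) ADRDATE>=0.5 204 79 0 (0.6127451 0.3872549)",
]

NODE_RE = re.compile(
    r"^\s*(\d+)\)\s+(.+?)\s+(\d+)\s+(\d+)\s+(\S+)\s+"
    r"\(([\d.]+)\s+([\d.]+)\)\s*(\*)?\s*$"
)


def parse_node(line):
    """Split one node line into its fields."""
    match = NODE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised node line: {line!r}")
    node, split, n, loss, yval, p0, p1, star = match.groups()
    return {
        "node": int(node),
        "split": split,
        "n": int(n),
        "loss": int(loss),
        "yval": yval,
        "probs": (float(p0), float(p1)),
        "terminal": star is not None,
    }


nodes = {parsed["node"]: parsed for parsed in map(parse_node, NODE_LINES)}

for parent_id in (1, 2):
    left, right = nodes[2 * parent_id], nodes[2 * parent_id + 1]
    # Each record in the parent goes to exactly one child.
    print(parent_id, left["n"] + right["n"] == nodes[parent_id]["n"])

for info in nodes.values():
    error_rate = info["loss"] / info["n"]
    predicted_prob = info["probs"][int(info["yval"])]
    print(info["node"], round(error_rate, 4), round(1 - predicted_prob, 4))
```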

Page 20: Discovering Patterns in Adverse Drug Reactions

Results

• Risk patterns for NO

1  3  3.0324  2.4852  26  9   7   ADRDATE 1 AGEDAYS 3 ANTIBIOTICS 0
2  2  3.1002  2.5582  62  46  16  AGEDAYS 3 ANTIBIOTICS 0
3  3  2.5663  2.1904  25  9   6   ADRDATE 1 AGEDAYS 4 ROUTE 1
4  3  2.5375  2.1757  34  26  8   AGEDAYS 4 ROUTE 1 ANTIBIOTICS 0

• Pattern 1, where risk ratio = 2.48
  • Agedays = between 5 and 11 years old
  • Adrdate = October to March
  • Antibiotics = No
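Each pattern is a list of attribute/code pairs, so it can be read back into plain language with the codes defined in the data transformation step. The Python sketch below does this for the four patterns. The label table is transcribed from the transformation slide; the function name and output wording are illustrative.

```python
# Code-to-label tables, transcribed from the data transformation step.
LABELS = {
    "ADRDATE": {1: "October-March", 0: "April-September"},
    "AGEDAYS": {
        1: "0-2 years",
        2: "2-5 years",
        3: "5-11 years",
        4: "11-16 years",
        5: "above 16 years",
    },
    "ROUTE": {1: "oral", 0: "intravenous"},
    "ANTIBIOTICS": {1: "antibiotic", 0: "not an antibiotic"},
}

PATTERNS = [
    "ADRDATE 1 AGEDAYS 3 ANTIBIOTICS 0",
    "AGEDAYS 3 ANTIBIOTICS 0",
    "ADRDATE 1 AGEDAYS 4 ROUTE 1",
    "AGEDAYS 4 ROUTE 1 ANTIBIOTICS 0",
]


def decode(pattern):
    """Turn 'NAME code NAME code ...' into a list of readable conditions."""
    tokens = pattern.split()
    pairs = zip(tokens[0::2], tokens[1::2])
    return [f"{name} = {LABELS[name][int(code)]}" for name, code in pairs]


for number, pattern in enumerate(PATTERNS, start=1):
    print(number, "; ".join(decode(pattern)))
```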

Page 21: Discovering Patterns in Adverse Drug Reactions

Conclusion

• Build a data mining process to answer the problem posed.
• Use algorithms that work for medical applications.
• Noise and missing values do pose a problem, but reasonable results can still be achieved.
• More relevant patterns can be produced for medical experts if maximum information is included in the dataset.

Page 22: Discovering Patterns in Adverse Drug Reactions

References

• Brown, ML & Kros, JF 2003, 'Data mining and the impact of missing data', Industrial Management & Data Systems, vol. 103, pp. 611-621.

• Cios, KJ & Moore, GW 2002, 'Uniqueness of medical data mining', Artificial Intelligence in Medicine, vol. 26, no. 1-2, pp. 1-24.

• CRISP_DM 2000, Cross Industry Standard Process for Data Mining, viewed 27 August 2008, <http://www.crisp-dm.org/Partners/index.htm>.

• Li, J, Fu, AW-c, He, H, Chen, J, Jin, H, McAullay, D, Williams, G, Sparks, R & Kelman, C 2005, Mining risk patterns in medical data, ACM, Chicago, Illinois, USA.

• Lavrač, N 1999, 'Selected techniques for data mining in medicine', Artificial Intelligence in Medicine, vol. 16, no. 1, pp. 3-23.

• Lee, I-N, Liao, S-C & Embrechts, M 2000, 'Data mining techniques applied to medical information', Medical Informatics & the Internet in Medicine, vol. 25, no. 2, pp. 81-102.

• Obenshain, MK 2004, 'Application of data mining techniques to healthcare data', Infection Control and Hospital Epidemiology, vol. 25, no. 8, pp. 690-695.

• Safety of Medicines 2002, A Guide to Detecting and Reporting Adverse Drug Reactions: Why Health Professionals Need to Take Action, WHO publications, viewed 15 April 2008, <http://whqlibdoc.who.int/hq/2002/WHO_EDM_QSM_2002.2.pdf>.

• Wang, H & Wang, S 2008, 'Medical knowledge acquisition through data mining', paper presented at the IEEE International Symposium on IT in Medicine and Education (ITME 2008), Xiamen.

• Zhu, X, Khoshgoftaar, T, Davidson, I & Zhang, S 2007, 'Editorial: Special issue on mining low-quality data', Knowledge and Information Systems, vol. 11, no. 2, pp. 131-136.