Post on 25-Dec-2015


Using large data sets to study factors associated with the incidence of multiple sclerosis

Tamah Fridman

David Glick

John Kidd

Multiple Sclerosis (MS)

• A complex autoimmune disease with both acute and chronic phases.

• Confounding factors include:
  o genetic background
  o viral infections, including EBV and HSV
  o nutritional factors
  o environmental factors such as latitude and smoking


• More generally, this module could be used to explore the difference between correlation and causation.

• When the module is used in a course, the instructor will supply appropriate background on the immune response as it applies to MS.


• There is a vast literature examining the effects of:
  o geography
  o migration
  o infectious diseases
  o sunlight, as related to vitamin D levels
  o cigarette smoking
  o diet
  o hormones


• Over time, a number of data sets exploring relationships between environmental factors and MS have been published.

• Many of these are single studies that were later included in one or more “meta-analysis” articles.

• In addition, incidence statistics are available from a variety of sources, such as the CDC, World Life Expectancy.com, and the WHO.


• To demonstrate the module's potential, we constructed several example analyses. They use a variety of techniques to link MS incidence to rainfall and viral diseases:
  o a GIS plot
  o a scatter plot
  o a 3-D Principal Component Analysis (PCA)

• All of these are based on the same data, to show that a large data set can be visualized and analyzed in a variety of ways.


• The Excel function CORREL was used to compute the correlation coefficient between MS rates and the rates of a series of viral diseases and one "lifestyle" disease:
  o Hepatitis C: -0.0152
  o Cervical cancer: -0.34991
  o Liver cancer: -0.25501
  o HIV: -0.1451
  o Lung cancer: 0.547928
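Excel's CORREL returns the Pearson correlation coefficient. A minimal Python sketch of the same calculation follows. It is applied only to the six sample rows reproduced in the spreadsheet sample, so its result will not match the coefficients above, which were computed over all 192 countries. The function name `correl` is our own choice, made to mirror the Excel function.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL computes."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and each variable's sum of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Six sample rows only (Afghanistan through Antigua and Barbuda), not all 192 countries
ms_rate = [0.4, 2.8, 0.1, 0.4, 0.2, 0.0]
lung_ca_rate = [7.2, 31.0, 10.6, 21.6, 2.3, 8.3]

print(f"r(MS, lung cancer), sample rows only: {correl(ms_rate, lung_ca_rate):.3f}")
```

On six rows, a single country (here Albania) can dominate the coefficient. This is one reason the full data set matters when interpreting any correlation.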


Country              MS rate  Hep C rate  Cervical cancer rate  Liver cancer rate  HIV rate  Lung cancer rate
Afghanistan          0.4      3.8         2.6                   3.8                0         7.2
Albania              2.8      0.1         1.5                   6.7                0.2       31
Algeria              0.1      0.1         3.4                   1.3                2         10.6
Andorra              0.4      0.6         0.8                   4.9                0         21.6
Angola               0.2      1           12.5                  9.6                79.2      2.3
Antigua and Barbuda  0        0           5.4                   5.2                19.7      8.3

This slide shows a sample; the complete spreadsheet contains 192 countries.


• The spreadsheet data above were also used to construct scatter plots of MS rate versus hepatitis C rate (a viral disease) and versus lung cancer rate (an environmental/lifestyle disease). These plots follow.

[Figure: MS rate (Y) versus hepatitis C rate (X), with linear trend line f(x) = -0.0482x + 0.3110, R² = 0.0111]

[Figure: MS rate (Y) versus lung cancer rate (X), with linear trend line f(x) = 0.0173x + 0.0283, R² = 0.3002]
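Each scatter plot carries an Excel linear trendline, that is, an ordinary least-squares fit f(x) = mx + b reported with its R². The sketch below reproduces that calculation in Python. As with the correlation example, it uses only the six sample rows, so its slope and intercept differ from the full-data fits.

For a straight-line fit with a single predictor, R² equals the square of the Pearson coefficient. Squaring the lung cancer correlation of 0.547928 gives about 0.3002, the R² reported for the lung cancer plot. The function name `linear_trend` is hypothetical.

```python
def linear_trend(xs, ys):
    """Least-squares line y = slope * x + intercept, plus its R^2.

    This is the model behind a linear trendline in an Excel scatter chart.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# MS rate (Y) versus lung cancer rate (X), six sample rows only
lung_ca_rate = [7.2, 31.0, 10.6, 21.6, 2.3, 8.3]
ms_rate = [0.4, 2.8, 0.1, 0.4, 0.2, 0.0]

m, b, r2 = linear_trend(lung_ca_rate, ms_rate)
print(f"f(x) = {m:.4f}x {b:+.4f}   R^2 = {r2:.4f}")

# The identity R^2 = r^2, applied to the coefficient reported for all 192 countries
print(f"0.547928^2 = {0.547928 ** 2:.4f}")  # the lung cancer plot reports R^2 = 0.3002
```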


• The complete Excel spreadsheet was also used for Principal Component Analysis (PCA).

• The data were saved in tab-delimited format and then imported into the NIA Array Analysis Tool for Principal Component Analysis.

• The results are password-protected on this site: http://lgsun.grc.nia.nih.gov/ANOVA/index.html
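The NIA tool's own settings are not described here, so the following is only a generic sketch of PCA on the same kind of input. It makes four assumptions:
  o a six-country excerpt, embedded as a string, stands in for the full exported file
  o the text is parsed as tab-delimited data
  o each column is standardized first, which keeps high-rate columns such as HIV from dominating
  o NumPy's singular value decomposition supplies the principal components
Each country is then projected onto the first three principal components, the coordinates a 3-D PCA plot would display. The column labels and function names are hypothetical, and the sign of each component is arbitrary.

```python
import csv
import io

import numpy as np

# Six-country excerpt in tab-delimited form; the real export would hold all 192 rows.
# Column labels here are illustrative, not taken from the original file.
TSV_TEXT = (
    "Country\tMS\tHepC\tCervCa\tLiverCa\tHIV\tLungCa\n"
    "Afghanistan\t0.4\t3.8\t2.6\t3.8\t0\t7.2\n"
    "Albania\t2.8\t0.1\t1.5\t6.7\t0.2\t31\n"
    "Algeria\t0.1\t0.1\t3.4\t1.3\t2\t10.6\n"
    "Andorra\t0.4\t0.6\t0.8\t4.9\t0\t21.6\n"
    "Angola\t0.2\t1\t12.5\t9.6\t79.2\t2.3\n"
    "Antigua and Barbuda\t0\t0\t5.4\t5.2\t19.7\t8.3\n"
)


def load_tab_delimited(text):
    """Return (column names, row labels, numeric matrix) from tab-delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    labels, rows = [], []
    for row in reader:
        labels.append(row[0])
        rows.append([float(value) for value in row[1:]])
    return header[1:], labels, np.array(rows)


def pca(data, n_components=3):
    """PCA via SVD of the standardized data matrix.

    Returns (scores, loadings, explained-variance ratio) for the first
    n_components components.
    """
    # Center each column and scale it to unit variance (z-scores)
    standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Rows of vt are the principal axes; singular values measure each axis's spread
    u, s, vt = np.linalg.svd(standardized, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, vt[:n_components], explained[:n_components]


columns, countries, data = load_tab_delimited(TSV_TEXT)
scores, loadings, explained = pca(data)
print("variance explained by PC1-PC3:", np.round(explained, 3))
for country, (pc1, pc2, pc3) in zip(countries, scores):
    print(f"{country:20s} {pc1:7.2f} {pc2:7.2f} {pc3:7.2f}")
```

The three score columns are the x, y, z coordinates for a 3-D scatter of countries. Each row of `loadings` shows how strongly each disease rate contributes to that axis.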

• As something completely different, meta-analysis data were extracted into Excel and reformatted for PGPLOT (a Fortran graphics library). A Fortran program was then written to analyze and display these data.

• Fitting disparate data points into consistent categories proved very difficult, so the graphs that follow should be viewed with some reservation.

• However, students "inventing" their own analyses can expect to encounter similar problems.


• We are deeply indebted to:
  o Ileana Betancourt and Colleen McLinn for help with GIS
  o Jeff Lutgen and Bruce Wiggins for help with Excel
