Post on 25-Dec-2015
Using large data sets to study factors associated
with the incidence of multiple sclerosis.
Tamah Fridman
David Glick
John Kidd
Multiple Sclerosis (MS)
• A complex autoimmune disease with both acute and chronic phases.
• Confounding factors include:
o genetic background
o viral infections, including EBV and HSV
o nutritional factors
o environmental factors such as latitude and smoking
Multiple Sclerosis (MS)
• More generally, this module could be used to explore the difference between correlation and causation.
• For use in a course, the instructor will supply appropriate background information on the immune response as applied to MS.
Multiple Sclerosis (MS)
• There is a vast literature examining the effects of:
o geography
o migration
o infectious diseases
o sunlight related to vitamin D levels
o cigarette smoking
o diet
o hormones
Multiple Sclerosis (MS)
• Over time a number of data sets have been published that explore relationships between environmental factors and MS.
• Many of these are single studies that were later included in one or more “meta-analysis” articles.
• In addition, there are incidence statistics available from a variety of sources such as CDC, World Life Expectancy.com, WHO, and others.
Multiple Sclerosis (MS)
• To demonstrate the module’s potential, we have constructed several example analyses, using a variety of techniques, that link MS incidence to rainfall and viral diseases via:
o a GIS plot
o a scatter plot
o 3-D Principal Component Analysis (PCA)
• These are based on the same data to demonstrate that large data sets can be visualized and analyzed in a variety of ways.
Multiple Sclerosis (MS)
• Link to interactive ArcGIS plot: http://arcgis.com/explorer/?open=2e7723700ef942b7a5aa2f8cbd96a5fc&extent=37882315.9514645,2989772.13723539,44144037.3085845,6061929.17807238
Multiple Sclerosis (MS)
• The Excel function CORREL was used to look for correlations between MS rates and the rates of a series of viral diseases and one “lifestyle” disease:
o Hepatitis C: -0.0152
o Cervical cancer: -0.34991
o Liver cancer: -0.25501
o HIV: -0.1451
o Lung cancer: 0.547928
Multiple Sclerosis (MS)
Country        MS rate   Hep C rate   Cervical cancer rate   Liver cancer rate   HIV rate   Lung cancer rate
Afghanistan    0.4       3.8          2.6                    3.8                 0          7.2
Albania        2.8       0.1          1.5                    6.7                 0.2        31
Algeria        0.1       0.1          3.4                    1.3                 2          10.6
Andorra        0.4       0.6          0.8                    4.9                 0          21.6
Angola         0.2       1            12.5                   9.6                 79.2       2.3
Antigua/Bar.   0         0            5.4                    5.2                 19.7       8.3
This slide is a sample—the complete spreadsheet contains 192 countries.
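The CORREL calculation can be reproduced outside Excel. The following is a minimal Python sketch, not the authors' workflow: it computes the Pearson correlation coefficient (the quantity CORREL returns) on the six sample rows shown on this slide. The names `SAMPLE` and `pearson` are illustrative. Because only six of the 192 countries are used, the printed values should not be expected to match the coefficients reported for the complete spreadsheet.

```python
import math

# Six sample rows from the slide: (country, MS rate, Hep C rate, lung cancer rate).
SAMPLE = [
    ("Afghanistan", 0.4, 3.8, 7.2),
    ("Albania", 2.8, 0.1, 31.0),
    ("Algeria", 0.1, 0.1, 10.6),
    ("Andorra", 0.4, 0.6, 21.6),
    ("Angola", 0.2, 1.0, 2.3),
    ("Antigua/Bar.", 0.0, 0.0, 8.3),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


ms = [row[1] for row in SAMPLE]
hep_c = [row[2] for row in SAMPLE]
lung = [row[3] for row in SAMPLE]

print("MS vs Hep C (6-row sample):", round(pearson(ms, hep_c), 4))
print("MS vs lung cancer (6-row sample):", round(pearson(ms, lung), 4))
```

Running the same function on the full 192-country columns would be the direct analogue of the spreadsheet calculation.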
Multiple Sclerosis (MS)
• The spreadsheet data above were also used to construct scatter plots of MS versus Hepatitis C (a viral disease) and MS versus lung cancer (an environmental/lifestyle disease). These plots follow.
Multiple Sclerosis (MS)
[Figure: scatter plot of ms rate (Y) versus Hep C rate (X), with linear trend line f(x) = −0.0482397115134393 x + 0.310970668054575, R² = 0.0110671587093064]
Multiple Sclerosis (MS)
[Figure: scatter plot of ms rate (Y) versus lung cancer rate (X), with linear trend line f(x) = 0.017306523346616 x + 0.0283145007695525, R² = 0.300225342134711]
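The trend lines and R² values on the two scatter plots come from an ordinary least-squares fit. As a hedged sketch (the function name `linear_fit` and the six-row sample are illustrative, not the authors' code), the fit can be written in Python as below. For a single-predictor fit, R² equals the square of the correlation coefficient, which gives a quick consistency check: the lung cancer R² of 0.3002 implies |r| ≈ 0.5479, in agreement with the reported correlation of 0.547928, whereas the Hepatitis C R² of 0.0111 implies |r| ≈ 0.105, which does not agree with the reported -0.0152 and may point to a transcription slip in one of the two figures.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return slope, intercept, r_squared


# Six sample rows only (lung cancer rate as X, MS rate as Y); the full
# spreadsheet has 192 countries, so these numbers will differ from the slide.
lung_rate = [7.2, 31.0, 10.6, 21.6, 2.3, 8.3]
ms_rate = [0.4, 2.8, 0.1, 0.4, 0.2, 0.0]
slope, intercept, r2 = linear_fit(lung_rate, ms_rate)
print(f"sample fit: f(x) = {slope:.4f} x + {intercept:.4f}, R^2 = {r2:.4f}")

# Consistency check on the values reported on the slides: |r| = sqrt(R^2).
implied_r_lung = math.sqrt(0.300225342134711)
implied_r_hep_c = math.sqrt(0.0110671587093064)
print(f"|r| implied by lung cancer R^2: {implied_r_lung:.6f}")
print(f"|r| implied by Hep C R^2: {implied_r_hep_c:.6f}")
```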
Multiple Sclerosis (MS)
• The complete Excel spreadsheet was also used in Principal Component Analysis (PCA).
• The data were saved in a tab-delimited format and then imported into the NIA Array Analysis Tool for Principal Component Analysis.
• The results are password protected on this site: http://lgsun.grc.nia.nih.gov/ANOVA/index.html
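Since the NIA Array Analysis results are not publicly viewable, students may want to run a PCA themselves. The sketch below is a generic NumPy implementation (via singular value decomposition), not the NIA tool's procedure. Standardizing each column before the decomposition is an assumption made here because the disease rates are on very different scales; the names `DATA`, `COLUMNS`, and `pca` are illustrative, and only the six sample rows from the slide are used.

```python
import numpy as np

COLUMNS = ["ms", "hep_c", "cervical_ca", "liver_ca", "hiv", "lung_ca"]

# Six sample rows from the slide (the full spreadsheet has 192 countries).
DATA = np.array([
    [0.4, 3.8, 2.6, 3.8, 0.0, 7.2],     # Afghanistan
    [2.8, 0.1, 1.5, 6.7, 0.2, 31.0],    # Albania
    [0.1, 0.1, 3.4, 1.3, 2.0, 10.6],    # Algeria
    [0.4, 0.6, 0.8, 4.9, 0.0, 21.6],    # Andorra
    [0.2, 1.0, 12.5, 9.6, 79.2, 2.3],   # Angola
    [0.0, 0.0, 5.4, 5.2, 19.7, 8.3],    # Antigua/Bar.
])


def pca(data, n_components=3):
    """Return (scores, components, explained_variance_ratio) for the data."""
    # Assumption: z-score each column so no single disease rate dominates.
    centered = data - data.mean(axis=0)
    scaled = centered / data.std(axis=0, ddof=1)
    # SVD of the standardized matrix: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components], explained[:n_components]


scores, components, explained = pca(DATA, n_components=3)
print("explained variance ratio:", np.round(explained, 3))
print("country coordinates on the first three components:")
print(np.round(scores, 3))
```

The three columns of `scores` are the coordinates that a 3-D PCA plot would display, one point per country.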
Multiple Sclerosis (MS)
• As something completely different, meta-analysis data were extracted into Excel and converted into a PGPLOT plot; a Fortran program was written to analyze and display these data.
• Fitting disparate data points into congruent categories proved very difficult, so the following graphs are shown with some reservation.
• However, students “inventing” their own analysis can be expected to encounter similar problems.