Using large data sets to study factors associated with the incidence of multiple sclerosis
Tamah Fridman, David Glick, John Kidd

Multiple Sclerosis (MS)
• A complex autoimmune disease with both acute and chronic phases.
• Confounding factors include:
  o genetic background
  o viral infections, including EBV and HSV
  o nutritional factors
  o environmental factors such as latitude and smoking

• More generally, this module could be used to explore the difference between correlation and causation.
• For use in a course, the instructor will supply appropriate background information on the immune response as it applies to MS.

• There is a vast literature examining the effects of:
  o geography
  o migration
  o infectious diseases
  o sunlight, as related to vitamin D levels
  o cigarette smoking
  o diet
  o hormones

• Over time, a number of data sets have been published that explore relationships between environmental factors and MS.
• Many of these are single studies that were later included in one or more “meta-analysis” articles.
• In addition, incidence statistics are available from a variety of sources such as CDC, World Life Expectancy.com, WHO, and others.

• To demonstrate the module’s potential, we have constructed several example analyses, using a variety of techniques, that link MS incidence to rainfall and viral diseases via:
  o a GIS plot
  o a scatter plot
  o 3-D Principal Component Analysis (PCA)
• These are based on the same data, to demonstrate that large data sets can be visualized and analyzed in a variety of ways.
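The correlation and scatter-plot steps can also be reproduced outside Excel. The following is a minimal Python sketch, not the authors' workflow: it computes the Pearson correlation coefficient (the statistic that Excel's CORREL returns) and a least-squares trendline, using only the six sample rows (Afghanistan through Antigua/Bar.) from the spreadsheet excerpt in this deck. The function names `pearson_r` and `linear_fit` are illustrative. Because only six of the 192 countries are used, the results will differ from the full-spreadsheet values reported on the slides.

```python
from math import sqrt

# Six sample rows from the spreadsheet excerpt (Afghanistan ... Antigua/Bar.).
# ASSUMPTION: this small subset stands in for the full 192-country sheet.
ms_rate = [0.4, 2.8, 0.1, 0.4, 0.2, 0.0]
hep_c_rate = [3.8, 0.1, 0.1, 0.6, 1.0, 0.0]
lung_ca_rate = [7.2, 31.0, 10.6, 21.6, 2.3, 8.3]


def _sums(xs, ys):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two series."""
    sxx, syy, sxy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept; returns (slope, intercept)."""
    sxx, _, sxy = _sums(xs, ys)
    slope = sxy / sxx
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return slope, intercept


for label, xs in [("Hep C", hep_c_rate), ("Lung cancer", lung_ca_rate)]:
    r = pearson_r(xs, ms_rate)
    slope, intercept = linear_fit(xs, ms_rate)
    print(f"MS vs {label}: r = {r:.4f}, R^2 = {r * r:.4f}, "
          f"y = {slope:.4f}x + {intercept:.4f}")
```

For a simple linear trendline, R² is the square of the correlation coefficient. This is how the lung cancer correlation reported in the deck (0.547928) relates to the R² shown on the corresponding scatter plot (0.3002).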
• Link to interactive ArcGIS plot:
• http://arcgis.com/explorer/?open=2e7723700ef942b7a5aa2f8cbd96a5fc&extent=37882315.9514645,2989772.13723539,44144037.3085845,6061929.17807238

• The Excel function “Correl” was used to look for correlations between MS rates and the rates of a series of viral diseases and one “lifestyle” disease:
  o Hepatitis C: -0.0152
  o Cervical cancer: -0.34991
  o Liver cancer: -0.25501
  o HIV: -0.1451
  o Lung cancer: 0.547928

This slide is a sample; the complete spreadsheet contains 192 countries.

Country        MS rate   Hep C rate   Cervical cancer rate   Liver cancer rate   HIV rate   Lung cancer rate
Afghanistan    0.4       3.8          2.6                    3.8                 0          7.2
Albania        2.8       0.1          1.5                    6.7                 0.2        31
Algeria        0.1       0.1          3.4                    1.3                 2          10.6
Andorra        0.4       0.6          0.8                    4.9                 0          21.6
Angola         0.2       1            12.5                   9.6                 79.2       2.3
Antigua/Bar.   0         0            5.4                    5.2                 19.7       8.3

• The above spreadsheet data were also used to construct scatter plots of MS versus Hepatitis C (a viral disease) and MS versus lung cancer (an environmental/lifestyle disease). These plots follow.

[Scatter plot: MS rate (Y) versus Hep C rate (X). Linear trendline: y = -0.0482x + 0.311, R² = 0.0111.]

[Scatter plot: MS rate (Y) versus lung cancer rate (X). Linear trendline: y = 0.0173x + 0.0283, R² = 0.3002.]

• The complete Excel spreadsheet was also used in Principal Component Analysis (PCA).
• The data were saved in a tab-delimited format and then imported into the NIA Array Analysis Tool for Principal Component Analysis.
• The results are password protected on this site: http://lgsun.grc.nia.nih.gov/ANOVA/index.html

• As something completely different, meta-analysis data were extracted into Excel and converted for use with PGPLOT, and a Fortran program was written to analyze and display these data.
• A great deal of difficulty was encountered in fitting disparate data points into congruent categories, so the following graphs are shown with some reservation.
• However, students “inventing” their own analyses can be expected to encounter similar problems.

• We are deeply indebted to:
• Ileana Betancourt and Colleen McLinn for help with GIS
• Jeff Lutgen and Bruce Wiggins for help with Excel.