Analysis of Medical Data - Florida State University College of Medicine

Analysis of Medical Data

Research Perspective

Nancy B. Clark, M.Ed.

Director of Medical Informatics Education, FSU College of Medicine, Spring 2004

Objectives
- Review statistical concepts to be on Step 1.
- Determine what data exist relative to a clinical question or formal hypothesis
  - Use IT to locate existing data sources
  - Identify and locate existing data sets
    - Within institution
    - Outside institution
- Analyze, interpret and report findings
  - Select and use appropriate computer software: Excel, SPSS
  - Use software to perform simple statistical analysis and portray results graphically
  - Interpret reports

Prerequisite Skills (Step 1 USMLE)
- Fundamental concepts of measurement
- Scales of measurement
- Distribution, central tendency, variability, probability
- Disease prevalence and incidence
- Disease outcomes (eg, fatality rates)
- Associations (correlation or covariance)
- Health impact (eg, risk differences and ratios)
- Sensitivity, specificity, predictive values

More Prerequisite Skills (Step 1 USMLE)
- Fundamental concepts of hypothesis testing and statistical inference
- Confidence intervals
- Statistical significance and type I error
- Statistical power and type II error

More Step 1 Topics
- Fundamental concepts of study design
  - Types of experimental studies (eg, clinical trials, community intervention trials)
  - Types of observational studies (eg, cohort, case control, cross-sectional, case series, community surveys)
  - Sampling and sample size
  - Subject selection and exposure allocation (eg, randomization, stratification, self-selection, systematic assignment)
  - Outcome assessment
  - Internal and external validity

Scales of Measure

Nominal - qualitative classification of equal value: gender, race, color, city

Ordinal - qualitative classification which can be rank ordered: socioeconomic status of families

Interval - numerical or quantitative data: can be rank ordered and sizes compared: temperature

Ratio - interval data with absolute zero value: time or space

Distribution, Central Tendency… Mean

…Variability, Probability…
- Mean
- Median
- Mode
- Standard deviation
- Statistical Significance p < .01

Confidence Interval

Statistical Significance: Type I and Type II errors

Null Hypothesis = Ho

                    Ho True             Ho False
Reject Ho           Type I error        Correct decision
Do Not Reject Ho    Correct decision    Type II error

Statistics Online Textbook  The Statistics Homepage  .html

Disease Prevalence and Incidence
- Prevalence
  - Probability of disease in entire population at any point in time
  - 2% of the population has diabetes
- Incidence
  - Probability that patient without disease develops disease during interval
  - 0.2% or 2 per 1000 new cases per year
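As a minimal sketch of the two definitions above, the Python below computes prevalence as existing cases over the whole population and incidence as new cases over the people who were disease-free at the start of the interval. The function names and the counts are illustrative assumptions, chosen only to reproduce the 2% and 2-per-1000 figures on the slide.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the whole population that has the disease at one point in time."""
    return existing_cases / population


def incidence(new_cases: int, at_risk: int) -> float:
    """Proportion of initially disease-free people who develop the disease during the interval."""
    return new_cases / at_risk


# Hypothetical counts, not real surveillance data.
diabetes_prevalence = prevalence(existing_cases=2_000, population=100_000)
diabetes_incidence = incidence(new_cases=2, at_risk=1_000)

print(f"Prevalence: {diabetes_prevalence:.1%} of the population")
print(f"Incidence: {diabetes_incidence * 1000:.0f} new cases per 1000 per year")
```

Note that the denominators differ: prevalence divides by everyone, while incidence divides only by those who could still become a new case.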

Sensitivity, Specificity

Sensitivity = a / (a+c)

Specificity = d / (b+d)

                    Patients with disease    Patients without disease
Test is positive    a                        b
Test is negative    c                        d
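The two formulas can be checked with a short Python sketch. It assumes the usual 2x2 layout in which a and c are the patients with disease (test positive and test negative) and b and d are the patients without disease. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity(a: int, c: int) -> float:
    """a / (a + c): share of patients WITH disease whose test is positive."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """d / (b + d): share of patients WITHOUT disease whose test is negative."""
    return d / (b + d)


# Hypothetical 2x2 counts (illustration only).
a, b = 90, 30    # test positive: with disease, without disease
c, d = 10, 870   # test negative: with disease, without disease

print(f"Sensitivity = {sensitivity(a, c):.3f}")
print(f"Specificity = {specificity(b, d):.3f}")
```

Both measures are computed within a column of the table, so neither one changes with how common the disease is in the sample.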

Predictive Value

Positive predictive value = a / (a+b)

Negative predictive value = d / (c+d)

Post-test probability of disease given positive test = a / (a+b)

Post-test probability of disease given negative test = c / (c+d)

                    Patients with disease    Patients without disease
Test is positive    a                        b
Test is negative    c                        d
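A small Python sketch of the predictive-value formulas follows. The class name, method names, and counts are assumptions made for illustration; the cell letters follow the table, with a and b in the test-positive row and c and d in the test-negative row.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # test positive, disease present
    b: int  # test positive, disease absent
    c: int  # test negative, disease present
    d: int  # test negative, disease absent

    def ppv(self) -> float:
        """Positive predictive value: post-test probability of disease given a positive test."""
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        """Negative predictive value: probability of NO disease given a negative test."""
        return self.d / (self.c + self.d)

    def post_test_given_negative(self) -> float:
        """Post-test probability of disease given a negative test."""
        return self.c / (self.c + self.d)


# Hypothetical counts (illustration only).
table = TwoByTwo(a=90, b=30, c=10, d=870)
print(f"PPV = {table.ppv():.3f}")
print(f"NPV = {table.npv():.3f}")
print(f"P(disease | negative test) = {table.post_test_given_negative():.3f}")
```

The last two quantities share the denominator c+d and always sum to 1, which is a quick sanity check when working a Step 1 style problem by hand.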

Good Resource Sen, Spc, PV  An Introduction to Information Mastery  ult.htm

- Diagnosis
  - Sensitivity and specificity
  - Predictive values
  - Likelihood ratios
- InfoRetriever
  - Calculators: Epidemiology, Diagnostic test

Fundamental Concepts of Study Design  Good Resource 

Epidemiology for the Uninitiated

 BMJ  Online Textbook 

Finding Health Statistics

Types of Health Statistics Questions
- Fact lookups
- Research
- Presentations
- Social and Policy indicators

Strategies for Finding Health Stats
- Use Portal
- Start at Internet site
- Start with book or article

Internet Portals of Health Stats
- Lists of links that provide starting points for browsing or searching
- Keyword search in portal vs Google
- General idea what you want
- The Related Health Services Research Web Sites

 The NCHS portal:

Other Statistical Web Sites
- CDC Data and Statistics
- FedStats Home Page
  - Compare these two
- U Michigan’s Statistical Resources on the WEB - HEALTH
  - What type of stats

Lexis-Nexis Statistical Universe
- Subscription resource
- Searches stat data
- Subject List
- Limit search
- Reports or tables
- http://web.lexis al+Universe

MMWR
- Morbidity: illness
- Mortality: death
- Disease Trends
- Tables: searchable

Health Care Data
- Healthcare Cost and Utilization Project
  - HCUPnet
- Hospital discharges
- Ambulatory service
- Costs
- Amount of care
- By diagnosis and procedure
- Surveys of hospitals, physicians, nursing homes

Health Consequences
- Costs to society, individuals
- Cost from care
- Costs of illness
- Impact on infrastructure
- HCFA => CMS Health Accounts


State and International Data
- Where Florida Health Data Resides
- DOH Epidemiology
- KFF State Health Facts Online
- United Nations Statistics Division
- World Health Organization Research Tools

Individual Datasets
- EMR
- Billing
- CDCS
- Customized data collection tools

Data Analysis

Selecting the Appropriate Software
- Spreadsheet
  - Numerical (interval or ratio) data
  - Sums
  - Averages
  - Standard deviations
  - Simple charts and graphs
- Statistical Software
  - Nominal or Ordinal data
  - Comparisons of two+ groups
  - Frequency tables
  - Complicated charts and graphs
  - Normal curves
  - Class intervals
  - Statistical significance

Spreadsheets  Excel  Pocket Excel

Data Tables
- Field names at top
- Each row is a record (sample)
- Sorting whole table
  - By one column
  - By more than one column
- Sorting individual sections
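The same table conventions (field names first, one record per row, sorting by one or several columns) can be sketched outside of Excel. The Python below is an illustration only: the field names and records are hypothetical and are not taken from the course data files.

```python
from operator import itemgetter

# Each dict is one record (row); the keys play the role of the field names at the top.
records = [
    {"name": "Patient C", "age": 61, "nodes": 2},
    {"name": "Patient A", "age": 45, "nodes": 0},
    {"name": "Patient B", "age": 61, "nodes": 0},
]

# Sort the whole table by one column.
by_age = sorted(records, key=itemgetter("age"))

# Sort by more than one column: age first, then lymph node count within equal ages.
by_age_then_nodes = sorted(records, key=itemgetter("age", "nodes"))

for row in by_age_then_nodes:
    print(row["name"], row["age"], row["nodes"])
```

Sorting whole records, rather than a single column on its own, keeps each patient's values together, which is the same reason for sorting the whole table in a spreadsheet.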

Descriptive Statistics
- Distribution
  - Frequency distribution
  - Histogram
- Central tendency
  - Mean
  - Median
  - Mode
- Dispersion
  - Range
  - Standard deviation
  - Variance
- N
- Not P (inferential stats)

Central Tendency
- Mean: =AVERAGE(B2:B1500)
- Median: =MEDIAN(A2:A7)
- Mode: =MODE(A2:A7)
- N
  - =COUNT(A2:A1500)
  - =COUNTBLANK(A2:B5)
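For readers who want to check a spreadsheet result independently, here is a rough Python counterpart of the Excel formulas above, using only the standard library. The data values are made up, and using None to stand in for a blank cell is an assumption of this sketch.

```python
import statistics

# Hypothetical column of ages; None stands in for a blank spreadsheet cell.
ages = [34, 45, 45, 52, 61, None, 70]
values = [x for x in ages if x is not None]

mean = statistics.mean(values)      # counterpart of =AVERAGE(...)
median = statistics.median(values)  # counterpart of =MEDIAN(...)
mode = statistics.mode(values)      # counterpart of =MODE(...)
n = len(values)                     # counterpart of =COUNT(...): numeric cells only
blanks = ages.count(None)           # counterpart of =COUNTBLANK(...)

print(f"mean={mean:.2f} median={median} mode={mode} N={n} blanks={blanks}")
```

Reporting N next to the mean shows how many records actually contributed to it once blanks are excluded.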

Dispersion
- Range: =MAX(A2:A60)-MIN(A2:A60)
- Standard deviation: =STDEV(A2:A110)
- Variance: =VAR(A2:A110)
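The dispersion formulas have close standard-library counterparts in Python. Excel's STDEV and VAR are the sample versions (dividing by n-1), which correspond to statistics.stdev and statistics.variance. The data values below are made up for illustration.

```python
import statistics

# Hypothetical measurements (illustration only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)   # counterpart of =MAX(...)-MIN(...)
sample_sd = statistics.stdev(values)      # counterpart of =STDEV(...): divides by n-1
sample_var = statistics.variance(values)  # counterpart of =VAR(...): divides by n-1

print(f"range={value_range} sd={sample_sd:.3f} variance={sample_var:.3f}")
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the original data.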

Distribution
- Frequency distribution
  - Not easy; use SPSS
  - FREQUENCY(data_array,bins_array)
  - Use help
- Histogram
  - Bar chart of frequency table
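To make the idea of a frequency table and its histogram concrete, here is a minimal Python sketch. It follows the binning convention of Excel's FREQUENCY as I understand it: each bin counts values up to and including its upper bound, and one extra slot counts everything above the last bin. The function name, the ages, and the bin edges are all assumptions for illustration.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per class interval.

    Slot i counts values <= bins[i] and greater than the previous bin edge;
    the final extra slot counts values above the last bin edge.
    """
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)
    for value in data:
        # bisect_left finds the first edge that is >= value.
        counts[bisect_left(edges, value)] += 1
    return counts


# Hypothetical ages and class-interval upper bounds.
ages = [23, 35, 40, 41, 58, 60, 61, 77]
bins = [40, 60]
counts = frequency(ages, bins)

# A histogram is a bar chart of the frequency table; here drawn as text bars.
labels = [f"<={edge}" for edge in sorted(bins)] + [f">{max(bins)}"]
for label, count in zip(labels, counts):
    print(f"{label:>5} | {'#' * count} ({count})")
```

The result has one more slot than there are bin edges, which matters when lining counts up against labels.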

Hands on experience  Analyze data in examples2.xls

Statistical Software Intro to SPSS

Statistical Software
- SPSS
  - Provided by request/justification
  - Lab Computers
- Start => Programs => SPSS for Windows => SPSS 11.0 for Windows

Start Screen   Don’t show this dialog in the future.


Open Breast Cancer Survival

Data View


Variables View

File Information
- Utilities Menu
- File Info…
- Output window

Descriptive Statistics
- Analyze Menu
  - Descriptive Statistics
  - Frequencies
- Select Age ►
- Click Statistics button
- In Central Tendency
  - Mean, Median, Mode
- In Dispersion
  - Standard Deviation, Variance
- In Percentile Values
  - Quartiles
- Continue
- OK

Graphing
- Graphs Menu
  - Pie…
- Summary for Groups of cases
- Lymph Nodes ►
- OK

Histogram with Normal Curve
- Graphs Menu
  - Histogram…
- Select Age ►
- Check Display Normal Curve


Simple Correlation Analysis
- Age and Tumor Size
- Analyze Menu
  - Correlate…
  - Bivariate
- Select Age ►
- Select Pathological Tumor Size ►
- Check Pearson and Spearman
  - Two tailed
- OK
- Is there a correlation?
- Negative or Positive?
- Is it statistically significant?
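To show what the two coefficients measure, the sketch below computes Pearson's r and Spearman's rho in plain Python. It is not how SPSS implements them, the age and tumor size values are invented rather than taken from the breast cancer survival file, and it does not compute the two-tailed significance value that SPSS reports.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by the spread of both variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data (illustration only).
age = [35, 42, 50, 58, 63, 71]
tumor_size = [3.1, 2.8, 2.9, 2.0, 2.2, 1.5]

r = pearson(age, tumor_size)
rho = spearman(age, tumor_size)
direction = "negative" if r < 0 else "positive"
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f} ({direction} association)")
```

The sign of the coefficient answers "negative or positive?", while its size shows how strong the association is. Whether it is statistically significant is a separate question that depends on the sample size.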

Save Output
- Save on All Users drive
  - Under Nancy.clark
  - SPSS Output Files
- Name it your name: eg, KerryBachista.spo

Importing Data
- From Excel, SAS, dBase, etc.
- Variable names first row
- File Menu, Open
  - Data…
  - Files of Type
  - Excel
- Tutorial, Samples
  - Demo.exe
- Type in Labels
- Pick Type of variable
- Enter Value Labels
- Etc.

SPSS Tutorials
- In the Help Menu
- On Informatics Web page
- Books:
  - Statistics for Social & Health Research (Sage), Argyrous, George
  - Statistics Applied to Clinical Trials (Kluwer Academic Publishers), Cleophas, Ton J., et al

