Statistics 102


Outline

I. Sensitivity and specificity / likelihood ratios
II. Statistical significance for group data
III. Statistical significance for correlational data
IV. Non-inferiority trials
V. Linear regression
VI. Logistic regression
VII. Stepwise multivariate regression
VIII. Type I and II errors / sample size estimates

I. Sensitivity and Specificity

• Sensitivity: true positives (proportion of individuals with the disease who test +; ranges from 0 to 1, or from 0% to 100%)
• 1-Sensitivity: false negatives (proportion of individuals with the disease who test -; ranges from 0 to 1, or 0% to 100%)
  – If sensitivity = 0.8 (80%), 1-sensitivity = 0.2 (20%)
• Specificity: true negatives (proportion of individuals without the disease who test -; ranges from 0 to 1, or from 0% to 100%)
• 1-Specificity: false positives (proportion of individuals without the disease who test +; ranges from 0 to 1, or 0% to 100%)
  – If specificity = 0.92 (92%), 1-specificity = 0.08 (8%)

Use of Sensitivity and 1-Specificity in Receiver Operating Characteristic (ROC) Curves and the Area under the ROC Curve (the AUC)

• Plots sensitivity of the test (true + rate, TPR) on the Y axis, from 0 to 1, vs. 1-specificity (false + rate, FPR) on the X axis, from 0 to 1, at different test cutoffs
• Perfect classification: AUC = 1 (area of a square with sides = 1)
• Random guess: AUC = 0.5 (area of a triangle with base and height = 1)
• AUC between 0.5 and 1: test is better than a random guess
• AUC between 0 and 0.5: test is worse than a random guess
• The AUC has a 95% CI, e.g., 0.78 (0.69-0.87)

[Figure: ROC curves with AUCs better than a random guess (between 0.5 and 1.0). Moving along each curve from a high cutoff to a low cutoff, the marked "sweet spot" cutoff is a trade-off between sensitivity and specificity.]
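The AUC geometry above (a square of area 1 for perfect classification, a triangle of area 0.5 for a random guess) can be checked numerically with the trapezoidal rule. This is an illustrative sketch, not from the lecture; the function name and the (FPR, TPR) points are invented.

```python
# Sketch: AUC by the trapezoidal rule over (FPR, TPR) points
# obtained at several test cutoffs.

def auc_trapezoid(points):
    """points: list of (fpr, tpr) pairs; returns area under the ROC curve."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between adjacent cutoffs
    return area

# A random guess lies on the diagonal: AUC = 0.5
print(auc_trapezoid([(0, 0), (0.5, 0.5), (1, 1)]))  # 0.5
# A perfect classifier passes through (0, 1): AUC = 1.0
print(auc_trapezoid([(0, 0), (0, 1), (1, 1)]))      # 1.0
```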

Additional terms that can be derived from sensitivity and specificity

• Likelihood ratios: does the test usefully change the probability (likelihood) of a disease or condition?

• Positive likelihood ratio = true+/false+ = sensitivity/(1-specificity). The higher the +LR, the more confident we are that the patient has the condition if the test is +. The +LR can approach ∞.

• Negative likelihood ratio = false-/true- = (1-sensitivity)/specificity. The lower the -LR, the more confident we are that the patient does not have the condition if the test is -. The -LR can approach 0.

Example 1: Use of + and - likelihood ratios

• Your patient with COPD has an acute onset of worsening dyspnea. He had arthroscopic knee surgery 2 weeks ago. There is no leg swelling or leg pain, hemoptysis, personal or family history of PE or DVT, or malignancy. You clinically assess the odds of him having a PE as 50:50, i.e., it is equally likely that he had a PE as that he did not.

If ordered and performed, how would the results of a CT angiogram (CTA) of the pulmonary arteries change your estimated likelihood of PE in this patient? In other words, how good is CTA in helping you diagnose or exclude a PE in this patient?

Example 1, continued

Literature (Annals of Internal Medicine 136: 286-287, 2002): CTA and pulmonary angiography (gold standard) were performed in 250 patients with possible PE. 50 (20%) of the patients had a PE on pulmonary angiography; 200 had no PE on angiography.

Results:

        PE on pulm angio (n=50)   No PE on pulm angio (n=200)
CTA+             35                          2
CTA-             15                        198

Example 1, continued

Likelihood ratio (LR) calculations:

CTA sensitivity (true +) = 35/50 (0.70), or 70%
1-sensitivity (false -) = 15/50 (0.30)
CTA specificity (true -) = 198/200 (0.99), or 99%
1-specificity (false +) = 2/200 (0.01)

+LR = sensitivity/(1-specificity) = true+/false+ = 0.70/0.01 = 70 (PE 70 x as likely as before the test)

-LR = (1-sensitivity)/specificity = false-/true- = 0.30/0.99 = 0.303 (PE 0.3 x as likely as before the test)
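The Example 1 arithmetic can be reproduced in a few lines. `lrs` is a hypothetical helper name; the counts come from the 2x2 table above.

```python
# Sketch: likelihood ratios from the Example 1 2x2 table.

def lrs(tp, fn, fp, tn):
    """Return (sensitivity, specificity, +LR, -LR) from 2x2 counts."""
    sens = tp / (tp + fn)          # true-positive rate
    spec = tn / (tn + fp)          # true-negative rate
    pos_lr = sens / (1 - spec)     # +LR = true+/false+
    neg_lr = (1 - sens) / spec     # -LR = false-/true-
    return sens, spec, pos_lr, neg_lr

sens, spec, pos_lr, neg_lr = lrs(tp=35, fn=15, fp=2, tn=198)
print(round(sens, 2), round(spec, 2))   # 0.7 0.99
print(round(pos_lr), round(neg_lr, 3))  # 70 0.303
```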

II. Are measured group differences in variables or outcomes statistically significant? Which test(s) to use?

If data are normally distributed:

• Use paired t-test (if each subject is his/her own control)
• Use unpaired (group) t-test if there are two groups

If data are skewed (not normally distributed):

Is the variable a continuous one, such as age or PaO2?
• Use Mann-Whitney U, or
• Use Wilcoxon signed-rank

Is the variable a categorical one, such as gender or age > 65?
• Use Fisher's exact, or
• Use chi-square test

If there are >2 study groups:

Use analysis of variance (ANOVA)
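One way to keep the decision tree above straight is to encode it directly. `choose_test` is an invented helper, not an established function; it only mirrors the rules listed on this slide.

```python
# Sketch: encoding the test-selection decision tree above.

def choose_test(groups=2, normal=True, paired=False, categorical=False):
    """Return the significance test suggested by the decision tree."""
    if groups > 2:
        return "ANOVA"
    if normal:
        return "paired t-test" if paired else "unpaired (group) t-test"
    if categorical:
        return "Fisher's exact or chi-square test"
    return "Mann-Whitney U or Wilcoxon signed-rank test"

print(choose_test(normal=True, paired=True))        # paired t-test
print(choose_test(normal=False, categorical=True))  # Fisher's exact or chi-square test
print(choose_test(groups=3))                        # ANOVA
```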

III. Are correlations between variables statistically significant? What test(s) to use?

If the variables are normally distributed:

Use Pearson's test. Pearson's r ranges from -1 to +1.
r ≈ 0 indicates no correlation.
The P value depends on both r and N.

If the variables are skewed (not normally distributed):

Use Spearman's test. Spearman's r ranges from -1 to +1.
r ≈ 0 indicates no correlation.
P values depend on both r and N. P < 0.05 is usually used.
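For readers who want to check the Pearson/Spearman distinction numerically, here is a minimal stdlib-only sketch; the helper names and sample data are invented. Spearman's r is simply Pearson's r computed on the ranks of the data.

```python
# Sketch: Pearson's r from first principles, and Spearman's r as
# Pearson's r on ranks (no tie correction in this sketch).
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_r(x, y):
    return pearson_r(ranks(x), ranks(y))

x = [10, 20, 30, 40, 50]
y = [12, 25, 33, 47, 55]          # roughly linear, strictly increasing
print(round(pearson_r(x, y), 3))  # close to 1
print(spearman_r(x, y))           # exactly 1.0 (monotonic data)
```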

Example 2: METABOLIC ALKALOSIS

[Figure: PaCO2 vs. [HCO3] (mEq/L); Pearson's r = 0.97, P < 0.0001]

IV. Non-Inferiority Trials

A new treatment can truly be:
• Better (superior)
• Essentially equal
• Worse (inferior) than the usual treatment

A trial can test whether the new treatment is:
• Better (superior)
• Not better (non-superior)*
• Not worse (non-inferior)
• Worse (inferior)*
* rarely done

Non-inferiority trials

• Non-inferiority trials are intended to show that the effect of a new treatment is not worse than that of an active control by more than a specified amount.
  – A little like a point spread in football.
• The non-inferiority margin (NIM) is chosen by the investigators before the study (a priori) and can be somewhat arbitrary.
• Study endpoints in non-inferiority trials can be efficacy or safety parameters or a combination of the two.
• Study design may include 3 arms with a placebo group (preferred) or 2 arms with only the new and usual treatments (much less ideal, since there is no internal validation that the new treatment is better than placebo).
• Delta (δ) is the measured difference (best estimate of the true difference) between the two active treatments. This δ will have a 95% CI.
  – Example 3: δ = -4% (95% CI, -9% to +1%)

Example 3: δ = -4% (95% CI, -9% to +1%), the control Rx being slightly better.

• If the NIM had been chosen to be -10% by the investigators a priori, the new drug would be shown to be 'non-inferior' to the control, as -10%, the NIM, was below the 95% CI for δ, -9% to +1%.

• If the NIM had been chosen to be -5% by the investigators a priori, the new drug would not be shown to be 'non-inferior' to the control, as -5%, the NIM, fell within the 95% CI for δ, -9% to +1%. In this context, non-inferiority not shown is effectively the same as being inferior.
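The NIM logic in Example 3 reduces to a one-line comparison. `non_inferior` is an illustrative helper, assuming a negative margin and the CI convention used above (new minus control, in percentage points).

```python
# Sketch: the non-inferiority decision rule. The new drug is non-inferior
# if the entire 95% CI for delta lies above the margin (NIM).

def non_inferior(ci_low, nim):
    """ci_low: lower bound of the 95% CI for delta; nim: non-inferiority margin."""
    return ci_low > nim

# Example 3: delta = -4% with 95% CI (-9%, +1%)
print(non_inferior(ci_low=-9, nim=-10))  # True: -10 lies below the whole CI
print(non_inferior(ci_low=-9, nim=-5))   # False: -5 falls inside the CI
```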

V. Linear regression

• Simple regression: y = mx + b
  – one independent variable, x
  – one dependent variable, y
  – If x = 0, y = b, the intercept
  – b can be +, as shown, zero, or -

[Figure: line with slope m = ∆y/∆x and intercept b; y (dependent variable) on the vertical axis, x (independent variable) on the horizontal axis]

METABOLIC ALKALOSIS

[Figure: regression line PaCO2 = 1.2[HCO3] + 5.8; X = [HCO3] (mEq/L), Y = PaCO2]

V. Linear regression, continued

• Simple regression: y = mx + b
  – one independent variable, x
  – one dependent variable, y
  – If x = 0, y = b, the intercept
  – b can be +, as shown, zero, or -
• More complex regression: y = m1x1 + m2x2 + b
  – two independent variables
  – one dependent variable
  – If x1 = x2 = 0, y = b, the intercept
  – b can be +, zero, or -
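A minimal sketch of fitting y = mx + b by ordinary least squares, using made-up data that happen to lie exactly on the metabolic-alkalosis line shown earlier (the helper name and data are invented for illustration):

```python
# Sketch: ordinary least-squares fit of y = mx + b (stdlib only).

def fit_line(x, y):
    """Return (slope, intercept) minimizing squared error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    b = my - m * mx
    return m, b

hco3 = [10, 20, 30, 40, 50]            # hypothetical [HCO3] values, mEq/L
paco2 = [1.2 * v + 5.8 for v in hco3]  # exact line, for illustration
m, b = fit_line(hco3, paco2)
print(round(m, 1), round(b, 1))  # 1.2 5.8
```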

VI. Logistic regression: a popular method

• A model predicting the probability of a dependent categorical outcome, such as death, using 2 or more patient-specific independent variables.
• The logit, z, is the total contribution of ALL the patient-specific independent variables used in the model to predict the outcome, f(z), the dependent variable.
• z = β0 + β1x1 + β2x2 + … + βnxn
  – β0 = intercept
  – β1, β2, … βn are regression coefficients for x1, x2, ... xn
  – If x1, x2, …, xn all = 0 (the patient has no risk factors), z = β0 = the risk of the dependent outcome (such as death) when no factors affecting risk are present.
• If βn > 0, then the variable, n, increases the risk of the outcome. If βn < 0, then the variable, n, reduces the risk of the outcome.
• A large βn means the variable, n, has a large influence on the outcome; a small βn means the variable has a small influence on the outcome.
• f(z) = likelihood of outcome, such as death = e^z/(e^z + 1) = 1/(1 + e^-z)
• The logistic function is useful because it can input any z from -∞ to +∞, whereas the output, f(z), will be confined to values between zero and 1.
• Note: if z = 0, f(z) = 0.5, because 1/(1 + e^-0) = 1/(1 + 1) = 1/2 = 0.5

Example 4: Logistic regression

• Three independent variables, x1, x2, and x3, are studied to try to predict the 10-year death risk from heart disease: z = β0 + β1x1 + β2x2 + β3x3
• Using data obtained from a large study population, the following logistic regression model was derived to best fit the data:
  – x1 = age in years above 50 (age is a continuous variable); β1 = +2.0
  – x2 = sex, where 0 is male and 1 is female (sex is a categorical variable); β2 = -1.0
  – x3 = blood cholesterol in mmol/L above 5 mmol/L (194 mg/dL); β3 = +1.2
  – β0 = -5.0
• Risk of death = f(z) = 1/(1 + e^-z), where z (the logit) = -5.0 + 2.0x1 - 1.0x2 + 1.2x3
• Thus, in a 50 y.o. male with a cholesterol of 5 mmol/L, z = β0 = -5.0 (see prev Fig)


Example 4: What is the risk of death in the next 10 years from heart disease in a 50-year-old man with a blood cholesterol of 7 mmol/L (272 mg/dL)?

z = -5.0 + 2(50-50) - 1(0) + 1.2(7-5). Thus, z = -5.0 + 0 + 0 + 2.4 = -2.6

Since z = -2.6 in this man, f(z) = his risk of 10-year death from heart disease = 1/(1 + e^-z) = 1/(1 + e^2.6) = 0.07, a 7% 10-yr risk.

The 95% confidence intervals can also easily be calculated for f(z).
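The Example 4 arithmetic can be checked directly; this sketch assumes only the model coefficients given above.

```python
# Sketch: reproducing the Example 4 calculation for the 50-year-old man
# with a cholesterol of 7 mmol/L.
from math import exp

def f(z):
    """Logistic function: maps any z to a probability between 0 and 1."""
    return 1 / (1 + exp(-z))

# z = beta0 + beta1*(age-50) + beta2*sex + beta3*(chol-5)
z = -5.0 + 2.0 * (50 - 50) - 1.0 * 0 + 1.2 * (7 - 5)
print(z)               # -2.6
print(round(f(z), 2))  # 0.07, i.e., a 7% 10-year risk
print(f(0))            # 0.5, the z = 0 case noted earlier
```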


VII. Stepwise multivariate regression

• If several variables, INDIVIDUALLY, help to predict an outcome by univariate analysis, but these variables could be closely related to each other, stepwise multivariate analysis helps sort out the independent contributions of the variables.
  – e.g., blood pressure, BMI, and type 2 DM EACH increase the risk of MI
• This procedure is used primarily in regression modeling. At each step, after a new variable is added, a test is made to see if some variables can be deleted without appreciably increasing the discrepancy between the data and the regression model.
• The procedure terminates when the measure is maximized or when the available improvement (by adding more variables) falls below some critical, predetermined value.

Example 5: Stepwise multivariate regression

• Cohort of ≈300 outpatients with low serum TSH undergoing radioiodine uptake and scan. Many, but not all, had thyroid disease (e.g., Graves'). Numerous variables were examined to see which correlated with a normal uptake and scan result.
• Three of the numerous variables examined predicted a normal uptake and scan:
  – If the patient was using a statin: OR = 6.5 (95% CI, 2.9-14.6)
  – If the patient was a man: OR = 2.5 (95% CI, 1.3-4.5)
  – If the patient was > 45 years of age: OR = 2.0 (95% CI, 1.1-3.6)
• Which of these variables independently predicted a normal thyroid uptake and scan despite the low serum TSH?
  – Is it statin use, being male, and/or being older that predicts normal thyroid function if a patient has a low serum TSH?

Example 5, continued: Stepwise multivariate regression

Step 1: STATIN USE     χ² = 21.8   P < 0.001
Step 2: OLDER AGE      χ² = 8.5    P = 0.004
Step 3: MALE GENDER    χ² = 3.9    Not significant

From Yandell et al. Thyroid 2008; 18:1039-42.

VIII. Type 1 (α) and Type 2 (β) Errors

Null hypothesis: there is no difference between the two treatments/tests.

                             Null hypothesis is true         Null hypothesis is false
Reject the null hypothesis   Type 1 (α) error (P), which     Correct decision (no error)
                             can be large or small
Accept the null hypothesis   Correct decision (no error)     Type 2 (β) error, which can
                                                             be large or small

Choosing the size of α and β errors

• The type 1 error, or α (also called P), is conventionally set at 0.05 (5%)
  – i.e., the chance of a type 1 error if the null hypothesis is rejected is < 5%
  – Can state "p < 0.05" or give the exact p value (e.g., p = 0.001, or p = 0.049)
• The type 2 error, or β, is often set at 2 to 4 times α (10%-20%), or 0.10-0.20
  – i.e., the chance of making a type 2 error if the null hypothesis is accepted is 10-20%
  – Power to detect a real difference (and thus to reject the null hypothesis) = 1 - β
  – Smaller β (e.g., 0.1), more power (0.9); larger β (e.g., 0.2), less power (0.8)
• If a study is highly powered and the null hypothesis is accepted, the chance of there being a true difference is quite small.

If the study is under-powered and the null hypothesis is accepted, there can be little confidence that a true difference has been excluded.

Example 6: Use of α and β in sample size planning

A new antibiotic is developed for C. difficile. How many patients would need to be included in a phase 3 trial to be able to show that this new drug is superior to metronidazole? To answer this question, we need to know:
1. What is the expected success rate for metronidazole? [P1]
2. What would be a clinically important and expected improvement in success rate (based on phase 1/2 studies) with the new drug? [P2]
3. What should be the α (type 1 error) and the β (type 2 error) for the study? (Recall: Power = 1 - β.)

Sample size estimation, cont’d

• P1 = 0.75 (metronidazole, based on literature)
• P2 = 0.90 (new Rx, based on small phase 1/2 trials)
• α = 0.05 (1 in 20)
• β = 0.10 (1 in 10). Power = 0.90 (9 in 10)
• Needed N1 and N2 patients = 158 per group (from Fleiss tables), or 316 in total
• If a ≈10% drop-out rate is expected, then 158 + 16 = 174 per group, or 348 patients in total would need to be randomized.

(This sample size may necessitate a multi-center study to enroll sufficient patients during the proposed time frame.) Analyze data by intent-to-treat and by evaluable patients.
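As a rough cross-check of the tabulated value, the standard normal-approximation formula for comparing two proportions (with a continuity correction) can be sketched as follows. This is not the Fleiss table construction used in the lecture, and it yields a somewhat smaller n than the tabulated 158 per group; the helper name is invented.

```python
# Sketch: per-group sample size for comparing two proportions,
# normal approximation with continuity correction (stdlib only).
from math import ceil, sqrt
from statistics import NormalDist

def per_group_n(p1, p2, alpha=0.05, power=0.90):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha
    z_b = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    d = abs(p2 - p1)
    n = (z_a * sqrt(2 * pbar * (1 - pbar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * d))) ** 2  # continuity correction
    return ceil(n_cc)

# P1 = 0.75, P2 = 0.90, alpha = 0.05, power = 0.90
print(per_group_n(0.75, 0.90))  # ~146 per group by this approximation
```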