Sample Size and Power - Vanderbilt University

Case studies in biostatistics

Bonnie LaFleur, Department of Biostatistics

Outline

    Miscellaneous review of graphics and data collection/display.

Paper 1: Enhanced tumor formation in cyclin D1 x transforming growth factor beta1 double transgenic mice with characterization by magnetic resonance imaging.

Cancer Res. 2004 Feb 15;64(4):1315-22.

Paper 2: Neuroblastomas of infancy exhibit a characteristic ganglioside pattern. Cancer 2001 Feb. 15; 91(4): 785-793.

Paper 3: MeCP2 mutations in children with and without the phenotype of Rett syndrome. Neurology 2001; 56: 1486-1495.

Bar graphs

• Useful for counts or proportions, not for means
• Need to make sure that the standard error, if shown, is the correct standard error for proportions, and decide whether the standard error or the standard deviation is what you want to show.

Example

• Percent of type 1 in each group is 25, 27.4, 73
• What is the standard error?

By definition, the standard error expresses how close to the real value we are likely to get when we use a random sample instead of the whole population.

[Figure: bar chart of percent (0 to 100) by group (Group 1, Group 2, Group 3), with bars for Type 1 and Type 2]

Standard error

$$se(p) = \sqrt{\frac{p(1 - p)}{N}}$$

• We can see that this depends on N
• What does this mean for our example?

Back to our example

• For our example (20.4%), we calculate the standard error for two different sample sizes (N = 4 and N = 20):

$$se(p) = \sqrt{\frac{(0.204)(0.796)}{4}} = 0.201 = 20.1\%$$

$$se(p) = \sqrt{\frac{(0.204)(0.796)}{20}} = 0.090 = 9\%$$

• So, our estimate of the true percentage has lower sampling fluctuation with higher sample sizes

So what does this mean

• The main use of standard errors, in a statistical sense, is to calculate 95% confidence intervals for our estimate: p ± 1.96(se)
    • For n=4: (-19%, 60%)
    • For n=20: (3%, 38%)
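The arithmetic on the last two slides can be reproduced with a few lines of Python. This is a minimal sketch; the function names `proportion_se` and `wald_ci` are made up for illustration and do not come from the slides.

```python
from math import sqrt


def proportion_se(p, n):
    """Standard error of a proportion p estimated from n observations."""
    return sqrt(p * (1 - p) / n)


def wald_ci(p, n, z=1.96):
    """Approximate 95% confidence interval: p +/- z * se."""
    se = proportion_se(p, n)
    return p - z * se, p + z * se


# p = 0.204 is the example proportion; 4 and 20 are the two sample sizes.
for n in (4, 20):
    se = proportion_se(0.204, n)
    low, high = wald_ci(0.204, n)
    print(f"n={n}: se={se:.3f}, 95% CI=({low:.0%}, {high:.0%})")
```

For n=4 the lower limit falls below zero, which is one reason this simple interval is a poor choice for very small samples.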

Why did I show this?

• Bar charts should be used for proportions (or percentages) or counts, not means
• Correct standard error bars need to be shown, if at all (or show the standard deviation instead)
• It is MUCH more important to include sample sizes with bar charts than either standard error or standard deviation, since once p and the sample size are given the standard deviation and/or standard error are easily calculated.

Like this

[Figure: bar chart of NIH Sponsored Human Studies by YEAR (1993, 1995, 1997, 1998), vertical axis from 0.0 to 1.0. Series: Non-Gender Specific; Non-Gender Specific Including Women; Analyzed by Gender. Sample sizes are printed on the chart (N=568, N=574, N=522, N=568).]

Box plots - for continuous data

• The dot here is the median (the mean can also be included as a bar)
• The ends of the “box” are the 1st and 3rd quartiles (the “hinges”); the length of the box is the interquartile range
• The whiskers extend up to 1.5 x the interquartile range beyond the quartiles (and never exceed the data)
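The quantities behind a box plot can be computed directly. The sketch below uses made-up values and a hypothetical helper, `box_plot_summary`. Quartile conventions differ between statistical packages, so the hinges from other software may not match exactly.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the numbers a box plot draws: quartiles, whisker ends, outliers."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the fences,
    # so they never extend beyond the observed data.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical measurements, not taken from any of the papers.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```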

What can sometimes happen

Alternative type of plot

Plots to display multiple events over time

Dot Plots

Data: Things to avoid when creating a dataset to be used in statistical packages

• Character variables must be in the same case and consistent
• Don’t mix characters with data that should be numeric
• Date formats should be consistent
• No summary computations in the middle of the spreadsheet
• Differentiate between missing values and “zeros”, “below detection”, etc.

[Example spreadsheet with columns Date of Blood, CD4, CD4 %, HIV VL, HAART start date, VZV ser, CMV ser, VZV RCF, CMV RCF. The cells mix date formats (1/14/1998, 10-6-98, 03/03/98), text notes inside numeric and date columns (“not CHIP Patient”, “(lab date)”, “no therapy”, “no HAART”), values below or above a limit (<400, <20, <10000, <1, >8), ranges (20-50), thousands separators (26,310), inconsistent codes (pos, pos., neg, neg.), and ND entries.]
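Entries such as `<400`, `>8`, `26,310`, and `ND` from the spreadsheet above can be separated into a numeric value and a status flag before analysis. The sketch below is one possible approach, not a standard routine. The function `parse_lab_value`, the status labels, and the decision to treat `ND` as missing are all assumptions; the meaning of `ND` should be confirmed with whoever recorded the data.

```python
import re

# Assumption: these codes all mean "no value recorded".
MISSING_CODES = {"", "nd", "na", "n/a"}


def parse_lab_value(raw):
    """Split a raw cell into (numeric value or None, status label)."""
    text = raw.strip().lower().replace(",", "")
    if text in MISSING_CODES:
        return None, "missing"
    match = re.fullmatch(r"([<>]?)\s*(\d+(?:\.\d+)?)", text)
    if match is None:
        # Ranges such as "20-50" or free text need a manual decision.
        return None, "unparsed"
    sign, number = match.groups()
    status = {"<": "below_limit", ">": "above_limit", "": "observed"}[sign]
    return float(number), status


for cell in ["<400", ">8", "26,310", "ND", "20-50", "0"]:
    print(cell, "->", parse_lab_value(cell))
```

Keeping the flag in its own column distinguishes a true zero from a value below the detection limit and from a missing value.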

[Example spreadsheet of PT values (columns labeled birth PT, 6 M PT, 7 M PT) for a DTaP Hep B group and a DTaP group, with column means (20.72727, 19.63636, 29.45455; 8.5, 17.57143, 24.35714) and other summary values (0.069793, 0.665387, 0.552718, 0.142732) computed in the middle of the sheet.]

Paper #1

• The basic question was whether cyclin D1/TGF-β1 double transgenic mice are different from cyclin D1 single transgenic mice on a variety of outcomes:
    • Tumor incidence
    • Tumor multiplicity
    • Tumor burden
    • Cellular and molecular changes

For tests regarding histologic/cellular changes

The data were categorical, and there were some zero (and very small) cell counts, so we had to use nonparametric tests (here, Fisher's exact test).

[Data listing: 16 mice (one row per block ID), all with age = 6. Variables: block, age, type, score, cytomegaly, nuclear, DPM, nodularity, lesions. Type codes: 1='Wild Type', 2='Alb-TGFB', 3='LFABP-Cyclin D1', 4='Double Transgenic'.]

Table of type by cytomegaly
Each cell shows Frequency, Percent, Row Pct, Col Pct.

type                       cytomegaly
                         0        1        2    Total
Wild Type                2        2        0        4
                     12.50    12.50     0.00    25.00
                     50.00    50.00     0.00
                     40.00    25.00     0.00
Alb-TGFB                 2        1        1        4
                     12.50     6.25     6.25    25.00
                     50.00    25.00    25.00
                     40.00    12.50    33.33
LFABP-Cyclin D1          1        3        0        4
                      6.25    18.75     0.00    25.00
                     25.00    75.00     0.00
                     20.00    37.50     0.00
Double Transgenic        0        2        2        4
                      0.00    12.50    12.50    25.00
                      0.00    50.00    50.00
                      0.00    25.00    66.67
Total                    5        8        3       16
                     31.25    50.00    18.75   100.00

Statistics for Table of type by cytomegaly

Statistic                      DF     Value     Prob
Chi-Square                      6    6.8667   0.3334
Likelihood Ratio Chi-Square     6    8.8589   0.1817
Mantel-Haenszel Chi-Square      1    3.4839   0.0620
Phi Coefficient                      0.6551
Contingency Coefficient              0.5480
Cramer's V                           0.4632

WARNING: 100% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Table Probability (P)    0.0024
Pr <= P                  0.4390
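To see where the warning and the exact test come from, the sketch below recomputes the expected counts, the chi-square statistic, and the probability of the observed table from the 4 x 3 counts shown above. It then sums the probabilities of all tables with the same margins that are no more likely than the observed one. This is an illustrative pure-Python enumeration with hypothetical helper names, not the algorithm the statistical package uses.

```python
from itertools import product
from math import factorial, prod

# Observed counts: rows are genotypes, columns are cytomegaly scores 0, 1, 2.
observed = [
    [2, 2, 0],  # Wild Type
    [2, 1, 1],  # Alb-TGFB
    [1, 3, 0],  # LFABP-Cyclin D1
    [0, 2, 2],  # Double Transgenic
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)


def table_probability(table):
    """Probability of a table given the fixed row and column totals."""
    numerator = prod(factorial(t) for t in row_totals + col_totals)
    denominator = factorial(n) * prod(factorial(x) for row in table for x in row)
    return numerator / denominator


def row_options(total, n_cols):
    """All ways to split `total` counts across `n_cols` cells."""
    if n_cols == 1:
        return [(total,)]
    return [
        (k,) + rest
        for k in range(total + 1)
        for rest in row_options(total - k, n_cols - 1)
    ]


p_observed = table_probability(observed)
p_exact = 0.0
for table in product(*(row_options(t, len(col_totals)) for t in row_totals)):
    if [sum(col) for col in zip(*table)] == col_totals:
        p = table_probability(table)
        if p <= p_observed * (1 + 1e-9):  # small tolerance for ties
            p_exact += p

print(f"chi-square = {chi_square:.4f}")
print(f"largest expected count = {max(max(row) for row in expected):.2f}")
print(f"table probability = {p_observed:.4f}, exact p-value = {p_exact:.4f}")
```

Every expected count is below 5, which is why the chi-square p-value is flagged as unreliable and the exact test is reported instead.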

Examine tumor volume

• First, we graphically examined the data
• Note that the times are not equal for the two groups (or for any two samples)
• We grouped into time intervals for the analysis
    • 0-20 days
    • 21-40 days
    • 41-60 days
    • > 60 days

Obs      vol  mouse  tumno  group  day
  1   993.65  B119       1      2    1
  2  1017.52  B119       2      2    1
  3   921.01  B119       1      2   11
  4   878.19  B119       2      2   11
  5   131.29  B120       1      2    1
  6   248.32  B120       1      2   22
  7   312.06  B120       1      2   37
  8  1611.53  BH130      1      2    1
  9  1447.34  BH130      1      2   16
 10  1474.50  BH130      2      2   16
 11   685.10  BH130      1      2   35
 12    59.63  F4410      1      2    1
 13   185.63  F4410      1      2   59
 14   102.95  F4410      2      2   59
 15   348.20  F4410      3      2   59
 16    32.50  F4411      1      2    1
 17    52.35  F4411      2      2    1
 18   322.24  F4411      1      2   43
 19   279.02  F4411      2      2   43
 20   108.44  F446       1      2    1
 21    14.80  F446       2      2    1
 22    52.38  F446       1      2   43
 23   363.50  F446       1      2  101
 24   490.00  F446       2      2  101
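The grouping of measurement days into the four time intervals can be sketched as follows, using the `day` column from the listing above. The function name `day_group` and the label strings are illustrative assumptions; the slides do not show how the grouping variable was actually coded.

```python
from collections import Counter


def day_group(day):
    """Assign a measurement day to one of the four analysis intervals."""
    if day <= 20:
        return "0-20 days"
    if day <= 40:
        return "21-40 days"
    if day <= 60:
        return "41-60 days"
    return "> 60 days"


# The `day` values of the 24 observations in the listing, in order.
days = [1, 1, 11, 11, 1, 22, 37, 1, 16, 16, 35, 1,
        59, 59, 59, 1, 1, 43, 43, 1, 1, 43, 101, 101]

counts = Counter(day_group(d) for d in days)
print(counts)
```

In this listing the later intervals contain only a few measurements, which limits what can be said about differences over time.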

Statistics

• We then used an analysis that accounts for repeated measures on a single mouse, and looked at the difference over time

Type 3 Tests of Fixed Effects

Effect         Num DF   Den DF   F Value   Pr > F
daygp               3       12      0.42   0.7422
group               1        7      1.36   0.2812
group*daygp         3       39      0.96   0.4233

Findings (based on the specific analyses I show here)

• There is no statistically significant association between genotype and cytomegaly (though if we use a sum of all the tumor histopathology variable scores we do see a difference between the double transgenic group and all the other genotypes).
• There was no statistically significant difference in tumor volume between the double transgenic and the Cyclin D1 genotype.

Paper 2, ganglioside pattern

• In typical embryonic development, ganglioside expression shifts from the fetal b pathway to the adult a pathway.
• Neuroblastomas in infants are different (biologically and clinically) from those found in older children
• The main question is whether the ganglioside pathway is different between these two types of neuroblastomas.

Data

• 68 confirmed neuroblastoma samples that were either diagnosed by urinary HVA and VMA at 3 weeks or 6 months of age (n=25), or presented clinically during the study period (n=43).
• Information was collected on age at sample, time until disease progression, stage, and some other clinical information that was not discussed in this paper.

First, let's look at the plot of the % of b pathway gangliosides

Why nonparametric?

• We probably could have used a t test (comparing two normal means) or analysis of variance (comparing more than two normal means)
• But there was some indication that the distributions of these % b gangliosides were non-normal, so we decided to use the Wilcoxon rank-sum test.
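The rank-sum test replaces the observed values with their ranks in the combined sample and compares the sum of ranks in one group with what would be expected by chance. The sketch below uses made-up percentages and a normal approximation without tie or continuity corrections, so it illustrates the idea and does not reproduce the paper's analysis. The function and variable names are hypothetical.

```python
from math import erfc, sqrt


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Rank sum of x, its z-score, and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return w, z, erfc(abs(z) / sqrt(2))


# Hypothetical % b pathway values for two groups (not data from the paper).
screened = [20, 35, 41, 48, 55]
clinical = [52, 60, 66, 71, 80, 85]
w, z, p = rank_sum_test(screened, clinical)
print(f"W = {w}, z = {z:.3f}, p = {p:.4f}")
```

With samples this small an exact version of the test would normally be preferred over the normal approximation.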

Event free survival

• Survival analysis is used to compare “time-to-event” between groups. In this case we are looking at time until some clinical adverse event.
• We need to use specialized statistical tests because we have “censoring” in the data. Censoring is when you have incomplete data due to loss to follow-up or no event up until the end of study.
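A Kaplan-Meier estimate shows how censored observations are handled. They count as "at risk" until the time they are censored, and they never count as events. The sketch below uses made-up follow-up times and a hypothetical `kaplan_meier` helper, not the study data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is True if the event was observed at times[i],
    False if the observation was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Handle everyone whose follow-up ends at time t together.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up in months; False marks a censored patient.
times = [5, 8, 12, 12, 20, 30]
events = [True, False, True, True, False, True]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2} months: S(t) = {s:.3f}")
```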

[Figure: event-free survival curves for the > 60% B Predominance and < 60% B Predominance groups; horizontal axis is Time (months), from 0 to 120]

Results

Test of Equality over Strata

Test        Chi-Square   DF   Pr > Chi-Square
Log-Rank        7.4102    1            0.0065
Wilcoxon        6.1856    1            0.0129

Findings

• The distribution of % b pathway ganglioside production is different in children ≥ 1 year of age who present clinically compared with the group that is screened (3 weeks or 6 months of age)
• This fits their paradigm that neuroblastomas in older children are different from those in younger children

Findings (continued)

• There is a difference in the event free survival distributions between those patients with ≥ 60% b pathway gangliosides and those with < 60% b pathway gangliosides.
• The group with ≥ 60% b pathway gangliosides had longer event free survival.

Paper 3: MeCP2 mutations in Rett syndrome

• This study wanted to examine the association between MeCP2 gene mutations and Rett syndrome (a neurodevelopmental disorder)
• More specifically, whether a particular pattern of mutation, X-inactivation, along with clinical features differ among mutation types

Type of mutation by clinical severity

• Here we are looking at 5 mutations
    • MBD nonsense
    • Nonsense between MBD and TRD
    • TRD nonsense
    • TRD missense
    • C-terminal deletions
• And scores of 5 clinical parameters (head growth, seizures, scoliosis and motor skills/ability to walk)
• The scores were all measured on an ordinal scale

REDUCED GENOTYPE and PHENOTYPE DATA

Obs  MUTATION  HV  MOTOR  SIEZURE  SCOL  HCIRC   TOT   AGE  SUBJECT
  1         1   3      3      2.5     0    3.0  11.5   6.0        1
  2         1   1      2      0.0     1    3.0   7.0  10.0        2
  3         1   0      3      2.0     1    3.0   9.0  18.0        3
  4         1   1      3      0.0     0    3.0   7.0   3.0        4
  5         1   2      1      2.0     2    3.0  10.0  19.0        5
  6         1   0      1      0.0     0    3.0   4.0   6.0        6
  7         1   2      3      0.0     0    3.0   8.0   2.0        7
  8         1   0      2      0.0     0    3.0   5.0   6.0        8
  9         1   3      3      1.0     3    3.0  13.0   8.0        9
 10         1   3      0      0.0     0    3.0   6.0   9.0       10
 11         1   2      2      3.0     0    2.0   9.0   8.0       11
 12         1   0      3      1.0     1    3.0   8.0   6.0       12
 13         1   0      3      0.0     3    1.0   7.0  21.0       13
 14         1   3      1      0.0     0    3.0   7.0   6.0       14
 15         1   0      2      0.0     2    2.0   6.0  13.0       15
 16         1   0      3      2.5     0    3.0   8.5   4.0       16
 17         1   2      3      1.0     3    3.0  12.0   9.0       17
 18         1   2      2      1.0     2    3.0  10.0   9.0       18
 19         2   0      3      1.0     2    3.0   9.0  34.0       19
 20         2   2      2      1.0     1    3.0   9.0   6.0       20
 21         2   2      3      2.0     1    3.0  11.0   8.0       21
 22         2   3      3      1.0     0    3.0  10.0   4.0       22
 23         2   3      3      1.0     2    3.0  12.0   4.0       23
 24         3   1      0      0.0     0    1.5   2.5   5.0       24

How we analyzed these data

• Although the severity scores were ordinal, we treated them as continuous (and normally distributed)
• We used ANOVA and looked at differences between the mean scores for each of the mutation groups

Analysis Of Parameter Estimates

                            Standard   Wald 95% Confidence      Chi-
Parameter     DF  Estimate     Error         Limits           Square   Pr > ChiSq
Intercept      1    1.4000    0.3298    0.7536    2.0464       18.02       <.0001
MUTATION 1     1    1.3778    0.3728    0.6470    2.1085       13.66       0.0002
MUTATION 2     1    1.6000    0.4664    0.6858    2.5142       11.77       0.0006
MUTATION 3     1    0.5286    0.4318   -0.3178    1.3750        1.50       0.2210
MUTATION 4     1    0.2250    0.4947   -0.7447    1.1947        0.21       0.6493
MUTATION 5     0    0.0000    0.0000    0.0000    0.0000         .            .
Scale          1    0.7375    0.0835    0.5907    0.9208

NOTE: The scale parameter was estimated by maximum likelihood.

LR Statistics For Type 3 Analysis

Source      DF   Chi-Square   Pr > ChiSq
MUTATION     4        18.94       0.0008

Contrast Results

Contrast   DF   Chi-Square   Pr > ChiSq   Type
1 vs 2      1         0.35       0.5520   LR
1 vs 3      1         6.17       0.0130   LR
1 vs 4      1         7.27       0.0070   LR
1 vs 5      1        11.71       0.0006   LR
2 vs 3      1         5.72       0.0168   LR
2 vs 4      1         7.05       0.0079   LR
2 vs 5      1        10.28       0.0013   LR
3 vs 4      1         0.43       0.5125   LR
3 vs 5      1         1.47       0.2253   LR
4 vs 5      1         0.21       0.6497   LR
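In the parameter estimates above, MUTATION 5 is the reference group, so the intercept is its mean and each other estimate is a difference from that mean. The sketch below converts the estimates into group means. The slide does not say which outcome was modeled. The comparison with the HCIRC column of the reduced data listing is an observation about the numbers shown here, not something stated by the author.

```python
from statistics import mean

# Estimates copied from the "Analysis Of Parameter Estimates" output.
intercept = 1.4000
estimates = {1: 1.3778, 2: 1.6000, 3: 0.5286, 4: 0.2250, 5: 0.0000}

# Reference-cell coding: group mean = intercept + group estimate.
group_means = {group: intercept + est for group, est in estimates.items()}

# HCIRC values for mutation groups 1 and 2, read from the reduced listing.
hcirc_listing = {
    1: [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 1, 3, 2, 3, 3, 3],
    2: [3, 3, 3, 3, 3],
}

for group, model_mean in group_means.items():
    line = f"MUTATION {group}: model mean = {model_mean:.4f}"
    if group in hcirc_listing:
        line += f", listing HCIRC mean = {mean(hcirc_listing[group]):.4f}"
    print(line)
```

For groups 1 and 2, the model means agree with the HCIRC means in the listing, which suggests (but does not prove) that this output is for the head circumference score.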

Analysis of covariance

• Is a combination of analysis of variance and regression
• The main aim is to see if the regression lines in two or more groups are different
• In this study, we wanted to see if two of the mutations differed in their regression of clinical severity on X-inactivation (% of one allele active); this can be stated as the covariance of mutations on the regression of clinical severity on X-inactivation.

Main questions for analysis of covariance

• Is the straight line relationship between clinical severity score and X-inactivation the same for the two mutation groups (missense in MBD and nonsense between MBD and TRD versus TRD missense and nonsense and C-terminal deletions)?
• Do the clinical severity scores for the two mutation groups differ after adjusting for X-inactivation pattern?
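One common way to express these two questions is through a single regression model with a group-by-covariate interaction. The notation below is an assumption introduced for illustration and is not taken from the paper: $Y_i$ is the total clinical severity score, $X_i$ is the % X-inactivation, and $G_i$ is 0 or 1 for the two mutation groups.

$$Y_i = \beta_0 + \beta_1 G_i + \beta_2 X_i + \beta_3 G_i X_i + \varepsilon_i$$

In this form, the first question asks whether $\beta_3 = 0$ (the two groups share the same slope). If the slopes are equal, the second question asks whether $\beta_1 = 0$, since $\beta_1$ is then the difference in severity between the groups at any fixed level of X-inactivation.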

[Figure: Total Score by % X-Inactivation, with separate regression lines for Group 1 and Group 2. P-value for intercepts < 0.0001; P-value for slope = 0.006.]

What we found

• There was a statistically significant difference among many of the mutations with respect to head circumference data, as well as with respect to a summary of all clinical features
• There was a statistically significant difference in clinical severity score between the two mutation groups, as well as a difference in slopes between severity score and X-inactivation between the two mutation groups
• Both of these findings confirmed, and described, MeCP2 mutations as causative in Rett syndrome

Thank you for your time

• Suggested readings
    • Creating More Effective Graphs by Naomi B. Robbins
    • Statistical Analysis and Data Display by Heiberger and Holland
    • Introduction to Biostatistics by Bernard Rosner