
MBP1010 - Lecture 2: January 14, 2009

1. Density curves and standard normal distribution
2. Sampling distribution of the mean
4. Confidence Interval for the mean
5. Hypothesis testing (1 sample t test)

Reading: Introduction to the Practice of Statistics: 1.3, 3.4, 5.2, 6.1-6.4 and 7.1

Standard deviation vs standard error for describing data

Table 1. Characteristics of study subjects (n=35)

Variable                      Mean    Standard Deviation   Standard Error
Age (yrs)                     43.5    4.78                 0.81
Height (cm)                   165.8   5.66                 0.97
Weight (kg)                   64.3    8.61                 1.46
Blood Cholesterol (mmol/l)    5.00    0.94                 0.16

Importance of Normal Distribution*

1. Distributions of real data are often close to normal.

2. Mathematically easy to work with, so many statistical tests are designed for normal (or close to normal) distributions.

3. If the mean and SD of a normal distribution are known, you can make quantitative predictions about the population.

* also called Gaussian curve

Red bars = scores ≤ 6. Proportion = 0.303

Red area under the density curve = scores ≤ 6. Proportion = 0.293

Cumulative proportion for value x is the proportion of all observations that are ≤ x; this is the area under the curve to the left of x.

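“The 68-95-99.7 Rule”

In a normal distribution, about 68% of observations fall within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD.

Mean = 64.5 inches, SD = 2.5 inches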
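As a rough check of the rule, the sketch below uses Python's standard-library `statistics.NormalDist` with the mean (64.5 inches) and SD (2.5 inches) quoted on the slide. The helper name `proportion_within` is an illustrative choice, not something from the lecture.

```python
from statistics import NormalDist

# Normal distribution quoted on the slide: mean 64.5 inches, SD 2.5 inches.
dist = NormalDist(mu=64.5, sigma=2.5)

def proportion_within(k):
    """Proportion of a normal population lying within k SDs of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    print(f"within {k} SD ({low} to {high} inches): {proportion_within(k):.4f}")
```

The three proportions come out close to 0.68, 0.95 and 0.997, which is where the rule gets its name.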

The standard normal distribution is a normal distribution with a mean of 0 and a SD of 1. Normal distributions can be transformed to standard normal distributions by the formula:

$z = \frac{X - \mu}{\sigma}$

where X is a score from the original normal distribution, μ is the mean of the original normal distribution, and σ is the standard deviation of the original normal distribution. The standard normal distribution is sometimes called the z distribution.

Standardized Normal Distribution

Z-score

A z score always reflects the number of standard deviations above or below the mean a particular score is. Ex. If a person scored 70 on a test with a mean of 50 and SD of 10, then they scored 2 standard deviations above the mean. Converting the test scores to z scores, an X of 70 would be:

$z = \frac{70 - 50}{10} = 2$

So, a z score of 2 means the original score was 2 SD above the mean.

Z Scores
- Provide a meaningful way to compare individuals from different normal distributions on the same scale, i.e. how many SD above or below the mean?

E.g.
- bone density measures
- growth charts: height of children at different ages
“normalized” data
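To illustrate putting scores from different distributions on one scale, here is a minimal Python sketch. The first call uses the test-score example from the slides (X = 70, mean = 50, SD = 10); the second measurement and its mean and SD are invented purely for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Test score from the lecture example: X = 70, mean = 50, SD = 10.
z_test = z_score(70, 50, 10)

# Hypothetical second measurement on a different scale (values invented here).
z_other = z_score(120, 100, 15)

print(f"z_test = {z_test:.2f}, z_other = {z_other:.2f}")
```

Although 120 is the larger raw number, its z score is smaller, so the first score is further above its own mean.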

Quantile-Quantile (Q-Q) Plot QQ-plot shows the theoretical quantiles versus the empirical quantiles. If the distribution is “normal”, we should observe a straight line.

Rice Virtual Lab in Statistics http://onlinestatbook.com/rvls/ Hyperstat Online Section 5. Normal Distribution - theory

Sampling and Estimation

Populations and Samples

Population: entire group of individuals that we want information about
Sample: a part of the population that we actually examine in order to gather information
Goal: to try to draw conclusions about the population from the sample

Whole Population: Mean = μ, SD = σ
Sample: Mean = x̄, SD = s
Inference: from the sample to the whole population

Parameter:
- a number that describes the population
- number is fixed but in practice we do not know its value (e.g. μ)

Statistic:
- a number that describes a sample (e.g. x̄)
- its value is known when we take a sample, but it can change from sample to sample
- often used to estimate an unknown parameter

Statistical inference is the process by which we draw conclusions about the population from the results observed in a sample.

Two main methods used in inferential statistics: estimation and hypothesis testing. In estimation, the sample is used to estimate a parameter and a confidence interval about the estimate is constructed.

Random Sampling is Key!

- every individual in the population sampled must have a chance of being included in the sample
- the choice of one subject does not influence the chance of other subjects being chosen
- use a method of sampling in which chance alone operates
  - toss of a coin, draw from a hat
  - random number generators
- random assignment in clinical trials results in randomly selected groups

Simple Random Sampling (SRS)
- the chance of being selected is equal for each individual in the population
- every possible sample has an equal chance to be chosen

Stratified Sampling
- divide the population into strata
- choose an SRS in each stratum
- combine these SRSs to form the full sample
e.g. Strata: prognostic factors in cancer patients; male/female, age
- consult a statistician for more complex sampling
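The two sampling schemes can be sketched in a few lines of Python. The population below (100 subjects, 60 female and 40 male), the sample sizes, and the function name `stratified_sample` are all hypothetical choices made for this illustration.

```python
import random

rng = random.Random(2009)  # fixed seed so the draw is repeatable

# Hypothetical population of 100 subjects: 60 female, 40 male.
population = [{"id": i, "sex": "F" if i < 60 else "M"} for i in range(100)]

# Simple random sample: every subject has the same chance of selection.
srs = rng.sample(population, 10)

def stratified_sample(units, key, per_stratum):
    """Take an SRS of size per_stratum within each stratum, then combine."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, per_stratum))
    return combined

stratified = stratified_sample(population, "sex", 5)
print(sorted(u["id"] for u in srs))
print(sorted(u["id"] for u in stratified))
```

The SRS may by chance contain few subjects from the smaller stratum, whereas the stratified sample always contains the requested number from each.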

Sample mean (x̄) as an estimator of the population mean (μ)

What would happen if we repeated the sample several times?

Sampling variability:
- repeated samples from the same population will not have the same mean
- depends partly on how variable the underlying population is and on the size of the sample selected

Sampling Distribution of x̄
- the distribution of values taken by the mean (x̄) in all possible samples of the same size from the same population

1. Mean of sampling distribution of x̄ = μ
2. SD of sampling distribution = σ/√n, called the standard error of the mean
3. Shape of the sampling distribution is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough (Central Limit Theorem)
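The three properties above can be checked by simulation. The Python sketch below assumes a hypothetical right-skewed (exponential) population and a sample size of 30; neither comes from the lecture. Samples are drawn with replacement so that σ/√n applies without a finite-population correction.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the simulation is repeatable

# Hypothetical right-skewed population (exponential, mean about 1).
population = [rng.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Draw many samples of size n (with replacement) and record each sample mean.
n = 30
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(5_000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sem = sigma / n ** 0.5

print(f"population mean = {mu:.3f}, mean of sample means = {mean_of_means:.3f}")
print(f"sigma/sqrt(n) = {theoretical_sem:.3f}, SD of sample means = {sd_of_means:.3f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.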

Simulation of Sampling Distribution
Central Limit Theorem
Rice Virtual Lab in Statistics http://onlinestatbook.com/rvls/

Population: All MBP1010 students (n=37)
μ = 1.00 cup, σ = 1.07 cups

One Randomly Selected Sample (n=12)
x̄ = 0.875, s = 0.78
SEM = s/√n = 0.23

Sampling Distribution (1000 repeats of n=12)
Mean = 1.00, SD = 0.26 (SEM)

Confidence Interval of the Mean

Standard Normal Distribution

95% Confidence Interval
- central area = 0.95, between z = -1.96 (2.5th percentile) and z = 1.96 (97.5th percentile)
- area = 0.025 in each tail

95% Confidence Interval for a population mean
If population σ known (not realistic)

$\Pr(-1.96 \le z \le 1.96) = 0.95$

Express x̄ in standardized form (z statistic):

$\Pr\left(-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$

$\Pr\left(\bar{x} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{x} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$

x̄ - 1.96(σ/√n) and x̄ + 1.96(σ/√n) are the 95 percent confidence limits on the population mean μ.

24 out of 25 samples included μ (96%).
In the long run, 95% of all samples will have an interval that includes μ.
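The long-run statement can be illustrated with a small simulation. This Python sketch assumes a hypothetical normal population (μ = 50, σ = 10) and samples of size 25. These values are invented for the example, and σ is treated as known so that the 1.96 multiplier applies.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the simulation is repeatable

mu, sigma, n = 50.0, 10.0, 25  # hypothetical population values
half_width = 1.96 * sigma / n ** 0.5

def interval_covers_mu():
    """Draw one sample and report whether its 95% CI contains mu."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    return x_bar - half_width <= mu <= x_bar + half_width

repeats = 10_000
coverage = sum(interval_covers_mu() for _ in range(repeats)) / repeats
print(f"proportion of intervals containing mu: {coverage:.3f}")
```

With only 25 repeats the observed proportion can easily be 24/25 or 23/25; with many repeats it settles near 0.95.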

90% Confidence Interval
- central area = 0.90, between z = -1.645 (5th percentile) and z = 1.645 (95th percentile)
- area = 0.05 in each tail
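The multipliers 1.645 and 1.96 are percentiles of the standard normal distribution. As a sketch, they can be recovered with `statistics.NormalDist().inv_cdf` from Python's standard library; the function name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z value leaving (1 - confidence)/2 in each tail of the standard normal."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95):
    print(f"{level:.0%} CI: z = {z_critical(level):.3f}")
```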

Confidence Interval for a population mean
population σ NOT known (usual)
- use sample standard deviation (s) as an estimate of σ
- therefore, σ/√n is estimated from the sample using s/√n (standard error of the mean; SE)
- SE of the sample is the estimate of the SD that would be obtained from the means of a large number of samples drawn from that population

Problem: the critical ratio

$\frac{\bar{x} - \mu}{s/\sqrt{n}}$

is not normally distributed
- need to consider the reliability of both x̄ and s as estimators of μ and σ respectively
- shape of the distribution depends on the sample size n

Therefore $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the t distribution.

t-distribution
- a family of distributions indexed by the degrees of freedom (n-1)
- degrees of freedom refer to the number of independent quantities among a series of numerical quantities

Degrees of Freedom

For SD:
- there are n deviations around the mean
- there is one restriction: sum of deviations = 0
- therefore once we have calculated n-1 deviations around the mean, the last number would be already determined as the sum must be 0 (i.e. not independent)
- for n deviations around the mean there are n-1 degrees of freedom (DF)
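A small numeric illustration may help here. The Python sketch below uses four made-up observations to show that the deviations sum to zero, so the last deviation can be recovered from the other n-1.

```python
# Hypothetical data, chosen only to make the arithmetic easy to follow.
data = [3.0, 5.0, 8.0, 12.0]
n = len(data)
mean = sum(data) / n

# Deviations around the mean: they must sum to zero.
deviations = [x - mean for x in data]

# Given the first n-1 deviations, the last one is already determined.
last_from_others = -sum(deviations[:-1])

# The sample variance therefore divides by n-1 rather than n.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

print(deviations, last_from_others, sample_variance)
```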

95% Confidence Interval for a population mean
population σ NOT known (usual)

A sample consists of 25 mice with a mean tumor size of 2.1 cm and SD = 1.9 cm.

$\bar{x} - t_{24,0.975} \times s/\sqrt{n}, \quad \bar{x} + t_{24,0.975} \times s/\sqrt{n}$

$t_{24,0.975}$ = 2.064 (from tables of t dist)

2.1 - (2.064 × 1.9/√25), 2.1 + (2.064 × 1.9/√25)

= 1.32, 2.88 cm
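The arithmetic for the mouse example can be reproduced with a short Python sketch. The critical value 2.064 is taken directly from the slide (t tables, 24 degrees of freedom) rather than computed; the variable names are illustrative.

```python
import math

n, x_bar, s = 25, 2.1, 1.9   # sample size, mean (cm) and SD (cm) from the slide
t_crit = 2.064               # t(24, 0.975), read from t tables as on the slide

se = s / math.sqrt(n)        # standard error of the mean
lower = x_bar - t_crit * se
upper = x_bar + t_crit * se

print(f"95% CI: {lower:.2f}, {upper:.2f} cm")
```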

Confidence interval for a Mean

Estimate of mean tumor size = 2.1 cm; n = 25.

95% CI = 1.32, 2.88 cm

Interpretation:
- 95% of the intervals that could be constructed from repeated random samples of size 25 contain the true population mean μ
- we are 95% confident that the mean tumor size is between 1.32 and 2.88 cm.

Factors affecting the length of the confidence interval

$\bar{x} \pm t_{n-1,0.975} \times s/\sqrt{n}$, where s/√n = SE

- Sample size: as n increases, length of the CI decreases
- Variation: as s, which reflects variability of the distribution of observations, increases, the length of the CI increases
- Level of confidence: as the confidence desired increases (i.e. 90, 95, 99% CI), the length of the CI increases

Standard deviation vs standard error for describing data

Table 1. Characteristics of study subjects (n=35)

Variable                      Mean    Standard Deviation   Standard Error
Age (yrs)                     43.5    4.78                 0.81
Height (cm)                   165.8   5.66                 0.97
Weight (kg)                   64.3    8.61                 1.46
Blood Cholesterol (mmol/l)    5.00    0.94                 0.16

Standard deviation vs standard error for describing data

If the purpose is to describe the data (e.g. to see if subjects are typical):
- standard deviation: variability of the observations

If the purpose is to describe the results (outcome) of the study:
- standard error, confidence interval: precision of the estimate of a population parameter

Note:
- can calculate one from the other
- indicate clearly whether reporting SD or SE
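As a worked illustration of calculating one from the other, the relation SE = SD/√n can be applied to two rows of Table 1 (n = 35). The values below are rounded to two decimals, as in the table.

```latex
\mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{n}}, \qquad \mathrm{SD} = \mathrm{SE} \times \sqrt{n}

\text{Age: } \frac{4.78}{\sqrt{35}} \approx 0.81
\qquad
\text{Blood cholesterol: } \frac{0.94}{\sqrt{35}} \approx 0.16
```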

What Formal Statistical Inference Cannot Do
- tell you what population you should be interested in
- ensure that you sampled properly from the population
- determine whether measurements made are biased (systematically wrong)

DOES:
- give a quantitative indication of how much random variation may have affected your results

What/who are we trying to study?

Target Population    Patients with rheumatoid arthritis           All voters
Population Sampled   Patients admitted to a particular hospital   Telephone listings
Sample Studied       Sample of records of above patients          Sample of above listings

Hypothesis Testing

[Figure: box plots of dietary fat intake in the low fat and control groups. Vertical axis from 5 to 45; GROUP 1 = Low Fat, 2 = Control.]

[Figure: box plots of blood HDL-cholesterol levels in the low fat and control groups (n=163 intervention and 199 control). Vertical axis from 0.8 to 2.6.]

mean = 1684 kcal/day SD = 380.5 kcal/day

Examples of conclusions of hypothesis tests

The mean intake of dietary fat is significantly lower in the low-fat group as compared to the control group (17.5 vs 28.3 percent energy from fat; p ≤ 0.001). (2 sample t test)

Does the energy intake of women in a sample differ from the “recommended” level of 1850 kcal? (1 sample t test)

Hypotheses
- hypotheses stated in terms of the population parameters (true means)
- null hypothesis: Ho
  - statement of no effect or no difference
  - assess the strength of evidence against the null hypothesis
- alternative hypothesis: Ha
  - what we expect/hope to see
  - usually a 2 sided test

Control vs Intervention

Ho: μc = μT

x̄c vs x̄T

Overview of hypothesis testing Compute the probability of obtaining a difference as large or larger than the observed difference assuming that, in fact, there is no difference in the true means.

If the probability is not very small, we conclude that observing such a difference is plausible even when the true means are equal, i.e. the data do not provide evidence that the true means are different.

If the probability is very small, we conclude there is a difference between the means.

A significance test answers the question: Is chance or sampling variation a likely explanation of the discrepancy between a sample result and the null hypothesis population value?

Yes: sample result is compatible with the idea that the sample is from a population in which the null hypothesis is true

No: discrepancy is unlikely to be due to chance variation; sample result is not compatible with the idea that the sample is from a population in which the null hypothesis is true

Steps in Hypothesis Testing

1. State hypotheses.

2. Specify the significance level.

3. Calculate the test statistic.

4. Determine p value.

5. State conclusion.

One Sample T test

One Sample T test: Energy intake in women

For a sample of 29 randomly selected women:
Mean energy intake = 1,684 kcal/day
Standard deviation (s) = 380.5 kcal/day

Does the energy intake of women in this study differ from the “recommended” level of 1850 kcal?

Example of energy intakes

1. State hypotheses:

Ho: the true mean energy intake of women in the trial is not different from 1,850 kcal/day
Ha: the true mean energy intake of women in the trial is different from 1,850 kcal/day

Specific Notation:
Ho: μ = 1,850
Ha: μ ≠ 1,850 (2 sided)

2. Significance Level
- how much evidence against Ho we require in order to reject Ho (determine in advance)
- compare the p value with a fixed value that is considered decisive
- this value is called the significance level
- denoted as α
- commonly use α = 0.05

Significance Level

α = 0.05
- require that the data give evidence against Ho so strong that it would happen not more than 5% of the time (1 in 20), when Ho is true.

α = 0.01
- require that the data give evidence against Ho so strong that it would happen not more than 1% of the time (1 in 100), when Ho is true.

3. Calculate the test statistic
- test statistic measures compatibility between the null hypothesis and the data
- to assess how far the estimate is from the parameter: standardize the estimate
  - z statistic (when σ known)
  - t statistic (when σ not known)

One Sample t test
- use the t distribution when the population standard deviation (σ) is not known

To test the hypothesis Ho: μ = μo based on an SRS of size n, compute the t statistic:

$t = \frac{\bar{x} - \mu_o}{s/\sqrt{n}}$

degrees of freedom = n-1

Step 3. Calculate test statistic.

Based on sample of 29 women: x̄ = 1684 kcal/day; standard deviation (s) = 380.5 kcal/day

$t = \frac{\bar{x} - \mu_o}{s/\sqrt{n}} = \frac{1684 - 1850}{380.5/\sqrt{29}} = -2.35$

Determine the p value
- probability of getting an outcome as extreme or more extreme than the actually observed outcome
- extreme: far from what would be expected if the null hypothesis is true
- the smaller the p value, the stronger the evidence against the null hypothesis

Energy Intake in Women

$t = \frac{1684 - 1850}{380.5/\sqrt{29}} = -2.35$

2 sided test: P(t ≤ -2.35 or t ≥ 2.35)

P(t ≤ -2.35) = 0.0130

P(t ≥ 2.35) = 1 - 0.9870 = 0.0130

P value = 2 P(t ≤ -2.35) = 0.026

Step 4. Determine p value.

[Figure: t distribution with both tails shaded. p = 0.0130 below t = -2.35 and p = 0.0130 above t = 2.35; 2 sided p = 0.026]
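Steps 3 and 4 for the energy intake example can be sketched in Python using only the standard library. This is not how the lecture obtains the p value (the slides use t tables and R). Here the tail area is approximated by integrating the t density with Simpson's rule, and the function names `t_pdf` and `two_sided_p` are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """2 * P(T >= |t|); the area between 0 and |t| is found by Simpson's rule."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    return 2 * (0.5 - central_half)

# Values from the slides: 29 women, mean 1684, SD 380.5, null value 1850.
n, x_bar, s, mu_0 = 29, 1684, 380.5, 1850
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = two_sided_p(t_stat, n - 1)

print(f"t = {t_stat:.2f}, df = {n - 1}, two-sided p = {p_value:.3f}")
```

The result should agree with the slide values (t = -2.35, 2 sided p = 0.026) to the precision shown.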

What does a “small” p value mean?

1. An unlikely event occurred (getting a large value for the test statistic by chance).

2. The null hypothesis is false.

P value for a 2 sided test: Probability of getting an outcome as extreme or more extreme than the actually observed outcome in either direction, if the null hypothesis is true.

Statistical Significance

In the example: p value = 0.026

2.6% chance of observing a sample mean at least as far from 1850 kcal/day as 1684 kcal/day, in either direction, if the true mean is not different from the recommended level of 1850 kcal/day.

What do we conclude?

Statistical Significance

p value = 0.026

We reject the null hypothesis, Ho.

The mean energy intake of women is significantly lower than the recommended intake (p < 0.05).

The mean energy intake of women is significantly lower than the recommended intake (p = 0.03).

(Significant at the 5% but not the 1% level)

Using R: One Sample t-test

R code:

    t.test(energy.intake, mu=1850)

R Output:

        One Sample t-test

    data:  energy.intake
    t = -2.3493, df = 28, p-value = 0.02610
    alternative hypothesis: true mean is not equal to 1850
    95 percent confidence interval:
     1539.260 1828.741
    sample estimates:
    mean of x
     1684.001

Statistical Significance

If the recommended level is 1750 kcal/day, then p = 0.36.

36% chance of observing a sample mean at least as far from 1750 kcal/day as 1684 kcal/day, in either direction, if the true mean is not different from the recommended level of 1750 kcal/day.

What do we conclude?

Statistical Significance

p value = 0.36

We do not reject the null hypothesis, Ho.

The data do not provide evidence that the mean energy intake of women is different from the recommended level.

The mean energy intake of women in the study is not significantly different from the recommended level of 1750 kcal/day (p = 0.36).

One sided test

Ho: μ = 1850
Ha: μ < 1850
p = 0.0130

Probability values for one-tailed tests are one half the value for two-tailed tests as long as the effect is in the specified direction.

One-sided vs two-sided tests
- one sided tests are rarely justified
- decide on the appropriate test prior to the experiment
- do not decide on a one-sided test after looking at the data
  e.g. p value for 2 sided is 0.09; p value for 1 sided is 0.045

If any doubt: choose 2 sided test!

General guidelines for stating significance

If:                   results are:
0.01 ≤ p < 0.05       significant
0.001 ≤ p < 0.01      highly significant
p < 0.001             very highly significant
p > 0.05              not statistically significant (NS)
0.05 ≤ p < 0.10       trend towards statistical significance

Reporting actual p values

a. p value = 0.0512
Conclude: result is NS, p > 0.05
If the effect is interesting and potentially important would probably want to:
- repeat study
- check power of study

b. p value = 0.75
Conclude: result is NS, p > 0.05
- likely no effect

Comments/Cautions about hypothesis testing

Statistical vs clinical significance
- look at the size of the effect, not just the p value
- look at the confidence interval for the parameter of interest
- with a large sample size, a very small effect may be statistically significant

Exploratory data analysis vs hypothesis testing
- exploratory data analysis is important
- but cannot test a hypothesis on the same data that first suggested it
- if reporting such findings, clearly state that they are post hoc
- need to design a new study to test the hypothesis

Relationship between confidence interval and p value

95% Confidence interval for a population mean

A sample consists of 25 mice with a mean tumor size of 2.1 cm and SD = 1.9 cm.

$\bar{x} - t_{24,0.975} \times s/\sqrt{n}, \quad \bar{x} + t_{24,0.975} \times s/\sqrt{n}$

$t_{24,0.975}$ = 2.064 (from tables of t dist)

2.1 - (2.064 × 1.9/√25), 2.1 + (2.064 × 1.9/√25)

= 1.32, 2.88 cm

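CI and Hypothesis Test

95% CI for mean tumor size = 1.32, 2.88 cm

Ho: μ = 2.9
Ha: μ ≠ 2.9

x̄ = 2.1 cm
s = 1.9 cm

$t = \frac{\bar{x} - \mu_o}{s/\sqrt{n}} = \frac{2.1 - 2.9}{1.9/\sqrt{25}} = -2.105$

p = 0.0459

The null value 2.9 lies just outside the 95% CI, and correspondingly the 2 sided p value is just below 0.05.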
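The link between the interval and the test can be checked numerically. This Python sketch reuses the mouse data and the tabled critical value 2.064 from the slides. It shows that the null value 2.9 falls outside the 95% CI exactly when |t| exceeds the critical value. Variable names are illustrative.

```python
import math

n, x_bar, s = 25, 2.1, 1.9   # mouse tumor data from the slide
mu_0 = 2.9                   # null hypothesis value
t_crit = 2.064               # t(24, 0.975) from t tables

se = s / math.sqrt(n)
lower, upper = x_bar - t_crit * se, x_bar + t_crit * se
t_stat = (x_bar - mu_0) / se

outside_ci = not (lower <= mu_0 <= upper)
rejected_at_5_percent = abs(t_stat) > t_crit

print(f"95% CI = ({lower:.2f}, {upper:.2f}), t = {t_stat:.3f}")
print(f"null value outside CI: {outside_ci}; rejected at 5%: {rejected_at_5_percent}")
```

Both checks give the same answer because they are algebraic rearrangements of the same inequality.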