Statistical Hypothesis Testing - Analytica Wiki


Statistical Hypothesis Testing

(8th Session in “Gentle Introduction to Modeling Uncertainty”)

Lonnie Chrisman, Ph.D.

Lumina Decision Systems
Analytica User Group, 15 July 2010
Copyright © 2010 Lumina Decision Systems, Inc.

Scope of Today’s Webinar

Included:

• Conceptual underpinnings of classical hypothesis testing.
• Interpretation of statistical significance (p-values).
• General methodology for applying it in any scenario.
• Intended to promote conceptual understanding.
• Building on Monte Carlo tools.

Not included:

• Standard canned hypothesis tests (like t-tests, etc.)

Outline

• Motivating example
• Statistical significance
• The Statistic
• Methodology
• Modeling the Null hypothesis
• Computing the pValue
• Interpretation of results
• Drawbacks of methodology
• Additional exercise


Does Stock Market Volatility Vary with Day of Week?

• Randomly selected 100 trading days (from 2000-2010).
• Computed the day change, (close - open)/open, for the S&P 500 index.

Day of week   # samples   Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

Total volatility: 18.1%

Side note: Annualized volatility := SDeviation * sqrt(T) where T = # trading days/yr = 250
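As a rough illustration of the side note's formula, here is a minimal Python sketch. The function name and the price pairs are hypothetical and made up for illustration (they are not the S&P 500 sample), and it assumes SDeviation means the sample standard deviation.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 250  # T from the side note


def annualized_volatility(day_changes, trading_days=TRADING_DAYS_PER_YEAR):
    """Sample standard deviation of the day changes, scaled by sqrt(T)."""
    return statistics.stdev(day_changes) * math.sqrt(trading_days)


# Made-up (open, close) pairs for illustration only.
prices = [(100.0, 101.0), (101.0, 99.5), (99.5, 100.2), (100.2, 100.0)]

# Day change as defined on the slide: (close - open) / open
day_changes = [(close - open_) / open_ for open_, close in prices]

print(f"annualized volatility: {annualized_volatility(day_changes):.1%}")
```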

• Alice: “This shows that the market volatility does depend on the day of the week.”
• Bob: “No, the variation is just due to random sampling variation.”

Download Model with S&P Data

• Please download “Hypothesis Test S&P Volatility.ana”. The download link is at the bottom of the talk abstract on the Analytica Wiki.

• You’ll use this data for exercises…


Statistical Significance

Day of week   # samples   Observed volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

• Alice: “This shows that the market volatility depends on the day of the week.”
• Alice’s mission: To show that this observed variation is unlikely if it is just due to random sampling variation.
• Null Hypothesis: The “true” underlying volatility is the same for every day of the week.
• Level of significance: The probability that at least this much variation in volatility would be observed if the Null Hypothesis is true (termed the “p-value”).

Statistical Significance #2

Day of week   # samples   Observed volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

After her statistical analysis, Alice might say: “This shows at a significance level p=3% that market volatility varies with the day of the week.”

• By convention, p ≤ 5% is usually considered to be “statistically significant”. p > 5% is said to be “not statistically significant”.
• What can you conclude if the p-value turns out to be 20%?


The “Statistic”

Day   # samples   Observed volatility (vol)
Mon   20          19.4%
Tue   20          11.9%
Wed   20          21.5%
Thu   20          20.1%
Fri   20          14.3%

Total volatility: 18.1%

• We need a scalar metric to summarize the degree of conflict with the Null-hypothesis (H0).
  Smaller value → more consistent with H0
  Larger value → greater disagreement with H0
• Examples:
  Max(vol,day) - Min(vol,day)
  SDeviation(vol,day)
  F = Variance(vol,day) / Total_volatility^2
• Exercise: Pick a statistic and compute its value for the S&P 500 dataset in your Analytica model.
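For readers following along outside Analytica, the three example statistics can be sketched in Python from the per-day volatilities in the table. This is only an illustration: the function names are hypothetical, and it assumes SDeviation and Variance refer to the sample (n-1) versions.

```python
import statistics

# Observed annualized volatility by day of week, in percent (from the table).
vol = {"Mon": 19.4, "Tue": 11.9, "Wed": 21.5, "Thu": 20.1, "Fri": 14.3}
TOTAL_VOLATILITY = 18.1  # percent, from the table


def range_stat(values):
    """Max(vol,day) - Min(vol,day)"""
    return max(values) - min(values)


def sd_stat(values):
    """SDeviation(vol,day), assumed here to be the sample standard deviation."""
    return statistics.stdev(values)


def f_stat(values, total=TOTAL_VOLATILITY):
    """F = Variance(vol,day) / Total_volatility^2"""
    return statistics.variance(values) / total ** 2


values = list(vol.values())
print(f"range: {range_stat(values):.2f} percentage points")
print(f"sd:    {sd_stat(values):.2f} percentage points")
print(f"F:     {f_stat(values):.4f}")
```

All three grow as the per-day volatilities spread further apart, which is the property the methodology needs from a statistic.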


Methodology

[Diagram: the Model of Null Hypothesis produces a Simulated dataset, which gives the Statistic on simulated; the Measured dataset gives the Statistic on measured; the two statistics together give the pValue.]

• Construct a model that simulates measurements given that the null-hypothesis is true. Typically makes various assumptions.
• Use Monte Carlo simulation to produce several simulated data sets. Apply the statistic to each.
• pValue: Pr( Stat_sim ≥ Stat_meas )
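The three steps of the methodology can be sketched as one small, generic Python function. This is a sketch rather than the webinar's Analytica model: the function and parameter names are hypothetical, and the usage example is a toy stand-in (is the mean of ten made-up values far from zero under a standard normal null?).

```python
import random


def monte_carlo_p_value(simulate_null, statistic, measured, n_runs=10_000, rng=None):
    """Estimate Pr(Stat_sim >= Stat_meas) by simulating under the null model.

    simulate_null(rng) -> one simulated dataset, generated as if H0 were true
    statistic(dataset) -> scalar; larger means greater disagreement with H0
    """
    rng = rng or random.Random()
    stat_meas = statistic(measured)
    hits = sum(statistic(simulate_null(rng)) >= stat_meas for _ in range(n_runs))
    return hits / n_runs


# Toy stand-in data, made up for illustration.
measured = [0.9, 1.4, -0.2, 0.8, 1.1, 0.3, 1.7, -0.5, 0.6, 1.2]

p = monte_carlo_p_value(
    simulate_null=lambda rng: [rng.gauss(0.0, 1.0) for _ in range(len(measured))],
    statistic=lambda data: abs(sum(data) / len(data)),
    measured=measured,
    rng=random.Random(0),
)
print(f"estimated p-value: {p:.3f}")
```

Keeping the null model and the statistic as separate arguments mirrors the slide: either one can be swapped without touching the p-value calculation.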

Modeling the Null Hypothesis

Day   # samples   Observed volatility (vol)
Mon   20          19.4%
Tue   20          11.9%
Wed   20          21.5%
Thu   20          20.1%
Fri   20          14.3%

Total volatility: 18.1%

• Null Hypothesis: The volatility is 18.1% on every day of the week.
• How could you simulate the data? (Hint: There are multiple possible approaches.)
• What assumptions are you making?
• Some ideas:
  Randomly generate each day’s price change from a LogNormal distribution.
  Shuffle existing data.

Exercise: Implement a model of the null-hypothesis in your Analytica model. (One random dataset for each item in Run)
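Both ideas can be sketched in Python as follows. This is one possible reading of the slide, not the reference solution: the function names are hypothetical, the LogNormal version assumes the ratio close/open is LogNormal with zero median drift and the same daily sigma on every weekday, and the shuffle version assumes day-of-week labels are exchangeable under the null hypothesis.

```python
import math
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
SAMPLES_PER_DAY = 20
TOTAL_VOLATILITY = 0.181        # annualized, from the table
TRADING_DAYS_PER_YEAR = 250
# Daily sigma implied by: annualized volatility = SDeviation * sqrt(T)
DAILY_SIGMA = TOTAL_VOLATILITY / math.sqrt(TRADING_DAYS_PER_YEAR)


def simulate_lognormal(rng):
    """Idea 1: draw close/open from a LogNormal, identical for every weekday."""
    return {
        day: [rng.lognormvariate(0.0, DAILY_SIGMA) - 1.0 for _ in range(SAMPLES_PER_DAY)]
        for day in DAYS
    }


def simulate_shuffle(day_changes_by_day, rng):
    """Idea 2: pool the measured day changes and deal them back out at random."""
    pooled = [x for day in DAYS for x in day_changes_by_day[day]]
    rng.shuffle(pooled)
    result, start = {}, 0
    for day in DAYS:
        n = len(day_changes_by_day[day])
        result[day] = pooled[start:start + n]
        start += n
    return result


rng = random.Random(1)
simulated = simulate_lognormal(rng)
# The shuffle needs the measured day changes; a simulated dataset stands in here.
reshuffled = simulate_shuffle(simulated, rng)
print({day: len(changes) for day, changes in reshuffled.items()})
```

The LogNormal version assumes a distribution shape but needs no raw data. The shuffle version makes no shape assumption but reuses the measured sample.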

Computing Statistic on Simulated

• Exercise: Apply your statistic to each simulated dataset.

Note: Larger statistic values occur when the variation in volatility by day is larger.

• Exercise: What fraction of simulated datasets have a statistic value at least as large as that of the actual data?

This is the p-value. Is Alice’s hypothesis statistically significant?
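An end-to-end sketch of these two exercises in Python is shown below. It is a sketch under stated assumptions, not the webinar's solution: it uses the range statistic Max(vol,day) - Min(vol,day), a LogNormal null model with zero drift and the same daily sigma on every weekday, and sample standard deviations. All function names are hypothetical. A different statistic or null model can give a different p-value.

```python
import math
import random
import statistics

OBSERVED_VOL = [0.194, 0.119, 0.215, 0.201, 0.143]  # Mon..Fri, from the table
TOTAL_VOLATILITY = 0.181
SAMPLES_PER_DAY = 20
T = 250  # trading days per year
DAILY_SIGMA = TOTAL_VOLATILITY / math.sqrt(T)


def range_stat(vol_by_day):
    """Max(vol,day) - Min(vol,day)"""
    return max(vol_by_day) - min(vol_by_day)


def simulate_vol_by_day(rng):
    """One dataset simulated under H0, reduced to annualized volatility per weekday."""
    vols = []
    for _ in range(len(OBSERVED_VOL)):
        changes = [rng.lognormvariate(0.0, DAILY_SIGMA) - 1.0 for _ in range(SAMPLES_PER_DAY)]
        vols.append(statistics.stdev(changes) * math.sqrt(T))
    return vols


def estimate_p_value(n_runs=2000, seed=0):
    """Fraction of simulated datasets whose statistic is >= the measured one."""
    rng = random.Random(seed)
    stat_meas = range_stat(OBSERVED_VOL)
    hits = sum(range_stat(simulate_vol_by_day(rng)) >= stat_meas for _ in range(n_runs))
    return hits / n_runs


p_value = estimate_p_value()
print(f"estimated p-value: {p_value:.3f}")
print("statistically significant at 5%" if p_value <= 0.05
      else "not statistically significant at 5%")
```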



Common Misuse of Paradigm: Multiple Hypotheses

• Scenario: Alice identifies 20 other plausible hypotheses to test, e.g.:
  Volatility on Tues is different than the other 4 days.
  Volatility varies by month.
  September has a higher volatility than other months.
  …
• She tests each of these individually and finds one of them to be statistically significant at a 5% level. She publishes this result.
• What’s wrong here? What should she do differently?
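A short calculation shows the scale of the problem. The Python sketch below assumes the 20 tests are independent and that every null hypothesis is true. The Bonferroni correction it shows is one common remedy, named here as an illustration and not taken from the slides.

```python
ALPHA = 0.05        # significance level used for each individual test
N_HYPOTHESES = 20   # number of hypotheses Alice tests

# Chance that at least one test comes out "significant" purely by chance,
# assuming independent tests and that every null hypothesis is true.
p_any_false_positive = 1 - (1 - ALPHA) ** N_HYPOTHESES

# Bonferroni correction: a stricter per-test threshold that keeps the overall
# chance of any false positive at or below ALPHA.
bonferroni_threshold = ALPHA / N_HYPOTHESES

print(f"P(at least one false positive) = {p_any_false_positive:.1%}")
print(f"Bonferroni per-test threshold  = {bonferroni_threshold:.4f}")
```

Under these assumptions, finding one “significant” result among 20 tests is more likely than not (roughly 64%), so a single hit at the 5% level is weak evidence.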


Interpreting p-Value

• Small value (< 5%)
  Accept main hypothesis.
  Data is inconsistent with Null-hypothesis.
• Otherwise (p > 5%)
  Conclude only that data sample was too small to detect relationship.
  Hypothesis may still be true or false: “Larger research study required”
• P-value is not:
  A measure of the strength of relationship.
  The probability that the hypothesis is true.



Drawbacks with Statistical Hypothesis Testing Paradigm

• 1 in 20 false hypotheses are accepted (at 5% significance level).
  Often abused by people testing many hypotheses.
• Nearly any hypothesis is confirmed with a large enough sample.
  Most hypotheses will have at least a minuscule “true” effect.
  With enough data, even the most minuscule effect becomes statistically significant.
• The “uncertainty” about the hypothesis is not available. Doesn’t provide P(H), which would be useful in models that use the results.
• Numerous subjective components that are not recognized or reported explicitly.
• “Cookbook tests” are very often misapplied when assumptions don’t hold, leading to greater confidence than is warranted by the data.


New Exercise

Number of subjects (purely fictional data):

                 Not exposed   Exposed to TCE
Parkinson’s      10            4
No Parkinson’s   140           25

• Hypothesis: TCE exposure is associated with an increased risk of getting Parkinson’s disease.
• Null Hypothesis: Parkinson’s rates are the same among those exposed and not exposed to TCE.
• Exercise:
  Identify an appropriate statistic.
  Model the null-hypothesis.
  Compute the p-Value.
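One possible way to work this exercise in Python is sketched below. The choices are assumptions, not the webinar's solution: the statistic is the difference in Parkinson’s rates (exposed minus not exposed), and the null model randomly reassigns the exposure labels among all subjects. The function names are hypothetical. The hypergeometric calculation is added only as a cross-check on the simulation.

```python
import math
import random

# Counts from the (purely fictional) table.
PD_NOT_EXPOSED, PD_EXPOSED = 10, 4
NO_PD_NOT_EXPOSED, NO_PD_EXPOSED = 140, 25

N_EXPOSED = PD_EXPOSED + NO_PD_EXPOSED              # 29
N_NOT_EXPOSED = PD_NOT_EXPOSED + NO_PD_NOT_EXPOSED  # 150
N_PD = PD_EXPOSED + PD_NOT_EXPOSED                  # 14
N_TOTAL = N_EXPOSED + N_NOT_EXPOSED                 # 179


def rate_difference(pd_exposed):
    """Statistic: Parkinson's rate among exposed minus rate among not exposed."""
    return pd_exposed / N_EXPOSED - (N_PD - pd_exposed) / N_NOT_EXPOSED


def simulate_pd_exposed(rng):
    """Null model: exposure labels are dealt at random among all subjects.

    Subjects 0..N_PD-1 are the ones with Parkinson's; returns how many of
    them land in the exposed group.
    """
    exposed = rng.sample(range(N_TOTAL), N_EXPOSED)
    return sum(1 for subject in exposed if subject < N_PD)


def simulated_p_value(n_runs=20_000, seed=0):
    rng = random.Random(seed)
    stat_meas = rate_difference(PD_EXPOSED)
    hits = sum(rate_difference(simulate_pd_exposed(rng)) >= stat_meas for _ in range(n_runs))
    return hits / n_runs


def exact_p_value():
    """Cross-check: the same tail probability from the hypergeometric distribution."""
    total = math.comb(N_TOTAL, N_EXPOSED)
    tail = sum(
        math.comb(N_PD, k) * math.comb(N_TOTAL - N_PD, N_EXPOSED - k)
        for k in range(PD_EXPOSED, min(N_PD, N_EXPOSED) + 1)
    )
    return tail / total


print(f"simulated p-value: {simulated_p_value():.3f}")
print(f"exact p-value:     {exact_p_value():.3f}")
```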

Summary

• Statistical Hypothesis Testing tests: Is the support for a hypothesis statistically significant given a dataset?
• Significance level (p-value) is: Probability of seeing data at least as extreme as the actual data when the Null hypothesis is true.
  p-value ≤ 5% → accept hypothesis
  p-value > 5% → conclude nothing, need more data.
• Methodology:
  Identify statistic (scalar metric): A measure of divergence from null-hypothesis.
  Build model of null-hypothesis to “simulate” data sets.
  Compute p-value.
