Statistical Hypothesis Testing


Statistical Hypothesis Testing - Analytica Wiki

(8th Session in “Gentle Introduction to Modeling Uncertainty”)

Lonnie Chrisman, Ph.D.

Scope of Today’s Webinar

Included:

• Conceptual underpinnings of classical hypothesis testing.
• Interpretation of statistical significance (p-values).
• General methodology for applying it in any scenario.
• Intended to promote conceptual understanding.
• Building on Monte Carlo tools.

Not included:

Outline

• Motivating example
• Statistical significance
• The Statistic
• Methodology
• Modeling the Null hypothesis
• Computing the pValue
• Interpretation of results
• Drawbacks of methodology
• Additional exercise

Copyright © 2010 Lumina Decision Systems, Inc.

Does Stock Market Volatility Vary with Day of Week?

• Randomly selected 100 trading days (from 2000-2010).
• Computed day change (close-open)/open for S&P 500 index.

Day of week   # samples   Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

Total volatility: 18.1%

Side note: Annualized volatility := SDeviation * sqrt(T) where T = # trading days/yr = 250
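
As a minimal sketch of the side note's formula, the Python snippet below annualizes the standard deviation of a handful of daily changes. The function name and the sample values are hypothetical placeholders, not the S&P 500 data from the model, and SDeviation is assumed here to mean the sample standard deviation.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 250  # T from the side note


def annualized_volatility(day_changes):
    """Sample standard deviation of daily changes, scaled by sqrt(T)."""
    return statistics.stdev(day_changes) * math.sqrt(TRADING_DAYS_PER_YEAR)


# Hypothetical (close - open) / open values, for illustration only.
example_changes = [0.012, -0.008, 0.003, -0.015, 0.009, 0.001]
print(f"{annualized_volatility(example_changes):.1%}")
```

The per-day volatilities in the table would come from applying the same calculation to each weekday's 20 samples separately.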

Download Model with S&P Data

• Please download “Hypothesis Test S&P Volatility.ana”. The download link is at the bottom of the talk abstract on the Analytica Wiki.


Statistical Significance

Day of week   # samples   Observed Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

• Alice: “This shows that the market volatility depends on the day of the week.”
• Alice’s mission: To show that this observed variation is unlikely if it is just due to random sampling variation.
• Null Hypothesis: The “true” underlying volatility is the same for every day of the week.


Statistical Significance #2

Day of week   # samples   Observed Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

• After her statistical analysis, Alice might say: “This shows at a significance level p=3% that market volatility varies with the day of the week.”
• By convention, p ≤ 5% is usually considered to be “statistically significant”. p > 5% is said to be “not statistically significant”.

• What can you conclude if the p-value turns out to be 20%?

The “Statistic”

Day   # samples   Observed Volatility (vol)
Mon   20          19.4%
Tue   20          11.9%
Wed   20          21.5%
Thu   20          20.1%
Fri   20          14.3%

Total volatility: 18.1%

• We need a scalar metric to summarize degree of conflict with the Null-hypothesis (H0).
    Smaller value → more consistent with H0
    Larger value → greater disagreement with H0
• Examples:
    Max(vol,day) - Min(vol,day)
    SDeviation(vol,day)
    F = Variance(vol,day) / Total_volatility^2
• Exercise: Pick a statistic and compute its value for the S&P 500 dataset in your Analytica model.
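
For readers following along outside Analytica, here is a rough Python sketch of the three example statistics applied to the observed volatilities from the table. The function names are made up for this illustration, and SDeviation and Variance are assumed to be the sample (n-1) versions.

```python
import statistics

# Observed annualized volatility by day of week (from the slide).
vol = {"Mon": 0.194, "Tue": 0.119, "Wed": 0.215, "Thu": 0.201, "Fri": 0.143}
total_volatility = 0.181


def range_stat(values):
    """Max(vol,day) - Min(vol,day)."""
    return max(values) - min(values)


def sd_stat(values):
    """SDeviation(vol,day), taken here as the sample standard deviation."""
    return statistics.stdev(values)


def f_stat(values, total):
    """F = Variance(vol,day) / Total_volatility^2, using the sample variance."""
    return statistics.variance(values) / total ** 2


observed = list(vol.values())
print(f"range = {range_stat(observed):.3f}")
print(f"sd    = {sd_stat(observed):.4f}")
print(f"F     = {f_stat(observed, total_volatility):.4f}")
```

All three grow as the per-day volatilities spread apart, which is the property the slide asks for. Any one of them can serve as the statistic in the steps that follow.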

Methodology

[Diagram nodes: Model of Null Hypothesis; Simulated Dataset; Statistic on simulated; Measured dataset; Statistic on measured; pValue]

• Construct a model that simulates measurements given that the null-hypothesis is true.
    Typically makes various assumptions.
• Use Monte Carlo simulation to produce several simulated data sets.
• Apply the statistic to each.

Modeling the Null Hypothesis

Day   # samples   Observed Volatility (vol)
Mon   20          19.4%
Tue   20          11.9%
Wed   20          21.5%
Thu   20          20.1%
Fri   20          14.3%

Total volatility: 18.1%

• Null Hypothesis: The volatility is 18.1% on every day of the week.
• How could you simulate the data? (Hint: There are multiple possible approaches)
• What assumptions are you making?
• Some ideas:
    Randomly generate each day’s price change from a LogNormal distribution.
    Shuffle existing data.
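
The two ideas can be sketched in Python roughly as follows. This is one possible reading, not the model from the download: the function names are hypothetical, the LogNormal is applied to the close/open ratio with a zero log-mean, and days are treated as independent.

```python
import math
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
SAMPLES_PER_DAY = 20
TOTAL_VOLATILITY = 0.181          # annualized, from the slide
TRADING_DAYS_PER_YEAR = 250
DAILY_SD = TOTAL_VOLATILITY / math.sqrt(TRADING_DAYS_PER_YEAR)


def simulate_lognormal(rng):
    """Idea 1: draw close/open from a LogNormal with the same spread every day.

    Zero log-mean and independence between days are assumptions of this sketch.
    """
    return {
        day: [rng.lognormvariate(0.0, DAILY_SD) - 1.0 for _ in range(SAMPLES_PER_DAY)]
        for day in DAYS
    }


def simulate_shuffle(changes_by_day, rng):
    """Idea 2: pool the measured changes and deal them back out to days at random."""
    pooled = [x for day in DAYS for x in changes_by_day[day]]
    rng.shuffle(pooled)
    result = {}
    start = 0
    for day in DAYS:
        n = len(changes_by_day[day])
        result[day] = pooled[start:start + n]
        start += n
    return result


rng = random.Random(0)
simulated = simulate_lognormal(rng)
# No measured data is included here, so the simulated set stands in for it.
reshuffled = simulate_shuffle(simulated, rng)
```

The first approach assumes a distribution shape. The second assumes only that, under the null hypothesis, the day labels are interchangeable.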

Computing Statistic on Simulated

• Exercise: Apply your statistic to each simulated dataset.
    Note: Larger statistic values occur when the variation in volatility by day is largest.
• Exercise: What fraction of simulated datasets have a larger statistic value than the actual data?
    This is the p-value.
• Is Alice’s hypothesis statistically significant?
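
Putting the pieces together, the following Python sketch estimates a p-value for the SDeviation(vol,day) statistic. It assumes a LogNormal null model with 18.1% annualized volatility on every day. The number of simulations, the seed, and all function names are arbitrary choices for illustration, so the printed value depends on them and on the assumed null model.

```python
import math
import random
import statistics

OBSERVED_VOL = [0.194, 0.119, 0.215, 0.201, 0.143]  # Mon..Fri, from the slide
TOTAL_VOLATILITY = 0.181
SAMPLES_PER_DAY = 20
TRADING_DAYS_PER_YEAR = 250
N_SIMULATIONS = 2000  # arbitrary choice for this sketch


def statistic(vol_by_day):
    """SDeviation(vol,day): spread of the per-day volatilities."""
    return statistics.stdev(vol_by_day)


def simulate_vol_by_day(rng):
    """One simulated dataset under the null: every day has the same volatility."""
    daily_sd = TOTAL_VOLATILITY / math.sqrt(TRADING_DAYS_PER_YEAR)
    vols = []
    for _ in range(len(OBSERVED_VOL)):
        changes = [rng.lognormvariate(0.0, daily_sd) - 1.0
                   for _ in range(SAMPLES_PER_DAY)]
        vols.append(statistics.stdev(changes) * math.sqrt(TRADING_DAYS_PER_YEAR))
    return vols


def p_value(observed_stat, simulated_stats):
    """Fraction of simulated statistics at least as large as the observed one."""
    hits = sum(1 for s in simulated_stats if s >= observed_stat)
    return hits / len(simulated_stats)


rng = random.Random(0)
simulated_stats = [statistic(simulate_vol_by_day(rng)) for _ in range(N_SIMULATIONS)]
p = p_value(statistic(OBSERVED_VOL), simulated_stats)
print(f"p-value = {p:.3f}")
```

Swapping in a different statistic or a shuffle-based null model changes only `statistic` or `simulate_vol_by_day`; the p-value step stays the same.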


Common Misuse of Paradigm: Multiple Hypotheses

• Scenario: Alice identifies 20 other plausible hypotheses to test, e.g.:
    Volatility on Tues is different than the other 4 days.
    Volatility varies by month.
    September has a higher volatility than other months.
    …
  She tests each of these individually and finds one of them to be statistically significant at a 5% level. She publishes this result.
• What’s wrong here?
• What should she do differently?
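
A quick calculation shows why this scenario is a problem. The Python sketch below assumes the 20 tests are independent and that every null hypothesis is actually true. The Bonferroni threshold is included as one common adjustment, not as the answer the slide prescribes.

```python
ALPHA = 0.05       # significance level from the slide
N_TESTS = 20       # number of hypotheses Alice tests

# Chance that at least one test comes out "significant" when every null is true,
# assuming the tests are independent (an assumption of this sketch).
p_any_false_positive = 1 - (1 - ALPHA) ** N_TESTS

# Bonferroni adjustment: each individual test must clear ALPHA / N_TESTS.
bonferroni_threshold = ALPHA / N_TESTS

print(f"P(at least one false positive) = {p_any_false_positive:.3f}")
print(f"Bonferroni per-test threshold  = {bonferroni_threshold:.4f}")
```

Under these assumptions, finding one "significant" result among 20 is more likely than not even when nothing real is going on.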

Interpreting p-Value

• Small value (< 5%)
    Accept main hypothesis.
    Data is inconsistent with Null-hypothesis.
• Otherwise (p > 5%)
    Conclude only that data sample was too small to detect relationship.
    Hypothesis may still be true or false: “Larger research study required”
• P-value is not:
    A measure of the strength of relationship.
    The probability that the hypothesis is true.


Drawbacks with Statistical Hypothesis Testing Paradigm

• 1 in 20 false hypotheses are accepted (at 5% significance level).
    Often abused by people testing many hypotheses.
• Nearly any hypothesis is confirmed with a large enough sample.
    Most hypotheses will have at least a minuscule “true” effect.
    With enough data, even the most minuscule effect becomes statistically significant.
• The “uncertainty” about the hypothesis is not available.
    Doesn’t provide P(H), which would be useful in models that use the results.
• Numerous subjective components that are not recognized or reported explicitly.
• “Cookbook tests” are very often misapplied when assumptions don’t hold, leading to greater confidence than is warranted by the data.
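
The sample-size drawback listed above can be illustrated with a small calculation. The Python sketch below uses a one-sample z-test with known unit variance, which is a stand-in chosen for simplicity and not a method from these slides. It also idealizes by letting the sample mean land exactly on a hypothetical true effect of 1% of a standard deviation.

```python
import math


def two_sided_p(effect_in_sd_units, n):
    """Two-sided p-value of a z-test when the sample mean equals the true effect."""
    z = effect_in_sd_units * math.sqrt(n)
    return math.erfc(z / math.sqrt(2))


TINY_EFFECT = 0.01  # hypothetical true effect: 1% of one standard deviation

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {two_sided_p(TINY_EFFECT, n):.3g}")
```

The effect size never changes, only the sample size, yet the p-value moves from nowhere near significant to far below 5%.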

New Exercise

Number of subjects (purely fictional data):

                 Not exposed   Exposed to TCE
Parkinson’s      10            4
No Parkinson’s   140           25

• Hypothesis: TCE exposure is associated with an increased risk of getting Parkinson’s disease.
• Null Hypothesis: Parkinson’s rates are the same among those exposed and not exposed to TCE.
• Exercise: Identify an appropriate statistic.
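
One possible approach to this exercise, sketched in Python: use the difference in Parkinson’s rates (exposed minus not exposed) as the statistic, and simulate the null hypothesis by shuffling disease status across all subjects. Both choices are assumptions of this sketch, as are the variable names and the number of simulations. Other statistics, such as a rate ratio, would work too.

```python
import random

# Counts from the (purely fictional) table on the slide.
PARKINSONS = {"not_exposed": 10, "exposed": 4}
NO_PARKINSONS = {"not_exposed": 140, "exposed": 25}
N_SIMULATIONS = 5000  # arbitrary choice for this sketch


def rate_difference(cases_exposed, n_exposed, cases_total, n_total):
    """Parkinson's rate among the exposed minus the rate among the not exposed."""
    cases_not_exposed = cases_total - cases_exposed
    n_not_exposed = n_total - n_exposed
    return cases_exposed / n_exposed - cases_not_exposed / n_not_exposed


n_exposed = PARKINSONS["exposed"] + NO_PARKINSONS["exposed"]
n_total = n_exposed + PARKINSONS["not_exposed"] + NO_PARKINSONS["not_exposed"]
cases_total = PARKINSONS["exposed"] + PARKINSONS["not_exposed"]

observed = rate_difference(PARKINSONS["exposed"], n_exposed, cases_total, n_total)

# Null model: disease status is unrelated to exposure, so shuffle who has it.
rng = random.Random(0)
subjects = [1] * cases_total + [0] * (n_total - cases_total)
hits = 0
for _ in range(N_SIMULATIONS):
    rng.shuffle(subjects)
    cases_exposed = sum(subjects[:n_exposed])  # first n_exposed play the exposed group
    if rate_difference(cases_exposed, n_exposed, cases_total, n_total) >= observed:
        hits += 1

p = hits / N_SIMULATIONS
print(f"observed difference = {observed:.4f}, p-value = {p:.3f}")
```

The comparison is one-sided (`>=`) because the hypothesis is specifically about an increased risk.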

Summary

• Statistical Hypothesis Testing tests: Is the support for a hypothesis statistically significant given a dataset?
• Significance level (p-value) is: Probability of seeing data at least as extreme as the actual data when the Null hypothesis is true.
    p-value <= 5% → accept hypothesis
    p-value > 5% → conclude nothing, need more data.
• Methodology:
    Identify statistic (scalar metric): A measure of divergence from null-hypothesis.
    Build model of null-hypothesis to “simulate” data sets.
    Compute p-value.