Statistics - Johns Hopkins University

Statistics
What Is Engineering?
July 16, 2007
Why Study Statistics?
Statistics: A mathematical science concerned with data collection,
presentation, analysis, and interpretation.
Statistics can tell us about…
Sports
Population
Economy
[Figure: New York City, NY. Pie chart "Employment Rates: 2006" with categories Unemployed, Employed, and Other (labels 4.6%, 32.3%, 63.1%). Line plot of Estimated Population by Year, 2000 to 2006, on an axis from 7,900,000 to 8,250,000.]
Why Study Statistics?
Statistical analysis is also an integral part of scientific research!
Are your experimental results believable?
Example: Breaking strength of Welds
Control: Velocity (10^2 ft/min)    Response: Breaking Strength (ksi)
2.00                               89
2.5                                97, 91
2.75                               98
3.00                               100, 104, 97
The data suggest a relationship between velocity and breaking strength!
The relationship is not perfect: the measurements contain random error.
(To make a weld, the operator stops a rotating part by forcing it into a stationary
part; the resulting friction generates heat that produces a hot-pressure weld.)
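A minimal sketch of how the weld data can be summarized, assuming the velocity and strength pairings shown on the slide. The variable names (`welds`, `by_velocity`, `summary`) are illustrative and do not come from the original slides. The mean at each velocity shows the trend, and the range of repeated measurements shows the random error.

```python
from collections import defaultdict

# Weld data from the slide: (velocity in 10^2 ft/min, breaking strength in ksi).
welds = [
    (2.00, 89),
    (2.50, 97), (2.50, 91),
    (2.75, 98),
    (3.00, 100), (3.00, 104), (3.00, 97),
]

# Group the measured strengths by velocity setting.
by_velocity = defaultdict(list)
for velocity, strength in welds:
    by_velocity[velocity].append(strength)

# For each velocity: (mean strength, range = max - min of repeated measurements).
summary = {
    v: (sum(s) / len(s), max(s) - min(s))
    for v, s in sorted(by_velocity.items())
}

for v, (mean, spread) in summary.items():
    print(f"velocity {v:.2f}: mean = {mean:.1f} ksi, range = {spread} ksi")
```

The means rise with velocity, while repeated welds at the same velocity still differ by several ksi.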
Why Study Statistics?
Responses and measurements are variable!
Due to…
Randomness (or individual differences) in sampling
population.
Inability to perform measurements in exactly the
same way every time.
Goal of statistics is to find the model that best describes a
target population by taking sample data.
Represent randomness using probability.
Probability
Experiment of chance: a phenomenon whose outcome is uncertain.
[Diagram: probabilities (chances) are described by a probability model made up of a sample space, events, and the probability of events.]
Sample Space: Set of all possible outcomes
Event: A set of outcomes (a subset of the sample space). An
event E occurs if any of its outcomes occurs.
Probability: The likelihood that an event occurs, expressed as a
number between 0 and 1.
Probability
Consider a deck of playing cards…
Sample Space? The set of 52 cards.
Event? Some examples:
R: The card is red.
F: The card is a face card.
H: The card is a heart.
3: The card is a 3.
Probability?
P(R) = 26/52
P(F) = 12/52
P(H) = 13/52
P(3) = 4/52
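The card probabilities can be checked by listing the sample space and counting the outcomes in each event. This is a small sketch, assuming a standard 52-card deck with equally likely cards; the helper `prob` and the rank and suit labels are illustrative names.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# Sample space: every (rank, suit) pair, 52 cards in total.
deck = list(product(ranks, suits))

def prob(event):
    """Probability of an event = favorable outcomes / all outcomes."""
    favorable = [card for card in deck if event(card)]
    return Fraction(len(favorable), len(deck))

p_red = prob(lambda card: card[1] in ("hearts", "diamonds"))
p_face = prob(lambda card: card[0] in ("J", "Q", "K"))
p_heart = prob(lambda card: card[1] == "hearts")
p_three = prob(lambda card: card[0] == "3")  # one 3 in each of the four suits

print(p_red, p_face, p_heart, p_three)
```

`Fraction` reduces each result to lowest terms, so 26/52 is printed as 1/2.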
Events and variables
Can be described as random or deterministic:
The outcome of a random event cannot be predicted:
The sum of two numbers on two rolled dice.
The time of emission of the ith particle from radioactive material.
The outcome of a deterministic event can be predicted:
The measured length of a table to the nearest cm.
Motion of macroscopic objects (projectiles, planets,
spacecraft) as predicted by classical mechanics.
Extent of randomness
A variable can be more random or more deterministic depending
on the degree to which you account for relevant parameters:
Mostly deterministic:
Only a small fraction of the outcome cannot be accounted for.
Length of a table:
• Temperature/humidity variation
• Measurement resolution
• Instrument/observer error
• Quantum-level intrinsic uncertainty
Mostly Random:
Most of the outcome cannot be accounted for.
• Trajectory of a given molecule in a solution
Random variables
Can be described as discrete or continuous:
• A discrete variable has a countable number of values.
Number of customers who enter a store before one
purchases a product.
• The values of a continuous variable cannot be listed:
Distance between two oxygen molecules in a room.
Consider data collected for undergraduate students:
Random Variable     Possible Values
Gender              Male, Female
Class               Fresh, Soph, Jr, Sr
Height (inches)     Number in the interval [30, 90]
College             Arts, Education, Engineering, etc.
Shoe Size           3, 3.5, …, 18
Is the height a discrete or continuous variable?
How could you measure height and shoe size to make them continuous variables?
Probability Distributions
If a random event is repeated many times, it will produce a
distribution of outcomes (statistical regularity).
(Think about scores on an exam)
The distribution can be represented in two ways:
• Frequency distribution function: represents the
distribution as the number of occurrences of each
outcome
• Probability distribution function: represents the
distribution as the fraction (relative frequency) of
occurrences of each outcome
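The two representations can be compared with a quick simulation. The sketch below assumes the random event is the sum of two fair dice (the example used earlier in the slides); the seed and the number of rolls are arbitrary choices.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable
n_rolls = 10_000

# Repeat the random event many times: the sum of two rolled dice.
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n_rolls)]

# Frequency distribution: number of occurrences of each outcome.
frequency = Counter(sums)

# Probability distribution (estimated): fraction of occurrences of each outcome.
relative = {outcome: count / n_rolls for outcome, count in sorted(frequency.items())}

for outcome, share in relative.items():
    print(f"{outcome:2d}: count = {frequency[outcome]:5d}, fraction = {share:.3f}")
```

Dividing every count by the number of repetitions turns the frequency distribution into an estimate of the probability distribution, so the fractions sum to 1.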
Discrete Probability Distributions
Consider a discrete random variable, X:
f(xi) is the probability distribution function
What is the range of values of f(xi)?
Therefore, Pr(X=xi) = f(xi)
Discrete Probability Distributions
Properties of discrete probabilities:
\[ \Pr(X = x_i) = f(x_i) \ge 0 \quad \text{for all } i \]
\[ \sum_{i=1}^{k} \Pr(X = x_i) = \sum_{i=1}^{k} f(x_i) = 1 \quad \text{for } k \text{ possible discrete outcomes} \]
\[ \Pr(a < X \le b) = F(b) - F(a) = \sum_{a < x_i \le b} f(x_i) \]
Where:
\[ F(x) = \Pr(X \le x) \]
Discrete Probability Distributions
Example: Waiting for a success
Consider an experiment in which we toss a coin until heads turns up.
Outcomes, w = {H, TH, TTH, TTTH, TTTTH…}
Let X(w) be the number of tails before a heads turns up.
\[ f(x) = \frac{1}{2^{x+1}} \quad \text{for } x = 0, 1, 2, \ldots \]
[Figure: plot of f(x) against waiting time x for x = 0 to 6, with f(x) on an axis from 0 to 0.5.]
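The coin-tossing example can be sketched in code as follows, assuming a fair coin. The function names `f`, `F`, and `tails_before_heads` are illustrative; the simulation treats a random number below 0.5 as tails.

```python
import random

def f(x):
    """Probability of exactly x tails before the first heads (fair coin)."""
    return 1 / 2 ** (x + 1)

def F(x):
    """Cumulative probability Pr(X <= x), summing f over 0..x."""
    return sum(f(i) for i in range(x + 1))

def tails_before_heads(rng):
    """Toss until heads turns up; return the number of tails seen first."""
    tails = 0
    while rng.random() < 0.5:  # this branch plays the role of "tails"
        tails += 1
    return tails

rng = random.Random(1)  # arbitrary seed for repeatability
trials = 20_000
draws = [tails_before_heads(rng) for _ in range(trials)]
empirical_f0 = draws.count(0) / trials

print("f(0), f(1), f(2):", f(0), f(1), f(2))
print("F(50):", F(50))
print("simulated Pr(X = 0):", empirical_f0)
```

The probabilities halve with each extra tail, and their running sum approaches 1, as required of a discrete probability distribution.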
Cumulative Discrete Probability Distributions
\[ \Pr(X \le x') = F(x') = \sum_{i=1}^{j} f(x_i) \]
Where x_j is the largest discrete
value of X less than or equal to x'
\[ \Pr(X \le x_k) = 1 \]
Cumulative Continuous Probability Distributions
For continuous variables, the events of interest are intervals rather
than isolated values.
Consider the waiting time for a bus that is equally likely to arrive
at any moment in the next ten minutes:
Not interested in probability that the bus will arrive in 3.451233
minutes, but rather the probability that the bus will arrive in the
subinterval (a,b) minutes:
\[ P(a < T \le b) = F(b) - F(a) = \frac{b - a}{10} \]
[Figure: plot of the c.d.f. F(t) against t, with F(t) reaching 1 at t = 10.]
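A small sketch of the bus example, assuming the arrival time T is uniform on the ten-minute window. The names `F`, `prob_between`, and `total` are illustrative.

```python
def F(t, total=10.0):
    """c.d.f. of a waiting time that is uniform on [0, total] minutes."""
    if t <= 0:
        return 0.0
    if t >= total:
        return 1.0
    return t / total

def prob_between(a, b):
    """Probability that the bus arrives in the subinterval from a to b."""
    return F(b) - F(a)

print(prob_between(2, 5))                # chance of arriving between minutes 2 and 5
print(prob_between(3.451233, 3.451233))  # a single instant carries zero probability
```

The second call illustrates why intervals, rather than isolated values, are the events of interest for a continuous variable.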
Continuous Probability Density Function
c.d.f: Gives the fraction of the total probability that lies at or to the
left of each x
p.d.f: Gives the density of concentration of probability at each
point x
In terms of the c.d.f.:
\[ P(x < X \le x + \Delta x) = F(x + \Delta x) - F(x) = \Delta F(x) \]
When F(x) is differentiable at x, and Δx is small, we can approximate
ΔF by the differential of F:
\[ dF(x) = F'(x)\,\Delta x \]
The p.d.f. is therefore the derivative of the c.d.f.: f(x) = F'(x).
Continuous Probability Distributions
Properties of the cumulative distribution function:
\[ F(-\infty) = 0 \]
\[ 0 \le F(x) \le 1 \]
\[ F(\infty) = 1 \]
Properties of the probability density function:
\[ \Pr(a < X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx \]
Continuous Probability Distributions
Example: Gaussian (normal) distribution:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]
Each member of the normal distribution family is described by the
mean (μ) and variance (σ²).
Standard normal curve: μ = 0, σ = 1.
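The normal density can be evaluated directly from its formula. The sketch below is one way to do it; `normal_pdf` and the simple midpoint-rule `integrate` helper are illustrative names, and the integration limits of -8 to 8 stand in for the whole real line.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width

# Standard normal curve (mu = 0, sigma = 1).
total_area = integrate(normal_pdf, -8, 8)        # should be very close to 1
within_one_sigma = integrate(normal_pdf, -1, 1)  # probability within one sigma of the mean

print(f"peak height f(0) = {normal_pdf(0):.4f}")
print(f"total area = {total_area:.6f}")
print(f"Pr(-1 < X <= 1) = {within_one_sigma:.4f}")
```

Integrating the density over an interval gives the probability of that interval, matching the p.d.f. property stated on the previous slide.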
Central Limit Theorem
As the sample size goes to infinity, the distribution function of the
standardized sample mean converges to the standard normal distribution function!
http://www.jhu.edu/virtlab/prob-distributions/
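The theorem can be illustrated with a simulation. This sketch assumes samples drawn from a uniform distribution on [0, 1], which has mean 0.5 and variance 1/12; the sample size, number of trials, and seed are arbitrary choices, not values from the slides.

```python
import math
import random
import statistics

rng = random.Random(42)  # arbitrary seed for repeatability
n = 30                   # size of each sample
trials = 5_000           # number of samples drawn

mu = 0.5                   # mean of Uniform(0, 1)
sigma = math.sqrt(1 / 12)  # standard deviation of Uniform(0, 1)

def standardized_mean():
    """Draw one sample and standardize its mean: (mean - mu) / (sigma / sqrt(n))."""
    sample = [rng.random() for _ in range(n)]
    return (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))

z_values = [standardized_mean() for _ in range(trials)]

z_mean = statistics.fmean(z_values)
z_std = statistics.pstdev(z_values)
share_within_1_96 = sum(abs(z) < 1.96 for z in z_values) / trials

print(f"mean of standardized means: {z_mean:.3f}")
print(f"std of standardized means:  {z_std:.3f}")
print(f"share with |z| < 1.96:      {share_within_1_96:.3f}")
```

Although the individual values are uniform rather than bell-shaped, the standardized means have mean near 0, standard deviation near 1, and roughly 95% of them fall within 1.96 of zero, as the standard normal curve predicts.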
Moments
In physics, the moment refers to the force applied to a system at a distance
from the axis of rotation (as in a lever).
In mathematics, a moment measures how the values of a distribution are
spread out from a reference point, such as the origin or the mean.
The 1st moment about the origin (mean):
\[ \mu = E[X] = \int_{-\infty}^{\infty} x\, f(x)\,dx \]
• Average value of x
The 2nd moment about the mean (variance):
\[ \mu_2 = \sigma^2 = E[(X - \mu)^2] \]
• A measure of the ‘spread’ of the data
Moments
Other values in terms of the moments:
Skewness:
\[ \frac{\mu_3}{\mu_2^{3/2}} \]
where \mu_n is the nth moment about the mean (so \mu_2 is the variance)
• ‘lopsidedness’ of the distribution
• a symmetric distribution will have a skewness = 0
• negative skewness: the longer tail is on the left
• positive skewness: the longer tail is on the right
Kurtosis:
\[ \frac{\mu_4}{(\mu_2)^2} \]
• Describes the shape of the distribution with respect to
the height and width of the curve (‘peakedness’)
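Skewness and kurtosis can be computed from sample data using moments about the mean. This is a minimal sketch that divides by the number of data points (population-style moments); the function names and the two small data sets are made up for illustration.

```python
def central_moment(data, order):
    """Moment of the given order about the mean, averaged over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    """Third central moment divided by the variance to the power 3/2."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth central moment divided by the squared variance."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]      # evenly spread around its mean of 3
right_tailed = [1, 1, 1, 2, 10]  # one large value stretches the right tail

for name, data in [("symmetric", symmetric), ("right_tailed", right_tailed)]:
    print(f"{name}: variance = {central_moment(data, 2):.2f}, "
          f"skewness = {skewness(data):.2f}, kurtosis = {kurtosis(data):.2f}")
```

The symmetric data set has zero skewness, while the data set with a long right tail has positive skewness.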
Standard Error
Standard Deviation:
\[ \sigma = \sqrt{\sigma^2} \]
Variance is the average squared distance of the data from the mean.
Therefore, the standard deviation measures the spread of data about the
mean.
Standard Error:
\[ \frac{\sigma}{\sqrt{N}} \]
Where N is the sample size
How do we reduce the size of our standard error?
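The effect of sample size on the standard error follows directly from the formula. The sketch below assumes a standard deviation of 10 purely for illustration; `standard_error` is an illustrative name.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of N."""
    return sigma / math.sqrt(n)

sigma = 10.0  # assumed standard deviation, chosen only for illustration
for n in (25, 100, 400):
    print(f"N = {n:3d}: standard error = {standard_error(sigma, n):.2f}")
```

Because N appears under a square root, quadrupling the sample size only halves the standard error.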
Independence
A measure of whether two variables are related.
Random variation or correlation?
Consider data collected for arrowhead breakage:
         Base   Middle   Tip   Total
Fire     21     8        18    47
Other    15     11       4     30
Total    36     19       22    77
Does the location of the fracture depend on the cause of fracture?
Or in other words, is the location of fracture independent of the
cause?
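One common way to examine this question is a chi-square test of independence. The slides do not name a method, so the choice of test here is an assumption. The sketch compares the observed counts with the counts expected if location and cause were independent (row total × column total / grand total); all variable names are illustrative.

```python
# Observed arrowhead breakage counts from the table: cause -> location -> count.
observed = {
    "Fire": {"Base": 21, "Middle": 8, "Tip": 18},
    "Other": {"Base": 15, "Middle": 11, "Tip": 4},
}
locations = ["Base", "Middle", "Tip"]

row_totals = {cause: sum(counts.values()) for cause, counts in observed.items()}
col_totals = {loc: sum(observed[cause][loc] for cause in observed) for loc in locations}
grand_total = sum(row_totals.values())

# Expected count in each cell if fracture location were independent of cause.
expected = {
    cause: {loc: row_totals[cause] * col_totals[loc] / grand_total for loc in locations}
    for cause in observed
}

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[cause][loc] - expected[cause][loc]) ** 2 / expected[cause][loc]
    for cause in observed
    for loc in locations
)

degrees_of_freedom = (len(observed) - 1) * (len(locations) - 1)

# Standard chi-square critical value for 2 degrees of freedom at the 5% level.
CRITICAL_5_PERCENT_DF2 = 5.991
reject_independence = chi_square > CRITICAL_5_PERCENT_DF2

print(f"chi-square = {chi_square:.3f} with {degrees_of_freedom} degrees of freedom")
print("reject independence at the 5% level:", reject_independence)
```

A statistic above the critical value means the observed counts differ from the independence pattern by more than random variation alone would typically produce.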