Tutorial for Basics Statistics in Scilab FISL 13 2012 Professora Dra

Tutorial for Basics Statistics
in Scilab
FISL 13 2012
Professora Dra Ariane Ferreira
IPRJ/UERJ
LabMacambira.sf.net
Promoting Free Software Programming
Basic Concepts of Statistics
Normal Probability Distribution
● A normal distribution has a bell-shaped density curve described by its mean μ and standard deviation σ.
● The density curve is symmetrical, centered about its mean, with its spread determined by its standard deviation.
Sample - Population
Population
- The population is the entire group of individuals
that we want information about.
Sample: Empirical data
- The sample is the data set drawn from a population of units of interest.
- A set of elements drawn from a population and analyzed to estimate its characteristics;
Random Sample
- A sample in which every element in the
population has an equal chance of being
selected;
Sample – Population Example
● Process: Oxidation furnace
– Population: all wafers coming out of the furnace after the operation
– Sample: 5 measurements taken on 4 selected wafers = 20 measurements
– Individual: every measurement that could be taken on each wafer of the batch
Sampling vs. Empirical Data
● Sampling
– The act of choosing or selecting;
– Random sampling: the selection of a random sample; each element of the population has an equal chance of being selected.
● Empirical data
– Sample data collected from the process.
– Empirical data normally follow a probability distribution with known density.
Sampling concepts
Random Sample :
- The random variables X1, X2, …, Xn are a
random sample of size n if:
1. The Xi’s are independent random variables;
2. Every Xi has the same probability distribution.
Statistic:
- A statistic is any function of the observations in
a random sample (sample mean, sample
variance).
Sampling Distribution:
- The probability distribution of a statistic.
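The three definitions above can be illustrated with a small simulation. This is a minimal sketch in Python (not Scilab), using only the standard library; the function name `sample_means` and the parameter values are illustrative assumptions, not part of the original slides. It draws many random samples of size n from a normal population, computes the sample mean (a statistic) for each, and looks at how those means are distributed (the sampling distribution).

```python
import random
import statistics


def sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` random samples of size n from N(mu, sigma); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(repeats):
        # The X_i are independent and share the same distribution: a random sample.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))  # the statistic: the sample mean
    return means


# Hypothetical population parameters, chosen only for illustration.
means = sample_means(mu=10.0, sigma=2.0, n=5, repeats=20000)
print("mean of the sample means:", statistics.fmean(means))
print("std. dev. of the sample means:", statistics.stdev(means))
```

Under these assumptions the sample means should centre on the population mean, with a spread close to σ/√n, which is smaller than the spread of individual observations.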
What are probability distributions?
● Concept: a mathematical model that relates the value of a random variable with its probability of occurrence.
● Two types:
– Discrete Probability Distributions
   ● To describe random variables that can only take on certain specific values: number of defects on a semiconductor wafer, rotations to failure, number of landings to failure, etc.
– Continuous Probability Distributions
   ● To describe random variables that can have any value on a continuous scale: linewidth in a sample population of interconnect, calendar time.
Main Discrete Probability Distributions
Only two important distributions are selected here:
● Binomial Distribution
● Poisson Distribution
For details on these distributions, see:
Freund, J.E. & Perles, B.M. Modern Elementary Statistics, 12th edition, Pearson International Edition, 2007.
Montgomery, D.C. & Runger, G.C. Applied Statistics and Probability for Engineers, 4th edition, John Wiley & Sons, Inc., 2006.
Most common continuous distributions
● Exponential
● Weibull
● Normal (presented in detail in the following slides)
● Gamma
● Lognormal
For details on these distributions, see:
Freund, J.E. & Perles, B.M. Modern Elementary Statistics, 12th edition, Pearson International Edition, 2007.
Montgomery, D.C. & Runger, G.C. Applied Statistics and Probability for Engineers, 4th edition, John Wiley & Sons, Inc., 2006.
Normal Distribution
Probability density function of a normally distributed variable x:
\[ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] \]
Notation x ~ N(μ, σ): x is normally distributed with mean μ and standard deviation σ.
The normal distribution has a symmetric bell shape.
● Figure: the density function of the normal distribution with mean 0 for different standard deviations σ.
Cumulative normal distribution
\[ P(x \le a) = F_X(a) = \int_{-\infty}^{a} f_X(x)\,dx \]
This integral cannot be evaluated in closed form.
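Because there is no closed form, the cumulative probability is obtained numerically. The sketch below, in Python with only the standard library, approximates the integral with a midpoint rule and compares it with the value given by the error function `math.erf`. The function names, the step count, and the choice to start the integration at μ − 10σ instead of −∞ are assumptions made for this illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf_numeric(a, mu=0.0, sigma=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of the density up to a."""
    lower = mu - 10.0 * sigma  # assumption: the mass below this point is negligible
    if a <= lower:
        return 0.0
    h = (a - lower) / steps
    total = sum(normal_pdf(lower + (i + 0.5) * h, mu, sigma) for i in range(steps))
    return total * h


def normal_cdf_erf(a, mu=0.0, sigma=1.0):
    """The same probability expressed with the error function."""
    return 0.5 * (1.0 + math.erf((a - mu) / (sigma * math.sqrt(2.0))))


print(normal_cdf_numeric(1.96), normal_cdf_erf(1.96))
```

The two values should agree to several decimal places; in practice a library routine based on the error function is the more convenient choice.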

Standard normal distribution
\[ P(x \le a) = P\left(z \le \frac{a-\mu}{\sigma}\right) = \Phi\left(\frac{a-\mu}{\sigma}\right), \qquad z = \frac{x-\mu}{\sigma} \]
Φ = cumulative distribution function of the standard normal distribution, i.e. the normal distribution with μ = 0 and σ = 1.
Data from any normal distribution may be transformed into data following the standard normal distribution by subtracting the mean μ and dividing by the standard deviation σ.
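As a small worked example of this transformation, the Python sketch below standardizes a threshold and looks up the probability with Φ. The numbers (μ = 100, σ = 15, a = 130) and the helper names `standardize` and `phi` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def standardize(x, mu, sigma):
    """z = (x - mu) / sigma."""
    return (x - mu) / sigma


def phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical example: x ~ N(100, 15); what is P(x <= 130)?
z = standardize(130.0, 100.0, 15.0)
p = phi(z)
print(f"z = {z}, P(x <= 130) = {p:.4f}")
```

Here the threshold sits two standard deviations above the mean, so the question reduces to evaluating Φ(2).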
Standard Normal curve
● If a dataset follows a normal distribution, then about:
– 68% of the observations will fall within σ of the mean μ, which in this case is the interval (-1, 1),
– 95% of the observations will fall within 2 standard deviations of the mean, which is the interval (-2, 2) for the standard normal, and
– 99.7% of the observations will fall within 3 standard deviations of the mean, which corresponds to the interval (-3, 3) in this case.
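These three percentages can be checked numerically. A minimal Python sketch, assuming only the standard library (the function name `coverage` is illustrative): for a standard normal z, P(|z| < k) equals erf(k/√2).

```python
import math


def coverage(k):
    """P(-k < z < k) for a standard normal variable z."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.4f}")
```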
Empirical Data
Table of data: m rows and n columns.
$X_{j,k}$ is the measurement corresponding to the kth item of the jth sample.
Normality of the data must be checked: Anderson-Darling test, Shapiro-Wilk test or Q-Plot test.
Statistics from empirical data
From the probability distribution of the empirical data, the following information can be derived:
– Position Characteristics:
   Mean: $\bar{X}$
   Median: $\tilde{X}$
– Dispersion Characteristics:
   Standard deviation: s
   Range: R
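Position Characteristics
• Mean:
\[ \bar{X} = \frac{1}{n}\sum_{k=1}^{n} X_k = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
• Median:
\[ \tilde{X} = \begin{cases} X_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is an odd number} \\[4pt] \dfrac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is an even number} \end{cases} \]
where $X_{(k)}$ denotes the kth smallest observation of the sample.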
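The two position characteristics can be computed as follows. This is a minimal Python sketch using only the standard library; the function names and the data values are hypothetical. The median is computed from the sorted sample, with separate branches for odd and even n.

```python
def sample_mean(xs):
    """Arithmetic mean: (X_1 + ... + X_n) / n."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Median from the ordered sample, handling odd and even n separately."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n+1)/2)-th smallest value (1-based)
    # average of the (n/2)-th and (n/2 + 1)-th smallest values (1-based)
    return 0.5 * (ordered[mid - 1] + ordered[mid])


odd_sample = [5, 1, 4, 2, 3]   # hypothetical data, n = 5
even_sample = [4, 1, 3, 2]     # hypothetical data, n = 4
print(sample_mean(odd_sample), sample_median(odd_sample))
print(sample_mean(even_sample), sample_median(even_sample))
```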
Mean vs. Median
The median is more robust than the mean.
If X1, X2, …, Xn are n independent normal (μ, σ) random variables, then
– Mean: $\bar{X}$
– Median: $\tilde{X}$
are two unbiased estimators of μ.
Dispersion Characteristics
• Standard deviation s > 0:
\[ s = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n}\left(X_k - \bar{X}\right)^2} = \sqrt{\frac{1}{n-1}\left(\sum_{k=1}^{n} X_k^2 - n\bar{X}^2\right)} \]
• Range R > 0:
\[ R = X_{(n)} - X_{(1)} \]
• Variance: $s^2$
Remark: s and R are not on the same scale.
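Both forms of the standard-deviation formula, and the range, can be checked on a small data set. The Python sketch below uses only the standard library; the function names and the data are hypothetical and serve only to illustrate the formulas.

```python
import math


def std_definition(xs):
    """s from the sum of squared deviations about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def std_shortcut(xs):
    """s from the equivalent form using the sum of squares minus n * mean^2."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt((sum(x * x for x in xs) - n * mean * mean) / (n - 1))


def sample_range(xs):
    """R = largest observation minus smallest observation."""
    return max(xs) - min(xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(std_definition(data), std_shortcut(data), sample_range(data))
```

The two standard-deviation values coincide up to rounding, while the range is noticeably larger, which is the point of the remark that s and R are not on the same scale.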
Unbiased estimators of σ
For samples of size n < 30:
If X1, X2, …, Xn are n independent normal (μ, σ) random variables, then
\[ \frac{s}{K_s(n)} \quad \text{and} \quad \frac{R}{K_R(n)} \]
are two unbiased estimators of σ.
$K_s(n)$ and $K_R(n)$ are the correction terms for the dispersion characteristics.
Correction terms for dispersion
● Where:
Don't worry!!! Be happy
Table for Ks(n) and KR(n)
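The table itself is not preserved in this transcript. As a hedged illustration, the Python sketch below assumes that Ks(n) is the usual bias-correction constant for s under normality (often tabulated as c4 in quality-control texts), which is the ratio E[s]/σ required for s/Ks(n) to be unbiased. The function name `k_s` is hypothetical. KR(n), the corresponding constant for the range (often tabulated as d2), needs numerical integration and is not sketched here.

```python
import math


def k_s(n):
    """Assumed form of K_s(n): E[s] / sigma for a normal sample of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


# Correction term for a few small sample sizes.
for n in (2, 5, 10, 25):
    print(f"n = {n:2d}: K_s(n) = {k_s(n):.4f}")
```

Under this assumption the correction term approaches 1 as n grows, which is consistent with applying it mainly to small samples.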
Machine (M) / Process (P)
Variability
Machine variability “1M”
Process variability “5M”
How to Compute Position/Dispersion Characteristics?
For the jth sample, one can compute:
The mean: $\bar{X}_j$
The median: $\tilde{X}_j$
The standard deviation: $s_j$
The range: $R_j$
How to Compute Position/Dispersion Characteristics?
To compute the process characteristics:
The process mean:
\[ \bar{X} = \frac{1}{m}\sum_{j=1}^{m} \bar{X}_j \]
The process median:
\[ \tilde{X} = \frac{1}{m}\sum_{j=1}^{m} \tilde{X}_j \]
The process standard deviation:
\[ s_p = \sqrt{\frac{1}{mn-1}\sum_{j=1}^{m}\sum_{k=1}^{n}\left(X_{j,k} - \bar{X}\right)^2} \]
The process range:
\[ R_p = \max(X_{j,k}) - \min(X_{j,k}) \]
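The process characteristics can be sketched in a few lines of Python (standard library only). The function name `process_characteristics` and the small data table are hypothetical; the table is a list of m samples, each holding n measurements.

```python
import math
import statistics


def process_characteristics(table):
    """Return (process mean, process median, s_p, R_p) for an m x n data table."""
    m, n = len(table), len(table[0])
    process_mean = statistics.fmean([statistics.fmean(row) for row in table])
    process_median = statistics.fmean([statistics.median(row) for row in table])
    values = [x for row in table for x in row]  # all m*n measurements together
    s_p = math.sqrt(sum((x - process_mean) ** 2 for x in values) / (m * n - 1))
    r_p = max(values) - min(values)
    return process_mean, process_median, s_p, r_p


table = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # hypothetical data: m = 2, n = 3
print(process_characteristics(table))
```

Note that s_p and R_p are computed over all mn measurements at once, so they include the variation between samples as well as within them.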
How to Compute Position/Dispersion Characteristics?
To compute the machine characteristics:
The machine standard deviation:
\[ s_M = \frac{1}{m}\sum_{j=1}^{m} s_j \]
The machine range:
\[ R_M = \frac{1}{m}\sum_{j=1}^{m} R_j \]
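The machine characteristics average the within-sample dispersion. A minimal Python sketch under the same assumptions as before (standard library only, hypothetical function name and data, one list per sample):

```python
import statistics


def machine_characteristics(table):
    """Return (s_M, R_M): the averages of the per-sample s_j and R_j."""
    s_j = [statistics.stdev(row) for row in table]   # uses the n - 1 divisor
    r_j = [max(row) - min(row) for row in table]
    return statistics.fmean(s_j), statistics.fmean(r_j)


table = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]  # hypothetical data: m = 2, n = 3
s_m, r_m = machine_characteristics(table)
print(s_m, r_m)
```

Because each s_j and R_j is computed inside a single sample, differences between sample means do not contribute to s_M or R_M.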
Example: Table of data
Normality test for Data
To test the normality of the process, consider the m samples of size n as one sample of size mn.
To test the normality of the machine, we have to remove the machine effect by making the following transformation:
\[ X_{j,k} \leftarrow X_{j,k} - \bar{X}_j \]
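The transformation subtracts each sample's own mean from its measurements before the values are pooled for a normality test. A minimal Python sketch, with a hypothetical function name and data table:

```python
def remove_sample_means(table):
    """Subtract each sample's mean from its measurements and pool the results."""
    centered = []
    for row in table:
        row_mean = sum(row) / len(row)
        centered.extend(x - row_mean for x in row)
    return centered


table = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]  # hypothetical data: m = 2, n = 3
print(remove_sample_means(table))
```

The pooled, centered values can then be passed to whichever normality test is used.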
Normality tests
● Anderson-Darling test;
● Shapiro-Wilk test;
● Skewness and Kurtosis tests;
● Normal Q-plot test;
For details, see:
Freund, J.E. & Perles, B.M. Modern Elementary Statistics, 12th edition, Pearson International Edition, 2007.
Montgomery, D.C. & Runger, G.C. Applied Statistics and Probability for Engineers, 4th edition, John Wiley & Sons, Inc., 2006.
Skewness and Kurtosis tests
● Skewness
– A term for asymmetry, usually employed with respect to a histogram of data or a probability distribution.
● Kurtosis
– A measure of the degree to which a unimodal distribution is peaked.
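The slides do not give formulas for these two quantities. As an assumption, the Python sketch below uses the common moment-based coefficients: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3. For normal data both are expected to be near 0. The function names and data are hypothetical.

```python
def central_moment(xs, order):
    """Average of (x - mean)^order over the sample (divisor n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)


def skewness(xs):
    """Moment coefficient of skewness; 0 for a symmetric sample."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment coefficient of kurtosis minus 3; 0 for a normal distribution."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric sample
right_tailed = [1, 1, 1, 2, 10]   # hypothetical sample with a long right tail
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```

A formal test would compare these coefficients with their sampling distributions under normality, which is beyond this sketch.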
Example Normal Q-Plot Test
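Estimation of the Population Parameters
We assume:
\[ X_{j,k} \sim N(\mu, \sigma) \]
Estimation of μ:
\[ \hat{\mu} = \tilde{X} \]
Estimation of σ_M:
\[ \hat{\sigma}_M = \frac{s_M}{K_s(n)} \quad \text{or} \quad \hat{\sigma}_M = \frac{R_M}{K_R(n)} \]
Estimation of σ_p:
\[ \hat{\sigma}_p = s_p \]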