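Applied Mathematics in Chemical Engineering (化工應用數學)
Instructor: 郭修伯
Lecture 2: Analysis of Experimental Data
Degree of experimental accuracy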

The degree of accuracy sought in any investigation should
depend upon the projected use of the results, and the
accuracy of the required data and calculations should be
consistent with the desired accuracy in the results.

It is desirable to complete the investigation and obtain the
required accuracy with a minimum of time and expense.

The accuracy of a number representing the value of a
quantity is the degree of concordance between this
number and the number that represents the true value of
the quantity; it may be expressed in either absolute or
relative terms.
Sources of error

Accidental errors of measurement
– Such errors are inevitable in all measurements. They result from small,
unavoidable errors of observation caused by more or less fortuitous
variations in the sensitivity of the measuring instrument and in the keenness
of the observer's senses. (For example, using an inaccurate instrument A to
calibrate B, and then measuring with B's erroneous calibration curve.)

Precision and constant errors
– A result may be extremely precise and at the same time highly inaccurate.
– Constant errors can be detected only by performing the measurement with
a number of different instruments and, if possible, by several independent
methods and observers. (For example, using an inaccurate instrument, or
taking samples at an unrepresentative location.)

Errors of method
– These arise from approximations and assumptions made in the
theoretical development of an equation used to calculate the desired result.
(For example, using a wrong assumption in the calculation.)
Variance and distribution of random errors



If an experimental measurement is repeated a number of times,
the recorded values of the measured quantity almost invariably
differ from one another.
The data so obtained may be used for two purposes:
– to evaluate the precision of the measurement
– to obtain an estimate of the probability that the mean of the
measurements differs from the true value of the measured
quantity by some specified amount
The “scatter” of the repeated measurements of the quantity is
commonly reported in terms of the “variance” or “standard
deviation” of the sets of measurements.
Sample variance and standard deviation

Sample mean
$$\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k$$

Sample variance
$$\sigma_n^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \bar{x})^2$$

Sample standard deviation
$$\sigma_n = \sqrt{\sigma_n^2}$$
Population variance and standard deviation

Population mean
$$\mu = \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} x_k$$

Population variance
$$\sigma^2 = \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} (x_k - \mu)^2$$

Population standard deviation
$$\sigma = \sqrt{\sigma^2}$$
Population and sample
[Figure: a sample (Sample 1) drawn from the population]
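
The sample mean, the sample variance (divisor n) and the sample standard deviation defined on the preceding slides can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `sample_stats` is a hypothetical choice, and the readings are the Procedure 1 values from the t test example later in this lecture.

```python
import math


def sample_stats(values):
    """Return (mean, variance with divisor n, standard deviation) of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance as defined on the slide: divide by n, not n - 1.
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance, math.sqrt(variance)


# Procedure 1 readings from the t test example in this lecture.
readings = [55.3, 56.9, 55.8, 57.3, 57.7]
mean, variance, std_dev = sample_stats(readings)
print(f"mean = {mean:.2f}, variance = {variance:.3f}, std dev = {std_dev:.3f}")
```

The population quantities are limits as n grows without bound, so they cannot be computed from a finite list; the code above only covers the sample definitions.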
Normal frequency distribution

If the variations in x in an infinite data set are random, then, as Gauss
first showed, the distribution of values of x about the population mean
$\mu$ is given by
$$f = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\frac{x-\mu}{\sigma}\right]^2\right\}$$
– f is the frequency, or probability of occurrence, of a value of magnitude x.
Normal frequency distribution

The probability that a single measurement will give a value lying between
x - dx/2 and x + dx/2 is
$$f\,dx = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\frac{x-\mu}{\sigma}\right]^2\right\} dx$$

The probability is less than 5% that a single measurement of x will differ
from the population mean $\mu$ by more than twice the standard deviation,
i.e. by more than $\pm 2\sigma$.

The range $\pm 2\sigma$ is frequently called the “95 percent confidence
belt on x”.
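
The 5% figure can be checked numerically. The sketch below integrates the normal distribution in closed form through the error function, using the standard identity P(|x - mean| > z·sigma) = 1 - erf(z/√2); the function name `prob_outside` is a hypothetical choice made for this illustration.

```python
import math


def prob_outside(z):
    """Probability that a normally distributed value lies more than
    z standard deviations away from the population mean."""
    return 1.0 - math.erf(z / math.sqrt(2.0))


p_two_sigma = prob_outside(2.0)
print(f"P(|x - mean| > 2 sigma) = {p_two_sigma:.4f}")
```

The result is a little under 0.05, which is why the two-standard-deviation range is described as a 95 percent belt rather than an exact one.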
More about variance...

The sample mean is the best estimate of the population mean.
The sample variance is not the best estimate of the population variance. A
better estimate, the sample estimate of the population variance, is given by
$$s^2 = \frac{1}{n-1}\sum_{k=1}^{n} (x_k - \bar{x})^2 = \frac{n}{n-1}\,\sigma_n^2$$
where $\sigma_n^2$ is the sample variance computed with divisor n.
[Figure: sample and population]
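
The relation between the two variance estimates can be illustrated with Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n - 1. This is only a sketch; the readings are the Procedure 1 values from the t test example in this lecture, reused here as sample data.

```python
import statistics

readings = [55.3, 56.9, 55.8, 57.3, 57.7]
n = len(readings)

variance_n = statistics.pvariance(readings)         # divisor n
variance_n_minus_1 = statistics.variance(readings)  # divisor n - 1

# The two estimates differ by the factor n / (n - 1).
ratio = variance_n_minus_1 / variance_n
print(f"divisor n: {variance_n:.3f}, divisor n-1: {variance_n_minus_1:.3f}, ratio: {ratio:.3f}")
```

For five readings the factor is 5/4, so the correction is far from negligible for small samples.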
Number of measurements

By how much can the mean of n measurements be expected to differ from
the best value of the population mean?
Population: $\mu$, $\sigma^2$
Set 1, k measurements: $\bar{x}_1$, $s_1^2$
Set 2, k measurements: $\bar{x}_2$, $s_2^2$
… n sets
Estimate of the mean of the set of means: $\bar{x}_m$
Estimate of the variance of the set of means: $s_m^2$, which is not the same quantity as the within-set variance $s_i^2$
Sample variance of the mean

The grand mean (the sample's estimate of the population mean)
$$\bar{x}_m = \frac{1}{n}\sum_{i=1}^{n} \bar{x}_i$$

The sample estimate of the variance of the set of means (the sample's
estimate of the population variance)
$$s_m^2 = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{x}_i - \bar{x}_m)^2$$

The variance of the set of means may also be estimated from a single set
of k measurements:
$$s_m^2 = \frac{s^2}{k}$$
[Figure: sample and population]
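
As a worked instance of the single-set estimate, the sketch below computes s² (divisor k - 1) and then s²/k for one set of five readings. The readings are the Procedure 1 values from the t test example in this lecture; the variable names are illustrative only.

```python
import statistics

readings = [55.3, 56.9, 55.8, 57.3, 57.7]
k = len(readings)

s_squared = statistics.variance(readings)  # estimate of the population variance
s_m_squared = s_squared / k                # estimate of the variance of the mean
print(f"s^2 = {s_squared:.3f}, s_m^2 = {s_m_squared:.3f}")
```

The variance of the mean is smaller than the variance of a single reading by the factor k, which is the quantitative reason for repeating a measurement.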
Confidence limits for small samples

The aim is to associate the magnitude of the deviations of $\bar{x}_i$ from
the population mean $\mu$ with the probability of occurrence of such
deviations.

It is known that if the sample set contains at least 20 entries, the error
introduced by the single-set estimate on the previous slide is not serious:
$$s_m^2 = \frac{s^2}{k} \approx \sigma_m^2$$
For smaller samples, however, $s^2/k$ is not an adequate estimate of
$\sigma_m^2$.
Student’s t test

The solution to the problem of a small number of entries was first pointed
out in Biometrika, Vol. VI, 1908, by W. S. Gosset, who signed his article
“STUDENT”.
The dimensionless quantity of particular interest in a confidence-limit
analysis is called “Student’s t”:
$$t = \frac{\bar{x}_i - \mu}{s_m}$$
It involves estimates obtained from a sample of finite size.
Student’s t test

t is the difference between the measured sample mean and the true population
mean, divided by the sample estimate of the standard deviation of the
population of means.
“Student” derived the frequency distribution for t:
$$f_t\,dt = C_f \left(1 + \frac{t^2}{f}\right)^{-(f+1)/2} dt$$
– $C_f$ is a function of f only
– f is the number of “degrees of freedom”, defined as the number of values
used to calculate the means on which t is based, less the
number of means so calculated.
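
The frequency distribution of t can be evaluated directly. The slide does not give $C_f$; the sketch below assumes the standard normalizing constant $C_f = \Gamma((f+1)/2) / (\sqrt{f\pi}\,\Gamma(f/2))$ and integrates the distribution with Simpson's rule to find the probability that t falls outside given limits. The function names and the step count are hypothetical choices for this illustration.

```python
import math


def t_density(t, f):
    """Student's t frequency distribution with f degrees of freedom."""
    # Assumed standard normalizing constant C_f (not stated on the slide).
    c_f = math.gamma((f + 1) / 2) / (math.sqrt(f * math.pi) * math.gamma(f / 2))
    return c_f * (1 + t * t / f) ** (-(f + 1) / 2)


def prob_outside(t_limit, f, steps=2000):
    """P(|t| > t_limit), using Simpson's rule on [-t_limit, t_limit].

    steps must be even.
    """
    h = 2 * t_limit / steps
    total = t_density(-t_limit, f) + t_density(t_limit, f)
    for j in range(1, steps):
        weight = 4 if j % 2 else 2
        total += weight * t_density(-t_limit + j * h, f)
    return 1.0 - total * h / 3


# f = 4 and t = 2.776 are the values used in the worked example of this lecture.
p_outside = prob_outside(2.776, 4)
print(f"P(|t| > 2.776) for f = 4: {p_outside:.4f}")
```

A t table is simply this calculation inverted: for each f it lists the limit on t that leaves a chosen probability outside.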
Student’s t test

The distribution function of t is used to calculate the probability of a
given value of t for a given sample size (that is, for given degrees of
freedom f).
Probability calculations of this kind have been carried out over a wide
range of conditions, and the results are tabulated in the handout given in
the course.
[Figure: t linking the sample to the population]
t test example

Two methods (procedures 1 and 2) were used to measure a quantity, with the
results tabulated here.

Quantity                     Procedure 1           Procedure 2
X_i1                         55.3                  52.6
X_i2                         56.9                  54.3
X_i3                         55.8                  58.0
X_i4                         57.3                  52.7
X_i5                         57.7                  60.0
Mean X̄_i                     56.6                  55.5
Σ(X_ik - X̄_i)²               4.1                   43.9
s²(X_i)                      1.0                   11
s_m²                         0.2                   2.2
95% confidence limits        55.3 ≤ X_1 ≤ 57.9     51.3 ≤ X_2 ≤ 59.9

It is desired to use these data to obtain the following information:
t test example (continued)

– The confidence limits to be assigned to the results of procedures 1 and 2
– The significance of the difference between the mean values of the results
of procedures 1 and 2
– The “best value” to be assigned to the sample analysis
– The confidence limits of the best value
t test example (continued)

Confidence limits
– The sample mean of procedure 1 is easily found to be 56.6.
– The external probability limit on t will be arbitrarily set at 0.05,
corresponding to 95 percent confidence limits.
– Five measured values are used and one mean is calculated, so the
degrees of freedom are f = 5 - 1 = 4.
– From the “t table” for f = 4, values of t lying outside $\pm 2.776$
are only 0.05 probable.
– The 95 percent confidence limits on t are:
$$-2.776 \le t \le 2.776$$
– The variance of the mean is
$$s_{m,1}^2 = \frac{s_1^2}{k} = \frac{1}{k}\cdot\frac{\sum_{i=1}^{k} (x_{1i} - \bar{x}_1)^2}{k-1} = \frac{4.1/4}{5} \approx 0.20$$
– The 95 percent confidence limits on the population mean $\mu_1$ of procedure 1 are then
$$\bar{x}_1 - \mu_1 = \pm 2.776\, s_{m,1} = \pm 2.776\sqrt{0.20} \approx \pm 1.3$$
$$55.3 \le \mu_1 \le 57.9$$
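
The confidence-limit calculation for procedure 1 can be reproduced from the raw readings. In this sketch the t value 2.776 is taken from the example rather than computed, and the function name `confidence_limits` is a hypothetical choice.

```python
import math
import statistics

T_95_F4 = 2.776  # t-table value for f = 4 at the 0.05 level, as quoted in the example


def confidence_limits(values, t_value):
    """Return (lower, upper) confidence limits on the mean of one set of readings."""
    k = len(values)
    mean = statistics.mean(values)
    # Standard deviation of the mean: sqrt(s^2 / k), with s^2 using divisor k - 1.
    s_m = math.sqrt(statistics.variance(values) / k)
    half_width = t_value * s_m
    return mean - half_width, mean + half_width


procedure_1 = [55.3, 56.9, 55.8, 57.3, 57.7]
lower_1, upper_1 = confidence_limits(procedure_1, T_95_F4)
print(f"{lower_1:.1f} <= mean <= {upper_1:.1f}")
```

Because the code carries full precision through every step, intermediate values can differ in the last digit from the rounded figures on the slide.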
t test example (continued)

The significance of the difference
– The mean value obtained by procedure 1 is 56.6, with 95 percent
confidence limits of $55.3 \le \mu_1 \le 57.9$.
– The mean value obtained by procedure 2 is 55.5, with 95 percent
confidence limits of $51.3 \le \mu_2 \le 59.9$.
– The sample means are different, which might be taken to indicate a
systematic difference or bias between the two methods of analysis.
– The 95 percent confidence limit analysis shows, however, that the mean of
sample 2 lies within the confidence limits of sample 1, and vice versa.
– It may be concluded that the difference between the two means has no
statistical significance.
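
The comparison described above amounts to checking that each mean lies inside the other procedure's confidence limits. A minimal sketch, using the means and limits quoted in the example as inputs (the helper name `within` is hypothetical):

```python
def within(value, limits):
    """True if value lies inside the closed interval given by limits."""
    lower, upper = limits
    return lower <= value <= upper


# Means and 95 percent limits as quoted in the example.
mean_1, limits_1 = 56.6, (55.3, 57.9)
mean_2, limits_2 = 55.5, (51.3, 59.9)

no_significant_difference = within(mean_2, limits_1) and within(mean_1, limits_2)
print("difference significant:", not no_significant_difference)
```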
t test example (continued)

The best value?
– Since the difference between the mean values obtained by procedure 1 and
procedure 2 is not statistically significant, the best value to be assigned
to the sample analysis is a weighted combination of the two mean values.
– If the difference between the means had been significant, it would have
been concluded that one or both of the procedures were affected by
non-random factors (errors of method or bias). In that case, the best value
cannot be estimated.
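t test example (continued)

The best value obtainable from a series of sets of measurements exhibiting
statistically equivalent means is given by
$$x_{\text{best value}} = \frac{\sum_{i=1}^{n} \dfrac{(k)_i}{\sigma_i^2}\,\bar{x}_i}{\sum_{i=1}^{n} \dfrac{(k)_i}{\sigma_i^2}}$$
where $(k)_i$ is the number of measurements in the ith set and $\sigma_i^2$ is the population variance of that set.
With the weights
$$W_i = \frac{(k)_i}{\sigma_i^2} \approx \frac{(k)_i}{s_i^2} = \frac{1}{s_{m,i}^2}$$
the best value for the example is
$$x_{\text{best value}} = \frac{\sum_{i=1}^{n} W_i\,\bar{x}_i}{\sum_{i=1}^{n} W_i} = \frac{56.6/0.2 + 55.5/2.2}{1/0.2 + 1/2.2} = 56.6$$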
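
The weighted combination can be sketched as a short function. The weights are taken as $1/s_{m,i}^2$, as in the example, and the inputs are the rounded means and variances of the mean quoted there; the function name `best_value` is a hypothetical choice.

```python
def best_value(means, variances_of_means):
    """Weighted mean of several set means, with weights W_i = 1 / s_m,i^2."""
    weights = [1.0 / v for v in variances_of_means]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


# Rounded means and variances of the mean for procedures 1 and 2.
x_best = best_value([56.6, 55.5], [0.2, 2.2])
print(f"best value = {x_best:.2f}")
```

With these rounded inputs the sketch evaluates to about 56.5, within rounding of the value quoted in the example. The more precise procedure dominates the result because its weight is eleven times larger.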
t test example (continued)

The confidence limits of the best value
– The variance of the best value is needed:
$$\sigma_{m,\text{best value}}^2 = \frac{1}{\sum_{i=1}^{n} \left[(k)_i/\sigma_i^2\right]} \approx \frac{1}{\sum_{i=1}^{n} \left(1/s_{m,i}^2\right)} = \frac{1}{1/0.2 + 1/2.2} = 0.18$$
– The degrees of freedom: f = 10 - 2 = 8
– The t limits are $\pm 2.3$
– Consequently, the 95 percent confidence limits on the best value are
$$\pm 2.3\sqrt{0.18} = \pm 1.0$$
$$55.6 \le \mu \le 57.6$$
Other tests (L test)

L test:
– A calculation to determine the probability that the samples represent
normal populations exhibiting the same population variance $\sigma^2$,
without regard to the population means
Other tests (F test)

F test:
– In a single set of data, two types of error are possible: random errors
and errors of method or bias. The magnitude of the random errors is
estimated by the within-sample or error variance. However, no estimate of the
error due to method is possible from a single set of data. Suppose that
other sets of data are available which are likewise subject to both random
and method errors. For each data set, the within-sample or error variance
may be calculated.
– The statistic F is the ratio of the variance which contains both random
and method error to the variance which includes random errors only.
– The magnitude of F is a measure of the importance of errors of method
which differ from one data set to the next.
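
The slides do not give a formula for F. One common way to form such a ratio, assumed here, is the one-way analysis-of-variance statistic: the between-set mean square (which carries both random and method error) divided by the pooled within-set variance (random error only). The sketch below applies it to the two procedures of the t test example, assuming sets of equal size; the function name `f_ratio` is hypothetical.

```python
import statistics


def f_ratio(sets):
    """Between-set mean square divided by pooled within-set variance.

    Assumes every set has the same number of measurements.
    """
    n = len(sets)     # number of data sets
    k = len(sets[0])  # measurements per set
    means = [statistics.mean(s) for s in sets]
    # k times the variance of the set means: random plus method error.
    between = k * statistics.variance(means)
    # Average of the within-set variances: random error only.
    within = sum(statistics.variance(s) for s in sets) / n
    return between / within


procedure_1 = [55.3, 56.9, 55.8, 57.3, 57.7]
procedure_2 = [52.6, 54.3, 58.0, 52.7, 60.0]
f_value = f_ratio([procedure_1, procedure_2])
print(f"F = {f_value:.2f}")
```

A ratio below 1, as here, means the scatter between the set means is no larger than random error alone would produce, which agrees with the conclusion of the t test example.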
Method of least squares

Recall: the best straight line is the one for which the sum of the squares
of the error terms is a minimum:
$$Y = a + b(x - \bar{x})$$
The best measure of the precision with which the points fit the line is the
variance of estimate:
$$s_e^2(y_i) = \frac{\sum_{i=1}^{n} (Y_i - y_i)^2}{n-2}$$
where n - 2 is the number of degrees of freedom (two constants, a and b, are fitted).
The estimate of the error variance of $Y_i$ is
$$s_e^2(Y_i) = s_e^2[a + b(x_i - \bar{x})] = s_e^2(a) + (x_i - \bar{x})^2 s_e^2(b) = s_e^2(y_i)\left[\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right]$$
The confidence limits of $Y_i$ are
$$\pm t\,\sqrt{s_e^2(Y_i)}$$
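
The least-squares quantities above can be sketched in plain Python. The data points below are hypothetical values invented for this illustration, and the function names are likewise illustrative; the fit uses the standard least-squares results a = mean of y and b = Σ(x - x̄)y / Σ(x - x̄)² for a line written as Y = a + b(x - x̄).

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = a + b (x - x_mean).

    Returns (a, b, x_mean, sxx, se2), where sxx is the sum of squared
    deviations of x and se2 is the variance of estimate (divisor n - 2).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    a = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sum((x - x_mean) * y for x, y in zip(xs, ys)) / sxx
    residual_ss = sum((a + b * (x - x_mean) - y) ** 2 for x, y in zip(xs, ys))
    return a, b, x_mean, sxx, residual_ss / (n - 2)


def error_variance_of_fit(x, n, x_mean, sxx, se2):
    """Estimated error variance of the fitted value Y at position x."""
    return se2 * (1.0 / n + (x - x_mean) ** 2 / sxx)


# Hypothetical data, not taken from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, x_mean, sxx, se2 = fit_line(xs, ys)
var_at_mean = error_variance_of_fit(x_mean, len(xs), x_mean, sxx, se2)
var_at_end = error_variance_of_fit(xs[-1], len(xs), x_mean, sxx, se2)
print(f"a = {a:.3f}, b = {b:.3f}, s_e^2 = {se2:.4f}")
```

The error variance of the fitted value is smallest at the mean of x and grows toward the ends of the range, so the confidence belt around the line widens away from the centre of the data.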