Numerical Accuracy - Texas A&M University
CHAPTER 4
Round-Off and Truncation Errors
Numerical Accuracy
Truncation error : Method dependent
Errors which result from using an approximation
rather than an exact procedure
$f(x_i + h) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots$
Round-off error : Machine dependent
Errors which result from not being able to
adequately represent the true value
Result from using an approximate number to
represent exact number
$\pi \approx 3.1416$, $e \approx 2.71828$
Taylor Series Expansion
Construction of finite-difference formula
Numerical accuracy: discretization error
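Base point: x = a
$f(x) = c_0 + c_1 (x-a) + c_2 (x-a)^2 + c_3 (x-a)^3 + \cdots \;\Rightarrow\; c_0 = f(a)$
$f'(x) = c_1 + 2 c_2 (x-a) + 3 c_3 (x-a)^2 + \cdots \;\Rightarrow\; c_1 = f'(a)$
$f''(x) = 2 c_2 + 6 c_3 (x-a) + \cdots \;\Rightarrow\; c_2 = f''(a)/2!$
$f'''(x) = 6 c_3 + \cdots \;\Rightarrow\; c_3 = f'''(a)/3!$
$f^{(m)}(x) = (m!)\, c_m + (m+1)\, m\, (m-1) \cdots 2\; c_{m+1} (x-a) + \cdots \;\Rightarrow\; c_m = f^{(m)}(a)/m!$
$f(x) = \sum_{m=0}^{\infty} c_m (x-a)^m = \sum_{m=0}^{\infty} \frac{f^{(m)}(a)}{m!} (x-a)^m$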
Taylor series expansions
$f(x_{i+1}) = f(x_i + h) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots$
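As a minimal sketch of how this expansion leads to a finite-difference formula: truncating after the first-derivative term gives $f'(x_i) \approx [f(x_i + h) - f(x_i)]/h$, with a leading error term of $(h/2) f''(x_i)$. The script below checks this numerically. The test function exp(x), the base point 0, and the step sizes are illustrative assumptions, not taken from the slides.

```matlab
% Forward-difference approximation of f'(x) from the truncated Taylor series.
% Illustrative assumptions: f = exp, base point xi = 0, steps h = 0.1 and 0.05.
f  = @(x) exp(x);
xi = 0;
h  = [0.1 0.05];

dfdx_approx = (f(xi + h) - f(xi)) ./ h;   % first-order forward difference
dfdx_true   = exp(xi);                    % exact derivative of exp at xi
err         = abs(dfdx_approx - dfdx_true);

% Halving h should roughly halve the truncation error (error is O(h)).
ratio = err(1) / err(2);
fprintf('h = %.3f  error = %.6f\n', [h; err]);
fprintf('error ratio = %.3f\n', ratio);
```

An error ratio close to 2 is what a first-order truncation error would predict.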
Taylor Series and Remainder
Taylor series (base point x = a)
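$f(x) = \sum_{m=0}^{\infty} \frac{f^{(m)}(a)}{m!} (x-a)^m$
$\quad = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + R_n$
Remainder
$R_n = \frac{f^{(n+1)}(\xi)}{(n+1)!} (x-a)^{n+1}$, where $\xi$ lies between $a$ and $x$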
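The remainder can be used to bound the truncation error. The sketch below compares the actual error of an nth-order Taylor polynomial for exp(x) about a = 0 with a bound obtained from $R_n$. The choices x = 1 and n = 5 are assumptions made only for illustration.

```matlab
% Compare the actual truncation error of an nth-order Taylor polynomial
% for exp(x) about a = 0 with a bound derived from the remainder R_n.
% Illustrative assumptions: x = 1, n = 5.
x = 1;
n = 5;

m      = 0:n;
approx = sum(x.^m ./ factorial(m));        % Taylor polynomial of order n
actual_error = abs(exp(x) - approx);

% R_n = exp(xi) * x^(n+1) / (n+1)!  with 0 <= xi <= x, so exp(xi) <= exp(x).
bound = exp(x) * x^(n+1) / factorial(n+1);

fprintf('approx = %.6f, error = %.2e, bound = %.2e\n', approx, actual_error, bound);
```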
Truncation Error
Taylor series expansion
$f(x_{i+1}) = f(x_i + h) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots$
Example (higher-order terms truncated)
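$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \cdots$
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots$
($x_i = 0$, $h = x$, $x_{i+1} = x$)
Power series
Polynomials
The function becomes more nonlinear as m increases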
A MATLAB Script
Filename: fun_exp.m
function s = fun_exp(x)
% Evaluate exponential function exp(x)
% by Taylor series expansion
% f(x) = 1 + x + x^2/2! + x^3/3! + ... + x^n/n!
% x is passed as an argument; the order n is read from the keyboard
n = input('enter the order n = ');
term = 1; s = term;        % zeroth-order term
for i = 1 : n
    term = term*x/i;       % x^i/i! built from the previous term
    s = s + term;
end
MATLAB For Loops
Filename: fun_exp2.m
function [term, s] = fun_exp2(x)
% Evaluate exponential function exp(x)
% by Taylor series expansion, keeping every term and partial sum
% f(x) = 1 + x + x^2/2! + x^3/3! + ... + x^n/n!
n = input('enter the order n = ');
term = zeros(1, n+1); s = zeros(1, n+1);   % preallocate
term(1) = 1; s(1) = term(1);
for i = 1 : n
    term(i+1) = term(i)*x/i;
    s(i+1) = s(i) + term(i+1);
end
% Display the results
disp('     i     term(i)     sum(i)')
a = 1:n+1; disp([a' term' s'])
Truncation Error
 n        term          sum
 0       1.0000       1.0000
 1      10.0000      11.0000
 2      50.0000      61.0000
 3     166.6667     227.6667
 4     416.6667     644.3334
 5     833.3334    1477.6667
 6    1388.8890    2866.5557
 7    1984.1272    4850.6826
 8    2480.1589    7330.8418
 9    2755.7322   10086.5742
10    2755.7322   12842.3066
11    2505.2112   15347.5176
12    2087.6760   17435.1934
13    1605.9045   19041.0977
14    1147.0746   20188.1719
15     764.7164   20952.8887
16     477.9478   21430.8359
17     281.1458   21711.9824
18     156.1921   21868.1738
19      82.2064   21950.3809
20      41.1032   21991.4844
21      19.5729   22011.0566
22       8.8968   22019.9531
23       3.8682   22023.8223
24       1.6117   22025.4336
25       0.6447   22026.0781
26       0.2480   22026.3262
27       0.0918   22026.4180
28       0.0328   22026.4512
29       0.0113   22026.4629
30       0.0038   22026.4668
31       0.0012   22026.4688
32       0.0004   22026.4688
33       0.0001   22026.4688
34       0.0000   22026.4688
35       0.0000   22026.4688
36       0.0000   22026.4688
37       0.0000   22026.4688
38       0.0000   22026.4688
39       0.0000   22026.4688
40       0.0000   22026.4688
$x = 10$, $e^x = 22026.4658$
Truncation Error
 n            term             sum
 0       1.0000000       1.0000000
 1     -10.0000000      -9.0000000
 2      50.0000000      41.0000000
 3    -166.6666718    -125.6666718
 4     416.6666870     291.0000000
 5    -833.3333740    -542.3333740
 6    1388.8890381     846.5556641
 7   -1984.1271973   -1137.5715332
 8    2480.1589355    1342.5874023
 9   -2755.7321777   -1413.1447754
10    2755.7321777    1342.5874023
11   -2505.2111816   -1162.6237793
12    2087.6760254     925.0522461
13   -1605.9045410    -680.8522949
14    1147.0745850     466.2222900
15    -764.7164307    -298.4941406
16     477.9477539     179.4536133
17    -281.1457520    -101.6921387
18     156.1920776      54.4999390
19     -82.2063599     -27.7064209
20      41.1031799      13.3967590
21     -19.5729427      -6.1761837
22       8.8967924       2.7206087
23      -3.8681707      -1.1475620
24       1.6117378       0.4641758
25      -0.6446951      -0.1805193
26       0.2479596       0.0674404
27      -0.0918369      -0.0243965
28       0.0327989       0.0084024
29      -0.0113100      -0.0029076
30       0.0037700       0.0008624
31      -0.0012161      -0.0003537
32       0.0003800       0.0000263
33      -0.0001152      -0.0000889
34       0.0000339      -0.0000550
35      -0.0000097      -0.0000647
36       0.0000027      -0.0000620
37      -0.0000007      -0.0000627
38       0.0000002      -0.0000625
39       0.0000000      -0.0000626
40       0.0000000      -0.0000626
$x = -10$, $e^x = 0.45399 \times 10^{-4}$
How to reduce error? $e^{-10} = 1 / 22026.4658$
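The following sketch compares the two routes to $e^{-10}$: summing the alternating series directly, and summing the series for x = +10 and taking the reciprocal. Single precision and n = 40 terms are assumptions chosen to resemble the tabulated runs. The variable names are illustrative.

```matlab
% Series evaluation of exp(-10) in single precision, two ways.
% Assumption: single precision and n = 40 terms, chosen to mirror the tables.
n = 40;

% (1) Direct summation with x = -10: large alternating terms cancel.
x = single(-10); term = single(1); direct = term;
for i = 1:n
    term   = term*x/i;
    direct = direct + term;
end

% (2) Sum the series for x = +10 (all terms positive), then take 1/sum.
x = single(10); term = single(1); s = term;
for i = 1:n
    term = term*x/i;
    s    = s + term;
end
recip = 1/s;

exact      = exp(-10);                       % double-precision reference
rel_direct = abs(double(direct) - exact)/exact;
rel_recip  = abs(double(recip)  - exact)/exact;
fprintf('direct = %g (rel. error %.2e)\n', direct, rel_direct);
fprintf('1/sum  = %g (rel. error %.2e)\n', recip,  rel_recip);
```

The reciprocal route avoids subtracting large, nearly equal partial sums, so round-off in the individual terms is not amplified.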
Round-off Errors
Computers can represent numbers to a
finite precision
Most important for real numbers; integer math can be exact, but limited in range
How do computers represent numbers?
Binary representation of the integers and
real numbers in computer memory
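32 bits (23, 8, 1)
$2^8 = 256$
smallest: $.100\ldots00_{(2)} \times 2^{-128} = 0.14693 \times 10^{-38}$
largest: $.111\ldots11_{(2)} \times 2^{127} = 0.17014 \times 10^{39}$
64 bits (52, 11, 1)
$2^{11} = 2048$
smallest: $.100\ldots00_{(2)} \times 2^{-1024}$
largest: $.111\ldots11_{(2)} \times 2^{1023}$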
MATLAB uses double precision
Order of operation
Addition problem:
$0.99 + 0.0044 + 0.0042 = 0.9986$ (exact result)
with 3-digit arithmetic:
$(0.99 + 0.0044) + 0.0042 = 0.994 + 0.0042 = 0.998$
$0.99 + (0.0044 + 0.0042) = 0.99 + 0.0086 = 0.999$
Round-off error
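The 3-digit addition example can be reproduced with a short sketch. MATLAB works in double precision, so the script emulates 3-significant-digit arithmetic with `round3`, a hypothetical helper defined here (not a built-in) that rounds every intermediate result to three significant digits.

```matlab
% Emulate 3-significant-digit arithmetic to show that the order of
% additions changes the result. round3 is a hypothetical helper, not a built-in.
round3 = @(v) round(v ./ 10.^(floor(log10(abs(v))) - 2)) .* 10.^(floor(log10(abs(v))) - 2);

a = 0.99; b = 0.0044; c = 0.0042;

left  = round3(round3(a + b) + c);   % (a + b) + c
right = round3(a + round3(b + c));   % a + (b + c)

fprintf('(a + b) + c = %.3f\n', left);
fprintf('a + (b + c) = %.3f\n', right);
```

Adding the two small numbers first keeps digits that would otherwise be rounded away against the large operand.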
Cancellation error
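$x^2 - bx + 1 = 0$
$x_1 = \frac{b + r}{2}, \quad x_2 = \frac{b - r}{2}, \quad r = \sqrt{b^2 - 4}$
If b is large, r is close to b
Difference of two numbers very close to each other: potential for greater error!
Rationalize:
$x_2 = \frac{b - r}{2} = \frac{(b - r)(b + r)}{2(b + r)} = \frac{b^2 - r^2}{2(b + r)} = \frac{4}{2(b + r)} = \frac{2}{b + r}$
Try b = 97: $x^2 - 97x + 1 = 0$ (r = 96.9794)
$x_2$ (3 sig. figs.)
exact: 0.01031
standard: 0.01050
rationalized: 0.01031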
Corresponding to “cancellation, critical arithmetic”
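A minimal sketch of the cancellation example for b = 97, assuming single precision is used so that the loss of digits is visible. The double-precision value serves only as a reference, and the variable names are illustrative.

```matlab
% Smaller root of x^2 - b*x + 1 = 0 computed two ways in single precision.
% Assumption: single precision is used to make the cancellation visible.
b = single(97);
r = sqrt(b^2 - 4);

x2_standard     = (b - r)/2;    % subtracts two nearly equal numbers
x2_rationalized = 2/(b + r);    % algebraically identical, no cancellation

% Double-precision reference value
x2_ref = 2/(97 + sqrt(97^2 - 4));

err_std = abs(double(x2_standard)     - x2_ref)/x2_ref;
err_rat = abs(double(x2_rationalized) - x2_ref)/x2_ref;
fprintf('standard:     %.8f (rel. error %.1e)\n', x2_standard, err_std);
fprintf('rationalized: %.8f (rel. error %.1e)\n', x2_rationalized, err_rat);
```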
Significant Figures
48.9 mph? 48.95 mph?
Significant Digits
The places which can be used with confidence
32-bit machine: 7 significant digits
64-bit machine: about 16 significant digits
Double precision: reduce round-off error, but
increase CPU time
$\pi \approx 3.1415926535\ 8979323846\ 2643$
$\sqrt{2} \approx 1.4142135623\ 7310$
$e \approx 2.7182818284\ 5904$
False Significant Figures
3.25/1.96 = 1.65816326530612... (from MATLAB)
But in practice only report 1.65 (chopping) or
1.66 (rounding)! Why??
Because we don't know what is beyond the
second decimal place
If the inputs were chopped, the quotient lies between:
3.259 / 1.960 = 1.66275510204082...
3.250 / 1.969 = 1.65058405281869...
If the inputs were rounded, the quotient lies between:
3.254 / 1.955 = 1.66445012787724...
3.245 / 1.964 = 1.65224032586558...
Accuracy and precision
Accuracy - How closely a measured or computed
value agrees with the true value
Precision - How closely individual measured or
computed values agree with each other
[Figure: target diagrams arranged by increasing accuracy ("More Accurate") and increasing precision ("More Precise")]
Accuracy is getting all your shots near the target.
Precision is getting them close together.
Numerical Errors
The difference between the true value and
the approximation
True value = approximation + true error
$E_t$ = true value - approximation = $x^* - x$
Relative error = true error / true value = $\frac{x^* - x}{x^*}$
or in percent
$\varepsilon_t = \frac{x^* - x}{x^*} \times 100\%$
Approximate Error
But the true value is not known
If we knew it, we wouldn’t have a problem
Use approximate error
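$\varepsilon_a = \frac{\text{approximate error}}{\text{approximation}} \times 100\%$
$\quad = \frac{\text{present approx.} - \text{previous approx.}}{\text{present approximation}} \times 100\%$
Relative error = $\frac{x_{new} - x_{old}}{x_{new}} \times 100\%$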
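The approximate error is typically used as a stopping criterion for an iterative calculation. The sketch below adds Taylor-series terms of exp(x) until the approximate percent relative error falls below a tolerance. The values x = 0.5, the tolerance `es` = 0.05 %, and the iteration cap are assumptions chosen for illustration.

```matlab
% Add Taylor-series terms of exp(x) until the approximate percent
% relative error falls below a tolerance.
% Illustrative assumptions: x = 0.5, tolerance es = 0.05 %, at most 50 terms.
x     = 0.5;
es    = 0.05;         % stopping tolerance in percent
maxit = 50;

x_old = 1;            % zeroth-order estimate
term  = 1;
for i = 1:maxit
    term  = term*x/i;
    x_new = x_old + term;
    ea = abs((x_new - x_old)/x_new)*100;   % approximate error, percent
    x_old = x_new;
    if ea < es
        break
    end
end

et = abs((exp(x) - x_new)/exp(x))*100;     % true error, for comparison
fprintf('terms = %d, estimate = %.6f, ea = %.4f%%, et = %.4f%%\n', i, x_new, ea, et);
```

The true error is computed here only because exp(x) is available as a reference. In a real problem only `ea` would be known.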
Number Systems
Base-10 (Decimal): 0,1,2,3,4,5,6,7,8,9
Base-8 (Octal): 0,1,2,3,4,5,6,7
Base-2 (Binary): 0,1 – off/on, close/open,
negative/positive charge
Other non-decimal systems
1 lb = 16 oz, 1 ft = 12 in, ½”, ¼”, …..
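Decimal System (base 10):
$5{,}129 = 5 \times 10^3 + 1 \times 10^2 + 2 \times 10^1 + 9 \times 10^0$
$0.3125 = 3 \times 10^{-1} + 1 \times 10^{-2} + 2 \times 10^{-3} + 5 \times 10^{-4}$
Binary System (base 2):
$101101 = 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 45$
$0.1011 = 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 1 \times 2^{-4} = \frac{11}{16}$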
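The positional expansions above can be checked with a few lines of MATLAB. This is a sketch with illustrative variable names; `bin2dec` is the built-in used as a cross-check for the integer case.

```matlab
% Convert the binary examples to decimal by summing digit * base^position.
int_digits  = [1 0 1 1 0 1];                  % 101101 (base 2)
int_value   = sum(int_digits .* 2.^(5:-1:0));

frac_digits = [1 0 1 1];                      % 0.1011 (base 2)
frac_value  = sum(frac_digits .* 2.^(-(1:4)));

fprintf('101101 (base 2) = %d\n', int_value);
fprintf('0.1011 (base 2) = %.4f\n', frac_value);

% Built-in cross-check for the integer part
check = bin2dec('101101');
```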
Integer Representation
Signed magnitude method
Use the first bit of a word to indicate the
sign – 0: negative (off), 1: positive (on)
Remaining bits are used to store a number
[Figure: a binary word whose first bit stores the sign (+) and whose remaining bits store the number]
off / on, close / open, negative / positive
Integer Representation
8-bit word
Bit positions: Sign | $2^6$ $2^5$ $2^4$ $2^3$ $2^2$ $2^1$ $2^0$ (Number)
smallest number: $0000000_{(2)} = 0_{(10)}$
largest number: $1111111_{(2)} = 127_{(10)}$
+/- 0000000 are the same, therefore we may use
“-0” to represent “-128”
Total numbers = $2^8$ = 256 (-128 to 127)
Integer Representation
16-bit word
$1 \times 2^{14} + 1 \times 2^{13} + \cdots + 1 \times 2^1 + 1 \times 2^0 = 32{,}767$
Range: -32,768 to 32,767
Overflow: > 32,767 (cannot represent 43,000
A&M students)
Underflow: < -32,768 (magnitude too large)
32-bit word
Range: -2,147,483,648 to 2,147,483,647
9 significant digits
Overflow: world population 6 billion
Underflow: budget deficit -$100 billion
Integer Operations
Integer arithmetic can be exact as long as
you don't get remainders in division
7/2 = 3 in integer math
or overflow the maximum integer
For an 8-bit computer max = 127 (or min = -128)
So 123 + 45 = overflow
and -74 * 2 = underflow
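MATLAB's built-in `int8` type can be used to try these cases. One caveat, stated here as a property of MATLAB rather than of the slides: MATLAB integer types saturate at the range limits instead of raising an error, and `/` on integers rounds to the nearest integer, so `idivide` is used below to get truncating division.

```matlab
% MATLAB's built-in int8 type (8-bit signed integer, range -128 to 127).
% Note: MATLAB saturates on overflow rather than raising an error, and
% integer division with "/" rounds to nearest; idivide truncates by default.
lo = intmin('int8');                 % -128
hi = intmax('int8');                 %  127

over  = int8(123) + int8(45);        % true sum 168 exceeds 127
under = int8(-74) * int8(2);         % true product -148 is below -128
q     = idivide(int8(7), int8(2));   % truncating division

fprintf('range: %d to %d\n', lo, hi);
fprintf('123 + 45 -> %d, -74 * 2 -> %d, 7/2 -> %d\n', over, under, q);
```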
Floating-Point Representation
Real numbers (also called floating-point
numbers) are represented differently
For fractions or very large numbers
Store as: sign, signed exponent, mantissa
sign is 1 or 0 for negative or positive
exponent is the power (positive or negative) to which the base is raised
mantissa contains significant digits
Floating-Point Representation
Word layout: sign of number | signed exponent $e_1 e_2 \ldots e_m$ | mantissa $d_1 d_2 d_3 \ldots d_p$
$N = \pm .d_1 d_2 d_3 \ldots d_p \times B^e = m B^e$
m: mantissa
B: Base of the number system
e: “signed” exponent
Note: the mantissa is usually “normalized”
if the leading digit is zero
Integer representation
Floating-point number representation
Decimal Representation
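8-digit word
Digit positions: sign | signed exponent ($\pm$, $10^1$, $10^0$) | number ($10^{-1}$, $10^{-2}$, $10^{-3}$, $10^{-4}$)
1|095|1467 (base: B = 10)
mantissa: $m = -(1 \times 10^{-1} + 4 \times 10^{-2} + 6 \times 10^{-3} + 7 \times 10^{-4}) = -0.1467$
signed exponent: $e = +(9 \times 10^1 + 5 \times 10^0) = 95$
$10951467_{(10)} = m B^e = -0.1467 \times 10^{95}$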
Floating-Point Representation
8-bit word (without normalization)
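Bit positions: sign | signed exponent ($\pm$, $2^1$, $2^0$) | number ($2^{-1}$, $2^{-2}$, $2^{-3}$, $2^{-4}$)
0|111|0101 (base: B = 2)
mantissa: $m = +(0 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3} + 1 \times 2^{-4}) = 5/16$
signed exponent: $e = -(1 \times 2^1 + 1 \times 2^0) = -3$
$01110101_{(2)} = m B^e = (5/16) \times 2^{-3} = 5/128$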
Normalization
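$1\ \text{in}^2 = (1/144)\ \text{ft}^2 = 0.006944\ \text{ft}^2$ (Less accurate)
$1\ \text{in}^2 = 0.694444 \times 10^{-2}\ \text{ft}^2$ (Normalization)
Remove the leading zero by lowering the exponent
(d1 = 1 for all numbers)
$\frac{1}{B} \le m < 1$
base 10: $10^{-1} \le m < 1$, i.e. $0.1 \le m < 1$
base 2: $\frac{1}{2} \le m < 1$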
if m < 1/2, multiply by 2 to remove the leading 0
floating-point allow fractions and very large numbers to
be represented, but take up more memory and CPU time
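Base-2 normalization can be inspected directly in MATLAB: the two-output form of the built-in `log2` returns a mantissa f with 0.5 <= |f| < 1 and an integer exponent e such that x = f * 2^e. The sketch below applies it to 5/128, the value of the unnormalized 8-bit example.

```matlab
% Two-output log2 returns a normalized mantissa f (0.5 <= |f| < 1)
% and an integer exponent e such that x = f * 2^e.
x = 5/128;                 % the unnormalized example (5/16) * 2^-3
[f, e] = log2(x);

fprintf('%g = %g * 2^%d\n', x, f, e);
```

Here the mantissa 5/16 (less than 1/2) is doubled to 5/8 and the exponent is lowered from -3 to -4.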
Binary Representation
8-bit word (with normalization)
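Bit positions: sign | signed exponent ($\pm$, $2^1$, $2^0$) | number ($2^{-1}$, $2^{-2}$, $2^{-3}$, $2^{-4}$)
1|011|1001 (base: B = 2)
mantissa: $m = -(1 \times 2^{-1} + 0 \times 2^{-2} + 0 \times 2^{-3} + 1 \times 2^{-4}) = -9/16$
signed exponent: $e = +(1 \times 2^1 + 1 \times 2^0) = 3$
$10111001_{(2)} = m B^e = -(9/16) \times 2^3 = -9/2$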
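The 8-bit toy format used in these examples can be decoded mechanically. The sketch below assumes the layout shown on the slides (1 sign bit, 1 exponent-sign bit, 2 exponent bits, 4 mantissa bits, with 1 meaning negative). The variable names are illustrative, and this is not the IEEE 754 layout.

```matlab
% Decode the 8-bit toy format: 1 sign bit | 1 exponent-sign bit +
% 2 exponent bits | 4 mantissa bits, with 1 meaning "negative".
% The variable names are illustrative; this is not an IEEE 754 layout.
words  = {'10111001', '01110101'};
values = zeros(1, numel(words));

for k = 1:numel(words)
    w = words{k} - '0';                        % char digits -> numeric 0/1
    sgn  = 1 - 2*w(1);                         % bit 1: sign of the number
    esgn = 1 - 2*w(2);                         % bit 2: sign of the exponent
    e    = esgn * sum(w(3:4) .* [2 1]);        % bits 3-4: exponent magnitude
    m    = sgn  * sum(w(5:8) .* 2.^(-(1:4)));  % bits 5-8: mantissa digits
    values(k) = m * 2^e;
    fprintf('%s -> m = %g, e = %d, value = %g\n', words{k}, m, e, values(k));
end
```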
Single Precision
A real variable (number) is stored in four 8-bit words,
or 32 bits (64 bits for supercomputers)
bit (binary digit): 0 or 1
nibble: 4 bits, $2^4$ = 16 possible values
word (byte): 8 bits, $2^8$ = 256 possible values
32 bits
23 for the digits
8 for the signed exponent
1 for the sign
smallest: $.1000\ldots0_{(2)} \times 2^{-127} = 0.29387 \times 10^{-38}$
largest: $.1111\ldots1_{(2)} \times 2^{128} = 0.34028 \times 10^{39}$
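For comparison, MATLAB can report the single-precision limits it actually uses. The values differ somewhat from the simplified model above because MATLAB follows IEEE 754, which stores a biased exponent and an implicit leading bit. That detail is background knowledge, not something stated on the slides.

```matlab
% Query the single-precision limits that MATLAB actually uses (IEEE 754).
% These differ slightly from the simplified model on the slide because
% IEEE 754 uses a biased exponent and an implicit leading bit.
largest  = realmax('single');    % largest finite single
smallest = realmin('single');    % smallest normalized positive single
spacing  = eps('single');        % gap between 1 and the next single

fprintf('largest  = %g\n', largest);
fprintf('smallest = %g\n', smallest);
fprintf('eps      = %g\n', spacing);
```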
Double Precision
A real variable is stored in eight words, or 64 bits
16 words, 128 bits for supercomputers
64 bits
52 for the digits
11 for the signed exponent
1 for the sign
signed exponent: $2^{10} = 1024$
smallest: $.1000\ldots0_{(2)} \times 2^{-1024}$
largest: $.1111\ldots1_{(2)} \times 2^{1023}$
Round-off Errors
Floating point characteristics contribute to round-off
error (limited bits for storage)
Limited range of quantities can be represented
A finite number of quantities can be represented
The interval between numbers increases as the
numbers grow
Example - three significant digits
0.0100  0.0101  0.0102  ...  0.0999   (0.0001 increment)
0.100   0.101   0.102   ...  0.999    (0.001 increment)
1.00    1.01    1.02    ...  9.99     (0.01 increment)
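The same widening of the interval between representable numbers can be observed in double precision with the built-in `eps(x)`, which returns the distance from x to the next larger double. The sample magnitudes below are arbitrary choices for illustration.

```matlab
% eps(x) returns the gap between x and the next larger double.
x   = [1 1e3 1e6 1e16];
gap = eps(x);

for k = 1:numel(x)
    fprintf('x = %-8g  spacing = %g\n', x(k), gap(k));
end

% At 1e16 the gap is 2, so adding 1 is lost to round-off.
lost = (1e16 + 1) == 1e16;
```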
MATLAB
Finite number of real quantities (integers,
real numbers or text) can be represented
For 8-bit, $2^8$ = 256 quantities
For 16-bit, $2^{16}$ = 65536 quantities
MATLAB uses double precision
8 bytes = 64 bits
more than $10^{19}$ ($2^{64}$) quantities