Numerical Accuracy - Texas A&M University

CHAPTER 4
Round-Off and Truncation Errors
Numerical Accuracy
Truncation error: method dependent
• Errors that result from using an approximation rather than an exact procedure
$$f(x_i + h) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots$$
Round-off error: machine dependent
• Errors that result from not being able to represent the true value adequately
• Result from using an approximate number to represent an exact number
$$\pi \approx 3.1416, \qquad e \approx 2.71828$$
Taylor Series Expansion
• Construction of finite-difference formulas
• Numerical accuracy: discretization error
Base point x = a
$$f(x) = c_0 + c_1 (x-a) + c_2 (x-a)^2 + c_3 (x-a)^3 + \cdots \;\Rightarrow\; c_0 = f(a)$$
$$f'(x) = c_1 + 2 c_2 (x-a) + 3 c_3 (x-a)^2 + \cdots \;\Rightarrow\; c_1 = f'(a)$$
$$f''(x) = 2 c_2 + 6 c_3 (x-a) + \cdots \;\Rightarrow\; c_2 = f''(a)/2!$$
$$f'''(x) = 6 c_3 + \cdots \;\Rightarrow\; c_3 = f'''(a)/3!$$
$$f^{(m)}(x) = (m!)\, c_m + (m+1)\, m\, (m-1) \cdots 2\; c_{m+1} (x-a) + \cdots \;\Rightarrow\; c_m = f^{(m)}(a)/m!$$
$$f(x) = \sum_{m=0}^{\infty} c_m (x-a)^m = \sum_{m=0}^{\infty} \frac{f^{(m)}(a)}{m!} (x-a)^m$$
Taylor series expansions
$$f(x_{i+1}) = f(x_i + h) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots$$
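As an illustration of how this expansion yields a finite-difference formula, truncating after the first-derivative term gives the forward difference $f'(x_i) \approx [f(x_i + h) - f(x_i)]/h$, whose leading error term is $(h/2) f''(x_i)$. The sketch below checks that behaviour numerically; the test function sin, the point x = 1 and the step sizes are arbitrary choices, not taken from the slides.

```matlab
% Forward-difference estimate of f'(x), obtained by truncating the Taylor
% series after the first-derivative term. The test function (sin), the
% point x = 1 and the step sizes are arbitrary choices for illustration.
f  = @(x) sin(x);
df = @(x) cos(x);          % exact derivative, used only to measure the error
x  = 1;
h  = [0.1 0.05 0.025];     % successively halved step sizes
approx = (f(x + h) - f(x)) ./ h;
err    = abs(approx - df(x));
% The truncation error is roughly (h/2)*|f''(x)|, so halving h should
% roughly halve the error.
ratio = err(1:end-1) ./ err(2:end);
disp([h' approx' err'])
disp(ratio)
```

The error ratios come out close to 2, which is what a first-order (O(h)) truncation error predicts.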
Taylor Series and Remainder
• Taylor series (base point x = a)
$$f(x) = \sum_{m=0}^{\infty} \frac{f^{(m)}(a)}{m!} (x-a)^m = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + R_n$$
• Remainder
$$R_n = \frac{f^{(n+1)}(\xi)}{(n+1)!} (x-a)^{n+1}$$
where $\xi$ lies between a and x
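The remainder term can be used to bound the truncation error before summing anything. A minimal sketch for $f(x) = e^x$ about a = 0, assuming x > 0 so that $f^{(n+1)}(\xi) = e^{\xi} \le e^x$; the values x = 1 and n = 5 are arbitrary.

```matlab
% Compare the actual truncation error of the n-th order Taylor polynomial of
% exp(x) about a = 0 with a bound from the remainder R_n. For exp, every
% derivative is exp, so on [0, x] with x > 0: |R_n| <= exp(x)*x^(n+1)/(n+1)!.
% The values x = 1 and n = 5 are arbitrary choices for illustration.
x = 1;  n = 5;
m = 0:n;
p = sum(x.^m ./ factorial(m));               % Taylor polynomial of order n
actual = abs(exp(x) - p);                    % true truncation error
bound  = exp(x) * x^(n+1) / factorial(n+1);  % remainder with xi = x
fprintf('actual = %.3e, bound = %.3e\n', actual, bound);
```

The actual error stays below the bound, and above the value obtained with $\xi = 0$, as the remainder formula requires.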
Truncation Error
• Taylor series expansion
$$f(x_{i+1}) = f(x_i + h) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots$$
• Example (higher-order terms truncated)
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \cdots$$
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots$$
($x_i = 0$, $h = x$, so $x_{i+1} = x$)
[Figure: power series polynomials. The function becomes more nonlinear as m increases.]
A MATLAB Script
• Filename: fun_exp.m
% Evaluate exponential function exp(x)
% by Taylor series expansion
% f(x) = 1 + x + x^2/2! + x^3/3! + ... + x^n/n!
clear
x = input('enter the value of x = ');
n = input('enter the order n = ');
term = 1;  s = term;      % zeroth-order term and running sum
for i = 1:n
    term = term*x/i;      % x^i/i! from the previous term
    s = s + term;         % partial sum of order i
end
disp(s)
MATLAB For Loops
• Filename: fun_exp2.m
% Evaluate exponential function exp(x)
% by Taylor series expansion, keeping every term and partial sum
% f(x) = 1 + x + x^2/2! + x^3/3! + ... + x^n/n!
x = input('enter the value of x = ');
n = input('enter the order n = ');
term = zeros(1, n+1);  s = zeros(1, n+1);   % preallocate
term(1) = 1;  s(1) = term(1);
for i = 1:n
    term(i+1) = term(i)*x/i;
    s(i+1) = s(i) + term(i+1);
end
% Display the results
disp('     i    term(i)    sum(i)')
a = 1:n+1;  disp([a' term' s'])
Truncation Error
 n        term          sum     n      term          sum
 0      1.0000       1.0000    21   19.5729   22011.0566
 1     10.0000      11.0000    22    8.8968   22019.9531
 2     50.0000      61.0000    23    3.8682   22023.8223
 3    166.6667     227.6667    24    1.6117   22025.4336
 4    416.6667     644.3334    25    0.6447   22026.0781
 5    833.3334    1477.6667    26    0.2480   22026.3262
 6   1388.8890    2866.5557    27    0.0918   22026.4180
 7   1984.1272    4850.6826    28    0.0328   22026.4512
 8   2480.1589    7330.8418    29    0.0113   22026.4629
 9   2755.7322   10086.5742    30    0.0038   22026.4668
10   2755.7322   12842.3066    31    0.0012   22026.4688
11   2505.2112   15347.5176    32    0.0004   22026.4688
12   2087.6760   17435.1934    33    0.0001   22026.4688
13   1605.9045   19041.0977    34    0.0000   22026.4688
14   1147.0746   20188.1719    35    0.0000   22026.4688
15    764.7164   20952.8887    36    0.0000   22026.4688
16    477.9478   21430.8359    37    0.0000   22026.4688
17    281.1458   21711.9824    38    0.0000   22026.4688
18    156.1921   21868.1738    39    0.0000   22026.4688
19     82.2064   21950.3809    40    0.0000   22026.4688
20     41.1032   21991.4844
$x = 10$, $e^x = 22026.4658$
Truncation Error
 n           term            sum     n         term         sum
 0      1.0000000      1.0000000    21  -19.5729427  -6.1761837
 1    -10.0000000     -9.0000000    22    8.8967924   2.7206087
 2     50.0000000     41.0000000    23   -3.8681707  -1.1475620
 3   -166.6666718   -125.6666718    24    1.6117378   0.4641758
 4    416.6666870    291.0000000    25   -0.6446951  -0.1805193
 5   -833.3333740   -542.3333740    26    0.2479596   0.0674404
 6   1388.8890381    846.5556641    27   -0.0918369  -0.0243965
 7  -1984.1271973  -1137.5715332    28    0.0327989   0.0084024
 8   2480.1589355   1342.5874023    29   -0.0113100  -0.0029076
 9  -2755.7321777  -1413.1447754    30    0.0037700   0.0008624
10   2755.7321777   1342.5874023    31   -0.0012161  -0.0003537
11  -2505.2111816  -1162.6237793    32    0.0003800   0.0000263
12   2087.6760254    925.0522461    33   -0.0001152  -0.0000889
13  -1605.9045410   -680.8522949    34    0.0000339  -0.0000550
14   1147.0745850    466.2222900    35   -0.0000097  -0.0000647
15   -764.7164307   -298.4941406    36    0.0000027  -0.0000620
16    477.9477539    179.4536133    37   -0.0000007  -0.0000627
17   -281.1457520   -101.6921387    38    0.0000002  -0.0000625
18    156.1920776     54.4999390    39    0.0000000  -0.0000626
19    -82.2063599    -27.7064209    40    0.0000000  -0.0000626
20     41.1031799     13.3967590
$x = -10$, $e^x = 0.45399 \times 10^{-4}$
How to reduce error? $e^{-10} = 1 / 22026.4658$
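The reciprocal idea can be tried directly. The sketch below sums the series in single precision, which is an assumption about how the tables were produced (their digits are consistent with about 7 significant figures), once for x = -10 and once for x = +10 followed by a reciprocal.

```matlab
% Sum the Taylor series of exp(x) in single precision for x = -10 directly
% and via 1/exp(10). Single precision and n = 40 are assumptions chosen to
% mirror the tables above.
n = 40;
direct = single(1);  term = single(1);   % series for x = -10
for i = 1:n
    term = term * single(-10) / single(i);
    direct = direct + term;
end
pos = single(1);  term = single(1);      % series for x = +10
for i = 1:n
    term = term * single(10) / single(i);
    pos = pos + term;
end
recip = 1 / pos;                         % e^(-10) = 1/e^(10)
exact = exp(-10);                        % double-precision reference
fprintf('direct: %g\nrecip:  %g\nexact:  %g\n', direct, recip, exact);
```

For x = -10 the alternating terms grow to about 2756 before they decay, so their round-off errors are far larger than the tiny final answer. For x = +10 every term has the same sign and nothing cancels, so the reciprocal keeps nearly full single-precision accuracy.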
Round-off Errors
• Computers can represent numbers only to a finite precision
• Most important for real numbers; integer math can be exact, but its range is limited
• How do computers represent numbers?
• Binary representation of the integers and real numbers in computer memory
32 bits (23, 8, 1)
$2^8 = 256$
$$\text{smallest} = (.100\cdots00)_2 \times 2^{-128} \approx 0.14693 \times 10^{-38}$$
$$\text{largest} = (.111\cdots11)_2 \times 2^{127} \approx 0.17014 \times 10^{39}$$
64 bits (52, 11, 1)
$2^{11} = 2048$
$$\text{smallest} = (.100\cdots00)_2 \times 2^{-1024}$$
$$\text{largest} = (.111\cdots11)_2 \times 2^{1023}$$
MATLAB uses double precision
Order of operation
Addition problem:
$$0.99 + 0.0044 + 0.0042 = 0.9986 \quad \text{(exact result)}$$
with 3-digit arithmetic:
$$(0.99 + 0.0044) + 0.0042 = 0.994 + 0.0042 = 0.998$$
$$0.99 + (0.0044 + 0.0042) = 0.99 + 0.0086 = 0.999$$
Round-off error
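The 3-digit arithmetic above can be emulated by rounding to three significant digits after every operation. In this sketch round3 is a hypothetical helper written for the illustration (valid for nonzero inputs); it is not a built-in MATLAB function.

```matlab
% Emulate 3-significant-digit arithmetic by rounding after every operation.
% round3 is a hypothetical helper written for this sketch (valid for v ~= 0).
round3 = @(v) round(v ./ 10.^(floor(log10(abs(v))) - 2)) ...
              .* 10.^(floor(log10(abs(v))) - 2);
a = 0.99;  b = 0.0044;  c = 0.0042;
left  = round3(round3(a + b) + c);   % (a + b) + c
right = round3(a + round3(b + c));   % a + (b + c)
fprintf('(a+b)+c = %.3f, a+(b+c) = %.3f, exact = %.4f\n', left, right, a+b+c);
```

Adding the two small numbers first keeps their digits, so the second ordering lands closer to the exact 0.9986.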
Cancellation error
$$x^2 - bx + 1 = 0, \qquad x_1 = \frac{b + r}{2}, \quad x_2 = \frac{b - r}{2}, \qquad r = \sqrt{b^2 - 4}$$
• If b is large, r is close to b
• Difference of two numbers very close to each other: potential for greater error!
Rationalize:
$$x_2 = \frac{b - r}{2} \cdot \frac{b + r}{b + r} = \frac{b^2 - r^2}{2(b + r)} = \frac{4}{2(b + r)} = \frac{2}{b + r}$$
Try b = 97: $x^2 - 97x + 1 = 0$ ($r = 96.9794$)
$x_2$ (3 sig. figs.)
  exact:        0.01031
  standard:     0.01050
  rationalized: 0.01031
Corresponding to "cancellation, critical arithmetic"
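The b = 97 example can be reproduced as follows. Keeping r to five significant digits (96.979) is an assumption made here to mimic limited precision; it is the rounding that gives the slide's "standard" value.

```matlab
% Smaller root of x^2 - b*x + 1 = 0 for b = 97, with r = sqrt(b^2 - 4) kept
% to five significant digits (an assumption, to mimic limited precision).
b = 97;
r = 96.979;                         % rounded value of sqrt(97^2 - 4)
x2_std = (b - r) / 2;               % subtracts two nearly equal numbers
x2_rat = 2 / (b + r);               % rationalized form, no cancellation
x2_ref = 2 / (b + sqrt(b^2 - 4));   % full double-precision reference
fprintf('standard: %.5f  rationalized: %.5f  reference: %.5f\n', ...
        x2_std, x2_rat, x2_ref);
```

The subtraction b - r leaves only two meaningful digits (0.021), while the sum b + r in the rationalized form keeps all of them.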
Significant Figures
48.9 mph? 48.95 mph?
Significant Digits
• The places which can be used with confidence
• 32-bit machine: 7 significant digits
• 64-bit machine: 17 significant digits
• Double precision: reduces round-off error, but increases CPU time
$$\pi = 3.1415926535\,8979323846\,2643\ldots$$
$$\sqrt{2} = 1.4142135623\,7310\ldots$$
$$e = 2.7182818284\,5904\ldots$$
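The digit counts can be checked in MATLAB with eps, the spacing between 1 and the next representable number. This is a small sketch using built-in functions only.

```matlab
% Machine epsilon and the stored value of pi in single and double precision.
eps_single = eps('single');       % 2^-23, roughly 7 decimal digits
eps_double = eps('double');       % 2^-52, roughly 15 to 17 decimal digits
pi_single  = single(pi);
fprintf('single: %.20f\n', pi_single);
fprintf('double: %.20f\n', pi);
fprintf('difference: %.3e\n', abs(double(pi_single) - pi));
```

Digits printed beyond the precision of each type are artifacts of the binary representation, not information about $\pi$.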
False Significant Figures
3.25/1.96 = 1.65816326530612... (from MATLAB)
But in practice only report 1.65 (chopping) or 1.66 (rounding)! Why?
Because we don't know what is beyond the second decimal place
Chopping:
$$3.259 / 1.960 = 1.66275510204082\ldots$$
$$3.250 / 1.969 = 1.65058405281869\ldots$$
Rounding:
$$3.254 / 1.955 = 1.66445012787724\ldots$$
$$3.245 / 1.964 = 1.65224032586558\ldots$$
Accuracy and precision
• Accuracy: how closely a measured or computed value agrees with the true value
• Precision: how closely individual measured or computed values agree with each other
[Figure: axes labeled "More Accurate" and "More Precise"]
• Accuracy is getting all your shots near the target.
• Precision is getting them close together.
Numerical Errors
The difference between the true value and the approximation
True value = approximation + true error
$$E_t = \text{true value} - \text{approximation} = x^* - x$$
$$\text{Relative error} = \frac{\text{true error}}{\text{true value}} = \frac{x^* - x}{x^*}$$
or in percent
$$\varepsilon_t = \frac{x^* - x}{x^*} \times 100\%$$
Approximate Error
• But the true value is not known
  • If we knew it, we wouldn't have a problem
• Use approximate error
$$\varepsilon_a = \frac{\text{approximate error}}{\text{approximation}} \times 100\% = \frac{\text{present approx.} - \text{previous approx.}}{\text{present approximation}} \times 100\%$$
$$\text{Relative error} = \frac{x_{new} - x_{old}}{x_{new}} \times 100\%$$
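A common use of the approximate error is as a stopping test for an iteration. The sketch below adds Taylor terms of $e^x$ until $\varepsilon_a$ drops below a tolerance; the values x = 0.5 and tol = 0.05 percent are arbitrary, and the true error is computed only for comparison.

```matlab
% Add Taylor terms of exp(x) until the approximate percent relative error
% falls below a tolerance. x = 0.5 and tol = 0.05 (percent) are arbitrary.
x = 0.5;  tol = 0.05;
term = 1;  xnew = 1;  n = 0;
ea = 100;                                  % start above the tolerance
while ea > tol
    n = n + 1;
    xold = xnew;
    term = term * x / n;                   % x^n/n! from the previous term
    xnew = xold + term;
    ea = abs((xnew - xold) / xnew) * 100;  % approximate error, percent
end
et = abs((exp(x) - xnew) / exp(x)) * 100;  % true error, percent
fprintf('n = %d, sum = %.6f, ea = %.4f%%, et = %.4f%%\n', n, xnew, ea, et);
```

In this case the approximate error is conservative: when the loop stops, the true error is smaller than $\varepsilon_a$.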
Number Systems
• Base-10 (Decimal): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
• Base-8 (Octal): 0, 1, 2, 3, 4, 5, 6, 7
• Base-2 (Binary): 0, 1 (off/on, close/open, negative/positive charge)
• Other non-decimal systems
  • 1 lb = 16 oz, 1 ft = 12 in, ½", ¼", ...
Decimal system (base 10):
$$5{,}129 = 5 \times 10^3 + 1 \times 10^2 + 2 \times 10^1 + 9 \times 10^0$$
$$0.3125 = 3 \times 10^{-1} + 1 \times 10^{-2} + 2 \times 10^{-3} + 5 \times 10^{-4}$$
Binary system (base 2):
$$101101 = 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 45$$
$$0.1011 = 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 1 \times 2^{-4} = \frac{11}{16}$$
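The two binary expansions can be evaluated directly as weighted sums of the digits. This is a short sketch; bin2dec is MATLAB's built-in converter for binary integer strings.

```matlab
% Evaluate the positional expansions of the two binary examples.
int_bits  = [1 0 1 1 0 1];                       % 101101 (base 2)
frac_bits = [1 0 1 1];                           % 0.1011 (base 2)
int_val  = sum(int_bits  .* 2.^(5:-1:0));        % weights 2^5 ... 2^0
frac_val = sum(frac_bits .* 2.^(-(1:4)));        % weights 2^-1 ... 2^-4
fprintf('101101 (base 2) = %d\n', int_val);
fprintf('0.1011 (base 2) = %g\n', frac_val);
fprintf('check with bin2dec: %d\n', bin2dec('101101'));
```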
Integer Representation
Signed magnitude method
• Use the first bit of a word to indicate the sign: 0 for negative (off), 1 for positive (on)
• Remaining bits are used to store a number
[Figure: a word of bits, the first marked "Sign" and the rest marked "Number"]
off / on, close / open, negative / positive
Integer Representation
• 8-bit word
$$\pm \;|\; 2^6 \;\; 2^5 \;\; 2^4 \;\; 2^3 \;\; 2^2 \;\; 2^1 \;\; 2^0$$
  sign | number
• smallest number: $(0000000)_{\text{base 2}} = (0)_{\text{base 10}}$
• largest number: $(1111111)_{\text{base 2}} = (127)_{\text{base 10}}$
• +/- 0000000 are the same, therefore we may use "-0" to represent "-128"
• Total numbers = $2^8 = 256$ (-128 to 127)
Integer Representation
16-bit word
$$1 \times 2^{14} + 1 \times 2^{13} + \cdots + 1 \times 2^1 + 1 \times 2^0 = 32{,}767$$
• Range: -32,768 to 32,767
• Overflow: > 32,767 (cannot represent 43,000 A&M students)
• Underflow: < -32,768 (magnitude too large)
32-bit word
• Range: -2,147,483,648 to 2,147,483,647
• 9 significant digits
• Overflow: world population 6 billion
• Underflow: budget deficit -$100 billion
Integer Operations
• Integer arithmetic can be exact as long as you don't get remainders in division
  • 7/2 = 3 in integer math
• or overflow the maximum integer
  • For an 8-bit computer, max = 127 (min = -128)
  • So 123 + 45 = overflow
  • and -74 * 2 = underflow
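These cases can be tried with MATLAB's int8 type. One caveat: MATLAB saturates integer results at the range limits rather than wrapping around or raising an error, and its plain `/` on integers rounds to nearest, so idivide is used here to get the truncating division described above. Other languages behave differently.

```matlab
% MATLAB's int8 type covers -128..127 and saturates instead of wrapping.
lo = intmin('int8');                 % -128
hi = intmax('int8');                 %  127
s1 = int8(123) + int8(45);           % true sum 168 saturates at 127
s2 = int8(-74) * int8(2);            % true product -148 saturates at -128
q  = idivide(int8(7), int8(2));      % truncating integer division gives 3
fprintf('range %d..%d, 123+45 -> %d, -74*2 -> %d, 7/2 -> %d\n', ...
        lo, hi, s1, s2, q);
```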
Floating-Point Representation
• Real numbers (also called floating-point numbers) are represented differently
• For fractions or very large numbers
• Store as: sign | signed exponent | mantissa
  • sign is 1 or 0 for negative or positive
  • exponent is the power (positive or negative) to which the base is raised
  • mantissa contains the significant digits
Floating-Point Representation
e
m

 
  e1 e 2  em d 1 d 2 d 3  d p
sign of
number
signed exponent
mantissa
N   .d1 d 2 d 3 d p B  mB
e




e
m: mantissa
B: Base of the number system
e: “signed” exponent
Note: the mantissa is usually “normalized”
if the leading digit is zero
Integer representation
Floating-point number representation
Decimal Representation
• 8-digit word
$$\pm \;|\; \pm \;\; 10^1 \;\; 10^0 \;|\; 10^{-1} \;\; 10^{-2} \;\; 10^{-3} \;\; 10^{-4}$$
  sign | signed exponent | number
1|095|1467 (base: B = 10)
mantissa: $m = -(1 \times 10^{-1} + 4 \times 10^{-2} + 6 \times 10^{-3} + 7 \times 10^{-4}) = -0.1467$
signed exponent: $e = +(9 \times 10^1 + 5 \times 10^0) = 95$
$$(10951467)_{\text{base 10}} = m B^e = -0.1467 \times 10^{95}$$
Floating-Point Representation
• 8-bit word (without normalization)
$$\pm \;|\; \pm \;\; 2^1 \;\; 2^0 \;|\; 2^{-1} \;\; 2^{-2} \;\; 2^{-3} \;\; 2^{-4}$$
  sign | signed exponent | number
0|111|0101 (base: B = 2)
mantissa: $m = +(0 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3} + 1 \times 2^{-4}) = 5/16$
signed exponent: $e = -(1 \times 2^1 + 1 \times 2^0) = -3$
$$(01110101)_{\text{base 2}} = m B^e = (5/16) \times 2^{-3} = 5/128$$
Normalization
$$1\ \text{in}^2 = (1/144)\ \text{ft}^2 = 0.006944\ \text{ft}^2 \quad \text{(less accurate)}$$
$$1\ \text{in}^2 = 0.694444 \times 10^{-2}\ \text{ft}^2 \quad \text{(normalization)}$$
• Remove the leading zero by lowering the exponent (in base 2, $d_1 = 1$ for all numbers)
$$\frac{1}{B} \le m < 1$$
$$\text{base 10:} \quad \frac{1}{10} \le m < 1 \;\Rightarrow\; 0.1 \le m < 1$$
$$\text{base 2:} \quad \frac{1}{2} \le m < 1$$
• If m < 1/2, multiply by 2 to remove the leading 0
• Floating-point numbers allow fractions and very large numbers to be represented, but take up more memory and CPU time
Binary Representation
• 8-bit word (with normalization)
$$\pm \;|\; \pm \;\; 2^1 \;\; 2^0 \;|\; 2^{-1} \;\; 2^{-2} \;\; 2^{-3} \;\; 2^{-4}$$
  sign | signed exponent | number
1|011|1001 (base: B = 2)
mantissa: $m = -(1 \times 2^{-1} + 0 \times 2^{-2} + 0 \times 2^{-3} + 1 \times 2^{-4}) = -9/16$
signed exponent: $e = +(1 \times 2^1 + 1 \times 2^0) = 3$
$$(10111001)_{\text{base 2}} = m B^e = -(9/16) \times 2^{3} = -9/2$$
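The 8-bit toy format used in these examples can be decoded mechanically. In this sketch decode_word is a hypothetical helper written for the illustration, and it assumes the bit convention of the worked examples: a sign bit of 1 means negative, for both the number and the exponent.

```matlab
% Decode the 8-bit toy floating-point word used in the examples above:
% bit 1 = sign of number, bit 2 = sign of exponent (1 means negative),
% bits 3-4 = exponent magnitude, bits 5-8 = mantissa digits .d1d2d3d4.
% decode_word is a hypothetical helper written for this sketch.
decode_word = @(w) (1 - 2*(w(1) == '1')) ...                 % number sign
    * sum((w(5:8) - '0') .* 2.^(-(1:4))) ...                 % mantissa
    * 2^((1 - 2*(w(2) == '1')) * bin2dec(w(3:4)));           % 2^exponent
v1 = decode_word('10111001');   % word 1|011|1001 (normalized)
v2 = decode_word('01110101');   % word 0|111|0101 (not normalized)
fprintf('10111001 -> %g\n01110101 -> %g\n', v1, v2);
```

The two words decode to -9/2 and 5/128. This toy layout is for teaching only and is not the IEEE 754 format that MATLAB actually uses.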
Single Precision
• A real variable (number) is stored in four bytes, or 32 bits (64 bits for supercomputers)
• bit (binary digit): 0 or 1
• half byte (nibble): 4 bits, $2^4 = 16$ possible values
• byte: 8 bits, $2^8 = 256$ possible values (the 8-bit "word" of the earlier examples)
32 bits
  23 for the digits
  8 for the signed exponent
  1 for the sign
$$\text{smallest} = (.100\cdots00)_2 \times 2^{-127} = 0.29387 \times 10^{-38}$$
$$\text{largest} = (.111\cdots11)_2 \times 2^{128} = 0.34028 \times 10^{39}$$
Double Precision
• A real variable is stored in eight bytes, or 64 bits
• 16 bytes, 128 bits for supercomputers
64 bits
  52 for the digits
  11 for the signed exponent
  1 for the sign
• signed exponent: $\pm 2^{10} = \pm 1024$
$$\text{smallest} = (.100\cdots00)_2 \times 2^{-1024}$$
$$\text{largest} = (.111\cdots11)_2 \times 2^{1023}$$
Round-off Errors
• Floating-point characteristics contribute to round-off error (limited bits for storage)
• Limited range of quantities can be represented
• A finite number of quantities can be represented
• The interval between numbers increases as the numbers grow
• Example: three significant digits
  0.0100  0.0101  0.0102  ...  0.0999   (0.0001 increment)
  0.100   0.101   0.102   ...  0.999    (0.001 increment)
  1.00    1.01    1.02    ...  9.99     (0.01 increment)
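The same growth in spacing can be observed for double precision with eps(x), which returns the distance from x to the next larger representable number. The sample magnitudes below are arbitrary.

```matlab
% Spacing between adjacent double-precision numbers at several magnitudes.
x = [1 1e3 1e6 1e9];
gap = eps(x);                       % distance from x to the next larger double
disp([x' gap'])
lost = (1e16 + 1) - 1e16;           % 1 is below the spacing near 1e16
fprintf('(1e16 + 1) - 1e16 = %g\n', lost);
```

Near 1e16 the spacing between doubles is 2, so adding 1 has no effect and the difference comes out as 0.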
MATLAB
• A finite number of real quantities (integers, real numbers or text) can be represented
• For 8-bit, $2^8 = 256$ quantities
• For 16-bit, $2^{16} = 65536$ quantities
• MATLAB uses double precision
• 8 bytes = 64 bits
• more than $10^{19}$ ($2^{64}$) quantities