Real time image processing on-board satellites Heiko Schröder, 2003


Real time
image processing
on-board satellites
Heiko Schröder, 2003
The PPU
for X-SAT1 and beyond
Srikanthan Thambipillai
Timo Bretschneider
Tobias Trenschel
Ian McLoughlin
Doug Maskell
Wu Jigang
Heiko Schröder
© Heiko Schröder, 2003
[Figure: spectral coverage from the visible range to 2500 nm, comparing RGB, multispectral (incl. CHRIS) and hyperspectral imaging]
Hyperspectral: precision farming!
Increasing Performance of X-SAT1
685 km
7 km/s, 100 Mbit/sec
4000 s/orbit, 400 Gbit/orbit
download (Singapore): 4 Gbit/orbit
On-board image analysis and compression
100 x output if useful/useless <= 1/100
100 x image value
Performance evaluation
P = C/A
•C -- cost
•A -- image area that is useful (can be sold)
X-SAT1 without PPU:
P = 15,000,000/25,000/1200 = 0.5 $/km2
$12,500 for 50x500 km2 (compare: air plane)
Useful? What are we looking for?
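The arithmetic behind the 0.5 $/km2 figure can be checked with a short sketch. The slide does not name the two divisors; reading 25,000 as km2 per image (50 x 500) and 1200 as the number of sellable images is an assumption, as are all function and variable names.

```python
def cost_per_km2(total_cost, area_per_image_km2, images):
    """P = C / A, where A is the total useful (sellable) image area."""
    return total_cost / (area_per_image_km2 * images)

# Assumed reading of the slide's numbers: $15M cost, 25,000 km2 per image, 1200 images.
p = cost_per_km2(15_000_000, 25_000, 1200)

# Price of one 50 x 500 km2 image at that rate.
image_price = p * 50 * 500

print(p, image_price)
```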
Our aim: High performance via COTS
16 processors (+ spares) off-the-shelf
connected via a
fault tolerant reconfigurable network
Mesh/torus
Real-time
PPU for X-SAT1
[Figure: on-board processors connected by a fault tolerant mesh]
Mesh with slow recovery
[Figure labels: FPGA, BSP?, instructions to PEs, ctrl, h/v, o/e, r/w, real-time link to PE]
Mesh with fast recovery
Not on X-SAT1
[Figure labels: instructions to PEs, ctrl, h/v, o/e, r/w, diagnostic, set switches]
Available data (320 images) – search task
Output with random selection: U = 1/5
Algorithms:
•Compression
•Classification
•Segmentation
Oil slicks, forest fires, red tide, settlements, …
The satellite efficiency cube
Axes, starting from the origin (0,0,0):
•Compression ratio (CR=4 loss-less; LOSSY=60)
•Classification gain (CG=5, 1 in 5 images contains useful information)
•Segmentation gain (SG=16, 1/16 of a useful image is useful)
[Figure: cube with corners labelled U=.2, U=.8, U=1, U=4, U=16, U=32, U=64; one corner is marked "Not likely"]
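One way to read the cube is that U at each corner is the baseline U=.2 multiplied by the gains switched on at that corner. This product model is an assumption, not stated on the slide; it reproduces the labels .8, 1, 4, 16 and 64. The names below are illustrative only.

```python
from itertools import product

BASE_U = 0.2                             # U at the origin (no processing)
GAINS = {"CR": 4, "CG": 5, "SG": 16}     # loss-less compression, classification, segmentation

def corner_values(base=BASE_U, gains=GAINS):
    """U for every corner of the cube; a corner is a 0/1 choice per axis."""
    corners = {}
    for mask in product((0, 1), repeat=len(gains)):
        u = base
        for used, g in zip(mask, gains.values()):
            if used:
                u *= g          # assumed: gains multiply
        corners[mask] = u
    return corners

for mask, u in sorted(corner_values().items()):
    print(mask, round(u, 2))
```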
Target          Mode                 Classification gain   Segmentation gain   Total gain
Oil slick       Search               >100                  >100                >100
Ships           Search               >10                   >100                >100
Air pollution   Search/investigate   >10                   >10                 >100
Storms          Search/investigate   >100                  <10                 >100
Floods          Search/investigate   >100                  <10                 >100
Landslides      Search/investigate   >1000                 >10                 >100
Volcanic        Search/investigate   >100                  >10                 >100
Forest fires    Search/investigate   >100                  >10                 >100
Assumption: Exhaust download capacity -> PPU can
achieve price reduction by more than 2 orders of magnitude:
$100 for 50x500 km2 image.
Enough useful data? – Customers?
What is a good algorithm?
Fast – real-time
Correct – low error rate
Classification:
Error 1:
Does not detect a good image
Error 2:
Flags a bad image as useful
Choice of Image Processing Routines:
Evaluation criterion (gain): G=UP/U
U – useful area/data received without PPU
UP – useful area/data received with PPU
Example: 1000 pictures can be taken, 10 pictures are good,
10 pictures can be downloaded -> .1 good picture without PPU
Algorithm A:
Real-time,
flags 50% of good images -> 5,
flags 1% of bad images -> 10,
15 flagged, 10 downloaded -> 5*2/3=3.3
GA=33
Algorithm B:
¼ real-time,
flags 90% of good images -> 2.5x.9,
flags .1% of bad images -> 1,
-> 2.3
GB=23
Algorithm C:
Real-time,
flags 40% of good images -> 4,
flags .1% of bad images -> 1,
-> 4
GC=40
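The three gains can be recomputed with a small sketch. The model is an assumption distilled from the example: an algorithm running at a fraction of real time sees only that fraction of the images, and if more images are flagged than can be downloaded, a random share of the flagged ones is sent. The slide rounds the false alarms (10 and 1), so A comes out near 33.6 here rather than 33. All names are hypothetical.

```python
def gain(total, good, capacity, speed, hit_rate, false_rate):
    """G = UP / U for one classification algorithm (assumed model, see above)."""
    baseline = capacity * good / total            # U: good images under random selection
    flagged_good = good * speed * hit_rate        # good images seen and flagged
    flagged_bad = (total - good) * speed * false_rate
    flagged = flagged_good + flagged_bad
    # If more is flagged than fits the downlink, only a share gets through.
    share = min(1.0, capacity / flagged) if flagged > 0 else 0.0
    useful = flagged_good * share                 # UP: good images actually downloaded
    return useful / baseline

g_a = gain(1000, 10, 10, speed=1.0, hit_rate=0.5, false_rate=0.01)
g_b = gain(1000, 10, 10, speed=0.25, hit_rate=0.9, false_rate=0.001)
g_c = gain(1000, 10, 10, speed=1.0, hit_rate=0.4, false_rate=0.001)
print(round(g_a, 1), round(g_b, 1), round(g_c, 1))
```

Under these numbers the slowest-to-flag algorithm C wins, because its low false-alarm rate keeps the downlink free of useless images.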
[Figure: 8x8 block of pixels numbered 1-64 and the subband (LL, HL, LH, HH) stored at each position]
One 2x2 block with pixels
1 2
3 4
Rows: L = 1+2, H = 1-2 (top); L = 3+4, H = 3-4 (bottom)
Columns: LL = 1+2+3+4, HL = 1+3-2-4, LH = 1+2-3-4, HH = 1+4-2-3
Each sum and each difference is divided by 2 (+ /2, - /2)
Invertible!
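A minimal sketch of the 2x2 step and its inverse, assuming the pixels are laid out as 1 2 / 3 4 and that every sum and difference is halved, as the "+ /2, - /2" note suggests. Function names are illustrative.

```python
def haar2x2(p1, p2, p3, p4):
    """Row pass (L, H) followed by column pass, halving after each add/subtract."""
    l_top, h_top = (p1 + p2) / 2, (p1 - p2) / 2
    l_bot, h_bot = (p3 + p4) / 2, (p3 - p4) / 2
    ll = (l_top + l_bot) / 2     # (1+2+3+4)/4
    hl = (h_top + h_bot) / 2     # (1+3-2-4)/4
    lh = (l_top - l_bot) / 2     # (1+2-3-4)/4
    hh = (h_top - h_bot) / 2     # (1+4-2-3)/4
    return ll, hl, lh, hh

def inverse_haar2x2(ll, hl, lh, hh):
    """Undo the column pass, then the row pass."""
    l_top, l_bot = ll + lh, ll - lh
    h_top, h_bot = hl + hh, hl - hh
    return l_top + h_top, l_top - h_top, l_bot + h_bot, l_bot - h_bot

coeffs = haar2x2(8, 4, 2, 6)
print(coeffs, inverse_haar2x2(*coeffs))
```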
[Figure: three-level decomposition of the 8x8 block into subbands LL3, HL3, LH3, HH3, HL2, LH2, HH2, HL1, LH1, HH1, each written as sums and differences of the pixels 1-64]
How to find areas of interest
Image classification
?
Thresholding
Mathematical morphology
Structural element
reference point
erosion
dilation
edge detection, thinning, noise removal, enlarging
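Erosion and dilation can be sketched on binary images stored as sets of (row, column) pixels. The representation, the cross-shaped element and all names are assumptions for illustration; the element is given as offsets from its reference point.

```python
# Cross-shaped element, offsets relative to the reference point (0, 0).
CROSS = {(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)}

def erode(image, element):
    """Positions where the whole element fits inside the image."""
    candidates = {(r - dr, c - dc) for (r, c) in image for (dr, dc) in element}
    return {(r, c) for (r, c) in candidates
            if all((r + dr, c + dc) in image for (dr, dc) in element)}

def dilate(image, element):
    """Stamp the element onto every pixel of the image."""
    return {(r + dr, c + dc) for (r, c) in image for (dr, dc) in element}

# A 3x3 block plus one isolated noise pixel.
image = {(r, c) for r in range(3) for c in range(3)} | {(5, 5)}
opened = dilate(erode(image, CROSS), CROSS)   # noise removal
edge = image - erode(image, CROSS)            # edge detection
print(sorted(opened), (5, 5) in opened)
```

Erosion followed by dilation removes the isolated pixel, and subtracting the erosion from the image leaves its border.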
Thresholding
MM-segmentation
MM-Hough Transform
[Figure: erosion with a reference point; accumulator axes m and d]
a dot leads to one addition if there is a matching point
Investigative mode
72 sec: 500 km
1 min: 420 km
36 sec: 250 km
Search mode
7 min: 2900 km
Follow coast
400 sec:
Storm: 3km
Ship: 5km
Fire: 50m
Air plane: 100km
30 sec: 210 km
200 km
2000 km
High-performance
Computer network
Image analysis
•Classification
•Segmentation
•Compression
Intelligent search
Maximize the efficiency/useful output of the satellite!
skeletons
Compression and classification
diamond circles
odd square circles
• 8-neighbourhood skeleton
square circles
• red-square-skeleton
red square skeletons
• one-sweep algorithm to produce the red square skeleton
• wavefront --- ISA
new = min{W,NW,N}+1
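The recurrence can be tried out in a single raster sweep. Reading the resulting value as the side length of the largest square of set pixels whose lower-right corner lies at that pixel is an interpretation, not something the slide states; names are illustrative.

```python
def square_sizes(image):
    """One sweep, top-left to bottom-right: new = min(W, NW, N) + 1 on set pixels."""
    rows, cols = len(image), len(image[0])
    size = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                w = size[r][c - 1] if c > 0 else 0
                n = size[r - 1][c] if r > 0 else 0
                nw = size[r - 1][c - 1] if r > 0 and c > 0 else 0
                size[r][c] = min(w, nw, n) + 1
    return size

for row in square_sizes([[1, 1, 1],
                         [1, 1, 1],
                         [1, 1, 1]]):
    print(row)
```

Each pixel depends only on its W, NW and N neighbours, which is what allows the values to be produced as a wavefront.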
granularity
• Histograms of skeletons classify images
segmentation
• locate objects
• partition image
• thinning
Border following
Direction codes around pixel X:
8 1 2
7 X 3
6 5 4
[Figure: a border traced as a sequence of direction codes, and the histogram (#) of the codes 1-8]
Histogram of direction codes: image classification
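A sketch of the histogram step, using the direction codes 8 1 2 / 7 X 3 / 6 5 4. It assumes the border has already been followed and is available as an ordered, closed list of (row, column) pixels; the border following itself is not shown, and all names are hypothetical.

```python
from collections import Counter

# (row step, column step) -> direction code, for the layout 8 1 2 / 7 X 3 / 6 5 4.
STEP_TO_CODE = {(-1, 0): 1, (-1, 1): 2, (0, 1): 3, (1, 1): 4,
                (1, 0): 5, (1, -1): 6, (0, -1): 7, (-1, -1): 8}

def chain_code(border):
    """Direction code between consecutive border pixels, closing the loop."""
    return [STEP_TO_CODE[(r1 - r0, c1 - c0)]
            for (r0, c0), (r1, c1) in zip(border, border[1:] + border[:1])]

def code_histogram(codes):
    """Counts for the codes 1..8, in that order."""
    counts = Counter(codes)
    return [counts.get(code, 0) for code in range(1, 9)]

# Border of a 3x3 square, followed clockwise from the top-left pixel.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
codes = chain_code(square)
print(codes, code_histogram(codes))
```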
Image classification I
Image classification II
Image classification III
Hough transform
• good line detection method
[Figure: image with a line and the accumulator with axes m and d]
every dot leads to M (1K) additions
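The cost of M additions per dot shows up directly in a plain accumulator loop. The sketch assumes that m is a candidate slope and d the intercept d = y - m*x; the slide only labels the axes m and d, so this parameterisation, like the names, is an assumption.

```python
from collections import Counter

def hough_lines(points, slopes):
    """Every point votes once per candidate slope: len(slopes) additions per dot."""
    accumulator = Counter()
    for x, y in points:
        for m in slopes:
            d = y - m * x                  # intercept of the line through (x, y) with slope m
            accumulator[(m, round(d, 6))] += 1
    return accumulator

# Four dots on y = 2x + 1 and one stray dot.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (5, 0)]
votes = hough_lines(points, slopes=[-2, -1, 0, 1, 2])
print(votes.most_common(1), sum(votes.values()))
```

The strongest cell identifies the line, and the total number of additions is the number of dots times the number of slopes.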
HT on the ISA
Bertil
shearing
[Figure: line detection by shearing and column sums; MM stage built from OR/AND operations, which eliminates dirt]
d : parameter
skewing
[Figure: skewing example, rows 0-8]
alternatives
[Figure: alternative OR/AND combinations over rows 0-8 feeding the MM stage]
advantages of MM-HT
• higher contrast
• fewer additions
• more flexibility
– lines of given thickness
– dashed lines
– lines of given length
– lines of given orientation
– circles, …
• tomography !!
circles
Hough transform I
Hough transform II
robot vision
• stereo vision
[Figure: two CCDs and a projector]
thinning (skeletons or erosion), line detection (MM-HT), trigonometry
Design a mathematical morphology algorithm (and demonstrate it by means of an example) that removes all isolated patterns of size 2 (black on white and white on black). It must not change any set of 3 neighbouring pixels with identical colour.
Write an algorithm that removes all squares of maximal size from a given image.
Write a program based on MM that fills gaps of up to length 2 in horizontal and vertical lines, but does not prolong the ends of lines.
Application specific
massive parallelism
Low cost alternatives
to polygons for
visualisation
contents
•scan-line image processing
•PIPS architecture
•from landscapes to 3D
•surface generation for CAD
basic architecture
[Figure: array of processors 1 ... 1024]
high resolution, real time
PIPS (1990-94)
[Figure: processing element with two 1 Mbit memories and memory control]
32x32 torus
16 bit parallel communication
16 bit add
prefetch
BHP -- CSIRO -- NU -- ADFA 1.4 M
elementary operations
compress
horizontal shear
vertical shear
scan-line image processing
[Figure labels: rotate, 2x shear, transpose, height, colour]
P. Robertson 1986
A. Spray 1990-4
horizontal projection
[Figure: projection geometry with labels f, x, a, h, m, y]
x = f ((m-y) cos(a) + h sin(a) ) / ( f + m sin(a) - h cos(a))
transpose algorithm
• transpose:
1 diagonal/step
1024 steps
1 step:
1 read
1 move (PE-PE)
1 write
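The diagonal-per-step scheme can be simulated sequentially. The sketch assumes that each PE owns one image column and that in step k PE j handles the element on the k-th cyclic diagonal; the data layout and names are assumptions, and the inner loop would run in parallel on the real machine.

```python
def transpose_by_diagonals(columns):
    """columns[j][i] holds image element (row i, column j), owned by PE j."""
    n = len(columns)
    result = [[None] * n for _ in range(n)]
    for step in range(n):                  # n steps, one cyclic diagonal each
        for pe in range(n):                # all PEs work simultaneously in hardware
            row = (pe + step) % n
            value = columns[pe][row]       # 1 read
            target = row                   # 1 move (PE-PE): to the PE owning column `row`
            result[target][pe] = value     # 1 write
    return result

columns = [[10 * i + j for i in range(4)] for j in range(4)]
transposed = transpose_by_diagonals(columns)
print(transposed)
```

In every step each PE sends exactly one value and receives exactly one, so no PE is ever a bottleneck.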
HC/torus
diameter / bandwidth (1024 nodes)
Hypercube: diameter 10, 4 bit wide, 56*4 = 224
Torus: diameter 32, 16 bit wide, 12*16 = 192
tailored towards transpose
• transpose operation ---> torus
alternatives: hypercube, linear array,
hypercubic networks.
• off-the-shelf SRAM memory chips
determined performance.
• transpose:
read pixel, move pixel, write pixel.
• average distance of 32x32 torus is 16.
a read and a write take 8 cycles each.
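The average distance of 16, and the diameters quoted for the 32x32 torus and the 1024-node hypercube, can be checked with a few lines. The average is taken over all destinations including the node itself, which is an assumption about how the slide counts; names are illustrative.

```python
def ring_distance(a, b, k):
    """Shortest hop count between two positions on a ring of k nodes."""
    d = abs(a - b) % k
    return min(d, k - d)

def torus_stats(k):
    """(diameter, average distance) of a k x k torus; both dimensions are independent rings."""
    per_ring = [ring_distance(0, x, k) for x in range(k)]
    diameter = 2 * max(per_ring)
    average = 2 * sum(per_ring) / k
    return diameter, average

def hypercube_diameter(nodes):
    """log2(nodes) for a power-of-two node count."""
    return nodes.bit_length() - 1

print(torus_stats(32), hypercube_diameter(1024))
```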
tailored towards interpolation
• Interpolation is the most frequent operation.
• linear interpolation (nearest neighbour, spline)
1 multiplication and 2 additions per pixel
(18 cycles)
• overlapping arithmetic and memory access
(prefetch 16 cycles)
y = h1 + (h2-h1)d
[Figure: y interpolated between h1 and h2 at fractional position d]
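A minimal sketch of the interpolation formula, with a hypothetical helper that stretches one scan line. The resampling helper and its names are assumptions added for illustration; only y = h1 + (h2-h1)d comes from the slide.

```python
def interpolate(h1, h2, d):
    """y = h1 + (h2 - h1) * d: one multiplication and two additions."""
    return h1 + (h2 - h1) * d

def resample(heights, factor):
    """Stretch a scan line (at least two samples) by `factor` using linear interpolation."""
    n_out = int((len(heights) - 1) * factor) + 1
    out = []
    for i in range(n_out):
        pos = i / factor                              # position in the source line
        left = min(int(pos), len(heights) - 2)        # left neighbour h1
        out.append(interpolate(heights[left], heights[left + 1], pos - left))
    return out

print(resample([0, 10, 20], 2))
```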
performance
• 1 perspective view (464 K)
– 1 rotation (170 K)
   3 shears (3x36 K)
   2 transposes (2x31 K)
– 1 compress (41 K)
– 1 projection (219 K)
– 1 image output (34 K)
• 464 K x 50ns = 23.2 ms
• 43 frames / sec
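The cycle budget adds up as follows; the sketch only re-does the slide's arithmetic, and the constant names are illustrative.

```python
CYCLE_NS = 50                     # one cycle = 50 ns
SHEAR, TRANSPOSE = 36_000, 31_000
COMPRESS, PROJECTION, OUTPUT = 41_000, 219_000, 34_000

rotation = 3 * SHEAR + 2 * TRANSPOSE                 # 3 shears + 2 transposes
view = rotation + COMPRESS + PROJECTION + OUTPUT     # one perspective view
frame_ms = view * CYCLE_NS / 1e6
fps = 1000 / frame_ms

print(rotation, view, frame_ms, int(fps))
```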
performance parameters
• 20 GOPS (16 bit words)
• IEEE standard 32bit floating point:
– 200 instructions / floating point operation
– 100 MFLOPS / 1024 PEs
• fast floating point:
– 80 instructions / floating point operation
– 250 MFLOPS / 1024 PEs
• 5 Gbytes/s internal memory bandwidth
• 40 Gbytes/s inter-processor communication
performance
Main criterion: high throughput
machine A: price PA, time k tB
machine B: price k PA, time tB
k A-machines cost and produce
as much as one B-machine
evaluation criterion: Cost x time
(cost x period; AT; AP)
cost-performance
Machine         Time       Cost    Cost x time
SUN             2 min      3 K     360
MasPar (1024)   5 sec      100 K   500
PIPS (1024)     1/40 sec   20 K    1/2
from landscapes to 3D
Unsolved problems:
•partitioning 3D surfaces
into landscapes
•detail on demand
•target architecture
(distributed & parallel)
•...
partitioning
the surfaces
into pieces of
landscapes
Bez, May, Schroeder
partitioning algorithms
• fixed set of observer points?
• how many?
• observer position data dependent?
Detail on demand
[Figure: image regions shown at resolutions 1/1, 1/4, 1/16, 1/64 and 1/256]
Levels of resolution
$a_i = \frac{1}{4^i}, \quad i = 0, 1, 2, \ldots$
$\sum_{i} a_i = \frac{1}{1 - \frac{1}{4}} = \frac{4}{3}$
Provide image at various levels of resolution
detail on demand
wavelet transform ?
•all data should be kept at several levels of resolution
R. Lang, P. Lenders, H. Schroeder (1995/6)
Wavelet transform
(simplified)
Coefficient layout for four 2x2 blocks:
d11 d12 | d21 d22
d13 a1  | d23 a2
d31 d32 | d41 d42
d33 a3  | d43 a4
Low-pass filter: $a_i$
High-pass filter: $d_{ij}$
$x_{ij} = a_i + d_{ij}, \quad j = 1, 2, 3$
$x_{i4} = a_i - \sum_{k=1}^{3} d_{ik}$
Easy reconstruction!
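A sketch of one block of the simplified transform. The filter definitions did not survive in the transcript, so two assumptions are made here: the low-pass value a_i is the mean of the four pixels of block i, and d_ij = x_ij - a_i for j = 1, 2, 3. Under these assumptions the fourth pixel follows from the other coefficients. Names are illustrative.

```python
def analyse(block):
    """One 2x2 block (x1, x2, x3, x4) -> average a and three details d1..d3."""
    x1, x2, x3, x4 = block
    a = (x1 + x2 + x3 + x4) / 4            # assumed low-pass filter: block mean
    return a, (x1 - a, x2 - a, x3 - a)     # assumed high-pass filter: deviation from mean

def reconstruct(a, d):
    """x_j = a + d_j for j = 1..3, and x_4 = a - (d_1 + d_2 + d_3)."""
    x1, x2, x3 = (a + dk for dk in d)
    x4 = a - sum(d)
    return x1, x2, x3, x4

a, d = analyse((8, 4, 2, 6))
print(a, d, reconstruct(a, d))
```

Storing a and three details takes the same four values as the original block, while a alone already gives the block at quarter resolution.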
Spiral (Rein Warmels)
Butterfly network for FFT
FFT
frequency spectrum
image classification
CM2
FFT without butterfly
1 3 5 7 9 11 13 15
2 4 6 8 10 12 14 16
1 2 5 6 9 10 13 14
3 4 7 8 11 12 15 16
1 3 2 4 9 11 10 12
5 7 6 8 13 15 14 16
1 5 3 7 2 6 4 8
9 13 11 15 10 14 12 16
target architecture ?
• PCs: How many?
• ISAs ! -- with every PC?
• partitioning the screen amongst ISAs
• distribution of data over PCs
• ATM switch: PVM/MPI --- BSP ?
• optical communication? Edinburgh? Jena?
visualise what?
•landscapes
•gallery of the future
•physical data
•simulations
•medical images
•CAD
CAD
control points:
[Figure: grid of control points (+); a new point takes the weights 9/16, 3/16, 3/16 and 1/16 from the four surrounding control points]
Catmull & Clark (78)
Pham & Schröder (89)
(9A + 3C + 3D + B)/16
[Figure: new point inside the cell with corner control points A, D, C, B]
9 add-shift
9/4 per control point
3 per pixel
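The weighted sum (9A + 3C + 3D + B)/16 needs no multiplier. The sketch below is one possible decomposition into shifts and adds on integer coordinates; it is not necessarily the sequence counted on the slide, and the names are hypothetical.

```python
def new_point(a, b, c, d):
    """(9a + 3c + 3d + b) / 16 using only shifts and additions (integer inputs)."""
    nine_a = (a << 3) + a          # 8a + a
    three_c = (c << 1) + c         # 2c + c
    three_d = (d << 1) + d         # 2d + d
    return (nine_a + three_c + three_d + b) >> 4   # divide by 16, rounding down

print(new_point(32, 0, 16, 16))
```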
move algorithms ?
• routing algorithm (warping)?
–hot potato? (Kaufmann, Schröder 94)
• warping “cheaper” than general
routing?
scan-lines ?
• tiling? -- no transpose!
• hidden surface removal via
“z”-value
low cost, real-time,
high resolution
visualisation
can be done !!!