Transcript class08.ppt

15-213
“The course that gives CMU its Zip!”
Machine-Level Programming IV:
Structured Data
Feb 4, 2003
• Topics
• Arrays
• Structs
• Unions
Basic Data Types
Integral
• Stored & operated on in general registers
• Signed vs. unsigned depends on instructions used

Intel        GAS  Bytes  C
byte         b    1      [unsigned] char
word         w    2      [unsigned] short
double word  l    4      [unsigned] int

Floating Point
• Stored & operated on in floating point registers

Intel     GAS  Bytes  C
Single    s    4      float
Double    l    8      double
Extended  t    10/12  long double
15-213, S’03
Array Allocation
Basic Principle
T A[L];
• Array of data type T and length L
• Contiguously allocated region of L * sizeof(T) bytes

Declaration        Element addresses          End of region
char string[12];   x (12 one-byte elements)   x + 12
int val[5];        x, x+4, x+8, x+12, x+16    x + 20
double a[4];       x, x+8, x+16, x+24         x + 32
char *p[3];        x, x+4, x+8                x + 12
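A minimal C sketch of the allocation rule above. The helper names array_bytes and layout_matches are made up for illustration. Pointer and int sizes on a given machine may differ from the IA32 figures on the slide, so the check compares against sizeof rather than fixed numbers.

```c
#include <stddef.h>

/* Size in bytes of an array with `len` elements of `elem_size` bytes each. */
size_t array_bytes(size_t len, size_t elem_size)
{
    return len * elem_size;
}

/* Returns 1 if the compiler's array sizes agree with L * sizeof(T)
   for the four declarations shown on the slide. */
int layout_matches(void)
{
    char string[12];
    int val[5];
    double a[4];
    char *p[3];

    return sizeof(string) == array_bytes(12, sizeof(char))
        && sizeof(val) == array_bytes(5, sizeof(int))
        && sizeof(a) == array_bytes(4, sizeof(double))
        && sizeof(p) == array_bytes(3, sizeof(char *));
}
```

Calling layout_matches() should return 1 on any conforming compiler, since arrays have no padding between elements.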
Array Access
Basic Principle
T A[L];
• Array of data type T and length L
• Identifier A can be used as a pointer to array element 0

int val[5];   holds 1, 5, 2, 1, 3 at addresses x, x+4, x+8, x+12, x+16 (ends at x+20)

Reference   Type    Value
val[4]      int     3
val         int *   x
val+1       int *   x+4
&val[2]     int *   x+8
val[5]      int     ??
*(val+1)    int     5
val + i     int *   x+4i
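The reference table above can be checked with a short C sketch. The helpers byte_distance and same_element are hypothetical names introduced here; the array contents match the slide.

```c
#include <stddef.h>

static int val[5] = { 1, 5, 2, 1, 3 };

/* Distance in bytes between two int pointers into the same array. */
ptrdiff_t byte_distance(const int *from, const int *to)
{
    return (const char *)to - (const char *)from;
}

/* val[i] and *(val + i) name the same element. */
int same_element(int i)
{
    return &val[i] == val + i && val[i] == *(val + i);
}
```

Here byte_distance(val, val + 1) equals sizeof(int), which is the "x+4" of the table on a machine with 4-byte int.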
Array Example
typedef int zip_dig[5];
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };
Variable       Values       Element addresses     End
zip_dig cmu;   1 5 2 1 3    16, 20, 24, 28, 32    36
zip_dig mit;   0 2 1 3 9    36, 40, 44, 48, 52    56
zip_dig ucb;   9 4 7 2 0    56, 60, 64, 68, 72    76

Notes
• Declaration “zip_dig cmu” equivalent to “int cmu[5]”
• Example arrays were allocated in successive 20 byte blocks
» Not guaranteed to happen in general
Array Accessing Example
int get_digit(zip_dig z, int dig)
{
  return z[dig];
}

Computation
• Register %edx contains starting address of array
• Register %eax contains array index
• Desired digit at 4*%eax + %edx
• Use memory reference (%edx,%eax,4)

Memory Reference Code
  # %edx = z
  # %eax = dig
  movl (%edx,%eax,4),%eax   # z[dig]
Referencing Examples
Variable       Values       Element addresses     End
zip_dig cmu;   1 5 2 1 3    16, 20, 24, 28, 32    36
zip_dig mit;   0 2 1 3 9    36, 40, 44, 48, 52    56
zip_dig ucb;   9 4 7 2 0    56, 60, 64, 68, 72    76

Code Does Not Do Any Bounds Checking!

Reference   Address             Value   Guaranteed?
mit[3]      36 + 4* 3 = 48      3       Yes
mit[5]      36 + 4* 5 = 56      9       No
mit[-1]     36 + 4*-1 = 32      3       No
cmu[15]     16 + 4*15 = 76      ??      No

• Out of range behavior implementation-dependent
» No guaranteed relative allocation of different arrays
Array Loop Example
Original Source
int zd2int(zip_dig z)
{
  int i;
  int zi = 0;
  for (i = 0; i < 5; i++) {
    zi = 10 * zi + z[i];
  }
  return zi;
}
• How do we implement this?
• Can we improve it?

First step, convert to do-while
int zd2int(zip_dig z)
{
  int i;
  int zi = 0;
  i = 0;
  if (i < 5) {
    do {
      zi = 10 * zi + z[i];
      i++;
    } while (i < 5);
  }
  return zi;
}
Next?
Array Loop Example – convert to ptr
z[i] is equivalent to *(z+i)

int zd2int(zip_dig z)
{
  int i;
  int zi = 0;
  i = 0;
  if (i < 5) {
    do {
      zi = 10 * zi + *(z+i);
      i++;
    } while (i < 5);
  }
  return zi;
}

i       0   1     2     3     4     5
(z+i)   z   z+1   z+2   z+3   z+4   z+5

Can we further improve this? (hint: what does i do?)
Do we need z+i?
Array Loop Example – optimize
int zd2int(zip_dig z)
{
  int i;
  int *zend;
  int zi = 0;
  i = 0;
  zend = z+5;
  if (z < zend) {
    do {
      zi = 10 * zi + *(z++);
      i++;
    } while (z < zend);
  }
  return zi;
}

i       0   1     2     3     4     5
(z+i)   z   z+1   z+2   z+3   z+4   z+5
z++     z   z+1   z+2   z+3   z+4   z+5

Do we need i?
Array Loop Example – optimize
Can I do anything else?
int zd2int(zip_dig z)
{
int* zend;
int zi = 0;
zend = z+5;
if (z < zend) {
do {
zi = 10 * zi + *(z++);
} while (z < zend);
}
return zi;
}
Array Loop Example
Original Source
int zd2int(zip_dig z)
{
  int i;
  int zi = 0;
  for (i = 0; i < 5; i++) {
    zi = 10 * zi + z[i];
  }
  return zi;
}

Transformed Version
• As generated by GCC
• Express in do-while form
» No need to test at entrance
• Convert array code to pointer code
• Eliminate loop variable i
int zd2int(zip_dig z)
{
  int zi = 0;
  int *zend = z + 4;
  do {
    zi = 10 * zi + *z;
    z++;
  } while(z <= zend);
  return zi;
}
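To see that the transformation preserves behavior, the two versions can be placed side by side and run on the same input. This is a sketch only: the function names zd2int_index and zd2int_ptr are invented here so that both versions can coexist in one file.

```c
typedef int zip_dig[5];

/* Original form: array indexing with loop variable i. */
int zd2int_index(zip_dig z)
{
    int i;
    int zi = 0;
    for (i = 0; i < 5; i++) {
        zi = 10 * zi + z[i];
    }
    return zi;
}

/* Transformed form: do-while, pointer stepping, no loop variable. */
int zd2int_ptr(zip_dig z)
{
    int zi = 0;
    int *zend = z + 4;      /* address of the last digit */
    do {
        zi = 10 * zi + *z;
        z++;
    } while (z <= zend);
    return zi;
}
```

With cmu = { 1, 5, 2, 1, 3 } both functions return 15213. The do-while form is only safe because the trip count (5) is known to be at least 1.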
Array Loop Implementation
Registers
  %ecx  z
  %eax  zi
  %ebx  zend
Computations
• 10*zi + *z implemented as *z + 2*(zi+4*zi)
• z++ increments by 4

int zd2int(zip_dig z)
{
  int zi = 0;
  int *zend = z + 4;
  do {
    zi = 10 * zi + *z;
    z++;
  } while(z <= zend);
  return zi;
}

  # %ecx = z
  xorl %eax,%eax            # zi = 0
  leal 16(%ecx),%ebx        # zend = z+4
.L59:
  leal (%eax,%eax,4),%edx   # 5*zi
  movl (%ecx),%eax          # *z
  addl $4,%ecx              # z++
  leal (%eax,%edx,2),%eax   # zi = *z + 2*(5*zi)
  cmpl %ebx,%ecx            # z : zend
  jle .L59                  # if <= goto loop
Nested Array Example
#define PCOUNT 4
zip_dig pgh[PCOUNT] =
{{1, 5, 2, 0, 6},
{1, 5, 2, 1, 3 },
{1, 5, 2, 1, 7 },
{1, 5, 2, 2, 1 }};
zip_dig pgh[4];
  pgh[0]   1 5 2 0 6   starts at 76
  pgh[1]   1 5 2 1 3   starts at 96
  pgh[2]   1 5 2 1 7   starts at 116
  pgh[3]   1 5 2 2 1   starts at 136 (array ends at 156)

• Declaration “zip_dig pgh[4]” equivalent to “int pgh[4][5]”
• Variable pgh denotes array of 4 elements
» Allocated contiguously
• Each element is an array of 5 int’s
» Allocated contiguously
• “Row-Major” ordering of all elements guaranteed
Nested Array Allocation
Declaration
T A[R][C];
• Array of data type T
• R rows, C columns
• Type T element needs K bytes

  A[0][0]     • • •   A[0][C-1]
     •                    •
     •                    •
  A[R-1][0]   • • •   A[R-1][C-1]

Array Size
• R * C * K bytes
Arrangement
• Row-Major Ordering

int A[R][C];
  A[0][0] • • • A[0][C-1] | A[1][0] • • • A[1][C-1] | • • • | A[R-1][0] • • • A[R-1][C-1]
  4*R*C Bytes
Nested Array Row Access
Row Vectors
• A[i] is array of C elements
• Each element of type T
• Starting address A + i * C * K

int A[R][C];
  Row      Elements                        Starting address
  A[0]     A[0][0] • • • A[0][C-1]         A
  • • •
  A[i]     A[i][0] • • • A[i][C-1]         A+i*C*4
  • • •
  A[R-1]   A[R-1][0] • • • A[R-1][C-1]     A+(R-1)*C*4
Nested Array Row Access Code
int *get_pgh_zip(int index)
{
return pgh[index];
}
Row Vector
• pgh[index] is array of 5 int’s
• Starting address pgh+20*index
Code
• Computes and returns address
• Compute as pgh + 4*(index+4*index)

  # %eax = index
  leal (%eax,%eax,4),%eax   # 5 * index
  leal pgh(,%eax,4),%eax    # pgh + (20 * index)
Nested Array Element Access
Array Elements
• A[i][j] is element of type T
• Address A + (i * C + j) * K

int A[R][C];
  Row      Elements                                      Starting address
  A[0]     A[0][0] • • • A[0][C-1]                       A
  • • •
  A[i]     A[i][0] • • • A[i][j] • • • A[i][C-1]         A+i*C*4
  • • •
  A[R-1]   A[R-1][0] • • • A[R-1][C-1]                   A+(R-1)*C*4

  A[i][j] is at A+(i*C+j)*4
Nested Array Element Access Code
int get_pgh_digit
  (int index, int dig)
{
  return pgh[index][dig];
}

Array Elements
• pgh[index][dig] is int
• Address: pgh + 20*index + 4*dig
Code
• Computes address pgh + 4*dig + 4*(index+4*index)
• movl performs memory reference

  # %ecx = dig
  # %eax = index
  leal 0(,%ecx,4),%edx         # 4*dig
  leal (%eax,%eax,4),%eax      # 5*index
  movl pgh(%edx,%eax,4),%eax   # *(pgh + 4*dig + 20*index)
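The address formula A + (i*C + j)*K can be exercised directly in C. The sketch below is illustrative: nested_offset and get_pgh_digit_by_offset are hypothetical helpers, and the table contents are the pgh values from the slides.

```c
#include <stddef.h>

#define PCOUNT 4
typedef int zip_dig[5];

static zip_dig pgh[PCOUNT] =
    {{1, 5, 2, 0, 6},
     {1, 5, 2, 1, 3},
     {1, 5, 2, 1, 7},
     {1, 5, 2, 2, 1}};

/* Byte offset of A[i][j] in "T A[R][C]" where sizeof(T) == k. */
size_t nested_offset(size_t i, size_t j, size_t cols, size_t k)
{
    return (i * cols + j) * k;
}

/* Read pgh[index][dig] by adding the computed byte offset to the base. */
int get_pgh_digit_by_offset(int index, int dig)
{
    const char *base = (const char *)pgh;
    size_t off = nested_offset((size_t)index, (size_t)dig, 5, sizeof(int));
    return *(const int *)(base + off);
}
```

For a 4-byte int, nested_offset(3, 3, 5, 4) is 72, matching 20*3 + 4*3 from the slide.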
Strange Referencing Examples
zip_dig pgh[4];
  pgh[0]   1 5 2 0 6   starts at 76
  pgh[1]   1 5 2 1 3   starts at 96
  pgh[2]   1 5 2 1 7   starts at 116
  pgh[3]   1 5 2 2 1   starts at 136 (array ends at 156)

Reference    Address                 Value   Guaranteed?
pgh[3][3]    76+20*3+4*3  = 148      2       Yes
pgh[2][5]    76+20*2+4*5  = 136      1       Yes
pgh[2][-1]   76+20*2+4*-1 = 112      3       Yes
pgh[4][-1]   76+20*4+4*-1 = 152      1       Yes
pgh[0][19]   76+20*0+4*19 = 152      1       Yes
pgh[0][-1]   76+20*0+4*-1 = 72       ??      No

• Code does not do any bounds checking
• Ordering of elements within array is guaranteed
Multi-Level Array Example
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

#define UCOUNT 3
int *univ[UCOUNT] = {mit, cmu, ucb};

• Variable univ denotes array of 3 elements
• Each elem is a pointer
» 4 bytes
• Each pointer points to an array of int’s

univ   address 160 holds 36 (mit), 164 holds 16 (cmu), 168 holds 56 (ucb)

Array   Values       Element addresses     End
cmu     1 5 2 1 3    16, 20, 24, 28, 32    36
mit     0 2 1 3 9    36, 40, 44, 48, 52    56
ucb     9 4 7 2 0    56, 60, 64, 68, 72    76
Element Access in Multi-Level Array
int get_univ_digit(int index, int dig)
{
  return univ[index][dig];
}

Computation
• Element access Mem[Mem[univ+4*index]+4*dig]
• Must do two memory reads
» First get pointer to row array
» Then access element within array

  # %ecx = index
  # %eax = dig
  leal 0(,%ecx,4),%edx      # 4*index
  movl univ(%edx),%edx      # Mem[univ+4*index]
  movl (%edx,%eax,4),%eax   # Mem[...+4*dig]
Array Element Accesses
Syntax is the same, computation is different!
Nested Array
int get_pgh_digit
  (int index, int dig)
{
  return pgh[index][dig];
}
Element at Mem[pgh+20*index+4*dig]

  pgh    1 5 2 0 6 | 1 5 2 1 3 | 1 5 2 1 7 | 1 5 2 2 1
         rows start at 76, 96, 116, 136 (array ends at 156)

Multi-Level Array
int get_univ_digit
  (int index, int dig)
{
  return univ[index][dig];
}
Element at Mem[Mem[univ+4*index]+4*dig]

  univ   address 160 holds 36 (mit), 164 holds 16 (cmu), 168 holds 56 (ucb)
  cmu    1 5 2 1 3   at 16 to 32 (ends at 36)
  mit    0 2 1 3 9   at 36 to 52 (ends at 56)
  ucb    9 4 7 2 0   at 56 to 72 (ends at 76)
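A small C sketch of the contrast above: the same [index][dig] syntax compiles to one memory read for the nested array and two for the multi-level array. The explicit row variable in get_univ_digit is added here only to make the two reads visible; it is not in the slide code.

```c
typedef int zip_dig[5];

static zip_dig cmu = { 1, 5, 2, 1, 3 };
static zip_dig mit = { 0, 2, 1, 3, 9 };
static zip_dig ucb = { 9, 4, 7, 2, 0 };

/* Nested array: one contiguous block of 4 * 5 ints. */
static zip_dig pgh[4] =
    {{1, 5, 2, 0, 6},
     {1, 5, 2, 1, 3},
     {1, 5, 2, 1, 7},
     {1, 5, 2, 2, 1}};

/* Multi-level array: 3 pointers, each to a separately allocated row. */
static int *univ[3] = { mit, cmu, ucb };

int get_pgh_digit(int index, int dig)
{
    return pgh[index][dig];      /* single read at pgh + 20*index + 4*dig */
}

int get_univ_digit(int index, int dig)
{
    int *row = univ[index];      /* first read: fetch the row pointer */
    return row[dig];             /* second read: fetch the element */
}
```

Note that sizeof(pgh) covers all 20 ints, while sizeof(univ) covers only the three pointers.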
Strange Referencing Examples
univ   address 160 holds 36 (mit), 164 holds 16 (cmu), 168 holds 56 (ucb)

Array   Values       Element addresses     End
cmu     1 5 2 1 3    16, 20, 24, 28, 32    36
mit     0 2 1 3 9    36, 40, 44, 48, 52    56
ucb     9 4 7 2 0    56, 60, 64, 68, 72    76

Reference     Address           Value   Guaranteed?
univ[2][3]    56+4*3  = 68      2       Yes
univ[1][5]    16+4*5  = 36      0       No
univ[2][-1]   56+4*-1 = 52      9       No
univ[3][-1]   ??                ??      No
univ[1][12]   16+4*12 = 64      7       No

• Code does not do any bounds checking
• Ordering of elements in different arrays not guaranteed
Using Nested Arrays
Strengths
• C compiler handles doubly subscripted arrays
• Generates very efficient code
» Avoids multiply in index computation
Limitation
• Only works if have fixed array size

#define N 16
typedef int fix_matrix[N][N];

/* Compute element i,k of
   fixed matrix product */
int fix_prod_ele
  (fix_matrix a, fix_matrix b,
   int i, int k)
{
  int j;
  int result = 0;
  for (j = 0; j < N; j++)
    result += a[i][j]*b[j][k];
  return result;
}

Figure: row (i,*) of A is traversed row-wise; column (*,k) of B is traversed column-wise.
Dynamic Nested Arrays
Strength
• Can create matrix of arbitrary size
Programming
• Must do index computation explicitly
Performance
• Accessing single element costly
• Must do multiplication

int * new_var_matrix(int n)
{
  /* calloc takes the element count first, then the element size */
  return (int *)
    calloc(n*n, sizeof(int));
}

int var_ele
  (int *a, int i,
   int j, int n)
{
  return a[i*n+j];
}

  movl 12(%ebp),%eax        # i
  movl 8(%ebp),%edx         # a
  imull 20(%ebp),%eax       # n*i
  addl 16(%ebp),%eax        # n*i+j
  movl (%edx,%eax,4),%eax   # Mem[a+4*(i*n+j)]
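A short usage sketch for the dynamic matrix functions above. set_var_ele is a hypothetical helper added for illustration, and the size_t cast in the allocation is a defensive choice made here, not something the slide specifies.

```c
#include <stdlib.h>

/* Allocate an n x n matrix of int, zero-initialized, as one flat block. */
int *new_var_matrix(int n)
{
    return (int *)calloc((size_t)n * (size_t)n, sizeof(int));
}

/* Element (i, j) lives at flat index i*n + j (row-major). */
int var_ele(int *a, int i, int j, int n)
{
    return a[i*n + j];
}

/* Hypothetical companion: store v at element (i, j). */
void set_var_ele(int *a, int i, int j, int n, int v)
{
    a[i*n + j] = v;
}
```

For n = 3, element (2, 1) is flat index 7, so set_var_ele(m, 2, 1, 3, 42) writes m[7]. The caller owns the block and should free() it.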
Dynamic Array Multiplication
Without Optimizations
• Multiplies
» 2 for subscripts
» 1 for data
• Adds
» 4 for array indexing
» 1 for loop index
» 1 for data

/* Compute element i,k of
   variable matrix product */
int var_prod_ele
  (int *a, int *b,
   int i, int k, int n)
{
  int j;
  int result = 0;
  for (j = 0; j < n; j++)
    result +=
      a[i*n+j] * b[j*n+k];
  return result;
}

Figure: row (i,*) of A is traversed row-wise; column (*,k) of B is traversed column-wise.

Can we optimize this?
Optimizing Dynamic Array Mult
/* Compute element i,k of
variable matrix product */
int var_prod_ele(int *a, int *b, int i, int k, int n)
{
int j;
int result = 0;
for (j = 0; j < n; j++)
result +=
a[i*n+j] * b[j*n+k];
return result;
}
iter      0     1       2       3
a index   i*n   i*n+1   i*n+2   i*n+3
b index   k     n+k     2*n+k   3*n+k
Invariant Code Motion
/* Compute element i,k of
   variable matrix product */
int var_prod_ele(int *a, int *b, int i, int k, int n)
{
  int j;
  int result = 0;
  int iTn = i * n;
  for (j = 0; j < n; j++)
    result +=
      a[iTn+j] * b[j*n+k];
  return result;
}

iter      0     1       2       3
a index   i*n   i*n+1   i*n+2   i*n+3
b index   k     n+k     2*n+k   3*n+k

Anything else?
Induction Var + Strength Reduction
/* Compute element i,k of
   variable matrix product */
int var_prod_ele(int *a, int *b, int i, int k, int n)
{
  int j;
  int result = 0;
  int jTnPk = k;
  int iTn = i * n;
  for (j = 0; j < n; j++) {
    result +=
      a[iTn+j] * b[jTnPk];
    jTnPk += n;
  }
  return result;
}

iter      0     1       2       3
a index   i*n   i*n+1   i*n+2   i*n+3
b index   k     n+k     2*n+k   3*n+k
Optimizing Dynamic Array Mult.
Optimizations
• Performed when set optimization level to -O2
Code Motion
• Expression i*n can be computed outside loop
Strength Reduction
• Incrementing j has effect of incrementing j*n+k by n
Performance
• Compiler can optimize regular access patterns

Before:
{
  int j;
  int result = 0;
  for (j = 0; j < n; j++)
    result +=
      a[i*n+j] * b[j*n+k];
  return result;
}

After:
{
  int j;
  int result = 0;
  int iTn = i*n;
  int jTnPk = k;
  for (j = 0; j < n; j++) {
    result +=
      a[iTn+j] * b[jTnPk];
    jTnPk += n;
  }
  return result;
}
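The before and after bodies can be compared on a small input to confirm they compute the same product element. This is a sketch: the name var_prod_ele_opt and the const qualifiers are additions made here so both versions fit in one file.

```c
/* Straightforward version: two multiplies per iteration for subscripts. */
int var_prod_ele(const int *a, const int *b, int i, int k, int n)
{
    int j;
    int result = 0;
    for (j = 0; j < n; j++)
        result += a[i*n + j] * b[j*n + k];
    return result;
}

/* Hand-optimized version: i*n hoisted out of the loop (code motion),
   and j*n + k replaced by a running sum (strength reduction). */
int var_prod_ele_opt(const int *a, const int *b, int i, int k, int n)
{
    int j;
    int result = 0;
    int iTn = i * n;
    int jTnPk = k;
    for (j = 0; j < n; j++) {
        result += a[iTn + j] * b[jTnPk];
        jTnPk += n;
    }
    return result;
}
```

For the 2 x 2 matrices a = {1, 2, 3, 4} and b = {5, 6, 7, 8} stored row-major, element (0, 0) of the product is 1*5 + 2*7 = 19 in both versions.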
Structures
Concept
• Contiguously-allocated region of memory
• Refer to members within structure by names
• Members may be of different types

struct rec {
  int i;
  int a[3];
  int *p;
};

Memory Layout
  i at offset 0, a at offset 4, p at offset 16; structure ends at offset 20

Accessing Structure Member
void
set_i(struct rec *r,
      int val)
{
  r->i = val;
}

Assembly
  # %eax = val
  # %edx = r
  movl %eax,(%edx)   # Mem[r] = val
Generating Ptr to Structure Member
struct rec {
  int i;
  int a[3];
  int *p;
};

  r points to offset 0 (i); a at offset 4; p at offset 16

Generating Pointer to Array Element
• Offset of each structure member determined at compile time
• Address of a[idx] is r + 4 + 4*idx

int *
find_a
  (struct rec *r, int idx)
{
  return &r->a[idx];
}

  # %ecx = idx
  # %edx = r
  leal 0(,%ecx,4),%eax     # 4*idx
  leal 4(%eax,%edx),%eax   # r+4*idx+4
Structure Referencing (Cont.)
C Code
struct rec {
  int i;
  int a[3];
  int *p;
};

void
set_p(struct rec *r)
{
  r->p =
    &r->a[r->i];
}

  i at offset 0, a at offset 4, p at offset 16
  After the call, p points to element i of a

  # %edx = r
  movl (%edx),%ecx         # r->i
  leal 0(,%ecx,4),%eax     # 4*(r->i)
  leal 4(%edx,%eax),%eax   # r+4+4*(r->i)
  movl %eax,16(%edx)       # Update r->p
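The compile-time offsets used in the assembly can be observed from C with offsetof. A minimal sketch, assuming nothing beyond standard C: find_a_by_offset is a hypothetical helper that mirrors the r + 4 + 4*idx computation using whatever offsets the compiler actually chose.

```c
#include <stddef.h>

struct rec {
    int i;
    int a[3];
    int *p;
};

int *find_a(struct rec *r, int idx)
{
    return &r->a[idx];
}

/* Same address, computed from byte offsets as the assembly does. */
int *find_a_by_offset(struct rec *r, int idx)
{
    char *base = (char *)r;
    return (int *)(base + offsetof(struct rec, a) + sizeof(int) * (size_t)idx);
}

void set_p(struct rec *r)
{
    r->p = &r->a[r->i];
}
```

On IA32 offsetof(struct rec, p) is 16 as on the slide; on machines with 8-byte pointers the compiler may place p differently, which is why the sketch avoids hard-coding it.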
Alignment
Aligned Data
• Primitive data type requires K bytes
• Address must be multiple of K
• Required on some machines; advised on IA32
» treated differently by Linux and Windows!
Motivation for Aligning Data
• Memory accessed by (aligned) double or quad-words
» Inefficient to load or store datum that spans quad word boundaries
» Virtual memory very tricky when datum spans 2 pages
Compiler
• Inserts gaps in structure to ensure correct alignment of fields
Specific Cases of Alignment
Size of Primitive Data Type:
• 1 byte (e.g., char)
» no restrictions on address
• 2 bytes (e.g., short)
» lowest 1 bit of address must be 0 (binary)
• 4 bytes (e.g., int, float, char *, etc.)
» lowest 2 bits of address must be 00 (binary)
• 8 bytes (e.g., double)
» Windows (and most other OS’s & instruction sets): lowest 3 bits of address must be 000 (binary)
» Linux: lowest 2 bits of address must be 00 (binary), i.e., treated the same as a 4-byte primitive data type
• 12 bytes (long double)
» Linux: lowest 2 bits of address must be 00 (binary), i.e., treated the same as a 4-byte primitive data type
Satisfying Alignment in Structures
Offsets Within Structure
• Must satisfy element’s alignment requirement
Overall Structure Placement
• Each structure has alignment requirement K
» Largest alignment of any element
• Initial address & structure length must be multiples of K

struct S1 {
  char c;
  int i[2];
  double v;
} *p;

Example (under Windows):
• K = 8, due to double element

Member   Address   Constraint
c        p+0       Multiple of 8 (start of structure)
i[0]     p+4       Multiple of 4
i[1]     p+8
v        p+16      Multiple of 8
(end)    p+24      Multiple of 8
Linux vs. Windows
struct S1 {
char c;
int i[2];
double v;
} *p;
Windows (including Cygwin):
• K = 8, due to double element

Member   Address   Constraint
c        p+0       Multiple of 8 (start of structure)
i[0]     p+4       Multiple of 4
i[1]     p+8
v        p+16      Multiple of 8
(end)    p+24      Multiple of 8

Linux:
• K = 4; double treated like a 4-byte data type

Member   Address   Constraint
c        p+0       Multiple of 4 (start of structure)
i[0]     p+4       Multiple of 4
i[1]     p+8
v        p+12      Multiple of 4
(end)    p+20      Multiple of 4
Overall Alignment Requirement
struct S2 {
  double x;
  int i[2];
  char c;
} *p;

p must be multiple of:
• 8 for Windows
• 4 for Linux

  x at p+0, i[0] at p+8, i[1] at p+12, c at p+16
  Structure ends at p+24 (Windows) or p+20 (Linux)

struct S3 {
  float x[2];
  int i[2];
  char c;
} *p;

p must be multiple of 4 (in either OS)

  x[0] at p+0, x[1] at p+4, i[0] at p+8, i[1] at p+12, c at p+16
  Structure ends at p+20
Ordering Elements Within Structure
struct S4 {
  char c1;
  double v;
  char c2;
  int i;
} *p;

10 bytes wasted space in Windows
  c1 at p+0, v at p+8, c2 at p+16, i at p+20; structure ends at p+24

struct S5 {
  double v;
  char c1;
  char c2;
  int i;
} *p;

2 bytes wasted space
  v at p+0, c1 at p+8, c2 at p+9, i at p+12; structure ends at p+16
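The effect of member ordering can be measured rather than assumed. The sketch below computes wasted bytes as sizeof(struct) minus the sum of member sizes; s4_padding and s5_padding are hypothetical helper names. Actual figures depend on the platform's alignment rules (for example 10 and 2 under the Windows rules described above), so the code reports whatever the compiler chose.

```c
#include <stddef.h>

struct S4 {
    char c1;
    double v;
    char c2;
    int i;
};

struct S5 {
    double v;
    char c1;
    char c2;
    int i;
};

/* Bytes of padding = total size minus the bytes the members need. */
size_t s4_padding(void)
{
    return sizeof(struct S4)
         - (sizeof(char) + sizeof(double) + sizeof(char) + sizeof(int));
}

size_t s5_padding(void)
{
    return sizeof(struct S5)
         - (sizeof(double) + sizeof(char) + sizeof(char) + sizeof(int));
}
```

Placing the largest-alignment member first and grouping the two chars together is what lets S5 need less padding than S4.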
Arrays of Structures
Principle
• Allocated by repeating allocation for array type
• In general, may nest arrays & structures to arbitrary depth

struct S6 {
  short i;
  float v;
  short j;
} a[10];

  a[0] at a+0, a[1] at a+12, a[2] at a+24, next element at a+36, • • •
  Within a[1]: a[1].i at a+12, a[1].v at a+16, a[1].j at a+20; a[1] ends at a+24
Accessing Element within Array
• Compute offset to start of structure
» Compute 12*i as 4*(i+2i)
• Access element according to its offset within structure
» Offset by 8
» Assembler gives displacement as a + 8
» Linker must set actual value

struct S6 {
  short i;
  float v;
  short j;
} a[10];

short get_j(int idx)
{
  return a[idx].j;
}

  # %eax = idx
  leal (%eax,%eax,2),%eax   # 3*idx
  movswl a+8(,%eax,4),%eax

  a[0] at a+0, • • •, a[i] at a+12i, • • •
  Within a[i]: a[i].i at a+12i, a[i].v at a+12i+4, a[i].j at a+12i+8
Satisfying Alignment within Structure
Achieving Alignment
• Starting address of structure array must be multiple of worst-case alignment for any element
» a must be multiple of 4
• Offset of element within structure must be multiple of element’s alignment requirement
» v’s offset of 4 is a multiple of 4
• Overall size of structure must be multiple of worst-case alignment for any element
» Structure padded with unused space to be 12 bytes

struct S6 {
  short i;
  float v;
  short j;
} a[10];

  a[0] at a+0, • • •, a[i] at a+12i (multiple of 4), • • •
  Within a[i]: a[i].i at a+12i, a[i].v at a+12i+4 (multiple of 4), a[i].j at a+12i+8
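The offset a + 12*idx + 8 used for a[idx].j can be derived in C from sizeof and offsetof instead of being hard-coded. A small sketch; j_offset is a hypothetical helper, and the 12 and 8 on the slide assume a 2-byte short and a 4-byte, 4-aligned float.

```c
#include <stddef.h>

struct S6 {
    short i;
    float v;
    short j;
};

static struct S6 a[10];

short get_j(int idx)
{
    return a[idx].j;
}

/* Byte offset of a[idx].j from the start of the array:
   idx * (padded structure size) + offset of j inside the structure. */
size_t j_offset(int idx)
{
    return (size_t)idx * sizeof(struct S6) + offsetof(struct S6, j);
}
```

Because sizeof(struct S6) already includes the trailing padding, multiplying by it keeps every a[idx].v aligned.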
Union Allocation
Principles
• Overlay union elements
• Allocate according to largest element
• Can only use one field at a time

union U1 {
  char c;
  int i[2];
  double v;
} *up;

  c, i[0] and v all start at up+0; i[1] at up+4; union ends at up+8

struct S1 {
  char c;
  int i[2];
  double v;
} *sp;

  (Windows alignment)
  c at sp+0, i[0] at sp+4, i[1] at sp+8, v at sp+16; structure ends at sp+24
Using Union to Access Bit Patterns
typedef union {
  float f;
  unsigned u;
} bit_float_t;

  u and f both occupy bytes 0 to 4

float bit2float(unsigned u)
{
  bit_float_t arg;
  arg.u = u;
  return arg.f;
}

unsigned float2bit(float f)
{
  bit_float_t arg;
  arg.f = f;
  return arg.u;
}

• Get direct access to bit representation of float
• bit2float generates float with given bit pattern
» NOT the same as (float) u
• float2bit generates bit pattern from float
» NOT the same as (unsigned) f
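A runnable sketch of the difference between reinterpreting bits and converting values. It assumes float and unsigned are both 4 bytes and that float uses the IEEE 754 single-precision format, which holds on x86 but is not guaranteed by C. float2bit_memcpy is an extra illustrative variant, not part of the slide.

```c
#include <string.h>

typedef union {
    float f;
    unsigned u;
} bit_float_t;

/* Reinterpret a bit pattern as a float (no numeric conversion). */
float bit2float(unsigned u)
{
    bit_float_t arg;
    arg.u = u;
    return arg.f;
}

/* Reinterpret a float's bits as an unsigned (no numeric conversion). */
unsigned float2bit(float f)
{
    bit_float_t arg;
    arg.f = f;
    return arg.u;
}

/* Same reinterpretation done with memcpy instead of a union. */
unsigned float2bit_memcpy(float f)
{
    unsigned u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under the stated assumptions float2bit(1.0f) is 0x3f800000, whereas the cast (unsigned) 1.0f is simply 1.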
Byte Ordering Revisited
Idea
• Short/long/quad words stored in memory as 2/4/8 consecutive bytes
• Which is most (least) significant?
• Can cause problems when exchanging binary data between machines
Big Endian
• Most significant byte has lowest address
• PowerPC, Sparc
Little Endian
• Least significant byte has lowest address
• Intel x86, Alpha
Byte Ordering Example
union {
  unsigned char c[8];
  unsigned short s[4];
  unsigned int i[2];
  unsigned long l[1];
} dw;

  c[0] c[1] | c[2] c[3] | c[4] c[5] | c[6] c[7]
  s[0]      | s[1]      | s[2]      | s[3]
  i[0]                  | i[1]
  l[0]
Byte Ordering Example (Cont.)
int j;
for (j = 0; j < 8; j++)
  dw.c[j] = 0xf0 + j;

printf("Characters 0-7 == [0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n",
       dw.c[0], dw.c[1], dw.c[2], dw.c[3],
       dw.c[4], dw.c[5], dw.c[6], dw.c[7]);

printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n",
       dw.s[0], dw.s[1], dw.s[2], dw.s[3]);

printf("Ints 0-1 == [0x%x,0x%x]\n",
       dw.i[0], dw.i[1]);

printf("Long 0 == [0x%lx]\n",
       dw.l[0]);
Byte Ordering on x86
Little Endian

  f0   f1   f2   f3   f4   f5   f6   f7
  c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]

  s[0] to s[3]: each short has its LSB at the lower address and its MSB at the higher
  i[0], i[1]: each int has its LSB at its lowest address and its MSB at its highest
  l[0]: 4 bytes, LSB at c[0] and MSB at c[3]

Output on Pentium:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf3f2f1f0]
Byte Ordering on Sun
Big Endian

  f0   f1   f2   f3   f4   f5   f6   f7
  c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]

  s[0] to s[3]: each short has its MSB at the lower address and its LSB at the higher
  i[0], i[1]: each int has its MSB at its lowest address and its LSB at its highest
  l[0]: 4 bytes, MSB at c[0] and LSB at c[3]

Output on Sun:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]
Byte Ordering on Alpha
Little Endian

  f0   f1   f2   f3   f4   f5   f6   f7
  c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]

  s[0] to s[3]: each short has its LSB at the lower address and its MSB at the higher
  i[0], i[1]: each int has its LSB at its lowest address and its MSB at its highest
  l[0]: 8 bytes, LSB at c[0] and MSB at c[7]

Output on Alpha:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf7f6f5f4f3f2f1f0]
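The three outputs above differ only in how the same bytes are assembled into wider values. A minimal sketch that makes this explicit, using fixed-width types from <stdint.h> so the result does not depend on the size of long. The names byte_view_t, is_little_endian, read_le32 and read_be32 are hypothetical, and the sketch assumes the host is either plain little endian or plain big endian.

```c
#include <stdint.h>

typedef union {
    uint8_t c[4];
    uint32_t i;
} byte_view_t;

/* Returns 1 if the least significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    byte_view_t v;
    v.i = 1;
    return v.c[0] == 1;
}

/* Assemble 4 bytes the way a little-endian machine reads them. */
uint32_t read_le32(const uint8_t c[4])
{
    return (uint32_t)c[0] | ((uint32_t)c[1] << 8)
         | ((uint32_t)c[2] << 16) | ((uint32_t)c[3] << 24);
}

/* Assemble 4 bytes the way a big-endian machine reads them. */
uint32_t read_be32(const uint8_t c[4])
{
    return ((uint32_t)c[0] << 24) | ((uint32_t)c[1] << 16)
         | ((uint32_t)c[2] << 8) | (uint32_t)c[3];
}
```

For the bytes f0 f1 f2 f3, read_le32 yields 0xf3f2f1f0 (the x86 and Alpha int output) and read_be32 yields 0xf0f1f2f3 (the Sun output). Using explicit shifts like these is a common way to exchange binary data portably between machines.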
Summary
Arrays in C
• Contiguous allocation of memory
• Pointer to first element
• No bounds checking
Compiler Optimizations
• Compiler often turns array code into pointer code (zd2int)
• Uses addressing modes to scale array indices
• Lots of tricks to improve array indexing in loops
Structures
• Allocate bytes in order declared
• Pad in middle and at end to satisfy alignment
Unions
• Overlay declarations
• Way to circumvent type system