Transcript class08.ppt
15-213
“The course that gives CMU its Zip!”
Machine-Level Programming IV:
Structured Data
Feb 4, 2003
• Topics
• Arrays
• Structs
• Unions
Basic Data Types
Integral
Stored & operated on in general registers
Signed vs. unsigned depends on instructions used
Intel         GAS   Bytes   C
byte          b     1       [unsigned] char
word          w     2       [unsigned] short
double word   l     4       [unsigned] int
Floating Point
Stored & operated on in floating point registers
Intel         GAS   Bytes   C
Single        s     4       float
Double        l     8       double
Extended      t     10/12   long double
15-213, S’03
Array Allocation
Basic Principle
T A[L];
Array of data type T and length L
Contiguously allocated region of L * sizeof(T) bytes
char string[12];   spans x to x + 12
int val[5];        elements at x, x+4, x+8, x + 12, x + 16; ends at x + 20
double a[4];       elements at x, x+8, x + 16, x + 24; ends at x + 32
char *p[3];        elements at x, x+4, x+8 (4-byte pointers)
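The allocation rule above (L * sizeof(T) contiguous bytes, element idx at byte offset idx * sizeof(T)) can be checked directly in C. The sketch below is illustrative only: the helper names element_offset and measured_int_offset are assumptions, not part of the lecture code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of element idx from the start of an
   array whose elements are elem_size bytes each. */
size_t element_offset(size_t elem_size, size_t idx)
{
    return elem_size * idx;
}

/* Hypothetical helper: measure the distance in bytes between val[idx]
   and val[0] for a real int array. */
size_t measured_int_offset(size_t idx)
{
    int val[5] = { 1, 5, 2, 1, 3 };
    assert(idx <= 5);   /* one past the end is still a valid address */
    return (size_t)((char *)&val[idx] - (char *)&val[0]);
}
```

On a machine with 4-byte int, measured_int_offset(3) gives 12, matching the x + 12 label for val[3].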
Array Access
Basic Principle
T A[L];
Array of data type T and length L
Identifier A can be used as a pointer to array element 0
int val[5];   contents 1, 5, 2, 1, 3 at addresses x, x+4, x+8, x + 12, x + 16; ends at x + 20
Reference   Type    Value
val[4]      int     3
val         int *   x
val+1       int *   x+4
&val[2]     int *   x+8
val[5]      int     ??
*(val+1)    int     5
val + i     int *   x+4i
Array Example
typedef int zip_dig[5];
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };
zip_dig cmu;   1 5 2 1 3   (addresses 16, 20, 24, 28, 32; ends at 36)
zip_dig mit;   0 2 1 3 9   (addresses 36, 40, 44, 48, 52; ends at 56)
zip_dig ucb;   9 4 7 2 0   (addresses 56, 60, 64, 68, 72; ends at 76)
Notes
Declaration “zip_dig cmu” equivalent to “int cmu[5]”
Example arrays were allocated in successive 20 byte blocks
Not guaranteed to happen in general
Array Accessing Example
Computation
int get_digit(zip_dig z, int dig)
{
return z[dig];
}
Register %edx contains starting address of array
Register %eax contains array index
Desired digit at 4*%eax + %edx
Use memory reference (%edx,%eax,4)
Memory Reference Code
# %edx = z
# %eax = dig
movl (%edx,%eax,4),%eax   # z[dig]
Referencing Examples
zip_dig cmu;   1 5 2 1 3   (addresses 16, 20, 24, 28, 32; ends at 36)
zip_dig mit;   0 2 1 3 9   (addresses 36, 40, 44, 48, 52; ends at 56)
zip_dig ucb;   9 4 7 2 0   (addresses 56, 60, 64, 68, 72; ends at 76)
Code Does Not Do Any Bounds Checking!
Reference   Address            Value   Guaranteed?
mit[3]      36 + 4* 3 = 48     3       Yes
mit[5]      36 + 4* 5 = 56     9       No
mit[-1]     36 + 4*-1 = 32     3       No
cmu[15]     16 + 4*15 = 76     ??      No
Out of range behavior implementation-dependent
No guaranteed relative allocation of different arrays
Array Loop Example
Original Source
int zd2int(zip_dig z)
{
  int i;
  int zi = 0;
  for (i = 0; i < 5; i++) {
    zi = 10 * zi + z[i];
  }
  return zi;
}
How do we implement this?
Can we improve it?
First step, convert to do-while
int zd2int(zip_dig z)
{
  int i;
  int zi = 0;
  i = 0;
  if (i < 5) {
    do {
      zi = 10 * zi + z[i];
      i++;
    } while (i < 5);
  }
  return zi;
}
Next?
Array Loop Example – convert to ptr
z[i] is equivalent to *(z+i)
int zd2int(zip_dig z)
{
  int i;
  int zi = 0;
  i = 0;
  if (i < 5) {
    do {
      zi = 10 * zi + *(z+i);
      i++;
    } while (i < 5);
  }
  return zi;
}
i       0   1     2     3     4     5
(z+i)   z   z+1   z+2   z+3   z+4   z+5
Can we further improve this?
(hint: what does i do?)
Do we need z+i?
Array Loop Example – optimize
int zd2int(zip_dig z)
{
  int i;
  int *zend;
  int zi = 0;
  i = 0;
  zend = z+5;
  if (z < zend) {
    do {
      zi = 10 * zi + *(z++);
      i++;
    } while (z < zend);
  }
  return zi;
}
i              0   1     2     3     4     5
(z+i)          z   z+1   z+2   z+3   z+4   z+5
z (with z++)   z   z+1   z+2   z+3   z+4   z+5
Do we need i?
Array Loop Example – optimize
Can I do anything else?
int zd2int(zip_dig z)
{
int* zend;
int zi = 0;
zend = z+5;
if (z < zend) {
do {
zi = 10 * zi + *(z++);
} while (z < zend);
}
return zi;
}
Array Loop Example
Original Source
int zd2int(zip_dig z)
{
  int i;
  int zi = 0;
  for (i = 0; i < 5; i++) {
    zi = 10 * zi + z[i];
  }
  return zi;
}
Transformed Version
As generated by GCC
Express in do-while form
No need to test at entrance
Convert array code to pointer code
Eliminate loop variable i
int zd2int(zip_dig z)
{
  int zi = 0;
  int *zend = z + 4;
  do {
    zi = 10 * zi + *z;
    z++;
  } while(z <= zend);
  return zi;
}
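To see that the transformation preserves behavior, the two versions can be compiled side by side and compared. This is a minimal sketch: the names zd2int_array, zd2int_ptr, and zd2int_versions_agree are renamings and a helper introduced here for illustration, not names from the lecture.

```c
#include <assert.h>

typedef int zip_dig[5];

/* Original array-indexed loop. */
int zd2int_array(zip_dig z)
{
    int i;
    int zi = 0;
    for (i = 0; i < 5; i++) {
        zi = 10 * zi + z[i];
    }
    return zi;
}

/* Pointer-based do-while form, as in the transformed version. */
int zd2int_ptr(zip_dig z)
{
    int zi = 0;
    int *zend = z + 4;
    do {
        zi = 10 * zi + *z;
        z++;
    } while (z <= zend);
    return zi;
}

/* Hypothetical check helper: returns 1 when both versions agree. */
int zd2int_versions_agree(zip_dig z)
{
    assert(z != 0);
    return zd2int_array(z) == zd2int_ptr(z);
}
```

For cmu = { 1, 5, 2, 1, 3 } both functions return 15213.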
Array Loop Implementation
Registers
%ecx z
%eax zi
%ebx zend
Computations
10*zi + *z implemented as
*z + 2*(zi+4*zi)
z++ increments by 4
int zd2int(zip_dig z)
{
  int zi = 0;
  int *zend = z + 4;
  do {
    zi = 10 * zi + *z;
    z++;
  } while(z <= zend);
  return zi;
}
# %ecx = z
xorl %eax,%eax            # zi = 0
leal 16(%ecx),%ebx        # zend = z+4
.L59:
leal (%eax,%eax,4),%edx   # 5*zi
movl (%ecx),%eax          # *z
addl $4,%ecx              # z++
leal (%eax,%edx,2),%eax   # zi = *z + 2*(5*zi)
cmpl %ebx,%ecx            # z : zend
jle .L59                  # if <= goto loop
Nested Array Example
#define PCOUNT 4
zip_dig pgh[PCOUNT] =
{{1, 5, 2, 0, 6},
{1, 5, 2, 1, 3 },
{1, 5, 2, 1, 7 },
{1, 5, 2, 2, 1 }};
zip_dig pgh[4];
1 5 2 0 6 | 1 5 2 1 3 | 1 5 2 1 7 | 1 5 2 2 1
Rows start at addresses 76, 96, 116, 136; array ends at 156
Declaration “zip_dig pgh[4]” equivalent to “int pgh[4][5]”
Variable pgh denotes array of 4 elements
» Allocated contiguously
Each element is an array of 5 int’s
» Allocated contiguously
“Row-Major” ordering of all elements guaranteed
Nested Array Allocation
Declaration
T A[R][C];
Array of data type T
R rows, C columns
Type T element needs K bytes
A[0][0]     • • •   A[0][C-1]
   •                    •
   •                    •
A[R-1][0]   • • •   A[R-1][C-1]
Array Size
R * C * K bytes
Arrangement
Row-Major Ordering
int A[R][C];
A[0][0] • • • A[0][C-1] | A[1][0] • • • A[1][C-1] | • • • | A[R-1][0] • • • A[R-1][C-1]
4*R*C Bytes
Nested Array Row Access
Row Vectors
A[i] is array of C elements
Each element of type T
Starting address A + i * C * K
int A[R][C];
A[0]     A[0][0] • • • A[0][C-1]         starts at A
• • •
A[i]     A[i][0] • • • A[i][C-1]         starts at A+i*C*4
• • •
A[R-1]   A[R-1][0] • • • A[R-1][C-1]     starts at A+(R-1)*C*4
Nested Array Row Access Code
int *get_pgh_zip(int index)
{
return pgh[index];
}
Row Vector
pgh[index] is array of 5 int’s
Starting address pgh+20*index
Code
Computes and returns address
Compute as pgh + 4*(index+4*index)
# %eax = index
leal (%eax,%eax,4),%eax   # 5 * index
leal pgh(,%eax,4),%eax    # pgh + (20 * index)
Nested Array Element Access
Array Elements
A[i][j] is element of type T
Address A + (i * C + j) * K
int A[R][C];
A[0]     A[0][0] • • • A[0][C-1]                   starts at A
• • •
A[i]     A[i][0] • • • A[i][j] • • • A[i][C-1]     starts at A+i*C*4; A[i][j] is at A+(i*C+j)*4
• • •
A[R-1]   A[R-1][0] • • • A[R-1][C-1]               starts at A+(R-1)*C*4
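The address formula A + (i * C + j) * K can be compared against what a compiler actually does. The following is a small sketch under stated assumptions: the dimensions R = 4 and C = 5 and the helper names predicted_offset and actual_offset are chosen here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define R 4   /* rows: an assumption for this sketch */
#define C 5   /* columns: an assumption for this sketch */

/* Byte offset of A[i][j] from the start of A, as predicted by the
   formula (i * C + j) * K with K = sizeof(int). */
size_t predicted_offset(size_t i, size_t j)
{
    return (i * C + j) * sizeof(int);
}

/* Byte offset of A[i][j] as the compiler actually lays it out. */
size_t actual_offset(size_t i, size_t j)
{
    static int A[R][C];
    assert(i < R && j < C);
    return (size_t)((char *)&A[i][j] - (char *)&A[0][0]);
}
```

Because C guarantees row-major layout for nested arrays, the two functions should agree for every in-range (i, j).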
Nested Array Element Access Code
Array Elements
pgh[index][dig] is int
Address: pgh + 20*index + 4*dig
int get_pgh_digit(int index, int dig)
{
  return pgh[index][dig];
}
Code
Computes address pgh + 4*dig + 4*(index+4*index)
movl performs memory reference
# %ecx = dig
# %eax = index
leal 0(,%ecx,4),%edx         # 4*dig
leal (%eax,%eax,4),%eax      # 5*index
movl pgh(%edx,%eax,4),%eax   # *(pgh + 4*dig + 20*index)
Strange Referencing Examples
zip_dig pgh[4];
1 5 2 0 6 | 1 5 2 1 3 | 1 5 2 1 7 | 1 5 2 2 1
Rows start at addresses 76, 96, 116, 136; array ends at 156
Reference     Address                 Value   Guaranteed?
pgh[3][3]     76+20*3+4*3 = 148       2       Yes
pgh[2][5]     76+20*2+4*5 = 136       1       Yes
pgh[2][-1]    76+20*2+4*-1 = 112      3       Yes
pgh[4][-1]    76+20*4+4*-1 = 152      1       Yes
pgh[0][19]    76+20*0+4*19 = 152      1       Yes
pgh[0][-1]    76+20*0+4*-1 = 72       ??      No
Code does not do any bounds checking
Ordering of elements within array is guaranteed
Multi-Level Array Example
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };
#define UCOUNT 3
int *univ[UCOUNT] = {mit, cmu, ucb};
Variable univ denotes array of 3 elements
Each elem is a pointer
4 bytes
Each pointer points to an array of int’s
univ   addresses 160, 164, 168 hold pointers 36 (mit), 16 (cmu), 56 (ucb)
cmu    1 5 2 1 3   (addresses 16, 20, 24, 28, 32; ends at 36)
mit    0 2 1 3 9   (addresses 36, 40, 44, 48, 52; ends at 56)
ucb    9 4 7 2 0   (addresses 56, 60, 64, 68, 72; ends at 76)
Element Access in Multi-Level Array
Computation
int get_univ_digit(int index, int dig)
{
return univ[index][dig];
}
Element access
Mem[Mem[univ+4*index]+4*dig]
Must do two memory reads
First get pointer to row array
Then access element within array
# %ecx = index
# %eax = dig
leal 0(,%ecx,4),%edx      # 4*index
movl univ(%edx),%edx      # Mem[univ+4*index]
movl (%edx,%eax,4),%eax   # Mem[...+4*dig]
Array Element Accesses
Syntax is the same, computation is different!
Nested Array
int get_pgh_digit(int index, int dig)
{
  return pgh[index][dig];
}
Element at Mem[pgh+20*index+4*dig]
pgh    1 5 2 0 6 | 1 5 2 1 3 | 1 5 2 1 7 | 1 5 2 2 1   (rows start at 76, 96, 116, 136; ends at 156)
Multi-Level Array
int get_univ_digit(int index, int dig)
{
  return univ[index][dig];
}
Element at Mem[Mem[univ+4*index]+4*dig]
univ   addresses 160, 164, 168 hold pointers 36 (mit), 16 (cmu), 56 (ucb)
cmu    1 5 2 1 3   (addresses 16, 20, 24, 28, 32; ends at 36)
mit    0 2 1 3 9   (addresses 36, 40, 44, 48, 52; ends at 56)
ucb    9 4 7 2 0   (addresses 56, 60, 64, 68, 72; ends at 76)
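The contrast between the two access patterns can be tried out with the declarations from the earlier slides. This is a sketch only: the bounds-checking assert calls are an addition made here and are not in the lecture code.

```c
#include <assert.h>

typedef int zip_dig[5];

#define PCOUNT 4
#define UCOUNT 3

/* Nested array: 4 rows of 5 ints in one contiguous block. */
zip_dig pgh[PCOUNT] =
    {{1, 5, 2, 0, 6},
     {1, 5, 2, 1, 3},
     {1, 5, 2, 1, 7},
     {1, 5, 2, 2, 1}};

zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

/* Multi-level array: 3 pointers, each to a separately allocated row. */
int *univ[UCOUNT] = { mit, cmu, ucb };

int get_pgh_digit(int index, int dig)
{
    /* Bounds check added for this sketch only. */
    assert(index >= 0 && index < PCOUNT && dig >= 0 && dig < 5);
    return pgh[index][dig];    /* one memory read */
}

int get_univ_digit(int index, int dig)
{
    /* Bounds check added for this sketch only. */
    assert(index >= 0 && index < UCOUNT && dig >= 0 && dig < 5);
    return univ[index][dig];   /* two memory reads: pointer, then element */
}
```

Note that sizeof(pgh) covers all 20 ints, while sizeof(univ) covers only the 3 pointers; the rows that univ points to live elsewhere.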
Strange Referencing Examples
univ   addresses 160, 164, 168 hold pointers 36 (mit), 16 (cmu), 56 (ucb)
cmu    1 5 2 1 3   (addresses 16, 20, 24, 28, 32; ends at 36)
mit    0 2 1 3 9   (addresses 36, 40, 44, 48, 52; ends at 56)
ucb    9 4 7 2 0   (addresses 56, 60, 64, 68, 72; ends at 76)
Reference      Address           Value   Guaranteed?
univ[2][3]     56+4*3 = 68       2       Yes
univ[1][5]     16+4*5 = 36       0       No
univ[2][-1]    56+4*-1 = 52      9       No
univ[3][-1]    ??                ??      No
univ[1][12]    16+4*12 = 64      7       No
Code does not do any bounds checking
Ordering of elements in different arrays not guaranteed
Using Nested Arrays
Strengths
C compiler handles doubly subscripted arrays
Generates very efficient code
Avoids multiply in index computation
Limitation
Only works with a fixed array size
#define N 16
typedef int fix_matrix[N][N];
/* Compute element i,k of
fixed matrix product */
int fix_prod_ele
(fix_matrix a, fix_matrix b,
int i, int k)
{
  int j;
  int result = 0;
  for (j = 0; j < N; j++)
    result += a[i][j]*b[j][k];
  return result;
}
Row (i,*) of A is traversed row-wise; column (*,k) of B is traversed column-wise
Dynamic Nested Arrays
Strength
Can create matrix of arbitrary size
int * new_var_matrix(int n)
{
  return (int *) calloc(n*n, sizeof(int));
}
Programming
Must do index computation explicitly
int var_ele(int *a, int i, int j, int n)
{
  return a[i*n+j];
}
Performance
Accessing single element costly
Must do multiplication
movl 12(%ebp),%eax        # i
movl 8(%ebp),%edx         # a
imull 20(%ebp),%eax       # n*i
addl 16(%ebp),%eax        # n*i+j
movl (%edx,%eax,4),%eax   # Mem[a+4*(i*n+j)]
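A short usage sketch may help show the explicit index computation in action. The function set_var_ele is a hypothetical companion introduced here, and the calloc argument order (count first, then element size) follows the standard library convention.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a zero-filled n x n matrix as one flat block. */
int *new_var_matrix(int n)
{
    return (int *) calloc((size_t)n * n, sizeof(int));
}

/* Read element (i, j) using explicit row-major index arithmetic. */
int var_ele(int *a, int i, int j, int n)
{
    return a[i*n + j];
}

/* Hypothetical companion to var_ele: store v at (i, j). */
void set_var_ele(int *a, int i, int j, int n, int v)
{
    assert(i >= 0 && i < n && j >= 0 && j < n);
    a[i*n + j] = v;
}
```

With n = 3, element (2, 1) lives at flat index 2*3 + 1 = 7. The caller owns the block and releases it with free.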
Dynamic Array Multiplication
Without Optimizations
Multiplies
2 for subscripts
1 for data
Adds
4 for array indexing
1 for loop index
1 for data
/* Compute element i,k of
variable matrix product */
int var_prod_ele
(int *a, int *b,
int i, int k, int n)
{
  int j;
  int result = 0;
  for (j = 0; j < n; j++)
    result +=
      a[i*n+j] * b[j*n+k];
  return result;
}
Row (i,*) of A is traversed row-wise; column (*,k) of B is traversed column-wise
Can we optimize this?
Optimizing Dynamic Array Mult
/* Compute element i,k of
variable matrix product */
int var_prod_ele(int *a, int *b, int i, int k, int n)
{
int j;
int result = 0;
for (j = 0; j < n; j++)
result +=
a[i*n+j] * b[j*n+k];
return result;
}
iter      0     1       2       3
a index   i*n   i*n+1   i*n+2   i*n+3
b index   k     n+k     2*n+k   3*n+k
Invariant Code Motion
/* Compute element i,k of
variable matrix product */
int var_prod_ele(int *a, int *b, int i, int k, int n)
{
  int j;
  int iTn = i*n;   /* loop-invariant: computed once, outside the loop */
  int result = 0;
  for (j = 0; j < n; j++)
    result +=
      a[i*n+j] * b[j*n+k];   /* i*n can be replaced by iTn */
  return result;
}
iter      0     1       2       3
a index   i*n   i*n+1   i*n+2   i*n+3
b index   k     n+k     2*n+k   3*n+k
Invariant Code Motion
/* Compute element i,k of
variable matrix product */
int var_prod_ele(int *a, int *b, int i, int k, int n)
{
  int j;
  int result = 0;
  int iTn = i * n;
  for (j = 0; j < n; j++)
    result +=
      a[iTn+j] * b[j*n+k];
  return result;
}
Anything else?
iter      0     1       2       3
a index   i*n   i*n+1   i*n+2   i*n+3
b index   k     n+k     2*n+k   3*n+k
Induction Var + Strength Reduction
/* Compute element i,k of
variable matrix product */
int var_prod_ele(int *a, int *b, int i, int k, int n)
{
  int j;
  int result = 0;
  int jTnPk = k;
  int iTn = i * n;
  for (j = 0; j < n; j++) {
    result +=
      a[iTn+j] * b[jTnPk];   /* jTnPk replaces j*n+k */
    jTnPk += n;
  }
  return result;
}
iter      0     1       2       3
a index   i*n   i*n+1   i*n+2   i*n+3
b index   k     n+k     2*n+k   3*n+k
Optimizing Dynamic Array Mult.
Optimizations
Performed when set optimization level to -O2
Code Motion
Expression i*n can be computed outside loop
Strength Reduction
Incrementing j has effect of incrementing j*n+k by n
Performance
Compiler can optimize regular access patterns
Original:
{
  int j;
  int result = 0;
  for (j = 0; j < n; j++)
    result +=
      a[i*n+j] * b[j*n+k];
  return result;
}
Optimized:
{
  int j;
  int result = 0;
  int iTn = i*n;
  int jTnPk = k;
  for (j = 0; j < n; j++) {
    result +=
      a[iTn+j] * b[jTnPk];
    jTnPk += n;
  }
  return result;
}
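The two loop bodies can be wrapped into complete functions and compared on real data. This is a sketch: the names var_prod_ele_opt and check_versions_agree are introduced here for illustration, and the hand-written optimized form stands in for what the compiler does internally at -O2.

```c
#include <assert.h>

/* Straightforward version. */
int var_prod_ele(int *a, int *b, int i, int k, int n)
{
    int j;
    int result = 0;
    for (j = 0; j < n; j++)
        result += a[i*n+j] * b[j*n+k];
    return result;
}

/* Hand-optimized version: code motion plus strength reduction. */
int var_prod_ele_opt(int *a, int *b, int i, int k, int n)
{
    int j;
    int result = 0;
    int iTn = i*n;     /* hoisted out of the loop */
    int jTnPk = k;     /* tracks j*n+k without a multiply */
    for (j = 0; j < n; j++) {
        result += a[iTn+j] * b[jTnPk];
        jTnPk += n;
    }
    return result;
}

/* Hypothetical self-check: both versions must agree for every (i, k). */
void check_versions_agree(int *a, int *b, int n)
{
    int i, k;
    for (i = 0; i < n; i++)
        for (k = 0; k < n; k++)
            assert(var_prod_ele(a, b, i, k, n) ==
                   var_prod_ele_opt(a, b, i, k, n));
}
```

The optimized loop performs one multiply per iteration (the data product) instead of three.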
Structures
Concept
Contiguously-allocated region of memory
Refer to members within structure by names
Members may be of different types
Memory Layout
struct rec {
  int i;
  int a[3];
  int *p;
};
i at offset 0, a at offset 4, p at offset 16; structure ends at offset 20
Accessing Structure Member
void set_i(struct rec *r, int val)
{
  r->i = val;
}
Assembly
# %eax = val
# %edx = r
movl %eax,(%edx)   # Mem[r] = val
Generating Ptr to Structure Member
struct rec {
  int i;
  int a[3];
  int *p;
};
Starting at r: i at offset 0, a at offset 4, p at offset 16
Generating Pointer to Array Element
Offset of each structure member determined at compile time
&r->a[idx] is r + 4 + 4*idx
int *find_a(struct rec *r, int idx)
{
  return &r->a[idx];
}
# %ecx = idx
# %edx = r
leal 0(,%ecx,4),%eax     # 4*idx
leal 4(%eax,%edx),%eax   # r+4*idx+4
Structure Referencing (Cont.)
C Code
struct rec {
  int i;
  int a[3];
  int *p;
};
void set_p(struct rec *r)
{
  r->p = &r->a[r->i];
}
i at offset 0, a at offset 4, p at offset 16
After the call, p points to element i of a
# %edx = r
movl (%edx),%ecx         # r->i
leal 0(,%ecx,4),%eax     # 4*(r->i)
leal 4(%edx,%eax),%eax   # r+4+4*(r->i)
movl %eax,16(%edx)       # Update r->p
Alignment
Aligned Data
Primitive data type requires K bytes
Address must be multiple of K
Required on some machines; advised on IA32
treated differently by Linux and Windows!
Motivation for Aligning Data
Memory accessed by (aligned) double or quad-words
Inefficient to load or store datum that spans quad word boundaries
Virtual memory very tricky when datum spans 2 pages
Compiler
Inserts gaps in structure to ensure correct alignment of fields
Specific Cases of Alignment
Size of Primitive Data Type:
1 byte (e.g., char)
no restrictions on address
2 bytes (e.g., short)
lowest 1 bit of address must be 0 (binary)
4 bytes (e.g., int, float, char *, etc.)
lowest 2 bits of address must be 00 (binary)
8 bytes (e.g., double)
Windows (and most other OS’s & instruction sets):
» lowest 3 bits of address must be 000 (binary)
Linux:
» lowest 2 bits of address must be 00 (binary)
» i.e., treated the same as a 4-byte primitive data type
12 bytes (long double)
Linux:
» lowest 2 bits of address must be 00 (binary)
» i.e., treated the same as a 4-byte primitive data type
Satisfying Alignment in Structures
Offsets Within Structure
Must satisfy element’s alignment requirement
Overall Structure Placement
Each structure has alignment requirement K
Largest alignment of any element
Initial address & structure length must be multiples of K
struct S1 {
  char c;
  int i[2];
  double v;
} *p;
Example (under Windows):
K = 8, due to double element
c at p+0, i[0] at p+4 (multiple of 4), i[1] at p+8, v at p+16 (multiple of 8); structure ends at p+24 (multiple of 8)
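The alignment rules can be observed with offsetof and the C11 _Alignof operator. The sketch below makes no claim about specific offsets, since those depend on the platform (as the Linux and Windows layouts show); the helper names is_aligned, s1_offset_i, s1_offset_v, and s1_size are assumptions introduced here.

```c
#include <assert.h>
#include <stddef.h>

struct S1 {
    char c;
    int i[2];
    double v;
};

/* Hypothetical helper: 1 if offset is a multiple of align, else 0. */
int is_aligned(size_t offset, size_t align)
{
    assert(align != 0);
    return offset % align == 0;
}

/* Offsets and size chosen by whatever compiler builds this file. */
size_t s1_offset_i(void) { return offsetof(struct S1, i); }
size_t s1_offset_v(void) { return offsetof(struct S1, v); }
size_t s1_size(void)     { return sizeof(struct S1); }
```

Printing s1_offset_v() and s1_size() on different systems is one way to see which layout a given compiler uses.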
Linux vs. Windows
struct S1 {
  char c;
  int i[2];
  double v;
} *p;
Windows (including Cygwin):
K = 8, due to double element
c at p+0, i[0] at p+4 (multiple of 4), i[1] at p+8, v at p+16 (multiple of 8); structure ends at p+24 (multiple of 8)
Linux:
K = 4; double treated like a 4-byte data type
c at p+0, i[0] at p+4, i[1] at p+8, v at p+12; structure ends at p+20 (all multiples of 4)
Overall Alignment Requirement
struct S2 {
  double x;
  int i[2];
  char c;
} *p;
p must be multiple of:
8 for Windows
4 for Linux
x at p+0, i[0] at p+8, i[1] at p+12, c at p+16; structure ends at p+24 (Windows) or p+20 (Linux)
struct S3 {
  float x[2];
  int i[2];
  char c;
} *p;
p must be multiple of 4 (in either OS)
x[0] at p+0, x[1] at p+4, i[0] at p+8, i[1] at p+12, c at p+16; structure ends at p+20
Ordering Elements Within Structure
struct S4 {
  char c1;
  double v;
  char c2;
  int i;
} *p;
10 bytes wasted space in Windows
c1 at p+0, v at p+8, c2 at p+16, i at p+20; structure ends at p+24
struct S5 {
  double v;
  char c1;
  char c2;
  int i;
} *p;
2 bytes wasted space
v at p+0, c1 and c2 starting at p+8, i at p+12; structure ends at p+16
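The effect of member ordering on padding can be measured with sizeof. This is a sketch: the helper names s4_padding and s5_padding are assumptions, and the exact byte counts depend on the platform's alignment rules.

```c
#include <assert.h>
#include <stddef.h>

struct S4 { char c1; double v; char c2; int i; };
struct S5 { double v; char c1; char c2; int i; };

/* Bytes the members themselves need, ignoring padding. */
static size_t member_bytes(void)
{
    return 2 * sizeof(char) + sizeof(double) + sizeof(int);
}

/* Padding = total structure size minus the bytes the members need. */
size_t s4_padding(void)
{
    assert(sizeof(struct S4) >= member_bytes());
    return sizeof(struct S4) - member_bytes();
}

size_t s5_padding(void)
{
    assert(sizeof(struct S5) >= member_bytes());
    return sizeof(struct S5) - member_bytes();
}
```

Under the Windows layout, these would report 10 and 2 bytes respectively; other ABIs may give different numbers, but placing the largest members first generally does not increase padding.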
Arrays of Structures
Principle
Allocated by repeating allocation for array type
In general, may nest arrays & structures to arbitrary depth
struct S6 {
  short i;
  float v;
  short j;
} a[10];
a[0] at a+0, a[1] at a+12, a[2] at a+24, • • • (next element at a+36)
Within a[1]: a[1].i at a+12, a[1].v at a+16, a[1].j at a+20; a[1] ends at a+24
Accessing Element within Array
Compute offset to start of structure
Compute 12*i as 4*(i+2i)
Access element according to its offset within structure
Offset by 8
Assembler gives displacement as a + 8
» Linker must set actual value
struct S6 {
  short i;
  float v;
  short j;
} a[10];
short get_j(int idx)
{
  return a[idx].j;
}
# %eax = idx
leal (%eax,%eax,2),%eax # 3*idx
movswl a+8(,%eax,4),%eax
a[0] at a+0, • • •, a[i] at a+12i, • • •
Within a[i]: a[i].i at a+12i, then a[i].v, then a[i].j at a+12i+8
Satisfying Alignment within Structure
Achieving Alignment
Starting address of structure array must be multiple of worst-case alignment for any element
a must be multiple of 4
Offset of element within structure must be multiple of element’s alignment requirement
v’s offset of 4 is a multiple of 4
Overall size of structure must be multiple of worst-case alignment for any element
Structure padded with unused space to be 12 bytes
struct S6 {
  short i;
  float v;
  short j;
} a[10];
a[0] at a+0, • • •, a[i] at a+12i (multiple of 4), • • •
Within a[i]: a[i].i at a+12i, a[i].v at a+12i+4 (multiple of 4), then a[i].j
Union Allocation
Principles
Overlay union elements
Allocate according to largest element
Can only use one field at a time
union U1 {
  char c;
  int i[2];
  double v;
} *up;
c, i[0], and v all start at up+0; i[1] at up+4; union ends at up+8
struct S1 {
  char c;
  int i[2];
  double v;
} *sp;
(Windows alignment)
c at sp+0, i[0] at sp+4, i[1] at sp+8, v at sp+16; structure ends at sp+24
Using Union to Access Bit Patterns
typedef union {
float f;
unsigned u;
} bit_float_t;
u and f overlay the same 4 bytes (offsets 0 to 4)
float bit2float(unsigned u)
{
bit_float_t arg;
arg.u = u;
return arg.f;
}
unsigned float2bit(float f)
{
bit_float_t arg;
arg.f = f;
return arg.u;
}
Get direct access to bit representation of float
bit2float generates float with given bit pattern
NOT the same as (float) u
float2bit generates bit pattern from float
NOT the same as (unsigned) f
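The difference between reinterpreting bits and converting values is easiest to see on a concrete number. The sketch below repeats the union functions and assumes a 4-byte unsigned and IEEE 754 single-precision float, which holds on the machines discussed in this lecture; the size check via assert is an addition made here.

```c
#include <assert.h>

typedef union {
    float f;
    unsigned u;
} bit_float_t;

float bit2float(unsigned u)
{
    bit_float_t arg;
    arg.u = u;
    return arg.f;
}

unsigned float2bit(float f)
{
    bit_float_t arg;
    /* Assumption of this sketch: float and unsigned are the same size. */
    assert(sizeof(float) == sizeof(unsigned));
    arg.f = f;
    return arg.u;
}
```

Under those assumptions, float2bit(1.0f) yields the bit pattern 0x3f800000, whereas the cast (unsigned) 1.0f yields the value 1.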
Byte Ordering Revisited
Idea
Short/long/quad words stored in memory as 2/4/8
consecutive bytes
Which is most (least) significant?
Can cause problems when exchanging binary data between
machines
Big Endian
Most significant byte has lowest address
PowerPC, Sparc
Little Endian
Least significant byte has lowest address
Intel x86, Alpha
Byte Ordering Example
union {
  unsigned char c[8];
  unsigned short s[4];
  unsigned int i[2];
  unsigned long l[1];
} dw;
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
s[0]: c[0]-c[1], s[1]: c[2]-c[3], s[2]: c[4]-c[5], s[3]: c[6]-c[7]
i[0]: c[0]-c[3], i[1]: c[4]-c[7]
l[0]: starts at c[0] (length depends on size of long)
Byte Ordering Example (Cont).
int j;
for (j = 0; j < 8; j++)
  dw.c[j] = 0xf0 + j;
printf("Characters 0-7 == [0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n",
  dw.c[0], dw.c[1], dw.c[2], dw.c[3],
  dw.c[4], dw.c[5], dw.c[6], dw.c[7]);
printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n",
  dw.s[0], dw.s[1], dw.s[2], dw.s[3]);
printf("Ints 0-1 == [0x%x,0x%x]\n",
  dw.i[0], dw.i[1]);
printf("Long 0 == [0x%lx]\n",
  dw.l[0]);
Byte Ordering on x86
Little Endian
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
f0   f1   f2   f3   f4   f5   f6   f7
s[0] to s[3]: two bytes each, LSB at the lower address, MSB at the higher
i[0], i[1]: four bytes each, LSB at the lowest address, MSB at the highest
l[0]: four bytes (c[0] to c[3]), LSB at the lowest address, MSB at the highest
Output on Pentium:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts     0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints       0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long       0   == [0xf3f2f1f0]
Byte Ordering on Sun
Big Endian
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
f0   f1   f2   f3   f4   f5   f6   f7
s[0] to s[3]: two bytes each, MSB at the lower address, LSB at the higher
i[0], i[1]: four bytes each, MSB at the lowest address, LSB at the highest
l[0]: four bytes (c[0] to c[3]), MSB at the lowest address, LSB at the highest
Output on Sun:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts     0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints       0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long       0   == [0xf0f1f2f3]
Byte Ordering on Alpha
Little Endian
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
f0   f1   f2   f3   f4   f5   f6   f7
s[0] to s[3]: two bytes each, LSB at the lower address, MSB at the higher
i[0], i[1]: four bytes each, LSB at the lowest address, MSB at the highest
l[0]: eight bytes (c[0] to c[7]), LSB at the lowest address, MSB at the highest
Output on Alpha:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts     0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints       0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long       0   == [0xf7f6f5f4f3f2f1f0]
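The same union overlay idea gives a compact runtime test for byte order. This is a minimal sketch: the function names is_little_endian and is_little_endian_memcpy are assumptions introduced here, and the second function exists only as a cross-check.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one, by
   overlaying an unsigned int with its bytes (same idea as the dw union). */
int is_little_endian(void)
{
    union {
        unsigned int i;
        unsigned char c[sizeof(unsigned int)];
    } probe;
    probe.i = 1;
    return probe.c[0] == 1;
}

/* Hypothetical cross-check that reads the first byte with memcpy instead. */
int is_little_endian_memcpy(void)
{
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    assert(first == 0 || first == 1);
    return first == 1;
}
```

On x86 and Alpha the function would return 1; on the Sun machine from the earlier slide it would return 0.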
Summary
Arrays in C
Contiguous allocation of memory
Pointer to first element
No bounds checking
Compiler Optimizations
Compiler often turns array code into pointer code (zd2int)
Uses addressing modes to scale array indices
Lots of tricks to improve array indexing in loops
Structures
Allocate bytes in order declared
Pad in middle and at end to satisfy alignment
Unions
Overlay declarations
Way to circumvent type system