Transcript Slides

Lecture 18:
Deep C
Garbage
Collection
CS201j: Engineering Software
University of Virginia
Computer Science
David Evans
http://www.cs.virginia.edu/evans
Menu
• Pointers in C
– Pointer Arithmetic
• Type checking in C
• Why is garbage collection hard in C?
7 November 2002
CS 201J Fall 2002
2
What are those arrows really?
[Figure: Stack and Heap; the stack variable sb has an arrow to a heap object holding “hello”]
Pointers
• In Java, an object reference is really just
an address in memory
– But Java doesn’t let programmers manipulate
addresses directly
[Figure: memory diagram with Stack and Heap labelled by address. The stack slot sb holds the address 0x80496f8; the heap words starting at that address hold "hell" and "o\0\0\0".]
Pointers in C
• Addresses in memory
• Programs can manipulate addresses
directly
&expr   Evaluates to the address of the location expr evaluates to
*expr   Evaluates to the value stored in the address expr evaluates to
&*%&@#*!
int f (void) {
   int s = 1;
   int t = 1;
   int *ps = &s;
   int **pps = &ps;
   int *pt = &t;
                     /* s == 1, t == 1 */
   **pps = 2;        /* s == 2, t == 1 */
   pt = ps;
   *pt = 3;          /* s == 3, t == 1 */
   t = s;            /* s == 3, t == 3 */
}
Rvalues and Lvalues
What does = really mean?
int f (void) {
   int s = 1;
   int t = 1;
   t = s;
   t = 2;
}
left side of = is an “lvalue”
it evaluates to a location (address)!
right side of = is an “rvalue”
it evaluates to a value
There is an implicit * when a variable is
used as an rvalue!
BLISS Aside
• BLISS [Wulf71]
• Made getting values explicit
s = .t;
Puts the value in the location t in the location s
Parameter Passing in C
• Actual parameters are rvalues
void swap (int a, int b) {
   int tmp = b; b = a; a = tmp;
}
int main (void) {
   int i = 3;
   int j = 4;
   swap (i, j);   /* The value of i (3) is passed, not its location! */
   …
}
swap does nothing
Parameter Passing in C
• Can pass addresses around
void swap (int *a, int *b) {
   int tmp = *b; *b = *a; *a = tmp;
}
int main (void) {
   int i = 3;
   int j = 4;
   swap (&i, &j);   /* The value of &i is passed, which is the address of i */
   …
}
int *value (void)
{
   int i = 3;
   return &i;
}
void callme (void)
{
   int x = 35;
}
int main (void) {
   int *ip;
   ip = value ();
   printf ("*ip == %d\n", *ip);
   callme ();
   printf ("*ip == %d\n", *ip);
}
*ip == 3
*ip == 35
But it could really be anything!
Beware!
> splint value.c
Splint 3.0.1.7 --- 08 Aug 2002
value.c: (in function value)
value.c:4:10: Stack-allocated storage &i reachable from
      return value: &i
   A stack reference is pointed to by an external reference
   when the function returns. The stack-allocated storage is
   destroyed after the call, leaving a dangling reference.
   (Use -stackref to inhibit warning)
…
Manipulating Addresses
char s[6];
s[0] = 'h';
s[1] = 'e';
s[2] = 'l';
s[3] = 'l';
s[4] = 'o';
s[5] = '\0';
printf ("s: %s\n", s);
s: hello
expr1[expr2] in C is just syntactic sugar for
*(expr1 + expr2)
Obfuscating C
char s[6];
*s = 'h';
*(s + 1) = 'e';
2[s] = 'l';
3[s] = 'l';
*(s + 4) = 'o';
5[s] = '\0';
printf ("s: %s\n", s);
s: hello
Fun with Pointer Arithmetic
int match (char *s, char *t) {
   int count = 0;
   while (*s == *t) { count++; s++; t++; }
   return count;
}
int main (void)
{
   char s1[6] = "hello";   /* The \0 is invisible! */
   char s2[6] = "hohoh";
   printf ("match: %d\n", match (s1, s2));
   printf ("match: %d\n", match (s2, s2 + 2));
   printf ("match: %d\n", match (&s2[1], &s2[3]));
}
match: 1
match: 3
match: 2
&s2[1] means &(*(s2 + 1)), which is s2 + 1
Condensing match
int match (char *s, char *t) {
   int count = 0;
   while (*s == *t) { count++; s++; t++; }
   return count;
}
int match (char *s, char *t) {
   char *os = s;
   while (*s++ == *t++);
   return s - os - 1;
}
s++ evaluates to s_pre (the value of s before the increment), but changes the value of s
Hence, C++ has the same value as C, but has unpleasant side effects.
Type Checking in C
• Java: only allow programs the compiler
can prove are type safe
Exception: run-time type errors for downcasts and array element stores.
• C: trust the programmer. If she really
wants to compare apples and oranges, let
her.
Type Checking
int main (void)
{
   char *s = (char *) 3;
   printf ("s: %s", s);
}
Windows 2000
(earlier versions of Windows would just crash the whole machine)
In Praise of Type Checking
int match (int *s, int *t) {
   int *os = s;
   while (*s++ == *t++);
   return s - os;
}
int main (void)
{
   char s1[6] = "hello";
   char s2[6] = "hello";
   printf ("match: %d\n", match (s1, s2));
}
match: 2
Different Matching
int different (int *s, int *t) {
   int *os = s;
   while (*s++ != *t++);
   return s - os;
}
int main (void)
{
   char s1[6] = "hello";
   printf ("different: %d\n", different ((int *)s1, (int *)s1 + 1));
}
different: 29
So, why is it hard to garbage
collect C?
Mark and Sweep (Java version)
active = all objects on stack
while (!active.isEmpty ())
   newactive = { }
   foreach (Object a in active)
      mark a as reachable
      foreach (Object o that a points to)
         if o is not marked
            newactive = newactive U { o }
   active = newactive
sweep () // remove unmarked objects on heap
Mark and Sweep (C version?)
active = all pointers on stack
while (!active.isEmpty ())
   newactive = { }
   foreach (pointer a in active)
      mark *a as reachable
      foreach (address p that a points to)
         if *p is not marked
            newactive = newactive U { *p }
   active = newactive
sweep () // remove unmarked objects on heap
GC Challenges
char *f (void) {
   char *s = (char *) malloc (sizeof (char) * 100);
   s = s + 20;
   *s = 'a';
   return s - 20;
}
There may be objects that only have pointers to their middle!
GC Challenges
char *f (void) {
   char *s = (char *) malloc (sizeof (char) * 100);
   int x = (int) s;
   s = 0;
   return (char *) x;
}
There may be objects that are reachable through values
that have non-pointer apparent types!
GC Challenges
char *f (void) {
   char *s = (char *) malloc (sizeof (char) * 100);
   int x = (int) s;
   x = x - (int) &f;
   s = 0;
   return (char *) (x + (int) &f);
}
There may be objects that are reachable through values
that have non-pointer apparent types and have values that don’t
even look like addresses!
Why not just do reference
counting?
Where can you store the reference counts?
Remember, C programs can access memory
directly, so a collector had better not change
how objects are stored!
Summary
• Garbage collection depends on:
– Knowing which values are addresses
– Knowing that objects without references
cannot be reached
• Both of these are problems in C
• Nevertheless, there are some garbage
collectors for C.
– Change meaning of some programs
– Slow down programs a lot
– Are not able to find all garbage
Charge
• PS6 due Tuesday
• Exam 2 out Thursday
– Send review questions if
you want a review class
• Remaining classes:
– Java Byte Codes
– Security
– Concurrency without
synchronization
– Project Management
Garbage Collectors
(COAX, Seoul, 18 June 2002)