Transcript one

CS711
Capabilities Part I
Greg Morrisett
Cornell University
Motivation
Java, TAL, PCC, etc. all depend upon GC to
reclaim (heap) storage
– the issue is type invariance
• once allocated, the type cannot change
• hence, recycling memory must be done at the meta-level
– GC increases the TCB and decreases flexibility
• most collectors have complicated invariants and
interfaces to the generated code
– e.g., descriptor tables, safe points, object tags, etc.
– GC has overhead
• garbage is a semantic notion, so all general-purpose
collectors are conservative
• space overheads, throughput overheads, latency
overheads, etc.
7/12/2016
Lang. Based Security
2
Regions
Tofte & Talpin:
– memory organized into a stack of regions
• each region holds objects with similar lifetimes
• think of as "growable" stack frames
– region operations:
• letregion r in e end
– create region r, run e, then deallocate r
• (e1,e2) @ r
– allocate the pair into region r
– all region operations are very simple
• constant time (with small constants)
• very simple run-time (easier to verify correctness)
– functions can be parameterized by regions
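The region operations above can be imitated in a few lines of Python. This is only a sketch of the discipline, not Tofte and Talpin's run-time: the names `Region`, `letregion`, `region_stack`, and `make_pair` are invented for illustration, and a Python list stands in for region memory.

```python
from contextlib import contextmanager

class Region:
    """A hypothetical region: a growable bag of objects that share one lifetime."""
    def __init__(self, name):
        self.name = name
        self.objects = []
        self.live = True

    def alloc(self, value):
        # Models `value @ r`: allocation is a single append.
        if not self.live:
            raise RuntimeError(f"region {self.name} is already deallocated")
        self.objects.append(value)
        return value

region_stack = []  # the stack of live regions

@contextmanager
def letregion(name):
    # Models `letregion r in e end`: push r, run the body, then pop and free r.
    r = Region(name)
    region_stack.append(r)
    try:
        yield r
    finally:
        region_stack.pop()
        r.objects.clear()
        r.live = False

def make_pair(r, a, b):
    # A function parameterized by the region it allocates into.
    return r.alloc((a, b))

with letregion("r1") as r1:
    with letregion("r2") as r2:
        p = make_pair(r2, 3, 4)
        depth_inside = len(region_stack)
    inner_live_after = r2.live
outer_live_after = r1.live
```

Because deallocation is tied to leaving the `with` block, lifetimes here are strictly nested, which is exactly the stack discipline the slide describes.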
Runtime Organization
• Regions are linked lists of pages.
• Arbitrary inter-region references are allowed.
• Region handles can be freely passed around and used for allocation.
[Figure: runtime stack]
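A minimal sketch of the layout described above, assuming a fixed page size and a global free list; `PAGE_SIZE`, `Page`, `Region`, `free_pages`, and `free_region` are hypothetical names, and a real run-time would manage raw memory rather than Python lists.

```python
PAGE_SIZE = 4  # slots per page; an arbitrary value chosen for this sketch

class Page:
    def __init__(self, next_page=None):
        self.slots = []
        self.next = next_page

class Region:
    def __init__(self):
        self.head = Page()      # newest page; older pages hang off .next
        self.tail = self.head   # oldest page, kept so freeing is one splice

    def alloc(self, value):
        # Bump allocation: take a fresh page only when the current one is full.
        if len(self.head.slots) == PAGE_SIZE:
            self.head = Page(next_page=self.head)
        self.head.slots.append(value)
        return (self.head, len(self.head.slots) - 1)  # a "pointer" into the region

    def page_count(self):
        n, page = 0, self.head
        while page is not None:
            n, page = n + 1, page.next
        return n

free_pages = None  # global free list of pages

def free_region(region):
    # Splice the region's whole page list onto the free list without walking it.
    global free_pages
    region.tail.next = free_pages
    free_pages = region.head
    region.head = region.tail = None

ra, rb = Region(), Region()
ptrs = [ra.alloc(i) for i in range(10)]
cell = rb.alloc(ptrs[0])        # an object in rb that points into ra
pages_in_ra = ra.page_count()
free_region(ra)                 # cell now dangles, and nothing here prevents that
```

Nothing in this run-time stops `cell` from outliving `ra`; ruling that out is the job of the type system.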
Example
letregion r1 in
  let fun length[ra,rb](x: int*int@ra list @ rb)@r1 =
        if (x = []) then 0 else 1 + length[ra,rb](tl x)
      fun gen[ra,rb](x:int,y:int*int@ra list @ rb)@r1 =
        if (x = 0) then y else gen[ra,rb](x-1, cons[rb]((x,x)@ra, y))
  in
    letregion r2 in
      let x = gen[r2,r1](100, [])
      in
        length[r2,r1](x)
    end
end
Type System for T&T
• Allocated objects include the region in their
type (e.g., int*int @ r)
• Basic judgment is of the form:
G; S |- e : T
where G gives types to variables and S is the
set of regions that e can access.
– at first blush, this is just the set of regions that are
in scope
• The typing rule for letregion:
G; S + {r} |- e : T      r not in FV(T)
---------------------------------------
G; S |- letregion r in e end : T
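The judgment G; S |- e : T and the letregion rule can be sketched as a small checker. Everything about the encoding is an assumption made for illustration: expressions are tuples in an invented mini-syntax, the only types are int and int*int @ r, and region names are taken to be fresh.

```python
# Types:       ("int",)  or  ("pair", r)   meaning int*int @ r
# Expressions: ("int", n) | ("var", x) | ("pair", e1, e2, r) | ("fst", e)
#              ("let", x, e1, e2) | ("letregion", r, e)

class RegionError(Exception):
    pass

def free_regions(t):
    # FV(T) restricted to region variables.
    return {t[1]} if t[0] == "pair" else set()

def check(G, S, e):
    """Return T such that G; S |- e : T, or raise RegionError."""
    tag = e[0]
    if tag == "int":
        return ("int",)
    if tag == "var":
        return G[e[1]]
    if tag == "pair":
        _, e1, e2, r = e
        if r not in S:
            raise RegionError(f"allocation into inaccessible region {r}")
        if check(G, S, e1) != ("int",) or check(G, S, e2) != ("int",):
            raise RegionError("pair components must be ints")
        return ("pair", r)
    if tag == "fst":
        t = check(G, S, e[1])
        if t[0] != "pair":
            raise RegionError("fst expects a pair")
        if t[1] not in S:
            raise RegionError(f"read from inaccessible region {t[1]}")
        return ("int",)
    if tag == "let":
        _, x, e1, e2 = e
        return check({**G, x: check(G, S, e1)}, S, e2)
    if tag == "letregion":
        _, r, body = e
        t = check(G, S | {r}, body)          # premise: G; S + {r} |- e : T
        if r in free_regions(t):             # premise: r not in FV(T)
            raise RegionError(f"region {r} escapes in the result type")
        return t
    raise ValueError(f"unknown expression {tag}")

ok = ("letregion", "r",
      ("let", "x", ("pair", ("int", 3), ("int", 4), "r"), ("fst", ("var", "x"))))
bad = ("letregion", "r", ("pair", ("int", 3), ("int", 4), "r"))
ok_type = check({}, set(), ok)
```

Checking `bad` raises `RegionError`, because the pair's type mentions r after r has been deallocated.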
Problem
• Regions can escape their scope:
letregion r in
  let x : int*int @r = (3,4) @ r
      f : int -> int = fn y => y + (#1 x)
  in
    f : int -> int
  end
• The letregion rule only works when you can't
hide regions within the type of an object.
– closures hide types in their environments
– objects hide types in their instance variables
– ADTs (existentials) hide types period
Solution: leak types
• For closures, leak the set of regions
they will access when run.
letregion r in
let x : int*int @r = (3,4) @ r
f : int -> int = fn y => y + (#1 x)
in
f : int -> (int, {r})
end
• Now letregion is sound again.
• But consider...
Effect Polymorphism
letregion r in
let x : int*int @r = (3,4) @ r
f : int->(int,{r}) = fn y => y + (#1 x)
g : int->(int,{}) = fn y => y
h = if (random()) f else g
...
Leaking information makes things less re-usable.
T&T wanted to infer region annotations and region placement so
they needed some way to get joins.
Subtyping is one possibility (e.g., T1->(T2,S1) <= T1->(T2,S2)
when S1 is a subset of S2.)
The solution they used was to support polymorphism over sets of
regions.
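To make the join problem concrete, here is a sketch of the subtyping alternative mentioned above (not the effect polymorphism T&T actually adopted). The `Arrow` class, `subtype`, and `join` are hypothetical names; argument and result types are plain strings to keep the example small.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arrow:
    arg: str
    res: str
    effect: frozenset  # regions the function may access when called

def subtype(t1, t2):
    # T1 -> (T2, S1) <= T1 -> (T2, S2) when S1 is a subset of S2
    return t1.arg == t2.arg and t1.res == t2.res and t1.effect <= t2.effect

def join(t1, t2):
    # Least upper bound of two arrows that differ only in their effects.
    if (t1.arg, t1.res) != (t2.arg, t2.res):
        raise TypeError("no join: argument or result types differ")
    return Arrow(t1.arg, t1.res, t1.effect | t2.effect)

f = Arrow("int", "int", frozenset({"r"}))   # fn y => y + (#1 x)
g = Arrow("int", "int", frozenset())        # fn y => y
h = join(f, g)                              # type for: if (random()) f else g
```

Here `h` gets the effect {r}, so `g` is treated as if it touched r, which is the loss of precision that leaking effects costs.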
Effect Polymorphism in Action
• Consider a function like map:
– map (f, x) =
case x of
Nil => Nil
| hd::tl => (f hd)::(map (f, tl))
– what set of regions does it touch?
• depends upon what regions f touches
• All r1,r2,r3,a,b,S.(a -> (b,S) * a list@r1)@r3
-> (b list@r2, S+{r1,r2,r3})
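Instantiating the effect variable S in the type of map is just a set union. The helper `map_effect` below is a hypothetical name used only to show two instantiations.

```python
def map_effect(S, r1, r2, r3):
    # Latent effect of map once S, r1, r2, r3 are chosen: S + {r1, r2, r3}
    return frozenset(S) | {r1, r2, r3}

# f reads region "r"; input list in "ra", output list in "rb", argument pair in "rc"
with_effectful_f = map_effect({"r"}, "ra", "rb", "rc")

# f is pure and every region argument is the same region
with_pure_f = map_effect(set(), "ra", "ra", "ra")
```

The same definition of map serves both call sites, which is what a fixed, leaked effect set could not provide.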
Dangling Pointers Okay
T&T admits dangling pointers – you just can't
use them.
letregion r1 in
  let g = letregion r2 in
            let x : (int*int)@r2 list@r1 =
                  gen[r2,r1](100,[])
            in
              fn () => length x : unit->(int,{r1})
            end
  in
    g ()
  end
Big Problem
• No "tail-calls" for letregion.
fun f[r](x:data @ r) =
letregion r2 in
let y:data @ r2 = process[r,r2](x) in
f[r2](y)
end
end
• Ultimate bad situation: CPS
letregion r in
let x = (* small computation that uses r *) in
k (x) (* k is the rest of the program *)
end
end
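The cost of the missing tail call can be seen by counting live regions. The sketch below is a simulation under stated assumptions: regions are just names, `peak_with_letregion` mimics the recursive f above, and `peak_with_early_free` mimics a hypothetical variant that could free the caller's region before making the call.

```python
def peak_with_letregion(n):
    live, peak = [], 0

    def f(k):
        nonlocal peak
        if k == 0:
            return
        live.append(f"r{k}")            # letregion r2 in ...
        peak = max(peak, len(live))
        try:
            f(k - 1)                    # the "tail" call sits inside letregion
        finally:
            live.pop()                  # end: runs only after the callee returns

    f(n)
    return peak, len(live)

def peak_with_early_free(n):
    live, peak, current = {"r0"}, 1, "r0"
    for k in range(1, n + 1):
        fresh = f"r{k}"
        live.add(fresh)                 # create the region for the new data
        peak = max(peak, len(live))
        live.remove(current)            # free the old region before the call
        current = fresh
    return peak, len(live)

nested = peak_with_letregion(50)
early = peak_with_early_free(50)
```

With letregion the number of live regions grows with the recursion depth; with an explicit free it stays at two, which motivates the capabilities introduced next.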
Capabilities
• What we really want is to be able to free
a region at any point.
• That is, whether or not a region is live is
separate from whether or not it's in
scope.
• At different program points, the same
region might be live and dead.
Naive Solution
• G; S1 |- e : T; S2
  G; S1 |- c : S2
– as before, S1 is the set of live regions
– S2 is the set of regions live after e
• G; S |- let r = newregion : S + {r}
• G; S + {r} |- freeregion r : S
• This works great – locally.
• An alternative interpretation is that we're changing the type of r from "allocated" to "deallocated".
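The naive solution amounts to threading the set of live regions through a command sequence. A minimal sketch, assuming commands are `(op, region)` pairs; the `use` operation and the names `check_commands` and `CapabilityError` are invented for illustration.

```python
class CapabilityError(Exception):
    pass

def check_commands(S, commands):
    """Thread the live-region set through the commands; return the final set."""
    S = set(S)
    for op, r in commands:
        if op == "newregion":
            if r in S:
                raise CapabilityError(f"region {r} already live")
            S.add(r)                    # G; S |- let r = newregion : S + {r}
        elif op == "freeregion":
            if r not in S:
                raise CapabilityError(f"freeing dead region {r}")
            S.remove(r)                 # G; S + {r} |- freeregion r : S
        elif op == "use":
            if r not in S:
                raise CapabilityError(f"use of dead region {r}")
        else:
            raise ValueError(f"unknown command {op}")
    return S

good = [("newregion", "r"), ("use", "r"), ("freeregion", "r")]
bad = [("newregion", "r"), ("freeregion", "r"), ("use", "r")]
final_set = check_commands(set(), good)
```

Running the checker on `bad` raises `CapabilityError` at the last command. The check is purely local: it assumes distinct names denote distinct regions.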
Region Polymorphism
• As before, problems arise if we can
create aliases for something whose type
wants to change.
fun f[r1,r2](x:data @ r2) = (freeregion r1; process(x))
r = newregion; d = new_data[r](); f[r,r](d)
• Region polymorphism gives us a way to
create aliasing.
Solution Part 2
• If you're going to free something, then
require that it cannot alias anything else
(i.e., it's affine).
• If, in addition, you want to guarantee
that the region will eventually be freed,
then make it linear.
fun f[r1,r2](x:data @ r2) = (freeregion r1; process(x))
r = newregion; d = new_data[r](); f[r,r](d)
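The aliasing problem and the affine fix can be sketched together. The encoding is an assumption: the body of f is a list of `(op, region)` commands, `check` is a live-set checker, and `call_affine` is a hypothetical call rule that rejects repeated region arguments.

```python
class CapabilityError(Exception):
    pass

F_BODY = [("freeregion", "r1"), ("use", "r2")]   # f[r1,r2](x) from the slide
FORMALS = ["r1", "r2"]

def check(body, live):
    """Reject any command that touches a region outside the live set."""
    live = set(live)
    for op, r in body:
        if r not in live:
            raise CapabilityError(f"{op} on dead region {r}")
        if op == "freeregion":
            live.remove(r)
    return live

def instantiate(body, subst):
    return [(op, subst[r]) for op, r in body]

def call_affine(body, formals, actuals):
    # Affine discipline: a region that may be freed cannot alias another argument.
    if len(set(actuals)) != len(actuals):
        raise CapabilityError("aliased region arguments")
    return instantiate(body, dict(zip(formals, actuals)))

# 1. Checked in isolation, with the formals assumed distinct: accepted.
after_body = check(F_BODY, {"r1", "r2"})

# 2. f[r,r](d): after substitution the same body uses a freed region.
aliased = instantiate(F_BODY, {"r1": "r", "r2": "r"})
try:
    check(aliased, {"r"})
    aliased_ok = True
except CapabilityError:
    aliased_ok = False

# 3. The affine call rule refuses the aliased instantiation up front.
try:
    call_affine(F_BODY, FORMALS, ["r", "r"])
    affine_accepts = True
except CapabilityError:
    affine_accepts = False
```

Step 1 succeeding while step 2 fails is the unsoundness; step 3 shows the restriction that restores soundness at the price of forbidding all aliasing.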
Problem with Linearity
• Very little code becomes re-usable
when we're not freeing something.
– i.e., can't code up TT's letregion
• We'd like some way to temporarily allow
a region to have aliases, but by the time
we free the region, ensure there are no
other aliases.
– Note: letregion does this for us since you
can't return a region from a function.
– But in CPS, there is no return, only call.
Forgetting and Remembering
• Capability calculus lets us "forget" that a region is
unique through subtyping:
{r^1} <= {r^+}
(r^1 marks r as unique; r^+ marks it as possibly aliased)
• Bounded subtyping lets us recover uniqueness:
f : All S<={r^+}.(data, k: (S,result)->Ans)
• Caller may "know" r^1 (i.e., r can be freed)
– passes in k that knows this too
– callee cannot free r because it doesn't have the capability to
do so (i.e., r^1)
– but it can't call the continuation k until it restores the state to
what it was upon entry (i.e., no hidden copies of r squirreled
away somewhere.)
– corresponds to attenuation & amplification of access control
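A dynamic imitation of forgetting and remembering, offered as a rough sketch only: the real mechanism is static, and all names here (`UNIQUE`, `SHARED`, `subcap`, `can_free`, `CapVar`) are hypothetical. Capabilities are dictionaries from region names to a multiplicity, and `subcap` simplifies the subtyping relation to regions with identical names.

```python
UNIQUE, SHARED = "1", "+"   # stand-ins for the unique and possibly-aliased marks

def subcap(c1, c2):
    """c1 <= c2: same regions, and a unique entry may be weakened to shared."""
    if c1.keys() != c2.keys():
        return False
    return all(c1[r] == c2[r] or (c1[r] == UNIQUE and c2[r] == SHARED) for r in c1)

def can_free(cap, r):
    return cap.get(r) == UNIQUE

class CapVar:
    """A capability variable S, visible to the callee only through its bound."""
    def __init__(self, actual, bound):
        if not subcap(actual, bound):
            raise ValueError("instantiation violates the bound")
        self._actual = actual   # known to whoever instantiated S
        self.bound = bound      # all the callee may assume

def f(S, data, k):
    # Callee: it knows only S <= {r shared}, so it may use r but not free it.
    callee_may_free = can_free(S.bound, "r")
    result = data + 1
    return k(S, (result, callee_may_free))   # must hand S back to k unchanged

def caller():
    cap = {"r": UNIQUE}
    S = CapVar(cap, {"r": SHARED})           # forget uniqueness for the call
    def k(S_back, result):
        # The caller chose S, so receiving S back restores its unique view of r.
        caller_may_free = S_back is S and can_free(cap, "r")
        return caller_may_free, result
    return f(S, 41, k)

outcome = caller()
```

The callee sees an attenuated capability, and the continuation amplifies it again because only the caller knows what S stands for.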
Cool Results
• Once you have this general capability
mechanism, you can code up:
– Tofte & Talpin's letregion at CPS level
– Copying GC [Wang & Appel]
– Point: Capabilities stronger than either
• If you make the initial continuation be
the empty set of regions, then the
program must free all regions before
terminating.
More Generally
• We can change the types of things
– Must abstract the store and aliasing relationships
in some way.
– Linearity is not good enough – too little
polymorphism.
– Note that tracking relationships of regions is much
easier than individual objects
– You still don't see a list of objects where the heads
are allocated in individual regions.
• Bounded subtyping gives us attenuation and
amplification.
– in this case, a tradeoff – you can either duplicate
or free but not both.