The Practice of Type Theory in Programming Languages Robert Harper Carnegie Mellon University August, 2000

The Practice of Type Theory
in Programming Languages
Robert Harper
Carnegie Mellon University
August, 2000
Acknowledgements
• Thanks to Reinhard Wilhelm for inviting
me to speak!
• Thanks to my colleagues and to my former
and current students at Carnegie Mellon.
An Old Story
• Once upon a time (es war einmal), there were
those who thought that typed high-level
programming languages would save the
world.
– Ensure safety of executed code.
– Support reasoning and verification.
– Run efficiently (enough) on stock hardware.
• “If we all programmed in Pascal (or Algol or
Simula or …), all of our problems would be
solved.”
What Happened Instead
• Things didn’t work out quite as
expected or predicted.
– COTS software is mostly written in low-level, unsafe languages (eg, C, C++)
– Some ideas have been adopted (eg,
objects and classes), most haven’t.
– Developers have learned to work with less-than-perfect languages, achieving
astonishing results.
Languages Ride Again
• But the world has changed: strong safety
assurances are more important than ever.
– Mobile code on the internet.
– Increasing reliance on software in “real life”.
• Schneider made a strong case for language-based security mechanisms.
– “Languages aren’t just languages any more.”
– Rich body of work on logics, semantics, type
systems, verification, compilation.
Language-Based Security
• Key idea: program analysis is more
powerful than execution monitoring.
• This talk is about one approach to
taking this view seriously, typed
certifying compilation.
Type Theory and Languages
• Type theory has emerged as the central
organizing principle for language …
– Design: genericity, abstraction, and
modularity mechanisms.
– Implementation: type inference, flow
analysis.
– Semantics: domain theory, logical relations.
What is a Type System?
• A type system is a syntactic discipline
for enforcing levels of abstraction.
– Ensures that bad things do not happen.
• A type system rules out programs.
– Adding a function to a string
– Interpreting an integer as a pointer
– Violating interfaces
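A minimal sketch of what "ruling out programs" can look like in practice, assuming a hypothetical toy language with integer literals, string literals, function constants, and a "+" that is defined only on integers. The names IntLit, StrLit, FunLit, Add, IllTyped, and type_of are illustrative and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class StrLit:
    value: str


@dataclass(frozen=True)
class FunLit:
    """A function constant of type int -> int (body omitted in this sketch)."""
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


class IllTyped(Exception):
    """Raised when an expression is ruled out by the toy type system."""


def type_of(e):
    """Return the type of e as a string, or raise IllTyped."""
    if isinstance(e, IntLit):
        return "int"
    if isinstance(e, StrLit):
        return "string"
    if isinstance(e, FunLit):
        return "int -> int"
    if isinstance(e, Add):
        lt, rt = type_of(e.left), type_of(e.right)
        # "+" is only defined on integers; everything else is rejected
        # before the program ever runs.
        if lt == "int" and rt == "int":
            return "int"
        raise IllTyped(f"cannot add {lt} and {rt}")
    raise IllTyped(f"unknown expression: {e!r}")


def is_well_typed(e):
    """True if the checker accepts e, False if it rules e out."""
    try:
        type_of(e)
        return True
    except IllTyped:
        return False
```

Here Add(IntLit(1), IntLit(2)) is accepted, while Add(FunLit("succ"), StrLit("hi")), which adds a function to a string, is rejected purely by inspecting its syntax.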
What is a Type System?
• How can this be a good thing?
– Expressiveness arises from strictures:
restrictions entail stronger invariants
– Flexibility arises from controlled relaxation
of strictures, not from their absence.
• A type system is fundamentally a
verification tool that suffices to ensure
invariants on execution behavior.
Types Induce Invariants
• Types induce invariants on programs.
– If e : int, then its value must be an integer.
– If e : int → int, then it must be a function
taking and yielding integers.
– If e : filedesc, then it must have been
obtained by a call to open.
– If e : int{H}, then no “low clearance”
expression can read its value.
Types Induce Invariants
• These invariants provide
– Safety properties: well-typed programs do
not “go wrong”.
– Equational properties: when are two
expressions interchangeable in all
contexts.
– Representation independence
(parametricity).
Types as Safety Certificates
• Typing is a sufficient condition for these
invariants to hold.
– Well-typed implies well-behaved.
– Not (necessarily) checkable at run-time!
• Types form a certificate of safety.
– Type checking = safety checking.
– A practical sufficient condition for safety.
The HLL Assumption
• This is well and good, but …
– Programs are compiled to unsafe, low-level
machine code.
– We want to know that the object code is safe.
• HLL assumption: trust the correctness of the
compiler and run-time system.
– A huge assumption.
– Spurred much research in compiler correctness.
Certifying Compilers
• Idea: propagate types from the source
to the object code.
– Can be checked by a code recipient.
– Avoids reliance on compiler correctness.
• Based on a new approach to
compilation.
– Typed intermediate languages.
– Type-directed translation.
Typed Intermediate Languages
• Generalize syntax-directed translation
to type-directed translation.
– intermediate languages come equipped
with a type system.
– compiler transformations translate both a
program and its type.
– translation preserves typing: if e:T then
e*:T* after translation
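The preservation property (if e:T then e*:T*) can be illustrated with a small sketch. It assumes a hypothetical source language with int and bool and a hypothetical target language with int only, where booleans are compiled to 0 and 1. The term encoding (tagged tuples) and the function names are assumptions made for this example, not the design of any compiler mentioned in the talk.

```python
def src_type(e):
    """Type checker for the source language (types: 'int', 'bool')."""
    tag = e[0]
    if tag == "int":
        return "int"
    if tag == "bool":
        return "bool"
    if tag in ("add", "less"):
        if src_type(e[1]) == "int" and src_type(e[2]) == "int":
            return "int" if tag == "add" else "bool"
    if tag == "if":
        if src_type(e[1]) == "bool":
            t = src_type(e[2])
            if src_type(e[3]) == t:
                return t
    raise ValueError(f"ill-typed source term: {e!r}")


def tgt_type(e):
    """Type checker for the target language (only type: 'int')."""
    tag = e[0]
    if tag == "int":
        return "int"
    if tag in ("add", "less"):
        if tgt_type(e[1]) == "int" and tgt_type(e[2]) == "int":
            return "int"
    if tag == "ifnz":
        if all(tgt_type(x) == "int" for x in e[1:4]):
            return "int"
    raise ValueError(f"ill-typed target term: {e!r}")


def translate_type(t):
    """T*: both source types are represented as target integers."""
    if t in ("int", "bool"):
        return "int"
    raise ValueError(f"unknown source type: {t!r}")


def translate(e):
    """e*: compile booleans to 0/1 and 'if' to 'ifnz' (if-non-zero)."""
    tag = e[0]
    if tag == "int":
        return e
    if tag == "bool":
        return ("int", 1 if e[1] else 0)
    if tag in ("add", "less"):
        return (tag, translate(e[1]), translate(e[2]))
    if tag == "if":
        return ("ifnz", translate(e[1]), translate(e[2]), translate(e[3]))
    raise ValueError(f"unknown source term: {e!r}")


def preserves_typing(e):
    """Check e : T implies e* : T* for one particular source term e."""
    return tgt_type(translate(e)) == translate_type(src_type(e))
```

With only one target type the check is deliberately simple; the point is that the translated term is re-checked by the target's own type checker, so a recipient need not trust the translation itself.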
Typed Intermediate Languages
• Classical syntax-directed translation:
Source = L1 → L2 → … → Ln = Target
         :
         T1
• Type system applies to the source
language only.
– Type check, then throw away types.
Typed Intermediate Languages
• Type-directed translation:
Source = L1 → L2 → … → Ln = Target
         :    :        :
         T1 → T2 → … → Tn
• Maintain types during compilation.
– Translate a program and its type.
– Types guide translation process.
Typed Object Code
• Typed Assembly Language (TAL)
– type information ensures safety
– generated by compiler
– very close to standard x86 assembly
• Type information captures
– types of registers and stack
– type assumptions at branch targets (including join
points)
• Relies heavily on polymorphism!
– eg, callee-saves registers, enforcing abstraction
Typed Assembly Language
fact:
  ALL rho.{r1:int, sp:{r1:int, sp:rho}::rho}
  jgz r1, positive
  mov r1,1
  ret
positive:
  push r1          ; sp : int::{r1:int,sp:rho}::rho
  sub r1,r1,1
  call fact[int::{r1:int,sp:rho}::rho]
  pop r2           ; sp : {r1:int,sp:rho}::rho
  imul r1,r1,r2
  ret
Tracking Stronger Properties
• Familiar type systems go a long way.
– Ensures minimal sanity of code.
– Ensures compliance with interfaces.
– Especially if you have polymorphism.
• Refinement types take a step further.
– Track value range invariants.
– Array bounds checks, null pointer checks,
red-black invariants, etc.
Refinement Types
• First idea: subset types.
e : { x : T | P(x) } iff e:T and |= P(e)
• Examples:
– Pascal-like sub-ranges
0..n = { n : int | 0 ≤ n < length(A) }
– Non-null objects
– Red-black condition on RBT’s
Refinement Types
• Checking value range properties is
undecidable!
– eg, cannot decide if 0 ≤ e < 10 for general
expressions e
• Checker must include a theorem prover
to validate object code.
– either complex and error prone, or
– too weak to be useful
Refinement Types
• Second idea: proof carrying code.
(e, ) : { x:T | P(x) } iff e:T and  |- P(e)
• Provide a proof of the range property.
– How to obtain it?
– How to represent it?
• Verifier checks the types and the proof.
– using a proof checker, not a proof finder
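The difference between checking and finding a proof can be sketched as follows. The example assumes a hypothetical, very small logic of facts "a ≤ b" over natural numbers with three rules (refl, succ, trans); the rule names and the tuple encoding of proof terms are inventions for this illustration and are not the proof format of any real proof-carrying code system. It also assumes proof terms are well-formed tuples.

```python
class ProofError(Exception):
    """Raised when a proof term does not follow the rules."""


def conclusion(proof):
    """Compute the fact (a, b), read as a <= b, that a proof term establishes.

    Rules of the toy logic:
      ("refl", n)       proves n <= n
      ("succ", p)       proves a <= b + 1, if p proves a <= b
      ("trans", p, q)   proves a <= c, if p proves a <= b and q proves b <= c
    """
    rule = proof[0]
    if rule == "refl":
        n = proof[1]
        if isinstance(n, int) and n >= 0:
            return (n, n)
        raise ProofError("refl expects a natural number")
    if rule == "succ":
        a, b = conclusion(proof[1])
        return (a, b + 1)
    if rule == "trans":
        a, b1 = conclusion(proof[1])
        b2, c = conclusion(proof[2])
        if b1 == b2:
            return (a, c)
        raise ProofError("trans: middle terms do not match")
    raise ProofError(f"unknown rule: {rule!r}")


def check(proof, claim):
    """Accept only if the supplied proof establishes exactly the claim."""
    try:
        return conclusion(proof) == claim
    except ProofError:
        return False
```

The checker never searches: it walks the supplied proof term once and compares the result with the claim. For instance, ("succ", ("succ", ("refl", 2))) is accepted as a proof of 2 ≤ 4, and ("refl", 3) is rejected as a proof of 3 ≤ 4.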
Finding Proofs
• To use A[n] safely, we must prove that
0 ≤ n < size(A).
• If we insert a run-time check, it’s easy!
– if 0 ≤ n < size(A) then *(A+4n) else fail
• In general we must find proofs.
– Instrumented analysis methods.
– Programmer declarations.
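A rough sketch of the run-time-check case: inside the successful branch of the test, the fact 0 ≤ n < size(A) is known, so the access needs no further justification. The InBounds record below stands in for that evidence; it, check_index, and safe_get are hypothetical names for this example, and Python itself does not enforce that evidence is only created by the check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InBounds:
    """Evidence that 0 <= index < size (by convention, made only by check_index)."""
    index: int
    size: int


def check_index(n: int, size: int) -> Optional[InBounds]:
    """Run-time test: return evidence on success, None on failure."""
    if 0 <= n < size:
        return InBounds(n, size)
    return None


def safe_get(a, n):
    """Read a[n] only in the branch where the bounds check succeeded."""
    evidence = check_index(n, len(a))
    if evidence is None:
        raise IndexError("fail: index out of bounds")
    # In this branch the range property holds, so the access is justified.
    return a[evidence.index]
```

The explicit lower bound matters even in Python: a bare a[-1] would silently read the last element, whereas check_index(-1, 3) returns None.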
Representing Proofs
• How do we represent the proofs?
– Need a formal logic for reasoning about
value range properties (for example).
– Need a proof checker for each such
formalism.
• But which logic should we use?
– How do we accommodate change?
– Which properties are of interest?
Logical Frameworks
• The LF logical framework is a universal
language for defining logical systems.
– Captures uniformities of a large class of logical
systems.
– Provides a formal definition language for logical
systems.
• Proof checking is reduced to a very simple
form of type checking.
– One type checker yields many proof checkers!
General Certified Code
• The logic is part of the safety certificate!
– Logic of type safety.
– Logic of value ranges.
– Logic of space requirements.
• Proofs are LF terms for that logic.
– Checker is parameterized on specification of the
logic (an LF “signature”).
– LF type checker checks proofs in any logic
(provided it is formalized in LF).
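The idea of one checker parameterized on the specification of a logic can be sketched as below. This is only a loose analogy and not LF: a real LF signature uses dependent types, while here a "signature" is assumed to be a plain table mapping rule names to functions that compute a rule's conclusion from its arguments and the conclusions of its premises. Both example logics and all names are hypothetical.

```python
class ProofError(Exception):
    """Raised when a derivation does not conform to the signature."""


def infer(signature, derivation):
    """Generic checker: compute the conclusion of a derivation.

    A derivation is (rule_name, args, [sub_derivations]).
    """
    rule, args, premises = derivation
    if rule not in signature:
        raise ProofError(f"rule {rule!r} is not in this logic")
    premise_facts = [infer(signature, p) for p in premises]
    fact = signature[rule](args, premise_facts)
    if fact is None:
        raise ProofError(f"rule {rule!r} does not apply")
    return fact


def check(signature, derivation, claim):
    """Accept only if the derivation proves the claim in the given logic."""
    try:
        return infer(signature, derivation) == claim
    except ProofError:
        return False


# Logic 1 (hypothetical): value ranges, facts ("le", a, b) meaning a <= b.
def le_refl(args, premises):
    if len(args) == 1 and not premises:
        return ("le", args[0], args[0])
    return None


def le_succ(args, premises):
    if not args and len(premises) == 1 and premises[0][0] == "le":
        _, a, b = premises[0]
        return ("le", a, b + 1)
    return None


# Logic 2 (hypothetical): evenness, facts ("even", n).
def even_zero(args, premises):
    if not args and not premises:
        return ("even", 0)
    return None


def even_plus2(args, premises):
    if not args and len(premises) == 1 and premises[0][0] == "even":
        return ("even", premises[0][1] + 2)
    return None


le_signature = {"refl": le_refl, "succ": le_succ}
even_signature = {"zero": even_zero, "plus2": even_plus2}
```

The same infer and check functions serve both logics; only the signature changes. A derivation that uses a rule outside the chosen signature is rejected.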
Some Challenges
• Can certified compilation really be made
practical?
– TALC [Morrisett] for “safe C”.
– TILT [CMU] for Standard ML [in progress].
– SML/NJ [Yale] for Standard ML [in
progress].
– Touchstone [Necula, Lee] for “safe C”.
Some Challenges
• Can refinements be made useful and
practical?
– Dependent ML [Pfenning, Xi]
– Dependently-Typed Assembly [Harper, Xi]
• Experience with ESC is highly relevant.
– A difference is that refinements are built in
to the language.
Some Predictions
• Certifying compilation will be standard
technology.
– Code will come equipped with checkable safety
certificates.
• Type systems will become the framework for
building practical development tools.
– Part of the program text.
– Mechanically checkable.
Further Information
http://www.typetheory.com