Programming Languages Research and You: What Miracles Are We Cooking Up These Days? #1

Programming Languages
Research and You:
What Miracles Are We Cooking Up These Days?
#1
Dog and Pony Show
High-Level Summary
• I am Wes Weimer
• I do PL research (plays well with others)
• I may not be as cool as Jack
• But I do have money
• And I’m looking for students
• Break-out session: see ~weimer webpage
– Likely: September 11, 3:30pm
#2
Talk Outline
• What is PL research in general?
• What have we done in the past?
• Possible cool future research … projects.
• Hint: write down a key phrase, email or talk to me later …
Professor?
a grad student …
#3
Don’t We Already Have Compilers?
#4
Dismal View Of PL Research
C++
Java
(or C#)
#5
PL Research: Qu’est-ce que c’est?
• Study programs and languages
• 2002 US Annual Cost of Software Errors: $60B
– 0.6% of the GDP (NIST)
– Cost of 1 bug: $2k-$10k
• Programs as artifacts
– What should they be doing?
– Are they doing it?
– Are they making mistakes instead?
– How might we fix them?
• Language Design
– Make some things easier (e.g., compare Ruby / Python to C++)
#6
Program Analyses
• We write programs that analyze (or
transform) other programs
– cf. testing, >50% of a project’s budget
– Alias analyses, shape analyses, verifiers, …
• Doomed in theory but successful in practice
Simplest examples: dataflow analyses and type systems
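As one concrete instance of a dataflow analysis, here is a minimal sketch of live-variable analysis in Python. The block names (B1 to B4), their use/def sets, and the function name `live_variables` are all made up for illustration; a real analysis would derive them from a parsed program.

```python
def live_variables(blocks, successors):
    """Iterate the liveness equations to a fixed point.

    blocks:     {name: (use_set, def_set)} for each basic block
    successors: {name: [successor block names]}
    Returns (live_in, live_out), each a {name: set of variables}.
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, (use, define) in blocks.items():
            # live_out[b] = union of live_in over all successors of b
            out = set()
            for s in successors.get(b, []):
                out |= live_in[s]
            # live_in[b] = use[b] | (live_out[b] - def[b])
            new_in = use | (out - define)
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True
    return live_in, live_out


# Hypothetical program:
#   B1: x = 1; y = 2
#   B2: if x > 0 goto B3 else goto B4
#   B3: z = y
#   B4: return x
BLOCKS = {
    "B1": (set(), {"x", "y"}),
    "B2": ({"x"}, set()),
    "B3": ({"y"}, {"z"}),
    "B4": ({"x"}, set()),
}
SUCCESSORS = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B4"], "B4": []}

live_in, live_out = live_variables(BLOCKS, SUCCESSORS)
```

In this example `z` is never live after B3, so the assignment `z = y` is a dead store: the kind of fact a compiler or bug-finder can report without ever running the program.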
#7
Domain-Specific
Bug-Finding
• Embedded components (e.g., cellphones) are programmed with special languages
• Most large projects include their own custom languages (e.g., simulations, macros, mIRC scripts, game engines)
• These are harder to debug and have special semantics (= meanings)
• Example: UnrealScript is C-ish but has type qualifiers like transient and travel
Work with Jason Lawrence to apply PL techniques to vertex shader code (e.g., caching, generating, optimization, …)
• Example: “Players of [The Sims 2] are complaining that their artfully-crafted homes and mansions are beginning to resemble the Twilight Zone, thanks to an artifact of the game's design that causes hacks to spread like viruses from user to unwitting user.” SecurityFocus 2004-2005
• My research: found >800 real bugs in 4M LOC, proposed a new language feature to fix all of those errors, evaluated it with case studies
#8
Big Example #1: CCured
• Make systems programs as safe as Java but as fast as C
– Safe = memory safety and type safety
• Take an important C program (e.g., apache, bind, openssl)
• Run a program analysis to classify all of the pointers in that program:
– Safe Pointer = no arithmetic, no casts
– Sequence Pointer = pointer arithmetic (i++), no casts
– Wild Pointer = anything goes
• Take that classification and transform the program:
– Safe Pointer = add a null check
– Sequence Pointer = add bounds (and null) checks
– Wild Pointer = add full dynamic type checking
• Resulting program is provably safe
• But is < 30% slower than the original (cf. Purify: 50x slower)
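The classify-then-transform idea can be sketched as a toy in Python. This is not CCured's actual inference (which reasons about how pointers flow into one another across the whole program); it only looks at the operations observed on each pointer in isolation. The names `classify`, `CHECKS`, and the example pointers are hypothetical.

```python
SAFE, SEQ, WILD = "safe", "sequence", "wild"

# Run-time checks inserted for each pointer kind, as listed on the slide.
CHECKS = {
    SAFE: ["null check"],
    SEQ: ["null check", "bounds check"],
    WILD: ["full dynamic type check"],
}


def classify(operations):
    """Classify one pointer from the set of operations applied to it.

    operations: subset of {"arith", "cast"} (a simplifying assumption:
    a real analysis also propagates kinds through assignments).
    """
    if "cast" in operations:
        return WILD      # anything goes
    if "arith" in operations:
        return SEQ       # pointer arithmetic, no casts
    return SAFE          # no arithmetic, no casts


def instrument(pointers):
    """Map each pointer name to (kind, checks to insert)."""
    result = {}
    for name, operations in pointers.items():
        kind = classify(operations)
        result[name] = (kind, CHECKS[kind])
    return result


# Hypothetical pointers found in some C program.
plan = instrument({
    "config": set(),             # only dereferenced
    "buf": {"arith"},            # walked with p++
    "msg": {"arith", "cast"},    # cast between unrelated types
})
```

The payoff is that most pointers in typical programs land in the cheap categories, so the expensive checks are only paid for where the code really does "anything goes".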
#9
Big Example #2:
Specification Mining
[Figure: specification FSM with states 1 to 5 and edges labeled SF.openSession, S.beginTransaction, T.commit, T.rollback, S.close, and error]
• In order to find bugs automatically, we must know what the
program should be doing
• Formal partial-correctness specification
• Problem: hard to get them in practice
• Our approach: learn them (machine learning)
• Analogy: learning English grammar from high school student essays …
• Take advantage of program structure and error information
(e.g., program is more likely to make a mistake near
unexpected errors)
• On 1MLOC, our specification miner
– learns 5.3x as many real specs as the next best miner
– has 25x fewer false positives
– those mined specs found 430 real bugs (vs 50-auto or 172-by-hand)
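To make "specification" concrete, here is a minimal Python sketch of checking one program trace against a finite-state specification. The event names and the state numbers 1 to 5 come from the slide's figure, but the exact edges in `SPEC` are an assumption made for illustration, as are the names `violation`, `START`, and `ACCEPT`. This shows only the checking side, not the mining.

```python
# Hypothetical transition table: the events and states are from the figure,
# but which event leads from which state to which is assumed here.
SPEC = {
    (1, "SF.openSession"): 2,
    (2, "S.beginTransaction"): 3,
    (3, "T.commit"): 4,
    (3, "T.rollback"): 4,
    (4, "S.close"): 5,
}
START, ACCEPT = 1, {5}


def violation(trace):
    """Return a description of the first violation in trace, or None."""
    state = START
    for position, event in enumerate(trace):
        next_state = SPEC.get((state, event))
        if next_state is None:
            # No edge for this event: the FSM goes to its error state.
            return f"unexpected {event} at position {position} (state {state})"
        state = next_state
    if state not in ACCEPT:
        return f"trace ended in state {state} before reaching an accepting state"
    return None


good = ["SF.openSession", "S.beginTransaction", "T.commit", "S.close"]
leaky = ["SF.openSession", "S.beginTransaction", "T.rollback"]  # never closed
```

A miner's job is to produce tables like `SPEC` automatically from many traces; once it exists, bug-finding reduces to checks like this one.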
#10
Big Example #3: Error Reporting
• PL + Theory + Software Engineering
• Even when we find bugs automatically, they
often are not fixed!
• Bug reports are too confusing
• Our approach: include a candidate patch
that provably fixes that bug and introduces
no new known bugs
• Helps maintainer to understand code and
bug report – bug is more likely to be fixed
#11
How would that work?
• Somewhat like a spell-checker
• Take the violated specification and construct a new
FSM that will find the closest “not mistake” to the
mistake we’re fixing
• This gives a plan for fixing the bug: use PL to match
that plan back to the source code
• Does it work? Trial with 76 bug reports. 66% of
those with patches were addressed, vs. only 21% of
those without (statistically significant)
Given bug “xz” we produce “x(ins y)z”
[Figure: specification FSM A -x-> B -y-> C -z-> D, and the repair FSM built from it, with states (A,0) through (D,2), the original x, y, z edges, and added edges ε (ins x), ε (ins y), ε (ins z), x (del x), y (del y), z (del z)]
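The spell-checker idea can be sketched as a shortest-path search in Python. This is a minimal sketch under stated assumptions, not the tool from the slide: it charges cost 1 for each inserted or deleted event and searches over (FSM state, trace position) pairs rather than building the repair FSM explicitly. The function name `closest_fix` and the dictionary encoding of the FSM are hypothetical.

```python
import heapq
from itertools import count


def closest_fix(transitions, start, accepting, trace):
    """Find a cheapest edit plan that makes trace acceptable to the FSM.

    transitions: {(state, event): next_state}
    Returns (cost, plan) or None if no accepting state is reachable.
    """
    tie = count()  # tie-breaker so the heap never compares plans
    heap = [(0, next(tie), start, 0, [])]
    done = set()
    while heap:
        cost, _, state, pos, plan = heapq.heappop(heap)
        if (state, pos) in done:
            continue
        done.add((state, pos))
        if state in accepting and pos == len(trace):
            return cost, plan
        if pos < len(trace):
            event = trace[pos]
            nxt = transitions.get((state, event))
            if nxt is not None:
                # The event is legal here: keep it at no cost.
                heapq.heappush(heap, (cost, next(tie), nxt, pos + 1, plan + [event]))
            # Delete the event from the trace.
            heapq.heappush(heap, (cost + 1, next(tie), state, pos + 1,
                                  plan + [f"(del {event})"]))
        for (src, event), dst in transitions.items():
            if src == state:
                # Insert an event the specification expects here.
                heapq.heappush(heap, (cost + 1, next(tie), dst, pos,
                                      plan + [f"(ins {event})"]))
    return None


# The FSM from the figure: A -x-> B -y-> C -z-> D.
FSM = {("A", "x"): "B", ("B", "y"): "C", ("C", "z"): "D"}

fix = closest_fix(FSM, "A", {"D"}, "xz")
```

For the buggy trace "xz" the plan comes back as `["x", "(ins y)", "z"]` with cost 1, matching the slide's “x(ins y)z”. Mapping that plan back onto source code is the part that needs the PL machinery.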
#12
Program Analyses For Security
• Don’t want rogue programs to send our info to MS
or turn us into botnet zombies
• Could we detect that (type systems for secure
information flow, format string vulnerabilities,
setuid analyses, …)?
• Could we prevent that (bytecode verifiers, proof-carrying code, “data execution prevention”, …)?
• Example: buffer overruns and remote code
injections will soon be defeated. We believe the
next wave of attacks will go after non-control data.
We have a program analysis to find such critical
data and a transformation to protect it
automatically (using OS and VM hardware support).
• Dave Evans and I have grant money for a student to
work on any project that combines PL and Security
#13
Program Analysis And Privacy
• Project with Nina Mishra
• Specification mining problem: need traces!
• Can mutually distrusting parties work together to learn
specifications without giving away who has more bugs? Yes!
• Can Google, MSN, Yahoo, etc., share search and advertising
and click-through data in such a way they can still make
advertising money, defeat click-fraud, but without giving
away who you are?
#14
How Do We Do Research?
• Analysis and Design – create type systems, invent
transformations, take ideas from field X and use
them in field Y, …
• Proofs – structural induction, type safety,
construction, … (we’ve got Greek letters)
• Experiments – build systems and test them,
falsifiable claims, reproducibility, …
#15
PL: Cosmic Mayonnaise
• Two favorite areas? No problem!
• Since most of computing involves programs, it’s easy to
form a research project that crosscuts PL and …
– Security, Embedded Systems, Software Engineering: as before
– Systems: analyze J2EE ecommerce apps, distributed peer-to-peer
programs, “managed code operating systems”, concurrency, etc.
– Graphics: analyze the OpenGL or Direct3D aspects of programs,
provide better support for programming on graphics cards, …
– Databases: add transactional or ACID semantics to languages, verify
inlined SQL, support persistent objects, use DB techniques on
program traces, safely inject query plans
– Theory: we make heavy use of DFAs (lexing), PDAs (parsing), NFAs
(policies), linear logic (resource mgmt), temporal logic (fairness),
approximation algorithms (more than graph-coloring register alloc)
– Machine Learning: specification mining, profiling, as before, …
– Other: out of space on the slide …
#16
Faked
Photo-Ops
• Showing me
having fun with
my grad students
in exotic locales …
#17
The Breakfast of Champions
• At PL Research, we’ve pretty much got it all:
theory and practice, glitzy killer apps and hard-core fundamental problems. There’s a lot to do,
and that’s why we need people like you.
• Talk to your doctor of philosophy to see whether
PL® is right for you. Side effects were generally mild
and included reliable software, resistance to viruses,
increased hacking opportunities, decreased development
times, disappearing deadlocks and race conditions, ironclad
APIs, firmer theoretical bases, better specifications, more
privacy, …
#18
Summary
• Wes Weimer – PL Research – Wants Students
• Program Analyses – find critical data, analyze
shader code, …
• Program Transformations – CCured, security,
fixing bugs, better bug reports, …
• Bug-Finding – security bugs, type safety,
memory safety, resource usage, …
• Specification Mining – partial correctness
• PL And Privacy – specifications, search data
• PL For Security – non-control data attacks, …
#19
Any Questions?
• Even if you don’t care about PL (sigh!), I
would be happy to give advice about CS
research, industry and grad school.
• Want more info? This was just an appetizer!
• Breakout session (likely September 11, 3:30)
#20
Big Example #X: SLAM
• Verify critical properties of software or find bugs
• Take an important program (e.g., a device driver)
• Merge it with a property (e.g., no deadlocks, asynchronous
IRP handling, BSD sockets, database transactions, …)
• Transform the result into a boolean program
– Same control flow, but only boolean variables
• Use a model checker to explore the resulting state space
– Result 1: program provably satisfies property
– Result 2: program violates property right here on line 92,376!
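The last step, exploring the state space of a boolean program, can be sketched in Python. This skips everything that makes SLAM interesting (deriving the boolean program from C code and refining it); it starts from a hand-written boolean program with a single variable `locked` and checks a locking property by breadth-first search. The control-flow encoding, the function name `find_violation`, and both example programs are hypothetical.

```python
from collections import deque


def find_violation(cfg, entry):
    """Explore all (program point, locked) states of a boolean program.

    cfg: {pc: (op, [successor pcs])} with op in {"lock", "unlock", "skip"}
    Returns the path of program points leading to a double lock or a
    double unlock, or None if the property holds on every path.
    """
    seen = set()
    queue = deque([(entry, False, (entry,))])
    while queue:
        pc, locked, path = queue.popleft()
        if (pc, locked) in seen:
            continue  # this abstract state was already explored
        seen.add((pc, locked))
        op, successors = cfg[pc]
        if op == "lock":
            if locked:
                return path  # acquired twice
            locked = True
        elif op == "unlock":
            if not locked:
                return path  # released without holding
            locked = False
        for nxt in successors:
            queue.append((nxt, locked, path + (nxt,)))
    return None


# Buggy: the branch at 2 can skip the unlock at 3 and lock again at 4.
BUGGY = {
    1: ("lock", [2]),
    2: ("skip", [3, 4]),
    3: ("unlock", [4]),
    4: ("lock", [5]),
    5: ("unlock", []),
}
# Fixed: the unlock at 3 is no longer skippable.
FIXED = dict(BUGGY)
FIXED[2] = ("skip", [3])
```

Because every variable is boolean, the set of (program point, variable values) states is finite, so the search always terminates and either proves the property or returns a concrete path to the violation.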
#21
Helping Out Testing
• Finding bugs (e.g., bugs in Linux, bugs in Windows
device drivers, bugs in Java systems software, …)
• Preventing bugs (change the language, or add a
step to the “make” process, cf. PREfast)
• Automatically generating test cases
• Limiting test cases that must be run on a check-in
#22