Programming Languages Research and You: What Miracles Are We Cooking Up These Days? #1
Dog and Pony Show
High-Level Summary
• I am Wes Weimer
• I do PL research (plays well with others)
• I may not be as cool as Jack
• But I do have money
• And I’m looking for students
• Break-out session: see ~weimer webpage
  – Likely: September 11, 3:30pm
#2

Talk Outline
• What is PL research in general?
• What have we done in the past?
• Possible cool future research projects
• Hint: write down a key phrase, email or talk to me later …
Professor? an grad student …
#3

Don’t We Already Have Compilers?
#4

Dismal View Of PL Research
C++
Java (or C#)
#5

PL Research: Qu’est-ce que c’est?
• Study programs and languages
• 2002 US Annual Cost of Software Errors: $60B
  – 0.6% of the GDP (NIST)
  – Cost of 1 bug: $2k-$10k
• Programs as artifacts
  – What should they be doing?
  – Are they doing it?
  – Are they making mistakes instead?
  – How might we fix them?
• Language Design
  – Make some things easier, e.g., compare Ruby / Python to C++
#6

Program Analyses
• We write programs that analyze (or transform) other programs
  – cf. testing, >50% of a project’s budget
  – Alias analyses, shape analyses, verifiers, …
• Doomed in theory but successful in practice
Simplest examples: dataflow analyses and type systems
#7

Domain-Specific Bug-Finding
• Embedded components (e.g., cellphones) are programmed with special languages
• Most large projects include their own custom languages (e.g., simulations, macros, mIRC scripts, game engines)
• These are harder to debug and have special semantics (= meanings)
• Example: UnrealScript is C-ish but has type qualifiers like transient and travel
• Example: “Players of [The Sims 2] are complaining that their artfully-crafted homes and mansions are beginning to resemble the Twilight Zone, thanks to an artifact of the game's design that causes hacks to spread like viruses from user to unwitting user.” SecurityFocus 2004-2005
• My research: found >800 real bugs in 4M LOC, proposed a new language feature to fix all of those errors, evaluated it with case studies
Work with Jason Lawrence to apply PL techniques to vertex shader code (e.g., caching, generating, optimization, …)
#8

Big Example #1: CCured
• Make systems programs as safe as Java but as fast as C
  – Safe = memory safety and type safety
• Take an important C program (e.g., apache, bind, openssl)
• Run a program analysis to classify all of the pointers in that program:
  – Safe Pointer = no arithmetic, no casts
  – Sequence Pointer = pointer arithmetic (i++), no casts
  – Wild Pointer = anything goes
• Take that classification and transform the program:
  – Safe Pointer = add a null check
  – Sequence Pointer = add bounds (and null) checks
  – Wild Pointer = add full dynamic type checking
• Resulting program is provably safe
• But is < 30% slower than the original (cf. Purify: 50x slower)
#9

Big Example #2: Specification Mining
[Figure: finite-state specification with states 1 to 5; edge labels SF.openSession, S.beginTransaction, T.commit, T.rollback, S.close, and two "error" edges]
• In order to find bugs automatically, we must know what the program should be doing
• Formal partial-correctness specification
• Problem: hard to get them in practice
• Our approach: learn them (machine learning)
• Learn English grammar from high school student essays …
• Take advantage of program structure and error information (e.g., program is more likely to make a mistake near unexpected errors)
• On 1MLOC, our specification miner
  – learns 5.3x as many real specs as the next best miner
  – has 25x fewer false positives
  – those mined specs found 430 real bugs (vs 50-auto or 172-by-hand)
#10

Big Example #3: Error Reporting
• PL + Theory + Software Engineering
• Even when we find bugs automatically, they often are not fixed!
• Bug reports are too confusing
• Our approach: include a candidate patch that provably fixes that bug and introduces no new known bugs
• Helps maintainer to understand code and bug report
  – bug is more likely to be fixed
#11

How would that work?
• Somewhat like a spell-checker
• Take the violated specification and construct a new FSM that will find the closest “not mistake” to the mistake we’re fixing
• This gives a plan for fixing the bug: use PL to match that plan back to the source code
• Does it work? Trial with 76 bug reports. 66% of those with patches were addressed, vs. only 21% of those without (statistically significant)
[Figure: specification FSM over states A, B, C, D with edges x, y, z, and the repair FSM built from it over states (A,0) … (D,2), with insertion edges such as ε (ins x) and deletion edges such as x (del x). Given bug “xz” we produce “x(ins y)z”.]
#12

Program Analyses For Security
• Don’t want rogue programs to send our info to MS or turn us into botnet zombies
• Could we detect that (type systems for secure information flow, format string vulnerabilities, setuid analyses, …)?
• Could we prevent that (bytecode verifiers, proof-carrying code, “data execution prevention”, …)?
• Example: buffer overruns and remote code injections will soon be defeated. We believe the next wave of attacks will go after non-control data. We have a program analysis to find such critical data and a transformation to protect it automatically (using OS and VM hardware support).
• Dave Evans and I have grant money for a student to work on any project that combines PL and Security
#13

Program Analysis And Privacy
• Project with Nina Mishra
• Specification mining problem: need traces!
• Can mutually distrusting parties work together to learn specifications without giving away who has more bugs? Yes!
• Can Google, MSN, Yahoo, etc., share search, advertising and click-through data in such a way that they can still make advertising money and defeat click-fraud, but without giving away who you are?
#14

How Do We Do Research?
• Analysis and Design
  – create type systems, invent transformations, take ideas from field X and use them in field Y, …
• Proofs
  – structural induction, type safety, construction, … (we’ve got Greek letters)
• Experiments
  – build systems and test them, falsifiable claims, reproducibility, …
#15

PL: Cosmic Mayonnaise
• Two favorite areas? No problem!
• Since most of computing involves programs, it’s easy to form a research project that crosscuts PL and …
  – Security, Embedded Systems, Software Engineering: as before
  – Systems: analyze J2EE ecommerce apps, distributed peer-to-peer programs, “managed code operating systems”, concurrency, etc.
  – Graphics: analyze the OpenGL or Direct3D aspects of programs, provide better support for programming on graphics cards, …
  – Databases: add transactional or ACID semantics to languages, verify inlined SQL, support persistent objects, use DB techniques on program traces, safely inject query plans
  – Theory: we make heavy use of DFAs (lexing), PDAs (parsing), NFAs (policies), linear logic (resource mgmt), temporal logic (fairness), approximation algorithms (more than graph-coloring register alloc)
  – Machine Learning: specification mining, profiling, as before, …
  – Other: out of space on the slide …
#16

Faked Photo-Ops
• Showing me having fun with my grad students in exotic locales …
#17

The Breakfast of Champions
• At PL Research, we’ve pretty much got it all: theory and practice, glitzy killer apps and hardcore fundamental problems. There’s a lot to do, and that’s why we need people like you.
• Talk to your doctor of philosophy to see whether PL® is right for you. Side effects were generally mild and included reliable software, resistance to viruses, increased hacking opportunities, decreased development times, disappearing deadlocks and race conditions, ironclad APIs, firmer theoretical bases, better specifications, more privacy, …
#18

Summary
• Wes Weimer – PL Research – Wants Students
• Program Analyses
  – find critical data, analyze shader code, …
• Program Transformations
  – CCured, security, fixing bugs, better bug reports, …
• Bug-Finding
  – security bugs, type safety, memory safety, resource usage, …
• Specification Mining
  – partial correctness
• PL And Privacy
  – specifications, search data
• PL For Security
  – non-control data attacks, …
#19

Any Questions?
• Even if you don’t care about PL (sigh!) I would be happy to give advice about CS research, industry and grad school.
• Want more info? This was just an appetizer!
• Breakout session (likely September 11, 3:30)
#20

Big Example #X: SLAM
• Verify critical properties of software or find bugs
• Take an important program (e.g., a device driver)
• Merge it with a property (e.g., no deadlocks, asynchronous IRP handling, BSD sockets, database transactions, …)
• Transform the result into a boolean program
  – Same control flow, but only boolean variables
• Use a model checker to explore the resulting state space
  – Result 1: program provably satisfies property
  – Result 2: program violates property right here on line 92,376!
#21

Helping Out Testing
• Finding bugs (e.g., bugs in Linux, bugs in Windows device drivers, bugs in Java systems software, …)
• Preventing bugs (change the language, or add a step to the “make” process, cf. PREfast)
• Automatically generating test cases
• Limiting test cases that must be run on a check-in
#22
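
Illustration for the SLAM slide: the last two steps there (boolean program, then model checker) can be sketched in a few lines. This is a toy under stated assumptions, not SLAM's actual algorithm or input format: the six-line `PROGRAM`, the single boolean `locked`, and the helper names `step` and `check` are all hypothetical, and the search is a plain explicit-state breadth-first search over (line, locked) pairs.

```python
from collections import deque

# Hypothetical boolean program: line -> (operation, successor lines).
# It keeps the control flow of some routine but only one boolean fact,
# "is the lock currently held?".
PROGRAM = {
    1: ("acquire", [2]),
    2: ("branch", [3, 4]),  # branch condition abstracted away: either path possible
    3: ("release", [4]),
    4: ("acquire", [5]),    # violates the property if reached with the lock held
    5: ("release", [6]),
    6: ("exit", []),
}

def step(op, locked):
    """Return the new value of `locked`, or None if the property is violated.

    Property (an assumption for this sketch): never acquire a held lock,
    never release a lock that is not held.
    """
    if op == "acquire":
        return None if locked else True
    if op == "release":
        return False if locked else None
    return locked  # "branch" and "exit" do not touch the lock

def check(program, entry=1):
    """Explore every reachable (line, locked) state breadth-first.

    Returns None if no violation is reachable, otherwise the list of
    line numbers on a shortest path to a violating line.
    """
    start = (entry, False)
    parent = {start: None}      # also serves as the visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        line, locked = state
        op, successors = program[line]
        new_locked = step(op, locked)
        if new_locked is None:
            trace = []
            while state is not None:   # walk parents back to the entry
                trace.append(state[0])
                state = parent[state]
            return trace[::-1]
        for nxt in successors:
            succ = (nxt, new_locked)
            if succ not in parent:
                parent[succ] = state
                queue.append(succ)
    return None

print(check(PROGRAM))  # path of line numbers to the violation, or None
```

Because there are only two values of `locked` per line, the state space is finite and the search always terminates, which is the point of abstracting to boolean variables. The returned path plays the role of "Result 2" on the slide; a `None` result plays the role of "Result 1" for this toy property only.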