Introduction to SimCorp’s new corporate PowerPoint template

Parsing APL for Static Analysis
Speaker: Anders Schack-Nielsen, Ph.D.
Sept. 23rd 2014
Outline
• Background and Motivation
• Variable Types
• Static Analysis Tool
• Parsing APL
• Kind Inference
• BNF Grammar
Background
• APL codebase in SimCorp:
• 68000 functions
• 1.7m lines of code
• 215 APL developers actively developing and maintaining this codebase
• Additional functions and developers covering utilities, etc.
Motivation – example
• Programmer A writes function foo.
• A makes certain assumptions about the input arguments.
• A documents his assumptions.
• Programmer K writes function bar and calls foo.
• K has read the header of foo so he knows what sort of arguments to supply.
• For good measure he also tests it.
∇ foo args
⍝: args should be ...
mat1 mat2 strings←args
... (implicit assumptions)
∇
∇ bar
...
foo mat1 mat2 strings
...
∇
Motivation – what can go wrong?
• Translating:
• A’s assumptions → documentation of foo → K’s understanding
• A lot can be missed, misinterpreted, or left out
• Test might not catch this
• Maintenance:
• Updates to foo
• Updates to bar
• When assumptions change, three places must be updated in sync to stay correct.
Solution: Variable Types
• Formalize assumptions – make them checkable.
• Introduce variable types and static analysis.
• Check header specification.
• Check foo against its header.
• Check the call to foo from bar.
∇ foo args
⍝: args[1] : mat1 As vtINT[;]
⍝:     [2] : mat2 As vtINT[mat1:1;mat1:2]
⍝:     [3] : strings As vtCHAR[][]
mat1 mat2 strings←args
...
∇
Static Analysis Tool
• First type checker was introduced in SimCorp 10 years ago.
• Worked well, but had many flaws.
• Recently, the tool has been rewritten from scratch.
• Many interesting challenges, e.g. parsing APL.
• 8k lines of F# including 500 lines of FsLex/FsYacc.
• Understands the semantics of all APL symbols and control-flow constructs.
• The new type checker catches many things the old one did not, e.g. potentially all rank errors.
Real life example
∇ r←y textStringRemove x;h
⍝2: y As vtSTRING|vtSTRING[] : (string1)(string2).....
⍝3: x As vtSTRING|vtCHAR[;] : text vector or matrix
⍝4: r As vtSTRING|vtCHAR[;] : resulting text vector or m
...
∇
...
dbsource←' 'textStringRemove dbsource
tokens←'('textSplitAt')'textStringRemove dbsource
...
vtSTRING is a short-hand for vtCHAR[]
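To make the annotation syntax in the header above more concrete, here is a minimal Python sketch of how such declarations could be compared. It is not SimCorp's checker. It assumes that each bracket group adds one nesting level, that `[]` means one axis and `[;]` two axes, and that `|` separates alternatives. The names `parse_type` and `accepts` are hypothetical, and dimension constraints such as `vtINT[mat1:1;mat1:2]` are deliberately not handled.

```python
import re

def parse_type(text):
    """Parse one annotation such as 'vtCHAR[;]' into (element type, ranks per nesting level).

    Assumed reading of the notation: 'vtCHAR[;]' is a character matrix and
    'vtCHAR[][]' is a vector of character vectors.
    """
    m = re.fullmatch(r"vt([A-Z]+)((?:\[;*\])*)", text.strip())
    if m is None:
        raise ValueError(f"unsupported type annotation: {text!r}")
    base = m.group(1)
    # One entry per bracket group: '[]' has rank 1, '[;]' has rank 2, and so on.
    ranks = tuple(g.count(";") + 1 for g in re.findall(r"\[(;*)\]", m.group(2)))
    if base == "STRING":
        # vtSTRING is short-hand for vtCHAR[] (stated on the slide).
        base, ranks = "CHAR", (1,) + ranks
    return base, ranks

def accepts(declared, actual):
    """Return True if the actual type equals one of the '|'-separated declared alternatives."""
    alternatives = {parse_type(part) for part in declared.split("|")}
    return parse_type(actual) in alternatives
```

Under these assumptions, `accepts("vtSTRING|vtSTRING[]", "vtCHAR[]")` holds because `vtSTRING` expands to `vtCHAR[]`, while a character scalar (`"vtCHAR"`) matches neither alternative.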
Parsing APL
• APL is statically un-parsable!
• However, it becomes parsable with only a few very minor restrictions.
• In fact, we can make an LALR(1) parser: it is possible to define a completely disambiguated BNF grammar, allowing us to code-generate the parser using Yacc. I.e. we can parse APL from left to right with only a single token lookahead and no backtracking.
Parsing APL
x/¨y
If x is a function:
MonadicApply
  OperatorApply
    OperatorApply
      FunctionVariable(x)
      Reduce
    Each
  ArrayVariable(y)
If x is an array:
DyadicApply
  ArrayVariable(x)
  OperatorApply
    Replicate
    Each
  ArrayVariable(y)
Parsing APL
• Values come in 3 kinds: Arrays, Functions, and Operators.
• Sequences of Arrays form vectors.
• Functions associate to the right.
• Operators associate to the left.
• Parsing needs complete kind information.
• Solution: Split parsing into two steps with a kind inference algorithm sandwiched in between:
1. Parse control-flow and matching parentheses, effectively representing expressions as mere token trees.
2. Do kind inference on the token trees.
3. Parse the token trees as full-fledged expressions.
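As a rough illustration of step 1, the Python sketch below groups an already tokenized expression into a token tree by matching parentheses only. It is a sketch under assumptions: the function name `to_token_tree` is hypothetical, tokens are plain strings produced by some lexer that is not shown, and control-flow and bracket handling are left out.

```python
def to_token_tree(tokens):
    """Group a flat token list into nested lists, one nested list per parenthesized group."""
    stack = [[]]  # stack[-1] is the group currently being filled
    for tok in tokens:
        if tok == "(":
            stack.append([])  # open a new nested group
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            group = stack.pop()
            stack[-1].append(group)  # attach the finished group to its parent
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return stack[0]
```

For example, `to_token_tree(["x", "(", "y", "/", "z", ")"])` yields `["x", ["y", "/", "z"]]`. No kind information is needed at this stage, which is what lets kind inference run afterwards on the tree.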
Kind Inference
• Kind inference naturally proceeds from left to right:
• Consider e.g.: “x.y”, “x/y”, “x[y]”
• Left-to-right, depth-first scan:
• The kind of an individual token can be inferred from the kinds of the tokens to its left.
• Parenthesized expressions can have their compound kind inferred based on the kinds of their subparts.
• Tag all tokens with their kind and all left-parentheses with the compound kind they enclose.
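The left-to-right idea can be sketched in Python for the single case "x/y". This is an illustrative simplification, not the tool's algorithm: `tag_kinds` and the `env` mapping are hypothetical, only the symbols "/" and "¨" are handled, and parenthesized groups, "x.y" and "x[y]" are omitted. The assumed rule is that "/" acts as a monadic operator (Reduce) when the token to its left forms a function, and as a function (Replicate) otherwise.

```python
def tag_kinds(tokens, env):
    """Tag each token with a kind: 'A' (array), 'F' (function) or 'M' (monadic operator).

    env maps variable names to their kinds (a stand-in for the global environment).
    """
    tagged = []
    prev = None  # kind of the token immediately to the left
    for tok in tokens:
        if tok == "/":
            # After a function (or an operator that completed one), '/' is Reduce;
            # after an array it is Replicate.
            kind = "M" if prev in ("F", "M") else "F"
        elif tok == "¨":
            kind = "M"  # Each is always a monadic operator
        elif tok in env:
            kind = env[tok]
        else:
            raise KeyError(f"unknown token: {tok!r}")
        tagged.append((tok, kind))
        prev = kind
    return tagged
```

With `env = {"x": "A", "y": "A"}` the slash in `["x", "/", "¨", "y"]` is tagged `"F"`; with `env = {"x": "F", "y": "A"}` it is tagged `"M"`. Only the left context is consulted, which is why a single forward scan suffices.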
Inferring compound kinds
• Kind sequence rewrite algorithm:
• Uses an elaboration into 5 kinds: Array (A), Function (F), Namespace indexer (.), Monadic operator (M), and Dyadic operator (D).
K → K (done)
A A Ks → A Ks
A . Ks → Ks
A F Ks → A (done)
K D D Ks → A (done) // outer product
F F Ks → A (done)
F A Ks → A (done)
[AF] M Ks → F Ks
K D A A Ks → K D A Ks
K D A . Ks → K D Ks
[AF] D [AF] Ks → F Ks
*Assumes a minor preprocessing step that wraps “A . F” with parentheses. Also slightly simplified assuming no “A . D” or “A . M”.
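The rewrite rules can be tried out with a small Python sketch. It rests on an assumed reading of the notation: K matches any single kind, Ks matches the (possibly empty) rest of the sequence, [AF] matches A or F, the first rule applies only to a one-element sequence, and rules are tried in the order listed. The function name `compound_kind` is hypothetical.

```python
def compound_kind(kinds):
    """Rewrite a sequence of kinds ('A', 'F', '.', 'M', 'D') down to one compound kind."""
    ks = list(kinds)
    while True:
        if not ks:
            raise ValueError("empty kind sequence")
        if len(ks) == 1:
            return ks[0]                       # K -> K (done)
        a, b = ks[0], ks[1]
        c = ks[2] if len(ks) > 2 else None
        d = ks[3] if len(ks) > 3 else None
        if a == "A" and b == "A":
            ks = ["A"] + ks[2:]                # A A Ks -> A Ks
        elif a == "A" and b == ".":
            ks = ks[2:]                        # A . Ks -> Ks
        elif a == "A" and b == "F":
            return "A"                         # A F Ks -> A (done)
        elif b == "D" and c == "D":
            return "A"                         # K D D Ks -> A (done), outer product
        elif a == "F" and b in ("F", "A"):
            return "A"                         # F F Ks / F A Ks -> A (done)
        elif a in ("A", "F") and b == "M":
            ks = ["F"] + ks[2:]                # [AF] M Ks -> F Ks
        elif b == "D" and c == "A" and d == "A":
            ks = [a, "D", "A"] + ks[4:]        # K D A A Ks -> K D A Ks
        elif b == "D" and c == "A" and d == ".":
            ks = [a, "D"] + ks[4:]             # K D A . Ks -> K D Ks
        elif a in ("A", "F") and b == "D" and c in ("A", "F"):
            ks = ["F"] + ks[3:]                # [AF] D [AF] Ks -> F Ks
        else:
            raise ValueError(f"no rule applies to {ks}")
```

Every non-terminating rule shortens the sequence, so the loop always ends. As a usage example, the kind sequence `"FMMA"` (a function, two monadic operators, an array) rewrites to `"A"`, while `"FM"` on its own rewrites to `"F"`.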
BNF Grammar (sample excerpt)
Expr:
| Vector Func Expr { DyadicApply(vector $1, $2, $3) }
| FuncLeftmost Expr { MonadicApply($1, $2) }
| Vector { vector $1 }
Vector:
| SimpleExprLeftmost { [$1] }
| SimpleExprLeftmost SimpleVector { $1 :: $2 }
SimpleExprLeftmost:
| AtomicExpr { $1 }
| Vector LBRACKET IdxList RBRACKET { Index(vector $1, $3) }
| NameSpaceExprLeftmost AtomicExpr { NameSpace($1, $2) }
AtomicExpr:
| LPAREN Expr RPAREN { $2 }
| IDARRAY { IdenArray($1) }
| INT { Value(parseInt($1)) }
| FLOAT { Value(Float(parseDouble($1))) }
| STRING { Value(parseStringValue($1)) }
| APLVALUE { Value(AplNil(parseNiladic($1))) }
Func:
| Func MonadicOperator { MonadicOpApply($1, $2) }
| Func DyadicOperatorFuncFunc SimpleFunc { DyadicOpApply($2, FF($1, $3)) }
| JOT DyadicOperatorFuncFunc SimpleFunc { DyadicOpApply($2, FF(AplFunction(OuterProduct), $3)) }
| SimpleFunc { $1 }
Restrictions – the fine print
• What were those restrictions to allow parsing?
• Defined operators need a static description of whether their operands are functions or arrays. This is not a problem in practice.
• We need an environment describing all global variables and functions. We need this anyway to typecheck function calls.
• (Minor quirk related to the :Until-:AndIf construction.)