Refactoring Functional Programs Simon Thompson with Huiqing Li Claus Reinke www.cs.kent.ac.uk/projects/refactor-fp Session 2 AFP04 Overview Review mini-project. Implementation of HaRe. Larger-scale examples. Case study. AFP04


Refactoring Functional
Programs
Simon Thompson
with
Huiqing Li
Claus Reinke
www.cs.kent.ac.uk/projects/refactor-fp
Session 2
AFP04
Overview
Review mini-project.
Implementation of HaRe.
Larger-scale examples.
Case study.
Mini-project feedback
Refactorings performed.
Refactorings and language features?
Machine support feasible? Useful?
‘Not-quite’ refactorings? Support possible here?
Examples
Argument permutations (NB partial application).
(Un)group arguments.
Slice function for a component of its result.
Error handling / exception handling.
More examples
Introduce type synonym, selectively.
Introduce ‘branded’ type.
Modify the return type of a function from T to
Maybe T, Either T S, [T].
Ditto for input types … and modify variable names
correspondingly.
Implementing HaRe
Proof of concept …
To show proof of concept it is enough to:
build a stand-alone tool,
work with a subset of the language,
pretty print the results of refactorings.
… or a useful tool?
Integrate with existing program development
tools: stand-alone program links to editors
emacs and vim, any other IDEs also possible.
Work with the complete language: Haskell 98?
Preserve the formatting and comments in the
refactored source code.
Allow users to extend and script the system.
The refactorings in HaRe
Rename
Delete
Lift / Demote
Introduce definition
Remove definition
Unfold
Generalise
Add / remove params
Move def between modules
Delete / add to exports
Clean imports
Make imports explicit
Data type to ADT
All these refactorings are module aware.
The Implementation of HaRe
Information gathering
Pre-condition checking
Program transformation
Program rendering
Information needed
Syntax: replace the function called sq, not the
variable sq …… parse tree.
Static semantics: replace this function sq, not all
the sq functions …… scope information.
Module information: what is the traffic between
this module and its clients …… call graph.
Type information: replace this identifier when it is
used at this type …… type annotations.
Infrastructure: decisions
Build a tool that can interoperate with emacs,
vim, … yet act separately.
Leverage existing libraries for processing
Haskell 98, for tree transformation … as few
modifications as possible.
Be as portable as possible, in the Haskell space.
Abstract interface to compiler internals?
Haskell landscape (end 2002)
Parser: many
Type checker: few
Tree transformations: few
Difficulties
Haskell 98 vs. Haskell extensions.
Libraries: proof of concept vs. distributable.
Source code regeneration.
Real project
Programatica
Project at OGI to build a Haskell system …
… with integral support for verification at various
levels: assertion, testing, proof etc.
The Programatica project has built a Haskell front
end in Haskell, supporting syntax, static, type
and module analysis …
… freely available under BSD licence.
The Implementation of HaRe
Information gathering
Pre-condition checking
Program transformation
Program rendering
First steps … lifting and friends
Use the Haddock parser … full Haskell given in
500 lines of data type definitions.
Work by hand over the Haskell syntax: 27 cases
for expressions …
Code for finding free variables, for instance …
Finding free variables ‘by hand’
instance FreeVbls HsExp where
  freeVbls (HsVar v) = [v]
  freeVbls (HsApp f e)
    = freeVbls f ++ freeVbls e
  freeVbls (HsLambda ps e)
    = freeVbls e \\ concatMap paramNames ps
  freeVbls (HsCase exp cases)
    = freeVbls exp ++ concatMap freeVbls cases
  freeVbls (HsTuple _ es)
    = concatMap freeVbls es
  … etc.
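A self-contained sketch of the same 'by hand' style, on a deliberately tiny expression type. The type Exp and its constructors are assumptions made for illustration; they are not the Haddock HsExp type used on the slide.

```haskell
import Data.List (nub)

-- Hypothetical miniature expression type (not the Haddock HsExp type).
data Exp
  = Var String
  | App Exp Exp
  | Lam [String] Exp
  | Tuple [Exp]
  deriving (Eq, Show)

-- Free variables: collect from the components, except that a lambda
-- removes the names it binds.
freeVbls :: Exp -> [String]
freeVbls (Var v)    = [v]
freeVbls (App f e)  = nub (freeVbls f ++ freeVbls e)
freeVbls (Lam ps e) = filter (`notElem` ps) (freeVbls e)
freeVbls (Tuple es) = nub (concatMap freeVbls es)
```

Even at this size, three of the four cases only pass results upwards; only the Lam case says anything specific.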
This approach
Boilerplate code … 1000 lines for 100 lines of
significant code.
Error prone: significant code lost in the noise.
Want to generate the boilerplate and the tree
traversals …
… DriFT: Winstanley, Wallace
… Strafunski: Lämmel and Visser
Strafunski
Strafunski allows a user to write general (read
generic), type safe, tree traversing programs,
with ad hoc behaviour at particular points.
Top-down / bottom-up, type preserving / unifying,
full / stop / one.
Strafunski in use
Traverse the tree accumulating free variables
from components, except in the case of lambda
abstraction, local scopes, …
Strafunski allows us to work within Haskell …
Other options? Generic Haskell,
Template Haskell, AG, …
Rename an identifier
rename :: (Term t) => PName -> HsName -> t -> Maybe t
rename oldName newName = applyTP worker
  where
    worker = full_tdTP (idTP `adhocTP` idSite)
    idSite :: PName -> Maybe PName
    idSite v@(PN name orig)
      | v == oldName
        = return (PN newName orig)
    idSite pn = return pn
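The same idea (a generic full traversal with ad hoc behaviour at names) can be sketched without Strafunski, using only Data.Data from base. This is an illustration under assumptions, not HaRe's code: Name, Exp, mkT, topDown and renameAll are all defined here for the example, and no scope analysis is attempted.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, Typeable, cast, gmapT)
import Data.Maybe (fromMaybe)

-- Hypothetical names and syntax tree, standing in for PName and the Haskell AST.
newtype Name = Name String deriving (Eq, Show, Data)

data Exp
  = Var Name
  | App Exp Exp
  | Lam Name Exp
  deriving (Eq, Show, Data)

-- Extend a type-specific function to all types: identity everywhere else.
mkT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT f = fromMaybe id (cast f)

-- Apply a generic transformation at every node, top-down.
topDown :: Data a => (forall b. Data b => b -> b) -> a -> a
topDown f x = gmapT (topDown f) (f x)

-- Replace every occurrence of oldName, wherever a Name appears in the tree.
renameAll :: Data t => Name -> Name -> t -> t
renameAll oldName newName = topDown (mkT idSite)
  where
    idSite n
      | n == oldName = newName
      | otherwise    = n
```

Only idSite mentions names; nothing in the sketch mentions the constructors of Exp, so adding a constructor would not require changing the traversal.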
The coding effort
Transformations: straightforward in Strafunski …
… the chore is implementing conditions that the
transformation preserves meaning.
This is where much of our code lies.
Move f from module A to B
Is f defined at the top-level of B?
Are the free variables in f accessible within module B?
Will the move require recursive modules?
Remove the definition of f from module A.
Add the definition to module B.
Modify the import/export lists in module A, B and the
client modules of A and B if necessary.
Change uses of A.f to B.f or f in all affected modules.
Resolve ambiguity.
The Implementation of HaRe
Information gathering
Pre-condition checking
Program transformation
Program rendering
Program rendering example
-- This is an example
module Main where

sumSquares x y = sq x + sq y
    where sq :: Int->Int
          sq x = x ^ pow
          pow = 2 :: Int

main = sumSquares 10 20

Promote the definition of sq to top level.
Program rendering example
module Main where
sumSquares x y
  = sq pow x + sq pow y where pow = 2 :: Int
sq :: Int->Int->Int
sq pow x = x ^ pow
main = sumSquares 10 20

Using a pretty printer: comments lost and layout quite different.
Program rendering example
-- This is an example
module Main where

sumSquares x y = sq x + sq y
    where sq :: Int->Int
          sq x = x ^ pow
          pow = 2 :: Int

main = sumSquares 10 20

Promote the definition of sq to top level.
Program rendering example
-- This is an example
module Main where

sumSquares x y = sq pow x + sq pow y
    where pow = 2 :: Int

sq :: Int->Int->Int
sq pow x = x ^ pow

main = sumSquares 10 20

Layout and comments preserved.
Token stream and AST
White space and comments in the token stream.
Modification of the AST guides the modification of
the token stream.
After a refactoring, the program source is
extracted from the token stream not the AST.
Heuristics associate comments with program
entities.
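A much simplified sketch of rendering from a token stream: white space and comments are themselves tokens, so an edit that touches only identifier tokens leaves layout and comments intact. The Token type and both functions are assumptions for illustration; in particular, this sketch edits the tokens directly rather than letting an AST change guide the edit.

```haskell
-- Hypothetical token type: every character of the source belongs to some token.
data Token
  = Ident String   -- identifiers
  | Layout String  -- white space and comments
  | Symbol String  -- everything else
  deriving (Eq, Show)

-- The program source is recovered by concatenating the token texts.
render :: [Token] -> String
render = concatMap text
  where
    text (Ident s)  = s
    text (Layout s) = s
    text (Symbol s) = s

-- Rename an identifier in the token stream; layout tokens are untouched.
renameTok :: String -> String -> [Token] -> [Token]
renameTok old new = map step
  where
    step (Ident s) | s == old = Ident new
    step t = t
```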
Production tool
Programatica parser and type checker.
Refactor using a Strafunski engine.
Render code from the token stream and syntax tree.
Production tool (optimised)
Programatica parser and type checker.
Refactor using a Strafunski engine.
Render code from the token stream and syntax tree.
Pass lexical information to update the syntax tree and so avoid reparsing.
What have we learned?
Emerging Haskell libraries make it practical(?)
Efficiency and robustness
• type checking large systems,
• linking,
• editor script languages (vim, emacs).
Limitations of editor interactions.
Reflections on Haskell itself.
Reflections on Haskell
Cannot hide items in an export list (cf import).
Field names for prelude types?
Scoped class instances not supported.
‘Ambiguity’ vs. name clash.
‘Tab’ is a nightmare!
Correspondence principle fails …
Correspondence
Operations on definitions and operations on
expressions can be placed in one to one
correspondence
(R.D.Tennent, 1980)
Correspondence
Definitions              Expressions
where                    let
f x y = e                \x y -> e
f x                      f x = if g1 then e1
  | g1 = e1                    else if g2 … …
  | g2 = e2
Function clauses
f x
  | g1 = e1
f x
  | g2 = e2

f x = if g1 then e1
      else if g2 …

Can ‘fall through’ a function clause … no direct correspondence in the expression language.
No clauses for anonymous functions … no reason to omit them.
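A small runnable illustration of fall-through between clauses. The function classify and its guards are invented for this example; classify' is one possible expression-level rendering, in which the fall-through has to be written out as explicit nesting.

```haskell
-- Guarded clauses: when no guard in a clause holds, matching falls
-- through to the next clause.
classify :: Int -> String
classify x
  | x > 100 = "large"
classify x
  | even x = "even"
classify _ = "odd"

-- The same function as a single expression: the fall-through becomes
-- an explicit chain of else branches.
classify' :: Int -> String
classify' x =
  if x > 100 then "large"
  else if even x then "even"
  else "odd"
```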
Work in progress
‘Fold’ against definitions … find duplicate code.
All, some or one? Effect on the interface …
f x = … e … e …
Traditional program transformations
• Short-cut fusion
• Warm fusion
Where next?
Opening up to users: API or little language?
Link with other IDEs (and front ends?).
Detecting ‘bad smells’.
More useful refactorings supported by us.
Working without source code.
API
Refactorings
Refactoring utilities
Strafunski
Haskell
DSL
Combining forms
Refactorings
Refactoring utilities
Strafunski
Haskell
Larger-scale examples
More complex examples in the functional domain;
often link with data types.
Dawning realisation that some refactorings
are pretty powerful.
Bidirectional … no right answer.
Algebraic or abstract type?
data Tr a
  = Leaf a |
    Node a (Tr a) (Tr a)

Interface: Tr, Leaf, Node

flatten :: Tr a -> [a]
flatten (Leaf x) = [x]
flatten (Node _ s t)
  = flatten s ++
    flatten t
Algebraic or abstract type?
data Tr a
  = Leaf a |
    Node a (Tr a) (Tr a)
isLeaf = …
isNode = …
…

Interface: Tr, isLeaf, isNode, leaf, left, right, mkLeaf, mkNode

flatten :: Tr a -> [a]
flatten t
  | isLeaf t = [leaf t]
  | isNode t
    = flatten (left t)
      ++ flatten (right t)
Algebraic or abstract type?

Algebraic:
Pattern matching syntax is more direct …
… but can achieve a considerable amount with field names.
Other reasons? Simplicity (due to other refactoring steps?).
Abstract:
Allows changes in the implementation type without affecting the client: e.g. might memoise.
Problematic with a primitive type as carrier.
Allows an invariant to be preserved.
Outside or inside?
data Tr a
  = Leaf a |
    Node a (Tr a) (Tr a)
isLeaf = …
isNode = …
…

Interface: Tr, isLeaf, isNode, leaf, left, right, mkLeaf, mkNode

flatten :: Tr a -> [a]
flatten t
  | isLeaf t = [leaf t]
  | isNode t
    = flatten (left t)
      ++ flatten (right t)
Outside or inside?
data Tr a
  = Leaf a |
    Node a (Tr a) (Tr a)
isLeaf = …
isNode = …
flatten t = …

Interface: Tr, isLeaf, isNode, leaf, left, right, mkLeaf, mkNode, flatten
Outside or inside?


If inside and the type is reimplemented, need to reimplement everything in the signature, including flatten.
The more outside the better, therefore.
If inside can modify the implementation to memoise values of flatten, or to give a better implementation using the concrete type.
Layered types possible: put the utilities in a privileged zone.
Memoise flatten :: Tr a->[a]
Before:
data Tree a
  = Leaf { val::a } |
    Node { val::a,
           left,right::(Tree a) }
leaf = Leaf
node = Node
flatten (Leaf x) = [x]
flatten (Node x l r) =
  (x : (flatten l ++ flatten r))

After:
data Tree a
  = Leaf { val::a,
           flatten:: [a] } |
    Node { val::a,
           left,right::(Tree a),
           flatten::[a] }
leaf x
  = Leaf x [x]
node x l r
  = Node x l r (x : (flatten l ++
                     flatten r))
Memoise flatten
Invisible outside the implementation module, if
tree type is already an ADT.
Field names in Haskell make it particularly
straightforward.
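A runnable version of the memoised representation, as a sketch. It assumes that clients build trees only through the smart constructors leaf and node, which is what hiding the data constructors behind an ADT interface would enforce.

```haskell
-- Each node caches its flattened contents in the field flatten.
data Tree a
  = Leaf { val :: a, flatten :: [a] }
  | Node { val :: a, left, right :: Tree a, flatten :: [a] }

-- Smart constructors maintain the invariant that the cached list
-- agrees with the structure of the tree.
leaf :: a -> Tree a
leaf x = Leaf x [x]

node :: a -> Tree a -> Tree a -> Tree a
node x l r = Node x l r (x : (flatten l ++ flatten r))
```

Because flatten is now a field selector with the same name and type as the old function, client code that calls flatten does not change.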
Data type or existential type?
Data type:
data Shape
  = Circle Float |
    Rect Float Float
area :: Shape -> Float
area (Circle r) = pi*r^2
area (Rect h w) = h*w
perim :: Shape -> Float
perim (Circle r) = 2*pi*r
perim (Rect h w) = 2*(h+w)

Existential type:
data Shape
  = forall a. Sh a => Shape a
class Sh a where
  area :: a -> Float
  perim :: a -> Float
data Circle = Circle Float
instance Sh Circle where
  area (Circle r) = pi*r^2
  perim (Circle r) = 2*pi*r
data Rect = Rect Float Float
instance Sh Rect where
  area (Rect h w) = h*w
  perim (Rect h w) = 2*(h+w)
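A compilable sketch of the existential version. It needs the ExistentialQuantification extension, so it is not Haskell 98. The instance Sh Shape and the function totalArea are additions for illustration, showing that a list of mixed shapes can still be processed uniformly.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

class Sh a where
  area  :: a -> Float
  perim :: a -> Float

-- Any type in the class Sh can be wrapped as a Shape.
data Shape = forall a. Sh a => Shape a

data Circle = Circle Float
data Rect   = Rect Float Float

instance Sh Circle where
  area  (Circle r) = pi * r ^ 2
  perim (Circle r) = 2 * pi * r

instance Sh Rect where
  area  (Rect h w) = h * w
  perim (Rect h w) = 2 * (h + w)

-- Assumed helper instance: a wrapped shape delegates to its contents.
instance Sh Shape where
  area  (Shape s) = area s
  perim (Shape s) = perim s

totalArea :: [Shape] -> Float
totalArea = sum . map area
```

The trade-off runs in both directions: a new shape is a new instance with no edits to existing code, whereas a new operation means extending the class and every instance.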
Constructor or constructor?
data Expr
  = Epsilon | .... |
    Then Expr Expr |
    Star Expr
plus e = Then e (Star e)

data Expr
  = Epsilon | .... |
    Then Expr Expr |
    Star Expr |
    Plus Expr
Monadification: expressions
data Expr
  = Lit Integer |      -- Literal integer value
    Vbl Var |          -- Assignable variables
    Add Expr Expr |    -- Expression addition: e1+e2
    Assign Var Expr    -- Assignment: x:=e
type Var = String
type Store = [ (Var, Integer) ]
lookup :: Store -> Var -> Integer
lookup st x = head [ i | (y,i) <- st, y==x ]
update :: Store -> Var -> Integer -> Store
update st x n = (x,n):st
Monadification: evaluation
eval :: Expr ->
  Store -> (Integer, Store)
eval (Lit n) st
  = (n,st)
eval (Vbl x) st
  = (lookup st x,st)

evalST :: Expr ->
  State Store Integer
evalST (Lit n)
  = do
      return n
evalST (Vbl x)
  = do
      st <- get
      return (lookup st x)
Monadification: evaluation 2
eval :: Expr ->
  Store -> (Integer, Store)
eval (Add e1 e2) st
  = (v1+v2, st2)
  where
    (v1,st1) = eval e1 st
    (v2,st2) = eval e2 st1
eval (Assign x e) st
  = (v, update st' x v)
  where
    (v,st') = eval e st

evalST :: Expr ->
  State Store Integer
evalST (Add e1 e2)
  = do
      v1 <- evalST e1
      v2 <- evalST e2
      return (v1+v2)
evalST (Assign x e)
  = do
      v <- evalST e
      st <- get
      put (update st x v)
      return v
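The monadic evaluator can be run as written once a state monad is in scope. In this sketch the State type, get and put are defined locally, as an assumption, so that nothing beyond base is needed; a library state monad would serve equally well.

```haskell
import Prelude hiding (lookup)

-- A minimal state monad, defined locally for the sketch.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad (State s) where
  State g >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

type Var = String
type Store = [(Var, Integer)]
data Expr = Lit Integer | Vbl Var | Add Expr Expr | Assign Var Expr

lookup :: Store -> Var -> Integer
lookup st x = head [ i | (y, i) <- st, y == x ]

update :: Store -> Var -> Integer -> Store
update st x n = (x, n) : st

-- The store is threaded by the monad rather than passed explicitly.
evalST :: Expr -> State Store Integer
evalST (Lit n) = return n
evalST (Vbl x) = do
  st <- get
  return (lookup st x)
evalST (Add e1 e2) = do
  v1 <- evalST e1
  v2 <- evalST e2
  return (v1 + v2)
evalST (Assign x e) = do
  v <- evalST e
  st <- get
  put (update st x v)
  return v
```

For example, runState (evalST (Add (Assign "x" (Lit 3)) (Vbl "x"))) [] evaluates the assignment first, so the later use of x sees the updated store.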
Classes and instances
type Store = [(Var, Int)]
empty :: Store
empty = []
get :: Var -> Store -> Int
get v st = head [ i | (var,i) <- st, var==v]
set :: Var -> Int -> Store -> Store
set v i = ((v,i):)
Classes and instances
type Store = [(Var, Int)]
empty :: Store
get :: Var -> Store -> Int
set :: Var -> Int -> Store -> Store
empty = []
get v st = head [ i | (var,i) <- st, var==v]
set v i = ((v,i):)
Classes and instances
class Store a where
  empty :: a
  get :: Var -> a -> Int
  set :: Var -> Int -> a -> a
instance Store [(Var, Int)] where
  empty = []
  get v st = head [ i | (var,i) <- st, var==v]
  set v i = ((v,i):)
Need newtype wrapper in Haskell 98 …
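A sketch of the newtype wrapper the slide mentions. In Haskell 98 an instance head must be a type constructor applied to distinct type variables, so a list of pairs cannot be made an instance directly. The name ListStore is an assumption for illustration.

```haskell
type Var = String

class Store a where
  empty :: a
  get   :: Var -> a -> Int
  set   :: Var -> Int -> a -> a

-- The wrapper gives the list-of-pairs representation a name that can
-- appear in an instance head.
newtype ListStore = ListStore [(Var, Int)]

instance Store ListStore where
  empty = ListStore []
  get v (ListStore st) = head [ i | (var, i) <- st, var == v ]
  set v i (ListStore st) = ListStore ((v, i) : st)
```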
Understanding a program
Take a working semantic tableau system written
by an anonymous 2nd year student …
… refactor to understand its behaviour.
Nine stages of unequal size.
Reflections afterwards.
An example tableau
((AC)((AB)C))
((AB)C)
(AC)
(AB)
C
A
A
AFP04




C
B
Make B True
Make A and C False
63
v1: Name types
Built-in types
  [Prop]
  [[Prop]]
used for branches and tableaux respectively.
Modify by adding
  type Branch = [Prop]
  type Tableau = [Branch]
Change required throughout the program.
Simple edit: but be aware of the order of substitutions: avoid
  type Branch = Branch
v2: Rename functions
Existing names
  tableaux
  removeBranch
  remove
become
  tableauMain
  removeDuplicateBranches
  removeBranchDuplicates
and add comments clarifying the (intended) behaviour.
Add test datum.
Discovered some edits undone in stage 1.
Use of the type checker to catch errors.
test will be useful later?
v3: Literate → normal script
Change from literate form:
  Comment …
  > tableauMain tab
  >   = ...
to
  -- Comment …
  tableauMain tab
    = ...
Editing easier: implicit assumption was that it was a normal script.
Could make the switch completely automatic?
v4: Modify function definitions
From explicit recursion:
  displayBranch
    :: [Prop] -> String
  displayBranch [] = []
  displayBranch (x:xs)
    = (show x) ++ "\n" ++
      displayBranch xs
to
  displayBranch
    :: Branch -> String
  displayBranch
    = concat . map (++"\n") . map show
Abstraction: move from explicit list representation to operations such as map and concat which could be over any collection type.
First time round added incorrect (but type correct) redefinition … only spotted at next stage.
Version control: un/redo etc.
v5: Algorithms and types (1)
removeBranchDup :: Branch -> Branch
removeBranchDup [] = []
removeBranchDup (x:xs)
  | x == findProp x xs = [] ++ removeBranchDup xs
  | otherwise
    = [x] ++ removeBranchDup xs
findProp :: Prop -> Branch -> Prop
findProp z [] = FALSE
findProp z (x:xs)
  | z == x = x
  | otherwise = findProp z xs
v5: Algorithms and types (2)
removeBranchDup :: Branch -> Branch
removeBranchDup [] = []
removeBranchDup (x:xs)
  | findProp x xs
    = [] ++ removeBranchDup xs
  | otherwise
    = [x] ++ removeBranchDup xs
findProp :: Prop -> Branch -> Bool
findProp z [] = False
findProp z (x:xs)
  | z == x = True
  | otherwise = findProp z xs
v5: Algorithms and types (3)
removeBranchDup :: Branch -> Branch
removeBranchDup = nub
findProp :: Prop -> Branch -> Bool
findProp = elem
v5: Algorithms and types (4)
removeBranchDup :: Branch -> Branch
removeBranchDup = nub
Fails the test! Two duplicate branches output, with different
ordering of elements.
The algorithm used is the 'other' nub algorithm, nubVar:
nub [1,2,0,2,1] = [1,2,0]
nubVar [1,2,0,2,1] = [0,2,1]
Code using lists in a particular order to represent sets.
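The difference between the two algorithms is which occurrence of a repeated element survives. A sketch of nubVar, reconstructed from the recursive removeBranchDup of the earlier stages (the definition below is that reconstruction, not the student's code): it keeps the last occurrence, whereas the library nub keeps the first.

```haskell
import Data.List (nub)

-- Drop an element whenever it occurs again later in the list, so the
-- last occurrence of each element is the one that is kept.
nubVar :: Eq a => [a] -> [a]
nubVar [] = []
nubVar (x:xs)
  | x `elem` xs = nubVar xs
  | otherwise   = x : nubVar xs
```

On the slide's example, nub [1,2,0,2,1] gives [1,2,0] and nubVar [1,2,0,2,1] gives [0,2,1]: the same elements as a set, in a different order as a list.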
v6: Library function to module
Add the definition:
  nubVar = …
to the module
  ListAux.hs
and replace the definition by
  import ListAux
Editing easier: implicit assumption was that it was a normal script.
Could make the switch completely automatic?
v7: Housekeeping
Renamings: including foo and bar and contra (becomes notContra).
An instance of filter, looseEmptyLists is defined using filter, and subsequently inlined.
Put auxiliary function into a where clause.
Generally cleans up the script for the next onslaught.
v8: Algorithm (1)
splitNotNot :: Branch -> Tableau
splitNotNot ps = combine (removeNotNot ps) (solveNotNot ps)
removeNotNot :: Branch -> Branch
removeNotNot [] = []
removeNotNot ((NOT (NOT _)):ps) = ps
removeNotNot (p:ps) = p : removeNotNot ps
solveNotNot :: Branch -> Tableau
solveNotNot [] = [[]]
solveNotNot ((NOT (NOT p)):_) = [[p]]
solveNotNot (_:ps) = solveNotNot ps
v8: Algorithm (2)
splitXXX removeXXX solveXXX
for each of nine rules.
The algorithm applies rules in a prescribed order, using an
integer value to pass information between functions.
Aim: generic versions of split, remove, solve.
Change order of rule application … effect on duplicates.
Add map sort to top level pipeline before duplicate removal.
v9: Replace lists by sets.
Wholesale replacement of lists by a Set library.
map → mapSet
foldr → foldSet (careful!)
filter → filterSet
The library exposes the representation: pick, flatten.
Use with discretion … further refactoring possible.
Library needed to be augmented with
primRecSet :: (a -> Set a -> b -> b) -> b -> Set a -> b
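A sketch of what primRecSet could look like over a list-backed set. The representation Set [a] is an assumption, since the slides do not show the library's internals; sizeSet is a hypothetical use. Unlike a fold, the step function also receives the remainder of the set.

```haskell
-- Hypothetical list-backed set; the real library's representation may differ.
newtype Set a = Set [a]

-- Primitive recursion: the step function sees the chosen element, the
-- rest of the set, and the result already computed for the rest.
primRecSet :: (a -> Set a -> b -> b) -> b -> Set a -> b
primRecSet _ z (Set [])     = z
primRecSet f z (Set (x:xs)) = f x (Set xs) (primRecSet f z (Set xs))

-- Example use: the number of elements, ignoring element and remainder.
sizeSet :: Set a -> Int
sizeSet = primRecSet (\_ _ n -> n + 1) 0
```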
v9: Replace lists by sets (2)
Drastic simplification: no explicit worries about
… ordering (and equality), (removal of) duplicates.
Hard to test intermediate stages: type change is all or
nothing …
… work with dummy definitions and the type checker.
Further opportunities: why choose one rule from a set
when could apply to all elements at once? Gets away from
picking on one value (and breaking the set interface).
Conclusions of the case study
Heterogeneous process: some small, some large.
Are all these stages strictly refactorings: some
semantic changes always necessary too?
Importance of type checking for hand refactoring
… and testing when there are any semantic changes.
Undo, redo, reordering the refactorings … CVS.
In this case, directional … not always the case.
Teaching and learning design
Exciting prospect of using a refactoring tool as an
integral part of an elementary programming
course.
Learning a language: learn how you could modify
the programs that you have written …
… appreciate the design space, and
… the features of the language.
Conclusions
Refactoring + functional programming: good fit.
Real benefit from using available libraries
… with work.
Want to use the tool in building itself.
Much more to do than we have time for.