Transcript Document

2008 Working Conference on Reverse Engineering
Grokking Software Architecture
Richard C. Holt
Software Architecture Group (SWAG)
School of Computer Science, University of Waterloo, Canada
1
Retrospective
1998: Ten years ago. WCRE most influential paper.
“Structural Manipulations of Software Architecture using Tarski Relational Algebra”
2008: Today. Retrospective.
“Grokking Software Architecture”
17 papers in WCRE
2
Grokking Software Architecture
Grokking
Software architecture
3
Overview of Talk: 4 Parts
• Part 1. 1998 paper: Hopes & claims
• Part 2. Software Architecture
• Part 3. Formalizing Boxology
• Part 4. ROP: Relation-Oriented Programming & Grok-Like Languages
4
Part 1. 1998 paper:
Hopes & claims
• Represent software architecture as a typed graph
– Graphs with “colors” of edges & nodes
• Manipulate & visualize these architectural graphs
• Manipulations can be specified algebraically -- and automatically executed
In brief: Formalize architectural diagrams and reap the benefits
arising from the corresponding mathematics.
5
Top View of As-Built Software
Architecture (250KLOC System)
6
View of One Subsystem of
the 250 KLOC System
[Figure: box-and-arrow view of one subsystem. Labels visible in the diagram: Optimiz, DS.ss, ds, dsinit, dslvbb, dslvrg, dselim, mrgs, mdlv, lvlist, include, PL_, memuse, GEN, VN, dbg, SUPPORT, FLOW]
7
CS 746G Topics in Software
Architecture
University of Waterloo
1) CS746 in Winter 1998 Linux (Operating System)
2) CS746 in Winter 1999 Apache (Web Server)
3) CS746 in Winter 2000 Mozilla (Web Browser)
4) CS746 in Winter 2001 Eazel Nautilus (File Manager)
5) CS798 in Winter 2002 Postgres et al (Data Base)
6) CS746 in Winter 2003 EMACS et al (Editor)
7) CS746 in Winter 2004 Gnumeric (Spreadsheet)
8) CS746 in Fall 2004 Mozilla (Web Browser -- again)
9) CS746 in Fall 2005 Open Office (Open Source Office Suite)
10) CS746 in Fall 2006 Asterisk (Open Phone Switch)
11) CS746 in Fall 2008 MySQL
8
Process of View Creation
Source code
Parser
Facts extracted from code
Clustering
Hierarchic decomposition
Grok: Fact manipulator
Layouter
Browser
Architectural diagram
9
Transformations to do Hiding
[Figure: Graph G, with boxes T, S and V and nodes a to h, shown beside two derived graphs]
Graph H = hide(hide(G,T),V)
Graph I = hideExt(G, S)
10
Lifting Calls Up to File Level
call is a procedure call
fileCall is a file level call
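[Figure: file main.c holds the procedure body main (funcDef); file start.h holds the procedure header startup (funcDcl); main calls startup (call); the lifted fileCall edge runs from main.c to start.h]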
fileCall := funcDef o call o inv funcDcl
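The lifting formula can be tried out with ordinary sets of pairs. The following is a minimal Python sketch, not Grok itself: the helper names `inv` and `compose` and the three one-edge relations are assumptions read off the figure labels (main.c, main, start.h, startup).

```python
# Binary relations modeled as sets of (source, target) pairs.

def inv(rel):
    """Inverse: swap source and target of every edge."""
    return {(b, a) for (a, b) in rel}

def compose(r, s):
    """Composition r o s: (a, c) whenever a r b and b s c for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

# Facts assumed from the figure.
func_def = {("main.c", "main")}      # file contains a procedure body
func_dcl = {("start.h", "startup")}  # file contains a procedure header
call = {("main", "startup")}         # procedure-level call

# fileCall := funcDef o call o inv funcDcl
file_call = compose(compose(func_def, call), inv(func_dcl))
print(file_call)  # {('main.c', 'start.h')}
```

Reading the composition left to right: go from a file to a procedure it defines, follow a call, then go back from the called procedure to the file that declares it.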
11
Part 2. Software Architecture:
Boxology Approach
• Software architecture:
– What is it?
– State of practice
– How is it represented
– Keep It simple
– Models & tools
– Views of architecture
• Extracting As-Built architecture
12
Software Architecture:
What is it?
• Confusion. I have a sneaking suspicion
that ‘architecture’ is one of the most
overused and least understood terms in
professional software development circles.
Gorton
• Consensus. Architecture captures system
structure in terms of components [parts]
and how they interact. Gorton
13
Software Architecture:
State of the Practice
• “It’s common for there to be little or no
documentation covering the architecture in
many projects.” Gorton
• “I'm hopeless when it comes to
documentation.” Torvalds
• “The architecture that actually predominates
in practice is the ‘big ball of mud’ ” Foote et al
14
Software as Spaghetti
Foote et al
15
Software Architecture:
How is it Represented in Practice?
• …predominant tools used for architecture
documentation are Microsoft Word, Visio
and Power Point Gorton
• What’s needed: Concepts, notations and
tools that are
– easy to use and
– help us produce useful, understandable
documentation
16
KISS: Keep it Simple Stupid
“Any fool can make things bigger, more complex, and
more violent. It takes a touch of genius - and a lot of
courage - to move in the opposite direction.” Einstein
“Make everything as simple as possible, but not simpler.”
Einstein
17
Models and Tools for Software
Architecture
• “UML has, for better or (many would say)
worse, become the industry standard ADL
[Architecture Description Language]” Shaw
• UML “lacks, however, a robust suite of
tools for analysis, consistency checking”
Shaw
18
UML Component Diagram:
Box and Arrow Diagram
[Figure: UML component view. Labels visible in the diagram: OrderProcessing, «table», NewOrders, MailQueue, OrderQueue, CustomerSystem, OrderSystem, SendEmail, MailServer; operations read, validate, readQ, writeQ, send]
Gorton
19
Views of Software Architecture
Kruchten
• Users’ View: End user
• As-Built View: Programmers & software managers
• Concurrency View: Integrator
• Deployment View: System Engineer
• Scenarios
20
Extracting the As-Built
Architecture from the Code
• “Reverse engineering is the process of
analyzing a subject system to create
representations of the system at a higher
level of abstraction.” Chikofsky
• Relational approach.
– Parse the code to produce relations, e.g.,
• (call, P, Q) means proc P calls Q
– Manipulate edges into as-built architecture
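As a rough illustration of the first step, extracted facts can be grouped into one edge set per relation name. This Python sketch assumes a whitespace-separated `relation source target` line layout and a hypothetical helper `load_facts`; a real extractor's output format may differ.

```python
from collections import defaultdict

def load_facts(lines):
    """Group 'relation source target' triples into one edge set per relation."""
    relations = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, src, dst = parts
        relations[name].add((src, dst))
    return relations

# Hypothetical extractor output: one fact per line.
facts = """
call P Q
call Q R
ref  P x
""".splitlines()

rel = load_facts(facts)
print(sorted(rel["call"]))  # [('P', 'Q'), ('Q', 'R')]
```

Once the facts are held as sets of pairs, the later manipulation step is just set algebra over `rel["call"]`, `rel["ref"]` and so on.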
21
Boxology as a Central ADL
(Architecture Description Language)
• “The most widely used design notation [for
software architecture] is informal ‘block
and arrow’ diagrams.” Gorton
22
Cross Fertilization!! Rev Eng,
S/W Arch, Relational Approach
• Reverse engineering
– Architecture extraction
– As-Built view: Code is king
– Traceability
• Software architecture
– Need for representation & tools
– Simplicity & utility
• Relational approach
– Boxology
– Formalization --- Tarski algebra
23
Part 3. Formalizing Boxology
• Boxology is the “Representation of an
organized structure as a graph of labeled
nodes (‘boxes’) and connections between
them (as lines or arrows).” Wikipedia
• “Toward boxology: preliminary
classification of architectural styles”
Shaw
24
Example Typed Graph
[Figure: typed graph over nodes r, a, b, v, w, x, y, z with edge types C, I, E and U]
C = { (r,a), (r,b), (a,v), (a,w), (a,x), (b,y), (b,z) }
I = { (a,b) }
E = { (b,y) }
U = { (v,w), (x,y) }
25
Boxology is Just Scribbling?
• Box & arrow diagrams
– Are just scribbles? No
– Formalized by typed graphs
– Visualized as (nested) boxes & arrows
– Manipulated by Tarski algebra etc.
– Exchanged as
• Triples (RSF), extended to TA, or GXL or …
26
Boxology has Semantics? Yes
• Compare to BNF
– Semantics by informal attachment to productions
• Compare to Codd’s relational approach
– Semantics by interpretation of tables.
• Semantics by attributes & descriptions
– Separation of concerns
– Structure then semantics
• Use box/arrow diagrams as underlying formalism
for software architecture (Mini-MOF?)
27
Adding Algebra to Boxology
• Tables then Codd relational algebra
– N-ary relations
• Boxes/arrows then Tarski relational algebra
– Binary relations
28
Example Typed Graph
[Figure: typed graph over nodes r, a, b, v, w, x, y, z with edge types C, I, E and U]
C = { (r,a), (r,b), (a,v), (a,w), (a,x), (b,y), (b,z) }
I = { (a,b) }
E = { (b,y) }
U = { (v,w), (x,y) }
29
Tarski Algebraic Operators
Union            I + E = {(a,b), (b,y)}
Intersection     E ^ C = {(b,y)}
Difference       C - E = {(r,a), (r,b), (a,v), (a,w), (a,x), (b,z)}
Inverse          inv E = {(y,b)}
Composition      I o E = {(a,y)}
Identity         id = {(r,r), (a,a), (b,b), (w,w) … }
Transitive Cl.   C+ = {(r,a), (r,b), (r,v), (r,w), (r,x), (r,y), (r,z), (a,v), (a,w), (a,x), (b,y), (b,z)}
Reflex. T.C.     C* = ID + C+
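The operators can be checked against the example typed graph with plain Python sets. This is only a sketch: the function names (`inv`, `compose`, `closure`, `identity`) are illustrative choices, not Grok syntax, and the closure is computed by naive repeated composition rather than by whatever Grok does internally.

```python
# Edge sets of the example typed graph.
C = {("r", "a"), ("r", "b"), ("a", "v"), ("a", "w"),
     ("a", "x"), ("b", "y"), ("b", "z")}
I = {("a", "b")}
E = {("b", "y")}
U = {("v", "w"), ("x", "y")}

def inv(rel):
    """Inverse: reverse every edge."""
    return {(b, a) for (a, b) in rel}

def compose(r, s):
    """Composition r o s: follow an r edge, then an s edge."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def identity(nodes):
    """Identity relation over a set of nodes."""
    return {(n, n) for n in nodes}

def closure(rel):
    """Transitive closure rel+: extend paths by one edge until nothing new appears."""
    result = set(rel)
    while True:
        new = compose(result, rel) - result
        if not new:
            return result
        result |= new

nodes = {n for rel in (C, I, E, U) for edge in rel for n in edge}
ID = identity(nodes)

print(sorted(I | E))       # union
print(sorted(E & C))       # intersection
print(sorted(C - E))       # difference
print(sorted(inv(E)))      # inverse
print(sorted(compose(I, E)))  # composition
print(len(closure(C)))     # size of C+
print(len(ID | closure(C)))   # size of C*
```

Union, intersection and difference map directly onto Python's `|`, `&` and `-`, which is why only inverse, composition, identity and closure need helper functions.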
30
TA Schemas for Box and Arrow
Diagrams
• A Schema in TA
– Determines
• Types of boxes
• Types of edges
• Allowed connectivity between edges
• Supports inheritance in schemas
– Also attributes (strings) on boxes & on edges
[Figure: a schema with box types proc and var and edge types call and ref, above an instance graph with nodes p, q, x, y]
Malton WCRE 2005
31
Why Formalize Boxology??
Cause it Makes Our Life Better
• Clear understanding & clear specification
– What does RSF mean?
– Meaning is independent of implementation
– Clarifies deeper concepts, e.g., expressiveness
• Generality
• Progress in reverse engineering
• Progress in software architecture
• Not just scribbling
32
Part 4. ROP: Relation-Oriented
Programming &
Grok-Like Languages
• A paradigm shift
33
Example: Mickey Eats Swiss Cheese
The “eat” relation
[Figure: graph of the “eat” relation over Garfield, Fluffy, Mickey, Nancy, Swiss, Roquefort]
• Mickey . eat
– Swiss
– Roquefort
• eat . Mickey
– Garfield
– Fluffy
• eat o eat
– (Garfield, Swiss)
– (Garfield, Roquefort)
– (Fluffy, Swiss)
– (Fluffy, Roquefort)
• eat+
– …
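The dot operators on this slide project a relation onto one node. The Python sketch below uses an assumed edge set for `eat`, reconstructed only from the results listed on the slide (Nancy's edges are not recoverable, so she is left out), and hypothetical helper names `targets_of` and `sources_of`.

```python
# Assumed "eat" edges, inferred from the listed results: (eater, eaten).
eat = {("Garfield", "Mickey"), ("Fluffy", "Mickey"),
       ("Mickey", "Swiss"), ("Mickey", "Roquefort")}

def targets_of(x, rel):
    """x . rel: everything x is related to (what x eats)."""
    return {b for (a, b) in rel if a == x}

def sources_of(rel, x):
    """rel . x: everything related to x (who eats x)."""
    return {a for (a, b) in rel if b == x}

def compose(r, s):
    """r o s: follow an r edge, then an s edge."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

print(sorted(targets_of("Mickey", eat)))   # ['Roquefort', 'Swiss']
print(sorted(sources_of(eat, "Mickey")))   # ['Fluffy', 'Garfield']
print(sorted(compose(eat, eat)))           # who eats something that eats what
```

Under these assumed edges, `eat o eat` pairs each cat with each cheese, matching the four pairs on the slide.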
34
Example ROP/Grok Program:
Is relation R a tree?
How you would program this test …
35
Grok Program: Is R a Tree?
Pseudo code
if R has no loops &
R has one root &
R has only single parents
then
put “R is a tree”
Assume each node is a source or target of R (here, the contain relation C)
36
Grok Program: Is R a Tree?
Pseudo code
if R has no loops
Grok code
if # ( R+ ^ ID ) = 0
Does transitive closure of R have any self-loops? Yes
[Figure: nodes a, b, c, d joined by R edges]
37
Grok Program: Is R a Tree?
Pseudo code
if R has no loops &
R has one root
Grok code
if # ( R+ ^ ID ) = 0 &
# (dom R - rng R) = 1
Does R have exactly one source? Yes
[Figure: nodes a to g, with the dom and rng sets of R marked]
38
Grok Program: Is R a Tree?
Pseudo code
if R has no loops &
R has one root &
R has only single parents
Grok code
if # ( R+ ^ ID ) = 0 &
# (dom R - rng R) = 1 &
# ((R o inv R) - ID) = 0
Does my child have another parent? Yes
[Figure: R followed by inv R links two nodes that share a child, giving an R o inv R edge; nodes a, b, c, d]
39
Grok Program: Is R a Tree?
Pseudo code
if R has no loops &
R has one root &
R has only single parents
then
put “R is a tree”
Grok code
if # ( R+ ^ ID ) = 0 &
# (dom R - rng R) = 1 &
# ((R o inv R) - ID) = 0
then
put “R is a tree”
Moral: Relational programming is not like low level
(Java level) programming. Loops typically disappear.
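For comparison, the same three-part test can be written in Python over sets of pairs. This is a sketch under assumptions: the helpers are illustrative, the single-parent condition is taken to mean that no two distinct nodes share a child (so that count is compared with 0), and the node set is taken to be exactly the sources and targets of R.

```python
def inv(rel):
    """Reverse every edge."""
    return {(b, a) for (a, b) in rel}

def compose(r, s):
    """r o s: follow an r edge, then an s edge."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def closure(rel):
    """Transitive closure by repeated composition."""
    result = set(rel)
    while True:
        new = compose(result, rel) - result
        if not new:
            return result
        result |= new

def is_tree(R):
    """True when R has no loops, exactly one root, and only single parents."""
    dom = {a for (a, _) in R}
    rng = {b for (_, b) in R}
    ID = {(n, n) for n in dom | rng}
    no_loops = len(closure(R) & ID) == 0                 # # (R+ ^ ID) = 0
    one_root = len(dom - rng) == 1                       # # (dom R - rng R) = 1
    single_parents = len(compose(R, inv(R)) - ID) == 0   # no two parents share a child
    return no_loops and one_root and single_parents

print(is_tree({("a", "b"), ("a", "c"), ("b", "d")}))               # True
print(is_tree({("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}))   # False: d has two parents
print(is_tree({("a", "b"), ("b", "a")}))                           # False: loop
```

The explicit loop appears only inside the closure helper; the test itself is three set expressions, which is the point the slide's moral makes.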
40
Notation:
Does it Matter?
By relieving the brain of all unnecessary work,
a good notation sets it free to concentrate on
more advanced problems, and, in effect,
increases the mental power of the race.
Alfred North Whitehead
41
Wins & Losses Using Tarski
Algebra
• Wins
– Good for computing new edges, for finding
properties of edges, eg, nodes in loops, leaves,
etc.
• Losses
– Not good for locating patterns involving several
nodes, e.g., find complete connected sub-graphs
42
Notation:
Grok (Tarski) vs. Crocopat
My parent’s (P) children (C) are my (reflexive) siblings (S)
[Figure: nodes x, y, z joined by P, C and S edges]
Grok: S := P o C
Crocopat: S(x,z) := EX(y, P(x,y) & C(y,z))
Should Crocopat add Tarski operators??
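The two notations can be mimicked in Python to see that they describe the same relation. This sketch uses assumed sample data (x and z both have parent y) and is not CrocoPat or Grok syntax: the first form composes relations as wholes, the second states a predicate with an explicit existential over y.

```python
# Assumed sample data: x and z both have parent y.
P = {("x", "y"), ("z", "y")}   # child -> parent
C = {("y", "x"), ("y", "z")}   # parent -> child
nodes = {"x", "y", "z"}

def compose(r, s):
    """r o s: follow an r edge, then an s edge."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

# Algebra style (Grok-like): S := P o C
S_algebra = compose(P, C)

# Calculus style (CrocoPat-like): S(x,z) := EX(y, P(x,y) & C(y,z))
S_calculus = {(x, z) for x in nodes for z in nodes
              if any((x, y) in P and (y, z) in C for y in nodes)}

print(S_algebra == S_calculus)  # True
print(sorted(S_algebra))
```

Both forms yield pairs such as (x, x), which is why the slide calls the siblings reflexive.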
43
Characterizing Grok-Like
Languages
• Relational
• Useful for software analysis
• Expressiveness
– How powerful can a query be?
• Codd algebra and Crocopat are more powerful.
– How well can a query meet our needs? How writeable?
How readable?
• Performance of implementation
– Can hold large graphs?
– Fast enough to manipulate large graphs?
44
Performance of Grok-Like
Languages
• Size & speed: OK for Grok & Crocopat
– All memory resident, no disk access
– Hundreds of thousands of edges
– Modeling million-line systems
– Most operations not more than a few seconds
– Crocopat scales up a bit more for transitive closure
– House keeping, e.g., time to read files, is critical
– Need to test on 64-bit implementations
45
Data Structures for Binary
Relations
• Tables: One for each type of relation (DBMS)
• Single table of triples (Grok)
• Linked lists (Lsedit, JGrok, which caches sorted lists)
– Pointers and nodes
• BDD: Binary Decision Diagram (Relview, Crocopat)
– Memory efficient storage of binary relations
– Works well with dense graphs
– Proven useful: RelView, Crocopat
– Surprising (to me): BDD efficient for transitive closure
46
Grok-Like Languages
Language    Author                Date
Prolog      Colmerauer et al.     1972
SQL         Chamberlin & Boyce    1974
GraphLog    Consens et al.        1989
Relview     Berghammer et al.     1993
Grok        Holt                  1996
RPA         Feijs et al.          1998
GReQL       Kullbach & Winter     1999
JGrok       Wu                    2001
CrocoPat    Beyer                 2003
Discussion of Grok-Like Languages
PS: Paul Klint’s relational language ...
47
Progress:
Using Grok-Like Languages
1. Enforce architecture rules. Holt 96, Feijs 98, Knodel 08
2. Lift dependency edges. Holt 98, Feijs 98
3. Find design pattern instances. Consens 98, Beyer 02
4. Find violations of patterns. Guo 99
5. Find anti-patterns. vanEmden 02, Feijs 98
6. Change impact analysis. Feijs 98
7. Specify extraction from syntax. Lin 08
8. Find source of dependency. Fahmy 01, Feijs 98
9. Locate uses of protocols. Wu 01
10. Type inference using transitive closure. vanDeursen 99
48
Conclusions
Grokking Software Architecture
49
Conclusions
• Typed graphs nicely formalize various software structures
• Software architecture can benefit from a ROP approach
• Tarski algebra, added to boxology, is elegant
– Does not handle multi-node patterns
• Grok-like (ROP) languages are elegant and sufficiently
efficient
– ROP is high level, is faster, more reliable, more flexible
• Lots of
– Work done so far
– Room for more work
50