How To Address Rapidly Changing Data Representations in an Evolving Scientific Domain Using Aspect-oriented Programming Techniques + Overview of Bioinformatics at NEU.
Karl Lieberherr
College of Computer and Information Science
Northeastern University
Boston
3/7/2003
Bioinformatics
Motivation
• From: "Computational Challenges in Structural and Functional Genomics" by J. Head-Gordon, IBM Systems Journal, Vol. 40, No. 2, 2001.
Some Quotes From Head-Gordon.
• Although techniques for warehousing are as vital in the sciences as in business, functional warehouses tailored for specific scientific needs are few and far between.
• A key technical reason for this discrepancy is that our understanding of the concepts being explored in an evolving scientific domain change constantly, leading to rapid changes in data representation.
Some Quotes From Head-Gordon (Refinement).
• … evolving scientific domain change constantly, leading to rapid changes in data representation.
• Not only data representations change but also interfaces: we need protection against changes in interfaces as well.
• Examples: additional or modified fields or arguments; additional or modified types.
More Quotes From Head-Gordon.
• When the format of source data changes, the warehouse must be updated to read that source or it will not function properly. The bulk of these modifications involve extremely tedious, low-level translation and integration tasks that typically require the full attention of both database and domain experts. Given the lack of the ability to automate this work, warehouse maintenance costs are prohibitive, and warehouse “up-times” severely restricted.
Protect Against Changes.
• Protection against changes in data representation and interfaces. Traditional technique: information hiding is good at protecting against changes in data representation, but it does not help with changes to interfaces.
• We need more than information hiding to protect against interface changes: restriction through shy programming, called Adaptive Programming (AP).
[Figure: Implementation, Interface and Client; Information Hiding separates the Implementation from the Interface, Shy Programming separates the Interface from the Client.]
Problem with Information Hiding
• Shy Programming builds on the observation that traditional black-box composition is not restrictive enough. We use the slogan: information hiding is not hiding enough. Black-box composition isolates the implementation from the interface, but does not decouple the interface from its clients.
Cover unimportant parts of the interface
• To permit interfaces to evolve, self-discipline is required to avoid programming extensively against the interface. Certain parts of the interface are best left as if they were covered.
[Figure: Implementation, Interface and Client; Information Hiding separates the Implementation from the Interface, Shy Programming separates the Interface from the Client.]
Shy Programming = Adaptive Programming
• This disciplined programming is referred to as shy programming. Shy programming lets the program recover from (or adapt to) interface changes. Shy programming is also called Adaptive Programming (AP). This is similar to the shyness metaphor in the Law of Demeter (LoD): structure evolves over time, so an object should communicate with just a subset of the visible objects.
Decoupling of Interface
• We summarize the commonalities and differences between black-box composition and Shy Programming in two principles.
– Black-box Principle: the representation of objects can be changed without affecting clients.
– Shy-Programming Principle: the interface of objects can be changed within certain parameters without affecting clients.
• Note that the Shy-Programming Principle builds on top of the Black-box Principle.
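A minimal Java sketch of the Black-box Principle, assuming a hypothetical `Sequence` interface with two interchangeable representations (`StringSequence` and `PackedSequence`) and a hypothetical client method `gcContent`; none of these names come from the talk. The client is written against the interface only, so swapping the private representation does not affect it. The Shy-Programming Principle would go further and also tolerate changes to the interface itself, which this sketch does not attempt.

```java
// Hypothetical interface: the only thing clients may depend on.
interface Sequence {
    int length();
    char baseAt(int i);
}

// Representation 1: a plain String.
class StringSequence implements Sequence {
    private final String bases;
    StringSequence(String bases) { this.bases = bases; }
    public int length() { return bases.length(); }
    public char baseAt(int i) { return bases.charAt(i); }
}

// Representation 2: two bits per base, four bases packed into each byte.
class PackedSequence implements Sequence {
    private static final String CODES = "ACGT";
    private final byte[] packed;
    private final int length;

    PackedSequence(String bases) {
        length = bases.length();
        packed = new byte[(length + 3) / 4];
        for (int i = 0; i < length; i++) {
            int code = CODES.indexOf(bases.charAt(i));
            if (code < 0) throw new IllegalArgumentException("not A, C, G or T: " + bases.charAt(i));
            packed[i / 4] |= (byte) (code << (2 * (i % 4)));
        }
    }

    public int length() { return length; }

    public char baseAt(int i) {
        // Mask with 3 so sign extension of the byte does not matter.
        return CODES.charAt((packed[i / 4] >> (2 * (i % 4))) & 3);
    }
}

class Client {
    // Uses only the interface, so it works unchanged for either representation.
    static double gcContent(Sequence s) {
        int gc = 0;
        for (int i = 0; i < s.length(); i++) {
            char b = s.baseAt(i);
            if (b == 'G' || b == 'C') gc++;
        }
        return s.length() == 0 ? 0.0 : (double) gc / s.length();
    }
}
```

For example, `Client.gcContent(new StringSequence("GATTACA"))` and `Client.gcContent(new PackedSequence("GATTACA"))` both return 2/7.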
We want to learn about organizing bioinformatics knowledge.
Manager Metaphor.
[Figure: hierarchy with manager M at the top, group leaders G below, workers W at the bottom.]
• A manager M manages a set of group leaders G, each of whom manages a set of workers W. We consider issues related to informing M and requesting information from M. We use this example to illustrate three points.
– Micromanager – no information restriction.
– Shyness – helps information restriction.
– Complex requests – help information restriction and optimization.
Manager Metaphor.
• Micromanager – no information restriction.
– If the manager is a micromanager (a manager who wants to know about and rely on all the details of the workers' projects), the managing approach is brittle: whenever a detail of one of the workers' projects changes, the manager needs to be notified.
Manager Metaphor.
• Micromanager – no information restriction (continued).
– An object-oriented program written in the usual way corresponds to the manager who likes to micromanage: it is full of detailed knowledge of the class graph. An alternative way of formulating the same idea is to observe that it is good when the workers are shy. A shy worker shares only minimal, high-level information with the group leader. This prevents a brittle situation in which the group leaders and the manager rely on too much detail.
Manager Metaphor.
• Shyness – helps information restriction
– It is good for the workers to be shy and to talk only to their group leader, not to the manager directly. (Shyness has two facets: talk only to a few friends AND share minimal information with them. Here we use the first facet, while in the previous point we used the second.) The group leader abstracts the information from the workers and passes only the abstract information on to the manager. This prevents the manager from micromanaging. This variant can be viewed as an application of the Law of Demeter (LoD), which states that an object should talk only to closely related objects. The closely related object for a worker is the group leader, not the manager.
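The shyness variant of the manager metaphor can be sketched in Java as follows. The classes `Worker`, `GroupLeader` and `Manager`, and the "blocked" criterion, are hypothetical illustrations rather than code from the talk: each worker exposes only a one-bit summary, each group leader aggregates its own workers, and the manager talks only to group leaders, in the spirit of the Law of Demeter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes illustrating shy workers and an abstracting group leader.
class ShyOrganization {

    static class Worker {
        private final String project;   // project detail, never exposed upward
        private final int openTasks;

        Worker(String project, int openTasks) {
            this.project = project;
            this.openTasks = openTasks;
        }

        // Shy interface: a minimal, high-level summary only.
        boolean isBlocked() { return openTasks > 10; }
    }

    static class GroupLeader {
        private final List<Worker> workers = new ArrayList<>();

        void add(Worker w) { workers.add(w); }

        // Abstracts worker-level detail into a single number.
        int blockedCount() {
            int n = 0;
            for (Worker w : workers) if (w.isBlocked()) n++;
            return n;
        }
    }

    static class Manager {
        private final List<GroupLeader> leaders = new ArrayList<>();

        void add(GroupLeader g) { leaders.add(g); }

        // Talks only to group leaders, never to workers directly.
        int blockedWorkers() {
            int n = 0;
            for (GroupLeader g : leaders) n += g.blockedCount();
            return n;
        }
    }
}
```

If a worker's internal bookkeeping changes (for example, how open tasks are tracked), only `Worker` and possibly `GroupLeader` change; `Manager` is untouched.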
Manager Metaphor.
• Shyness – helps information restriction (continued).
– The motivation is that when things change at the worker level, the manager does not necessarily have to be informed. The group leader is informed and decides whether the information needs to be passed up.
[Figure: M is shielded from W by G.]
Manager Metaphor.
• Complex requests – help information restriction and optimization.
– The manager does not want to be bothered by many simple requests from the many workers. Instead, the manager prefers to get a complex request from time to time from a group leader. The complex request lets the manager see all the requests as a whole and optimize the overall result, which would not be possible if simple requests came one by one and had to be satisfied immediately, before the totality of all simple requests is seen.
Manager Metaphor.
• Complex requests – help information restriction and optimization (continued).
– The same point applies to programming: instead of sending an object many individual data access requests, it is better to send one complex request that can be treated as a whole and optimized accordingly.
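To make the complex-request idea concrete, here is a hedged Java sketch with a hypothetical `SequenceStore`. A stream of simple `get` calls costs one underlying fetch each, while a single `getAll` request sees the whole batch and can, for example, fetch each duplicate id only once. Deduplication is just one optimization chosen for illustration; a real store could also reorder or merge requests.

```java
import java.util.*;

// Hypothetical store contrasting many simple requests with one complex request.
class SequenceStore {
    private final Map<String, String> data = new HashMap<>();
    int lookups = 0; // counts underlying (possibly expensive) fetches

    void put(String id, String seq) { data.put(id, seq); }

    // Simple request: answered immediately, one fetch per call.
    String get(String id) {
        lookups++;
        return data.get(id);
    }

    // Complex request: the whole batch is visible at once, so
    // duplicate ids are fetched only once (order of first occurrence kept).
    Map<String, String> getAll(List<String> ids) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String id : new LinkedHashSet<>(ids)) {
            lookups++;
            result.put(id, data.get(id));
        }
        return result;
    }
}
```

With the request list `P1, P2, P1, P1`, four `get` calls perform four fetches, whereas one `getAll` call performs two.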
Aspect-oriented Programming (AOP).
• AOP is programming with aspects. An aspect is a complex request to modify the execution of a program. It may expose a large interface. It can be implemented efficiently by inserting code into the program at compile time. An aspect should be shy with respect to the program it modifies.
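AspectJ inserts aspect code at compile time. As a rough runtime analogue that needs only the JDK, the sketch below uses `java.lang.reflect.Proxy` to attach tracing "advice" around every call of a hypothetical `Aligner` interface without editing `SimpleAligner`. This is a swapped-in technique for illustration, not how AspectJ weaves code, and it only intercepts calls made through the interface.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Base program (hypothetical): knows nothing about tracing.
interface Aligner {
    int score(String a, String b);
}

class SimpleAligner implements Aligner {
    // Counts identical positions; a stand-in for a real alignment score.
    public int score(String a, String b) {
        int s = 0;
        for (int i = 0; i < Math.min(a.length(), b.length()); i++)
            if (a.charAt(i) == b.charAt(i)) s++;
        return s;
    }
}

// "Aspect": behavior added before and after every interface call.
// Runtime proxy, NOT compile-time weaving as in AspectJ.
class TracingAspect {
    static final List<String> trace = new ArrayList<>();

    static <T> T weave(Class<T> iface, T target) {
        InvocationHandler h = (proxy, method, args) -> {
            trace.add("before " + method.getName());
            Object result = method.invoke(target, args);
            trace.add("after " + method.getName() + " -> " + result);
            return result;
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, h));
    }
}
```

Usage: `Aligner a = TracingAspect.weave(Aligner.class, new SimpleAligner());` then `a.score("GATTACA", "GATCACA")` returns 6 and records a before and an after entry in `TracingAspect.trace`. The tracing code names no detail of `SimpleAligner`, which is the sense in which such an aspect is shy.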
AOSD: not every concern fits into a component: crosscutting
[Figure: matrix with rows CR1 to CR4 and columns CM1 to CM6, with x entries marking where each CR touches a CM.]
Goal: find new component structures that encapsulate “rich” concerns
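A Reusable Aspect.
import java.io.PrintStream;
import java.rmi.RemoteException;

public abstract aspect RemoteExceptionLogging {
    // Log destination; System.err is an assumption (the slide uses an undeclared "log")
    protected static PrintStream log = System.err;

    // Concrete sub-aspects decide which join points to monitor
    abstract pointcut logPoint();

    // Advice (cannot be abstract): runs when a logPoint() join point exits by throwing a RemoteException
    after() throwing (RemoteException e): logPoint() {
        log.println("Remote call failed in: " + thisJoinPoint.toString()
                    + "(" + e + ").");
    }
}

public aspect MyRMILogging extends RemoteExceptionLogging {
    // Reuses the logging advice by binding logPoint() to this application's RMI calls
    pointcut logPoint():
        call(* RegistryServer.*.*(..)) ||
        call(private * RMIMessageBrokerImpl.*.*(..));
}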
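Good Aspects Are Shy.
abstract aspect CapabilityChecking {
    // Calls to Service.doService(String) made from a Caller object c
    pointcut invocations(Caller c):
        this(c) && call(void Service.doService(String));

    // Calls to Worker.doTask(Task) on a worker w
    pointcut workPoints(Worker w):
        target(w) && call(void Worker.doTask(Task));

    // Worker calls anywhere inside the control flow of a caller's invocation
    pointcut perCallerWork(Caller c, Worker w):
        cflow(invocations(c)) && workPoints(w);

    // Check the caller's capabilities before each such worker call
    before (Caller c, Worker w): perCallerWork(c, w) {
        w.checkCapabilities(c);
    }
}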
Lessons From Manager Metaphor.
• Information hiding does not hide enough. Information hiding makes all public interfaces available, whereas the Micromanager point shows that only an abstraction of those interfaces should be visible at higher levels.
Lessons From Manager Metaphor (Continued).
• In Shy Programming, only high-level information about the class or call graph is visible at the (shy) programming level. This shields the program from many changes to the class or call graph, in the same way as the manager is shielded from many of the changes in the workers' projects. The role of the group leader is played by the glue code that maps high-level information to low-level information and vice versa. Shy Programming is graph-shy.
Application to Bioinformatics Knowledge
• We need shy programming and shy knowledge representation techniques for bioinformatics.
• We need domain-specific languages to define function in a structure-shy way.
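Another Good Example of AOP.
Task: find all persons waiting at any bus stop on a bus route.
[Figure: class graph. BusRoute has busStops: BusStopList and buses: BusList; BusStopList contains 0..* BusStop; BusList contains 0..* Bus; BusStop has waiting: PersonList; Bus has passengers: PersonList; PersonList contains 0..* Person.]
OO solution: one method for each red class (the classes highlighted in red in the original figure).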
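A sketch of the hand-written OO solution in Java, assuming the class graph on this slide and assuming that the classes needing a method are those on the path from BusRoute through BusStop to Person (the red highlighting is lost). The field and collection layout is a hypothetical rendering of that graph. Every class on the path carries its own `countWaitingPersons` or `count` method, so the code spells out the exact class-graph structure, which is what makes it brittle.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written traversal: one method per class on the path (micromanager style).
class Person { }

class PersonList {
    List<Person> persons = new ArrayList<>();
    int count() { return persons.size(); }
}

class BusStop {
    PersonList waiting = new PersonList();
    int countWaitingPersons() { return waiting.count(); }
}

class BusStopList {
    List<BusStop> stops = new ArrayList<>();
    int countWaitingPersons() {
        int n = 0;
        for (BusStop s : stops) n += s.countWaitingPersons();
        return n;
    }
}

class Bus {
    // Not visited: passengers are on the bus, not waiting at a stop.
    PersonList passengers = new PersonList();
}

class BusList {
    List<Bus> buses = new ArrayList<>();
}

class BusRoute {
    BusStopList busStops = new BusStopList();
    BusList buses = new BusList();
    int countWaitingPersons() { return busStops.countWaitingPersons(); }
}
```

If a class such as `Village` is later inserted between `BusRoute` and `BusStopList`, a new method must be written and `BusRoute.countWaitingPersons` must be edited.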
Traversal Strategy.
Task: find all persons waiting at any bus stop on a bus route.
Strategy (a complex request): from BusRoute through BusStop to Person
[Figure: the BusRoute class graph (BusRoute, BusStopList, BusStop, BusList, Bus, PersonList, Person) annotated with the strategy.]
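Robustness of Strategy.
Task: find all persons waiting at any bus stop on a bus route.
Strategy: from BusRoute through BusStop to Person
The complex request is class-graph shy.
[Figure: modified class graph in which BusRoute reaches its bus stops through villages: VillageList, which contains 0..* Village, each with busStops: BusStopList containing 0..* BusStop. The buses: BusList branch (0..* Bus with passengers: PersonList) and the waiting: PersonList of each BusStop (0..* Person) are unchanged.]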
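The robustness claim can be checked at the class-graph level with a small Java sketch. `StrategyGraph` is a hypothetical helper, not the DJ library: it computes the classes that lie on some path from a source through a via-class to a target, by intersecting forward reachability from the source with backward reachability to the via-class (first leg), and forward reachability from the via-class with backward reachability to the target (second leg). Edge labels are omitted, and the backward step is deliberately simple rather than efficient.

```java
import java.util.*;

// Hypothetical helper: which classes does "from S through V to T" select?
class StrategyGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    void edge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Nodes reachable from start (including start).
    private Set<String> forward(String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(start));
        while (!work.isEmpty()) {
            String n = work.pop();
            if (seen.add(n)) work.addAll(edges.getOrDefault(n, List.of()));
        }
        return seen;
    }

    // Nodes from which goal is reachable (simple, quadratic version).
    private Set<String> backward(String goal) {
        Set<String> result = new HashSet<>();
        for (String n : edges.keySet())
            if (forward(n).contains(goal)) result.add(n);
        return result;
    }

    // Classes on some path source ->* via ->* target; empty if no such path exists.
    Set<String> traversalNodes(String source, String via, String target) {
        Set<String> leg1 = forward(source);
        leg1.retainAll(backward(via));
        Set<String> leg2 = forward(via);
        leg2.retainAll(backward(target));
        if (!leg1.contains(via) || !leg2.contains(target)) return new TreeSet<>();
        Set<String> all = new TreeSet<>(leg1);
        all.addAll(leg2);
        return all;
    }
}
```

For the original class graph, `traversalNodes("BusRoute", "BusStop", "Person")` yields BusRoute, BusStopList, BusStop, PersonList and Person, and excludes BusList and Bus. For the modified graph, the same unchanged strategy additionally selects VillageList and Village.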
Writing Aspect-oriented Programs With Strategies.
import edu.neu.ccs.demeter.dj.*;   // DJ library: ClassGraph, Visitor

class BusRoute {
    // Traversal strategy: a complex request naming only three classes
    static final String WPStrategy = "from BusRoute through BusStop to Person";

    int countWaitingPersons() {
        Integer result = (Integer)
            Main.cg.traverse(this, WPStrategy,
                new Visitor() {
                    int r;
                    public void start() { r = 0; }              // once, before the traversal
                    public void before(Person host) { r++; }    // at each Person reached
                    public Object getReturnValue() { return new Integer(r); }
                });
        return result.intValue();
    }
}
The strategy is a complex request: it plays the role of the manager, and it is class-graph shy.
Writing Aspect-Oriented Programs With Strategies.
String WPStrategy = "from BusRoute through BusStop to Person";
// Prepare the current class graph
Main.cg = new ClassGraph();
int r = aBusRoute.countWaitingPersons();
Object Graph in UML Notation.
[Figure: object graph. Route1:BusRoute has busStops → :BusStopList → CentralSquare:BusStop, whose waiting field refers to a :PersonList, and buses → :BusList → Bus15:Bus, whose passengers field refers to a :PersonList. The Person objects shown are Joan, Paul, Seema and Eric.]
ObjectGraphSlice.
[Figure: the same object graph (Route1, CentralSquare, Bus15, Joan, Paul, Seema, Eric) with the slice visited by the traversal marked.]
Summary So Far.
• Aspect-oriented software development helps to create software that:
– is more flexible and supports easy adaptation to rapidly changing interfaces;
– is easier to understand and also shorter;
– supports the Shy Programming Principle.
Institute for Complex Scientific Software
Institute Home Page: http://www.icss.neu.edu/
What?
• Problem driving the institute:
– Complexity of building software systems to enable scientific research
• Objective:
– Develop general methodologies for building complex scientific software using the latest computer science research
Goals.
[Figure: diagram relating Computer Science, New Methodologies, The Institute, Scientific Software Solutions and Applications.]
Applicable Computer Science Research.
• Aspect-Oriented Software Development
• Software Components
• Parallelism
• Domain Specific Languages
• Visualization
• Knowledge-Based Support Systems
Three Testbeds.
• THEMATICS (M. Ondrechen; protein function from structure; high external visibility)
– Proc. Nat. Academy of Science publication
– Featured in popular scientific magazines: Nature, American Chemical Society, Science Daily
• Subsurface Sensing and Imaging (many Institute participants from this area)
• Parallel Geant4 (CERN; Cooperman, Reucroft and Swain; particle-matter interaction; million-line program)
Some Other Faculty Highlights.
• Valentin Ilyin.
– Protein structure analysis: a novel structural alignment method that produces high-quality alignments.
– Visual analytical bioinformatics interface (Friend).
• Roger Giese.
– The long-term goal is to learn whether the measurement of DNA adducts in people can help to individualize cancer prevention, analogous to the measurement of cholesterol as a biomarker for risk of a heart attack.
Some Other Faculty Highlights.
• Bob Futrelle.
– "I'm particularly interested in the relations between bio-ontologies and text and diagrams."
Conclusions
• Northeastern University and the Institute for Complex Scientific Software create knowledge of significant interest to bioinformatics.
• Aspect-Oriented Software Development is a useful technology for the rapidly evolving area of bioinformatics.
The End
PathSet Algorithm
• We have developed an efficient graph search algorithm that solves the following problem:
• Input:
– Graph G1 = (V1, E1) with source s and target t.
– Graph G2 = (V2, E2), where V1 is a subset of V2.
• Question: Does G2 contain a path that is an expansion of a path in G1 from s to t? (The algorithm works even if s and t are sets of nodes.)
Explanation.
• Given a path p, a path p' is called an expansion of p if p' can be obtained by inserting one or more elements between elements of p.
• More generally, we can find a third graph that succinctly represents all possible such paths in G2.
• Do you see applications of such an algorithm in biology?
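One way to answer the PathSet question is a search over a product graph, sketched below in Java; this is an illustrative technique, not necessarily the authors' algorithm. A state pairs a node u of G2 with the last G1 node q matched so far. Following a G2 edge from u to w either treats w as a noise node (state (w, q)) or, if (q, w) is an edge of G1, matches w (state (w, w)). G2 contains an expansion of an s-to-t path of G1 iff the state (t, t) is reachable from (s, s). The sketch assumes that inserted noise nodes are optional (zero or more) and that a G1 node may also occur as noise; the class name `ExpansionSearch` is hypothetical.

```java
import java.util.*;

// Sketch: does G2 contain an expansion of some s->t path of G1?
class ExpansionSearch {
    static boolean hasExpansion(Map<String, Set<String>> g1,
                                Map<String, Set<String>> g2,
                                String s, String t) {
        // State = [node of G2, last G1 node matched so far].
        Deque<List<String>> work = new ArrayDeque<>();
        Set<List<String>> seen = new HashSet<>();
        List<String> start = List.of(s, s);
        work.push(start);
        seen.add(start);
        while (!work.isEmpty()) {
            List<String> state = work.pop();
            String u = state.get(0), q = state.get(1);
            if (u.equals(t) && q.equals(t)) return true;   // t matched: expansion found
            for (String w : g2.getOrDefault(u, Set.of())) {
                List<String> noise = List.of(w, q);        // w inserted as a noise node
                if (seen.add(noise)) work.push(noise);
                if (g1.getOrDefault(q, Set.of()).contains(w)) {
                    List<String> match = List.of(w, w);    // w is the next G1 node
                    if (seen.add(match)) work.push(match);
                }
            }
        }
        return false;
    }
}
```

At most |V1|·|V2| states are visited, so the search never enumerates the paths of G1 one by one. For G1 = s→a→t and G2 = s→x→a→y→t the answer is true; for G2 = s→x→t it is false, because the G1 node a never appears.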
Motivation.
• G1 is a “small” graph that lists “important” nodes.
• G2 is a “large” graph in which we want to recognize paths that are expansions of paths in the “small” graph.
• Expansions of paths may contain additional nodes that are “noise” nodes.
Notes
• G2 contains such a path (an expansion of an s-to-t path of G1) iff the traversal graph of G1 and G2 is not empty.
• G1 may have exponentially many paths from s to t.
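The note about exponentially many paths is easy to demonstrate. In the hypothetical "diamond chain" below, each of n diamonds offers two parallel routes (through a_i or b_i), so there are 2^n paths from v0 to vn although the graph has only 3n + 1 nodes. The memoized counter computes each node's count once, which is the same reason a reachability-based search, rather than path enumeration, stays efficient.

```java
import java.math.BigInteger;
import java.util.*;

class PathCounter {
    // Counts distinct paths from node to target in a DAG, memoizing per node.
    static BigInteger count(Map<String, List<String>> dag, String node, String target,
                            Map<String, BigInteger> memo) {
        if (node.equals(target)) return BigInteger.ONE;
        BigInteger cached = memo.get(node);
        if (cached != null) return cached;
        BigInteger total = BigInteger.ZERO;
        for (String next : dag.getOrDefault(node, List.of()))
            total = total.add(count(dag, next, target, memo));
        memo.put(node, total);
        return total;
    }

    // Chain of n diamonds: v_i -> a_i, v_i -> b_i, a_i -> v_(i+1), b_i -> v_(i+1).
    static Map<String, List<String>> diamonds(int n) {
        Map<String, List<String>> g = new HashMap<>();
        for (int i = 0; i < n; i++) {
            g.put("v" + i, List.of("a" + i, "b" + i));
            g.put("a" + i, List.of("v" + (i + 1)));
            g.put("b" + i, List.of("v" + (i + 1)));
        }
        return g;
    }
}
```

For example, `PathCounter.count(PathCounter.diamonds(100), "v0", "v100", new HashMap<>())` equals 2^100, a number of paths no algorithm could list one at a time.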
Topic Switch.
Lessons From Manager Metaphor (Continued).
• AOP is related to the Micromanager point through the observation that aspects should be loosely coupled to the base programs they modify. An aspect should not be brittle with respect to the detailed calling structure of the base program, in the same way as the manager should not rely on the details of the workers' projects. An intermediary, called glue code, maps the aspect to the detailed usage context. AOP is call-graph shy.