Samad Paydar Web Technology Lab. Ferdowsi University of Mashhad 10


Samad Paydar
Web Technology Lab.
Ferdowsi University of Mashhad
10th August 2011





Introduction
Software ontology models
Semantic web query methods for software
analysis
Experimental evaluation
Conclusion


To be developed, maintained and evolved,
software must first be understood
• How the code works
• Developers’ decisions

Some reasons why code is hard to understand
Development team changes
Programmers forget what they have done
Undocumented code
Outdated comments
Multiple versions

Therefore a code comprehension framework
is needed
• Mainly composed of two major steps
▪ Converting source code to an internal representation
▪ Performing queries
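
The two steps above can be sketched in a few lines. This is only an illustration, not the framework presented in these slides: it assumes Python source as input, uses the standard `ast` module as the parser, and stores facts as plain (subject, predicate, object) tuples. The predicate names `isA` and `hasMethod`, the `Account` class, and the helper functions are all hypothetical.

```python
import ast

# Hypothetical input program, used only for illustration.
SOURCE = '''
class Account:
    def deposit(self, amount):
        self.log(amount)

    def log(self, amount):
        pass
'''

def to_triples(source):
    """Step 1: convert source code to (subject, predicate, object) facts."""
    triples = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            triples.append((node.name, "isA", "Class"))
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    method = f"{node.name}.{item.name}"
                    triples.append((method, "isA", "Method"))
                    triples.append((node.name, "hasMethod", method))
    return triples

def query(triples, s=None, p=None, o=None):
    """Step 2: return the triples matching a pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

triples = to_triples(SOURCE)
methods = [o for _, _, o in query(triples, s="Account", p="hasMethod")]
print(methods)
```

A real pipeline would emit RDF and query it with SPARQL; the tuple list and wildcard matcher only stand in for that machinery.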

Further
• Open source movement
• Software complexity
• Libraries dependent on other ones
Software that is developed locally is a node in a
world-wide network of interlinked source code
Global Call Graph


Each node in this cloud should exhibit its
information in an open, accessible and
uniquely identifiable way
Therefore “we propose the usage of semantic
technologies such as OWL, RDF and SPARQL
as a software comprehension framework with
the abilities to be interlinked with other
projects”

Three models for different aspects of code
1. Software Ontology Model (SOM)
2. Bug Ontology Model (BOM)
3. Version Ontology Model (VOM)
Connected to related ontologies





DOAP
SIOC
FOAF
WF


Software Ontology Model (SOM)
Based on FAMIX (FAMOOS Information
Exchange Model)
A programming-language-independent
model for representing object-oriented
source code


Version Ontology Model (VOM)
For specifying the relations between files,
releases, and revisions of software projects
Based on the data model of Subversion

Bug Ontology Model (BOM)
Based on the bug-tracking system Bugzilla

Two non-standard extensions of SPARQL
• iSPARQL (Imprecise SPARQL)
• SPARQL-ML (SPARQL Machine Learning)

iSPARQL introduces the idea of “virtual triples”
• They are not matched against the underlying ontology
graph, but are used to configure similarity joins
• They specify which pairs of variables should be joined and
compared using a certain type of similarity
measure
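
A similarity join of this kind can be sketched in plain Python. This is a rough illustration under stated assumptions, not iSPARQL itself: the `similarity_join` helper is hypothetical, the class names are made up, and `difflib.SequenceMatcher` stands in for whatever similarity measure a query would configure.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Stand-in similarity measure: ratio of matching characters in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def similarity_join(left, right, measure, threshold):
    """Pair every left value with every right value whose score passes the threshold."""
    pairs = []
    for a in left:
        for b in right:
            score = measure(a, b)
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs

# Hypothetical class names from two releases of the same project.
release_a = ["CompareEditor", "DiffViewer"]
release_b = ["CompareEditorInput", "PatchWizard"]

matches = similarity_join(release_a, release_b, name_similarity, threshold=0.7)
for a, b, score in matches:
    print(f"{a} ~ {b} ({score:.2f})")
```

The measure and the threshold play the role of the configuration carried by the virtual triples: they say what to compare and how, without being facts in the graph.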



SPARQL-ML
An extension of SPARQL with knowledge
discovery capabilities
A tool for efficient relational data mining on
Semantic Web data
Enables Statistical Relational Learning
(SRL) methods such as Relational Probability
Trees (RPTs) and Relational Bayesian
Classifiers (RBCs)

Learning phase (building prediction model)

Test phase (making prediction)


Four years (2004-2007) of the proceedings of the
ICSE Workshop on Mining Software
Repositories (MSR) were surveyed
to determine the most actively investigated
software analysis tasks


Dataset: 206 releases of the
org.eclipse.compare plug-in for Eclipse
(average of about 150 Java classes per
version) + bug tracking information
Exported to OWL



Task 1: software evolution analysis
Applicability of iSPARQL to software
evolution visualization (i.e. visualization of
code changes for a certain time span)
Compared all the classes of one major release
with those of another major release, using different
similarity strategies


Task 2: computing source code metrics
Calculating OO software design metrics


Changing methods (CM) and changing
classes (CC)
A method that is invoked by many other
methods has a higher risk of causing defects in
the presence of change
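
As a rough sketch, both metrics can be derived from a list of invocation facts. The definitions assumed here are: CM is the number of distinct methods that invoke the measured method, and CC is the number of distinct classes in which those invoking methods are defined. The method names and the `changing_metrics` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical (caller, callee) invocation facts, written as "Class.method".
invocations = [
    ("Editor.save", "Buffer.flush"),
    ("Editor.close", "Buffer.flush"),
    ("Viewer.refresh", "Buffer.flush"),
    ("Editor.save", "Log.write"),
]

def changing_metrics(invocations):
    """Return {callee: {"CM": ..., "CC": ...}} for every invoked method."""
    callers = defaultdict(set)
    for caller, callee in invocations:
        if caller != callee:  # ignore direct recursion
            callers[callee].add(caller)
    return {
        callee: {
            "CM": len(methods),
            # The class is the part of the name before the last dot.
            "CC": len({m.rsplit(".", 1)[0] for m in methods}),
        }
        for callee, methods in callers.items()
    }

metrics = changing_metrics(invocations)
print(metrics["Buffer.flush"])
```

Here `Buffer.flush` is invoked by three methods spread over two classes, so a change to it touches more of the system than a change to `Log.write`.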


Number of methods (NOM) and number of
attributes (NOA)
As indicators of God classes
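
A minimal sketch of how NOM and NOA might be used to flag God class candidates. The facts, the predicate names `hasMethod` and `hasAttribute`, and the thresholds are assumptions chosen for illustration; they are not values from the presentation.

```python
from collections import Counter

# Hypothetical (class, predicate, member) facts.
facts = [
    ("Manager", "hasMethod", "run"),
    ("Manager", "hasMethod", "load"),
    ("Manager", "hasMethod", "render"),
    ("Manager", "hasAttribute", "config"),
    ("Manager", "hasAttribute", "cache"),
    ("Point", "hasMethod", "norm"),
    ("Point", "hasAttribute", "x"),
    ("Point", "hasAttribute", "y"),
]

def god_class_candidates(facts, min_methods, min_attributes):
    """Return the classes whose NOM and NOA both reach the given thresholds."""
    nom = Counter(s for s, p, _ in facts if p == "hasMethod")
    noa = Counter(s for s, p, _ in facts if p == "hasAttribute")
    return sorted(c for c in nom
                  if nom[c] >= min_methods and noa[c] >= min_attributes)

# Thresholds are arbitrary here; real ones depend on the project.
candidates = god_class_candidates(facts, min_methods=3, min_attributes=2)
print(candidates)
```

High counts only mark a class for inspection; they do not prove that it is a God class.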

Number of bugs (NOB) and number of
revisions (NOR)



Task 3: detection of code smells
Task 4: defect and evolution density
Task 5: bug prediction




A novel approach to analyze software
systems using Semantic Web technologies
EvoOnt provides the basis for representing
source code and metadata in OWL
This representation reduces analysis tasks to
simple queries in SPARQL (or its extensions)
A limitation: some information is lost because
the ontology model is based on FAMIX



Language constructs such as if-else are not
modeled
Measurements cannot be conducted at the level
of statements
One of the greatest impediments to
widespread use of EvoOnt: the current lack of
high-performance, industrial-strength triplestores and reasoning engines