Investigating a Semantic Metrics Suite for
Object-Oriented Design
Dr. Letha Etzkorn (PI)
Ms. Cara Stein
Dr. Glenn Cox
Dr. Sampson Gholston
Dr. Dawn Utley
Dr. Phil Farrington
The University of Alabama in Huntsville
A Semantic Metrics Suite for Object-Oriented Design
Standard software metrics have some problems!
• They are implementation dependent, since they are calculated strictly from the code
• They count code items; sometimes it is arguable whether the items counted accurately reflect the qualities the metrics are supposed to measure

Does the number of lines of code always reflect complexity?

Segment #1 and Segment #2 do the same thing but have very different LOC metrics:

Code Segment #1:
    test[--cnt]->test =
        val1[input_count++].counter + val2[tmp_count--]->mycount;

Code Segment #2:
    temp = val1[input_count].counter + val2[tmp_count]->mycount;
    input_count++;
    tmp_count--;
    --cnt;
    test[cnt]->test = temp;
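
The size difference between the two segments can be made concrete with a small sketch. The Python below is an illustration only: the helper names `physical_loc` and `statement_count` are hypothetical, and counting semicolons is a crude stand-in for a real statement counter.

```python
# Illustrative sketch only: crude size counts for the two segments above.
SEGMENT_1 = """\
test[--cnt]->test =
    val1[input_count++].counter + val2[tmp_count--]->mycount;
"""

SEGMENT_2 = """\
temp = val1[input_count].counter + val2[tmp_count]->mycount;
input_count++;
tmp_count--;
--cnt;
test[cnt]->test = temp;
"""


def physical_loc(source):
    """Count non-blank physical lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def statement_count(source):
    """Approximate the statement count by counting semicolons.

    Assumption: no semicolons inside strings, comments, or for-loop headers.
    """
    return source.count(";")


if __name__ == "__main__":
    for name, seg in (("Segment #1", SEGMENT_1), ("Segment #2", SEGMENT_2)):
        print(name, "lines:", physical_loc(seg), "statements:", statement_count(seg))
```

Both counts differ between the segments even though the behavior is the same, which is the point the slide makes about syntax-based size metrics.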

Object-Oriented software metrics have some similar problems:
• Including constructor or destructor functions in the calculation of the Lack of Cohesion in Methods metric can cause the metric to fail
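
One way this can happen is sketched below, under the assumption that the metric is the Chidamber and Kemerer form of LCOM (the number of method pairs sharing no attributes, minus the number of pairs sharing at least one, floored at zero). A constructor that initializes every attribute shares an attribute with every other method, so it can hide a lack of cohesion among the remaining methods. The class layout and the function name `lcom` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def lcom(method_attrs: Dict[str, Set[str]]) -> int:
    """Chidamber-Kemerer style LCOM: max(P - Q, 0).

    P counts method pairs with no shared attributes,
    Q counts method pairs sharing at least one attribute.
    """
    p = q = 0
    for (_, attrs_a), (_, attrs_b) in combinations(method_attrs.items(), 2):
        if attrs_a & attrs_b:
            q += 1
        else:
            p += 1
    return max(p - q, 0)


# Hypothetical class: three methods, each touching a different attribute.
methods = {
    "set_color": {"color"},
    "set_left": {"left"},
    "set_label": {"label"},
}

# Same class with a constructor that initializes every attribute.
with_ctor = {"__init__": {"color", "left", "label"}, **methods}

print(lcom(methods))    # three disjoint pairs, no sharing pairs
print(lcom(with_ctor))  # three disjoint pairs, three sharing pairs
```

In this toy case the three ordinary methods are completely unrelated, yet adding the constructor drives the value to zero, so the class looks cohesive when it is not.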


• These problems with software metrics all result from the metrics being calculated on syntactic aspects of the code.
• The solution is to define metrics based on semantic aspects of the code (“what the code means”: the code design rather than the code implementation)

Program Understanding!!!

Program Understanding
• Includes any activity that uses dynamic or static methods to reveal program properties

Three kinds of program understanding approaches
• Algorithmic approaches
    • Annotate programs with formal specifications
• Knowledge-based approaches
    • Knowledge-base is mapped to program concepts
• Graph parsing approaches
    • Program turned into a flow graph, then matched to a knowledge-base of flow graphs

Two kinds of knowledge-based program understanding approaches
• Look at code only
    • Kozaczynski, Ning and Engberts; Harandi and Ning
• Informal tokens only
    • Biggerstaff, Mitbander and Webster; Etzkorn and Davis

Etzkorn Informal Tokens approach:
• Used natural language processing and information extraction techniques
• Used informal tokens: comments and identifier names
• Used a knowledge-base consisting of a hierarchical semantic network
• Originally implemented in the PATRicia system (Program Analysis Tool for Reuse)
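
As a rough illustration of what "informal tokens" are, the sketch below pulls comments and identifier names out of a C++ fragment and splits the identifiers into words. It is a regex-based approximation, not the PATRicia implementation; the sample source, the function names, and the deliberately tiny keyword list are all assumptions made for the example.

```python
import re

# Hypothetical C++ fragment used only as sample input.
CPP_SAMPLE = """\
// Minimize the window and track the mouse.
class wxbItem {
    void MinimizeWindow();   /* uses the button color */
    int mouse_position;
};
"""

# Deliberately tiny keyword list, just enough for the sample above.
CPP_KEYWORDS = {"class", "void", "int"}

# Assumption: no comment markers appear inside string literals.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
WORD_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def extract_comments(source):
    """Return comment texts with the comment delimiters stripped."""
    texts = []
    for comment in COMMENT_RE.findall(source):
        body = comment[2:-2] if comment.startswith("/*") else comment[2:]
        texts.append(body.strip())
    return texts


def extract_identifiers(source):
    """Return non-keyword identifiers in order of first appearance."""
    code = COMMENT_RE.sub(" ", source)
    seen = []
    for name in IDENT_RE.findall(code):
        if name not in CPP_KEYWORDS and name not in seen:
            seen.append(name)
    return seen


def split_identifier(name):
    """Split camelCase and snake_case identifiers into lowercase words."""
    return [word.lower() for word in WORD_RE.findall(name)]


if __name__ == "__main__":
    print(extract_comments(CPP_SAMPLE))
    for ident in extract_identifiers(CPP_SAMPLE):
        print(ident, split_identifier(ident))
```

The words recovered this way (for example "minimize", "window", "mouse") are the kind of input a knowledge-base lookup would then try to map to domain concepts.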

The original purpose of the PATRicia system was to identify reusable components in existing object-oriented software.
Included two parts:
• Was the code useful in an area of interest?
• Was the code “good enough” to use?

To answer: Was the code useful in an area of interest?
• Developed the Etzkorn Informal Tokens program understanding approach
To answer: Was the code “good enough” to use?
• Used Object-Oriented software metrics

The PATRicia system produced:
• 2 reports from the informal tokens section
    • A list of areas, with definitions, covered by a class or class hierarchy
    • A description of the operation of a class in natural language
• 1 report from the metrics section
    • Values of various OO metrics for classes and class hierarchies
Description of Operation Report:
Class wxbItem:
-- minimizes a window.
HINT: A button that can be described by a color descriptor and a left descriptor can minimize a window.
-- focuses an <object>
HINT: It is possible to focus an area.
-- tracks a mouse.
HINT: It is possible to track a mouse that can own a button.

The effectiveness of the PATRicia system informal tokens approach has been demonstrated for:
• GUI packages
• Mathematical software
Using information extraction-based metrics for:
• Recall
• Precision
• Overgeneration
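
The slide does not give formulas for these three measures. The sketch below assumes the MUC-style definitions commonly used in information extraction, where partially correct answers earn half credit; the `Tally` class, its field names, and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Counts from comparing tool output against a human-built answer key."""
    correct: int
    partial: int
    incorrect: int
    missing: int    # in the answer key, but not produced by the tool
    spurious: int   # produced by the tool, but not in the answer key

    @property
    def possible(self) -> int:
        # Everything the answer key expected.
        return self.correct + self.partial + self.incorrect + self.missing

    @property
    def actual(self) -> int:
        # Everything the tool actually produced.
        return self.correct + self.partial + self.incorrect + self.spurious


def recall(t: Tally) -> float:
    return (t.correct + 0.5 * t.partial) / t.possible if t.possible else 0.0


def precision(t: Tally) -> float:
    return (t.correct + 0.5 * t.partial) / t.actual if t.actual else 0.0


def overgeneration(t: Tally) -> float:
    return t.spurious / t.actual if t.actual else 0.0


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = Tally(correct=8, partial=2, incorrect=0, missing=2, spurious=4)
    print(recall(example), precision(example), overgeneration(example))
```

Under these assumed definitions, recall penalizes concepts the tool failed to report, precision penalizes wrong or extra ones, and overgeneration isolates the share of output that has no counterpart in the answer key.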


• Currently have published 17 refereed articles from research related to the PATRicia system
    • Knowledge-based journals and conferences
    • Natural language journals
    • Software metrics/software engineering journals

Using the PATRicia system, we:
• Noticed various problems with traditional software metrics
• Came to realize that knowledge-based program understanding could be used to develop new metrics independent of the syntax of the code

The knowledge-base in the PATRicia system is:
• A weighted, hierarchical semantic network
• At the lowest level, based on conceptual graphs

What is a conceptual graph?
• A knowledge representation technique
• Often used in natural language understanding, especially in natural language generation

A conceptual graph example:
[CAT] -> (STAT) -> [SIT] -> (LOC) -> [MAT]
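
A conceptual graph of this kind links concept nodes through relation nodes. The sketch below assumes the slide shows the standard "a cat sits on a mat" example and stores it as (concept, relation, concept) triples; the triple representation and the helper names are assumptions made for illustration and are not taken from the PATRicia knowledge-base.

```python
# Each triple reads: source concept --(relation)--> target concept.
CAT_ON_MAT = [
    ("CAT", "STAT", "SIT"),  # the cat is in the state of sitting
    ("SIT", "LOC", "MAT"),   # the sitting is located on the mat
]


def concepts(graph):
    """Return concept labels in order of first appearance."""
    seen = []
    for source, _, target in graph:
        for label in (source, target):
            if label not in seen:
                seen.append(label)
    return seen


def relations(graph):
    """Return the relation labels, one per triple."""
    return [relation for _, relation, _ in graph]


def linear_form(graph):
    """Render a simple chain in bracket/parenthesis linear notation.

    Only meaningful when each triple starts where the previous one ended.
    """
    parts = ["[" + graph[0][0] + "]"]
    for _, relation, target in graph:
        parts.append("(" + relation + ")")
        parts.append("[" + target + "]")
    return " -> ".join(parts)


if __name__ == "__main__":
    print(concepts(CAT_ON_MAT))
    print(relations(CAT_ON_MAT))
    print(linear_form(CAT_ON_MAT))
```

Keeping concepts and relations as separate kinds of node matters later in the deck, because the complexity metric counts the conceptual relations attached to each concept.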

[Figure: knowledge-base layers joined by "infer" links]
• Conceptual graphs, concepts in conceptual graphs (node labels in the figure: CURSOR, LOC, INSERT, OBJ, TEXT)
• Interface layer: consists of keywords tagged with the part of speech (noun, adjective, verb, etc.)


• The various semantic metrics in the semantic metrics suite are defined in terms of a conceptual graph-based knowledge-base.
• The PATRicia system knowledge-base is conceptual graph-based; however, it includes an inferencing scheme that is outside the conceptual graph definition.

Semantic complexity metrics measure the domain complexity rather than the implementation complexity:
• Semantic Class Definition Entropy (SCDE)
• Class Domain Complexity (CDC)
• Relative Class Domain Complexity (RCDC)
• Key Class Identification (KCI)
• Class Interface Complexity

Class Domain Complexity (CDC):

$$\mathrm{CDC} = \sum_{i=1}^{m} \bigl( |\text{concept} + \text{conceptual relations}|_i \times \text{weight}_i \bigr)$$

• For each concept in the sum, |concept + conceptual relations| = 1 + the number of conceptual relations linking the current concept to another concept recognized by the class.
• Concepts linking to concepts in another class are not included in the count.
• Only outgoing conceptual relations are included in the count (to prevent counting the same conceptual relation twice).
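
The counting rule can be sketched on a toy knowledge-base. Everything below is hypothetical: the concepts, weights, and relations are made up, the function name is illustrative, and the sketch reads "not included in the count" as skipping any outgoing relation whose target is not recognized by the class.

```python
# Hypothetical knowledge-base: concept -> (weight, outgoing relations).
# Each outgoing relation is a (relation label, target concept) pair.
KNOWLEDGE_BASE = {
    "INSERT": (1.0, [("OBJ", "TEXT"), ("LOC", "CURSOR")]),
    "TEXT": (0.5, []),
    "CURSOR": (0.5, [("ATTR", "COLOR")]),
    "COLOR": (0.25, []),
}


def class_domain_complexity(recognized, kb):
    """Sum (1 + internal outgoing relations) * weight over recognized concepts.

    Only outgoing relations are counted, and only when their target is
    also recognized by the class (assumed reading of the slide's rule).
    """
    total = 0.0
    for concept in recognized:
        weight, outgoing = kb[concept]
        internal = sum(1 for _, target in outgoing if target in recognized)
        total += (1 + internal) * weight
    return total


if __name__ == "__main__":
    # Concepts a hypothetical text-editing class is recognized as covering.
    editor_class = {"INSERT", "TEXT", "CURSOR"}
    print(class_domain_complexity(editor_class, KNOWLEDGE_BASE))
```

Here INSERT contributes (1 + 2) x 1.0, TEXT contributes 1 x 0.5, and CURSOR contributes 1 x 0.5 because its only relation points at COLOR, which the class does not recognize.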

Semantic Class Definition Entropy:
• Based on information theory. Entropy is a measure of the amount of information.
• Validated and published in “A Semantic Entropy Metric,” Etzkorn, L., Gholston, S., Hughes, W., The Journal of Software Maintenance and Evolution, Vol. 14, 2002, pp. 293-310

Semantic Class Definition Entropy (cont’d):
• Amount of information in an alphabetic string:
    $$I_i = -\log_2 P_i$$
• Probability of the i-th most frequently occurring domain-related concept or keyword:
    $$P_i = f_i / N_1$$
    • $f_i$ = number of occurrences of the i-th most frequently occurring domain-related concept or keyword
    • $N_1$ = total number of non-unique domain-related concepts or keywords

Semantic Class Definition Entropy (cont’d):
• Average amount of information contributed by each domain-related concept or keyword in a class definition:
    $$H = -\sum_{i=1}^{n_1} P_i \log_2 P_i$$
    • $n_1$ = total number of unique domain-related concepts or keywords belonging to a class
    • $P_i = f_i / N_1$

Semantic Class Definition Entropy (SCDE):
    $$\mathrm{SCDE} = -\sum_{i=1}^{n_1} \frac{f_i}{N_1} \log_2 \frac{f_i}{N_1}$$
    • $n_1$ = total number of unique domain-related concepts or keywords belonging to a class
    • $f_i$ = number of occurrences of the i-th most frequently occurring domain-related concept or keyword
    • $N_1$ = total number of non-unique domain-related concepts or keywords
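
The SCDE formula can be evaluated directly once the domain-related concepts or keywords recognized in a class are listed with repetition. The sketch below is a minimal illustration; the function name `scde` and the sample keyword list are hypothetical, and how the concepts are recognized in the first place is outside its scope.

```python
import math
from collections import Counter


def scde(concepts):
    """SCDE = -sum_i (f_i / N1) * log2(f_i / N1) over the unique concepts.

    `concepts` lists every recognized domain-related concept or keyword,
    repeats included, so len(concepts) is N1 and the number of distinct
    entries is n1.
    """
    counts = Counter(concepts)        # f_i for each unique concept
    n_total = sum(counts.values())    # N1
    if n_total == 0:
        return 0.0
    return -sum((f / n_total) * math.log2(f / n_total) for f in counts.values())


if __name__ == "__main__":
    # Hypothetical keywords recognized in one class definition.
    sample = ["window", "window", "button", "mouse"]
    print(scde(sample))
```

For the sample, the frequencies are 2, 1, and 1 out of N1 = 4, so the probabilities are 1/2, 1/4, and 1/4 and the entropy is 0.5 + 0.5 + 0.5 = 1.5 bits. A class whose keywords all map to a single concept scores zero.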

• Proposed Semantic Metrics were published in:
    Etzkorn, Letha and Delugach, Harry, "Towards a Semantic Metrics Suite for Object-Oriented Design," Proceedings of the 34th International Conference on Technology of Object-Oriented Languages and Systems, TOOLS 34 (TOOLS USA), July 30-August 4, 2000, IEEE Computer Society Press, Los Alamitos, CA, 2000, pp. 71-80.
• In this paper, the semantic metrics were validated theoretically using:
    • Weyuker’s criteria for metric definitions
    • Briand and Melo’s criteria for cohesion metrics


• Under the NASA grant, we are currently extending the informal tokens portion of the PATRicia system to analyze semantic metrics
• The new tool is called SemMet

PATRicia system informal tokens analysis (and now SemMet) written in:
• C++
• CLIPS expert system shell
• Various lex parsers


Plan to validate SemMet-based semantic metrics on:
• GUI/mathematical packages data
• MDP data (from Mike Chapman)
• Advanced Engineering Environment from MSFC
QUESTIONS?