Lecture 5 – Problem Detection

Object-Oriented Reengineering
Patterns and Techniques
Wahyu Andhyka Kusuma, S.Kom
081233148591
Topics
• Metrics
– Software quality
– Trend analysis
• Object-Oriented Metrics in Practice
• Code duplication
Why use metrics in OO reengineering?
• Assessing software quality
– Which components have poor quality? (and could therefore be reengineered)
– Which components have good quality? (and could therefore be reverse engineered)
⇒ Metrics as a reengineering tool
• Controlling the reengineering process
– Trend analysis: which components changed?
– Which refactorings were applied?
⇒ Metrics as a reverse engineering tool
ISO 9126 Quantitative Quality Model
Software quality is decomposed into three levels: Factor → Characteristic → Metric
• Factors (ISO 9126): Functionality, Reliability, Efficiency, Usability, Maintainability, Portability
• Characteristics: Error tolerance, Accuracy, Consistency, Simplicity, Modularity
• Metrics:
– defect density = #defects / size
– correction time
– correction impact = #components changed
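As a minimal sketch of the metric level of such a quality model, the snippet below computes defect density and an average correction impact. The component record, the numbers, and the use of KLOC as the size unit are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    defects: int       # number of defects reported against the component
    size_kloc: float   # size in thousands of lines of code (assumed unit)


def defect_density(component):
    """defect density = #defects / size"""
    return component.defects / component.size_kloc


def correction_impact(components_changed_per_fix):
    """Average number of components changed per correction."""
    return sum(components_changed_per_fix) / len(components_changed_per_fix)


# Hypothetical data, not taken from any real system.
parser = Component("parser", defects=12, size_kloc=4.0)
print(defect_density(parser))
print(correction_impact([1, 3, 2]))
```

Both values only become meaningful when compared across components or across releases of the same system.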
Product & Process Attributes
Process Attribute
• Definition: measures an aspect of the process that produces the product
• Examples: time to correct a defect, number of components changed per correction
Product Attribute
• Definition: measures an aspect of the artifacts delivered to the customer
• Examples: number of defects in the system, time needed to learn the system
External & Internal Attributes
Internal Attribute
• Definition: measured purely in terms of the product itself, separate from its behaviour in context
• Examples: class coupling and cohesion, method size
External Attribute
• Definition: measures how the product/process behaves in its environment
• Examples: mean time between failures, #components changed
External vs. Internal Product Attributes
External attributes
• Advantage:
– Close relationship with quality factors
• Disadvantages:
– Can be measured only after the product is in use
– Data collection is difficult and often involves human intervention
– Relating an external effect to its internal cause is very difficult
Internal attributes
• Disadvantage:
– Relationship with quality factors is not empirically validated
• Advantages:
– Can be measured at any time
– Data collection is easy and can be automated
– Direct relationship between the measurement and its cause
Metrics and Measurement
• Weyuker [1988] defines nine properties that a software metric should satisfy
• For OO, only six of these properties are really important [Chidamber 94, Fenton & Pfleeger]
– Non-coarseness
• Given a class P and a metric m, another class Q can be found such that m(P) ≠ m(Q)
• Not every class has the same value for the metric
– Non-uniqueness
• There can be distinct classes P and Q such that m(P) = m(Q)
• Two classes can have the same metric value
– Monotonicity
• m(P) ≤ m(P+Q) and m(Q) ≤ m(P+Q), where P+Q is the “combination” of classes P and Q
– Design details are important
• The specifics of a class must influence the metric value. Even if two classes perform the same actions, their details should have an impact on the metric value.
– Nonequivalence of interaction
• m(P) = m(Q) does not imply m(P+R) = m(Q+R), where R is an interaction with the class
– Interaction increases complexity
• m(P) + m(Q) < m(P+Q)
• When two classes are combined, the interaction between them also increases the metric value
• Conclusion: not every measurement is a metric
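To make these properties concrete, here is a small sketch that tests a toy measurement against four of them. It assumes a deliberately simplistic model: a class is a set of method names, the measurement is the number of methods, and "P + Q" is the union of both sets. The sample classes are hypothetical, and checking a sample can only refute a property, never prove it.

```python
from itertools import combinations


def nom(cls):
    """Toy measurement: number of methods."""
    return len(cls)


def combine(p, q):
    """Toy 'P + Q': the union of both method sets (an assumption)."""
    return p | q


P = frozenset({"open", "close"})
Q = frozenset({"read", "write"})
R = frozenset({"open", "read", "seek"})
CLASSES = [P, Q, R]


def non_coarse(m, classes):
    # At least two classes get different values.
    return len({m(c) for c in classes}) > 1


def non_unique(m, classes):
    # At least two distinct classes get the same value.
    return any(m(a) == m(b) for a, b in combinations(classes, 2))


def monotonic(m, classes):
    # Combining never yields less than either part.
    return all(
        max(m(a), m(b)) <= m(combine(a, b)) for a, b in combinations(classes, 2)
    )


def interaction_increases(m, classes):
    # Some combination is worth more than the sum of its parts.
    return any(m(a) + m(b) < m(combine(a, b)) for a, b in combinations(classes, 2))


for check in (non_coarse, non_unique, monotonic, interaction_increases):
    print(check.__name__, check(nom, CLASSES))
```

With this model the method count passes the first three checks but can never satisfy "interaction increases complexity", because a union is never larger than the sum of its parts.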
Selecting Metrics
• Fast
– Scalable: we cannot afford log(n²) when n ≥ 1 million LOC (lines of code)
• Precise
– (e.g., #methods: do you count all methods, only public ones, also inherited ones?)
• Code-based
– Scalable: we want to collect the metrics several times
• Simple
– Complex metrics are hard to interpret
Assessing Maintainability
• Size of the system and of its entities
– Class size, method size, inheritance
– The size of an entity affects its maintainability
• Cohesion of the entities
– Class internals
– Changes should stay local to the class
• Coupling between entities
– Within inheritance: coupling between class and subclass
– Outside inheritance
– Strong coupling impedes the locality of changes
Sample Size and Inheritance Metrics
Class Size Metrics
• # methods (NOM)
• # attributes, instance / class (NIA, NCA)
• sum of method size (WMC)
Inheritance Metrics
• hierarchy nesting level (HNL)
• # immediate children (NOC)
• # inherited methods, unmodified (NMI)
• # overridden methods (NMO)
Method Size Metrics
• # invocations (NOI)
• # statements (NOS)
• # lines of code (LOC)
[Figure: entities Class, Method, Attribute with relations Inherit, BelongTo, Invoke, Access]
Sample Class Size
• (NIV) [Lore94] Number of Instance Variables
• (NCV) [Lore94] Number of Class Variables (static)
• (NOM) [Lore94] Number of Methods (public, private, protected)
• (LOC) Lines of Code
• (NSC) [Li93] Number of Semicolons → number of statements
• (WMC) [Chid94] Weighted Method Count
– WMC = ∑ c_i
– where c_i is the complexity of method i (number of exits or McCabe cyclomatic complexity)
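The WMC formula can be sketched with Python's standard `ast` module. This is only an approximation: the set of node types counted as decision points is an assumption of this example, not a full implementation of McCabe's metric, and the `Account` class is a hypothetical sample.

```python
import ast

# Node types counted as decision points (an assumed, simplified choice).
DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def cyclomatic(func):
    """c_i = 1 + number of decision points found inside the method."""
    return 1 + sum(isinstance(node, DECISIONS) for node in ast.walk(func))


def wmc(source, class_name):
    """WMC = sum of c_i over the methods defined directly in the class."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            methods = [n for n in node.body if isinstance(n, ast.FunctionDef)]
            return sum(cyclomatic(m) for m in methods)
    raise ValueError(f"class {class_name!r} not found")


SAMPLE = '''
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

    def withdraw(self, amount):
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid amount")
        self.balance -= amount
'''

print(wmc(SAMPLE, "Account"))
```

Setting every c_i to 1 would reduce WMC to the plain number of methods (NOM), which is why the choice of complexity weight has to be stated whenever WMC values are compared.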
Hierarchy Layout
• (HNL) [Chid94] Hierarchy Nesting Level, (DIT) [Li93] Depth of Inheritance Tree
– HNL, DIT = max hierarchy level
• (NOC) [Chid94] Number of Children
• (WNOC) Total Number of Children
• (NMO, NMA, NMI, NME) [Lore94] Number of Methods Overridden, Added, Inherited, Extended (super call)
• (SIX) [Lore94] Specialization Index
– SIX(C) = NMO * HNL / NOM
– Weighted percentage of overridden methods
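The hierarchy metrics can be illustrated with Python introspection. The sketch below assumes single inheritance, counts only methods defined directly in a class as NOM, and ignores double-underscore names; the `Shape` hierarchy is hypothetical.

```python
class Shape:
    def area(self):
        return 0.0

    def describe(self):
        return "shape"


class Polygon(Shape):
    def describe(self):
        return "polygon"

    def sides(self):
        return 0


class Square(Polygon):
    def area(self):
        return 1.0

    def describe(self):
        return "square"

    def sides(self):
        return 4


def own_methods(cls):
    """Names of methods defined directly in cls (dunder names ignored)."""
    return {n for n, v in vars(cls).items() if callable(v) and not n.startswith("__")}


def hnl(cls):
    """Hierarchy nesting level: number of ancestors, not counting `object`."""
    return len(cls.__mro__) - 2


def noc(cls):
    """Number of immediate children."""
    return len(cls.__subclasses__())


def nom(cls):
    return len(own_methods(cls))


def nmo(cls):
    """Number of own methods that override a method of some ancestor."""
    inherited = set()
    for base in cls.__mro__[1:]:
        inherited |= own_methods(base)
    return len(own_methods(cls) & inherited)


def six(cls):
    """SIX(C) = NMO * HNL / NOM"""
    return nmo(cls) * hnl(cls) / nom(cls)


for c in (Shape, Polygon, Square):
    print(c.__name__, hnl(c), noc(c), nmo(c), six(c))
```

A class deep in the hierarchy that overrides most of what it inherits gets a high SIX, which hints that subclassing may be used for something other than specialization.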
Method Size
• (MSG) Number of Message Sends
• (LOC) Lines of Code
• (MCX) Method Complexity
– Total complexity / total number of methods
– Example weights: API call = 5, assignment = 0.5, arithmetic operation = 2, message with parameters = 3, ...
Sample Metrics: Class Cohesion
• (LCOM) Lack of Cohesion in Methods
– [Chidamber 94] for definition
– [Hitz 95] for critique
– Let I_i = set of instance variables used by method M_i
– Let P = { (I_i, I_j) | I_i ∩ I_j = ∅ } and Q = { (I_i, I_j) | I_i ∩ I_j ≠ ∅ }
– If all the sets I_i are empty, P is empty
– LCOM = |P| − |Q| if |P| > |Q|, 0 otherwise
• Tight Class Cohesion (TCC)
• Loose Class Cohesion (LCC)
– [Bieman 95] for definition
– Measure method cohesion across invocations
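The LCOM definition translates almost directly into code. In this sketch the input format (a dictionary from method name to the set of instance variables it uses) and the example class are assumptions; extracting those sets from real source code is a separate problem.

```python
from itertools import combinations


def lcom(uses):
    """uses maps each method name to the set I_i of instance variables it uses."""
    sets = list(uses.values())
    p = q = 0
    for a, b in combinations(sets, 2):
        if a & b:
            q += 1   # the pair shares at least one instance variable
        else:
            p += 1   # the pair shares nothing
    if all(not s for s in sets):
        p = 0        # if all the sets are empty, P is empty
    return p - q if p > q else 0


# Hypothetical class: two methods work on `balance`, two on unrelated state.
example = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "rename": {"owner"},
    "audit": {"log"},
}
print(lcom(example))
```

Of the six method pairs, only one shares a variable, so |P| = 5 and |Q| = 1. A class whose methods all touch the same variable yields 0.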
Sample Metrics: Class Coupling (i)
• Coupling Between Objects (CBO)
– [Chidamber 94a] for definition,
– [Hitz 95a] for a discussion
– Number of other classes to which it is coupled
• Data Abstraction Coupling (DAC)
– [Li 93] for definition
– Number of ADTs defined in a class
• Change Dependency Between Classes (CDBC)
– [Hitz 96a] for definition
– Impact of changes from a server class (SC) to a client class (CC).
Sample Metrics: Class Coupling (ii)
• Locality of Data (LD)
– [Hitz 96] for definition
– LD = ∑ |L_i| / ∑ |T_i|
– L_i = non-public instance variables + inherited protected variables of the superclass + static variables of the class
– T_i = all variables used in M_i, except non-static local variables
– M_i = methods, excluding accessors
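A minimal sketch of the LD ratio follows. It assumes that the variables used by each non-accessor method (T_i) have already been extracted, and it derives L_i as the part of T_i that belongs to the class itself. The variable names and the value returned when no variables are used at all are assumptions of this example.

```python
def locality_of_data(used_vars_per_method, class_local_vars):
    """LD = sum |L_i| / sum |T_i| over all non-accessor methods M_i."""
    local = total = 0
    for t_i in used_vars_per_method:
        l_i = t_i & class_local_vars   # variables that are local to the class
        local += len(l_i)
        total += len(t_i)
    # Assumption: a class whose methods use no variables is treated as fully local.
    return local / total if total else 1.0


# Hypothetical class with two non-accessor methods.
class_local = {"balance", "log"}
used = [
    {"balance", "rate_table"},   # T_1 uses one variable from elsewhere
    {"balance", "log"},          # T_2 uses only the class's own data
]
print(locality_of_data(used, class_local))
```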
The Trouble with Coupling and Cohesion
• Coupling and Cohesion are intuitive notions
– Cf. “computability”
– E.g., is a library of mathematical functions “cohesive”?
– E.g., is a package of classes that subclass framework classes cohesive? Is it strongly coupled to the framework package?
Conclusion: Metrics for Quality Assessment
• Can internal product metrics reveal which components have good/poor
quality?
• Yes, but...
– Not reliable
• false positives: “bad” measurements, yet good quality
• false negatives: “good” measurements, yet poor quality
– Heavyweight Approach
• Requires team to develop (customize?) a quantitative quality model
• Requires definition of thresholds (trial and error)
– Difficult to interpret
• Requires complex combinations of simple metrics
• However...
– Cheap once you have the quality model and the thresholds
– Good focus (± 20% of components are selected for further inspection)
• Note: focus on the most complex components first!
Topics
• Metrics
• Object-Oriented Metrics in Practice
– Detection strategies, filters and composition
– Sample detection strategies: God Class …
• Code duplication
Detection strategy
• A detection strategy is a metrics-based predicate to identify candidate
software artifacts that conform to (or violate) a particular design rule
Filters and composition
• A data filter is a predicate used to focus attention on a subset of interest of
a larger data set
– Statistical filters
• I.e., top and bottom 25% are considered outliers
– Other relative thresholds
• I.e., other percentages to identify outliers (e.g., top 10%)
– Absolute thresholds
• I.e., fixed criteria, independent of the data set
• A useful detection strategy can often be expressed as a composition of
data filters
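The idea of composing data filters can be sketched as set operations over metric values. Everything specific here is an assumption for illustration: the class names, the metric values, the threshold of 5, and the choice of "high complexity and high coupling" as the rule. It is not a published detection strategy.

```python
from statistics import quantiles


def top_quartile(values):
    """Statistical filter: entities whose value lies above the third quartile."""
    q3 = quantiles(list(values.values()), n=4)[2]
    return {name for name, v in values.items() if v > q3}


def at_least(values, threshold):
    """Absolute filter: entities whose value reaches a fixed threshold."""
    return {name for name, v in values.items() if v >= threshold}


# Hypothetical per-class measurements.
complexity = {"A": 5, "B": 7, "C": 9, "D": 11, "E": 60, "F": 8, "G": 6, "H": 10}
coupling = {"A": 1, "B": 2, "C": 3, "D": 2, "E": 12, "F": 1, "G": 0, "H": 6}

# Composition: a candidate must pass both filters.
candidates = top_quartile(complexity) & at_least(coupling, 5)
print(sorted(candidates))
```

Intersection plays the role of a logical "and" between filters; a union would express "or". The result is a list of candidates for manual inspection, not a verdict.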
God Class
• A God Class centralizes intelligence in the system
– Impacts understandability
– Increases system fragility
Feature Envy
• Methods that are more interested in data of other classes than their own
[Fowler et al. 99]
Data Class
• A Data Class provides data to other classes but little or no functionality
of its own
Shotgun Surgery
• A change in an operation implies many (small) changes to a lot of
different operations and classes
Topics
• Metrics
• Object-Oriented Metrics in Practice
• Code duplication
– Detection techniques
– Visualizing duplicated code
Code is Copied
Example from the Mozilla distribution (Milestone 9)
Extracted from /dom/src/base/nsLocation.cpp

NS_IMETHODIMP
LocationImpl::GetPathname(nsString…
{
  nsAutoString href;
  nsIURI *url;
  nsresult result = NS_OK;

  result = GetHref(href);
  if (NS_OK == result) {
#ifndef NECKO
    result = NS_NewURL(&url, href);
#else
    result = NS_NewURI(&url, href);
#endif // NECKO
    if (NS_OK == result) {
#ifdef NECKO
      char* file;
      result = url->GetPath(&file);
#else
      const char* file;
      result = url->GetFile(&file);
#endif
      if (result == NS_OK) {
        aPathname.SetString(file);
#ifdef NECKO
        nsCRT::free(file);
#endif
      }
      NS_IF_RELEASE(url);
    }
  }

  return result;
}

NS_IMETHODIMP
LocationImpl::SetPathname(const nsString…
{
  nsAutoString href;
  nsIURI *url;
  nsresult result = NS_OK;

  result = GetHref(href);
  if (NS_OK == result) {
#ifndef NECKO
    result = NS_NewURL(&url, href);
#else
    result = NS_NewURI(&url, href);
#endif // NECKO
    if (NS_OK == result) {
      char *buf = aPathname.ToNewCString();
#ifdef NECKO
      url->SetPath(buf);
#else
      url->SetFile(buf);
#endif
      SetURL(url);
      delete[] buf;
      NS_RELEASE(url);
    }
  }

  return result;
}

NS_IMETHODIMP
LocationImpl::GetPort(nsString& aPort)
{
  nsAutoString href;
  nsIURI *url;
  nsresult result = NS_OK;

  result = GetHref(href);
  if (NS_OK == result) {
#ifndef NECKO
    result = NS_NewURL(&url, href);
#else
    result = NS_NewURI(&url, href);
#endif // NECKO
    if (NS_OK == result) {
      aPort.SetLength(0);
#ifdef NECKO
      PRInt32 port;
      (void)url->GetPort(&port);
#else
      PRUint32 port;
      (void)url->GetHostPort(&port);
#endif
      if (-1 != port) {
        aPort.Append(port, 10);
      }
      NS_RELEASE(url);
    }
  }

  return result;
}
How much code is duplicated?
Usual estimate: 8 to 12% of the code

Case study        LOC       Duplication without comments   Duplication with comments
gcc               460'000   8.7%                           5.6%
Database Server   245'000   36.4%                          23.3%
Payroll           40'000    59.3%                          25.4%
Message Board     6'500     29.4%                          17.4%
What is duplicated code?
• Duplicated code = source code segments that are also found elsewhere in the same system
– In different files
– In the same file but in different methods
– In the same method
• The segments must share some logic or structure that can be abstracted
Example 1:
...
computeIt(a,b,c,d);
...
...
computeIt(w,x,y,z);
...
is not considered duplicated code.
Example 2:
...
getIt(hash(tail(z)));
...
...
getIt(hash(tail(a)));
...
could be abstracted to a new function.
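As a small sketch of abstracting a repeated expression, the nested call from the second slide example can be wrapped in one helper. The Python names `tail`, `get_it`, `get_by_tail` and the lookup table are hypothetical stand-ins; the slide does not say what the original functions do.

```python
TABLE = {}


def tail(seq):
    """Everything except the first element (assumed meaning of `tail`)."""
    return seq[1:]


def get_it(key):
    """Assumed lookup behind `getIt`."""
    return TABLE.get(key)


def get_by_tail(seq):
    """The repeated expression getIt(hash(tail(x))) behind a single name."""
    return get_it(hash(tail(seq)))


# Both call sites now share one definition instead of repeating the nesting.
TABLE[hash(tail("xabc"))] = "found"
print(get_by_tail("zabc"))
```

A single call such as `computeIt(a,b,c,d)` offers nothing further to abstract, which is why the first slide example is not counted as duplication.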
Problems with duplication
• Generally has a negative effect
– Code bloat
• Negative effects on the maintenance of the system or software
• Copying is an additional source of defects in the code
– Software aging, “hardening of the arteries”
– “Software entropy” increases: even small design changes become very difficult to effect
Detecting code duplication
Nontrivial problem:
• No a priori knowledge about which code has been copied
• How to find all clone pairs among all possible pairs of segments?
Levels of equivalence:
• Lexical equivalence
• Syntactical equivalence
• Semantic equivalence
General Schema of Detection Process
Source Code → (Transformation) → Transformed Code → (Comparison) → Duplication Data

Author            Level         Transformed Code        Comparison Technique
Johnson 94        Lexical       Substrings              String-Matching
Ducasse 99        Lexical       Normalized Strings      String-Matching
Baker 95          Syntactical   Parameterized Strings   String-Matching
Mayrand 96        Syntactical   Metric Tuples           Discrete comparison
Kontogiannis 97   Syntactical   Metric Tuples           Euclidean distance
Baxter 98         Syntactical   AST                     Tree-Matching
Recall and Precision
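For a clone detector, recall is the share of the actual duplicates that the detector reports, and precision is the share of the reported duplicates that are actual duplicates. A minimal sketch, assuming clone pairs are represented as tuples and using made-up data:

```python
def precision_recall(reported, actual):
    """reported and actual are sets of clone pairs."""
    true_positives = len(reported & actual)
    # Assumption: an empty set counts as a perfect score rather than an error.
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical clone pairs: four real ones, three reported, two of them correct.
actual = {("A", "B"), ("A", "C"), ("D", "E"), ("F", "G")}
reported = {("A", "B"), ("D", "E"), ("X", "Y")}
precision, recall = precision_recall(reported, actual)
print(precision, recall)
```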
Simple Detection Approach (i)
• Assumption:
– Code segments are just copied and changed at a few places
• Noise elimination transformation
– Remove white space, comments
– Remove lines that contain uninteresting code elements (e.g., just ‘else’ or ‘}’)
Before:
…
//assign same fastid as container
fastid = NULL;
const char* fidptr = get_fastid();
if(fidptr != NULL) {
int l = strlen(fidptr);
fastid = new char[ l + 1 ];
…
After:
…
fastid=NULL;
constchar*fidptr=get_fastid();
if(fidptr!=NULL){
intl=strlen(fidptr);
fastid=newchar[l+1];
…
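The noise elimination step can be sketched as a small generator. It is a naive, assumption-laden version: comment markers inside string literals are not recognised, and the list of uninteresting lines is a guess that would need tuning per code base.

```python
import re

# Lines that carry no information once white space is removed (assumed list).
UNINTERESTING = {"", "else", "{", "}", "};"}


def normalize(lines):
    """Yield (line_number, normalized_text) for every interesting line."""
    in_comment = False
    for number, line in enumerate(lines, start=1):
        code = ""
        # Strip /* ... */ comments, which may span several lines.
        while True:
            if in_comment:
                end = line.find("*/")
                if end < 0:
                    line = ""
                    break
                line = line[end + 2:]
                in_comment = False
            else:
                start = line.find("/*")
                if start < 0:
                    break
                code += line[:start]
                line = line[start + 2:]
                in_comment = True
        code += line
        code = re.sub(r"//.*$", "", code)   # strip // comments
        code = re.sub(r"\s+", "", code)     # remove all white space
        if code not in UNINTERESTING:
            yield number, code


sample = [
    "//assign same fastid as container",
    "fastid = NULL;",
    "const char* fidptr = get_fastid();",
    "if(fidptr != NULL) {",
    "  int l = strlen(fidptr);",
    "  fastid = new char[ l + 1 ];",
    "}",
]
for number, text in normalize(sample):
    print(number, text)
```

Keeping the original line number next to each normalized line is what later allows matches to be reported as source locations.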
Simple Detection Approach (ii)
• Code Comparison Step
– Line based comparison
(Assumption: Layout did not change during copying)
– Compare each line with each other line.
– Reduce search space by hashing:
• Preprocessing: Compute the hash value for each line
• Actual Comparison: Compare all lines in the same hash bucket
• Evaluation of the Approach
– Advantages: Simple, language independent
– Disadvantages: Difficult interpretation
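The hashing step can be sketched with a dictionary, which hashes each key and compares only keys that land in the same bucket. The input format (pairs of location and already normalized text) and the sample lines are assumptions of this example.

```python
from collections import defaultdict


def find_duplicate_lines(normalized):
    """normalized: iterable of (location, text) pairs.

    Returns a dict mapping each text seen more than once to its locations.
    """
    buckets = defaultdict(list)
    for location, text in normalized:
        # The dict hashes `text`; equal lines end up under the same key.
        buckets[text].append(location)
    return {text: locs for text, locs in buckets.items() if len(locs) > 1}


lines = [(1, "a=b;"), (2, "foo();"), (3, "a=b;"), (4, "bar();"), (5, "foo();")]
print(find_duplicate_lines(lines))
```

This replaces the comparison of every line with every other line by a single pass over the input. The raw result is just a list of matching lines, which illustrates why the slide lists "difficult interpretation" as the disadvantage.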
A Perl script for C++ (i)
$equivalenceClassMinimalSize = 1;
$slidingWindowSize = 5;
$removeKeywords = 0;
@keywords = qw(if then else);
$keywordsRegExp = join '|', @keywords;
@unwantedLines = qw( else return return; { } ; );
push @unwantedLines, @keywords;

while (<>) {
  chomp;
  $totalLines++;
  # remove comments of type /* */
  my $codeOnly = '';
  while (($inComment && m|\*/|) ||
         (!$inComment && m|/\*|)) {
    unless ($inComment) { $codeOnly .= $` }
    $inComment = !$inComment;
    $_ = $';
  }
  $codeOnly .= $_ unless $inComment;
  $_ = $codeOnly;
  s|//.*$||;   # remove comments of type //
  s/\s+//g;    # remove white space
  s/$keywordsRegExp//og if $removeKeywords;   # remove keywords
  # the loop body continues in part (ii)
A Perl script for C++ (ii)
  # continuation of the while (<>) loop from part (i)
  $codeLines++;
  push @currentLines, $_;
  push @currentLineNos, $.;
  if ($slidingWindowSize < @currentLines) {
    shift @currentLines;
    shift @currentLineNos;
  }
  # print STDERR "Line $totalLines >$_<\n";
  my $lineToBeCompared = join '', @currentLines;
  my $lineNumbersCompared = "<$ARGV>";   # append the name of the file
  $lineNumbersCompared .= join '/', @currentLineNos;
  # print STDERR "$lineNumbersCompared\n";
  if ($bucketRef = $eqLines{$lineToBeCompared}) {
    push @$bucketRef, $lineNumbersCompared;
  } else {
    $eqLines{$lineToBeCompared} = [ $lineNumbersCompared ];
  }
  if (eof) { close ARGV }   # reset line-number count for next file
}
• Handles multiple files
• Removes comments and white space
• Controls noise (if, {, ...)
• Granularity (number of lines)
• Possible to remove keywords
Output Sample
Lines:
create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
create_property(pd,pnElttype,stReference,true,*iEltType);
create_property(pd,pnMinelt,stInteger,true,*iMinelt);
create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
create_property(pd,pnOwnership,stBool,true,*iOwnership);
Locations: </face/typesystem/SCTypesystem.C>6178/6179/6180/6181/6182
</face/typesystem/SCTypesystem.C>6198/6199/6200/6201/6202
Lines:
create_property(pd,pnSupertype,stReference,true,*iSupertype);
create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
create_property(pd,pnElttype,stReference,true,*iEltType);
create_property(pd,pnMinelt,stInteger,true,*iMinelt);
create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
Locations: </face/typesystem/SCTypesystem.C>6177/6178
</face/typesystem/SCTypesystem.C>6229/6230
Lines = duplicated lines
Locations = file names and line numbers
Enhanced Simple Detection
Approach
• Code Comparison Step
– As before, but now
• Collect consecutive matching lines into match sequences
• Allow holes in the match sequence
• Evaluation of the Approach
– Advantages
• Identifies more real duplication, language independent
– Disadvantages
• Less simple
• Misses copies with (small) changes on every line
Abstraction
– Abstracting selected syntactic elements can increase recall, at the
possible cost of precision
Metrics-based detection strategy
• Duplication is significant if:
– It is the largest possible duplication chain uniting all exact clones that
are close enough to each other.
– The duplication is large enough.
Automated detection in practice
• Wettel [MSc thesis, 2004] uses three thresholds:
– Minimum clone length: the minimum number of lines present in a clone (e.g., 7)
– Maximum line bias: the maximum number of lines in between two exact chunks (e.g., 2)
– Minimum chunk size: the minimum number of lines of an exact chunk (e.g., 3)
Mihai Balint, Tudor Gîrba and Radu Marinescu,
“How Developers Copy,” ICPC 2006
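How the three thresholds interact can be sketched with a greedy chaining pass. This is a simplified illustration, not the algorithm from the cited work. The chunk representation `(start_a, start_b, length)`, the rule that chunks are chained in order of their start line, and the decision to measure clone length on the first file are all assumptions.

```python
def chain_chunks(chunks, min_chunk=3, max_bias=2, min_clone=7):
    """Chain exact chunks into clones.

    chunks: (start_a, start_b, length) triples, meaning that `length` lines
    starting at start_a in one file match exactly the lines starting at
    start_b in the other. Returns (start_a, start_b, end_a, end_b) tuples
    with inclusive end lines.
    """
    usable = sorted(c for c in chunks if c[2] >= min_chunk)
    clones = []
    current = None
    for a, b, n in usable:
        end_a, end_b = a + n - 1, b + n - 1
        if current is not None:
            gap_a = a - current[2] - 1
            gap_b = b - current[3] - 1
            if 0 <= gap_a <= max_bias and 0 <= gap_b <= max_bias:
                # Close enough on both sides: extend the current clone.
                current[2], current[3] = end_a, end_b
                continue
            clones.append(tuple(current))
        current = [a, b, end_a, end_b]
    if current is not None:
        clones.append(tuple(current))
    # Keep only chains that are long enough to be significant.
    return [c for c in clones if c[2] - c[0] + 1 >= min_clone]


# Hypothetical exact chunks between two files.
chunks = [(10, 50, 4), (16, 56, 3), (30, 90, 2), (40, 100, 5)]
print(chain_chunks(chunks))
```

In the example the first two chunks are separated by two changed lines and merge into one clone of nine lines. The two-line chunk is too small to count, and the last chunk is exact but shorter than the minimum clone length.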
Visualization of Duplicated Code
• Visualization provides insights into the duplication situation
– A simple version can be implemented in three days
– Scalability issue
• Dotplots: a technique from DNA analysis
– Code is put on the vertical as well as the horizontal axis
– A match between two elements is a dot in the matrix
• Typical dotplot configurations (sample sequences from the figure)
– Exact copies: abcdef abcdef
– Copies with variations: abcdef abxyef
– Inserts/deletes: abcde abxycde
– Repetitive code elements: a x b c x d e x f g x h
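A text-only dotplot takes just a few lines. In this sketch single characters stand in for normalized source lines, and the `*` and `.` symbols are an arbitrary choice.

```python
def dotplot(seq_a, seq_b):
    """Return the match matrix as text, one row per element of seq_a."""
    return "\n".join(
        "".join("*" if x == y else "." for y in seq_b) for x in seq_a
    )


# A sequence compared with itself: the main diagonal always matches,
# and every repeated stretch shows up as an additional diagonal.
sequence = "abcdefabcdef"
print(dotplot(sequence, sequence))
```

For real code the elements would be normalized lines and the matrix would be rendered as an image, since a plot of n lines needs n² cells. That quadratic growth is the scalability issue mentioned on the slide.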
Visualization of Copied Code Sequences
[Figure: dotplot of File A against File B]
• Detected problem
– File A contains two copies of a piece of code
– File B contains another copy of this code
• Possible solution
– Extract Method
All examples are made using Duploc from an industrial case study (1 Mio LOC C++ system)
Visualization of Repetitive Structures
• Detected problem
– 4 object factory clones: a switch statement over a type variable is used to call individual construction code
• Possible solution
– Strategy Method
Visualization of Cloned Classes
[Figure: dotplot of Class A against Class B]
• Detected problem
– Class A is an edited copy of class B (editing and insertion)
• Possible solution
– Subclassing …
Visualization of Clone Families
[Figure: overview and detail dotplots]
20 classes implementing lists for different data types
Conclusion
• Code duplication is a real problem
– It makes a system progressively harder to change
• Detecting code duplication is a hard problem
– Some simple techniques can help
– Tool support is also needed
• Visualization of code duplication is very useful
• Curing code duplication is a topic for research