Finding File Clones in FreeBSD Ports Collection Yusuke Sasaki Tetsuo Yamamoto Yasuhiro Hayase Katsuro Inoue Department of Computer Science, Graduate School of Information Science & Technology, Osaka University.
Finding File Clones in FreeBSD Ports Collection
Yusuke Sasaki, Tetsuo Yamamoto, Yasuhiro Hayase, Katsuro Inoue
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
File Clones
* Two or more files with the same content
  * Comments and code indentation are ignored
  * Found inside a project or between different projects
* Research on file clones is scarce
* Goal: gain new knowledge about file clones
Example: Project A and Project B contain the same file

    #include <stdio.h>

    int main() {
        printf("Hello msr!");
        return 0;
    }
FCFinder
* Input: .c and .h files
* Output: file-clone sets
* Faster than other tools

Tool         Files / time                Speed   Machines
CCFinder     1.4M files / 960 hours      x1      1 PC
D-CCFinder   1.4M files / 51 hours       x19     80 PCs
FCFinder     1.4M files / 17.16 hours    x55     1 PC

Detection
Tokenization
MD5 Hash Calculation
Exact Matching
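
The three detection steps above can be sketched as follows. This is a minimal illustration, not FCFinder's actual implementation: the regex-based tokenizer, the function names (`tokenize`, `fingerprint`, `find_file_clone_sets`), and the choice to hash a NUL-joined token stream are all assumptions made for the example.

```python
import hashlib
import re
from collections import defaultdict

# Assumption: a simplified C lexer. Comments are matched first so they can be
# dropped; string and character literals are matched whole so that comment
# markers inside them are left alone.
_LEX = re.compile(
    r'(?P<comment>/\*.*?\*/|//[^\n]*)'
    r'|(?P<string>"(?:\\.|[^"\\\n])*")'
    r"|(?P<char>'(?:\\.|[^'\\\n])*')"
    r'|(?P<word>[A-Za-z_]\w*|\d[\w.]*)'
    r'|(?P<punct>\S)',
    re.DOTALL,
)

def tokenize(source: str) -> list[str]:
    """Step 1: tokenization. Comments and all whitespace are discarded."""
    return [m.group() for m in _LEX.finditer(source) if m.lastgroup != "comment"]

def fingerprint(source: str) -> str:
    """Step 2: MD5 hash of the normalized token stream."""
    joined = "\x00".join(tokenize(source))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def find_file_clone_sets(files: dict[str, str]) -> list[list[str]]:
    """Step 3: exact matching. Files with equal hashes form a file-clone set."""
    groups = defaultdict(list)
    for path, source in files.items():
        groups[fingerprint(source)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]

# Toy input: the second file differs only in comments and indentation.
files = {
    "A/main.c": 'int main() {\n    printf("Hello msr!");\n    return 0;\n}\n',
    "B/main.c": '/* copy */\nint main()\n{\n\tprintf("Hello msr!"); // greet\n\treturn 0;\n}\n',
    "C/main.c": "int main() { return 1; }\n",
}
clone_sets = find_file_clone_sets(files)
```

Because each file is reduced to one hash and matching is a dictionary lookup, the work grows roughly linearly with the number of files, which is one plausible reason a whole-file approach can run on a single PC.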
Experiment
* Target: .c and .h files in the FreeBSD Ports Collection
  * ~1.4M files
  * ~12 GB
  * 17.16 hours (FCFinder run time)
* We measured:
  * File size
  * Number of files in each project
  * Size of each file-clone set
  * Number of file clones in a project
* These values follow a power law
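
One common way to check whether a measurement follows a power law y = c * x^k is a straight-line fit on log-log axes, reporting R^2 as the goodness of fit. The slides do not say how the fit was computed, so the sketch below is an assumption, and the data in it is synthetic rather than taken from the experiment.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log10(y) = log10(c) + k * log10(x).

    Returns (c, k, r2), where r2 is the coefficient of determination
    of the straight-line fit in log-log space.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    syy = sum((b - my) ** 2 for b in ly)
    k = sxy / sxx                      # slope = power-law exponent
    c = 10 ** (my - k * mx)            # intercept = scale factor
    r2 = (sxy * sxy) / (sxx * syy)     # squared correlation
    return c, k, r2

# Synthetic example: y = 1000 * x^-2 exactly, so the fit should be perfect.
xs = [1, 2, 5, 10, 50, 100]
ys = [1000 * x ** -2 for x in xs]
c, k, r2 = fit_power_law(xs, ys)
```

Real measurements scatter around the line, which gives R^2 values below 1.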
File-clone Set Size
[Figure: plot with axes "file clone set size" and "number of file clone sets" (population); R^2 = 0.8508]
Callouts in the figure:
* D, E
* Used in both PHP4 and PHP5
* Left: used in PHP5; Right: used in PHP4
* L: 650 sets; R: 500 sets
* 419 sets
* 120 file clones
* L: 61 file clones; R: 59 file clones
File-clones per Project
[Figure: plot with axes "number of file clones in projects (clones inside a project are excluded)" and "number of projects with file clones"; R^2 = 0.8263]
Callouts in the figure:
* G
* Left: PHP5 modules
* Center: projects related to bin-utils
* Right: PHP4 modules
File-clones Between Projects (1/3)
* Nodes show the projects
* Edges between projects show the number of file clones between two projects
Ex) gcc41 and gfortran share 7691 file clones
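
The edge weights of such a graph could be derived from file-clone sets roughly as below. This is a sketch under stated assumptions: each path is taken to start with its project name (a hypothetical convention), and an edge counts the clone sets that contain files from both projects, which may differ from how the authors counted. The project names and paths are made up.

```python
from collections import Counter
from itertools import combinations

def project_of(path: str) -> str:
    # Assumption: the first path component names the project.
    return path.split("/", 1)[0]

def clone_edges(clone_sets):
    """Count, for each pair of projects, the clone sets they share."""
    edges = Counter()
    for paths in clone_sets:
        projects = sorted({project_of(p) for p in paths})
        # Clone sets confined to one project yield no pairs and are skipped.
        for a, b in combinations(projects, 2):
            edges[(a, b)] += 1
    return edges

# Toy clone sets (hypothetical projects and files).
clone_sets = [
    ["projA/a.c", "projB/a.c"],
    ["projA/b.h", "projA/old/b.h", "projB/b.h"],
    ["projC/x.c", "projD/x.c"],
    ["projA/c.c", "projA/d.c"],
]
edges = clone_edges(clone_sets)
```

Each key of `edges` is an undirected edge between two project nodes, and its value is the edge weight.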
File-clones Between Projects (2/3)
* Nodes show the projects
* Edges between projects show the number of file clones between two projects
File-clones Between Projects (3/3)
* Nodes show the projects
* Edges between projects show the number of file clones between two projects
Conclusions & Future Work
Conclusions
* Measured several features of the FreeBSD Ports Collection
* Found that the measured features follow a power law
Future Work
* Investigate logical coupling between projects