
Automatic Website Summarization by Image Content: A Case Study with Logo and Trademark Images
Evdoxios Baratis, Euripides G.M. Petrakis, Member, IEEE, and
Evangelos Milios, Senior Member, IEEE
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 20, NO. 9, SEPTEMBER 2008
Date: 2009/10/29
Speaker: Chin-Yen Yang
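Southern Taiwan University of Science and Technology, Department of Computer Science and Information Engineering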
Outline
1. INTRODUCTION
2. IMAGE FEATURE EXTRACTION
3. PROPOSED METHOD
4. EXPERIMENTAL RESULTS
5. CONCLUSIONS
1. INTRODUCTION
We introduce the concept of image-based summarization.
We propose a fully automated image-based summarization approach.
We evaluate the method on corporate Websites.
1. INTRODUCTION (C.)
Logos and trademarks are important characteristic signs of corporate Websites.
A recent study reports that logos and trademarks make up 32.6 percent of all images on the Web.
2. IMAGE FEATURE EXTRACTION
Intensity histogram
Radial histogram
Angle histogram
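The slide only names the three histograms, so the following is a minimal sketch under stated assumptions, not the paper's exact definitions. It assumes a grayscale image given as a list of rows of 0..255 intensities, and it assumes the radial and angle histograms distribute each pixel's intensity by its distance from, and angle around, the image centre. The bin counts are arbitrary illustrative choices.

```python
import math
from typing import List

Image = List[List[int]]  # grayscale image: rows of pixel intensities in 0..255


def _normalize(hist: List[float]) -> List[float]:
    """Scale a histogram so its bins sum to 1 (left unchanged if it is all zeros)."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def intensity_histogram(img: Image, bins: int = 16) -> List[float]:
    """Fraction of pixels that fall into each intensity bin."""
    hist = [0.0] * bins
    for row in img:
        for v in row:
            hist[min(v * bins // 256, bins - 1)] += 1
    return _normalize(hist)


def radial_histogram(img: Image, bins: int = 8) -> List[float]:
    """Assumed definition: intensity mass binned by distance from the image centre."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    max_r = math.hypot(cy, cx) or 1.0  # avoid division by zero for a 1x1 image
    hist = [0.0] * bins
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            r = math.hypot(y - cy, x - cx) / max_r  # normalized radius in [0, 1]
            hist[min(int(r * bins), bins - 1)] += v
    return _normalize(hist)


def angle_histogram(img: Image, bins: int = 8) -> List[float]:
    """Assumed definition: intensity mass binned by angle around the image centre."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    hist = [0.0] * bins
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if y == cy and x == cx:
                continue  # the angle of the centre pixel is undefined
            a = math.atan2(y - cy, x - cx) % (2 * math.pi)  # angle in [0, 2*pi)
            hist[min(int(a / (2 * math.pi) * bins), bins - 1)] += v
    return _normalize(hist)
```

For example, a uniform 4x4 image of intensity 100 puts all of its intensity-histogram mass in bin 6, since 100 * 16 // 256 = 6. Normalizing every histogram to sum to 1 makes images of different sizes comparable.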
2. IMAGE FEATURE EXTRACTION (C.)
2.1 Image Representation
3 PROPOSED METHOD
3 PROPOSED METHOD (C.)
3.1 Image Information Extraction
1. Link information
MaxDepth  1  LinkDepth
Depth 
MaxDepth
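A minimal sketch of the Depth score, assuming LinkDepth is the shortest link distance from the home page counted from 1 (so the home page scores 1.0 and the deepest level scores 1/MaxDepth), and assuming the site's link structure is already available as a dictionary. `link_depths`, `depth_score`, and the example site are hypothetical illustrations.

```python
from collections import deque
from typing import Dict, List


def link_depths(links: Dict[str, List[str]], home: str) -> Dict[str, int]:
    """Breadth-first search from the home page; the home page has LinkDepth 1 (assumed)."""
    depths = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS = shortest link distance
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def depth_score(link_depth: int, max_depth: int) -> float:
    """Depth = (MaxDepth + 1 - LinkDepth) / MaxDepth."""
    if not 1 <= link_depth <= max_depth:
        raise ValueError("expected 1 <= link_depth <= max_depth")
    return (max_depth + 1 - link_depth) / max_depth


# Hypothetical three-level site.
site = {"/": ["/about", "/products"], "/products": ["/products/a"]}
depths = link_depths(site, "/")
max_depth = max(depths.values())
scores = {page: depth_score(d, max_depth) for page, d in depths.items()}
```

Here an image found on "/" would get Depth 1.0 and one found only on "/products/a" would get 1/3, so images closer to the home page weigh more in the later ranking.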
2. Text Information
This information can be displayed together with the images or used for searching the Web.
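The slide does not say which text fields are collected. As an assumption, the sketch below gathers the `src`, `alt`, and `title` attributes of every `<img>` tag using Python's standard `html.parser`; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List


class ImageTextParser(HTMLParser):
    """Collects src, alt and title of each <img> tag (assumed choice of text fields)."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attribute values may be None, so default to "".
        if tag == "img":
            a = dict(attrs)
            self.images.append({
                "src": a.get("src") or "",
                "alt": a.get("alt") or "",
                "title": a.get("title") or "",
            })


def image_text(html: str) -> List[Dict[str, str]]:
    """Return one text record per image found in the page."""
    parser = ImageTextParser()
    parser.feed(html)
    parser.close()
    return parser.images
```

Self-closing tags such as `<img ... />` are also captured, because the default `handle_startendtag` forwards to `handle_starttag`.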
3 PROPOSED METHOD (C.)
3.2 Logo and Trademark Detection
Training the decision tree on features computed from the histograms outperforms training it on the raw histograms.
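The slide does not list which histogram features are used. As an assumption, the sketch below summarizes a histogram by common distribution statistics over the bin index (mean, variance, skewness, kurtosis, energy, entropy), giving the decision tree a short fixed-length vector instead of every raw bin.

```python
import math
from typing import Dict, List


def histogram_features(hist: List[float]) -> Dict[str, float]:
    """Summary statistics of a histogram, treating it as a distribution over bin indices."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    p = [h / total for h in hist]  # normalize to a probability distribution
    mean = sum(i * pi for i, pi in enumerate(p))
    var = sum((i - mean) ** 2 * pi for i, pi in enumerate(p))
    std = math.sqrt(var)
    if std > 0:
        skew = sum(((i - mean) / std) ** 3 * pi for i, pi in enumerate(p))
        kurt = sum(((i - mean) / std) ** 4 * pi for i, pi in enumerate(p))
    else:  # all mass in a single bin: higher moments are undefined, use 0
        skew = kurt = 0.0
    energy = sum(pi * pi for pi in p)
    entropy = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "energy": energy, "entropy": entropy}
```

A flat four-bin histogram `[1, 1, 1, 1]` yields mean 1.5, variance 1.25, energy 0.25, and entropy 2 bits.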
3 PROPOSED METHOD (C.)
Similarity detection
A decision tree decides whether two images are similar using four attributes: three histogram intersections (one for each of the intensity, radial, and angle histograms) and the Euclidean distance between the two images' vectors of moment invariants.
The decision tree was pruned with a confidence value of 0.1 and achieved an average classification accuracy of 93.89 percent.
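The four attributes for a pair of images can be sketched as follows. This assumes each image is a dictionary with hypothetical keys `intensity`, `radial`, `angle` (normalized histograms) and `moments` (a precomputed moment-invariant vector); the trained decision tree that consumes the attributes is not shown.

```python
import math
from typing import Dict, List, Sequence


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_attributes(img_a: Dict[str, List[float]], img_b: Dict[str, List[float]]) -> List[float]:
    """Three histogram intersections plus the moment-invariant distance."""
    return [
        histogram_intersection(img_a["intensity"], img_b["intensity"]),
        histogram_intersection(img_a["radial"], img_b["radial"]),
        histogram_intersection(img_a["angle"], img_b["angle"]),
        euclidean(img_a["moments"], img_b["moments"]),
    ]
```

High intersections and a small moment distance both point toward "similar"; the tree learns thresholds on these four numbers rather than comparing raw pixels.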
3 PROPOSED METHOD (C.)
3.3 Duplicate Logo and Trademark Detection
Image clustering
From each cluster, one image is selected to represent the cluster in the summary.
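The slide does not name the clustering algorithm. As a stand-in, the sketch below groups images into connected components of a pairwise `similar` predicate (single-link clustering with union-find), where `similar` would be the trained similarity decision tree; `cluster_images` and `representatives` are hypothetical names, and the representative is simply the cluster member with the highest score under a caller-supplied key.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_images(images: Sequence[T], similar: Callable[[T, T], bool]) -> List[List[T]]:
    """Connected components of the 'similar' relation, via union-find."""
    parent = list(range(len(images)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            if similar(images[i], images[j]):
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[T]] = {}
    for i, img in enumerate(images):
        groups.setdefault(find(i), []).append(img)
    return list(groups.values())


def representatives(clusters: List[List[T]], key: Callable[[T], float]) -> List[T]:
    """Pick the highest-scoring member of each cluster."""
    return [max(c, key=key) for c in clusters]
```

Because components are transitive, a chain of pairwise-similar images ends up in one cluster even if its two ends are not directly similar; a stricter scheme would be needed if that is undesirable.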
3 PROPOSED METHOD (C.)
3.4 Logo and Trademark Ranking
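Image importance combines three factors:
Probability: the probability that the image is a logo or trademark
Instances: the number of instances of the image
Depth: the link-depth score defined in Section 3.1
$$\text{Image Importance} = \text{Probability} \times \text{Depth} \times \text{Instances}$$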
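The ranking formula can be sketched as below. The `ImageInfo` record is hypothetical; `probability` would come from the detection step, `depth` from the Depth score above, and `instances` is assumed here to be the number of times the image occurs on the site.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageInfo:
    name: str
    probability: float  # probability of being a logo or trademark (detection step)
    depth: float        # Depth score, in (0, 1]
    instances: int      # assumed: number of occurrences of the image on the site


def image_importance(img: ImageInfo) -> float:
    """Image Importance = Probability * Depth * Instances."""
    return img.probability * img.depth * img.instances


def rank_images(images: List[ImageInfo]) -> List[ImageInfo]:
    """Most important images first."""
    return sorted(images, key=image_importance, reverse=True)


logo = ImageInfo("logo.png", probability=0.9, depth=1.0, instances=5)     # 4.5
seal = ImageInfo("seal.png", probability=0.8, depth=1 / 3, instances=3)   # about 0.8
banner = ImageInfo("banner.gif", probability=0.6, depth=0.5, instances=2) # 0.6
ranked = rank_images([banner, seal, logo])
```

The product form means an image must score reasonably on all three factors: a likely logo that appears only once, deep in the site, ranks low.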
3 PROPOSED METHOD (C.)
3.5 Image-Based Summarization
$$\text{Cluster Importance} = \sum_{\text{image}_i \in \text{cluster}} \text{Image Importance}_i$$
4 EXPERIMENTAL RESULTS
4 EXPERIMENTAL RESULTS (C.)
5 CONCLUSIONS
The method first extracts images with a high probability of being logos or trademarks.
It then clusters similar images together and ranks the images in each cluster by importance.
The most important image from each cluster is included in the summary.
5 CONCLUSIONS (C.)
76 percent detection accuracy
85 percent classification accuracy
64 percent summarization accuracy
Future work includes experimenting with larger training data sets and more image types to improve the performance of the machine learning components.