Business Information
Systems
Content-based image retrieval –
a medical perspective
Henning Müller
(Thomas Deselaers)
HES SO//Valais
Sierre, Switzerland
Overview
• Introduction/motivation
– Non-medical and medical
• Content-based image retrieval
– Features, techniques, etc.
• Text-based image retrieval, multimodal access
• Relevance feedback
• Applications
• Evaluation
• Demo, questions, answers
Henning Müller
• Diploma in Medical Informatics
– University of Heidelberg (1992-1997)
• Daimler Benz research and technology
– Portland, OR, USA (1997-1998)
• PhD in image analysis (content-based retrieval)
– University of Geneva (1998-2002)
– Part of the work in Melbourne, Australia (2001)
• Medical image analysis and information systems
– University and Hospitals of Geneva (2002-)
• HES SO Valais, Sierre (2007-)
University hospitals of Geneva
• 2,200 beds, 6 hospitals
• >80,000 images produced per day
– All available in digital form
• ~6,000 computers
• Budget >CHF 1 billion/year
• Computerized information
system since the 70s
• Medical Informatics vs.
Infrastructure informatics
Service of medical informatics
• ~70 employees, part of the
radiology department
• ~10 persons in research
• Research areas:
– Multimedia electronic patient record
– Decision support systems
– Telemedicine, especially with African countries
– Knowledge representation, natural language processing, data mining
– Image processing, PACS, operation planning
HES SO
• University of Applied Sciences
Western Switzerland, Sierre
• Business Information Systems
– eHealth section
(retrieval, interoperability, …)
– eServices
– Software development
• [email protected]
MedGIFT project
[Diagram of MedGIFT activities. Labels: data access, standardization; application, implementation; Talisman; @neurIST; KnowARC; infrastructures, computing architectures; fracture retrieval; ImageCLEF; evaluation, validation]
Introduction
Motivation
Some questions!
• Who works in image processing or image
analysis?
• Who works on information or document retrieval?
• Who knows what content-based image retrieval
really is?
• Who has already worked on image retrieval?
• Who has already worked in the medical field?
General Content-based image retrieval
• Amount of visual data produced has risen strongly
– Cheap digital cameras
– Digital images have no per-picture cost, unlike paper prints
• Not everything is well annotated
– Personal collections, journalist archives
– Image sharing on the web
• FlickR, YouTube, …
• Goals of annotation and retrieval do not always
match, some things are hard to express
• Some data can be obtained automatically (GPS…)
Images on FlickR
Goals of visual retrieval
• Retrieval of images by visual (objective) means
– Features extracted fully automatically
• Colours,
• Textures,
• Shapes,
• Interest points, …
• Query by image example(s), QBE
– “find me images visually similar to this one”
Simplified overview
[Diagram: images in the database and user queries are each represented by the same kind of feature list (Colour 1, Colour 2, …, Texture N, …); the image features are stored in the database, and feedback refines the user query]
Problems of image retrieval
• Sensory gap
– Images taken mean a data loss compared to
reality (limited resolution, no 3D, …)
• Semantic gap
– Features automatically extracted may not
correspond to human semantic search categories
• Page zero problem of query formulation
– How can an image for QBE be found?
• Result: use text wherever available!
How we would like to see image retrieval …
… and how it is perceived
Medical image retrieval
• Situation is often different from photos as images
are the first things available for a patient
– No page zero problem, but often no colour
• Search currently mainly by patient name/ID
– Legal constraints limit visual navigation
• Images alone are most often of limited interest
– Impossible to interpret things
– Content vs. context
– Very small region of interest, often rather detection
Content vs. context: healthy patients
25 year old man:
- homogeneous tissue
88 year old man:
- lower mean density
- pre-fibrotic lesions
Image classification vs. image retrieval
• Retrieval
– No or almost no training data
– No clearly defined tasks a priori, relevance for a
user can contain subjectivity
• Classification
– Large amount of training data available
– Limited number of categories are well defined
• Detection
– Objects, concepts, presence, place, size, …
Content-based
retrieval
General system overview
A more detailed view
[Diagram labels: new images, image collection, feature extraction (wavelets, regions, …), storage method, stored index, query engine, feature weighting, feedback algorithm, MRML, interface]
Interface
Query image
Link to the full
size image
User relevance
feedback
Similarity
score
Diagnosis & link
to teaching file
QBIC – Query By Image Content
• IBM, commercial product,
1993
• Add on for DB2
• Simple color, texture,
layout features
• Very simple feedback
Viper/GIFT
• http://viper.unige.ch/
• http://www.gnu.org/software/gift/
• Visual Information Processing for Enhanced
Retrieval, project of the University of Geneva
• The outcome of the project is GIFT and is
open source
– Continuation at several places
Characteristics of GIFT
• MRML-based communication interface
– KMRML in Konqueror
– Plugin for Gimp
– User interfaces in Java, PHP, CGI/perl
• Components can be exchanged relatively easily
– For example the features
– Uses technologies known from text retrieval
• Tools to index directory trees, generate inverted
files, etc.
• Standardized access to visual search engines
– http://www.mrml.net/
• Communication is in XML, server waits at a port
• Component-based structure
<mrml session-id="1" transaction-id="44">
  <query-step session-id="1"
              resultsize="30"
              algorithm-id="algorithm-default">
    <user-relevance-list>
      <user-relevance-element
          image-location="http://viper/1.jpg"
          user-relevance="1"/>
      <user-relevance-element
          image-location="http://viper/2.jpg"
          user-relevance="-1"/>
    </user-relevance-list>
  </query-step>
</mrml>
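The message above can be assembled programmatically. The following is a minimal sketch using only Python's standard library; the element and attribute names are copied from the slide, while the function name `build_mrml_query` and its parameters are hypothetical and not part of MRML or GIFT. Sending the message to a server is not shown.

```python
import xml.etree.ElementTree as ET


def build_mrml_query(session_id, transaction_id, relevance,
                     result_size=30, algorithm_id="algorithm-default"):
    """Build an MRML query-step message as an XML string.

    `relevance` maps image URLs to 1 (relevant) or -1 (non-relevant).
    Function name and signature are illustrative assumptions.
    """
    root = ET.Element("mrml", {
        "session-id": str(session_id),
        "transaction-id": str(transaction_id),
    })
    step = ET.SubElement(root, "query-step", {
        "session-id": str(session_id),
        "resultsize": str(result_size),
        "algorithm-id": algorithm_id,
    })
    rel_list = ET.SubElement(step, "user-relevance-list")
    for url, value in relevance.items():
        # One element per image the user marked in the interface.
        ET.SubElement(rel_list, "user-relevance-element", {
            "image-location": url,
            "user-relevance": str(value),
        })
    return ET.tostring(root, encoding="unicode")


message = build_mrml_query(
    1, 44, {"http://viper/1.jpg": 1, "http://viper/2.jpg": -1})
```

Building the tree with an XML library rather than string concatenation keeps the message well-formed even when image URLs contain characters that need escaping.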
Visual features used
• Global colour histogram (HSV,
18, 3, 3, 4 grey levels)
• Colour blocks at different scales
and locations
• Histogram of Gabor filter responses
– 4 directions, 3 scales, quantized in 9+1 strengths
• Gabor blocks at smallest scale
• ~85’000 possible features, 1’000-3’000 features
per image, distribution similar to words in text
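As an illustration of the global colour histogram, the sketch below quantizes pixels into 18 hues x 3 saturations x 3 values plus 4 grey levels (166 bins). The slide only gives the bin counts; the rule that sends low-saturation pixels to the grey bins, and the threshold value, are assumptions made for this example and are not taken from GIFT.

```python
import colorsys

N_HUE, N_SAT, N_VAL, N_GREY = 18, 3, 3, 4
N_COLOUR_BINS = N_HUE * N_SAT * N_VAL
GREY_SATURATION_THRESHOLD = 0.1  # hypothetical cut-off, not from the slides


def hsv_bin(r, g, b):
    """Map an RGB pixel (components in 0..1) to one of 166 histogram bins."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if s < GREY_SATURATION_THRESHOLD:
        # Nearly colourless pixels go to one of the 4 grey-level bins.
        grey = min(int(v * N_GREY), N_GREY - 1)
        return N_COLOUR_BINS + grey
    hi = min(int(h * N_HUE), N_HUE - 1)
    si = min(int(s * N_SAT), N_SAT - 1)
    vi = min(int(v * N_VAL), N_VAL - 1)
    return (hi * N_SAT + si) * N_VAL + vi


def colour_histogram(pixels):
    """Return a normalized 166-bin histogram for a list of RGB pixels."""
    hist = [0] * (N_COLOUR_BINS + N_GREY)
    for r, g, b in pixels:
        hist[hsv_bin(r, g, b)] += 1
    total = len(pixels)
    return [count / total for count in hist] if total else hist
```

Most bins stay empty for a typical image, which is what makes a sparse, text-like representation attractive.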
The inverted file
Feature 1: Image 5, Image 7, Image 1, Image 25
Feature 2: Image 1, Image 17, Image 3, ...
...: Image 25, Image 17, Image 1, Image 4
Feature n-1: Image 4, Image 5, Image 6, ...
Feature n: Image 2, Image 17, Image 12, Image 3
Inverted file
• Access feature by feature instead of image by
image
• Extremely fast access for rare features
• Efficient for sparsely populated spaces
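A minimal sketch of the idea, assuming each image is described by the set of feature identifiers it contains (the function names and the toy collection are illustrative, not GIFT code):

```python
from collections import defaultdict


def build_inverted_file(image_features):
    """Map each feature id to the set of images that contain it.

    `image_features` is a dict: image id -> iterable of feature ids.
    """
    inverted = defaultdict(set)
    for image_id, features in image_features.items():
        for feature in features:
            inverted[feature].add(image_id)
    return inverted


def candidate_counts(inverted, query_features):
    """Count, per image, how many of the query's features it shares.

    Only the posting lists of the query features are visited, so images
    sharing nothing with the query are never touched.
    """
    counts = defaultdict(int)
    for feature in query_features:
        for image_id in inverted.get(feature, ()):
            counts[image_id] += 1
    return dict(counts)


collection = {"img1": {1, 2}, "img5": {1, 4}, "img17": {2, 3}}
index = build_inverted_file(collection)
shared = candidate_counts(index, [1, 2])
```

A rare feature has a short posting list, which is why access is fastest for exactly the features that discriminate best.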
Feature weighting
• Classical idf
– tf - term frequency
» (number of occurrences of a term in a document)
– cf - collection frequency = document frequency
– j - feature number
– q - query with i=1..N input images
– k - possible result image
– R - relevance of an image in a query
$$\mathrm{relevance}_j = \frac{1}{N} \sum_{i=1}^{N} \left( tf_{ij} \cdot R_i \right) \cdot \log^2\left(\frac{1}{cf_j}\right)$$
$$\mathrm{score}_{kq} = \sum_j \mathrm{relevance}_j$$
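The weighting can be sketched as follows. This is an illustrative reading of the two formulas, not GIFT's implementation: the natural logarithm is used, cf is taken as a fraction in (0, 1], and the sum in the score is assumed to run over the features present in the candidate image k.

```python
import math


def feature_relevances(query_images, cf):
    """Compute relevance_j for every feature seen in the query images.

    `query_images` is a list of (tf, R) pairs: tf maps feature -> term
    frequency in that query image, R is the user relevance of the image.
    `cf` maps feature -> collection frequency (assumed in (0, 1]).
    """
    n = len(query_images)
    weighted_tf = {}
    for tf, r in query_images:
        for j, tf_ij in tf.items():
            weighted_tf[j] = weighted_tf.get(j, 0.0) + tf_ij * r
    return {j: (total / n) * math.log(1.0 / cf[j]) ** 2
            for j, total in weighted_tf.items()}


def score(image_features, relevance):
    """Sum relevance_j over the features of a candidate image (assumption)."""
    return sum(relevance.get(j, 0.0) for j in image_features)


query = [({"a": 0.5, "b": 0.2}, 1), ({"a": 0.1}, -1)]
collection_frequency = {"a": 0.1, "b": 1.0}
rel = feature_relevances(query, collection_frequency)
```

Feature "b" occurs in every image of this toy collection (cf = 1), so its log term is zero and it contributes nothing to any score.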
General views on the concept of feature
• Features are numerical values computed from
each image (continuous values)
– View connected to image classification
– Ideas and methods from classification and
machine learning
– Inspired by k-nearest neighbor approach
• Features are image properties that are present
or absent
– View connected to textual information retrieval
– Ideas and methods from text retrieval
Visual properties of images
• Color
• Texture
• Shapes
• Image parts
• Complete image
• Meta data
• Textual labels/captions/annotations
– Global vs. local
Features
• Global Descriptors
– Color histograms
– Texture Features (Gabor filters, Wavelets, Co-occurrence Matrices)
– Shape Features (Moments, …)
• Local Descriptors
– Direct approach, partitioning
– Patch-histograms / bag-of-visual words
– SIFT features
Color Histograms: Example
RGB color space
HSV color space
Visualization done with 3D Color Inspector: http://rsbweb.nih.gov/ij/plugins/color-inspector.html
Texture Features
“Texture refers to the properties held and
sensations caused by the external surface of
objects received through the sense of touch.”
Only makes sense in homogeneous areas
Texture in image processing:
– Various definitions
– Different representations
Tamura Features
• Proposed by Tamura [1978]
– Features corresponding to human perception
– Examined 6 features, 3 corresponding to human
perception
• Coarseness – coarse vs. fine
• Contrast – high vs. low
• Directionality – directional vs. non-directional
• Linelikeness – line-like vs. non-line-like
• Regularity – regular vs. irregular
• Roughness – rough vs. smooth
Gabor Features
• Obtain several values per pixel denoting
spatial frequencies and directions
Gabor Features
• Windowed Fourier transform with Gaussian
as window function:
Gray-Level Co-Occurrence Matrices
• Statistical descriptor for texture properties of
an image by comparing neighboring pixels
– Direction and distance
– Extract features from matrix
• Entropy
• Contrast
• Correlation
• …
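A small pure-Python sketch of a co-occurrence matrix for one direction and distance, with two of the features listed above. The image is assumed to be a list of rows of integer grey levels already quantized to `levels` values; the function names are illustrative.

```python
import math


def glcm(image, dx=1, dy=0, levels=4):
    """Normalized co-occurrence matrix for pixel pairs offset by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """Large when neighbouring pixels differ strongly in grey level."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def entropy(p):
    """Large when grey-level pairs are spread evenly over the matrix."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(image)  # horizontal neighbours at distance 1
```

Repeating the computation for several (dx, dy) offsets gives the direction and distance dependence mentioned on the slide.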
Pixel Values as Features
• Most straightforward
• Scale all images to a common size
• Compare using e.g. Euclidean distance
– Pixel-wise
• Multi-scale Representations:
Image Distortion Model
• Allow for small local displacements
Shape: GIST descriptor
GIST descriptor
Oliva and Torralba, IJCV 2001
Slide by James Hays and Alexei Efros
Local Descriptors
• Various types
• Features extracted from local regions
• Patches, SIFT features, local color histograms, …
• Extraction position determined by interest points
• Known to achieve good results in many tasks
• Active field of research in object recognition,
detection, scene classification, image annotation
– More recently: image retrieval
Interest Points
Local Descriptors: Direct Retrieval
Histograms of Local Descriptors
Correlation between Features
1: colorhistogram
2: MPEG7: colorlayout
3: LFSIFThistogram
4: LFSIFTsignature
5: LFSIFTglobal search
6: MPEG7: edgehistogram,
7: Gaborvector
8: Gaborhistograms
9: grayvaluehistogram
10: global texturefeature,
11: inv. Feat histo color
12: Lfpatchesglobal
13: LFpatcheshistogram
14: LFpatchessignature,
15: inv. feat historel
16: MPEG7: scalablecolor,
17: Tamura
18: 32x32 image
19: Xx32image.
Correlation Between
Features
Combining Features
• Manually tuned
– Have an ‘expert’ find a proper set of parameters
• Heuristic to capture different image properties
• Combination to reflect human perception
• Combination to obtain optimal performance
(given a set of training queries)
Combining Features to Capture Different Image Properties
• Given the result from the correlation analysis, first choose a simple feature
• Then add features which have low correlation
Color Histogram: 50.5% MAP
+ Global Texture Features: 49.5 % MAP
+ Tamura Texture Histogram: 51.2% MAP
+ Image Thumbnails: 53.9% MAP
+ Patch Histograms: 55.7% MAP
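The selection heuristic can be sketched as a greedy loop: start from one simple feature and add any feature whose correlation with everything already selected stays low. The correlation values, the threshold and the function name below are made up for illustration; they are not the values behind the MAP figures above.

```python
def select_features(correlation, start, threshold=0.5):
    """Greedily pick features that are weakly correlated with those chosen.

    `correlation[a][b]` is the correlation between features a and b.
    The threshold of 0.5 is an arbitrary illustrative choice.
    """
    selected = [start]
    for candidate in sorted(correlation):
        if candidate in selected:
            continue
        if all(abs(correlation[candidate][s]) < threshold for s in selected):
            selected.append(candidate)
    return selected


# Hypothetical correlation matrix between three feature types.
corr = {
    "colour histogram": {"colour histogram": 1.0, "colour layout": 0.8, "tamura": 0.2},
    "colour layout": {"colour histogram": 0.8, "colour layout": 1.0, "tamura": 0.3},
    "tamura": {"colour histogram": 0.2, "colour layout": 0.3, "tamura": 1.0},
}
chosen = select_features(corr, "colour histogram")
```

In practice each addition would still be checked against a set of training queries, since low correlation alone does not guarantee a gain in MAP (the slide shows one addition that lowered it).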
Combining Features
Reflecting Human Perception
Available Resources
• Image Retrieval Systems:
– FIRE – Flexible Image Retrieval Engine
• http://www-i6.informatik.rwth-aachen.de/~deselaers/fire/
• Research image retrieval system
• Developed to allow for easy extension
• Following the continuous approach
– OpenCV – computer vision library
• http://sourceforge.net/projects/opencvlibrary/
• Implements many image processing
operations
• Face detection and recognition, feature
extraction
Efficient access methods to features
• Query time should be below 1 s
– Best below 0.1 s
• Many methods to reduce search space
– PCA, ICP, …
• Database community has many index and
access methods
– Different trees
• Inverted files for sparse feature spaces
Execution times for query per 100 features
[Plot: time (s) against number of features evaluated]
Final top ten of the retrieval
[Plots: rank against number of features evaluated]
Text-based
retrieval
Text retrieval (of images)
• Started in the early 1960s … for images 1970s
• Not the main focus of this talk
• Text retrieval is old!!
– Many techniques in image retrieval are taken from
this domain (reinvented)
• It becomes clear that the combination of visual
and textual retrieval has the biggest potential
– Good text retrieval engines exist in Open Source
Problems with annotation
• Many things are hard to express
– Feelings, situations, … (what is scary?)
– What is in the image, what is it about, what does
it invoke?
• Annotation is never complete
– Plus it depends on the goal of the annotation
• Many ways to say the same thing …
– Synonyms, hyponyms, hypernyms, …
• Mistakes
– Spelling errors, spelling differences (US vs. UK),
weird abbreviations (particularly medical …)
Principal techniques used
• Words follow basically a Zipf distribution
• Tf/idf weightings
– A feature frequent in a document describes it well
– A feature rare in a collection has a high
discriminative power
– Many variations of tf/idf (see also Salton/Buckley
paper)
• Use of inverted files for quick query responses
– Relevance feedback, query expansion, …
Zipf distribution (wikipedia example)
• X - rank
• Y - number of occurrences of the word
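The rank/frequency data behind such a plot can be produced with a few lines. This is a minimal sketch that splits on whitespace only; the function name and the toy sentence are illustrative.

```python
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, occurrences) tuples, most frequent word first."""
    counts = Counter(text.lower().split())
    return [(rank, word, n)
            for rank, (word, n) in enumerate(counts.most_common(), start=1)]


table = rank_frequency("the cat and the dog and the bird")
```

On a large corpus, plotting occurrences against rank on log-log axes gives a roughly straight line, which is the signature of a Zipf-like distribution.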
Techniques used in text retrieval
• Bag of words approach
– Or N-grams can be used
– Stop words can be removed (based on frequency
or list)
• Stemming can improve results
– Stemmers exist for several languages
• Named entity recognition
• Spelling correction (also umlauts, accents, …)
• Mapping of text to a controlled
vocabulary/ontology
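The first of these techniques can be sketched with the standard library alone. The stop word list here is a tiny illustrative sample rather than a real list, and stemming, named entity recognition and ontology mapping are left out because they need language-specific resources.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "of", "the", "with"}  # tiny illustrative list


def tokenize(text):
    """Lower-case the text and split it into word tokens.

    In Python 3, \\w also matches accented letters and umlauts.
    """
    return re.findall(r"\w+", text.lower())


def bag_of_words(text, stop_words=STOP_WORDS):
    """Count the tokens of a text, ignoring word order and stop words."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


def ngrams(tokens, n=2):
    """Return all runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bag = bag_of_words("X-ray images of a tibia with a fracture")
pairs = ngrams(tokenize("Röntgenbilder von Knochenzysten"))
```

N-grams keep some local word order that the plain bag of words discards, at the cost of a much larger and sparser vocabulary.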
Medical terminologies
• MeSH, UMLS are frequently used
– Mapping of free text to terminologies
• Quality for the first few is very high
– Links between items can be used
• Hyponyms, hypernyms, …
– Several axes exist (anatomy, pathology, …)
• This can be used for making a query more
discriminative
• This can also be used for multilingual retrieval
WordNet
• Hierarchy, links, definitions in English language
– Maintained in Princeton
• Car, auto, automobile, machine, motorcar
– motor vehicle, automotive vehicle
• vehicle
– conveyance, transport
» instrumentality, instrumentation
» artifact, artefact
» object, physical object
» entity, something
Apache Lucene
• Open source text retrieval system
– Written in Java
• Several tools available
– Easy to use
• Used in many research projects
Multilingual retrieval
• Many collections are multilingual
– Web, FlickR, medical teaching files, …
• Translation resources exist on the web
– Translate query into document language
– Translate documents into query language
– Map documents and queries onto a common
terminology of concepts
• We understand documents in other languages
Multilingual tools
• Many tools accessible on the web
– Yahoo! Babel fish
– www.reverso.net
– Google translate
• Named entity recognition
• Word-sense disambiguation
Current challenges in text retrieval
• Many taken from the WWW or linked to it
• Analysis of link structures to obtain information
on potential relevance
– Also in companies, social platforms, …
• Question of diversity in results
– You do not want to have the same results show
up ten times on the top
• Retrieval in context (domain specific)
• Question answering
Diversity
Relevance
feedback
User interaction
• Contains not only the user interface but also
several other parts of the system
– Interactivity  Interaction speed
– Relevance feedback
• Positive, negative, excessive use
– Feedback strategies of users
– Long-term analysis of user behavior
– Interaction paradigms (QBE, Browsing)
– Query starting point (text, example, drawing)
Relevance feedback
• Queries based on single keywords or images tell
little about the users’ information needs
– Obtain more information through interaction
• Results of a query are used to refine a query
– “show me similar documents to this one or those
two but not like the other one”
• Query expansion (automatic, manual)
– After a result keywords or features are added
Interpreting feedback
• AND
– Find images that contain something present in all
selected images
• OR
– Find images of one sort or the other
• Mix
– Something in between
– Pseudo image in GIFT, for example
Ways of calculating feedback
• Separate queries for every example image
– Computationally expensive
– Corresponds rather to an OR
• Creation of a single pseudo image with all the
input data
– Quick
– Corresponds rather to an AND
– Well suited for the GIFT-model
Data fusion strategies
• Data from separate queries but also varying
feature sets
• Early fusion
– Features and distances are regarded in the same
feature space
• Late fusion
– Distances are calculated in various feature
spaces and combined afterwards
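Late fusion can be sketched as follows. Distances from each feature space are min-max normalized and then combined as a weighted sum; both the normalization and the weighted sum are one common choice assumed here, not necessarily what any particular system does. All distance lists are assumed to cover the same images.

```python
def late_fusion(distance_lists, weights):
    """Rank images by a weighted sum of per-feature-space distances.

    `distance_lists` holds one dict (image -> distance) per feature space.
    Each list is rescaled to 0..1 so that spaces with large raw distances
    do not dominate. Returns image ids, closest first.
    """
    fused = {}
    for distances, weight in zip(distance_lists, weights):
        lo, hi = min(distances.values()), max(distances.values())
        span = (hi - lo) or 1.0  # avoid division by zero for constant lists
        for image, d in distances.items():
            fused[image] = fused.get(image, 0.0) + weight * (d - lo) / span
    return sorted(fused, key=fused.get)


colour = {"a": 0.0, "b": 10.0, "c": 5.0}   # hypothetical colour distances
texture = {"a": 1.0, "b": 0.0, "c": 0.2}   # hypothetical texture distances
ranking = late_fusion([colour, texture], [0.5, 0.5])
```

Image "c" is not the closest in either space but is reasonably close in both, so it wins once the two are combined.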
Relevance feedback strategies
• Several similar input images improve the query
quality
– BUT: all images are already similar, just a
reordering of the top N
• Negative feedback is extremely important
– Obtain more discriminative information on
features
• Several strategies for negative feedback
– All, or the most different ones, or one per cluster
Relevance feedback strategies
• Experienced users obtain better results
– Better use of feedback (especially negative)
• Automation of feedback strategies
– Positive, few negative, all neg. (low weight)
• Excessive negative feedback can kill results
• Rocchio feedback (1960!)
– Separate calculation of pos. and neg. parts
$$tf_j = \frac{\alpha}{n_1} \sum_{i=1}^{n_1} R_i \, tf_{ij} - \frac{\beta}{n_2} \sum_{i=1}^{n_2} R_i \, tf_{ij}$$
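A minimal sketch of Rocchio-style feedback with separately averaged positive and negative parts. The default values of alpha and beta are illustrative only, R is treated as a non-negative strength for both groups, and the function name is hypothetical.

```python
def rocchio_query(positives, negatives, alpha=0.65, beta=0.35):
    """Build query feature weights from positive and negative examples.

    Each example is a (tf, R) pair: tf maps feature -> term frequency,
    R >= 0 is the strength of the user's judgement. The positive part is
    averaged over the positives and scaled by alpha; the negative part is
    averaged over the negatives, scaled by beta, and subtracted.
    """
    query = {}
    for examples, factor in ((positives, alpha), (negatives, -beta)):
        if not examples:
            continue
        for tf, r in examples:
            for j, tf_ij in tf.items():
                query[j] = query.get(j, 0.0) + factor * r * tf_ij / len(examples)
    return query


pos = [({"a": 1.0, "b": 0.5}, 1.0), ({"a": 0.5}, 1.0)]
neg = [({"b": 1.0}, 1.0)]
q = rocchio_query(pos, neg, alpha=1.0, beta=0.5)
```

Because each part is divided by its own example count, marking many negative images does not automatically outweigh a few positive ones; beta controls how much the negative part is allowed to pull the query.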
Long-term learning
• Analyze user behaviour over longer period
• Logfiles with interaction are stored for example in
MRML
• Analyze images that are marked together as
relevant or non-relevant in the same query step
(concentrate on pairs)
– This can lead to image correlations
• We want to learn on a feature basis to be more
general
– On an image basis can help if much feedback is
available
Combinations of images
Comparison with market basket analysis
• Market basket analysis (MBA)
– Items bought together in a supermarket
– Large data sets exist in supermarkets
– Impossible to evaluate all possible sets
– Data reduction is necessary
• Association rules are the goal
– Which set of items implies another set of items
– Best are frequent buys and high probability
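The two quantities behind "frequent buys" and "high probability" are usually called support and confidence. A minimal sketch on a toy set of baskets (the data and function names are illustrative):

```python
def support(baskets, items):
    """Fraction of baskets that contain all the given items."""
    items = set(items)
    return sum(items <= set(basket) for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    joint = support(baskets, set(antecedent) | set(consequent))
    base = support(baskets, antecedent)
    return joint / base if base else 0.0


baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

In the image retrieval analogy, a basket is one query step and the items are the images marked together in it.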
Factor for feature relevance
• Based on probability for association rules
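$$P(I_a \Rightarrow I_b) = \frac{f(I_a, I_b)}{f(I_a)}$$
$$\mathrm{factor}_j = P(F_j \text{ is a good feature}) = \frac{f_{F_j}(I_a, I_b)}{f_{F_j}(I_a, I_b) + f_{F_j}(I_a, I_b) + f_{F_j}(I_a, I_b)}$$
(the marks on I_a and I_b that distinguish the three denominator terms did not survive in this transcript)
$$\mathrm{weight}_j = \mathrm{factor}_j \cdot \frac{1}{N} \sum_{i=1}^{N} tf_{ij} \, R_i \, \log^2\left(\frac{1}{cf_j}\right)$$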
A hierarchy for learning
Other techniques
• Image browsing (target search)
– PicHunter system
– Maximize information gain in each step to find a
known image in a collection
• Changes of feature sets during feedback
– First results seem to be good
– Increases discriminative power
Another simple interface
irrelevant
unjudged
relevant
History of feedback
Interfaces
• 3D Browsing and Searching Interface in
MARS
Interfaces
• Video Search and Retrieval Interface
Nguyen et al. ACM Trans. MM 2008
Collection guide
Comparison of several feedback
techniques (Wang database)
Medical
applications
Talisman
• Texture analysis in lung CT images to aid
diagnosis of interstitial lung diseases
– Set of ~150 diseases with unspecific symptoms
– Often hard for the non-specialist
• Database of lung CTs and clinical data is
acquired
– 100 characteristics, based on expert systems
• Combine visual and clinical parameters for
retrieval
Dataset (increasing)
• Extracted from a multimedia database of ILDs containing 96 patients and 1’104 ROIs (1/2008)
– 100 clinical parameters
• Clinical attributes have ~35% of the values non-defined
• 736 ROIs in HRCT scans from 56 patients representing 5 classes of lung tissue were selected:
– Healthy: 63 ROIs from 5 patients
– Emphysema: 58 ROIs from 4 patients
– Ground glass: 148 ROIs from 14 patients
– Fibrosis: 312 ROIs from 28 patients
– Micronodules: 155 ROIs from 5 patients
Some results
Casimage – a radiological
case database
• Case database, especially for teaching
• >100’000 images, 9’000
externally accessible and
anonymized
• Case descriptions (textual)
available in XML
– Very varying quality
– Mix of French and English
• Interface is compatible with the MIRC (Medical Imaging Resource Center) standard of the RSNA
Fracture retrieval
• Database with >20’000 images
– Before and after interventions, sometimes long term
• Assistance for treatment planning
– Goal is to find similar cases
• Based on several images (frontal, lateral), place of
fracture, complexity of fracture but also patient data:
age, weight, …
– What is the best method?
• Screw, plate, …
– Local features required
• Salient points …
@neurIST
• EU project (IP) with 32 partners on aneurysm
treatment
• Multimodal data analysis and fusion of data from
heterogeneous sources
– Genes, proteins, cells, tissue/organs, individual,
population
• Data collection
is important
• Rare disease
@neurIST – our role
• Create architecture for data acquisition and communication
• Constraints
– Security, legal issues
– Political issues
– Work internationally
• Normalization
• Goal is a reusable architecture to access data in the patient record also for other research projects
[Diagram: Overall architecture and key functions of the interface between the Clinical Information Systems and the @neurIST grid (WP2.6, June 2, 2006)]
Clinical Information System <-SOAP-> CIS-GRID mediator <-SOAP-> grid
CIS-GRID mediator components: CIS interface, CRIM translation rules, normalization & denormalization services, deidentification & reidentification services, ID database, grid interface
From CIS to mediator:
- add_new_patient()
- remove_patient()
- add_data_for_patient()
- request_grid_service_for_patient()
- set_translation_rule()
From mediator to CIS:
- store_data_in_CIS_for_patient()
From grid to mediator:
- get_data_for_case()
- add_data_for_case()
- get_list_of_cases()
- get_last_modifications_for_case()
From mediator to grid:
- request_grid_service_for_case()
CRIM = Clinical Reference Information Model
KnowARC
• NorduGrid ARC (Advanced Resource Connector) is
a middleware of several Nordic countries
• High energy physics community
– Linked with CERN, LHC
• For analyzing medical images we could
use much computing power
• Goal: Use 6’000 computers
of the hospitals with an
easy-to-use middleware
• Problems: Security, Politics
Desktop grid via virtualization
• Hospital PCs are centrally managed Windows
machines (~6’000)
– Barely used at night … and day
• Use case similar at several institutions
– State of Geneva, University
• Acquire knowledge on grids and think
about new solutions
– Independent of computing restrictions
• Image retrieval is easy to parallelize
A small hospital grid
• Virtual machine distributed automatically
including a Linux image
– Active directory based solution in the hospitals
– PCs in a seminar room, plus own PCs
• User can stop the virtual machines when he
needs his computer fully
– New PCs are barely slowed down
• Smaller time overhead internally than externally
– Internal and external computation with the same
interface
Visual problems
logo
text
specific
problems
Large parts without information
Document image search
• http://153.109.124.56:8080/TB_IVAN/
• Collaboration with WHO
[Diagram: .doc, .xls, .pdf, .ppt documents are split into text, handled by Lucene, and images, handled by GIFT]
ASSERT (1997-2000)
IRMA – Image annotation
myPACS
MedTing
Goldminer
Google image search (faces)
Evaluation
Information retrieval evaluation
• Started very early (1960s, in part as a theoretical
discipline …)
– Cranfield tests, Smart
• TREC became a role model for benchmarks with
many spin-offs (TRECVID, CLEF, …)
– Yearly cycle of events
– Relevance-based evaluations, …
– Mainly system-oriented evaluation
• Still much can be criticized
– Measures, interactive retrieval, …
A yearly cycle
[Diagram: Call for participation → Task definition → Document procurement → Topic definition → IR experiments → Relevance assessments → Results evaluation → Results analysis → TREC conference → Proceedings publication]
Visual retrieval evaluation
• Little systematic evaluation in first years of
research (1990-2000)
– Some papers on methodologies
– Benchathlon to foster discussions
• Since then, evaluation has come a long way …
• TRECVID, ImageCLEF, INEX MM, ImageEval, …
– Improvement in performance can be shown
– Techniques can be compared
• Methodologies and user models can be criticized
– Not all research can be benchmarked
– Innovation instead of pure performance
Axes for evaluation
• Databases
• Tasks
– Including experts for relevance judgements
• Participants
– Techniques to compare
• Ground truth, gold standard
• Performance measures
Problems of retrieval benchmarks
• Funding
• Access to datasets
• Motivate participation
• Partners from industry
• Realistic tasks and user models
• Ground truthing (costly, ambiguous)
• Organisational issues
• Proving advances and benefits
CLEF - ImageCLEF
• Cross Language Evaluation Forum
– Started as track in TREC (Text Retrieval Conference, 1997)
• Independent workshop since 2000
• Multilingual information retrieval
– Collections are multilingual
– Queries are in a language different from the
collection
• Good framework, registration, legal issues,
proceedings in Springer LNCS, …
History ImageCLEF
• 2003: first image retrieval task, 4 participants
• 2004: 17 participants for three tasks (~200 runs)
– Medical task for visual image retrieval added
• 2005: 24 participants for four tasks (~300 runs)
– Two medical tasks
• 2006: 30 participants for four tasks (~300 runs)
– LTU database of objects for object classification
• 2007: 35 participants (>1000 runs)
– Hierarchical classification
• 2008: 45 participants submitted results (>2000 runs)
– 63 registrations, wiki task
ImageCLEF 2008
• ImageCLEF/Quaero workshop on image retrieval
evaluation
– To motivate visual retrieval community
• Ad-hoc retrieval with query in different language
– Photo collection, vacation pictures of an agency
• Concept detection task
• Medical Retrieval task
– Collection of ~70’000 images with annotations
• Medical classification task
– Hierarchical classification
• Wikipedia retrieval task
• Interactive retrieval (using a FlickR API)
Tasks and topic definitions
• Realistic!!
– Based on independent expert opinions
– Based on surveys (Portland, Geneva)
– Based on log files (Health On the Net media search, MEDLINE)
• Retrieval with varying degree of visualness
– A little subjective
• Afterwards analysis of results per task
– Analyze ambiguity for judges (double judgments)
• Kappa analysis
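The slide does not say which kappa statistic is used; the sketch below assumes Cohen's kappa for two judges, which corrects the observed agreement for the agreement expected by chance. The judgement labels are made up.

```python
from collections import Counter


def cohen_kappa(judge_a, judge_b):
    """Cohen's kappa for two equally long lists of categorical judgements."""
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    freq_a, freq_b = Counter(judge_a), Counter(judge_b)
    # Chance agreement: both judges pick the same category independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical category
    return (observed - expected) / (1 - expected)


first = ["relevant", "relevant", "non-relevant", "non-relevant"]
second = ["relevant", "non-relevant", "non-relevant", "non-relevant"]
kappa = cohen_kappa(first, second)
```

A value near 1 indicates that the two judges agree far more often than chance would predict; a value near 0 indicates agreement no better than chance.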
Task examples
1.4
Show me x-ray images of a tibia with a fracture.
Zeige mir Röntgenbilder einer gebrochenen Tibia.
Montre-moi des radiographies du tibia avec fracture.
Task examples
3.6
Show me x-ray images of bone cysts.
Zeige mir Röntgenbilder von Knochenzysten.
Montre-moi des radiographies de kystes d'os.
Ground truthing
• Retrieval
– Expensive task with real users!
• Funding from NSF, help from participants
– Pooling is used with varying number depending on
submissions
– Judgment scheme: relevant – partially – nonrelevant
• Describe all categories exactly!!
– Double judgments to analyze ambiguity
• Good systems stay good with any judge
• Interactive
– Participants evaluate themselves (time, Nrel)
Evaluation
• Categories for media used
– Visual, textual, mixed
• Categories for interaction used
– Automatic, feedback, manual modification
• Still: Mean Average Precision as a lead measure
– Correlates very well with other measures
– BPref, P(10-50) used for comparison
• Many ideas on how to find better measures
– No resources to pursue this
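For reference, mean average precision and precision at a cut-off can be computed as in this minimal sketch with binary relevance. The partially relevant category would first have to be mapped to relevant or non-relevant; the function names and toy runs are illustrative.

```python
def average_precision(ranking, relevant):
    """Average of the precision values at the rank of each relevant item.

    Relevant items that are never retrieved contribute zero.
    """
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of the average precision over (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def precision_at(ranking, relevant, k):
    """Fraction of the first k results that are relevant, e.g. P(10)."""
    return sum(doc in relevant for doc in ranking[:k]) / k


runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
map_score = mean_average_precision(runs)
```

Early precision such as P(10) looks only at the top of the list, while MAP also penalizes relevant images that appear late or not at all.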
MAP and other measures
Workshop
• Event for discussions among participants
– Mix visual and text retrieval communities
– Learn from results of others
• Oral presentations are selected based on
novelty of techniques not on performance
• Every participant can present a poster
• Presentation of the main findings
• Feedback is very positive and participants do
not regret their participation
Example from the database 2008
ImageCLEFmed 2008
• Images and full-text articles of Radiology/
Radiographics (thanks to the RSNA!)
– Captions of the figures with detailed information
on the figures, subfigures
– The kind of data that clinicians search
• Detailed search tasks may not be the most
common for diagnosis, rather teaching
• More adapted for text retrieval, image analysis
has to be done with care
Some results
• Visual retrieval has often good early precision
but poor recall
• Visual features can be useful for specific
queries
– This can be detected more or less automatically
• Multimodal retrieval has most potential
• Visual classification has improved significantly
• Relevance feedback and interactive retrieval
are rarely used (lack of manpower)
ImageCLEFmed 2009
• Search for similar cases in the literature
– Several sorts of images (xray, CT, MRI)
– Use incomplete data (no textual information on
modality, pathology)
– Much more realistic scenario! Clinician in the
process of solving a difficult case
• Hard task: text processing might not work
– Fusion of very varied data is an important topic
Demo
questions
answers
Demo of GIFT
• http://medgift.unige.ch
Conclusions
• Image retrieval has an important potential
– Still, there are many challenges
– It will not replace but complement text retrieval
• The medical domain faces many challenges of
the non-medical retrieval field
– But there are some advantages
– … and a strong motivation to learn to communicate
with clinicians
Future work
• Include 3D and 4D datasets into the analysis
– Massive data reduction is required
• Detection of abnormalities
– Dissimilarity retrieval?
– Region of interest is most often very small
• Case-based retrieval instead of image-based
retrieval
– Multimodal data analysis, incomplete data
Abbreviations
• CBIR - Content-based image retrieval
• CT - Computed Tomography
• GIFT - GNU Image Finding Tool
• GNU - GNU's Not Unix
• GUI - Graphical User Interface
• HES SO - Haute Ecole Spécialisée de Suisse occidentale
• HRCT - High Resolution Computed Tomography
• HSV - Hue, Saturation, Value
• ID - Identification
• ILD - Interstitial Lung Disease
• MBA - Market Basket Analysis
• MeSH - Medical Subject Headings
• MRML - Multimedia Retrieval Markup Language
• PACS - Picture Archiving and Communication System
• QBE - Query by Example(s)
• ROI - Region of Interest
• UK - United Kingdom
• UMLS - Unified Medical Language System
• US - United States
• VIPER - Visual Information Processing for Enhanced Retrieval
• XML - eXtensible Markup Language