Mining Images of Material Nanostructure Data
Aparna S. Varde, Jianyu Liang, Elke A. Rundensteiner and Richard D. Sisson Jr.
ICDCIT
December 2006
Bhubaneswar, India
1
Introduction
• Data Mining: Process of discovering interesting patterns in data sets
• Mining Scientific Data
• Bioinformatics
• Materials Science
• Nanotechnology
2
Nanotechnology
• Field that involves the design, characterization, production and application of structures, devices and systems by controlling the shape, size, structure and chemistry of materials at the nanoscale level
• Data from nanotechnology
• Images of nanostructures
[Figure: example nanostructure images: carbon nanofibers, cobalt nanowire arrays, silicon nanopore array]
3
Domain-Specific Analysis
• What is the difference in nanostructure at various
locations of a given sample?
• How does the nanostructure evolve at different stages
of a physical / chemical / biochemical process?
• How does processing under different conditions affect
interactions at the same stage of a process?
4
Goals of Analysis in Applications
• Fabrication of biological nanostructures
• Materials for implants in human body
• Building computational tools
• Useful for tutoring, simulation, estimation
• Selection of materials for industrial processes
• Studying smaller samples helps large scale selection
5
Image Mining Techniques
• Clustering
• Similarity Search
[Figure: similarity search example showing a target image and its top 4 matches]
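A minimal sketch of similarity search as a top-k ranking, assuming each image has already been reduced to a numeric feature vector. The feature values, image ids, weights and the weighted Euclidean distance are illustrative assumptions, not taken from the slides.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def top_k_matches(target, database, weights, k=4):
    """Return the ids of the k database images closest to the target."""
    ranked = sorted(
        database,
        key=lambda image_id: weighted_distance(target, database[image_id], weights),
    )
    return ranked[:k]

# Hypothetical feature vectors: (particle size, inter-particle distance, height)
database = {
    "img_a": (10.0, 5.0, 2.0),
    "img_b": (10.5, 5.2, 2.1),
    "img_c": (30.0, 12.0, 6.0),
    "img_d": (11.0, 4.8, 1.9),
    "img_e": (9.0, 5.5, 2.2),
    "img_f": (50.0, 20.0, 9.0),
}
target = (10.0, 5.0, 2.0)

# Equal weights are a placeholder; a learned distance function would supply them.
matches = top_k_matches(target, database, weights=(1.0, 1.0, 1.0), k=4)
print(matches)
```

The quality of the ranking depends entirely on the distance function, which is why the notion of similarity has to be learned rather than fixed in advance.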
6
Challenges in Mining Nanostructure Image Data
• Learning Notion of Similarity
• Defining Interestingness Measures
• Visualizing Mining Results
7
Learning Notion of Similarity
• Some features of images may be more important than others
• Experts at best have subjective notions of similarity
• Need to learn a similarity measure that captures domain semantics
8
Domain Semantics
• Nanoparticle size: Dimension of each particle in the nanostructure
• Inter-particle distance: Distance between particles in 2-D space
• Nanoparticle height: Projection of particles above the surface
• Zoom: Level of magnification of the image
• Location: Part of the sample where the image was taken
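One way to carry these domain semantics alongside each image is a small record type. This is only a sketch: the class name, field names, units and sample values are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class NanostructureImage:
    """Hypothetical per-image record of the domain features listed above."""
    particle_size_nm: float            # dimension of each particle (unit assumed)
    inter_particle_distance_nm: float  # distance between particles in 2-D space
    particle_height_nm: float          # projection of particles above the surface
    zoom: float                        # level of magnification
    location: str                      # part of the sample where the image was taken

    def numeric_features(self):
        """Return the numeric features as a tuple usable by a distance function."""
        return (
            self.particle_size_nm,
            self.inter_particle_distance_nm,
            self.particle_height_nm,
            self.zoom,
        )

# Made-up values, for illustration only.
sample = NanostructureImage(40.0, 110.0, 15.0, 50000.0, "center")
print(sample.numeric_features())
```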
9
Proposed Learning Approach: FeaturesRank
• Given: Training samples with pairs of images and levels
of similarity identified
• Learn: Distance function that incorporates image
features and their relative importance
• Process: Iterative approach
• Use guessed initial distance function
• Compare obtained clusters with training samples
• Adjust function based on error between clusters and samples
• Return distance function with minimal error
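The iterative loop can be sketched roughly as follows. This is not the FeaturesRank algorithm itself: the clustering step is replaced by a simple distance threshold, and the error definition, the weight adjustment heuristic, the parameter values and the training pairs are all assumptions chosen to keep the example short. Features are assumed to be normalized to [0, 1].

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def pair_error(pairs, weights, threshold):
    """Fraction of training pairs whose predicted label disagrees with the given one."""
    wrong = 0
    for a, b, similar in pairs:
        predicted_similar = weighted_distance(a, b, weights) <= threshold
        if predicted_similar != similar:
            wrong += 1
    return wrong / len(pairs)

def learn_weights(pairs, n_features, threshold=0.5, step=0.2, max_iters=50):
    """Iteratively adjust feature weights and keep the set with the lowest error."""
    weights = [1.0] * n_features  # guessed initial distance function
    best_weights = list(weights)
    best_error = pair_error(pairs, weights, threshold)
    for _ in range(max_iters):
        if best_error == 0.0:
            break
        for a, b, similar in pairs:
            predicted_similar = weighted_distance(a, b, weights) <= threshold
            if predicted_similar == similar:
                continue
            diffs = [abs(x - y) for x, y in zip(a, b)]
            i = diffs.index(max(diffs))  # feature with the largest gap
            # Assumed heuristic: shrink the weight if the pair should be closer,
            # grow it if the pair should be farther apart.
            if similar:
                weights[i] = max(0.0, weights[i] - step)
            else:
                weights[i] += step
        error = pair_error(pairs, weights, threshold)
        if error < best_error:
            best_weights, best_error = list(weights), error
    return best_weights, best_error

# Hypothetical training pairs: (features_a, features_b, labeled_similar).
# Features are (size, inter-particle distance, zoom); zoom is meant to matter least.
pairs = [
    ((0.2, 0.3, 0.1), (0.25, 0.3, 0.9), True),
    ((0.2, 0.3, 0.5), (0.9, 0.3, 0.5), False),
    ((0.4, 0.4, 0.2), (0.45, 0.42, 0.8), True),
    ((0.1, 0.8, 0.3), (0.5, 0.2, 0.3), False),
]
learned, error = learn_weights(pairs, n_features=3)
print(learned, error)
```

On this toy data the weight of the third feature is driven down, so the learned weights reflect the relative importance of the features implied by the labeled pairs.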
10
Issues in FeaturesRank
• Defining suitable notion of error
• Proposing weight adjustment heuristics
• Assessing effectiveness of learned distance function
• Addressed in our paper [VRJSL:07]
11
Defining Interestingness Measures
• What is interesting to the user
• Assessment of mining results
• Displaying the answers
• Objective measures for interestingness
• Take into account targeted applications
• Our work on cluster representatives [VRRMS:06]
• Minimum Description Length principle
12
Visualizing Mining Results
• Potential use of Visualization Techniques for Multidimensional Data
• Example: Star glyphs plot for heat transfer curves [VTRWMS:03]
• Vertex: Attribute
• Distance from center of star: Value
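The star glyph encoding (one vertex per attribute, distance from the center proportional to the value) can be sketched as a coordinate computation. The function name, the normalization by a per-attribute maximum and the sample values are assumptions for illustration; drawing the glyph is left to any plotting tool.

```python
import math

def star_glyph_vertices(values, max_values):
    """Map each attribute value to an (x, y) vertex on its own spoke."""
    n = len(values)
    points = []
    for i, (value, max_value) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * i / n   # spokes evenly spaced around the center
        radius = value / max_value    # distance from center encodes the value
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points

# Four hypothetical attributes, each scaled against a maximum of 10.
vertices = star_glyph_vertices([5.0, 10.0, 2.5, 10.0], [10.0, 10.0, 10.0, 10.0])
for x, y in vertices:
    print(round(x, 3), round(y, 3))
```

Connecting the vertices in order gives the star outline, so data items with similar attribute values produce visibly similar shapes.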
13
Related Work
• Similarity Search in Multimedia Databases [KB:04]: Gives an overview of metrics but does not learn a distance function
• Interestingness Measures for Association Rules, Decision Trees [HK:01]: Objective measures that are not directly applicable to our work, though an analogy can be drawn
• XMDV Tool for Visualization of Multivariate Data [W:94]: Possible adaptation in this context
14
Conclusions
• Mining Nanostructure Images
• Domain Specific Analysis
• Targeted Applications
• Biological Nanostructures
• Computational Tools
• Industrial Processes
• Challenges
• Learning Notion of Similarity
• Defining Interestingness Measures
• Visualizing Mining Results
15