Mathematical Challenges in Scientific Data Mining
IPAM 14-18 January, 2002
Mining Turbulence Data
Ivan Marusic
Department of Aerospace Engineering and Mechanics
University of Minnesota
Collaborators: Victoria Interrante, George Karypis, Vipin Kumar, Graham Candler, Ellen Longmire, Sean Garrick
Acknowledgement: National Science Foundation
Turbulent Boundary Layer
[Figure: flow visualization using aluminum (Al) flakes in a water channel; flow direction and solid surface indicated]
Outline
• Turbulent boundary layers: introduction and background
  - Need for both simulation and experimental datasets
• Visualization and feature extraction
  - What are the important features?
  - What is to be “data mined”?
• Difficulties with present analysis approach
• New analysis strategy to investigate causal relationships
• Data mining issues and challenges
Turbulent Boundary Layer
[Figure: boundary layer over a solid surface; flow direction indicated]
• Responsible for heat transfer, skin friction (drag), and the mixing of scalars
Issues in wall turbulence
• Described by the Navier-Stokes equations (non-linear PDEs)
• Direct numerical simulation (DNS) is restricted to low Re (Reynolds number)
  - Re = ratio of inertial to viscous forces: Re = δU_τ/ν (δ: boundary-layer thickness, U_τ: friction velocity, ν: kinematic viscosity)
  - No. of simulation grid points ~ Re^(9/4); cost ~ Re^3
  - Present simulations: Re = O(10^3); required: Re = O(10^6)
• Experimental datasets are also needed to investigate high-Re flows
• A better understanding of the physics and causal relationships would lead to more accurate modeled simulation tools (CFD) and analytical scaling laws
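The two scaling laws above can be turned into a quick ratio. This sketch uses only the exponents quoted on the slide; the proportionality constants are unknown, so only ratios between Reynolds numbers are meaningful. The helper name `dns_scaling_ratios` is hypothetical.

```python
def dns_scaling_ratios(re_now: float, re_target: float) -> tuple[float, float]:
    """Return (grid-point ratio, cost ratio) when moving from re_now to re_target.

    Uses only the scaling exponents from the slide:
    grid points ~ Re^(9/4), cost ~ Re^3.
    """
    ratio = re_target / re_now
    grid_ratio = ratio ** (9 / 4)  # N_grid ~ Re^(9/4)
    cost_ratio = ratio ** 3        # cost ~ Re^3
    return grid_ratio, cost_ratio


if __name__ == "__main__":
    grid, cost = dns_scaling_ratios(1e3, 1e6)
    print(f"grid points: x{grid:.2e}, cost: x{cost:.2e}")
```

Going from Re = O(10^3) to O(10^6) multiplies the grid-point count by 10^6.75 (roughly 5.6 million) and the cost by 10^9, which is why high-Re data must come from experiments.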
What features do we extract?
• Flow-field information is a function of (x, y, z, t):
  velocity u, pressure p, temperature θ, etc.
• Good candidates: coherent vortex structures
  - Vortex identification using the velocity gradient tensor
  - Flow topology classification
[Figures (DNS data, Re = 700): isosurfaces and volume-rendered visualizations of enstrophy and of the discriminant; discriminant at decreasing threshold levels; cross-section of the “blue” vortex]
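The discriminant and swirl-strength fields can be evaluated pointwise from the velocity gradient tensor. This is a minimal sketch assuming the commonly used criterion for incompressible flow: with A = ∂u_i/∂x_j and tr A = 0, the eigenvalues of A solve λ^3 + Qλ + R = 0 with Q = -½ tr(A²) and R = -det A, and a complex-conjugate eigenvalue pair (locally swirling flow) occurs when D = (Q/3)^3 + (R/2)^2 > 0. The swirl strength λ_ci is taken as the imaginary part of that pair. The slides do not give these formulas, and the function name is hypothetical.

```python
import numpy as np


def discriminant_and_swirl(grad_u):
    """Return (D, lambda_ci) for a 3x3 velocity gradient tensor A[i, j] = du_i/dx_j.

    Assumes incompressible flow (trace of A = 0), so the characteristic
    equation reduces to lambda^3 + Q*lambda + R = 0.
    """
    a = np.asarray(grad_u, dtype=float)
    q = -0.5 * np.trace(a @ a)  # second invariant when tr(A) = 0
    r = -np.linalg.det(a)       # third invariant
    # D > 0 <=> one real and two complex-conjugate eigenvalues (swirling flow)
    disc = (q / 3.0) ** 3 + (r / 2.0) ** 2
    # Swirl strength: magnitude of the imaginary part of the complex pair
    lambda_ci = float(np.max(np.abs(np.linalg.eigvals(a).imag)))
    return float(disc), lambda_ci
```

For solid-body rotation the test gives D > 0 and λ_ci equal to the rotation rate; for pure strain D < 0 and λ_ci = 0, so thresholding either field picks out vortex cores.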
EXPERIMENTAL WIND TUNNEL FACILITY
PIV SETUP
• Pulsed Nd:YAG lasers
• Kodak Megaplus cameras (1024 × 1024 pixels)
[Figure: PIV setup (θ = 15); in-plane vorticity and in-plane swirl fields]
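Planar PIV measures only the two in-plane velocity components, so vorticity and swirl have to come from the 2×2 in-plane gradient tensor. This is a sketch assuming a uniform grid and central differences via `numpy.gradient`; the in-plane swirl is taken, by analogy with the 3D case, as the imaginary part of the complex eigenvalues of the 2×2 tensor. The axis names (x1, x2) and the function name are hypothetical.

```python
import numpy as np


def in_plane_vorticity_and_swirl(u1, u2, d1, d2):
    """Out-of-plane vorticity and 2D swirl strength from a planar velocity field.

    u1, u2: in-plane velocity components on a uniform grid indexed [i2, i1]
            (rows along x2, columns along x1); d1, d2: grid spacings.
    """
    du1_dx2, du1_dx1 = np.gradient(u1, d2, d1)
    du2_dx2, du2_dx1 = np.gradient(u2, d2, d1)
    # Vorticity component normal to the (x1, x2) plane (right-handed axes)
    vorticity = du2_dx1 - du1_dx2
    # Eigenvalues of [[du1/dx1, du1/dx2], [du2/dx1, du2/dx2]] are complex
    # when tr^2 - 4 det < 0; their imaginary part is sqrt(4 det - tr^2) / 2.
    tr = du1_dx1 + du2_dx2
    det = du1_dx1 * du2_dx2 - du1_dx2 * du2_dx1
    disc = tr ** 2 - 4.0 * det
    swirl = np.where(disc < 0.0, 0.5 * np.sqrt(np.abs(disc)), 0.0)
    return vorticity, swirl
```

Unlike vorticity, the swirl field is zero in regions of pure shear, which is why it isolates vortex cores more cleanly near the wall.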
Difficulties with present analysis approach
Typical Turbulent Boundary Layer Simulation
• O(10^8) grid points
• Generates >10 terabytes per day (every day)
• Writes to disk only once every 1000 time steps (99.9% discarded)
• Final database ~1 terabyte
• All analysis is done after the final database is obtained
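The storage figures can be cross-checked with simple arithmetic. The number of stored fields and the precision below are assumptions, not stated on the slide.

```python
# Back-of-envelope storage estimate for the simulation described above.
# ASSUMPTIONS (not from the slides): 4 stored fields per grid point
# (u, v, w, p), each in 8-byte double precision.
GRID_POINTS = 10 ** 8      # O(10^8) grid points (from the slide)
FIELDS = 4                 # assumed: u, v, w, p
BYTES_PER_VALUE = 8        # assumed: float64
SAVE_EVERY = 1000          # one snapshot written per 1000 time steps
DAILY_BYTES = 10 * 10 ** 12  # >10 TB generated per day (from the slide)

snapshot_bytes = GRID_POINTS * FIELDS * BYTES_PER_VALUE
fraction_discarded = 1 - 1 / SAVE_EVERY
steps_per_day = DAILY_BYTES // snapshot_bytes
snapshots_per_terabyte = 10 ** 12 // snapshot_bytes

print(f"snapshot size: {snapshot_bytes / 1e9:.1f} GB")
print(f"fraction of time steps discarded: {fraction_discarded:.1%}")
print(f"time steps per day implied by 10 TB/day: {steps_per_day}")
print(f"snapshots in a ~1 TB database: {snapshots_per_terabyte}")
```

Under these assumptions a snapshot is 3.2 GB, 10 TB per day corresponds to about 3125 time steps, and a ~1 TB final database holds roughly 312 snapshots, so intermittent events between saved steps are never seen.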
[Diagram: present analysis approach vs. new analysis approach]
Some important trigger events associated with drag:
• “Bursting”
• High values of Reynolds shear stress (-u’w’), associated with momentum transport
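High -u'w' events can be flagged with a simple threshold test, the kind of check that could run while a simulation is advancing rather than after the database is written. This sketch swaps in standard quadrant analysis with a hole size H (ejections: u' < 0, w' > 0 and |u'w'| > H u_rms w_rms). It is not necessarily the trigger used by the authors, and the default H = 4 is an arbitrary illustrative value.

```python
import numpy as np


def q2_burst_trigger(u_fluct, w_fluct, hole_size=4.0):
    """Flag strong ejection (Q2) events by quadrant analysis with a hole size H.

    u_fluct, w_fluct: streamwise and wall-normal velocity fluctuations
    (arrays of equal shape). Returns a boolean mask where
    u' < 0, w' > 0 and |u'w'| > H * u_rms * w_rms.
    """
    u = np.asarray(u_fluct, dtype=float)
    w = np.asarray(w_fluct, dtype=float)
    u_rms = np.sqrt(np.mean(u ** 2))
    w_rms = np.sqrt(np.mean(w ** 2))
    threshold = hole_size * u_rms * w_rms
    return (u < 0) & (w > 0) & (np.abs(u * w) > threshold)
```

A snapshot could then be saved, or analyzed in place, only when the mask is non-empty; strong sweeps (Q4: u' > 0, w' < 0) would need a second, analogous test.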
Example of bursting events
[Figure panels (dataset 20Apr_06 zone1): swirl (|λ_ci|), vorticity, Reynolds shear stress, and wall-normal velocity; N.B. the high -u’w’ region]
Consistent with “packets of vortices” (together with other evidence).
SIMPLE SEARCH ALGORITHM

Dual-threshold search routine

Connected regions are defined using the 8 neighboring points of each grid point (8-connectivity)
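The dual-threshold routine can be read as hysteresis thresholding: a region is seeded at points above an upper threshold and grown through 8-connected neighbors above a lower threshold. That interpretation, the function name, and the threshold parameters are assumptions; the slide gives no values.

```python
from collections import deque

import numpy as np


def dual_threshold_regions(values, t_low, t_high):
    """Hysteresis (dual-threshold) segmentation of a 2D field with 8-connectivity.

    A region starts at any point >= t_high and grows through its 8 neighbors
    as long as they are >= t_low. Returns a list of regions, each a list of
    (row, col) indices in breadth-first order.
    """
    f = np.asarray(values, dtype=float)
    n_rows, n_cols = f.shape
    visited = np.zeros(f.shape, dtype=bool)
    regions = []
    for i in range(n_rows):
        for j in range(n_cols):
            if visited[i, j] or f[i, j] < t_high:
                continue  # already assigned, or too weak to seed a region
            region = []
            queue = deque([(i, j)])
            visited[i, j] = True
            while queue:
                ci, cj = queue.popleft()
                region.append((ci, cj))
                # Visit the 8 neighbors, including diagonals
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = ci + di, cj + dj
                        if (di or dj) and 0 <= ni < n_rows and 0 <= nj < n_cols:
                            if not visited[ni, nj] and f[ni, nj] >= t_low:
                                visited[ni, nj] = True
                                queue.append((ni, nj))
            regions.append(region)
    return regions
```

The lower threshold keeps weak tails attached to strong cores, while points that only exceed the lower threshold never start a region of their own.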


To search for “packets of hairpin vortices”, define a region if there is
- positive vorticity at the bottom and
- negative vorticity at the top.
Additionally, search for
(a) low streamwise velocity (low momentum) and
(b) high Reynolds shear stress
in the region adjoining the patches of vorticity.
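The pairing rule and the additional checks (a) and (b) can be expressed as a test on two region masks. In this sketch the row index is assumed to increase toward the “top” of the plane, the “adjoining region” is approximated by a rectangular band between the two region centroids, and all names and threshold parameters are hypothetical.

```python
import numpy as np


def centroid(mask):
    """Mean (row, col) position of the True cells of a boolean mask."""
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()


def is_packet_candidate(pos_mask, neg_mask, u_fluct, uw, max_dx, u_low, minus_uw_high):
    """Check the hairpin-packet signature for one positive/negative vorticity pair.

    Assumptions: rows increase toward the 'top' of the plane, columns run in
    the streamwise direction, and the 'adjoining region' is the band spanning
    the two centroids (rows) and +/- max_dx around their mean column.
    """
    yp, xp = centroid(pos_mask)
    yn, xn = centroid(neg_mask)
    # Positive vorticity must lie below negative vorticity, roughly side by side
    if yn <= yp or abs(xn - xp) > max_dx:
        return False
    rows = slice(int(np.floor(yp)), int(np.ceil(yn)) + 1)
    xc = 0.5 * (xp + xn)
    cols = slice(max(int(np.floor(xc - max_dx)), 0), int(np.ceil(xc + max_dx)) + 1)
    band_u = u_fluct[rows, cols]
    band_minus_uw = -uw[rows, cols]
    # (a) low streamwise momentum and (b) high Reynolds shear stress in the band
    return bool(band_u.mean() < u_low and band_minus_uw.mean() > minus_uw_high)
```

Swapping the two masks reverses the vertical order and the test fails, which is the asymmetry the search rule relies on.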
[Figures at z+ = 92: momentum with vorticity; swirl strength with vorticity; u’w’ with vorticity and momentum. All quantities nondimensionalized using U_τ and ν.]
Adrian, Meinhart & Tomkins (2000)
Frequent Subgraph Discovery
(FSG – Karypis & Kuramochi 2001)
Modeling Data With Graphs: Beyond Transactions
• Graphs are suitable for capturing arbitrary relations between the various objects.
Data instance → Graph instance:
  - Object → Vertex
  - Object’s attributes → Vertex label
  - Relation between two objects → Edge
  - Type of relation → Edge label
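The object-to-graph mapping above can be sketched as a small labeled-graph type. Here each detected structure (for example a vorticity patch) becomes a vertex labeled by its type, and two structures closer than a hypothetical distance `max_dist` are joined by an edge labeled by the dominant direction of their separation. The labels and the distance rule are illustrative assumptions, not the representation used in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Undirected simple graph with vertex and edge labels (one graph transaction)."""
    vertex_labels: dict = field(default_factory=dict)  # vertex id -> label
    edge_labels: dict = field(default_factory=dict)    # frozenset({a, b}) -> label

    def add_vertex(self, vid, label):
        self.vertex_labels[vid] = label

    def add_edge(self, a, b, label):
        if a == b:
            raise ValueError("simple graph: self-loops are not allowed")
        # frozenset keys make edges undirected and rule out multiple edges
        self.edge_labels[frozenset((a, b))] = label


def objects_to_graph(objects, max_dist):
    """objects: list of (type_label, x, y). Hypothetical rule: connect objects
    within max_dist and label the edge by the dominant separation direction."""
    g = LabeledGraph()
    for i, (label, _, _) in enumerate(objects):
        g.add_vertex(i, label)
    for i, (_, xi, yi) in enumerate(objects):
        for j in range(i + 1, len(objects)):
            _, xj, yj = objects[j]
            dx, dy = abs(xj - xi), abs(yj - yi)
            if (dx * dx + dy * dy) ** 0.5 <= max_dist:
                g.add_edge(i, j, "x-aligned" if dx >= dy else "y-aligned")
    return g
```

Each snapshot then yields one graph transaction, and recurring arrangements of structures become candidate frequent subgraphs.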
Interesting Patterns
• Frequent subgraphs
  - Discovering interesting patterns: finding frequent, recurrent subgraphs
• Efficient algorithms must be developed that operate on, and take advantage of, the new representation
Finding Frequent Subgraphs: Input and Output
• Problem setting: similar to finding frequent itemsets for association rule discovery
• Input
  - Database of graph transactions
    - Undirected simple graphs (no loops, no multiple edges)
    - Each graph transaction has labeled edges/vertices
    - Transactions may not be connected
  - Minimum support threshold σ
• Output
  - Frequent subgraphs that satisfy the support threshold
  - Each frequent subgraph is connected
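The support of a pattern is the fraction of graph transactions that contain it. FSG works level by level, starting from frequent small subgraphs and growing candidates one edge at a time; the sketch below implements only the first level (frequent single-edge subgraphs), which needs no subgraph-isomorphism test because a labeled edge has a simple canonical form. The input format and function name are hypothetical.

```python
from collections import Counter


def frequent_edges(transactions, min_support):
    """First level of a frequent-subgraph search: frequent single-edge subgraphs.

    transactions: list of graphs, each given as (vertex_labels, edges), where
      vertex_labels maps vertex id -> label and edges is a list of (u, v, edge_label).
    min_support: the threshold sigma, as a fraction in [0, 1].
    Returns {canonical_edge: support} for every edge pattern meeting the threshold.
    """
    counts = Counter()
    for vlabels, edges in transactions:
        # Canonical form of a labeled edge: sorted endpoint labels + edge label.
        # A set counts each pattern at most once per transaction.
        patterns = set()
        for u, v, elabel in edges:
            a, b = sorted((vlabels[u], vlabels[v]))
            patterns.add((a, elabel, b))
        counts.update(patterns)
    n = len(transactions)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}
```

With three transactions, a pattern found in two of them has support 2/3 (about 66%) and one found in all three has support 100%.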
Example
[Figure: input graph transactions and the frequent connected subgraphs found in them, with supports of 100%, 66%, and 66%]
Examples of datasets (Database type-B) to investigate using a frequent subgraph discovery scheme:
- PIV data: in-plane swirl S(x,y) for multiple time steps (with and without a trigger signal)
- Full 3D data from simulation
Further Challenges
• Temporally and spatially evolving structures (objects change)
• Interactions of vortex structures
[Figure: interacting vortex structures labeled A, B, C, D]