Transcript Slide 1

A Kepler-Based Three-Tier Architecture Applied to LiDAR Interpolation and Analysis

Efrat Frank, Ilkay Altintas
San Diego Supercomputer Center, UCSD

R. Haugerud, U.S.G.S.

D. Harding, NASA

LiDAR Introduction

Survey → Point Cloud (x, y, z)_n, … → Process & Classify → Interpolate / Grid
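The Interpolate / Grid step turns an irregular point cloud into a regular raster. The simplest gridding method is a block mean, which averages the elevations of all returns that fall inside each cell. Block mean is one of the GRASS surfacing options this workflow offers. The code below is only a minimal plain-Python sketch under assumed conventions, not GEON's implementation: the function name, the lower-left origin and the north-up row order are illustrative choices.

```python
def block_mean_grid(points, xmin, ymin, cell_size, ncols, nrows, nodata=-9999.0):
    """Grid a point cloud by averaging z within each cell (block mean).

    Illustrative sketch; the name and conventions are assumptions.
    points: iterable of (x, y, z) tuples.
    The grid's lower-left corner is (xmin, ymin); row 0 is the northernmost row.
    Cells that receive no points are set to `nodata`.
    """
    sums, counts = {}, {}
    for x, y, z in points:
        # Floor division keeps points just left of / below the origin out of cell 0.
        col = int((x - xmin) // cell_size)
        row_from_bottom = int((y - ymin) // cell_size)
        if not (0 <= col < ncols and 0 <= row_from_bottom < nrows):
            continue  # outside the grid extent
        key = (nrows - 1 - row_from_bottom, col)  # flip so row 0 is at the top
        sums[key] = sums.get(key, 0.0) + z
        counts[key] = counts.get(key, 0) + 1
    return [
        [sums[(r, c)] / counts[(r, c)] if (r, c) in counts else nodata
         for c in range(ncols)]
        for r in range(nrows)
    ]
```

Cells with no returns keep the placeholder `nodata` value. A production gridder would typically fill them by interpolation (IDW, splines) instead of leaving holes.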

Increasing Usage of Technology in Geosciences

• Online data acquisition and access
• Managing large databases
• Indexing data on spatial and temporal attributes
• Quick subsetting operations
• Large-scale resource sharing and management
• Collaborative and distributed applications
• Parallel gridding algorithms on large data sets using high-performance computing
• Integrating data with other related data sets, e.g., geologic maps and hydrology models
• Providing easy-to-use user interfaces from portals and scientific workflow environments
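Spatial indexing is what makes quick subsetting possible. A bounding-box query should touch only the data near that box, not all of the billions of returns. GEON ran these queries as DB2 spatial queries. The sketch below only illustrates the idea with a simple uniform-grid index in plain Python. The `GridIndex` class and its methods are hypothetical, and production spatial databases use more elaborate index structures.

```python
from collections import defaultdict


class GridIndex:
    """Hypothetical uniform-grid spatial index: points are bucketed by cell,
    so a bounding-box query only scans the cells that overlap the box."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y, z):
        self.buckets[self._cell(x, y)].append((x, y, z))

    def query_bbox(self, xmin, ymin, xmax, ymax):
        """Return all points with xmin <= x <= xmax and ymin <= y <= ymax."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        result = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for x, y, z in self.buckets.get((cx, cy), ()):
                    # Edge cells may hold points just outside the box.
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        result.append((x, y, z))
        return result
```

The cell size trades memory against query cost. Small cells mean fewer wasted point checks but more buckets to visit for a large box.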

The Computational Challenge:

• LiDAR (Light Detection And Ranging, a.k.a. ALSM, Airborne Laser Swath Mapping) produces high-point-density point cloud datasets that require high-performance processing.
• LiDAR generates massive data volumes: billions of returns are common.
• Distributing these volumes of point cloud data to users via the internet is a significant challenge.
• Processing and analyzing these data requires significant computing resources not available to most geoscientists.
• Interpolating these data challenges typical GIS and interpolation software.
• Our tests indicate that ArcGIS, MATLAB and similar software packages struggle to interpolate even a small portion of these data.
• Traditionally: Popularity > Resources

Configuration phase

Subset (query) → move → Analyze (process) → move → Visualize (render, display)

• Subset: DB2 query on DataStar
• Interpolate: GRASS RST, GRASS IDW, GMT…
• Visualize: Global Mapper, FlederMaus, ArcIMS
• Analyze / “Do Science”

LiDAR Processing via Kepler

• An extensible, easy-to-use workflow design and prototyping tool
• On-the-fly creation of workflow instances from workflow templates
• Integration of heterogeneous local and remote tools in a single interface:
  • Gridding and imaging services via Web and Grid services
  • GIS services
  • Remote tools via SSH, SCP and GridFTP
  • Relational and spatial database access
  • Direct access to data and tools from remote repositories
• Reusable generic and domain-specific actors
• Support for high-performance computation:
  • Job submission and monitoring
  • Logging of execution traces and registration of intermediate products
  • Data provenance and failure recovery
• Portal accessibility:
  • The GEON LiDAR Workflow is deployed on the GEON portal
• Reverse engineering of the traditional approach
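On-the-fly instantiation means a stored workflow template is combined with the parameters a user picks in the portal to produce a concrete workflow to run. Below is a minimal sketch of that substitution step using Python's `string.Template`. The XML layout (`workflow` and `param` elements) and the parameter names are hypothetical. Real Kepler workflows are MoML (Ptolemy II's Modeling Markup Language) documents with a different structure.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Hypothetical template: element and parameter names are illustrative only.
WORKFLOW_TEMPLATE = Template("""\
<workflow name="lidar-grid">
  <param name="bbox">$xmin,$ymin,$xmax,$ymax</param>
  <param name="algorithm">$algorithm</param>
  <param name="resolution">$resolution</param>
</workflow>""")


def instantiate_workflow(**params):
    """Return a workflow instance (XML text) built from the template.

    substitute() raises KeyError when a placeholder has no value, so an
    incomplete request is rejected before anything is submitted.
    """
    # Escape &, < and > so user input cannot break the XML.
    safe = {name: escape(str(value)) for name, value in params.items()}
    xml_text = WORKFLOW_TEMPLATE.substitute(safe)
    ET.fromstring(xml_text)  # raises ET.ParseError if the result is malformed
    return xml_text
```

Keeping the template separate from user values means one reviewed workflow can serve many requests. Only the parameter slots vary between runs.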

LiDAR Job Management and Monitoring

• The GEON LiDAR Workflow (GLW) is exposed to a high risk of component failures:
  • Long-running process
  • Distributed computational resources under diverse controlling authorities
• Kepler provides transparent, background error handling using provenance data
• A unified interface to follow up on the status of submitted jobs:
  • View job metadata
  • Zoom to a specific bounding box location
  • Track errors
  • Modify a job and resubmit
  • View the processing results
  • In the future, register desired workflow products (useful for publication)
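Tracking errors and resubmitting modified jobs requires a record of each job's state and history. Below is a minimal sketch of such a record as a small state machine. The state names, the allowed transitions and the `Job` class are assumptions for illustration, not the GEON portal's actual job model.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):  # hypothetical states
    SUBMITTED = "submitted"
    RUNNING = "running"
    FAILED = "failed"
    DONE = "done"


# Allowed transitions; anything else is rejected.
TRANSITIONS = {
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.DONE, JobState.FAILED},
    JobState.FAILED: {JobState.SUBMITTED},  # modify and resubmit
    JobState.DONE: set(),
}


@dataclass
class Job:
    job_id: str
    params: dict
    state: JobState = JobState.SUBMITTED
    history: list = field(default_factory=list)  # (state, note) trail for tracking errors

    def __post_init__(self):
        self.history.append((self.state, "created"))

    def move_to(self, new_state, note=""):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append((new_state, note))

    def resubmit(self, **changes):
        """Modify a failed job's parameters and put it back in the queue."""
        if self.state is not JobState.FAILED:
            raise ValueError("only failed jobs can be resubmitted")
        self.params.update(changes)
        self.move_to(JobState.SUBMITTED, f"resubmitted with {changes}")
```

Recording every transition with a note keeps the failure cause next to the job. That is the information a monitoring interface would show when a user tracks errors.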

GEON’s Solution: A Three-Tier Architecture for LiDAR Processing

[Figure: architecture diagram with tiers labeled Portal; Monitoring/Translation, Scheduling/Output Processing; and Grid. Components shown: Client/GEON Portal (map and attributes, GRASS functions and parameters, submit), parameter XML, create workflow description, DB2 (x, y, z and attributes), ArcInfo, ArcSDE, ArcIMS (render map), and an NFS-mounted disk (raw data, process output).]

• GOAL: An efficient three-tier architecture for LiDAR interpolation and analysis using GEON infrastructure and tools
  • GEON Portal: front-end layer
  • Kepler Scientific Workflow System: control layer (Kepler is used as a batch execution engine)
  • GEON Grid: computation layer
• Use scientific workflows to glue together different tools and the infrastructure
• The architecture provides efficient and reliable LiDAR data analysis

Example of LiDAR data acquired along the northern San Andreas fault in Sonoma County, California.

Left: Hillshade produced from the first-return surface DEM (Digital Elevation Model) derived from the LiDAR data. In this heavily forested region, the first-return surface largely shows the top of the tree canopy.

Right: Hillshade of the last-return surface DEM for the same area shown in the left image. The multiple returns offered by the LiDAR workflow allow for “virtual deforestation” and the creation of a “bare-earth” model of the ground surface. Note the San Andreas fault and the roads, which are not visible in the first-return hillshade. LiDAR data represent an important new tool for studying the earth’s surface, especially in regions where heavy vegetation makes traditional techniques such as aerial photography ineffective. (Source: Christopher J. Crosby, J. Ramon Arrowsmith, GEON, ASU)
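Both panels are hillshades. Each DEM cell is shaded by how directly its surface faces an imagined sun. Below is a minimal sketch of the standard Lambertian formulation. It assumes a north-up DEM stored as nested lists and uses simple central differences; many GIS tools use Horn's 3×3 kernel instead. The 315° azimuth and 45° altitude defaults are common conventions, not values taken from the figures above.

```python
import math


def hillshade(dem, cell_size, azimuth_deg=315.0, altitude_deg=45.0):
    """Lambertian hillshade of a north-up DEM (row 0 = north, col 0 = west).

    Sketch only: central differences on interior cells, so the result is
    (nrows - 2) x (ncols - 2). Values are integers in 0..255.
    """
    az = math.radians(azimuth_deg)   # measured clockwise from north
    alt = math.radians(altitude_deg)
    # Unit vector pointing toward the sun, in (east, north, up) coordinates.
    lx = math.sin(az) * math.cos(alt)
    ly = math.cos(az) * math.cos(alt)
    lz = math.sin(alt)
    out = []
    for r in range(1, len(dem) - 1):
        row = []
        for c in range(1, len(dem[0]) - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)  # slope toward east
            dzdy = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell_size)  # slope toward north
            # Upward surface normal (-dz/dx, -dz/dy, 1), normalised.
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            cos_i = (-dzdx * lx - dzdy * ly + lz) / norm
            row.append(round(255 * max(0.0, cos_i)))  # self-shadowed slopes -> 0
        out.append(row)
    return out
```

The same function applied to a first-return DEM and a last-return DEM would give the canopy and bare-earth views respectively. The difference lies entirely in which returns were gridded.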

[Figure: KEPLER WORKFLOW diagram, executed on a compute cluster. Steps shown: DB2 spatial query; map onto the grid; GRASS surfacing algorithms (spline, IDW, block mean, …); output as binary grid, ASCII grid, text file, or TIFF/JPEG/GIF; download data.]

http://geongrid.org
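Among the workflow's surfacing options is IDW (inverse distance weighting), and ASCII grid is among its outputs. The code below is a minimal sketch of both, not GRASS's IDW implementation, which typically uses only a limited number of nearby points per estimate. This version weights every input point by 1 / distance^power, which costs O(cells × points). It writes the ESRI ASCII grid header keywords (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`), on the assumption that this is the ASCII grid format meant; the source does not specify one.

```python
import math


def idw(points, x, y, power=2.0):
    """Global inverse distance weighted estimate of z at (x, y).

    Sketch only; requires at least one point. Weight = 1 / distance**power.
    """
    num = den = 0.0
    for px, py, pz in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return pz  # the target coincides with a sample
        w = 1.0 / d ** power
        num += w * pz
        den += w
    return num / den


def ascii_grid(points, xll, yll, cell_size, ncols, nrows, power=2.0):
    """Interpolate at cell centres and return ESRI ASCII grid text (assumed format)."""
    lines = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cell_size}",
    ]
    for r in range(nrows):  # the first data row is the northernmost
        yc = yll + (nrows - r - 0.5) * cell_size
        row = (idw(points, xll + (c + 0.5) * cell_size, yc, power) for c in range(ncols))
        lines.append(" ".join(f"{v:.3f}" for v in row))
    return "\n".join(lines) + "\n"
```

Raising `power` makes the surface follow nearby points more closely. Lowering it smooths the result toward the global mean.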

Future Plans

• Improve overall performance using advanced processing tools
  • Parallel interpolation, enhanced visualization
• Extend built-in failure recovery and reporting features
• Additional portal execution and registration support
• Utilize provenance information for workflow product registration
• Create a graphical illustration of job progress and location in the workflow to demonstrate the distributed nature of the system
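Parallel interpolation appears here only as a plan. One common way to parallelize gridding is to split the output raster into horizontal bands of rows and compute each band independently. This works because each cell's estimate depends only on the input points, not on other cells. Below is a minimal sketch of that decomposition. It assumes a hypothetical per-cell callback `estimate(row, col)`, for example an IDW estimate at the cell centre.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_ranges(nrows, tile_rows):
    """Split row indices 0..nrows-1 into contiguous [start, stop) bands."""
    return [(s, min(s + tile_rows, nrows)) for s in range(0, nrows, tile_rows)]


def interpolate_parallel(estimate, nrows, ncols, tile_rows=64, workers=4):
    """Fill an nrows x ncols grid by calling the hypothetical estimate(row, col)
    per cell, processing bands of rows concurrently."""

    def run_band(band):
        start, stop = band
        return [[estimate(r, c) for c in range(ncols)] for r in range(start, stop)]

    grid = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so bands stay in row order.
        for band_rows in pool.map(run_band, tile_ranges(nrows, tile_rows)):
            grid.extend(band_rows)
    return grid
```

Threads keep the sketch simple. In CPython, however, they will not speed up pure-Python, CPU-bound work because of the global interpreter lock. A real deployment would more likely use `ProcessPoolExecutor` with a module-level (picklable) band function, or distribute bands across cluster nodes.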

ULTIMATE GOAL:

Make it useful to a wide range of earth science users!

Contributors

Efrat Jaeger-Frank, Ilkay Altintas, Chaitan Baru, Ashraf Memon, Viswanath Nandigam (GEON, San Diego Supercomputer Center, UCSD); Christopher J. Crosby, Jefferey S. Conner, J. Ramon Arrowsmith (GEON, ASU)

Kepler includes contributors from GEON, SEEK, SDM Center, Ptolemy II, ROADNet, CIPRes and Resurgence, supported by NSF ITRs 0225673 (GEON) and 022567 (SEEK), DOE DE-FC02-01ER25486 (SciDAC/SDM), and DARPA F33615-00-C-1703 (Ptolemy).