National Aeronautics and Space Administration
GPS Sensor Web Time Series Analysis Using SensorGrid Technology
Robert Granat¹, Galip Aydin², Zhigang Qi², Marlon Pierce²
¹ Science Data Understanding Group, Jet Propulsion Laboratory
² Community Grids Laboratory, Indiana University
www.nasa.gov
Jet Propulsion Laboratory
California Institute of Technology
Pasadena, CA
Introduction
• Modern earth sensor networks are producing large volumes of data.
• This demands three things:
1. Automated methods to search, analyze, and mine the data.
2. Infrastructure to connect sensors collecting data with users and methods.
3. Interfaces through which users can access data and employ methods.
• Here we address these demands in a GPS sensor web context, but most of this work can be generalized to other contexts.
• We use RDAHMM, a hidden Markov model-based time series analysis method, and SensorGrid, a web infrastructure technology.
Hidden Markov Models
• Statistical models for time series data.
• Can be used with continuous- or discrete-valued data.
• Fitting an HMM allows us to ascribe discrete modes of behavior to the system.
• Can be trained with labeled examples (supervised learning) or without labeled examples (unsupervised learning).
• Successful in many fields (e.g., speech processing, protein sequence analysis).
Hidden Markov Model Mechanics
[Figure: the hidden state sequence Q1, Q2, Q3, ..., QT; each state, with added noise, produces the corresponding observation O1, O2, O3, ..., OT.]
The HMM is a stochastic state machine: the state at each
point in time is a probabilistic function of the previous
state; likewise the observed output at that time is a
probabilistic function of the current state.
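The state machine described above can be sketched by sampling from a small HMM. This is a minimal illustration, assuming one-dimensional Gaussian outputs; the function name `sample_hmm` and all parameter values are hypothetical and are not taken from RDAHMM.

```python
import numpy as np

def sample_hmm(pi, A, means, stds, T, seed=0):
    """Draw a state sequence and observations of length T from a Gaussian-output HMM."""
    rng = np.random.default_rng(seed)
    n_states = len(pi)
    states = np.empty(T, dtype=int)
    obs = np.empty(T)
    for t in range(T):
        if t == 0:
            # The first state is drawn from the initial probabilities.
            states[t] = rng.choice(n_states, p=pi)
        else:
            # Each later state depends only on the previous state.
            states[t] = rng.choice(n_states, p=A[states[t - 1]])
        # The observation depends only on the current state.
        obs[t] = rng.normal(means[states[t]], stds[states[t]])
    return states, obs

# Illustrative two-state model (assumed values).
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
means = np.array([0.0, 5.0])
stds = np.array([1.0, 1.0])
states, obs = sample_hmm(pi, A, means, stds, T=200)
```

Only `obs` would be visible to an analyst; `states` is the hidden sequence that fitting an HMM tries to recover.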
Hidden Markov Models for Geophysical Sensor Webs
• The goal is to classify observations into system or operational modes.
• Fitting an HMM automatically provides classification; the solution inherently implies an underlying sequence of discrete states. Observations are classified according to the state to which they belong.
[Figure: a time series (above) and its HMM state sequence (below).]
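One common way to obtain such a state sequence from a fitted model is Viterbi decoding. The sketch below assumes one-dimensional Gaussian outputs and already-known parameters; the slides do not say which decoding method RDAHMM uses, so this illustrates the idea rather than the authors' procedure. All names and values are hypothetical.

```python
import numpy as np

def viterbi(obs, pi, A, means, stds):
    """Most likely state sequence for 1-D observations under a Gaussian-output HMM."""
    obs = np.asarray(obs, dtype=float)
    pi, A = np.asarray(pi, dtype=float), np.asarray(A, dtype=float)
    means, stds = np.asarray(means, dtype=float), np.asarray(stds, dtype=float)
    T, N = len(obs), len(pi)
    # Log density of every observation under every state's Gaussian, shape (T, N).
    log_b = (-0.5 * ((obs[:, None] - means[None, :]) / stds[None, :]) ** 2
             - np.log(stds[None, :]) - 0.5 * np.log(2.0 * np.pi))
    log_A = np.log(A)
    delta = np.log(pi) + log_b[0]          # best log score of paths ending in each state
    back = np.zeros((T, N), dtype=int)     # best predecessor of each state at each time
    for t in range(1, T):
        scores = delta[:, None] + log_A    # scores[i, j]: best path ending in i, then i -> j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(N)] + log_b[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Illustrative data and parameters (assumed values).
obs = [0.1, -0.2, 0.3, 5.1, 4.8, 5.3, 0.0]
labels = viterbi(obs, pi=[0.5, 0.5], A=[[0.9, 0.1], [0.1, 0.9]],
                 means=[0.0, 5.0], stds=[1.0, 1.0])
```

Each entry of `labels` is the class assigned to the observation at the same position.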
Example of HMM Classification
Seismograph data collected at 1 Hz from a station in Pasadena, California. HMM states are color-coded. Classification was performed without guidance or labeled training examples.
Challenges of Geophysical Data
• Large volumes of data are collected by sensor webs (e.g., GPS/seismic networks, ocean buoys).
• Little or no labeled training data is available, so we are almost always in an unsupervised learning mode.
• A priori system information is often unavailable or unreliable.
• The data are complicated enough to induce large numbers of local maxima in the likelihood being maximized.
• The standard Expectation-Maximization (EM) fitting method is vulnerable to local maxima in the absence of constraints based on a priori information.
Regularized Deterministic Annealing Expectation-Maximization
• Regularized deterministic annealing EM (RDAEM) is a method for overcoming the problems inherent in basic EM.
• Deterministic annealing modifies the objective function based on a computational temperature that flattens or accentuates features.
• Annealing greatly reduces the sensitivity of the fit to initial conditions, but it gets stuck in certain structural local maxima with duplicate states.
• We overcome this problem by adding regularization terms that bias the solution away from those local maxima.
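The effect of a computational temperature can be illustrated with one common form of deterministic annealing EM, in which the joint probabilities are raised to the power 1/temperature before normalizing in the E-step. This is a generic sketch, not RDAEM itself: the exact objective used by RDAHMM is not given in the slides, and the function name and numbers are hypothetical.

```python
import numpy as np

def tempered_posteriors(log_joint, temperature):
    """Posterior state probabilities with the joint raised to the power 1/temperature."""
    scaled = np.asarray(log_joint, dtype=float) / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum(axis=-1, keepdims=True)

# Illustrative joint probabilities of one observation with three states (assumed values).
log_joint = np.log(np.array([0.70, 0.20, 0.10]))
hot = tempered_posteriors(log_joint, temperature=10.0)   # flattened toward uniform
unit = tempered_posteriors(log_joint, temperature=1.0)   # ordinary EM posteriors
cold = tempered_posteriors(log_joint, temperature=0.1)   # sharpened toward a hard assignment
```

Starting hot and cooling gradually toward temperature 1 lets early iterations ignore fine structure in the objective, which is what reduces the dependence on the initial guess.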
Comparison of EM and RDAEM
We compare the methods with two metrics:
1) Quality: the log likelihood of the solutions.
2) Stability: the number of distinct maxima found in repeated tests.
Conclusion: RDAEM matches EM in quality and is more stable.
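The two metrics can be computed from the final log likelihoods of repeated fits, as in the sketch below. It assumes that runs ending at the same maximum report log likelihoods equal to within a tolerance, which is only a proxy because two different maxima could share the same value. The function names are hypothetical and the numbers are made up for illustration; they are not results from this work.

```python
def count_distinct_maxima(log_likelihoods, tol=1e-3):
    """Count groups of final log likelihoods separated by more than tol."""
    count = 0
    previous = None
    for value in sorted(log_likelihoods):
        if previous is None or value - previous > tol:
            count += 1
        previous = value
    return count

def summarize_runs(log_likelihoods, tol=1e-3):
    """Quality is the best log likelihood; stability is the number of distinct maxima."""
    return {
        "best_log_likelihood": max(log_likelihoods),
        "distinct_maxima": count_distinct_maxima(log_likelihoods, tol),
    }

# Made-up final log likelihoods from five hypothetical restarts of one method.
example_runs = [-1203.4, -1198.7, -1203.4, -1210.2, -1198.7]
summary = summarize_runs(example_runs)
```

A method that always converges to the same solution would report a single distinct maximum regardless of the number of restarts.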
SensorGrid Architecture
• Major components:
  • Real-Time filters
  • Grid Messaging Substrate
  • Information Service
• Filters can be run as Web Services to create workflows.
• Filter chains can be deployed for complex processing.
• Streaming messaging provides high-performance transfer options.
• NaradaBrokering provides a robust message-passing infrastructure.
Real-Time Filters
• Real-time data processing is supported by deploying filters around a publish/subscribe messaging system.
• The filters extend a generic class, from which they inherit publish and subscribe capabilities.
• They can be connected in series or in parallel as chains to solve complex problems.
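The filter pattern above can be sketched with a toy in-memory broker. This is an assumption-laden illustration: `Broker`, `Filter`, `ParseAscii`, and `ToMillimetres` are hypothetical names, the broker is a stand-in and not the NaradaBrokering API, and the message format is invented.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory publish/subscribe broker (a stand-in, not NaradaBrokering)."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self._subscribers[topic]):
            callback(message)

class Filter:
    """Generic filter: subscribes to one topic and publishes results to another."""
    def __init__(self, broker, in_topic, out_topic):
        self.broker = broker
        self.out_topic = out_topic
        broker.subscribe(in_topic, self._on_message)

    def _on_message(self, message):
        result = self.process(message)
        if result is not None:
            self.broker.publish(self.out_topic, result)

    def process(self, message):
        raise NotImplementedError

class ParseAscii(Filter):
    """Turns an invented 'station,value' text line into a record."""
    def process(self, message):
        station, value = message.split(",")
        return {"station": station, "value": float(value)}

class ToMillimetres(Filter):
    """Rescales the value, assuming it arrives in metres."""
    def process(self, message):
        return {**message, "value": message["value"] * 1000.0}

# Serial chain: raw -> parsed -> scaled.
broker = Broker()
ParseAscii(broker, "raw", "parsed")
ToMillimetres(broker, "parsed", "scaled")
received = []
broker.subscribe("scaled", received.append)
broker.publish("raw", "STN1,1.5")
```

Because each filter only knows its input and output topics, a parallel branch is just a second filter subscribed to the same input topic.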
SOPAC GPS Network
• 8 networks, covering 80 stations, produce high-resolution 1 Hz data.
• Socket-based real-time access in the binary RYO format is available, but not utilized!
• We developed filters that provide real-time streaming access in multiple formats (RYO, ASCII, GML).
Integration with SCIGN and SOPAC GPS
Step 1: Raw GPS data (1 Hz) is converted to RYO format and made available through a data server.
Step 2: Data is passed through a series of filters that perform format conversion and station separation. Message passing is handled through NaradaBrokering.
Step 3: Data is passed to the RDAHMM analysis application.
In this context, analysis applications such as RDAHMM are viewed as just another filter.
RDAHMM GPS Results via SensorGrid
• A Google Maps interface allows a user to select GPS stations.
• Models are fit to a large initial body of data from each station (this assumes the body of data is representative).
• Trained models are applied to incoming data from each station.
• Currently, data are held in 10-minute buffers, analyzed, and then presented to the user (near-real time; the 10-minute buffer length is arbitrary).
• Additional interfaces exist for exploration of archived data.
• Segmented time series can be used to perform exploratory science, search data catalogs, and detect anomalies.
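The buffering scheme can be sketched as follows: at 1 Hz, a 10-minute buffer holds 600 samples, and the trained model is applied each time the buffer fills. The class name `BufferedClassifier` is hypothetical, and `threshold_labels` is a trivial placeholder standing in for a trained RDAHMM model.

```python
class BufferedClassifier:
    """Collects samples and classifies them one full buffer at a time."""
    def __init__(self, classify, sample_rate_hz=1.0, buffer_seconds=600):
        self.classify = classify
        self.capacity = int(sample_rate_hz * buffer_seconds)
        self._buffer = []

    def push(self, sample):
        """Add one sample; return labels for the buffer once it fills, otherwise None."""
        self._buffer.append(sample)
        if len(self._buffer) < self.capacity:
            return None
        window, self._buffer = self._buffer, []
        return self.classify(window)

def threshold_labels(window, threshold=0.5):
    """Placeholder for a trained model: labels each sample by a fixed threshold."""
    return [1 if value > threshold else 0 for value in window]

# Feed twenty minutes of synthetic 1 Hz samples through the buffer.
buffered = BufferedClassifier(threshold_labels)
results = []
for second in range(1200):
    sample = 0.0 if second < 600 else 1.0
    labels = buffered.push(sample)
    if labels is not None:
        results.append(labels)
```

Shrinking `buffer_seconds` reduces the delay before results reach the user, which is the direction a move from near-real time to full real time would take.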
RDAHMM Integration and Visualization with Real-Time Filters
Real-Time Positions on Google Maps
Recording and Replaying Sensor Streams
• Filters can be used to record and replay scenarios, such as earthquakes in the GPS case.
• We developed RYO Recorder and RYO Publisher filters.
• The RYO Recorder creates daily archives of the GPS streams.
• The RYO Publisher can replay a full daily record or selected segments of it.
• We replayed the 2004 Southern California earthquake using the Parkfield GPS network archive.
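The record-and-replay idea can be sketched with daily archives keyed by date. This is a minimal in-memory illustration: `StreamRecorder` and `StreamPublisher` are hypothetical names rather than the actual RYO Recorder and RYO Publisher, messages are plain strings instead of RYO records, and the dates are arbitrary.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

class StreamRecorder:
    """Groups timestamped messages into per-day archives (held in memory here)."""
    def __init__(self):
        self.archives = defaultdict(list)

    def record(self, timestamp, message):
        self.archives[timestamp.date()].append((timestamp, message))

class StreamPublisher:
    """Replays a recorded day, or a segment of it, in the original order."""
    def __init__(self, archives):
        self.archives = archives

    def replay(self, day, start=None, end=None):
        """Yield the messages of one day, optionally limited to start <= time < end."""
        for timestamp, message in self.archives.get(day, []):
            if start is not None and timestamp < start:
                continue
            if end is not None and timestamp >= end:
                continue
            yield message

# Record four 1 Hz messages that straddle midnight (arbitrary example date).
recorder = StreamRecorder()
first = datetime(2000, 1, 1, 23, 59, 58)
for i in range(4):
    recorder.record(first + timedelta(seconds=i), f"m{i}")

publisher = StreamPublisher(recorder.archives)
whole_day = list(publisher.replay(date(2000, 1, 1)))
segment = list(publisher.replay(date(2000, 1, 2), start=datetime(2000, 1, 2, 0, 0, 1)))
```

A replayed stream can be published to the same topics as the live stream, so downstream filters do not need to know whether the data are live or archived.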
Conclusions
• We have developed analysis and infrastructure methods for GPS sensor web data.
• These methods are not network or data specific and can be extended to other sensor networks and data types.
• A hidden Markov model-based time series analysis method provides robust segmentation and classification results that can be applied in near-real time (next step: full real time).
• SensorGrid infrastructure allows robust and flexible connections between data sources, applications, and users.
• Demo of the user interface (with Scripps collaborators) at Tue. afternoon poster session G23B-1289.
Hidden Markov Model Parameters
A hidden Markov model with a finite number of states consists of:
• Initial probabilities
• State-to-state transition probabilities
• Output distributions
[Equations defining these quantities are not preserved in this transcript.]
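In standard textbook notation, these three parameter sets can be written as below. The symbols are assumptions, since the slide's own symbols were lost: N is the number of states, Q_t the state at time t, and O_t the observation at time t.

```latex
\lambda = (\pi, A, B), \qquad
\pi_i = P(Q_1 = i), \qquad
a_{ij} = P(Q_{t+1} = j \mid Q_t = i), \qquad
b_i(O_t) = p(O_t \mid Q_t = i),
\qquad 1 \le i, j \le N,
\qquad \sum_{i=1}^{N} \pi_i = 1, \qquad \sum_{j=1}^{N} a_{ij} = 1 .
```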
Hidden Markov Model Expectation-Maximization
• EM is the standard method for fitting HMMs to data.
• It is iterative and starts with an initial model guess.
• “E”-step: Calculate the expectation of the log likelihood of the model given the current estimate of the unknown parameters.
• “M”-step: Maximize the expected value of the log likelihood in the unknown parameters.
• The so-called Q-function is the quantity optimized in the “M”-step. [Equation not preserved in this transcript.] It involves an estimate of the state assignment and an estimate of the state transitions.
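In the standard Baum-Welch formulation, the Q-function takes the form below. The notation is assumed textbook notation and may differ from the original slide: λ' is the current parameter estimate, N the number of states, T the length of the series, γ_t(i) the estimated state assignment, and ξ_t(i, j) the estimated state transition.

```latex
Q(\lambda, \lambda') =
  \sum_{i=1}^{N} \gamma_1(i) \log \pi_i
  + \sum_{t=1}^{T-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \xi_t(i, j) \log a_{ij}
  + \sum_{t=1}^{T} \sum_{i=1}^{N} \gamma_t(i) \log b_i(O_t),
\qquad
\gamma_t(i) = P(Q_t = i \mid O, \lambda'), \qquad
\xi_t(i, j) = P(Q_t = i, Q_{t+1} = j \mid O, \lambda') .
```

The “E”-step computes γ and ξ under λ', and the “M”-step maximizes Q over λ with γ and ξ held fixed.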
Regularization Terms: Gaussian Output Distributions
We modify the likelihood objective function with an improper prior. [Equation not preserved in this transcript.]
This prior is smallest when the means are identical. It manifests as a regularization term added to the Q-function. [Equation not preserved in this transcript.]
To maintain concavity of the Q-function, the regularization weight must be constrained. [Constraint not preserved in this transcript.]