Path Integral Method
and Speeding up MC with
Nonlocal Strategies
Juan M. Restrepo
Department of Mathematics
Physics Department
University of Arizona
Collaborators:
Gregory Eyink (Johns Hopkins)
Frank Alexander (LANL)

Parsing the Problem
There is no single data assimilation
problem once its practical challenges are
considered:
Spatio-temporal scales
High or low frequency of data insertion
Whether the number of state variables is small or large
Three Estimation Problems:
Given a random time series {X(t): t < t0},
X(t) ∈ R^N
Prediction:
Estimate {x(t): t > t0}
Filtering (Nudiction):
Estimate {x(t0)}
Smoothing (Retrodiction):
Estimate {x(t): t ≤ t0}

Formulating the Equations:
Discrete Model:
L(x(0),…,x(t−dt),x(t),x(t+dt),…,x(t_f),…,
B(t)q(t), B(t)q(t+dt),…,t) = 0
x(t) ∈ R^N
B q ∈ R^N
Example: take dt = 1
x(t+1) = L(x,Bq,t)
if linear:
x(t+1) = A(t) x + B(t) q(t)
A ∈ R^{N×N}
Example:
∂_t u(z,t) = ν ∂_zz u(z,t) + f(t)
u(z,0) = u_0(z)
u(0,t) = g(t), u(1,t) = h(t)
Discretizing:
x(t) ≡ [u_1(t), u_2(t), …, u_N(t)]^T
x(t+dt) = A x(t) + B q(t)
x(t) = A x(t−dt) + B q(t−dt)
…
Leads to:
L(x(0),…,x(t−dt),x(t),x(t+dt),…,x(t_f),…,
Bq(t), Bq(t+dt),…,t) = 0
x(t) ∈ R^N, B q ∈ R^N
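To make the reduction concrete, here is a minimal Python sketch (grid size, ν, dt, and the forcing are illustrative assumptions, not values from the talk) in which centered differences in z and an explicit Euler step in t turn the forced heat equation into the recursion x(t+dt) = A x(t) + B q(t):

    import numpy as np

    # Illustrative parameters (assumed, not from the talk).
    N = 50                      # interior lattice points
    dz = 1.0 / (N + 1)          # grid spacing on (0,1)
    nu, dt = 0.1, 1e-4

    # Explicit Euler + centered second difference:
    # x(t+dt) = A x(t) + B q(t), A = I + (nu dt/dz^2) tridiag(1,-2,1).
    lam = nu * dt / dz**2
    A = np.eye(N) + lam * (np.diag(np.ones(N - 1), -1)
                           - 2.0 * np.eye(N)
                           + np.diag(np.ones(N - 1), 1))
    B = dt * np.eye(N)          # forcing f(t) enters every interior node

    x = np.sin(np.pi * dz * np.arange(1, N + 1))   # u0(z); g = h = 0 here
    for _ in range(1000):
        q = np.ones(N)                             # sampled forcing f(t)
        x = A @ x + B @ q

Homogeneous boundary values are assumed; nonzero g(t), h(t) would contribute extra boundary terms through B q.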
Statement of the Problem
Stochastic Dynamics (Langevin Problem):
Measurements:
GOAL:
Find mean x conditioned on measurements:
x_S(t) = E[x(t) | y_1,…,y_M]
and the
covariance matrix (uncertainty)
C_S(t) = E[(x(t) − x_S(t))(x(t) − x_S(t))^T | y_1,…,y_M].
The conditional mean x_S(t) minimizes
tr C_S(t) = E[|x(t) − x_S(t)|² | y_1,…,y_M].
It is termed the smoother estimate.
A Nonlinear Example
Stochastic Dynamics (Langevin Problem):
dX(t) = f(X(t)) dt + k dW(t)
with
V(x) = -2x2+x4
f(x) = -V’(x)=4x(1-x2)
k = 0.5
Measurements:
at times m Dt, m=1,…, M one observes
ym := X(tm) + r Nm
to have measured values Ym, m=1,2,…,M
[Figure: observed values Y_m at the measurement times t_m]
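A minimal Python sketch of how this twin experiment could be generated (ρ, dt, the observation spacing, and the seed are illustrative assumptions): simulate the double-well SDE with Euler-Maruyama and record noisy observations.

    import numpy as np

    rng = np.random.default_rng(0)
    kappa = 0.5
    rho, dt = 0.1, 1e-3                     # assumed values
    f = lambda x: 4.0 * x * (1.0 - x**2)    # f = -V', V = -2x^2 + x^4

    # Euler-Maruyama path X(t_k)
    T = 5000
    X = np.empty(T + 1)
    X[0] = 1.0
    for k in range(T):
        X[k + 1] = (X[k] + f(X[k]) * dt
                    + kappa * np.sqrt(dt) * rng.standard_normal())

    # Observations Y_m = X(t_m) + rho N_m every 500 steps
    obs_idx = np.arange(500, T + 1, 500)
    Y = X[obs_idx] + rho * rng.standard_normal(obs_idx.size)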
Extended Kalman Filter
Alternative Approaches
KSP: optimal, but impractical
ADJOINT/4D-Var
(Restrepo, Leaf, Griewank, SIAM J. Sci. Comput. 1995)
Mean Field Variational Method
(Eyink, Restrepo, Alexander, Physica D, 2003)
enKF (ensemble Kalman filter)
Particle Method
(Kim, Eyink, Restrepo, Alexander, Johnson, Mon. Wea. Rev. 2002)
Path Integral Method
(Alexander, Eyink, Restrepo, J. Stat. Phys. 2005)
KSP
Kushner, Stratonovich, Pardoux
Prediction/Filter:
Solve ∂_t P = −∂_x [f(x) P] + κ² ∂_xx P/2
P(x,0) = P_s(x)
between t_m < t < t_{m+1}, m = 0,1,…,M.
At measurement times t_m,
P(x,t_m^+) = (1/N) exp(Y_m x/ρ² − x²/(2ρ²)) P(x,t_m^−)
Smoothing: solve the final-value problem
∂_t A = −f(x) ∂_x A − κ² ∂_xx A/2
A(x, t ≥ t_M) = 1
between t_{m+1} ≥ t ≥ t_m, m = M,…,1.
At measurement times t_m,
A(x,t_m^−) = (1/N) exp(Y_m x/ρ² − x²/(2ρ²)) A(x,t_m^+)
Then, for each m = 1,2,…,M and t_m < t < t_{m+1},
P(x,t | y_j = Y_j, j = 1,2,…,M) = A(x,t) P(x,t)
Kolmogorov Equation
∂_t P = −∂_x [f(x) P] + κ² ∂_xx P/2
P(x,t) → P_s(x) as t → ∞
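For the scalar example the filter half of KSP can be coded directly on a grid; the sketch below (grid, dt, and the finite differencing are my assumptions) advances the forward Kolmogorov equation and applies the Bayes update at measurement times.

    import numpy as np

    x = np.linspace(-2.5, 2.5, 401)
    dx = x[1] - x[0]
    kappa, rho, dt = 0.5, 0.1, 1e-5
    f = 4.0 * x * (1.0 - x**2)

    # Stationary density P_s ~ exp(-2V/kappa^2), V = -2x^2 + x^4.
    P = np.exp(-2.0 * (-2.0 * x**2 + x**4) / kappa**2)
    P /= P.sum() * dx

    def forward_step(P):
        """One explicit step of dP/dt = -d/dx[f P] + (kappa^2/2) d2P/dx2."""
        dflux = np.gradient(f * P, dx)
        d2P = np.gradient(np.gradient(P, dx), dx)
        return P + dt * (-dflux + 0.5 * kappa**2 * d2P)

    def measurement_update(P, Ym):
        """Multiply by the likelihood at t_m and renormalize; the Gaussian
        exp(-(x-Ym)^2/(2 rho^2)) equals exp(Ym x/rho^2 - x^2/(2 rho^2))
        up to a constant absorbed into 1/N."""
        P = P * np.exp(-(x - Ym)**2 / (2.0 * rho**2))
        return P / (P.sum() * dx)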
KSP Filter Results
Why not KSP?
Impractical!
Very tiny example: a scalar PDE in 1-d space
and time, solved on a lattice of N points in space,
so Dim(X(t)) = N. Each component of X(t) requires
the solution of two stiff PDEs.
enKF (ensemble)
• Combine forecast with measurements:
use linear interpolation via the Kalman gain
(a sketch follows after this list).
• Between observations:
run the full model P times; use MC to get the
conditional mean and variance.
Pros/Cons
Conditional statistics
Linear interpolation
Error in PDF ∼ 1/P^{1/2}
Need to run the model fully resolved
Conveniently incorporates legacy code
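A minimal sketch of the enKF analysis step for the scalar example (ensemble size, h(x) = x, and the perturbed-observation variant are my assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    rho = 0.1                                  # observation noise std

    def enkf_update(ensemble, Ym):
        """Linearly interpolate forecast and data via the Kalman gain."""
        C = np.var(ensemble, ddof=1)           # forecast variance from MC
        K = C / (C + rho**2)                   # Kalman gain, h(x) = x
        # Perturbed observations keep the analysis spread consistent.
        y_pert = Ym + rho * rng.standard_normal(ensemble.size)
        return ensemble + K * (y_pert - ensemble)

    ens = rng.normal(1.0, 0.3, size=100)       # P = 100 forecast members
    ens = enkf_update(ens, Ym=0.8)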

Path Integral Method
Related to simulated annealing
It can be developed as a black box
Simple to implement
Can handle nonlinear/non-Gaussian
problems
Calculates mean and uncertainty
PROBLEM: relies on MC!!!
Discretized using the explicit Euler-Maruyama scheme.
The probability of the dynamics generating a given history
is related to the probability that it experiences a certain
noise history η(t_k) = W(t_k + dt) − W(t_k)
at times t_k, k = 0,1,2,…, with t = t_0, t_1, …, t_T.
For Gaussian, uncorrelated noise this probability is
Prob η(t) ∼ exp(−(1/2) Σ_k |η(t_k)|²).
In general: Prob η(t) ∼ exp(−H_dyn)
where
H_dyn ≡ (1/4) Σ_{k=0}^{T−1} [(x_{k+1} − x_k)/dt − f(x_k,t_k)]^T D^{−1}(x_k,t_k) [(x_{k+1} − x_k)/dt − f(x_k,t_k)]
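As a sanity check on the formula, a small sketch that evaluates H_dyn for a trial path in the scalar double-well example (taking D = κ² for the noise covariance is my assumption):

    import numpy as np

    def H_dyn(x, f, dt, Dinv):
        """H_dyn = (1/4) sum_k r_k^T D^{-1} r_k,
        r_k = (x_{k+1} - x_k)/dt - f(x_k); scalar case."""
        r = np.diff(x) / dt - f(x[:-1])
        return 0.25 * Dinv * np.sum(r**2)

    kappa, dt = 0.5, 1e-3
    f = lambda x: 4.0 * x * (1.0 - x**2)
    path = np.linspace(-1.0, 1.0, 1001)        # any trial history
    print(H_dyn(path, f, dt, Dinv=1.0 / kappa**2))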
To include the influence of observations,
use Bayes' rule.
This modifies the Hamiltonian:
H_obs = Σ_{m=1}^{M} [h(x(t_m)) − y(t_m)]^T R^{−1} [h(x(t_m)) − y(t_m)]
The total Hamiltonian:
H = H_dyn + H_obs
The Hamiltonian is the negative log-likelihood.
There is a very close relationship between the
approach we take here and 4D-Var,
used extensively in the ocean/climate community.
Maximum likelihood methods minimize the
Hamiltonian.
Instead, we are going to sample the Gibbs
distribution.
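Continuing the sketch above, the observation term and the total Hamiltonian to be sampled could look like this (h(x) = x and R = ρ², matching the scalar example, are assumptions):

    import numpy as np

    def H_obs(x, obs_idx, Y, rho):
        """Observation term: sum_m (x(t_m) - Y_m)^2 / rho^2."""
        return np.sum((x[obs_idx] - Y)**2) / rho**2

    def H_total(x, f, dt, Dinv, obs_idx, Y, rho):
        """H = H_dyn + H_obs; the Gibbs density over paths is ~ exp(-H)."""
        r = np.diff(x) / dt - f(x[:-1])
        return 0.25 * Dinv * np.sum(r**2) + H_obs(x, obs_idx, Y, rho)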
SAMPLING
Hybrid Monte Carlo (HMC)
Unigrid Monte Carlo (UMC)
Generalized Hybrid Monte Carlo (GHMC)
Hybrid Monte Carlo
molecular dynamics: used to
propose a new system configuration
Metropolis MC: accept/reject based
on the energy
The configuration is specified by
degrees of freedom q_0, q_1, …, q_T,
q_i ∈ R^N.
The HMC algorithm works as follows:
To each q_i, a conjugate generalized
momentum p_i is assigned.
The momenta p_i give rise to a kinetic
energy H_K = Σ_i p_i²/2.
The total Hamiltonian of the system is
H(q,p) = H(q) + H_K(p).
The dynamics are:
dq_i/dt = p_i
dp_i/dt = F_i, where
F_i = −∂H/∂q_i
is the force on the ith degree of freedom.
What's going on?
Write the probability P(x) = exp(−E(x))/Z.
E(x) and grad E(x) are easily evaluated:
the gradient indicates which direction one should go
to find states with higher probability!
Note: H(q,p) = H(q) + H_K
P_H(q,p) = exp(−H(q,p))/Z = exp(−H(q)) exp(−H_K(p))/Z
is separable,
so the marginal distribution is exp(−H(q))/Z_q.
1) A chain of states (q_i', p_i'),
i = 0,1,2,…,T, is generated by evolving J steps.
2) Detailed balance is achieved if the configuration
obtained after evolving J steps is accepted
with probability min[1, exp(−ΔH)],
where ΔH = H(q',p') − H(q,p).
The Metropolis step corrects for time
discretization errors.
3) p' is refreshed after every acceptance/rejection
according to a Gaussian distribution of independent
variables ∼ exp(−H_K).
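Putting the three steps together, one HMC sweep could look like the sketch below (leapfrog step size, J, and the toy Gaussian target are illustrative; in the assimilation problem H and grad_H would be the path Hamiltonian and its gradient):

    import numpy as np

    rng = np.random.default_rng(2)

    def hmc_sweep(q, H, grad_H, eps, J):
        """Leapfrog proposal + Metropolis accept/reject."""
        p = rng.standard_normal(q.shape)        # refresh momenta ~ exp(-H_K)
        q_new, p_new = q.copy(), p.copy()
        p_new -= 0.5 * eps * grad_H(q_new)      # leapfrog: initial half kick
        for _ in range(J):
            q_new += eps * p_new                # drift
            p_new -= eps * grad_H(q_new)        # kick
        p_new += 0.5 * eps * grad_H(q_new)      # trim final kick to a half
        dH = (H(q_new) + 0.5 * np.sum(p_new**2)
              - H(q) - 0.5 * np.sum(p**2))
        if rng.random() < min(1.0, np.exp(-dH)):
            return q_new                        # Metropolis correction
        return q

    # Toy target: H(q) = |q|^2 / 2, a Gaussian over a 100-point path.
    q = np.zeros(100)
    for _ in range(1000):
        q = hmc_sweep(q, lambda q: 0.5 * np.sum(q**2),
                      lambda q: q, eps=0.1, J=10)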
Unigrid Monte Carlo
Updates the system by taking coherent moves on a
number of length scales.
Decompose the system into blocks of contiguous
lattice points.
Block sizes B = 1, 2, 4, …, 2^s.
B = 1 is the standard local Metropolis.
Update: a random (Gaussian) shift is added
coherently to the value at each site in the block.
Metropolis accept/reject as before.
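A sketch of one unigrid sweep, under the reading that a block move adds a single Gaussian shift coherently to every site in the block (block traversal order and step size are my assumptions):

    import numpy as np

    rng = np.random.default_rng(3)

    def unigrid_sweep(q, H, sigma=0.1, s=4):
        """Metropolis moves on block sizes B = 1, 2, 4, ..., 2^s."""
        for B in (2**j for j in range(s + 1)):
            for start in range(0, q.size - B + 1, B):
                trial = q.copy()
                trial[start:start + B] += sigma * rng.standard_normal()
                dH = H(trial) - H(q)
                if rng.random() < min(1.0, np.exp(-dH)):
                    q = trial
        return q

    q = unigrid_sweep(np.zeros(64), lambda q: 0.5 * np.sum(q**2))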
Generalized HMC
Dynamics replaced by:
dq_i/dt = A p_i,  dp_i/dt = A^T F_i
A is an N×N matrix;
when A = I we recover HMC.
Discretize:
q' = q + dt A p + ½ dt² A A^T F(q)
p' = p + ½ dt A^T (F(q) + F(q'))
Challenge: find an A that leads to a significant reduction
of the correlation time.
We used the circulant matrix
A = circ(1, exp(−α), exp(−2α), …, exp(−Tα))
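The discretized GHMC step and the circulant choice of A translate directly (scipy's circulant builds the matrix from its first column; α and the toy force are illustrative):

    import numpy as np
    from scipy.linalg import circulant

    def make_A(T, alpha):
        """A = circ(1, exp(-alpha), exp(-2 alpha), ..., exp(-T alpha))."""
        return circulant(np.exp(-alpha * np.arange(T + 1)))

    def ghmc_step(q, p, F, A, dt):
        """One discretized GHMC update; A = I recovers plain HMC."""
        q_new = q + dt * (A @ p) + 0.5 * dt**2 * (A @ A.T @ F(q))
        p_new = p + 0.5 * dt * (A.T @ (F(q) + F(q_new)))
        return q_new, p_new

    A = make_A(T=99, alpha=0.5)                 # 100-point path
    rng = np.random.default_rng(4)
    q, p = np.zeros(100), rng.standard_normal(100)
    q, p = ghmc_step(q, p, lambda q: -q, A, dt=0.05)  # F for H=|q|^2/2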
A Nonlinear Example
Stochastic Dynamics (Langevin Problem):
dX(t) = f(X(t)) dt + κ dW(t)
with
V(x) = −2x² + x⁴
f(x) = −V′(x) = 4x(1 − x²)
κ = 0.5
Measurements:
at times m Δt, m = 1,…,M one observes
y_m := X(t_m) + ρ N_m
to have measured values Y_m, m = 1,2,…,M
RESULTS:
[Figures: decorrelation time as a function of the
standard deviation and of α in A, for T+1 lattice
points and J MD steps per sweep]
Conclusions
GHMC effects a significant speedup
GHMC results can be applied to enKF
The Path Integral Method can be used as a
black-box data assimilator
Questions
Euler-Maruyama error is O(dt):
other time integrators?
Optimal A matrix in GHMC?
Combining the Path Integral Method
with closure techniques when N and
T are large
Further Information:
http://www.physics.arizona.edu/~restrepo
Education:
Not enough people trained to do it
The training sequence is not yet mature
The problem is interdisciplinary
Data Assimilation (Arizona)

Prerequisites:
Graduate numerical analysis sequence (linear algebra).
Probability theory.
Program
Semester 1:
Inverse Problem: the deterministic problem
optimization
sensitivity analysis