SOES6002: Modelling in
Environmental and Earth System
Science
CSEM Lecture 7
Martin Sinha
School of Ocean & Earth Science
University of Southampton
So far…..
• We covered the basic physics of CSEM surveying
• We saw that if we know the structure of the Earth, we can predict what we will record in a given experiment
• We have seen how to choose suitable experiment parameters so that the results are sensitive to particular features

Electromagnetic fields in the Earth
Next…..
• Given a set of data, what can we say about the structure of the Earth?
• A specific example of an “Inverse problem”
• In this case we want to determine ρ(z)

Introduction to Inverse Theory
Some definitions:
• ‘Data’ are the experimental observations
• ‘Forward Model’ – a numerical algorithm
• ‘Model parameters’ – variable inputs to the forward model
• ‘Response’ – predictions output from the model
Definitions
$$d_i, \; i = 1, \dots, M \quad \text{observational data}$$
$$m_j, \; j = 1, \dots, N \quad \text{model parameters}$$
$$r_i, \; i = 1, \dots, M \quad \text{response}$$
$F$ is a functional corresponding to the ‘forward algorithm’, such that
$$r = F(m)$$
The last line of the above corresponds to the ‘forward problem’.
The ‘inverse problem’ is to find a set of m such that r = d.
Misfit measurement:
The misfit for the i-th data point is
$$\text{Residual} = d_i - r_i = d_i - F_i(m)$$
and
$$\chi^2 = \sum_{i=1}^{M} \left( d_i - F_i(m) \right)^2$$
If the model predictions fit the data exactly, then $\chi^2 = 0$.
The object of ‘inverse modelling’ is to find the set of m that minimise $\chi^2$.
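A minimal sketch of this definition in Python (the numbers here are made up purely for illustration, not real CSEM data):

    import numpy as np

    def chi_squared(d, r):
        # Sum of squared residuals between observed data d_i
        # and the forward-model responses r_i = F_i(m)
        return np.sum((d - r) ** 2)

    d = np.array([1.00, 0.52, 0.31])   # observations (made up)
    r = np.array([0.95, 0.55, 0.30])   # model predictions (made up)
    print(chi_squared(d, r))           # 0 only for a perfect fit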
Example:
Suppose we have a two-parameter model – for example a layer of thickness α and resistivity ρ buried within a half-space. We need to find α and ρ.
Misfit function, 2 model parameters
[Figure: misfit surfaces for the two-parameter model, plotted against Layer Thickness / m and Layer Resistivity / Ωm; panels labelled LEMUR 16, etc.]
Simplest possible case:
If M = N and each model parameter affects one data point and one only, then the problem is trivial.
In almost all real cases, each element of m affects many elements of r, to varying degrees. So we have to perform an automated search through ‘model space’ to find the best possible solution (a brute-force version of such a search is sketched below).
If M > N, the problem is overdetermined. We seek the solution that minimises $\chi^2$.
If M < N, the problem is underdetermined. There is an infinite set of models that can fit the data. We have to choose the model from among all these that we ‘like’.
But in both cases, we first have to know how to find a model that ‘fits’.
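For a model space as small as the two-parameter example above, the search can even be done by brute force. The sketch below assumes a stand-in forward function – a real CSEM forward algorithm would replace it – and scans a grid of thickness and resistivity values for the smallest χ²:

    import numpy as np

    def forward(thickness, resistivity):
        # Stand-in forward model (pure invention for illustration);
        # a real CSEM forward algorithm would go here
        return np.array([resistivity / (thickness + 1.0),
                         thickness * np.log(resistivity) / 1000.0])

    d = forward(500.0, 10.0)   # synthetic 'data' from a known true model

    best_chi2, best_m = np.inf, None
    for t in np.linspace(10.0, 1500.0, 150):      # thickness / m
        for rho in np.linspace(1.0, 20.0, 96):    # resistivity / Ohm m
            chi2 = np.sum((d - forward(t, rho)) ** 2)
            if chi2 < best_chi2:
                best_chi2, best_m = chi2, (t, rho)

    print(best_m)   # recovers (500, 10) to within the grid spacing

This only works because there are two parameters; the cost of such a grid search grows exponentially with N, which is why the gradient-based searches described next are needed.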
Searching model space:
We begin with a ‘first guess’, which we call a ‘starting model’. For this, we calculate each of the residuals, and the value of $\chi^2$. Typically, the fit is not very good: both the residuals individually and $\chi^2$ are quite large. We therefore need to update the model parameters so as to find a better fit.
Our new set of model parameters should be a point somewhere in model space that is ‘down-slope’ (in terms of misfit) from the starting model.
If we had a map of misfit for all of model space, we could jump straight to the ‘minimum’. But in practice, the best we can do is to make an estimate of what the misfit surface looks like in a small region very close to our starting model. To achieve this, we calculate ‘gradients’. We can then use the gradients to make a series of small jumps down-gradient in model space, proceeding by iterations until we find the minimum misfit.
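A sketch of this iterative scheme in Python, using simple finite differences for the gradient and a fixed step length (both choices are mine for illustration; real inversion codes control the step far more carefully):

    import numpy as np

    def chi2(m, d, F):
        return np.sum((d - F(m)) ** 2)

    def grad_chi2(m, d, F, h=1e-6):
        # Finite-difference estimate of the misfit gradient near m
        g = np.zeros_like(m)
        for j in range(m.size):
            m_h = m.copy()
            m_h[j] += h
            g[j] = (chi2(m_h, d, F) - chi2(m, d, F)) / h
        return g

    def descend(m0, d, F, step=0.01, n_iter=200):
        # Repeated small jumps down-gradient from the starting model
        m = m0.copy()
        for _ in range(n_iter):
            m = m - step * grad_chi2(m, d, F)
        return m

    F = lambda m: m ** 2                        # trivial stand-in forward model
    d = F(np.array([3.0, 2.0]))                 # synthetic data
    print(descend(np.array([2.5, 1.5]), d, F))  # converges towards [3, 2]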
Finding the minimum misfit
[Figures: the starting model; the starting model and 1st jump; successive iterations converging on the minimum.]
SOES6002: Modelling in
Environmental and Earth System
Science
CSEM Lecture 8
Martin Sinha
School of Ocean & Earth Science
University of Southampton
What do we mean by the ‘gradients’?
We represent the local ‘gradient’ of the misfit function in model space in terms of Fréchet Derivatives:
$$J_{i,j} = \frac{\partial F_i(m)}{\partial m_j}$$
These derivatives are exactly equivalent to the ‘sensitivities’ that are plotted in the coloured diagram.
Knowing the Fréchet Derivatives, or local gradient of misfit, and knowing how big the misfit of our current model is, we can estimate both a direction and a length for a jump in model parameter space to the position of our next model, e.g.
$$m_{n+1} = m_n + \delta m$$
Sensitivity pattern
The problem is generally non-linear, because the gradients vary in different parts of model space – so that every time you jump to a new model, you have to calculate a new set of derivatives:
$$\frac{\partial F_i(m_{n+1})}{\partial m_j} \neq \frac{\partial F_i(m_n)}{\partial m_j}$$
So: we have some data d; we have a forward model algorithm F; we add some extra bits to F so that, as well as calculating r, it can also calculate J. We wrap all this up in an ‘inversion algorithm’, that progressively improves the model by finding an update that improves the fit – i.e. decreases $\chi^2$. We set it going, from some starting model, and we sit back and wait for it to do its job.
Suppose that each of our observational measurements, $d_i$, is associated with a measurement error $\sigma_i$. Then we can scale our measurement of misfit by the measurement error, and obtain:
$$\chi^2 = \sum_{i=1}^{M} \frac{\left( d_i - F_i(m) \right)^2}{\sigma_i^2}$$
IF the data errors have zero mean and a Gaussian distribution, then for the ‘true’ model we’d expect that
$$\chi^2 \approx M$$
We can define an ‘rms misfit’ for an entire model as
$$\mathrm{rms} = \sqrt{\frac{\chi^2}{M}}$$
which for the hypothetical case of the ‘true model’ would statistically be most likely to equal 1.
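In code, the error-weighted misfit and rms follow directly from these definitions (a minimal sketch):

    import numpy as np

    def weighted_chi2(d, r, sigma):
        # Misfit with each residual scaled by its measurement error
        return np.sum(((d - r) / sigma) ** 2)

    def rms_misfit(d, r, sigma):
        # rms = sqrt(chi^2 / M); ~1 for a model fitting to within errors
        return np.sqrt(weighted_chi2(d, r, sigma) / d.size)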
If during our search through model space we find a model that has an
rms < 1, then it fits the data better than we might expect statistically. In all
probability, it contains features that correspond to fitting random errors in the
data. The model overfits the data.
If during our search through model space we find a model that has an
rms > 1, then it does not fit the data as well as we might expect statistically.
The model ‘underfits’ the data. In all probability, we can find a model that fits
the data better.
Problem Number 1. Global minimum and local minima.
Suppose our N-dimensional surface of misfit in model parameter space has not only a ‘global minimum’, but also some ‘local minima’. If we jump into one of these, we’ll be stuck – and we’ll never find our way to the ‘correct’ minimum.
Possible solutions (both sketched below):
(i) Use a variety of different starting models. Hope that the trajectories don’t all end up in local minima.
(ii) Add a random element to the length and direction of the ‘jump’ between models for successive iterations. With luck, we’ll eventually jump out of any local minimum, and set off again towards the global minimum.
But both of these solutions require extra work, and require us to be sceptical of any result until it’s been exhaustively tested many times.
Problem: ‘local minimum’
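Both strategies might be sketched as follows (the helper names and the 10% perturbation scale are arbitrary choices for illustration; `invert` and `chi2` stand in for a full inversion run and its final misfit):

    import numpy as np

    rng = np.random.default_rng(0)

    def multi_start(invert, chi2, starting_models):
        # Strategy (i): run the inversion from several different starting
        # models and keep whichever final model fits best
        return min((invert(m0) for m0 in starting_models), key=chi2)

    def randomised_jump(m, dm, scale=0.1):
        # Strategy (ii): add a random element to the length and direction
        # of each jump, so the search can escape a local minimum
        return m + dm + scale * np.linalg.norm(dm) * rng.standard_normal(m.shape)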
Problem Number 2. Non-uniqueness.
There are many models that all have an equally good fit to the data. How do you choose among them?
One approach: ‘Ockham’s (or Occam’s) Razor’. In our case, we can express this as:
‘The model with the least structure that fits the data adequately is the model to be preferred.’
But – what do we mean by ‘least structure’? And what do we mean by fitting the data ‘adequately’?
Ockham’s Razor
• also known as OCCAM’S RAZOR, or LAW OF ECONOMY, or LAW OF PARSIMONY
• principle stated by William of Ockham (1285–1347/49), a scholar, theologian and philosopher:
• “Pluralitas non est ponenda sine necessitate” – “Plurality should not be posited without necessity.”
• The principle gives precedence to simplicity: of two competing theories, the simpler explanation of an entity is to be preferred.
• The principle is also expressed as “Entities are not to be multiplied beyond necessity.”
Model Roughness.
We define the ‘minimum structure’ model as that which has the least ‘roughness’, where in the case of a 1-D model (model parameter varies only with depth) the local roughness is defined as:
$$r_1 = \left( \frac{\partial m}{\partial z} \right)^2 \quad \text{‘first derivative smooth’}$$
$$r_2 = \left( \frac{\partial^2 m}{\partial z^2} \right)^2 \quad \text{‘second derivative smooth’}$$
and the roughness of the entire model is defined as
$$R_1 = \int_z \left( \frac{\partial m}{\partial z} \right)^2 dz \qquad R_2 = \int_z \left( \frac{\partial^2 m}{\partial z^2} \right)^2 dz$$
And for a model consisting of N uniform layers each of finite thickness,
$$R_1 = \sum_{i=2}^{N} \left( m_i - m_{i-1} \right)^2 \qquad R_2 = \sum_{i=2}^{N-1} \left( m_{i+1} - 2m_i + m_{i-1} \right)^2$$
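For a layered model these sums are one-liners with numpy’s diff (a sketch; here m holds the parameter value in each layer):

    import numpy as np

    def roughness_R1(m):
        # R1 = sum_{i=2}^{N} (m_i - m_{i-1})^2
        return np.sum(np.diff(m) ** 2)

    def roughness_R2(m):
        # R2 = sum_{i=2}^{N-1} (m_{i+1} - 2 m_i + m_{i-1})^2
        return np.sum(np.diff(m, n=2) ** 2)

    m = np.array([1.0, 1.0, 5.0, 1.0, 1.0])  # a 5-layer model with one spike
    print(roughness_R1(m), roughness_R2(m))  # 32.0 96.0: structure costs roughness
    print(roughness_R1(np.full(5, 3.0)))     # 0.0: a uniform model is smoothest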
‘Objective Function’
• So far we’ve talked about minimising the misfit between data and model prediction
• We can also combine measures of ‘undesirable characteristics’ into an ‘objective function’ that we seek to minimise:
$$O = \chi^2 + \lambda R + \mu I$$
• where I might be some measure of geological implausibility (e.g. using a priori information) and R the roughness; λ and μ describe the relative importance of data vs constraints
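As a one-line sketch of this trade-off (λ and μ must be chosen by the user; there is no universal recipe):

    def objective(chi2, R, I, lam, mu):
        # O = chi^2 + lambda * R + mu * I
        # lam and mu weight the constraints against the data fit
        return chi2 + lam * R + mu * I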
Occam
• Occam uses an objective function which combines misfit and the ‘roughness’ of the model
• The ‘Occam’ inversion code works by adjusting the ‘mix’ in the objective function as the iterations progress
• Stopping is based on reaching the specified misfit
Occam Approach
[Flowchart: Starting Model → Add Structure → Target misfit reached? If not, continue adding structure; if so → Remove Roughness → Stop.]
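The flow above might be skeletonised as follows. This is only a guess at the control flow, not the real Occam code; `update` stands in for one full Occam iteration:

    def occam_style(m0, update, rms, target=1.0, max_iter=50):
        m = m0
        # Phase 1: add structure until the target misfit is reached
        for _ in range(max_iter):
            if rms(m) <= target:
                break
            m = update(m, favour_fit=True)
        # Phase 2: remove roughness while holding the target misfit
        for _ in range(max_iter):
            m_smooth = update(m, favour_fit=False)
            if rms(m_smooth) > target:
                break            # further smoothing would underfit
            m = m_smooth
        return m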
Important issues
• How good are the data?
• Can we estimate the observation uncertainties?
• Are the errors really Gaussian and zero-mean?
• In most cases we have a problem with both over-determined and under-determined parameters…..