Chapter 15

Figure (above): GPS time series from southern California after removing several curve fits to the data

Fitting curves to data is very common in Earth sciences
• Has applications in virtually all subdisciplines

Two things to keep in mind:
• Data is noisy
• Data is discrete (non-continuous)
• Curve fitting can help overcome these issues (to some degree)

Empirical Modeling:
• A type of modeling that involves fitting a curve to data and then using the equation of the curve to predict values
• Figure: Extrapolations of Future Global Warming, IPCC (2007)
• Figure: From Wells & Coppersmith (1994)

MATLAB provides several built-in functions to fit curves
• Many require the "Curve Fitting Toolbox" or other toolboxes
• We will only use the basic curve fitting functions that are part of standard MATLAB
• We will focus on polyfit, polyval, corrcoef, and roots

polyfit
• Fits data with a polynomial curve of a user-specified degree
• Polynomials come in different orders or degrees
• 0th order: a single constant value
  • Examples: y = 4, y = 2.75, y = -12.1
• 1st order: a linear equation (independent variable is to the 1st power)
  • Examples: y = 4x, y = 3.2x + 7, y = -8.2x - 21.3
• 2nd order: a quadratic equation
  • Examples: y = 5x^2, y = 2.9x^2 + 7, y = -1.8x^2 - 7.4x + 1.4
• 3rd order: a cubic equation
  • Examples: y = 7x^3, y = 4.6x^3 + 2, y = 2.4x^3 + 3.5x^2 + 3.2x + 7.3
• nth order: a polynomial where "n" is the largest exponent

Polynomials in MATLAB
• A polynomial can be represented as a row vector of its coefficients, highest power first
• polyval interprets the vector as the coefficients of a polynomial
• Example: [3 2.7 1 -5.7] represents 3x^3 + 2.7x^2 + x - 5.7
• Let's make data for: y = 2.7x^4 + 4x^3 - x^2 + 1.8x - 12
• Using polyval requires a lot less typing than writing out every term
• Saves time!
• A first order polynomial is a linear equation, e.g. y = 3.75x + 0.25

Roots of a function: where the function = 0
• Useful in sciences because we often want to know where parameters return to zero
• Also useful for finding the min/max of data and equations (the roots of the derivative)
• 1st order polynomial: 1 root
• 2nd order polynomial: 0, 1, or 2 real roots
• 3rd order polynomial: up to 3 real roots
• nth order polynomial: up to n real roots

Warning!
• Some polynomials have no real roots, but do have complex roots (roots with an imaginary part)
• Recall the discriminant of a quadratic: b^2 - 4ac
• Example: y = x^2 + 6x + 8

Polynomial: x^2 - 6x + 8
• roots returns 2 and 4. This means that x^2 - 6x + 8 = (x - 2)(x - 4)
• Discriminant > 0

Polynomial: 4x^2 - 2x + 6
• roots returns a complex pair. This means that 4x^2 - 2x + 6 has no real roots!
• Discriminant < 0

Polynomial: x^4 - 23x^2 - 18x + 40
• roots returns four real values. This means that x^4 - 23x^2 - 18x + 40 has 4 real roots!

Polynomials are easy to deal with in MATLAB
• As the order of the polynomial increases, so does the complexity of the curve
• Remember the Taylor Series?
  • Many smooth functions can be approximated by a series of polynomial terms
  • More terms = better fit
• polyfit is similar, except that it fits a single polynomial to data

A Simple Test
• Fit 5 collinear points with a linear equation
• Use polyfit to perform a least squares fit of a 1st order polynomial (i.e. a linear fit)

Evaluating goodness of fit
• It is beyond our scope to cover all methods for goodness of fit
• Take a Stats course!
• The correlation coefficient, R, is one commonly used measure, although there are better ways to evaluate goodness of fit
• R^2 tells you the fraction of your data's variance that is explained by a linear fit
• R^2 = 0.95 means 95% of your data's variance is explained by the linear fit
• What is good enough? Depends on the situation

Make synthetic data with noise
• See if the fit is reasonable
• Increasing the order of the polynomial allows more complex curves to be fit
• Be careful not to over-fit your data!
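The polyval, roots, polyfit, and corrcoef steps above can be sketched roughly as follows. This is a minimal illustration that assumes standard MATLAB with no toolboxes. The polynomials come from the slides, but the variable names, the x ranges, and the noise amplitude are assumptions made here for illustration.

```matlab
% Polynomials are row vectors of coefficients, highest power first.
p4 = [2.7 4 -1 1.8 -12];                 % 2.7x^4 + 4x^3 - x^2 + 1.8x - 12
x  = -2:0.5:2;                           % assumed sample locations
yLong  = 2.7*x.^4 + 4*x.^3 - x.^2 + 1.8*x - 12;   % every term typed out
yShort = polyval(p4, x);                 % same values with less typing

% Roots: where the polynomial equals zero.
rReal    = roots([1 -6 8]);              % x^2 - 6x + 8 = (x - 2)(x - 4)
rComplex = roots([4 -2 6]);              % discriminant < 0: complex pair
rQuartic = roots([1 0 -23 -18 40]);      % note the 0 for the missing x^3 term

% Simple test: five collinear points on y = 3.75x + 0.25.
xc = 1:5;
yc = 3.75*xc + 0.25;
pc = polyfit(xc, yc, 1);                 % least squares fit, 1st order

% Synthetic data with uniform noise in [-2, 2] (assumed amplitude).
xn = linspace(0, 10, 50);
yn = 3.75*xn + 0.25 + (4*rand(size(xn)) - 2);
pn = polyfit(xn, yn, 1);                 % fitted slope and intercept
R  = corrcoef(xn, yn);                   % 2x2 correlation matrix
R2 = R(1,2)^2;                           % fraction of variance explained by a linear fit
```

Because the noise is random, pn and R2 change slightly on every run; pc should recover the slope and intercept of the collinear points to within round-off.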
• MATLAB will give a warning if the result is poorly conditioned

Occasionally, we may want to fit high order polynomials to data
• Typically, this is done to model the shape of a feature, not a data trend

To avoid poorly conditioned polynomial warnings, we can:
• Ask polyfit to scale the data before fitting
  • x_hat = (x - mean(x)) / std(x) = (x - mu(1)) / mu(2)
  • E.g. y = ax^2
  • Forces the mean of x to be zero
  • Forces the standard deviation of x to be 1
  • Improves the fitting algorithm
• Tell polyval about the scaling to reconstruct the correct y-values

WARNING!
• The polynomial coefficients are not the best fit of the original data [x, y]
• They are the best fit coefficients of the RESCALED data [(x - mu(1)) / mu(2), y]

What if data is unevenly spaced?
• How could you estimate an evenly spaced data set?
• Interpolate it
• Interpolation: the process of estimating values in between data points

What if data is limited in range?
• How could you estimate data beyond your data range?
• Extrapolate values
• Very prone to errors. Should always be done with extreme caution
• Extrapolation: the process of estimating values beyond the bounds of your data

MATLAB provides several ways to interpolate data
• I will only cover using "interp1" and "polyfit"

You can use a best fit curve to interpolate
• Make sure the curve fits the data well
• Best fit curves tend to smooth data
• Will not honor your collected data points!
• This is why formal interpolation is typically preferred
• Be careful about extrapolating!

interp1: interpolates 1D data
• See also interp2 and interp3 for 2D/3D
• interp1 has several options
• Read the documentation
• We will only use the linear or spline methods

Linear Interpolation
• Resultant data is boxy
• Min/Max will not exceed the original data

Interpolation using splines
• Resultant data has smooth curves
• Min/Max may exceed the original data

• Both methods honor the y-values at the original data points

Example: an equation sampled unevenly to make synthetic data
• y = 8 x - 2.5x
• In this case, splines work best, but the polynomial fit is not bad

Example: an equation sampled unevenly with some random noise (± 2) added
• y = 2x + 4
• In this case, linear interpolation is not bad, but the linear fit is best

Extrapolating with interp1
• "interp1" can be used to extrapolate beyond the input data limits
• Use the 'extrap' option: interp1(x,y,xq,'linear','extrap'), where xq holds the query points
• Use with great caution!
• Extrapolation is highly prone to errors
• Extrapolation should only be a last resort
• Which method worked best?
• None! Extrapolation is a bad idea
• If you have to do it, only go very slightly beyond your data limits

Summary
• MATLAB offers several built-in functions for curve fitting and resampling data
• We covered only polyfit and interp1
• There is no way to know a priori which method is most appropriate for your data
• Take statistics classes
• Always test your curve fits
• Know what relationship to expect between data (if possible)
• Polynomial fits are not appropriate for all data sets
• May want to explore other methods
• E.g. Fourier Analysis / Spectral Analysis
• MATLAB has TONS of other curve fitting and resampling options in various toolboxes
• Don't use toolbox commands in this class, but feel free to explore them in your research
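The scaling and interpolation ideas above can be sketched together as follows. This is a minimal illustration, not the code from the slides. It assumes standard MATLAB; the sample locations, query points, and variable names are hypothetical, and only y = 2x + 4 with ± 2 noise is taken from the example above.

```matlab
% Unevenly sampled synthetic data (hypothetical sample locations).
x = [0 0.4 1.1 1.9 3.2 4.0 5.5 6.1 7.8 9.0];
y = 2*x + 4 + (4*rand(size(x)) - 2);     % y = 2x + 4 with noise in [-2, 2]

xi = 0:0.5:9;                            % evenly spaced query points inside the data range

% Formal interpolation: both methods honor the original data points.
yLin = interp1(x, y, xi, 'linear');      % stays within min(y)..max(y)
ySpl = interp1(x, y, xi, 'spline');      % smooth, may overshoot min/max

% Best fit curve with centering and scaling requested from polyfit.
% mu(1) = mean(x), mu(2) = std(x); p fits the RESCALED x, not x itself.
[p, S, mu] = polyfit(x, y, 1);
yFit = polyval(p, xi, S, mu);            % tell polyval about the scaling

% Equivalent manual evaluation using the scaled variable.
yManual = polyval(p, (xi - mu(1)) / mu(2));

% Extrapolation slightly beyond the data limits (use with great caution).
xOut = 9.5;
yOut = interp1(x, y, xOut, 'linear', 'extrap');
```

Plotting y, yLin, ySpl, and yFit against x and xi is a quick way to compare the methods on your own data before trusting any of them.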