
Support Vector Machine With Adaptive Parameters
in Financial Time Series Forecasting
by L. J. Cao and Francis E. H. Tay
IEEE Transactions on Neural Networks, Vol. 14, No. 6, November 2003
Presented by Pooja Hegde
CIS 525: Neural Computation
Spring 2004
Instructor: Dr Vucetic
Presentation Outline

Introduction

Background
Motivation and introduction of a novel approach: SVM
SVMs in Regression Estimation

Application of SVMs in Financial Forecasting
Experimental setup and results

Experimental Analysis of SVM Parameters and Results

Adaptive Support Vector Machines (ASVM)
Experimental setup and results

Conclusions
Introduction

Financial time series forecasting is one of the most challenging applications of modern time series forecasting.

Characteristics:

Noisy: complete information from the past behavior of financial markets is unavailable, so the dependency between future and past prices cannot be fully captured.

Non-stationary: the distribution of a financial time series changes over time. The learning algorithm needs to incorporate this characteristic: information given by recent data points should be weighted more heavily than information given by distant data points.
Introduction




Back-propagation (BP) neural networks have been successfully used for modeling financial time series.
BP neural networks are universal function approximators that can map any nonlinear function without any a priori assumptions about the properties of the data.
They are especially effective in describing the dynamics of non-stationary time series due to their non-parametric, noise-tolerant and adaptive properties.
Then what's the problem!!

Need for a large number of controlling parameters.

Difficulty in obtaining a stable solution.

Danger of overfitting: the neural network captures not just the useful information in the training data but also the unwanted noise, which leads to poor generalization.
A Novel Approach: SVMs

Support Vector Machines are being used in a number of areas ranging from
pattern recognition to regression estimation.

Reason: the remarkable characteristics of SVMs

Good generalization performance: SVMs implement the Structural Risk Minimization principle, which seeks to minimize an upper bound on the generalization error rather than only minimize the training error.

Absence of local minima: training an SVM is equivalent to solving a linearly constrained quadratic programming problem. Hence the solution of an SVM is unique and globally optimal.

Sparse representation of the solution: in SVM, the solution to the problem depends only on a subset of the training data points, called support vectors.
Background

Theory of SVMs in Regression Estimation



Given a set of data points (x_1, y_1), (x_2, y_2), …, (x_l, y_l) randomly and independently generated from an unknown function, the SVM approximates the function using

$$ f(x) = w \cdot \phi(x) + b $$

where \phi(x) denotes the (nonlinear) mapping of the inputs into a high-dimensional feature space. The coefficients w and b are estimated by minimizing the regularized risk function

$$ R(C) = C\,\frac{1}{l}\sum_{i=1}^{l} L_\varepsilon\big(y_i, f(x_i)\big) + \frac{1}{2}\lVert w \rVert^2, \qquad L_\varepsilon(y, f(x)) = \begin{cases} \lvert y - f(x) \rvert - \varepsilon, & \lvert y - f(x) \rvert \ge \varepsilon \\ 0, & \text{otherwise.} \end{cases} $$

To estimate w and b, the above is transformed into the primal problem by introducing positive slack variables \xi_i and \xi_i^*:

$$ \min_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) \quad \text{subject to} \quad y_i - w\cdot\phi(x_i) - b \le \varepsilon + \xi_i, \;\; w\cdot\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \;\; \xi_i, \xi_i^* \ge 0. $$
Background

Theory of SVMs in Regression Estimation (contd.)

Introducing Lagrange multipliers and exploiting the optimality constraints, the decision function takes the explicit form

$$ f(x) = \sum_{i=1}^{l} (a_i - a_i^*)\, K(x_i, x) + b $$

where a_i and a_i^* are the Lagrange multipliers. They satisfy the equalities a_i \cdot a_i^* = 0, a_i \ge 0, a_i^* \ge 0, and are obtained by maximizing the dual function

$$ W(a, a^*) = \sum_{i=1}^{l} y_i (a_i - a_i^*) - \varepsilon \sum_{i=1}^{l} (a_i + a_i^*) - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} (a_i - a_i^*)(a_j - a_j^*)\, K(x_i, x_j) $$

subject to \sum_{i=1}^{l} (a_i - a_i^*) = 0 and 0 \le a_i, a_i^* \le C.
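As a concrete illustration of this machinery (our own sketch, not the authors' code), the following fits an ε-insensitive SVR with a Gaussian kernel using scikit-learn, which solves this same dual problem internally; the toy data are placeholders.

import numpy as np
from sklearn.svm import SVR

# Toy data standing in for a preprocessed financial series (placeholder).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

# kernel="rbf" is the Gaussian kernel; in the paper's notation the kernel
# width delta^2 corresponds to gamma = 1 / delta^2.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
svr.fit(X, y)

# The solution is sparse: only the support vectors carry nonzero
# multipliers (a_i - a_i^*), exposed by scikit-learn as dual_coef_.
print("support vectors:", svr.support_.size, "of", len(X))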
Feasibility of Applying SVM in
Financial Forecasting

Experimental Setup:

Data Sets
The daily closing prices of five real
futures contracts from the Chicago
Mercantile Market are used as datasets.

The original closing price is transformed into a
five-day relative difference in percentage of price (RDP).
Feasibility of Applying SVM in
Financial Forecasting

Input variables are determined from four
lagged RDP values based on 5-day periods
(RDP-5, RDP-10, RDP-15, RDP-20) and
one transformed closing price (EMA100).
Output variable: RDP+5.

Z-score normalization is used to normalize
the time series, as they contain outliers.

A walk-forward testing routine is used to divide
the whole dataset into five overlapping training-validation-testing sets;
a preprocessing sketch follows below.
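The sketch below shows one plausible reading of this preprocessing pipeline in Python: the k-day RDP transform, the EMA100 input (here taken as the closing price minus its 100-day exponential moving average, our reading of the slide), the RDP+5 target (simplified; the paper's exact target smoothing is omitted), and z-score normalization. Column names and window conventions are assumptions for illustration.

import pandas as pd

def preprocess(close: pd.Series) -> pd.DataFrame:
    """Build RDP inputs and the RDP+5 target from a daily closing-price series."""
    # k-day relative difference in percentage of price:
    # RDP-k(i) = 100 * (p(i) - p(i-k)) / p(i-k)
    rdp = lambda k: 100.0 * (close - close.shift(k)) / close.shift(k)
    df = pd.DataFrame({
        "RDP-5": rdp(5), "RDP-10": rdp(10),
        "RDP-15": rdp(15), "RDP-20": rdp(20),
        # Transformed closing price: close minus its 100-day EMA (assumption).
        "EMA100": close - close.ewm(span=100).mean(),
    })
    # Target: five-day-ahead RDP (simplified version of the paper's RDP+5).
    df["RDP+5"] = 100.0 * (close.shift(-5) - close) / close
    df = df.dropna()
    # Z-score normalization; in a real walk-forward run, fit the mean and
    # standard deviation on the training portion only.
    return (df - df.mean()) / df.std()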
Feasibility of Applying SVM in
Financial Forecasting

Performance Criteria:

NMSE and MAE: measures of the deviation between the actual and predicted values,

$$ \mathrm{NMSE} = \frac{1}{\sigma^2 n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad \sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert. $$

Smaller values of NMSE and MAE indicate a better predictor.

DS: an indication of the correctness of the predicted direction of RDP+5, given as a percentage,

$$ \mathrm{DS} = \frac{100}{n}\sum_{i=1}^{n} d_i, \qquad d_i = \begin{cases} 1, & (y_i - y_{i-1})(\hat{y}_i - y_{i-1}) \ge 0 \\ 0, & \text{otherwise.} \end{cases} $$

A larger value of DS suggests a better predictor; a sketch computing all three criteria follows below.

The Gaussian kernel is used as the kernel function of the SVM.
The results on the validation set are used to choose the
optimal free parameters (C, ε and the kernel width δ²) of the SVM.
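A minimal sketch of these three criteria in Python (NumPy assumed; variable names are ours):

import numpy as np

def nmse(y, y_hat):
    # Normalized mean squared error: MSE scaled by the variance of the actuals.
    return np.mean((y - y_hat) ** 2) / np.var(y, ddof=1)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def ds(y, y_hat):
    # Directional symmetry: % of points where the predicted and actual
    # moves relative to the previous actual value share a sign.
    actual_move = y[1:] - y[:-1]
    pred_move = y_hat[1:] - y[:-1]
    return 100.0 * np.mean(actual_move * pred_move >= 0)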
Feasibility of Applying SVM in
Financial Forecasting

Benchmarks

Standard three-layer BP neural network with five input nodes and one output node.

The number of hidden nodes, the learning rate and the number of epochs are chosen based on the validation set.

A sigmoid transfer function is used for the hidden nodes and a linear transfer function for the output node.

The network is trained with the stochastic gradient descent method; a configuration sketch follows at the end of this slide.

Regularized RBF Neural Network

It minimizes a risk function consisting of the empirical error and a regularization term.

The regularized RBF neural network software used was developed by Muller et al. and can be downloaded from http://www.kernel-machines.org.

Centers, variances and output weights are adjusted during training.

The number of hidden nodes and the regularization parameter are chosen based on the validation set.
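For reference, the described BP benchmark roughly corresponds to the following scikit-learn configuration. This is our approximation, not the authors' code; the hidden-layer size, learning rate and epoch budget are placeholders that the paper tunes on the validation set.

from sklearn.neural_network import MLPRegressor

# Three-layer BP network: 5 inputs -> sigmoid hidden layer -> linear output.
bp = MLPRegressor(
    hidden_layer_sizes=(10,),   # placeholder; tuned on validation data
    activation="logistic",      # sigmoid hidden nodes
    solver="sgd",               # stochastic gradient descent training
    learning_rate_init=0.01,    # placeholder learning rate
    max_iter=500,               # placeholder epoch budget
)
# bp.fit(X_train, y_train)  # X_train has 5 columns: 4 RDP lags + EMA100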
Results
 In all futures contracts, the largest values of NMSE and MAE occur with the RBF neural network.
 In CME-SP, CBOT-US and EUREX-BUND, SVM has smaller NMSE and MAE values, but BP has smaller values of DS.
 The reverse is true for CBOT-BO and MATIF-CAC40.
 All values of NMSE are near or larger than 1.0, indicating that the financial datasets are very noisy.
 The smallest values of NMSE and MAE occur with SVM, followed by the RBF neural network.
 In terms of DS, the results are comparable among the three methods.
Results
 In CME-SP, CBOT-BO, EUREX-BUND and MATIF-CAC40, the smallest values of NMSE and MAE are found with SVM, followed by the RBF neural network.
 In CBOT-US, BP has the smallest NMSE and MAE, followed by RBF.
 Paired t-test: SVM and RBF outperform BP at the α = 5% significance level for a one-tailed test. There is no significant difference between SVM and RBF.
Experimental Analysis of Parameters
C and δ²
Results

Too small a value of δ² causes SVM to overfit the training data, while too large a value causes SVM to underfit the training data.

A small value of C underfits the training data. When C is too large, SVM overfits the training set, with a corresponding deterioration in generalization performance.

Thus δ² and C play an important role in the generalization performance of the SVM; a validation-based selection sketch follows below.
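One common way to pick these values, consistent with the paper's use of a validation set (though the grid values here are our own placeholders):

import itertools
import numpy as np
from sklearn.svm import SVR

def select_params(X_tr, y_tr, X_val, y_val):
    """Pick (C, delta^2) minimizing validation NMSE; epsilon held fixed."""
    best, best_nmse = None, np.inf
    for C, delta2 in itertools.product([1, 10, 100], [1, 10, 100]):
        # Gaussian kernel width delta^2 enters scikit-learn as gamma = 1/delta^2.
        model = SVR(kernel="rbf", C=C, epsilon=0.1, gamma=1.0 / delta2)
        model.fit(X_tr, y_tr)
        err = np.mean((y_val - model.predict(X_val)) ** 2) / np.var(y_val, ddof=1)
        if err < best_nmse:
            best, best_nmse = (C, delta2), err
    return best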
Experimental Analysis of Parameter
ε

The NMSE on the training and validation sets is very stable and relatively unaffected by changes in ε.

The performance of SVM is insensitive to ε. But this result cannot be generalized, because the effect of ε on performance depends on the input dimension of the dataset.

The number of support vectors is a decreasing function of ε. Hence a large ε reduces the number of support vectors without affecting the performance of the SVM; the sketch below illustrates this effect.
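A quick way to observe this sparsity effect (our own illustration; any regression dataset works):

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(300)

# As epsilon grows, more points fall inside the epsilon-insensitive tube
# and drop out of the solution, so the support-vector count shrinks.
for eps in [0.01, 0.05, 0.1, 0.2, 0.4]:
    svr = SVR(kernel="rbf", C=10.0, epsilon=eps, gamma=0.5).fit(X, y)
    print(f"epsilon={eps:<5} support vectors={svr.support_.size}")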
Support Vector Machine with
Adaptive Parameters (ASVM)

Modification of parameter C:

The regularized risk function consists of the empirical error plus the regularized term. Increasing the value of C increases the relative importance of the empirical error with respect to the regularized term.
To place more weight on recent training points, C is made point-dependent through an ascending weight function (reconstructed here from the limit behaviors described on the slide):

$$ C_i = C \cdot \frac{2}{1 + \exp(a - 2a\,i/l)}, \qquad i = 1, \dots, l. $$

The behaviors of the weight function can be summarized as follows:

When a → 0, lim_{a→0} C_i = C; hence E_ASVM = E_SVM, and ASVM reduces to the standard SVM.

When a → ∞, the weights for the first half of the training data points approach 0 and those for the second half approach 2C.

When a ∈ (0, ∞) and a increases, the weights for the first half of the training data points become smaller and those for the second half become larger; a sketch follows below.
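scikit-learn's SVR accepts per-sample weights that rescale C, so this adaptive-C scheme can be sketched as follows. The weight function is the ascending form reconstructed above; sample_weight is a real scikit-learn parameter, but using it to stand in for the paper's formulation is our assumption.

import numpy as np
from sklearn.svm import SVR

def ascending_weights(l: int, a: float) -> np.ndarray:
    """w_i = 2 / (1 + exp(a - 2*a*i/l)); all weights -> 1 as a -> 0."""
    i = np.arange(1, l + 1)
    return 2.0 / (1.0 + np.exp(a - 2.0 * a * i / l))

# Per-sample weights rescale C, effectively giving C_i = C * w_i: recent
# points (large i) incur a larger penalty for falling outside the tube.
# svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
# svr.fit(X_train, y_train, sample_weight=ascending_weights(len(y_train), a=3.0))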
Support Vector Machine with
Adaptive Parameters (ASVM)

Modification of parameter ε:

To make the solution of the SVM sparser, ε adopts a point-dependent form (again reconstructed from the limit behaviors described on the slide):

$$ \varepsilon_i = \varepsilon \cdot \frac{1 + \exp(b - 2b\,i/l)}{2}, \qquad i = 1, \dots, l. $$

The proposed adaptive ε places more weight on recent training points than on distant ones.
Since the number of support vectors is a decreasing function of ε, recent training points obtain more attention in the representation of the solution than distant points.
The behaviors of the weight function can be summarized as follows:

When b → 0, lim_{b→0} ε_i = ε; hence the weights of all training data points equal 1.0.

When b → ∞, the ε-tube becomes very wide for the first half of the training data points and approaches ε/2 for the second half.

When b ∈ (0, ∞) and b increases, the weights for the first half of the training data points become larger and those for the second half become smaller; a sketch of this profile follows below.
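Standard SVR implementations expose only a single global ε, so a per-sample ε requires a custom QP solver; the weight profile itself, though, is easy to sketch (form reconstructed above, parameter values hypothetical):

import numpy as np

def descending_epsilon(l: int, eps: float, b: float) -> np.ndarray:
    """eps_i = eps * (1 + exp(b - 2*b*i/l)) / 2; eps_i -> eps as b -> 0.

    Distant points (small i) get a wider tube, so they are more likely to
    fall inside it and drop out of the support-vector set; recent points
    (large i) get a narrower tube and more influence on the solution.
    """
    i = np.arange(1, l + 1)
    return eps * (1.0 + np.exp(b - 2.0 * b * i / l)) / 2.0

# Example profile over 10 points:
print(np.round(descending_epsilon(10, eps=0.1, b=3.0), 4))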
Adaptive SVM (ASVM) &
Weighted BP Neural Network (WBP)

Regularized risk function in ASVM (with the per-point C_i defined above):

$$ R = \frac{1}{2}\lVert w \rVert^2 + \sum_{i=1}^{l} C_i\,(\xi_i + \xi_i^*). $$

Corresponding dual function: identical to the standard dual above, except that the box constraints become 0 ≤ a_i, a_i^* ≤ C_i and the global ε is replaced by the point-dependent ε_i:

$$ W(a, a^*) = \sum_{i=1}^{l} y_i (a_i - a_i^*) - \sum_{i=1}^{l} \varepsilon_i (a_i + a_i^*) - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} (a_i - a_i^*)(a_j - a_j^*)\, K(x_i, x_j). $$

Weighted BP Neural Network: the empirical error of each training point is scaled by the same ascending weight function, so recent points contribute more to the weight update during training.
Results of ASVM
 ASVM and WBP have smaller NMSE and MAE but larger DS than their corresponding standard methods.
 ASVM outperforms SVM at the α = 2.5% significance level.
 WBP outperforms BP at the α = 10% significance level.
 ASVM outperforms WBP at the α = 5% significance level.
 ASVM converges to fewer support vectors.
Conclusions

SVM: a promising alternative tool to the BP neural network for financial time series forecasting.

Comparable performance between the regularized RBF neural network and SVM.

C and δ² have a great influence on the performance of SVM. The number of support vectors can be reduced by using a larger ε, resulting in a sparse representation of the solution.

ASVM achieves higher generalization performance and uses fewer support vectors than the standard SVM in financial forecasting.

Future work: investigate techniques to choose optimal values of the free parameters of ASVM, and explore more sophisticated weight functions that closely follow the dynamics of the time series and further improve the performance of ASVM.
THANK YOU!!!!