Forecasting the BET-C Stock Index with Artificial Neural Networks


DOCTORAL SCHOOL OF FINANCE AND BANKING (DOFIN), ACADEMY OF ECONOMIC STUDIES

Forecasting the BET-C Stock Index with Artificial Neural Networks

MSc Student: Stoica Ioan-Andrei
Supervisor: Professor Moisa Altar
July 2006

Stock Markets and Prediction

- Predicting stock prices: the goal of every investor trying to make a profit on the stock market.
- The predictability of the market: an issue discussed by many researchers and academics.
- Efficient Market Hypothesis (Eugene Fama), three forms:
  - Weak: future stock prices cannot be predicted from past stock prices.
  - Semi-strong: even publicly available information cannot be used to predict future prices.
  - Strong: the market cannot be predicted no matter what information is available.

Stock Markets and Prediction

- Technical Analysis: the 'castles-in-the-air' view; prices are driven by investors' behavior and their reactions to anticipated market moves.
- Fundamental Analysis: the 'firm foundations' view; stocks have an intrinsic value determined by the present condition and future prospects of the company.
- Traditional Time Series Analysis: uses historical data, approximating future values of a time series as a linear combination of past values.
- Machine Learning: Artificial Neural Networks.

The Artificial Neural Network

- A computational technique inspired by the way the human brain processes information.
- 1943: W.S. McCulloch and W. Pitts attempted to mimic the ability of the human brain to process data and information and to recognize patterns and dependencies.
- The human brain: a complex, nonlinear and parallel computer.
- The neurons: elementary information-processing units, the building blocks of a neural network.

The Artificial Neural Network

- A semi-parametric approximation method.
- Advantages:
  - ability to detect nonlinear dependencies
  - parsimonious compared to polynomial expansions
  - generalization ability and robustness
  - no assumptions about the model have to be made
  - flexibility
- Disadvantages:
  - the 'black box' property
  - training requires an experienced user
  - training takes a lot of time and a fast computer
  - overtraining leads to overfitting, undertraining to underfitting

The Artificial Neural Network

[Figure: network approximation of a nonlinear function y = f(x) built from sin x and ln x terms]

The Artificial Neural Network

[Figure: network approximation of y = sin x from noisy observations]

The Artificial Neural Network

Overtraining/Overfitting

The Artificial Neural Network

Undertraining/Underfitting

Architecture of the Neural Network

- Types of layers:
  - input layer: number of neurons = number of inputs
  - output layer: number of neurons = number of outputs
  - hidden layer(s): number of neurons chosen by trial and error
- Connections between neurons:
  - fully connected
  - partially connected
- The activation function (sketched below):
  - threshold function
  - piecewise linear function
  - sigmoid functions
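A small illustrative sketch in Python (not part of the original slides) of the three activation-function families listed above:

```python
import numpy as np

def threshold(v):
    """Threshold (step) activation: 1 if the input is non-negative, else 0."""
    return np.where(v >= 0.0, 1.0, 0.0)

def piecewise_linear(v):
    """Piecewise linear activation, saturating at 0 and 1."""
    return np.clip(v + 0.5, 0.0, 1.0)

def logistic(v):
    """Sigmoid (logistic) activation, the form used in the network formulas."""
    return 1.0 / (1.0 + np.exp(-v))
```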

The Feed-Forward Network

$n_{k,t} = \omega_{k,0} + \sum_{i=1}^{n} \omega_{k,i}\, x_{i,t}$

$N_{k,t} = L(n_{k,t}) = \frac{1}{1 + e^{-n_{k,t}}}$

$\hat{y}_t = \gamma_0 + \sum_{k=1}^{m} \gamma_k\, N_{k,t}$

m = number of hidden-layer neurons, n = number of inputs
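A minimal Python/NumPy sketch of this forward pass; the argument names (omega, gamma) simply mirror the reconstructed formulas above and are not from the original:

```python
import numpy as np

def ffn_forecast(x, omega0, omega, gamma0, gamma):
    """Feed-forward network with one hidden layer and logistic activation.
    x      : input vector (n,)  -- lagged returns and volatility
    omega0 : hidden-layer biases (m,)
    omega  : hidden-layer weights (m, n)
    gamma0 : output bias (scalar)
    gamma  : output weights (m,)
    """
    n_kt = omega0 + omega @ x            # hidden pre-activations n_{k,t}
    N_kt = 1.0 / (1.0 + np.exp(-n_kt))   # logistic activations N_{k,t}
    return gamma0 + gamma @ N_kt         # forecast y_hat_t
```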

The Feed-Forward Network with Jump Connections

$n_{k,t} = \omega_{k,0} + \sum_{i=1}^{n} \omega_{k,i}\, x_{i,t}$

$N_{k,t} = L(n_{k,t}) = \frac{1}{1 + e^{-n_{k,t}}}$

$\hat{y}_t = \gamma_0 + \sum_{k=1}^{m} \gamma_k\, N_{k,t} + \sum_{i=1}^{n} \beta_i\, x_{i,t}$

The jump connections add a direct linear link from the inputs to the output.
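The same sketch with the jump connections added as a direct linear term (beta is an assumed name for the direct-link weights):

```python
import numpy as np

def ffn_jump_forecast(x, omega0, omega, gamma0, gamma, beta):
    """Feed-forward network with jump (direct input-to-output) connections."""
    n_kt = omega0 + omega @ x
    N_kt = 1.0 / (1.0 + np.exp(-n_kt))
    # jump connections: a linear term beta'x added directly to the output
    return gamma0 + gamma @ N_kt + beta @ x
```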

The Recurrent Neural Network - Elman

$n_{k,t} = \omega_{k,0} + \sum_{i=1}^{n} \omega_{k,i}\, x_{i,t} + \sum_{j=1}^{m} \varphi_{k,j}\, n_{j,t-1}$

$N_{k,t} = L(n_{k,t}) = \frac{1}{1 + e^{-n_{k,t}}}$

$\hat{y}_t = \gamma_0 + \sum_{k=1}^{m} \gamma_k\, N_{k,t}$

The recurrent connections allow the neurons to depend on their own lagged values, building a 'memory' into their evolution.
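A sketch of the recurrent forward pass, assuming a full matrix phi of recurrent weights on the lagged pre-activations:

```python
import numpy as np

def rnn_forecast(x_seq, omega0, omega, phi, gamma0, gamma):
    """Elman-type recurrent network: hidden pre-activations feed back with one lag.
    x_seq : input sequence, shape (T, n)
    phi   : recurrent weights on the lagged pre-activations, shape (m, m)
    """
    m = omega0.shape[0]
    n_prev = np.zeros(m)                 # n_{k,0}: no memory before the sample starts
    forecasts = []
    for x in x_seq:
        n_kt = omega0 + omega @ x + phi @ n_prev   # recurrent term uses n_{j,t-1}
        N_kt = 1.0 / (1.0 + np.exp(-n_kt))
        forecasts.append(gamma0 + gamma @ N_kt)
        n_prev = n_kt
    return np.array(forecasts)
```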

Training the Neural Network

Objective: minimize the discrepancy between the real data and the output of the network:

$\min_{\Omega} \Psi(\Omega) = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2, \qquad \hat{y}_t = f(x_t; \Omega)$

Ω: the set of parameters (weights); Ψ: the loss function.
Because Ψ is nonlinear in Ω, this is a nonlinear optimization problem, solved by:
- backpropagation
- a genetic algorithm
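As a sketch, the SSE objective for a generic forecast function (names are illustrative):

```python
import numpy as np

def sse_loss(params, forecast_fn, X, y):
    """Sum-of-squared-errors objective Psi(Omega) = sum_t (y_t - y_hat_t)^2.
    params      : candidate weight vector Omega
    forecast_fn : maps (params, X) to the network forecasts y_hat
    """
    y_hat = forecast_fn(params, X)
    return np.sum((y - y_hat) ** 2)
```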

The Backpropagation Algorithm

- An alternative to quasi-Newton methods, based on gradient descent.
- $\Omega_0$: randomly generated.
- Update rule: $\Omega_{n+1} = \Omega_n - \rho\, \nabla\Psi(\Omega_n)$
- $\rho$: learning parameter, chosen in [0.05, 0.5].
- After n iterations a momentum term is added ($\mu = 0.9$, the momentum parameter):
  $\Omega_n = \Omega_{n-1} - \rho\, \nabla\Psi(\Omega_{n-1}) + \mu\, (\Omega_{n-1} - \Omega_{n-2})$
- Problem: local minimum points.

The Genetic Algorithm

- Based on Darwinian laws.
- Population creation: N random vectors of weights.
- Selection: two parent vectors (Ωi, Ωj) are drawn from the population.
- Crossover & mutation: produce two children vectors, C1 and C2.
- Election tournament: the fittest 2 of the four vectors are passed to the next generation.
- Convergence: after G* generations, with G* large enough that the fitness of the best individual shows no significant change for several generations.
(A sketch of this loop follows below.)
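A compact, illustrative sketch of such a loop; the specific crossover and mutation operators are assumptions, not the ones used in the thesis:

```python
import numpy as np

def genetic_train(loss_fn, n_params, pop_size=50, n_gen=200, sigma=0.1, seed=0):
    """Genetic-algorithm sketch for minimizing the SSE loss."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, n_params))              # population creation
    for _ in range(n_gen):
        i, j = rng.choice(pop_size, size=2, replace=False)   # selection of parents
        p1, p2 = pop[i], pop[j]
        mask = rng.random(n_params) < 0.5                    # crossover ...
        c1 = np.where(mask, p1, p2) + sigma * rng.normal(size=n_params)  # ... and mutation
        c2 = np.where(mask, p2, p1) + sigma * rng.normal(size=n_params)
        family = np.array([p1, p2, c1, c2])                  # election tournament:
        best = np.argsort([loss_fn(v) for v in family])[:2]  # fittest 2 survive
        pop[i], pop[j] = family[best[0]], family[best[1]]
    return min(pop, key=loss_fn)                             # best individual
```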

Experiments and Results

Data

- BET-C stock index: daily closing prices, 16 April 1998 to 18 May 2006.
- Daily returns:
  $R_t = \ln P_t - \ln P_{t-1} = \ln \frac{P_t}{P_{t-1}}$
- Conditional volatility, proxied by the rolling 20-day standard deviation:
  $V_t = \sqrt{\frac{\sum_{i=1}^{20} (R_{t-i} - \bar{R}_t)^2}{19}}$
- BDS test for nonlinear dependencies: $H_0$: the data are i.i.d.; $BDS_{m,\varepsilon} \sim N(0,1)$.

BDS test statistics:

Series | m=2, ε=1 | m=2, ε=1.5 | m=3, ε=1 | m=3, ε=1.5 | m=4, ε=1 | m=4, ε=1.5
OD     | 16.6526  | 17.6970    | 18.5436  | 18.7202    | 19.7849  | 19.0588
ARF    | 16.2626  | 17.2148    | 18.3803  | 18.4839    | 19.7618  | 18.9595
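For illustration, the daily log returns and the rolling 20-day volatility defined above could be computed as follows (file name and column name are assumptions):

```python
import numpy as np
import pandas as pd

# Load daily closing prices of the index (hypothetical file and column names).
prices = pd.read_csv("bet_c.csv", index_col=0, parse_dates=True)["close"]

returns = np.log(prices).diff().dropna()      # R_t = ln P_t - ln P_{t-1}
volatility = returns.rolling(20).std(ddof=1)  # 20-day standard deviation (divisor 19)
```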

Experiments and Results

- Three types of ANNs:
  - feed-forward network
  - feed-forward network with jump connections
  - recurrent network
- Inputs: [R(t-1), R(t-2), R(t-3), R(t-4), R(t-5)] and V(t).
- Output: the next-day return R(t).
- Training: genetic algorithm and backpropagation.
- Data divided into:
  - training set: 90%
  - test set: 10%
- One-day-ahead forecasts (static forecasting).
- Each network is trained 100 times; the best 10 runs are kept by SSE and the final network is chosen by RMSE (a sketch of this selection follows below).
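A sketch of this selection procedure, under the reading that the 10 best runs by in-sample SSE are kept and the final model is the one with the lowest test RMSE (train_fn is an assumed helper that trains one network from random initial weights):

```python
import numpy as np

def select_network(train_fn, X_train, y_train, X_test, y_test, n_runs=100):
    """Train many networks, keep the 10 best by in-sample SSE, pick the best by RMSE."""
    candidates = [train_fn(X_train, y_train) for _ in range(n_runs)]
    sse = [np.sum((y_train - net(X_train)) ** 2) for net in candidates]
    top10 = [candidates[i] for i in np.argsort(sse)[:10]]
    rmse = [np.sqrt(np.mean((y_test - net(X_test)) ** 2)) for net in top10]
    return top10[int(np.argmin(rmse))]
```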

Experiments and Results

Evaluation Criteria

In-sample criterion:

$R^2 = \frac{\sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2} = 1 - \frac{\sum_{t=1}^{T} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$

Out-of-sample criteria:

$RMSE = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2}$

$MAE = \frac{1}{T} \sum_{t=1}^{T} |\hat{y}_t - y_t|$

$SSE = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 = \sum_{t=1}^{T} \varepsilon_t^2$

$HR = \frac{1}{T} \sum_{t=1}^{T} I_t, \qquad I_t = 1 \text{ if } y_t\,\hat{y}_t > 0, \text{ else } 0$

$ROI = \sum_{t=1}^{T} y_t\, \mathrm{sign}(\hat{y}_t)$

$RP = \frac{\sum_{t=1}^{T} y_t\, \mathrm{sign}(\hat{y}_t)}{\sum_{t=1}^{T} |y_t|}$

Pesaran-Timmermann test for directional accuracy: $H_0$: the signs of the forecasts and those of the real data are independent; $DA \sim N(0,1)$.
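A compact sketch computing the out-of-sample criteria above for one forecast series:

```python
import numpy as np

def evaluate(y, y_hat):
    """Out-of-sample criteria for a series of forecasts y_hat against realizations y."""
    err = y - y_hat
    return {
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(y_hat - y)),
        "SSE": np.sum(err ** 2),
        "HR": np.mean(y * y_hat > 0),                        # hit rate of the sign forecast
        "ROI": np.sum(y * np.sign(y_hat)),                   # return of the sign-trading rule
        "RP": np.sum(y * np.sign(y_hat)) / np.sum(np.abs(y)),
    }
```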

Experiments and Results

- ROI: a trading strategy based on the sign forecasts:
  - a "+" forecast is a buy signal
  - a "-" forecast is a sell signal
- Finite differences, used to measure the sensitivity of the forecast to each input (step $h_i = 10^{-6}$; see the sketch after this list):

$\frac{\partial \hat{y}}{\partial x_i} \approx \frac{f(x_1, \ldots, x_i + h_i, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h_i}$
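A sketch of the finite-difference sensitivity, where predict stands for a trained network's forecast function:

```python
import numpy as np

def finite_difference(predict, x, h=1e-6):
    """One-sided finite differences: sensitivity of the forecast to each input,
    using the step h_i = 10^-6 from the slides."""
    base = predict(x)
    grad = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        x_step = x.copy()
        x_step[i] += h
        grad[i] = (predict(x_step) - base) / h
    return grad
```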

Benchmarks

- Naïve model: R(t+1) = R(t).
- Buy-and-hold strategy.
- AR(1) model, estimated by least squares; its RMSE and MAE serve as the benchmark in the overfitting check.

Experiments and Results

Comparison of the AR(1), FFN without volatility, FFN, FFN-jump, Naïve and RN models on the evaluation criteria (R², SSE, RMSE, MAE, HR, ROI, RP, PT-Test, B&H). Figures recoverable from the results table:

- AR(1): RMSE 0.015100, MAE 0.011948, hit rate 55.77% (111 correct signs).
- The other five models: RMSE between 0.011304 and 0.011344, MAE between 0.008867 and 0.008932, hit rates from 56.78% (113 correct signs) to 59.79% (119 correct signs).
- Percentage returns reported range from 14.47% and 15.02% up to 18.02%, 18.77% and 23.34%; the buy-and-hold (B&H) figure is 0.2753.
- Pesaran-Timmermann statistics reported: 14.79, 15.01 and 14.49.
- Alongside a "Volatility" entry of 0.318374, the table lists FFN -0.1123, FFN-jump -0.1358 and RN -0.1841.

Experiments and Results

Actual and fitted values (training sample)

Experiments and Results

Actual and fitted values (test sample)

Conclusions

- RMSE and MAE are lower than for the AR(1) benchmark: no signs of overfitting.
- R² < 0.1: forecasting the magnitude of returns is a failure.
- Sign forecasting succeeds, with hit rates of about 60%.
- Volatility:
  - improves the sign forecast
  - the finite-difference sensitivities are negative, so volatility appears to be perceived as a measure of risk
- The trading strategy outperforms the naïve model and buy-and-hold.
- The quality of the sign forecast is confirmed by the Pesaran-Timmermann test.

Further development

- Volatility: other estimates.
- A neural classifier specialized in sign forecasting.
- Using data from outside the Bucharest Stock Exchange:
  - T-Bond yields
  - exchange rates
  - indexes from foreign capital markets