POST MODERN PORTFOLIO THEORY: ACHIEVING SUPERIOR UPSIDE


Applications of news analytics in finance: a review
Gautam Mitra
Co-author: Leela Mitra

Summary and scope



In this talk we set out a structured reading guide to
the published research outputs (journal papers, white
papers and case studies) which are emerging in the
domain of “news analytics” applied to finance.
We aim to provide insight into the subtle interplay of
information technology (including AI), quantitative
models and behavioural biases in the context of
trading and investment decisions.
Applications such as low frequency and high
frequency trading are presented; some
desirable (potential) applications are discussed.
Outline
Introduction
News data
• Data sources
• Pre analysis of data
Determining sentiment scores
• General overview
• Das and Chen
• Lo
Models and applications in summary form
• (abnormal) Returns
• Volatility and risk control
Desirable industry applications
Summary and discussions
Introduction

News.

Market Environment.

Sentiment.

Investment Decisions.

Risk Control.
Introduction

Traders [ High Frequency ]

Fund Managers [ Low Frequency ]

Desktop
• Market Data
• NewsWire

Data WareHouse

DataMart
Introduction
R & D Challenge → Identify Killer Application

Smart investors rapidly analyse/digest information:
• News stories/announcements.
• Stock price moves (market reactions).
• Act promptly to take trading/investment decisions.

Can a machine act intelligently (AI) to compete with or
outsmart humans?
Introduction

At least can we have IT/AI tools which help humans make
good investment decisions?
Intelligence Amplification
<Gearing… engineering concept>

Thus three disciplines converge:

• Information Systems
• AI, in particular Natural Language Processing
• Financial Engineering/Quantitative Modelling
(including behavioural finance)
Introduction
[Diagram: flow of news and market data into quant models]
News sources
• Mainstream News
• Pre-News
• Web 2.0 Social Media
Pre-Analysis
• Classifiers
• Sentiment Scores
(Numeric) financial market data
Analysis
• Consolidated Datamart
• Updated beliefs, ex-ante view of market environment
Quant Models
1. Return Predictions
2. Fund Management / Trading Decisions
3. Volatility estimates and risk control
Data → analysis → Datamart → quant models
News data: Data sources

Sources of news/informational flows (Leinweber)

News: Mainstream media, reputable sources.
• Newswires to traders' desks.
• Newspapers, radio and TV.
Pre-News: Source data
• SEC reports and filings. Government agency reports.
• Scheduled announcements, macro economic news,
industry stats, company earnings reports…
Social media: Blogs, websites and message boards
• Quality can vary significantly
• Barriers to entry low
• Human behaviour and agendas
News data: Data sources

Web based news
• Individual investors pay more attention than institutional
investors (Das and Rieger)
• “Collective Intelligence”: the collective opinion of a large group
of people (with no ulterior motives) may be useful.
• SEC does monitor message boards, but the vetting of
information is far from perfect.

Financial news can be split between
• Scheduled news (Synchronous)
• Unscheduled news (Asynchronous, event driven)
News data: Data sources

Scheduled news (Synchronous)
• Arrives at pre-scheduled times
• Much of pre-news
• Structured format
• Often basic numerical format
• Typically macro economic announcements and earnings
announcements
News data: Data sources

Macro economic announcements
• Widely used in automated trading
• Impact the large, most liquid markets (foreign exchange,
government debt, futures markets)
• Naturally affect trading strategies.
• Speed and accuracy are key … technology requirements are
substantial
• Providers in this space: Trade the News, Need to Know News,
Market News International, Thomson Reuters, Dow Jones, Bloomberg…

Earnings announcements
• Directly influence stock prices
• Widely anticipated and used in trading strategies
News data: Data sources

Unscheduled news (Asynchronous, event driven)
• Arrives unexpectedly over time
• Mainstream news and social media
• Unstructured, qualitative, textual form
• Non-numeric
• Difficult to process quickly and quantitatively
• May contain information about the cause and effect of an
event
• To be used in quant models, it needs to be converted into an
input time series
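The conversion of asynchronous news into an input time series can be sketched as follows. This is only an illustration, assuming each story already carries a numeric sentiment score; the function name `to_time_series`, the bucket width and the choice of a neutral 0.0 for empty buckets are assumptions, not part of the talk.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def to_time_series(items, start, end, step):
    """Aggregate (timestamp, score) news items into fixed-width buckets.

    Returns a list of (bucket_start, mean_score, item_count) tuples.
    Empty buckets get a mean score of 0.0 (an assumed neutral value).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, score in items:
        if start <= ts < end:
            k = (ts - start) // step  # index of the bucket holding this item
            sums[k] += score
            counts[k] += 1
    span = end - start
    n_buckets = span // step + (1 if span % step else 0)
    series = []
    for k in range(n_buckets):
        mean_score = sums[k] / counts[k] if counts[k] else 0.0
        series.append((start + k * step, mean_score, counts[k]))
    return series


# Hypothetical stories arriving at irregular times.
day = datetime(2010, 1, 4)
stories = [
    (day.replace(hour=9, minute=1), 0.5),
    (day.replace(hour=9, minute=4), -0.1),
    (day.replace(hour=9, minute=7), 0.8),
]
series = to_time_series(
    stories,
    start=day.replace(hour=9),
    end=day.replace(hour=9, minute=15),
    step=timedelta(minutes=5),
)
```

The item count is kept alongside the mean score because news volume is itself used as a signal later in the talk.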
News data: Pre analysis of data

Collecting, cleaning and analysing news data … challenging

Major newswire providers collect news from a wide range of
sources, e.g. the Factiva database from Dow Jones, news from 400
sources

Tagging: machine readable meta data
Major newswire providers tag incoming news stories
• Reporters tag stories as they enter them into the system
• Machine learning techniques are also used to identify relevant
tags (RavenPack)
• Turns unstructured stories into a basic machine readable form
• Tags are held in XML (a standard for meta-data exchange)
• Reveals a story's topic areas and other useful meta data
News data: Pre analysis of data

Need to identify news which is relevant and current


“Information events” distinguish stories reporting on old news
from genuinely “new” news
Tetlock et al. event study shows “information leakage”
News data: Pre analysis of data

Need to identify news which is relevant and current


Reuters provides, for each article:
• Relevance scores … measure how much the article is
about a particular company
• Novelty/uniqueness scores … measure the repetition among
articles
RavenPack
• Distinguishes stories which are events
• Events carry the first mention of a particular theme
• Stories which are not events are excluded
• This minimises the number of duplicate stories
News data: Pre analysis of data

Classification of news
• Tagged stories provide hundreds of event types
• Need to distinguish which types of news are relevant to our
application
• Market may react differently to different types of news,
e.g. Moniz et al. find the market reacts more strongly to
earnings news than to strategic news
• Different news is available for different assets:
larger companies with more liquid stock tend to have
higher news coverage
News data: Pre analysis of data

Classification of news

Accounting related news
• Earnings
  • Announcements of earnings
  • Restatements of Operating Results etc.
• Trading updates
  • Announcements of Sales/Trading Statement etc.
Strategic news
• M&A Related
  • M&A Rumours and discussion
  • M&A Transaction announcements etc.
• Restructuring issues etc.
News data: Pre analysis of data


Relationship of different news items / independence of
news … an important consideration
Seasonality of news (Hafez, Lo, Moniz)
• Need to be able to distinguish unexpected newsflow from
variation due to seasonality
• Hourly, daily and weekly seasonality
• Intraday: larger volumes of newsflow just before the opening
of European, US and Asian stock markets (Hafez)
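One simple way to separate unexpected newsflow from intraday seasonality is to compare each observed story count with the typical count for that hour of the day. The sketch below assumes hourly story counts are available; the function names and the ratio form of the adjustment are illustrative assumptions, not the method used by the cited authors.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(history):
    """history: list of (hour_of_day, n_stories). Returns {hour: mean count}."""
    by_hour = defaultdict(list)
    for hour, n in history:
        by_hour[hour].append(n)
    return {hour: mean(values) for hour, values in by_hour.items()}


def abnormal_newsflow(observations, profile):
    """Ratio of each observed count to the typical count for that hour.

    A value near 1.0 is normal for the time of day; values well above 1.0
    suggest unexpected newsflow. Hours with a zero profile map to 0.0.
    """
    return [n / profile[h] if profile[h] else 0.0 for h, n in observations]


# Hypothetical history: busy before the open (08:00), quiet in the afternoon.
history = [(8, 100), (8, 120), (14, 20), (14, 40)]
profile = hourly_profile(history)
ratios = abnormal_newsflow([(8, 110), (14, 60)], profile)
```

Here 110 stories at 08:00 is unremarkable, while 60 stories at 14:00 is twice the usual level even though the raw count is lower.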
News data: Pre analysis of data
Illustration of Seasonality (Hafez, RavenPack)
Determining sentiment scores


Informational content of news: converting qualitative data
into a quantitative form … challenging
Distinguish the sentiment of stories (positive/negative)
• Scale of positivity/negativity … sentiment scores
Consider the story's context and language
How positively/negatively a human interprets the story … emotive content
• Expert classification
• Psychosocial dictionaries, e.g. General Inquirer
• Different groups of people are affected by events differently or have
different interpretations of the same events … conflicts may arise
Determining sentiment scores



Market based measures (Lo, Moniz et al. and Lavrenko)
• Market's lagged relative change in returns/volatility for a
particular asset (asset class)
Machine learning and natural language techniques can be
used to determine the sentiment of incoming stories
… sentiment indices over time
Index validation: to use an index we must be able to find a
relationship with relevant market variables
Das and Chen

Extract investor sentiment from stock message boards
for the Morgan Stanley High Tech (MSH) Index
• Web scraper program downloads tech sector message
board messages
• Five algorithms with different conceptual underpinnings
are used to classify each message
• A voting scheme is then applied
Das and Chen

Three supplementary databases
• Dictionary: the nature of each word (noun, adjective, adverb).
• Lexicon: a collection of hand picked words which form the
variables for statistical inference within the algorithms
• Grammar: a training corpus of base messages used to
determine in-sample statistical information, which is then applied
to the out-of-sample messages
Lexicon and grammar jointly determine the context of the
sentiment
Das and Chen

Five algorithms (= classifiers)
1. Naïve classifier
• Based on a word count of positive and negative connotation
words
2. Vector distance classifier
• Each of the D words in the lexicon is assigned a dimension in
vector space
• Each training message is pre-classified as positive, negative
or neutral
• Each new message is classified by comparison with the cluster
of pre-trained vectors and is assigned the same classification
as the vector with which it has the smallest angle
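The first two classifiers can be sketched in a few lines. This is a minimal illustration under stated assumptions: the toy lexicon, the whitespace tokenizer and the function names are hypothetical, and the smallest angle is found as the largest cosine similarity. It is not Das and Chen's implementation.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a real one would be hand picked for the domain.
POSITIVE = {"buy", "strong", "beat", "upgrade"}
NEGATIVE = {"sell", "weak", "miss", "downgrade"}
LEXICON = sorted(POSITIVE | NEGATIVE)


def tokenize(message):
    # Crude whitespace tokenizer; punctuation is not stripped.
    return message.lower().split()


def naive_classify(message):
    """Classify by the net count of positive versus negative lexicon words."""
    words = tokenize(message)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def to_vector(message):
    """Map a message to a D-dimensional count vector, one axis per lexicon word."""
    counts = Counter(tokenize(message))
    return [counts[w] for w in LEXICON]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_distance_classify(message, training):
    """training: list of (message, label) pairs that were classified by hand.

    The new message takes the label of the training vector with the
    smallest angle, i.e. the largest cosine. If the message contains no
    lexicon words, every cosine is 0 and the first training label is returned.
    """
    vec = to_vector(message)
    best = max(training, key=lambda pair: cosine(vec, to_vector(pair[0])))
    return best[1]


training = [("buy strong beat", "positive"), ("sell weak miss", "negative")]
label_nc = naive_classify("strong quarter and an upgrade to buy")
label_vd = vector_distance_classify("earnings beat and strong outlook", training)
```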
Das and Chen
3. Discriminant based classifier
• The naïve classifier (NC) weights all words within the lexicon equally. The
discriminant based classification method replaces this simple
word count with a weighted word count.
• The weights reflect how well a particular lexicon word
discriminates between the different message categories
4. Adjective-adverb phrase classifier
• This is based on the assumption that phrases which use
adjectives and adverbs emphasize sentiment and require
greater weight.
• Uses a word count, but only of those words within phrases
containing adjectives and adverbs.
Das and Chen
5. Bayesian classifier
• Given the class of each message in the training set, we can
determine the frequency with which a lexicon word appears
in a particular class.
• For a new message we can compute the probability that
it falls within a particular class given its component lexicon
words
• The message is classified as being from the category with the
highest probability.
Voting scheme … final classification based on achieving a
majority amongst the classifiers
• Reduces the number of messages classified
• Enhances classification accuracy
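The Bayesian classifier and the voting step can be sketched as below. The sketch assumes a standard naive Bayes model with add-one (Laplace) smoothing, which is a swapped-in textbook choice rather than the authors' exact specification; the lexicon, training messages and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_bayes(training, lexicon):
    """training: list of (message, label). Returns (log_prior, log_likelihood)."""
    class_counts = Counter(label for _, label in training)
    word_counts = defaultdict(Counter)
    for message, label in training:
        word_counts[label].update(
            w for w in message.lower().split() if w in lexicon
        )
    total = sum(class_counts.values())
    log_prior = {c: math.log(n / total) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        # Add-one smoothing so unseen lexicon words keep a non-zero probability.
        denom = sum(word_counts[c].values()) + len(lexicon)
        log_like[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in lexicon
        }
    return log_prior, log_like


def bayes_classify(message, model, lexicon):
    """Return the class with the highest posterior given the lexicon words."""
    log_prior, log_like = model
    words = [w for w in message.lower().split() if w in lexicon]
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in words) for c in log_prior
    }
    return max(scores, key=scores.get)


def majority_vote(labels):
    """Return the label backed by a strict majority of classifiers, else None.

    Returning None leaves the message unclassified, which is why voting
    reduces the number of messages classified.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None


lexicon = {"buy", "strong", "sell", "weak"}
training = [
    ("buy strong", "positive"),
    ("strong buy now", "positive"),
    ("sell weak", "negative"),
]
model = train_bayes(training, lexicon)
label = bayes_classify("looks weak so sell", model, lexicon)
final = majority_vote(["positive", "positive", "neutral", "positive", "negative"])
```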
Das and Chen

Ambiguity: stock message board messages are often highly
ambiguous
• Use General Inquirer … determine an optimism score
• Keep only the most highly optimistic stories in the
positive category
• Keep only the most highly pessimistic stories
in the negative category
• The number of false positives in classification declines
Disagreement: 0 = no disagreement; 1 = high disagreement
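A disagreement measure on this 0 to 1 scale can be built from the counts of messages classified as buy and sell. The functional form below, |1 − |(B − S)/(B + S)||, is an assumption offered for illustration; the slide itself does not state the formula, and the treatment of the no-message case is also an assumption.

```python
def disagreement(n_buy, n_sell):
    """Return 0.0 when all messages agree and 1.0 when they are evenly split.

    Assumed form: |1 - |(B - S) / (B + S)||. With no classified messages
    the measure is set to 0.0 by convention here.
    """
    total = n_buy + n_sell
    if total == 0:
        return 0.0
    return abs(1 - abs((n_buy - n_sell) / total))


unanimous = disagreement(10, 0)   # every message says buy
split = disagreement(5, 5)        # evenly divided opinion
mixed = disagreement(3, 1)
```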
Das and Chen

Relationship between sentiment indices and market
variables? Nature of the sentiment index?

Positive sentiment bias
• A histogram of normalised sentiment for a
stock (figure not reproduced) is positively skewed
• RavenPack find a positive bias in classifiers … more marked
in bull markets
Das and Chen

Relationship between sentiment indices and market
variables
• Sentiment and stock levels are related … determining the
precise nature of the price relationship is difficult
• Sentiment is inversely related to disagreement:
as disagreement rises, sentiment falls
• Sentiment is correlated with posting volume:
increasing discussion indicates that optimism about the stock is
rising
• Strong relationship between message volume and volatility
(also found by Antweiler and Frank (2004))
• Strong relationship between trading volume and volatility
Lo

Reuters NewsScope Event Indices (NEI) are constructed
• to have predictive power for returns and realised volatility
• in an integrated framework: returns and volatility are used in
calibrating the indices
News data
• Reuters news alerts: quick news flashes issued when
newsworthy events occur … timely and relevant
• Tags are machine readable
• Headlines are concise, with a small vocabulary … good for machine
learning analysis
Lo

The following parameters are used
• List of keywords and phrases with real valued weights
• A rolling “sentiment window” of size r (say 5/10 minutes)
• A rolling calibration window of size R (say 90 days)
• A vector of keyword frequencies over […] (symbol lost in extraction)
• The raw score (formula lost in extraction) will tend to be high
when news volume is high
… hence a normalised score
Lo

Normalised score
• At all times t in the R days of the calibration window, record
the raw score and the news volume
• The normalised score is determined by comparing the current raw
score against past raw scores recorded when news volume equalled the
current news volume
• Example: St = 0.92 means that, for 92% of the times when news volume
was at its current level, the raw score was lower than it is now.
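The normalisation step can be sketched as a conditional percentile. Two assumptions are made here and should be read as such: the raw score is taken to be a weighted sum of keyword frequencies (the slide's own formula did not survive), and "equal" news volume is matched within an optional tolerance. All names are hypothetical.

```python
def raw_score(weights, frequencies):
    """Assumed form of the raw score: weighted sum of keyword frequencies."""
    return sum(w * frequencies.get(k, 0) for k, w in weights.items())


def normalised_score(history, current_raw, current_volume, tolerance=0):
    """Fraction of comparable past observations with a lower raw score.

    history: list of (raw_score, news_volume) recorded over the calibration
    window. Only observations whose volume is within `tolerance` of the
    current volume are compared. Returns None if there are no such peers.
    """
    peers = [s for s, v in history if abs(v - current_volume) <= tolerance]
    if not peers:
        return None
    return sum(s < current_raw for s in peers) / len(peers)


# Hypothetical keyword weights and calibration history.
weights = {"profit warning": -2.0, "upgrade": 1.5}
score = raw_score(weights, {"upgrade": 2})
history = [(1.0, 5), (2.0, 5), (3.0, 5), (4.0, 5), (10.0, 9)]
s_t = normalised_score(history, current_raw=3.5, current_volume=5)
```

Conditioning on volume removes the mechanical tendency of the raw score to rise whenever more stories arrive.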
Lo

Model calibration
Determine keywords
• Create a list of keywords by hand
• Use a tool to extract news from periods when scores are high …
determine whether keywords are legitimate or need
adjusting
Optimal weights for intraday return sentiment index
• Regress word frequencies against intraday returns
Optimal weights for intraday volatility sentiment index
• Regress word frequencies against (deseasonalised)
intraday realised volatility
Lo

Model calibration
• Determining optimal weights … a more general
classification problem
• Other techniques … machine learning … perceptron algorithm,
support vector machines…
Lo

Index validation: to establish the empirical significance of
the indices … event study analysis
• An event is defined as the (return/volatility sentiment) index
exceeding a threshold value (0.995)
• Remove events that follow within less than one hour of another
event … consider only “new” events
• Test the null hypothesis: the distribution of returns / deseasonalised
realised volatility is the same before and after an event.
  • Visual inspection
  • t-test for equality of means
  • Levene's test for change in standard deviation
  • Chi-squared goodness of fit
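The event definition and the test of means can be sketched as follows. Several choices here are assumptions: time is measured in minutes, an event is dropped if it follows any earlier threshold crossing by less than an hour (whether or not that crossing was itself kept), and Welch's unequal-variance t statistic stands in for the t-test named on the slide. The data are invented.

```python
import math
from statistics import mean, variance


def find_events(index, threshold=0.995, min_gap=60):
    """index: list of (minute, value) sorted by time. Returns event minutes.

    A crossing counts as a new event only if at least `min_gap` minutes
    have passed since the previous crossing.
    """
    events = []
    last_crossing = None
    for minute, value in index:
        if value > threshold:
            if last_crossing is None or minute - last_crossing >= min_gap:
                events.append(minute)
            last_crossing = minute
    return events


def welch_t(before, after):
    """Welch's t statistic for a change in mean from `before` to `after`."""
    se = math.sqrt(
        variance(before) / len(before) + variance(after) / len(after)
    )
    return (mean(after) - mean(before)) / se


# Hypothetical normalised index values and returns around an event.
index = [(0, 0.999), (30, 0.998), (70, 0.997), (200, 0.5), (300, 0.999)]
events = find_events(index)
t_stat = welch_t([0.0, 0.0, 0.1, -0.1], [1.0, 1.0, 1.1, 0.9])
```

A large absolute t statistic would count as evidence against the null hypothesis that returns have the same mean before and after events.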
Lo

Index validation – to establish empirical significance of
indices… event study analysis
RavenPack Sentiment Scores
Reuters NewsScope Sentiment Engine
Model & Applications … (abnormal) Returns
Figure: Average Stock Price Reaction to Negative News Events
Source: Macquarie Quant Research, May 2009
Model & Applications … (abnormal) Returns
Figure: Average Stock Price Reaction to Positive News Events
Source: Macquarie Quant Research, May 2009
Model & Applications … (abnormal) Returns
Traders and quant managers … identify and exploit asset
mispricings before they correct … generate alpha
News data can be used for
• Stock picking and generating trading signals
• Factor models
• Exploiting behavioural biases in investor decisions
Model & Applications … (abnormal) Returns
Stock picking and generating trading signals
Li (2006): simple ranking procedure
• … identify stocks with positive and negative sentiment
• 10-K SEC filings for non-financial firms, 1994–2005
• Risk sentiment measure: count the number of times the words
risk, risks, risky, uncertain, uncertainty and uncertainties
appear in the management discussion and analysis section
• Strategy: long in low risk sentiment stocks,
short in high risk sentiment stocks
… reasonable level of returns
Leinweber (2010): event studies based on Reuters
NewsScope Sentiment Engine
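The risk sentiment count and ranking attributed to Li (2006) above can be sketched as below. It uses the raw word count exactly as described on the slide; any further transformation in the original paper is not reproduced. The tickers, texts and function names are invented for illustration.

```python
import re

RISK_WORDS = {
    "risk", "risks", "risky", "uncertain", "uncertainty", "uncertainties",
}


def risk_sentiment(mdna_text):
    """Count occurrences of the risk words in an MD&A section."""
    tokens = re.findall(r"[a-z]+", mdna_text.lower())
    return sum(t in RISK_WORDS for t in tokens)


def long_short(scores, n):
    """scores: {ticker: risk count}. Long the n lowest, short the n highest."""
    ranked = sorted(scores, key=scores.get)
    return ranked[:n], ranked[-n:]


# Hypothetical MD&A excerpts.
filings = {
    "AAA": "Demand was stable and margins improved.",
    "BBB": "Risks remain; the outlook is uncertain. Uncertainty about risk.",
    "CCC": "Currency risk is hedged.",
}
scores = {ticker: risk_sentiment(text) for ticker, text in filings.items()}
longs, shorts = long_short(scores, 1)
```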
Model & Applications … (abnormal) Returns
Factor models
• CAPM (Sharpe 1964; Lintner 1965), APT (Ross 1976)
… additional sources of information to the market
• “Profits may be viewed as the economic rents which accrue
to [the] competitive advantage of … superior information,
superior technology, financial innovation” (Lo)
• Tetlock, Saar-Tsechansky and Macskassy (2008):
investors' perception … determined from … their
“information sets”
Model & Applications … (abnormal) Returns
Factor models
“Information sets”
1. analysts' forecasts,
2. quantifiable publicly disclosed accounting variables,
3. linguistic descriptions of the firm's current and future profit
generating activities.
If 1. and 2. are incomplete or biased, 3. may give
relevant information
Macquarie report (Cahan et al.)
• News sentiment data in multifactor models.
• Results are positive … such an approach does add value.
• In particular they note the value of this source of
information during the credit crisis, when determining
fundamentals (on which traditional quant factors are based)
was problematic.
Model & Applications … (abnormal) Returns
Behavioural biases
• Behavioural economists challenge the assumption that
markets act rationally … EMH → AMH (Lo)
• Propose that individuals display certain biased behaviour
• Due to these biases they systematically deviate from optimal
(rational) trading behaviour
• Use behavioural biases, rather than risk based explanations,
to explain (abnormal) returns.
Model & Applications … (abnormal) Returns
Behavioural biases
Odean and Barber (2007) find evidence that individual investors
have a tendency to buy attention grabbing stocks.
• Professional investors are better equipped to assess a wider
range of stocks, so they are less prone to buying attention
grabbing stocks
Da, Engelberg and Gao also consider how the amount of
attention a stock receives affects the cross-section of returns.
• Use the frequency of Google searches for a particular
company as a measure of attention.
• Find some evidence that changes in investor attention
can predict the cross-section of returns.
Model & Applications … (abnormal) Returns
Behavioural biases
Chan (2003) finds that stocks with major public news exhibit
momentum over the following month.
• In contrast, stocks with large price movements but an
absence of news tend to show return reversals in the
following month.
• This would support a trading strategy based on
momentum reinforced with news signals.
Moniz et al. (2009) find that a strategy based on earnings
momentum reinforced by newsflow is effective.
Applications: Risk management
Traditionally historic asset price data has been used to
estimate risk measures.
• Ex post, retrospective measures
• Fail to account for developments in the market environment,
investor sentiment and knowledge
Significant changes in the market environment
• Traditional measures can fail to capture the true level of risk
(Mitra, Mitra and diBartolomeo 2009; diBartolomeo and
Warrick 2005)
Incorporating measures or observations of the market
environment in risk estimation is important
Applications: Risk management
The risk structure of assets may change over time
• Patton and Verardo find that news impacts the beta of stocks; in
particular, most of the beta increase comes from rising
covariance, suggesting there is contagion in the information
content of news releases.
Applications: Risk management
Relationship between information release and volatility
widely reported
• Ederington and Lee (1993): macro economic
announcements and foreign exchange and interest rate
futures
• Stock message board activity is a good predictor of
volatility (Antweiler and Frank 2004; Wysocki 1999)
• GARCH model with news inputs
(Kalev et al. 2004; Robertson, Geva and Wolff 2007)
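One common way to feed news into a GARCH model is to add a news term to the conditional variance equation. The sketch below assumes a GARCH(1,1) recursion with the contemporaneous news count as an extra regressor, h_t = omega + alpha * r_{t-1}^2 + beta * h_{t-1} + gamma * N_t. This form, the zero-mean returns and the parameter values are illustrative assumptions and are not taken from the cited papers; in practice the parameters would be estimated, for example by maximum likelihood.

```python
def garch_news_variance(returns, news_counts, omega, alpha, beta, gamma, h0):
    """Conditional variance path for a GARCH(1,1) with a news-count term.

    h[t] = omega + alpha * returns[t-1]**2 + beta * h[t-1]
           + gamma * news_counts[t]
    Returns are assumed to have zero mean; h0 is the starting variance.
    """
    h = [h0]
    for t in range(1, len(returns)):
        h.append(
            omega
            + alpha * returns[t - 1] ** 2
            + beta * h[t - 1]
            + gamma * news_counts[t]
        )
    return h


# Hypothetical inputs: flat returns, a burst of five stories in period 2.
path = garch_news_variance(
    returns=[0.0, 0.0, 0.0],
    news_counts=[0, 0, 5],
    omega=0.1, alpha=0.1, beta=0.5, gamma=0.02, h0=0.2,
)
```

With these inputs the variance stays at its starting level until the news burst arrives, then steps up even though no return shock has yet been observed.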
Desirable Industry Applications
1. Enhanced Strategies (Asset Management)
Low Frequency → Portfolio (rebalancing) early trigger
based on “draw down” rules/risk.

High Frequency
• Trading “wish to” trade signals.
• Trading “have to/need to trade sell and buy” signals.
• News analytics market views taken into consideration for
the “optimal trade execution” algorithms.
{VWAP, Almgren & Chriss, Lo & Bertsimas}
Desirable Industry Applications
2. Risk Control and Compliance.
• Improved short term risk estimate.
• Enhanced downside risk estimate
(improving scenario generators by using sentiment scores).
???
• Wolf Detection:
signal to stop trading in a specific stock/asset.
Desirable Industry Applications
3. Post trade analysis (reporting).
4. Refine fundamental research (results/figures).
5. Use by regulators/public bodies (government
treasuries) to take a prior view of the “impact”
of (economic and other) announcements.
Summary & discussions


Applications of (semi-)automated news analytics
in finance are growing in importance.
Pay back can be substantial to:

Investment Managers

Traders

Internal Risk Auditors

Regulators
Summary & discussions

Knowledge and Skills from three different
disciplines:

Information Systems.

Artificial Intelligence.

Financial Engineering & quantitative modelling
(including behavioural finance).
are required in various degrees to progress the
field/make substantial impact.
THANK YOU FOR YOUR ATTENTION …ANY QUESTIONS…?