Applications of
news analytics in finance:
a review
Gautam Mitra
Co-author Leela Mitra
Summary and scope
In this talk we set out a structured (reading) guide to
the published research outputs: Journal papers, white
papers, case studies which are emerging in the
domain of “news analytics” applied to finance.
We aim to provide insight into the subtle interplay of
information technology (including AI), the
quantitative models and behavioural biases in the
context of trading and investment decisions.
Applications such as low frequency and high
frequency trading are presented; some
desirable/potential applications are discussed.
Outline
Introduction
News data
  Data sources
  Pre analysis of data
Determining sentiment scores
  General overview
  Das and Chen
  Lo
Models and applications in summary form
  (abnormal) Returns
  Volatility and risk control
Desirable industry applications
Summary and discussions
Introduction
News.
Market Environment.
Sentiment.
Investment Decisions.
Risk Control.
Introduction
Traders [ High Frequency ]
Fund Managers [ Low Frequency ]
Desktop
• Market Data
• NewsWire
Data WareHouse
DataMart
Introduction
R&D challenge: identify the killer application
Smart investors rapidly analyse/digest information.
News stories/announcements.
Stock price moves (market reactions).
Act promptly to take trading/investment decisions.
Can a machine act intelligently (AI) to compete with or
outsmart humans?
Introduction
At least can we have IT/AI tools which help humans make
good investment decisions?
Intelligence Amplification
<Gearing… engineering concept>
Thus three disciplines converge;
Information Systems
AI, in particular, Natural Language Processing
Financial Engineering/quantitative Modelling
( including behavioural finance )
Introduction
[Diagram: information flow for news analytics]
Inputs: mainstream news; pre-news; Web 2.0 and social media; (numeric) financial market data
Pre-analysis: classifiers, sentiment scores
Analysis: consolidated datamart
Output: updated beliefs, ex-ante view of market environment
Quant models:
1. Return predictions
2. Fund management / trading decisions
3. Volatility estimates and risk control
Stages: data analysis, datamart, quant models
News data: Data sources
Sources of news/informational flows (Leinweber)
News: Mainstream media, reputable sources.
Newswires to traders' desks.
Newspapers, radio and TV.
Pre-News: Source data
SEC reports and filings. Government agency reports.
Scheduled announcements, macro economic news,
industry stats, company earnings reports…
Social media: Blogs, websites and message boards
Quality can vary significantly
Barriers to entry low
Human behaviour and agendas
News data: Data sources
Web based news
Individual investors pay more attention than institutional
investors (Das and Rieger)
“Collective intelligence”: the collective opinion of a large group
of people (with no ulterior motives) may be useful.
SEC does monitor message boards
Far from perfect vetting of information.
Financial news can be split between
Scheduled news (Synchronous)
Unscheduled news (Asynchronous, event driven)
News data: Data sources
Scheduled news (Synchronous)
Arrives at pre scheduled times
Much of pre news
Structured format
Often basic numerical format
Typically macro economic announcements and earnings
announcements
News data: Data sources
Macro economic announcements
Widely used in automated trading
Impact the largest and most liquid markets (foreign exchange,
government debt, futures markets)
Naturally affects trading strategies.
Speed and accuracy are key... technology requirements
substantial
Providers in this space
Trade the News, Need to Know News, Market News
International, Thomson Reuters, Dow Jones, Bloomberg…
Earnings announcements
Directly influence stock prices
Widely anticipated and used in trading strategies
News data: Data sources
Unscheduled news (Asynchronous, event driven)
Arrives unexpectedly over time
Mainstream news and social media
Unstructured, qualitative, textual form
Non-numeric
Difficult to process quickly and quantitatively
May contain information about effect and cause of an
event
To be used in quant models, it must be converted to an
input time series
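One simple way to turn scored stories into an input time series is to bucket them by time. The sketch below assumes each story already carries a timestamp and a sentiment score; the function name, the bucket width and the rule that an empty bucket counts as neutral (0.0) are illustrative assumptions, not any provider's method.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def bucket_sentiment(stories, start, width_minutes, n_buckets):
    """Average story scores within fixed-width time buckets.

    stories: iterable of (timestamp, score) pairs (hypothetical format).
    Returns a list of length n_buckets; empty buckets are set to 0.0
    (assumption: no news is treated as neutral).
    """
    width = timedelta(minutes=width_minutes)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, score in stories:
        if ts < start:
            continue  # ignore stories before the window starts
        idx = int((ts - start) / width)
        if idx < n_buckets:
            sums[idx] += score
            counts[idx] += 1
    return [sums[i] / counts[i] if counts[i] else 0.0 for i in range(n_buckets)]
```

The resulting list is aligned to a regular time grid, so it can be joined with price or volatility series sampled at the same frequency.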
News data: Pre analysis of data
Collecting, cleaning and analysing news data …challenging
Major newswire providers collect news from a wide range of
sources e.g. Factiva database from Dow Jones, news from 400
sources
Tagging – Machine readable meta data
Major newswire providers tag incoming news stories
Reporters tag stories as they enter them into the system
Machine learning techniques also used to identify relevant
tags (RavenPack)
Unstructured stories into basic machine readable form
Tags held in XML (standard for metadata exchange)
Reveals story’s topic areas and other useful meta data
News data: Pre analysis of data
Need to identify news which is relevant and current
“Information events” distinguish stories reporting on old news
from genuinely “new” news
Tetlock et al. event study shows “information leakage”
News data: Pre analysis of data
Need to identify news which is relevant and current
Reuters give for each article
Relevance scores … measure how much the article is
about a particular company
Novelty/uniqueness determines the repetition among
articles
RavenPack
Distinguish stories which are events
Carry first mention of a particular theme
Stories which are not events are excluded
To minimise number of duplicate stories
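The "first mention" idea can be sketched as a simple filter. This is only an illustration: the dictionary keys ('time', 'company', 'theme') are hypothetical, and the vendors' actual novelty logic is not described in the slides.

```python
def first_mentions(stories):
    """Keep only the earliest story for each (company, theme) pair.

    stories: list of dicts with hypothetical keys 'time', 'company', 'theme'.
    Later stories on the same company and theme are treated as repeats.
    """
    seen = set()
    events = []
    for story in sorted(stories, key=lambda s: s["time"]):
        key = (story["company"], story["theme"])
        if key not in seen:
            seen.add(key)
            events.append(story)
    return events
```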
News data: Pre analysis of data
Classification of news
Tagged stories provide hundreds of event types
Need to distinguish what types of news are relevant to our
application
Market may react differently to different types of news
e.g. Moniz et al. find the market reacts more strongly to
earnings news than to strategic news
Different news is available for different assets
Larger companies with more liquid stock, tend to have
higher news coverage
News data: Pre analysis of data
Classification of news
Accounting related news
Earnings
Trading updates
Announcements of earnings
Restatements of Operating Results etc..
Announcements of Sales/Trading Statement etc…
Strategic news
M&A Related
M&A Rumours and discussion
M&A Transaction announcements etc…
Restructuring issues etc…
News data: Pre analysis of data
Relationship of different news items / Independence of
news… important consideration
Seasonality of news (Hafez, Lo, Moniz)
Need to be able to identify unexpected newsflow from
variation due to seasonality
Hourly, daily and weekly seasonality
Intraday - larger volumes of newsflow just before opening
of European, US and Asian stockmarkets (Hafez)
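A minimal sketch of separating unexpected newsflow from seasonal variation, assuming only hour-of-day seasonality and using a ratio to the historical hourly average. Both choices are illustrative assumptions; daily and weekly effects would need further profiles.

```python
from collections import defaultdict

def hourly_profile(observations):
    """Average story count for each hour of the day.

    observations: list of (hour_of_day, n_stories) pairs from past days.
    """
    totals = defaultdict(float)
    days = defaultdict(int)
    for hour, n_stories in observations:
        totals[hour] += n_stories
        days[hour] += 1
    return {hour: totals[hour] / days[hour] for hour in totals}

def abnormal_flow(hour, n_stories, profile):
    """Observed news volume relative to the typical volume for that hour."""
    return n_stories / profile[hour]
```

A ratio well above 1 flags newsflow that is high even after allowing for the usual pre-open surge.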
News data: Pre analysis of data
Illustration of Seasonality (Hafez, RavenPack)
Determining sentiment scores
Informational content of news: Converting qualitative data
into a quantitative form … challenging
Distinguish the sentiment of stories (positive/negative)
scale of positivity / negativity … sentiment scores
Consider the story’s context and language
How positively/negatively a human interprets the story … emotive content
Expert classification
Psychosocial dictionaries e.g. General Inquirer
Different groups of people are affected by events differently or have
different interpretations of the same events … conflicts may arise
Determining sentiment scores
Market based measures (Lo, Moniz et al. and Lavrenko)
Markets’ lagged relative change in returns/volatility for a
particular asset (asset class)
Machine learning and natural language techniques can be
used, to determine sentiment of incoming stories
… sentiment indices over time
Index validation - To use index we must be able to find
relationship with relevant market variables
Das and Chen
extract investor sentiment from stock message boards
for Morgan Stanley High Tech (MSH) Index
Web scraper program downloads tech sector message
board messages
Five algorithms with different conceptual underpinnings
are used to classify each message
Voting scheme is then applied
Das and Chen
Three supplementary databases
Dictionary – nature of each word: noun, adjective, adverb.
Lexicon - collection of hand picked words which form
variables for statistical inference within the algorithms
Grammar – training corpus of base messages used in
determining in-sample statistical information. Applied for use
on the out-of-sample messages
Lexicon and grammar jointly determine the context of the
sentiment
Das and Chen
Five algorithms: (=Classifiers)
1. Naïve classifier
Based on word count of positive and negative connotation
words
2. Vector distance classifier
Each of the D words in the lexicon is assigned a dimension in
vector space
Each training message is pre classified as positive, negative
or neutral
Each new message is classified by comparison to the cluster
of pre trained vectors and is assigned the same classification
as that vector with which it has the smallest angle
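The first two classifiers can be sketched as follows. This is a simplified illustration under stated assumptions, not Das and Chen's implementation: messages are pre-tokenised lists of words, the lexicon is split by hand into positive and negative words, and "smallest angle" is implemented as largest cosine similarity.

```python
import math

def naive_classify(tokens, positive, negative):
    """Naive classifier: net count of positive versus negative lexicon words."""
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def to_vector(tokens, lexicon):
    """One dimension per lexicon word, holding that word's count in the message."""
    return [tokens.count(word) for word in lexicon]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def vector_distance_classify(tokens, lexicon, training):
    """Assign the label of the training message with the smallest angle.

    training: list of (tokens, label) pairs, pre-classified by hand.
    """
    x = to_vector(tokens, lexicon)
    best = max(training, key=lambda ex: cosine(x, to_vector(ex[0], lexicon)))
    return best[1]
```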
Das and Chen
3. Discriminant based classifier
The naïve classifier (NC) weights all words within the lexicon equally. The
discriminant based classification method replaces this simple
word count with a weighted word count.
The weights determine how well a particular lexicon word
discriminates between the different message categories
4. Adjective-adverb phrase classifier
This is based on the assumption that phrases which use
adjectives and adverbs emphasize sentiment and require
greater weight.
Uses a word count but uses only those words within phrases
containing adjectives and adverbs.
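A sketch of how discriminant weights might be built, assuming a Fisher-style ratio of between-class to within-class variance of each word's per-message count. The exact statistic is not given in the slides, so the formula, the small constant eps and the function names are assumptions for illustration.

```python
from statistics import mean, pvariance

def fisher_weight(counts_by_class, eps=1e-9):
    """Between-class over within-class variance of one word's counts.

    counts_by_class: {label: [count of the word in each training message]}.
    A word that appears equally often in every class gets weight 0.
    """
    class_means = [mean(counts) for counts in counts_by_class.values()]
    between = pvariance(class_means)
    within = mean([pvariance(counts) for counts in counts_by_class.values()])
    return between / (within + eps)  # eps avoids division by zero

def weighted_score(tokens, weights, polarity):
    """Weighted word count; polarity maps each lexicon word to +1 or -1."""
    return sum(weights[t] * polarity[t] for t in tokens if t in weights)
```

Words that separate the message categories well receive large weights, so they dominate the score; uninformative words drop out.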
Das and Chen
5. Bayesian classifier
Given the class of each message in the training set we can
determine the frequency with which a lexical word appears
in a particular class.
For a new message we are able to compute the probability
it falls within a particular class given its component lexicon
words
The message is classified as being from the category with the
highest probability.
Voting scheme … final classification based on achieving
majority amongst classifiers
Reduces number of messages classified
Enhances classification accuracy
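The Bayesian classifier and the voting step can be sketched as below. Assumptions made for illustration: add-one smoothing of word frequencies, and a strict majority rule under which a message with no majority is left unclassified (returned as None).

```python
import math
from collections import Counter, defaultdict

def train_bayes(training, lexicon):
    """Estimate class priors and smoothed word frequencies per class.

    training: list of (tokens, label) pairs; lexicon: set of lexicon words.
    """
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for tokens, label in training:
        class_counts[label] += 1
        word_counts[label].update(t for t in tokens if t in lexicon)
    total = sum(class_counts.values())
    model = {}
    for label in class_counts:
        denom = sum(word_counts[label].values()) + len(lexicon)  # add-one smoothing
        log_prior = math.log(class_counts[label] / total)
        log_lik = {w: math.log((word_counts[label][w] + 1) / denom) for w in lexicon}
        model[label] = (log_prior, log_lik)
    return model

def bayes_classify(tokens, model):
    """Return the class with the highest posterior probability."""
    def log_posterior(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    return max(model, key=log_posterior)

def majority_vote(labels):
    """Label chosen by a strict majority of classifiers, else None."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None
```

Returning None when there is no majority is what reduces the number of classified messages while raising accuracy on those that remain.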
Das and Chen
Ambiguity - stock message boards messages often highly
ambiguous
Use General Inquirer … determine optimism score
Filter in and consider only most highly optimistic stories in
positive category
Filter in and consider only the most highly pessimistic stories
in the negative category
Number of false positives in classification declines
Disagreement – 0 no disagreement; 1 high disagreement
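One measure with the stated end points (0 when all messages agree, 1 when buy and sell messages are evenly split) is sketched below. The functional form is an assumption chosen to match those end points; the slide itself gives only the 0 to 1 scale.

```python
def disagreement(n_buy, n_sell):
    """Disagreement in [0, 1] from counts of buy and sell messages.

    Assumed form: |1 - |(B - S) / (B + S)||.
    Returns 0.0 when there are no classified messages (an assumption).
    """
    total = n_buy + n_sell
    if total == 0:
        return 0.0
    return abs(1 - abs((n_buy - n_sell) / total))
```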
Das and Chen
Relationship between sentiment indices and market
variables ? Nature of sentiment index?
Positive sentiment bias
Fig shows histogram of normalised sentiment for a
stock…positively skewed
RavenPack find positive bias in classifiers … more marked
in bull markets
Das and Chen
Relationship between sentiment indices and market
variables
Sentiment and stock levels – are related …determining
precise nature of price relationship is difficult
Sentiment inversely related to disagreement
Disagreement rises, sentiment falls
Sentiment correlated to posting volume
Discussion increases, indicates optimism about stock is
rising
Strong relationship between message volume and volatility
(Antweiler and Frank (2004) also)
Strong relationship between trading volume and volatility
Lo
Reuters NewsScope Event Indices (NEI) are constructed
to have predictive power for returns and realised volatility
integrated framework, returns and volatility used in
calibrating indices
News data
Reuters news alerts: quick news flashes issued when
newsworthy events occur … timely and relevant
Tags machine readable
Headlines concise, small vocabulary…good for machine
learning analysis
Lo
The following parameters are used
List of keywords and phrases with real valued weights
A rolling “sentiment window” of size r (say 5/10 minutes)
A rolling calibration window of size R (say 90 days)
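$f_t = (f_{1,t}, \dots, f_{K,t})$ is the vector of keyword frequencies over the sentiment window $[t - r, t]$
Raw score is defined as the weighted keyword count $s_t = \sum_{k=1}^{K} w_k f_{k,t}$
this will tend to be high when news volume is high
… hence the normalised score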
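A minimal sketch of a raw score, assuming it is the weighted sum of keyword frequencies over the headlines in the current sentiment window. The keywords and weights in the usage example are invented for illustration.

```python
def raw_score(headlines, weights):
    """Weighted keyword count over the headlines in the sentiment window.

    headlines: list of headline strings that arrived within the window.
    weights: {keyword or phrase (lower case): real valued weight}.
    """
    score = 0.0
    for headline in headlines:
        text = headline.lower()
        for keyword, weight in weights.items():
            score += weight * text.count(keyword)
    return score
```

Because every matching headline adds to the sum, the score grows in magnitude with news volume, which is why a normalisation step is needed.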
Lo
Normalised score
At all times t in R days of calibration window record
raw score
news volume;
Normalised score determined by comparing current raw
score against raw scores where news volume equals current
news volume
$S_t = 0.92$: in 92% of the past cases where news volume was at its current
level, the raw score was lower than it is now.
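The normalisation can be sketched as a conditional percentile rank. Assumptions for illustration: the calibration history is a list of (raw score, news volume) pairs, volume is matched exactly, and None is returned when no past observation has the current volume.

```python
def normalised_score(current_raw, current_volume, history):
    """Fraction of past raw scores, at the same news volume, below the current one.

    history: list of (raw_score, news_volume) pairs from the calibration window.
    """
    peers = [raw for raw, volume in history if volume == current_volume]
    if not peers:
        return None  # no comparable observations in the calibration window
    return sum(raw < current_raw for raw in peers) / len(peers)
```

In practice volume would likely be matched within bands rather than exactly, since exact matches can be sparse.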
Lo
Model calibration
Determine keywords
Create list of keywords by hand
Tool to extract news from periods when scores are high…
determine whether keywords are legitimate or need
adjusting
Optimal weights for intraday return sentiment index
regress word frequencies against intraday returns
Optimal weights for intraday volatility sentiment index
regress word frequencies against (deseasonalised)
intraday realised volatility
Lo
Model calibration
Determining optimal weights is a more general
classification problem
Other techniques…machine learning…perceptron algorithm,
support vector machines…
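As an illustration of the perceptron alternative, the sketch below learns keyword weights from labelled windows. The setup is an assumption: each sample is a keyword frequency vector with a label of +1 (say, a positive subsequent return) or -1, and the classic mistake-driven update rule is used.

```python
def train_perceptron(samples, n_features, epochs=20, learning_rate=1.0):
    """Learn keyword weights with the classic perceptron update.

    samples: list of (frequency_vector, label) with label in {+1, -1}.
    Returns (weights, bias).
    """
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified: move towards the sample
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias
```

The perceptron only converges when the two classes are linearly separable; noisy market data rarely is, which is one reason to also consider support vector machines.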
Lo
Index validation – to establish empirical significance of
indices… event study analysis
Event is defined when (return/volatility sentiment) index
exceeds a threshold value (0.995)
Remove events that follow in less than one hour of another
event … consider only “new” events
Tests null hypothesis: Distribution of returns / deseasonalised
realised volatility is the same before / after an event.
Visual inspection
t-test for equality of means
Levene’s test for change in standard deviation
Chi-squared goodness of fit
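The event definition and the test of means can be sketched as follows. Assumptions for illustration: the index is sampled at minute timestamps, an exceedance is dropped if it occurs less than 60 minutes after the previous exceedance (kept or not), and Welch's unequal-variance t statistic stands in for the t-test named above.

```python
import math
from statistics import mean, variance

def find_events(index, threshold=0.995, min_gap=60):
    """Times at which the index exceeds the threshold, keeping only "new" events.

    index: list of (minute, value) pairs sorted by time.
    """
    events = []
    last_exceedance = None
    for minute, value in index:
        if value > threshold:
            if last_exceedance is None or minute - last_exceedance >= min_gap:
                events.append(minute)
            last_exceedance = minute
    return events

def welch_t(before, after):
    """Welch's t statistic for a difference in means (after minus before)."""
    standard_error = math.sqrt(
        variance(before) / len(before) + variance(after) / len(after)
    )
    return (mean(after) - mean(before)) / standard_error
```

Here `before` and `after` would hold returns (or deseasonalised realised volatility) pooled across all events in windows either side of the event time.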
RavenPack Sentiment Scores
Reuters NewsScope Sentiment
Engine
Model & Applications… (abnormal )
Returns
Average Stock Price Reaction to Negative News Events
Source: Macquarie Quant Research –May 2009
Model & Applications… (abnormal )
Returns
Average Stock Price Reaction to Positive News Events
Source: Macquarie Quant Research –May 2009
Model & Applications… (abnormal )
Returns
Traders and quant managers … identify and exploit asset
mispricings before they correct … generate alpha
News data can be used
Stock picking and generating trading signal
Factor models
Exploit behavioural biases in investor decisions
Model & Applications… (abnormal )
Returns
Stock picking and generating trading signal
Li (2006) simple ranking procedure
… identify stocks with positive and negative sentiment
10-K SEC filings for non-financial firms, 1994-2005
Risk sentiment measure – count number of times words
risk, risks, risky, uncertain, uncertainty and uncertainties
appear in management discussion and analysis section
Strategy long in low risk sentiment stocks
short in high risk sentiment stocks
… reasonable level returns
Leinweber (2010) – event studies based on Reuters
NewsScope Sentiment Engine
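The risk sentiment ranking attributed to Li (2006) above can be sketched as below. This is a simplified illustration that ranks on raw word counts; any transformation or year-on-year change applied in the original study is not reproduced, and the function names are hypothetical.

```python
import re

RISK_WORDS = {"risk", "risks", "risky", "uncertain", "uncertainty", "uncertainties"}

def risk_sentiment(mdna_text):
    """Count risk-related words in a management discussion and analysis section."""
    tokens = re.findall(r"[a-z]+", mdna_text.lower())
    return sum(token in RISK_WORDS for token in tokens)

def long_short(scores, n):
    """Long the n stocks with the lowest risk sentiment, short the n highest.

    scores: {ticker: risk sentiment count}. Returns (long_list, short_list).
    """
    ranked = sorted(scores, key=scores.get)
    return ranked[:n], ranked[-n:]
```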
Model & Applications… (abnormal )
Returns
Factor models
CAPM (Sharpe 1964; Lintner 1965), APT (Ross 1976)
…additional sources of information to market
“Profits may be viewed as the economic rents which accrue
to [the] competitive advantage of … superior information,
superior technology, financial innovation” (Lo )
Tetlock, Saar-Tsechansky and Macskassy (2008)
Investors’ perception … determined from… their
“information sets”
Model & Applications… (abnormal )
Returns
Factor models
“Information sets”
1. analysts forecasts,
2. quantifiable publicly disclosed accounting variables
3. linguistic descriptions of firm’s current and future profit
generating activities
If 1. and 2. are incomplete or biased, 3. may give
relevant information
Macquarie report (Cahan et al.):
News sentiment data in multifactor models.
Results are positive … such an approach does add value.
In particular they note the value of this source of
information during the credit crisis, when determining
fundamentals (which traditional quant factors are based
on) was problematic.
Model & Applications… (abnormal )
Returns
Behavioural biases
Behavioural economists challenge the assumption that
markets act rationally … Efficient Markets Hypothesis (EMH) versus Adaptive Markets Hypothesis (AMH) (Lo)
Propose individuals display certain biased behaviour
Due to biases they systematically deviate from optimal
(rational) trading behaviour
Use behavioural biases to explain (abnormal) returns, rather
than risk based explanations.
Model & Applications… (abnormal )
Returns
Behavioural biases
Odean and Barber (2007) find evidence individual investors
have a tendency to buy attention grabbing stocks.
Professional investors are better equipped to assess a wider
range of stocks, so they are less prone to buying attention
grabbing stocks
Da, Engelberg and Gao also consider how the amount of
attention a stock receives affects the cross-section of returns.
Use the frequency of Google searches for a particular
company as a measure of attention.
Find some evidence that changes in investor attention
can predict the cross-section of returns.
Model & Applications… (abnormal )
Returns
Behavioural biases
Chan (2003) finds stocks with major public news exhibit
momentum over the following month.
In contrast stocks with large price movements, but an
absence of news, tend to show return reversals in the
following month.
This would support a trading strategy based on
momentum reinforced with news signals.
Moniz et al. (2009) find a strategy based on earnings
momentum reinforced by newsflow is effective.
Applications: Risk management
Traditionally, historic asset price data has been used to
estimate risk measures.
When the market environment changes significantly, these
ex post (retrospective) measures
fail to account for developments in the market environment,
investor sentiment and knowledge
Traditional measures can fail to capture the true level of risk
(Mitra, Mitra and diBartolomeo 2009; diBartolomeo and
Warrick 2005)
Incorporating measures or observations of the market
environment in risk estimation is important
Applications: Risk management
The risk structure of assets may change over time
Patton and Verardo find that news impacts the beta of stocks; in
particular, most of the beta increase comes from rising
covariance, suggesting there is contagion in the information
content of news releases.
Applications: Risk management
Relationship between information release and volatility
widely reported
Ederington and Lee (1993) macro economic
announcements and foreign exchange and interest rate
futures
Stock message board activity is a good predictor of
volatility Antweiler and Frank (2004); Wysocki (1999)
GARCH model with news inputs
Kalev et al. (2004); Robertson, Geva and Wolff (2007)
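A sketch of a GARCH(1,1) variance recursion with a news term, assuming the number of news items enters the conditional variance equation additively and contemporaneously. The specification, parameter names and the values in any usage are illustrative assumptions, not estimates from the cited studies.

```python
def garch_news_variance(returns, news_counts, omega, alpha, beta, gamma,
                        initial_variance):
    """Conditional variance path with an exogenous news term.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
                + gamma * news_counts[t]
    returns and news_counts are equal-length lists on the same time grid.
    """
    sigma2 = [initial_variance]
    for t in range(1, len(returns)):
        sigma2.append(
            omega
            + alpha * returns[t - 1] ** 2
            + beta * sigma2[t - 1]
            + gamma * news_counts[t]
        )
    return sigma2
```

With gamma set to zero the recursion reduces to a standard GARCH(1,1), so the significance of the estimated gamma indicates whether newsflow adds information beyond past returns.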
Desirable Industry Applications
1. Enhanced strategies (asset management)
Low frequency: portfolio (rebalancing) early trigger
based on “draw down” rules/risk.
High frequency:
• Trading “wish to” trade signals.
• Trading “have to/need to trade sell and buy” signals.
• News analytics market views taken into consideration for
the “optimal trade execution” algorithms.
{ VWAP, Almgren & Chriss, Lo & Bertsimas }
Desirable Industry Applications
2. Risk Control and Compliance.
improved short term risk estimate.
Enhanced downside risk estimate;
(improving scenario generators by using sentiment scores).
???
Wolf Detection;
Signal to stop trading in a specific stock/asset.
Desirable Industry Applications
3. Post trade analysis (reporting).
4. Refine fundamental research ( results /figures)
5. Use by regulator/public body (government
treasuries) to take a prior view of the “impact”
of (economic and other) announcements
Summary & discussions
Applications of (semi-)automated news analytics
in finance are growing in importance.
Payback can be substantial to:
Investment Managers
Traders
Internal Risk Auditors
Regulators
Summary & discussions
Knowledge and Skills from three different
disciplines:
Information Systems.
Artificial Intelligence.
Financial Engineering & quantitative modelling
(including behavioural finance).
are required in various degrees to progress the
field/make substantial impact.
THANK YOU FOR YOUR ATTENTION …ANY QUESTIONS…?