Harhoff-Hoisl Geneva Presentation

European Patent Citations – How to count and how to interpret them?
Dietmar Harhoff
Ludwig-Maximilian University Munich (INNO-tec)
ZEW and CEPR and EPIP 
Karin Hoisl
Ludwig-Maximilian University Munich (INNO-tec)
and EPIP 
Colin Webb
OECD
1st EPIP Workshop – Milano
European Patent Citations – How to count and how to interpret them?
Overview

Motivation

Specificities of the EPO Search Process

Raw Data

How to ... Issues

Results

Still to Be Tackled

Where to Get the Data? And when?
Motivation




Patent citations are widely used by economists (and
others) in empirical analysis.
The NBER data have had a tremendous impact. They
represent a true pioneering effort with high social benefits.
But European patent citations differ considerably from US
citations. Many issues have been left unresolved.
Many pitfalls (as we painfully learned) – even suggesting
that the pioneering NBER data have problems which
should be corrected in the second round using some of the
insights described here.
Specificities of the EPO Search Process
Institutional Framework

The Search Process follows the Guidelines for
Examination in the European Patent Office.

Historically, the overall responsibility for search has been
with the Directorate General for Searching in The Hague.

Under BEST (Bringing Examination and Search
Together), the borderline between search and
examination disappears.

Main objective: discovering prior art relevant for
determining whether the invention meets the novelty
and inventive step requirements.

The search is conducted on the basis of the claims.
Specificities of the EPO Search Process
Termination and Reporting

The Search will be terminated if



the probability of discovering further relevant documents is
very low compared to the effort needed, or
documents are discovered which doubtlessly demonstrate a
violation of patentability requirements.
The Search Report should only include

the most important documents,

one of several documents of equal importance,

the earlier document of two equally important documents.
According to EPO philosophy a good search report
contains all relevant information within a minimum
number of citations (Michel/Bettels, 2001)
Specificities of the EPO Search Process
Classification of References
X: particularly relevant documents when taken alone (implies: the claimed invention cannot be considered novel or cannot be considered to involve an inventive step)
Y: particularly relevant if combined with another document of the same category
A: documents defining the general state of the art
O: documents referring to non-written disclosure
P: intermediate documents (documents published between the date of filing and the priority date)
T: documents relating to theory or principle underlying the invention (documents which were published after the filing date and are not in conflict with the application, but were cited for a better understanding of the invention)
E: potentially conflicting patent documents, published on or after the filing date of the underlying invention
D: document already cited in the application
L: document cited for other reasons (e.g., a document which may throw doubt on a priority claim)
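For empirical work, the category codes listed above can be held in a small lookup table. The following is a minimal sketch only: the function names and the assumption that a reference carries its codes as a comma-separated string (for example "D,A") are illustrative and not taken from the EPO or OECD data description.

```python
# Category codes as listed on the slide (descriptions abbreviated).
CATEGORIES = {
    "X": "particularly relevant when taken alone",
    "Y": "particularly relevant if combined with another document",
    "A": "general state of the art",
    "O": "non-written disclosure",
    "P": "intermediate document",
    "T": "theory or principle underlying the invention",
    "E": "potentially conflicting patent document",
    "D": "already cited in the application",
    "L": "cited for other reasons",
}


def split_categories(raw):
    """Split an (assumed) comma-separated code string into known codes."""
    codes = [c.strip().upper() for c in raw.split(",") if c.strip()]
    unknown = [c for c in codes if c not in CATEGORIES]
    if unknown:
        raise ValueError(f"unknown category code(s): {unknown}")
    return codes


def is_particularly_relevant(raw):
    """True if the reference carries an X or a Y code."""
    return any(c in ("X", "Y") for c in split_categories(raw))
```

With this sketch, `is_particularly_relevant("P,X")` is true while `is_particularly_relevant("D,A")` is false.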
Specificities of the EPO Search Process
Consequences for Citation Analysis (1/2)


Difference in interpretation between examiner and
applicant citations

the examiner has to ensure the novelty and inventive step of the
invention

the applicant cites work that is related but significantly different from his
invention

the EPOLINE/REFI data also include references made by the applicant
– all issues raised here apply to them, but we ignore them here unless
the searcher/examiner has adopted them (D references)
The date of filing of the EP application is used as a
reference date for the search

Referenced documents published between the priority date and the filing
date may lead to negative citation lags, but they are not taken to
threaten novelty.
Specificities of the EPO Search Process
Consequences for Citation Analysis (2/2)

Referencing no more than what is absolutely necessary
and obligation to favor early documents over later ones

other documents important for the economic analysis may not appear in the
list of references

Examiners should preferably reference documents in the
language of the application.

overestimation of the influence of the applicant’s home country
and possible distortion in citation counts (see below)

The search only identifies documents to which
searchers/examiners have access

trivial, but important: documents not accessible in databases will not appear
in the “paper trail”
Raw Data




OECD/EPO data (described in OECD Discussion
Paper Webb/Dernis/Harhoff/Hoisl)
EPOLINE references (12/2004) – updated and
checked with REFI (07/2005)
EPOLINE data on procedural aspects (search
dates)
OPS/ESPACE data on non-EP/WO documents
How to ... issues
1 – how to deal with timing?
[Figure: timeline for a “citing EP/WO patent”, a “cited DE patent” and a “cited US patent” (with PCT equivalent). Labels: priority date; publication of patent document (grant in the U.S.); search report published; publication of supplementary search report; date of publication of granted patent; intervals of 18 mths.]
How to ... issues
1 – how to deal with timing?

What is our objective?





time the “knowledge flow”? (date of the invention – date of the
referenced invention/patent becoming public)
compute time between inventions/state of the art used to
characterize inventions? (Δ date of invention = Δ date of priority =
Δ date of publication – at least for European patents)
other?
“wisest” solution might be to take differences between priority dates
How do we deal with references to different “incarnations” of the
same application (about 19,200 cases)?

Example from the database

referencing patent   referenced patent   publ. date   appl. date
EP0106446A1          DE1947057B          19760318     19690917
EP0056080A1          DE1947057A          19700326     19690917
Here: compute lag as difference between (earliest) publication
date of referencing search report and (earliest) publication date
of referenced document/incarnation
Note: this does not solve the problem with U.S. data (next step:
take differences in (earliest) priority dates).
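The lag rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, incarnations are grouped by naively stripping a trailing kind code, and the search-report date is a placeholder that does not come from the source. Only the two DE1947057 publication dates are taken from the example.

```python
import re
from datetime import date


def parse_yyyymmdd(text):
    """Turn a date string such as '19700326' into a date object."""
    return date(int(text[:4]), int(text[4:6]), int(text[6:8]))


def application_key(doc_number):
    """Drop a trailing kind code (A, B, A1, ...). Naive, assumed convention."""
    return re.sub(r"[A-Z]\d?$", "", doc_number)


def earliest_publication(publications):
    """Map each application key to the earliest publication date
    found among its incarnations."""
    earliest = {}
    for doc_number, publ in publications.items():
        key = application_key(doc_number)
        publ_date = parse_yyyymmdd(publ)
        if key not in earliest or publ_date < earliest[key]:
            earliest[key] = publ_date
    return earliest


def lag_years(search_report_date, referenced_doc, earliest):
    """Lag between the referencing search report and the earliest
    publication of the referenced document, in years."""
    delta = search_report_date - earliest[application_key(referenced_doc)]
    return delta.days / 365.25


# Publication dates of the two incarnations, taken from the example.
referenced = {"DE1947057B": "19760318", "DE1947057A": "19700326"}
earliest = earliest_publication(referenced)

# Placeholder search-report date: NOT from the source, illustration only.
report_date = date(1983, 1, 1)
lag_b = lag_years(report_date, "DE1947057B", earliest)
lag_a = lag_years(report_date, "DE1947057A", earliest)
```

Both references resolve to the 1970 publication, so `lag_a` and `lag_b` are identical regardless of which incarnation the search report names.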
How to ... issues
2 – where to get the data on non-EP/WO documents?

What do we need (ideally)?








(earliest) priority date, priority identification
application date
publication date
applicant name (for later: identify “self-referencing”)
3,715,484 unique non-EP/WO entries drawn from OPS
server, representing 6,623,877 references
data for 2,223,435 EP/WO entries known from EPO data
still missing: 767,122 documents
full information for 8,130,010 references for which we can
compute correct citation lags
How to ... issues
2 – where to get the data on non-EP/WO documents?

cutting down data


WO publications which do not enter the regional phase at the
EPO are excluded
6,740,846 entries remaining
• international (WO) search reports (n=2,416,318)
• EP search reports (n=4,324,528) – some of them supplementary
search reports for WO applications

referenced patents
• 1,228,165 EP documents (usually used for analysis)
• 559,423 WO documents (sometimes included)
• 4,953,258 other documents (lost in most studies)
• US: 2,423,267 – JP: 650,334 – DE: 834,617
– FR: 448,599 – GB: 417,864
• for 4,357,543 we obtained procedural and applicant data from
OPS
• meaning: we can compute citation lags (and other information)
for 6,198,413 (91.96%) of 6,740,846 references
comparison of citation lags
with and w/o “complete” data
How to ... issues
3 – how to count references and citations?

What do we want to compute?




counts of a particular incarnation of a document
count of a particular invention (and the associated property
right) being named as relevant prior art
Clearly, it is the second objective.
HHW’s Rule of Counting Citations
A reference to an X-system document should be taken as a valid
citation count of a particular Z-system patent if the X-system
document is an equivalent of the Z-system patent.


many issues with equivalents (e.g. multiple equivalents; see paper)
data on equivalents obtained from OPS/ESPACE
comparison of citation count distribution with
and without corrections
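The counting rule above amounts to counting citations per invention rather than per document. A minimal Python sketch follows, assuming a hypothetical lookup table `equivalents` that maps each document to an identifier for its group of equivalents. All document identifiers are made up, the handling of multiple equivalents mentioned above is ignored, and counting a group only once per referencing document is an assumption of this sketch rather than something stated in the rule.

```python
from collections import Counter


def count_citations(references, equivalents):
    """Count citations per invention instead of per document.

    references  : iterable of (referencing_doc, referenced_doc) pairs
    equivalents : dict mapping a document to the identifier of its group
                  of equivalents (hypothetical structure)
    """
    counts = Counter()
    seen = set()
    for citing, cited in references:
        # Documents without known equivalents form their own group.
        group = equivalents.get(cited, cited)
        # Assumption: a report naming two incarnations counts once.
        if (citing, group) in seen:
            continue
        seen.add((citing, group))
        counts[group] += 1
    return counts


# Made-up identifiers: US_DOC is treated as an equivalent of EP_DOC.
equivalents = {"US_DOC": "FAM_1", "EP_DOC": "FAM_1"}
references = [
    ("EP_CITING_1", "EP_DOC"),
    ("EP_CITING_2", "US_DOC"),
    ("EP_CITING_3", "US_DOC"),
    ("EP_CITING_3", "EP_DOC"),
    ("EP_CITING_3", "JP_OTHER"),
]
counts = count_citations(references, equivalents)
```

Counting EP documents only would give the invention one citation (from EP_CITING_1 and the duplicate from EP_CITING_3 aside); with the equivalents table, the three distinct referencing documents are all credited to the same invention.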
How to ... issues
4 – how to deal with the NPL references?





total: 1,149,955 NPL references (after taking out
references to “Patent Abstracts of Japan” which are
implicitly patent literature references)
“highest” EP publication number included: EP1589793
59,433 references not from examiner/searcher (dropped)
239,135 references from WO documents which do not
enter the regional phase at the EPO (dropped)
remaining: 851,387 NPL references
• 525,664 references from EP search reports
• 383,247 from international (WO) search reports

on average: 31.3% X refs, 17.1% Y refs, 45.6% A refs,
6.0% other refs and 8.8% D refs
structure of NPL
reference types over time
How to ... issues
5 – how to get the date information?




highest publication number EP1474833, data for
1,452,041 documents (currently about 100,000
documents fewer than we have citation data for)
last 2005 date: June 7, 2005
dates of priority, application, publication, publication of
search reports (international, supplementary, EP), grant
etc.
matched with citations data to compute lags (by year)
Citation Lags
With and without corrections
[Figure: density of citation lags. x-axis: Citation Lag (Years), 0 to 25; y-axis: Density, 0 to .2.]
lower quartile: 2.28 yrs
median: 4.03 yrs
mean: 5.03 yrs
upper quartile: 6.93 yrs
Source: Harhoff/Hoisl/Webb (2004) – authors’ computations based on EPO/OECD citation database.
Citation Lags
With and w/o use of references to non-EP/non-WO documents

percentile   lags of EP/WO to EP/WO only   lags of EP/WO to non-EP/non-WO   all citation lags
1%           0.3 yrs                       0.7 yrs                          0.6 yrs
5%           1.0 yrs                       1.6 yrs                          1.5 yrs
10%          1.5 yrs                       2.1 yrs                          1.9 yrs
25%          2.3 yrs                       3.8 yrs                          3.2 yrs
50%          4.0 yrs                       8.7 yrs                          6.8 yrs
75%          7.0 yrs                       17.8 yrs                         14.3 yrs
90%          10.6 yrs                      29.7 yrs                         25.8 yrs
95%          13.0 yrs                      41.2 yrs                         35.9 yrs
99%          17.5 yrs                      64.7 yrs                         61.8 yrs
max          25.7 yrs                      132.0 yrs                        132.0 yrs
N            1,438,670                     4,203,811                        5,642,481
Citation Lags
With and w/o use of references to non-EP/non-WO documents
[Figure: density of citation lag (citlag, 0 to 60 years). Graphs by citation lags for EP/WO only.]
Citation Lags
more details

among the oldest prior art referenced




WO1997034383 references US285345A (publication
date 18.9.1883) as a Y reference.
WO2001058249 references DE45870C (publication date
12.1.1889) as an A reference.
EP1408383A1 references CH1473A (publication date
31.5.1890) as an X reference.
reference type and citation lag



X: p25=3.13, p50=6.39, p75=13.48
Y: p25=2.72, p50=5.98, p75=13.60
A: p25=3.51, p50=7.33, p75=15.05
Non-Patent Literature
Classification of referenced NPL sources over time
[Figure: NPL Reference Classifications, N=785,383. Shares (0% to 100%) of X-type, Y-type, A-type and other NPL references by priority year, 1977 to 2002.]
Non-Patent Literature
Classification of referenced NPL sources over time
[Figure: NPL Reference Classifications - USPTO as ISA, N=105,506. Shares (0% to 100%) of X-type, Y-type, A-type and other NPL references by priority year, 1983 to 2001.]
Non-Patent Literature
Classification of referenced NPL sources over time
[Figure: NPL Reference Classifications - All ISAs but USPTO, N=282,889. Shares (0% to 100%) of X-type, Y-type, A-type and other NPL references by priority year, 1983 to 2002.]
Patent Literature Data
Classification of Patent Literature References – Raw Data
[Figure: shares (0% to 100%) of patent literature references by source (DE, EP, FR, GB, JP, OT, US, WO), 1978 to 2000. Labelled values: 34.7% and 21.2%.]
Patent Literature Data
Classification of Patent Literature References – (Partial) HHW Rule
[Figure: shares (0% to 100%) of patent literature references by source (DE, EP, FR, GB, JP, OT, US, WO), 1978 to 2000. Labelled values: 29.0% and 38.2%.]
Patent Literature Data
Source of EP Patent References – Overall Comparison

Source of Reference (shares in %)                DE     EP     FR    GB    JP    OT    US     WO
without (partial) correction for equivalents     13.7   19.0   8.2   7.3   8.8   3.1   33.9   6.0
with (partial) correction for equivalents        13.2   28.8   8.2   7.3   8.2   3.1   29.9   1.3

After (partially) correcting for equivalents, the share of within-EP referencing
increases from 19.0% to 28.8% of all patent references
Patent Literature Data
Assessing the Quality of Incoming Applications
Average Share of X-Type References
[Figure: average share (%) of X-type references, 0 to 30, by application year, 1980 to 2000, for US-priority, JP-priority and DE-priority applications.]
Patent Quantity and Patent Quality in Europe, forthcoming
in: Brian Kahin and Dominique Foray (eds.), Advancing
Knowledge and the Knowledge Economy, MIT Press (2006).
Patent Literature Data
Comparing citation counts – with and w/o corrections

percentile   EP to EP only   EP/WO to EP/WO   with partial HHW rule
1%           0.3 yrs         0.7 yrs          0.6 yrs
5%           1.0 yrs         1.6 yrs          1.5 yrs
10%          1.5 yrs         2.1 yrs          1.9 yrs
25%          2.3 yrs         3.8 yrs          3.2 yrs
50%          4.0 yrs         8.7 yrs          6.8 yrs
75%          7.0 yrs         17.8 yrs         14.3 yrs
90%          10.6 yrs        29.7 yrs         25.8 yrs
95%          13.0 yrs        41.2 yrs         35.9 yrs
99%          17.5 yrs        64.7 yrs         61.8 yrs
max          25.7 yrs        132.0 yrs        132.0 yrs
N            1,438,670       4,203,811        5,642,481
Still to be tackled
“self-citations” and “self-references”




misnomer to start with – term applies only to applicant-initiated references
To determine whether the reference points to prior art
generated by the applicant itself (“self-reference”),
we have to get the name of the applicant of the referenced
document.
Similar for measures of “originality” – we need to get the
IPC codes of the referencing and the referenced document.
We have downloaded them – and are in the process of
computing the information.
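Both indicators mentioned above can be sketched once applicant names and IPC codes are available. The Python below is an illustration only. The name harmonisation is deliberately crude, and the originality formula is the Herfindahl-based definition commonly used in the patent citation literature (one minus the sum of squared shares of references across classes); the slides do not say which definition the authors compute, so treat it as an assumption. All function names and sample values are hypothetical.

```python
from collections import Counter


def normalize_name(name):
    """Crude harmonisation: upper-case, drop punctuation, collapse spaces."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.upper()
    )
    return " ".join(cleaned.split())


def is_self_reference(referencing_applicant, referenced_applicant):
    """True if both documents appear to have the same applicant."""
    return normalize_name(referencing_applicant) == normalize_name(
        referenced_applicant
    )


def originality(referenced_ipc_classes):
    """One minus the Herfindahl index of the referenced documents'
    IPC classes; None if the document makes no references."""
    if not referenced_ipc_classes:
        return None
    counts = Counter(referenced_ipc_classes)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical inputs for illustration.
same = is_self_reference("Example Corp.", "EXAMPLE  CORP")
spread = originality(["A61K", "A61K", "C07D", "H04L"])
concentrated = originality(["A61K", "A61K", "A61K"])
```

A document whose references all fall into one class gets an originality of zero; the value rises as references spread over more classes. Exact name matching will miss subsidiaries and spelling variants, which is one reason a complete coding of patent holders matters.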
Where to get the data? And when?

complete citation data up to 2001 from Colin Webb at
OECD - these data do not have the citation lags for
non-EP/WO references

extended dataset with citations up to November 2004
from Dietmar Harhoff at INNO-tec

new data with references till mid-2005 soon (to be done
by April 15 or never)

to be published on


the SING site at CEPR (http://wiki.cepr.org/sing/) – note:
there is a WIKI which allows for comments etc.
the INNO-tec site (http://www.inno-tec.de)
Interpreting Citation Indicators





Assessing the Quality of Incoming Applications
Patent Characteristics by Applicant Type
Citations in Value Equations
Citations in Opposition Likelihood Equations
Citations in Examination Duration Equations
Interpreting Citation Indicators
Citation Statistics by Type of Patent Holder

Variable              Independent Inventor   University   Corporate
                      (N=39,071)             (N=5,434)    (N=550,144)
references            4.88                   3.58         4.15
X-references          0.68                   0.73         0.56
self-references       0.04                   0.05         0.14
NPL references        0.36                   2.37         0.73
citations             1.30                   2.73         1.81
X-citations           0.10                   0.39         0.17
self-citations        0.03                   0.05         0.13
2nd order citations   3.83                   11.50        6.85
claims                9.3                    10.9         7.3
grant lag             4.2 yrs                5.5 yrs      4.3 yrs

Source: Harhoff/Hoisl/Webb (2005) – authors’ computations based on EPO/OECD citation database.
Interpreting Citation Indicators
Citation Indicators in Value Equations

Variable                patent value - German PATVAL study (ordered probit, N=3,350)
references              n.s.
X-share                 n.s.
Y-share                 n.s.
share self references   n.s.
citations               0.0482 (3.74)
X-share                 n.s.
Y-share                 n.s.
share self citations    n.s.
2nd order citations     0.0044 (2.52)
NPL references          n.s.
control variables       technical field, year dummies, add. variables
pseudo-R-squared        0.0062

Source: Harhoff/Hoisl/Webb (2005) – marginal effects (tstats) authors’ computations based on EPO/OECD citation database.
Interpreting Citation Indicators
Citation Indicators in Opposition Equations

Variable                opposition incidence       revocation incidence
                        (binary probit)            (binary probit)
                        N=594,647                  N=27,530
references              0.0004 (3.5)               n.s.
X-share                 0.0270 (22.9)              0.0411 (3.5)
Y-share                 0.0088 (6.8)               n.s.
share self references   -0.0394 (14.4)             n.s.
citations               0.0070 (60.9)              -0.0066 (7.3)
X-share                 0.0137 (8.7)               n.s.
Y-share                 0.0125 (6.5)               n.s.
share self citations    -0.0128 (6.6)              -0.0463 (2.4)
2nd order citations     0.0003 (16.0)              n.s.
NPL references          -0.0009 (3.3)              n.s.
control variables       technical field, year      dto. plus additional
                        dummies, add. variables    variables
pseudo-R-squared        0.0693                     0.0269

Source: Harhoff/Hoisl/Webb (2005) – marginal effects (tstats) authors’ computations based on EPO/OECD citation database.
Interpreting Citation Indicators
Citation Indicators in Process Duration Equations

Variable         POOLED OUTCOME   GRANT    GRANT    REFUSAL
references       DELAY            DELAY    n.s.     DELAY
X-share          DELAY            DELAY    ACCEL.   DELAY
GENERALITY       DELAY            n.s.     DELAY    n.s.
ORIGINALITY      DELAY            DELAY    DELAY    DELAY
citations        DELAY            ACCEL.   DELAY    DELAY
NPL references   DELAY            DELAY    DELAY    DELAY

control variables: claims, designated countries, workload at EPO, prediction error, PCT
Results from competing-risk proportional hazard models, N=215,259
Source: Harhoff/Wagner (2005) - authors’ computations based on EPO/OECD citation database.
Summary of Results (1/2)


value equation

only citation counts matter, composition not relevant

2nd order citations appear to be relevant
opposition incidence equation

not number, but composition of references matters considerably

number of citations proxies for value

X-citations indicate anticipated interaction between opponents

2nd-order citations have some predictive power

self-citations and references have a strong negative effect on
opposition incidence – appear to indicate idiosyncratic research
paths
Summary of Results (2/2)


revocation incidence equation

weak predictive power (as should be!)

residual information still in X-references and 1st-order citations
duration equations



heterogeneous effects on different outcomes – pointing to
endogenous applicant behavior
examination of X-classified references apparently more time-consuming
signal of search report to applicant
Caveats and More Plans

Results for 2nd-order citations warrant attention (and
more experimentation). Currently in development:
second-order references.

Results for self-referencing surprisingly strong – should
be extended using a complete coding of patent holders
(for non-EP references).

Still a far cry from a structural model of value, legal
robustness and other latent variables – but some
progress.

Inclusion of more refined NPL indicators to commence
shortly.
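The slides do not define 2nd-order citations. The Python sketch below assumes one common reading, namely the number of citations received by the documents that cite a given document, and is meant only to make that reading concrete. The function name and the document identifiers are hypothetical.

```python
from collections import defaultdict


def second_order_citations(references):
    """For each cited document, sum the citations received by its citers.

    references : iterable of (referencing_doc, referenced_doc) pairs
    """
    cited_by = defaultdict(set)
    for citing, cited in references:
        cited_by[cited].add(citing)

    result = {}
    for doc, citers in cited_by.items():
        # .get avoids creating entries for citers that are never cited.
        result[doc] = sum(len(cited_by.get(c, ())) for c in citers)
    return result


# Made-up citation pairs: B and C cite A; D and E cite B; F cites C.
references = [("B", "A"), ("C", "A"), ("D", "B"), ("E", "B"), ("F", "C")]
second_order = second_order_citations(references)
```

Here A has two first-order citations (from B and C) and three second-order citations (D and E via B, F via C). Reversing the direction of the pairs would give an analogous count for second-order references.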