
Lotkaian Informetrics
and applications to
social networks
L. Egghe
Chief Librarian Hasselt University
Professor Antwerp University
Editor-in-Chief “Journal of Informetrics”
1-dimensional informetrics









# authors in a field
# journals in a field
# articles in a field
# references (or citations) in a field
# borrowings in a library
# websites, hosts, …
# web citations to a paper
# in- (or out-) links to/from a website
# downloads of an article
Growth
Exponential growth
All “new” fields grow exponentially
Otherwise there is S-shaped growth.
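The two growth types can be compared numerically. This is a minimal sketch: the logistic curve is only one possible choice of S-shaped growth (an assumption, the slides do not name a specific model), and the parameters c, a and limit are illustrative.

```python
def exponential_growth(t, c=1.0, a=2.0):
    """c * a**t: growth by a constant factor a per time unit."""
    return c * a ** t

def logistic_growth(t, limit=100.0, c=1.0, a=2.0):
    """S-shaped curve: starts like c * a**t, then levels off at `limit`."""
    return limit / (1.0 + (limit / c - 1.0) * a ** (-t))

# Early on both curves nearly coincide; later the logistic one saturates.
for t in range(0, 13, 2):
    print(t, round(exponential_growth(t), 1), round(logistic_growth(t), 1))
```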
[Figure: # web servers versus time, 1992 to 2001]
2-dimensional informetrics



# authors in a field (sources)
# articles in a field (items)
+ indicating which author has written which
papers
S = Set of sources
I = set of items
IPP = Information Production Process
Examples of IPPs
S → I (via F)
Authors → Articles
Journals → Articles
Articles → Citations (to/from)
Books → Borrowings
Words (= types) → Use of words in a text (= tokens)
Web sites → Hyperlinks (in-/out-)
Web sites → Web pages
Cities/villages → Inhabitants
Employees → Their production
Employees → Their salaries
…
1. f = size-frequency function:
f(n), for n = 1,2,3,…
= # sources with n items
2. g = rank-frequency function:
g(r), for r = 1,2,3,…
= # items in the source on rank r
(sources are ranked in decreasing order
of the number of items they have)
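Both functions can be computed directly from the same data. A minimal sketch on a hypothetical toy IPP (the author labels and counts are invented for illustration; f and g are used as names for the size-frequency and rank-frequency functions):

```python
from collections import Counter

# Hypothetical toy IPP: each author (source) with a number of articles (items).
items_per_source = {"A": 1, "B": 3, "C": 1, "D": 2, "E": 1, "F": 7}

# Size-frequency function: f[n] = number of sources with n items.
f = Counter(items_per_source.values())

# Rank-frequency function: g[r] = number of items in the source on rank r,
# with sources ranked in decreasing order of their number of items.
ranked = sorted(items_per_source.values(), reverse=True)
g = {r: n for r, n in enumerate(ranked, start=1)}

print(dict(f))
print(g)
```

Both functions describe the same IPP: summing n·f(n) over n and summing g(r) over r each give the total number of items.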
Continuous model
Source densities
Item densities
Lotkaian Informetrics
The law of Lotka and the law of Zipf
Lotka (1926): f(n) = C/n^α, with C > 0 and α > 0 constants.
The value α = 2 is a turning point in informetrics (see further).
Lotka’s law is equivalent to Zipf’s law (linguistics):
g(r) = B/r^β, with B > 0 and β > 0 constants.
In econometrics, Zipf’s law is called Pareto’s law.
Dependence of G on [symbol lost in extraction]. Existence of a Groos droop if [condition lost in extraction].
On a log-log scale, a power law is a decreasing straight line whose slope is minus the exponent.
Rank-frequency distributions for websites
The scale-free property
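f is scale-free ⇔ for every constant λ > 0 there is a constant μ > 0
such that f(λx) = μ·f(x) for all x
Theorem (i)⇔(ii):
(i) f is continuous, decreasing and
scale-free
(ii) f is a decreasing power function: f(x) = C/x^α
with C > 0 and α > 0,
i.e. Lotka’s law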
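The scale-free property can be illustrated numerically: rescaling the argument of a power function multiplies its values by one constant, whereas for an exponential function the factor changes with x. A minimal sketch with illustrative parameter values (C, alpha, k and the scale 3 are assumptions):

```python
import math

def power_law(x, C=5.0, alpha=2.0):
    return C / x ** alpha

def exponential(x, C=5.0, k=0.5):
    return C * math.exp(-k * x)

def rescaling_ratios(f, scale, xs):
    """Ratios f(scale * x) / f(x); they are constant in x for a scale-free f."""
    return [f(scale * x) / f(x) for x in xs]

xs = [1.0, 2.0, 5.0, 10.0]
print(rescaling_ratios(power_law, 3.0, xs))    # the same ratio at every x
print(rescaling_ratios(exponential, 3.0, xs))  # the ratio changes with x
```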
Explanation of Lotka’s law based on
exponential growth of sources and items
(Naranan (1970)) and an interpretation of
Lotkaian IPPs as self-similar fractals
(Egghe (2005))
Fractals and fractal dimension
1. Divide a line piece into 3 equal parts
⇒ we need 3 = 3^1 line pieces of this length
to cover the original line piece
:3 ⇒ need 3 = 3^1 ⇒ dim = 1
2. Divide the sides of a square into 3 equal
parts ⇒ we need 9 = 3^2 squares with this
side length to cover the original square
:3 ⇒ need 9 = 3^2 ⇒ dim = 2
3. The same for a cube
:3 ⇒ need 27 = 3^3 ⇒ dim = 3
Construction of the triadic Koch curve
4. For the triadic Koch curve
:3 ⇒ need 4 = 3^D ⇒ dim = D
with D = log 4 / log 3 ≈ 1.26
The Koch curve is a proper fractal with fractal
dimension D
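The four cases follow one rule: if shrinking by a factor 3 requires N pieces, the dimension D solves N = 3^D, so D = log N / log 3. A minimal sketch (the function name is illustrative):

```python
import math

def similarity_dimension(pieces, shrink_factor):
    """D such that pieces == shrink_factor ** D."""
    return math.log(pieces) / math.log(shrink_factor)

for name, pieces in [("line", 3), ("square", 9), ("cube", 27), ("Koch curve", 4)]:
    print(name, round(similarity_dimension(pieces, 3), 4))
```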
= Complexity theory
= Fractal theory
Mandelbrot
Naranan (Nature, 1970)
Theorem:
(i) The number of sources grows exponentially
in time t: c1·a1^t (c1 > 0, a1 > 1)
(ii) The number of items in each source grows
exponentially in time
(iii) The growth rate in (ii) is the same for every
source: (ii) and (iii) together imply a fixed
exponential function c2·a2^t (c2 > 0, a2 > 1)
for the number of items in each source at
time t.
Then this IPP is Lotkaian, i.e. the law of
Lotka applies: if f(p) denotes the number
of sources with p items, we have
f(p) = C/p^α
where α = 1 + log a1 / log a2
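Naranan’s argument can be checked numerically in a continuous version of the model. The sketch below assumes that the number of sources at time s is c1·a1^s and that a source of age u holds c2·a2^u items (the names a1, a2, c1, c2 and the values used are illustrative). The number of sources with at least p items then falls off as a power of p with exponent −(α − 1), where α = 1 + log a1 / log a2.

```python
import math

def lotka_exponent(a1, a2):
    """Exponent alpha = 1 + log(a1) / log(a2) of the resulting Lotka law."""
    return 1.0 + math.log(a1) / math.log(a2)

def sources_with_at_least(p, T, a1, a2, c1=1.0, c2=1.0):
    """Number of sources that hold at least p items at time T."""
    # A source of age u holds c2 * a2**u items, so it needs this age to reach p:
    age_needed = math.log(p / c2) / math.log(a2)
    # Sources that already existed at time T - age_needed are old enough.
    return c1 * a1 ** (T - age_needed)

a1, a2, T = 4.0, 3.0, 20.0
p_low, p_high = 10.0, 1000.0
slope = (math.log(sources_with_at_least(p_high, T, a1, a2))
         - math.log(sources_with_at_least(p_low, T, a1, a2))) / (
             math.log(p_high) - math.log(p_low))
print(slope, 1.0 - lotka_exponent(a1, a2))
```

The cumulative count has log-log slope 1 − α; the density f(p) is its derivative, hence a power law with exponent α.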
Egghe (2005) (Book and JASIST)
(i) The number of line pieces grows
exponentially in time t, here
proportional to 4^t
(ii),(iii) 1/length of each line piece grows
exponentially in time t, with the
same growth rate 3 for every piece. Hence we have
growth proportional to 3^t.
Rephrased in terms of informetrics:
a (Lotkaian) IPP is a self-similar fractal and its
fractal dimension D is given by the logarithm of
the growth rate of the sources, divided by the
logarithm of the growth rate of the items
(which can be > or < 1). Hence, the exponent α
in Lotka’s law satisfies the important relation:
α = 1 + D
This result was earlier seen by Mandelbrot but
only in the context of (artificial) random texts
(hence in linguistics).
Further applications of
Lotkaian Informetrics

Concentration theory (inequality theory):
Lorenz curves (cf. econometrics).
Egghe (2005) (Book, Chapter IV).

Fractional modelling of authorship (case of
multi-authored articles): determine the
number of authors with a given fractional number of
articles
(fractional counting: an author in an
m-authored paper receives a score 1/m).
Theoretical and experimental fractional frequency distributions (case of i=4).
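Fractional counting is easy to make concrete. A minimal sketch on hypothetical toy data (the author names and papers are invented): each author of an m-authored paper receives 1/m, and the fractional frequency distribution counts how many authors reach each total score.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy data: each paper is given by its list of authors.
papers = [["Ann"], ["Ann", "Bob"], ["Ann", "Bob", "Cy"], ["Cy", "Dee"]]

# Fractional counting: an author of an m-authored paper receives 1/m.
score = defaultdict(Fraction)
for authors in papers:
    for author in authors:
        score[author] += Fraction(1, len(authors))

# Fractional frequency distribution: number of authors per fractional score.
frequency = defaultdict(int)
for s in score.values():
    frequency[s] += 1

print(dict(score))
print(dict(frequency))
```

Exact fractions are used so that equal scores are recognized as equal. The scores always sum to the number of papers.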

Dynamics of Lotkaian IPPs, described via
transformations on the sources and on the
items: includes the description of dynamics
of networks.
Relations with 3-dimensional informetrics:
See new journal:
L. Egghe. General evolutionary theory of
IPPs and applications to the evolution of
networks. Journal of Informetrics 1(2), 115-122, 2007
Item transformation
Source transformation
New rank-frequency function
Theorem: New size-frequency function
where
Case
is an example of “linear 3-dimensional
informetrics”
Sources_1 → Items_1 = Sources_2 → Items_2
Examples:
1. Webpages → hyperlinks → use of
hyperlinks
2. Library subject categories → books
→ borrowings
See further.
Back to the general case.
Power law transformations in Lotkaian
IPPs
Theorem:
the Lotka exponent of the transformed IPP
is only dependent on b/c due to the
scale-free nature of Lotkaian systems.
Corollary:
With this, one can study the evolution of an
IPP, e.g. a part of the WWW. V. Cothey (2007)
confirms the theory except in one case where
non-Lotkaian evolution is found, probably due
to “automatic” creation of web pages
(deviation from a social network).
Further application:
IPPs without low productive sources
(Egghe and Rousseau (2006))
Take
: sources remain but they
grow in number of items:
Now
and (since
)
Evolution: decreasing Lotka exponent
and no low productive sources
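The dependence on b/c can be sketched under explicit assumptions that the transcript does not preserve. The sketch assumes a source transformation r → A·r^b on ranks, an item transformation j → B·j^c on item counts, and the relation β = 1/(α − 1) between the Zipf and Lotka exponents. Under these assumptions the new Lotka exponent is 1 + b(α − 1)/c. The function name and the parameter values are hypothetical.

```python
def transformed_lotka_exponent(alpha, b, c):
    """Lotka exponent after r -> A*r**b on ranks and j -> B*j**c on item counts.

    Assumes a Zipf law g(r) ~ r**(-beta) with beta = 1 / (alpha - 1).
    The constants A and B drop out, so only the ratio b / c matters.
    """
    beta = 1.0 / (alpha - 1.0)
    beta_new = beta * c / b      # g_new(r_new) ~ r_new**(-beta * c / b)
    return 1.0 + 1.0 / beta_new  # equals 1 + b * (alpha - 1) / c

# Sources keep their ranks (b = 1) while item counts grow (c > 1):
# the Lotka exponent decreases.
print(transformed_lotka_exponent(3.0, b=1.0, c=2.0))
```

Reading the case "sources remain but grow in number of items" as b = 1 and c > 1 is itself an assumption, chosen to match the stated outcome of a decreasing Lotka exponent.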
Examples
1. Country sizes: data from
www.gazetteer.de (July 10, 2005): 237
countries: best-fit parameter value 1.69
2. Municipalities in Malta (1997 data): 67
municipalities: best-fit parameter value 1.12
3. Database sizes: on the topic “fuzzy set
theory” (20 largest databases on this
topic) (Hood and Wilson (2003)):
best-fit parameter value 1.09
4. Unique documents in databases (20
databases above): best-fit parameter value 1.33
Application of Lotka’s law to the
modelling of the cumulative first-citation
distribution,
i.e. the distribution of the time at which an
article receives its first citation.

The time t1 at which an article
receives its first citation is an
important indicator of the visibility of
research.

At t1 the article switches its status
from “unused” to “used”.
t1 is a measure of immediacy but, of
course, different from the immediacy
index (Thomson Scientific).
The distribution of t1 over a group of
articles is the topic of the present study.
We will study the cumulative first-citation
distribution
Φ(t1) = cumulative fraction of all
papers that have, at t1, at
least 1 citation.

Rousseau (1994) uses two different
differential equations to model two
types of graphs: a concave one and an
S-shaped one. These equations are
not explained and are not linked to any
informetric distribution.

In Egghe (2000), I use only 2
elementary informetric tools:
c(t) = the density function of
citations to an article, t time
after its publication
(exponential, c(t) = c·a^t with 0 < a < 1),
φ(A) = the density function of the
number of papers with A
citations in total (Lotka,
φ(A) = C/A^α with α > 1),
(only ever cited papers
are used here).

Normalizing to distributions:
c(t) becomes c(t)/A for an article with
A citations in total;
φ(A) becomes (α − 1)/A^α for A ≥ 1,
but we will use γ·(α − 1)/A^α, with γ
the fraction of ever cited articles, in
order to include also the never cited
articles.
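Theorem:
Φ(t1) = γ·(1 − a^(t1))^(α−1)
(a: aging rate of the exponential citation density, α: Lotka exponent,
γ: fraction of ever cited articles)
concave if 1 < α ≤ 2
S-shaped if α > 2
, hence explaining both shapes in one model.
Note the turning point α = 2.
Proof: A first citation is received if
A·(1 − a^(t1)) = 1 (*)
⇒ Cumulative fraction of all articles that
are already cited at time t1:
Φ(t1) = γ·A^(1−α) (**)
⇒ (*) into (**) yields
Φ(t1) = γ·(1 − a^(t1))^(α−1)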
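The two shapes can be checked numerically. The sketch takes Φ(t1) = γ·(1 − a^(t1))^(α−1) as the model and inspects the sign of its second differences: all non-positive means concave, positive first and negative later means S-shaped. The parameter values (γ = 0.8, a = 0.9, α = 1.5 and α = 3) are hypothetical and not fitted to any data.

```python
def first_citation_fraction(t1, gamma, a, alpha):
    """Cumulative fraction of articles with at least one citation at time t1."""
    return gamma * (1.0 - a ** t1) ** (alpha - 1.0)

def second_differences(alpha, gamma=0.8, a=0.9, steps=60):
    """Discrete second derivative of the curve at t1 = 1, ..., steps - 1."""
    vals = [first_citation_fraction(t, gamma, a, alpha) for t in range(steps + 1)]
    return [vals[i + 1] - 2 * vals[i] + vals[i - 1] for i in range(1, steps)]

concave = second_differences(alpha=1.5)   # exponent below the turning point
s_shaped = second_differences(alpha=3.0)  # exponent above the turning point

print(max(concave) <= 0)                      # no convex part anywhere
print(s_shaped[0] > 0 and s_shaped[-1] < 0)   # convex start, concave end
```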
Motylev (1981)
fit :
Rousseau (1994)
JACS to JACS data of Rousseau
Time-unit = 2 weeks, 4-year period
fit :