Scientific Web Intelligence
The Birth of a New Research Field
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
The Problem
To map patterns of communication between
researchers in a country based upon university
web sites
Patterns of communication are also mapped based
upon journal citations or journal title words
Provides useful information about the structure and
evolution of research fields
Can identify previously unknown field connections
Web analysis could illustrate wider and more
current patterns
Part 1: Hyperlink Analysis
Citation counts are known to be reasonable
indicators of research quality, but is the same true
for inlink counts?
Counts of links to universities within a country can
correlate significantly with measures of research
productivity
The significance of this result is in giving
‘permission’ to investigate the use of inter-university
links for researching scholarly communication
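The kind of correlation test described above can be sketched as follows. This is a minimal illustration, not the method used in the original study: the choice of Spearman rank correlation is an assumption (it suits skewed count data), and the inlink counts and productivity scores are invented values for five hypothetical universities.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: inlinks to each university site and a
# research productivity score for the same five universities.
inlinks = [1200, 450, 3100, 800, 150]
productivity = [35.0, 22.0, 60.2, 20.5, 8.1]

rho = spearman(inlinks, productivity)
print(f"Spearman rho = {rho:.2f}")
```

A high coefficient here would only show that the two rankings agree; as the following slides note, it says nothing about why universities are linked to.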
Links to UK universities against
their research productivity
The reason for the strong correlation is the quantity
of Web publication, not its quality
This is different to citation analysis
Most links are only loosely
related to research
90% of links between UK university sites have
some connection with scholarly activity,
including teaching and research
But less than 1% are equivalent to citations
So link counts do not measure research
dissemination but are more a natural by-product
of scholarly activity
Cannot use link counts to assess research
Can use link counts to track an aspect of
communication
Some Hyperlink Patterns
Patterns in counts of links between
university Web sites
Universities tend to link to neighbours
Universities cluster geographically
Language is a factor in
international interlinking
English is the dominant language for Web sites in
the Western EU
In a typical country, 50% of pages are in the
national language(s) and 50% in English
Non-English-speaking countries extensively interlink in
English
{Research with Rong Tang & Liz Price}
Can map patterns of international
communication
Counts of links between EU universities in Swedish are
represented by arrow thickness.
Counts of links between EU universities in French are
represented by arrow thickness.
Which
language???
Which
language???
Disciplinary Patterns
Links and subject areas
Linking patterns vary enormously
by discipline
No evidence of a significant geographic trend
Disciplinary differences in the extent of
interlinking: e.g., Web use in history is very low,
in chemistry very high
Individual research projects can have an
enormous impact upon individual departments
E.g. Arts web sites are often for specific exhibitions
or for digital media projects
Links are not frequent enough to reliably reveal
patterns of interdisciplinarity
Stretching links: colinks,
couplings
For the UK academic Web, about 42% of
domains connected by links alone host similar
disciplines, and about 43% connected by links,
colinks and couplings
But over 100 times more domains are colinked or
coupled than are directly linked
Links in any form are less than 50% reliable as
indicators of subject similarity
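The three relationships compared above can be sketched in code. The definitions used are assumptions based on the usual citation-analysis analogies: two domains are colinked when a third domain links to both, and coupled when both link to the same third domain. The domain names are hypothetical.

```python
from itertools import combinations


def link_relations(links):
    """Classify unordered domain pairs from (source, target) links.

    Returns three sets of frozenset pairs: directly linked,
    colinked (a common source links to both), and coupled
    (both link to a common target).
    """
    outlinks, inlinks = {}, {}
    for source, target in links:
        if source == target:
            continue  # ignore self-links
        outlinks.setdefault(source, set()).add(target)
        inlinks.setdefault(target, set()).add(source)

    direct = {frozenset((s, t)) for s, t in links if s != t}

    colinked = set()
    for targets in outlinks.values():
        for a, b in combinations(sorted(targets), 2):
            colinked.add(frozenset((a, b)))

    coupled = set()
    for sources in inlinks.values():
        for a, b in combinations(sorted(sources), 2):
            coupled.add(frozenset((a, b)))

    return direct, colinked, coupled


# Hypothetical link data between departmental domains.
links = {
    ("chem.example", "phys.example"),
    ("lib.example", "chem.example"),
    ("lib.example", "phys.example"),
    ("chem.example", "maths.example"),
    ("phys.example", "maths.example"),
    ("hist.example", "maths.example"),
}

direct, colinked, coupled = link_relations(links)
print(len(direct), "direct pairs")
print(len(colinked), "colinked pairs")
print(len(coupled), "coupled pairs")
```

Because every shared source or target generates a pair, indirect relationships multiply quickly on a real crawl, which is consistent with the much larger number of colinked or coupled domains reported above.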
Text Mining Approaches
Hyperlinks are not frequent enough or
systematic enough to yield reliable
evidence of connections at a low level
Alternative is to look for words in common
E.g., the frequency with which words
associated with psychology are found in
computer science web sites
Clustering web pages/sites based upon word
occurrences (cf. journal title word clustering)
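A minimal sketch of the word-counting step behind this approach is given below. It counts how often each word occurs overall and in how many domains it appears. The tokenizer, the function name `word_stats` and the sample pages are all illustrative assumptions; no clustering or weighting scheme is implemented here.

```python
import re
from collections import Counter


def word_stats(pages_by_domain):
    """Count total occurrences of each word and the number of
    domains in which it appears at least once.

    pages_by_domain maps a domain name to a list of page texts.
    """
    frequency = Counter()
    domains = Counter()
    for pages in pages_by_domain.values():
        seen_in_domain = set()
        for text in pages:
            # Crude tokenizer: lowercase runs of ASCII letters only.
            words = re.findall(r"[a-z]+", text.lower())
            frequency.update(words)
            seen_in_domain.update(words)
        # Each domain contributes at most 1 per word.
        domains.update(seen_in_domain)
    return frequency, domains


# Hypothetical page texts from two departmental sites.
pages_by_domain = {
    "cs.example": [
        "Cognitive psychology and memory models",
        "Systems lab assignment",
    ],
    "psy.example": [
        "Psychology of memory",
        "Psychology lab",
    ],
}

frequency, domains = word_stats(pages_by_domain)
for word in ("psychology", "lab", "systems"):
    print(word, frequency[word], domains[word])
```

Finding a psychology-associated word on a computer science site, as in this toy data, is the kind of signal the slide proposes as evidence of a connection between fields.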
Text clustering – early results
Word         Frequency   Domains   Importance
business         59806       408     0.005902
marketing        16987       242     0.004476
finance           8300       217     0.002826
economics        15509       261     0.002726
banking           2010       123     0.002717
management       76754       465     0.002569
sitemap           2419        62     0.001874
accounting        8162       197     0.001613
auckland         55604       414     0.001546
Which discipline?
Word         Frequency   Domains   Importance
template          3356       147     0.001355
assignment       15610       240     0.001186
copyright        16780       278     0.001166
changed           7172       284     0.001152
sst                199        33     0.001071
semester         18364       319     0.001009
systems          44521       451     0.000949
lab               7709       261     0.000861
comments         16931       354     0.000842
Scientific Web Intelligence
Standard hyperlink and text mining
approaches are inadequate for identifying
low level inter-subject connections
Either extensive human intervention or
artificial intelligence techniques needed to
extract useful information
Hence the founding of Scientific Web
Intelligence
Scientific Web Intelligence
Objective: to combine techniques from
Information Science, Web Mining and Web
Intelligence to extract patterns of
interdisciplinarity from university Web
sites
Opportunities
Develop graphical techniques to display the
data
Develop AI/Data Mining techniques to
analyse the data
Extend the techniques to other domains –
e.g. business web intelligence