The July 2011 Webometrics repository ranking
Isidro F. Aguillo
The views expressed in this presentation are those of the presenter,
not necessarily those of the CCHS or the CSIC.
repositories.webometrics.info
Agenda
• Introduction to the Cybermetrics Lab
• Webometrics, an emerging discipline
• Webometrics, OA and repositories
• Ranking Web
– Preliminary results July 2011
• Final comments
• Open debate
The Cybermetrics Lab
• Scholars doing scientific research
– Researchers belonging to the National Research Council (CSIC)
– The largest Spanish public research organization
– Recognised by our peers
– 15 years' experience in quantitative analysis and evaluation of scholarly communication and
academic institutions
– Papers in refereed scientific journals, contributions to international conferences, reports to
governmental bodies
– Funded by public resources
– International cooperation projects funded by the European Commission
• Research Agenda
– Promote Open Access initiatives
– Global coverage, including developing countries
– Building Cybermetrics/Webometrics as an emerging discipline
Webometrics
Activity
Impact
• Size: number of webpages, rich files, academic papers, media files, languages, age
• Visibility: number of external inlinks, Web impact factor, g-factor, PageRank
• Web 2.0: social networks presence, blogmetrics, wikimetrics
• Networks: inter-linking, co-linking, clusters, similarity, network measurements
• Search Engines: size, geographical coverage, languages, biases, algorithms, updating frequency, operators
• Mentions: names of authors, papers, institutions, journals, hot topics
Position
Analytics (usage)
• Presence: presence in search engines and directories
• Popularity: TrafficRank
• Position: rank in search results
• Visits, visitors: number of visits, visitors, geographical and temporal distribution
• Criteria: frequency, presence in selected html tags, title, URL, bad practices
• Behavior: patterns of visits, referrers, referrals
Webometrics, OA and repositories
• Webometrics requires a public Web
– Direct crawling
– OA Electronic Journals
– Repositories
– Indirect crawling: Search engines as proxies
– Link analysis
– Mention analysis
• Analytics
– Usage
– from log files
– Google Analytics or similar
• OpenAIRE WP8
– Combining Bibliometrics, Webometrics and Analytics indicators
A few objectives and some problems
• Priorities in OA initiatives
– Populate the repositories
– Obtaining mandates
– Applying standards
– Increase visibility
• Intellectual property issues
– Authors not transferring full rights to editors
– Participation in repositories intended for:
– Increasing the number of citations
– Improving author (and institutional) prestige
– But … current OA practices mean some rights are being lost
– At the level of repository
– At the level of institution
Transfer of “institutional” rights
• Research results are the most important assets of universities, but in
a few cases the repository is outside the institutional web domain
• HAL Sciences de l'Homme et de la Société
http://halshs.archives-ouvertes.fr/
• White Rose Consortium ePrints Repository
http://eprints.whiterose.ac.uk/
• University of Arizona's Campus Repository
http://arizona.openrepository.com/
• Paris Institute of Technology Pastel Theses
http://pastel.archives-ouvertes.fr/
• Universidad de Chile Cybertesis
http://www.cybertesis.cl/
• Open Access Server Woods Hole
http://darchive.mblwhoilibrary.org/
• TeesRep Teesside University
http://tees.openrepository.com/
• Auckland Univ Technology ScholarlyCommons
http://aut.researchgateway.ac.nz/
• University of Wolverhampton Digital Repository
http://wlv.openrepository.com/
• HAL Ecole Polytechnique
http://hal-polytechnique.archives-ouvertes.fr/
A different point of view
• Regarding naming
– Institutional repository URL should be in the institutional web domain
– The relevant item is the full text file not the webpage of the record
– It is recommended that the URL of the file includes:
– Institutional webdomain
– Last name of (main) author
– Explicit file type (something.pdf)
• Regarding linking
– The item URL (not the record) should be easily linkable (citable): short, with no complex
or long numerical codes
– Nothing against PURLs, but not as the main linking target
– http://dx.doi.org/
– http://hdl.handle.net/
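The naming and linking recommendations above can be turned into a rough automatic check. The following Python sketch is only an illustration: the function name, the length and digit-run thresholds, and the author name used in the example are assumptions, since the slides give no numeric limits.

```python
import posixpath
import re
from urllib.parse import urlparse

# Assumed thresholds: the slides only say "short" and "no long numerical codes".
MAX_URL_LENGTH = 80
MAX_DIGIT_RUN = 6
FILE_TYPES = {"pdf", "doc", "docx", "ppt", "pptx", "ps", "eps"}
RESOLVER_HOSTS = {"dx.doi.org", "hdl.handle.net"}


def check_item_url(url, institutional_domain, author_last_name):
    """Return one True/False flag per recommendation for a full-text file URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    domain = institutional_domain.lower()
    # File extension of the last path segment, without the dot.
    extension = posixpath.splitext(parsed.path)[1].lstrip(".").lower()
    # A run of MAX_DIGIT_RUN or more digits counts as a "long numerical code".
    long_code = re.search(r"\d{%d,}" % MAX_DIGIT_RUN, parsed.path)
    return {
        "in_institutional_domain": host == domain or host.endswith("." + domain),
        "has_author_last_name": author_last_name.lower() in parsed.path.lower(),
        "explicit_file_type": extension in FILE_TYPES,
        "short": len(url) <= MAX_URL_LENGTH,
        "no_long_numeric_codes": long_code is None,
        "not_a_resolver_link": host not in RESOLVER_HOSTS,
    }


# Example: "units.it" as institutional domain and "rossi" as author are
# illustrative assumptions, not data from the presentation.
report = check_item_url(
    "http://www.openstarts.units.it/dspace/bitstream/10077/2267/1/13.pdf",
    "units.it",
    "rossi",
)
for rule, passed in report.items():
    print(rule, passed)
```

A check like this only flags candidates for review; whether a URL is "easily citable" remains a judgment call.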
Recommended URL
http://www.openstarts.units.it/dspace/bitstream/10077/2267/1/13.pdf
Discrepancies in record numbers
http://dare.uva.nl/document/131441
DOIs recognise the editor, not the author
http://digitalcommons.bolton.ac.uk/cmri_journalspr/48/
Complex URLs
http://doras.dcu.ie/15962/4/OPTICS-S-08-01522.pdf
http://doras.dcu.ie/15962/
Ranking Web of Repositories
(July 2011)
July 2011 edition
• Repositories with their own domain or subdomain
– 1,222 repositories
– Including 1,154 institutional repositories
– Plus 49 “portals”
• Major changes from previous editions
– Sources
– Exalead data no longer collected
– Yahoo Site Explorer instead of Yahoo Search
– Only for Size
– New formats added: docx, pptx, eps
– Total number of rich files excluded from Size count
– Scholar full count (50%) + Scholar 2006-2010 (50%)
Methodology
• SIZE (weight 20%)
– Sources: Google, Yahoo SE (1), Bing
– Operator: site (2)
• RICH FILES (weight 15%)
– Sources: Google, Yahoo, Bing
– Operator: filetype (2) (pdf, doc, docx, ppt, pptx, ps, eps)
• SCHOLAR (weight 15%)
– Source: Google Scholar
– Operator: site (at least summaries); 50% total + 50% (2006-10)
• VISIBILITY (weight 50%)
– Source: Yahoo SE (1)
– Operator: linkdomain
• Normalization: log-normalization (3)
(1) Yahoo is using the Bing database, except for Site Explorer (SE) and a few national mirrors (till mid 2012)
(2) Number of rich files excluded from the global size count
(3) ln(a_i + 1) / ln(a_max + 1)
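The operators named in the methodology (site, filetype, linkdomain) are search-engine query prefixes. As a minimal sketch, the Python below builds the query strings one might submit for a single repository. The function names and the example domain are hypothetical, and the "-site:" exclusion used to drop self-links is an assumption about how external inlinks would be isolated, not something stated in the slides. Operator syntax also differs between engines and over time.

```python
# Rich file formats listed in the methodology slide.
RICH_FILE_TYPES = ["pdf", "doc", "docx", "ppt", "pptx", "ps", "eps"]


def size_query(domain):
    """Query whose hit count estimates the number of pages in the domain."""
    return f"site:{domain}"


def rich_file_queries(domain):
    """One query per rich file format, restricted to the domain."""
    return [f"site:{domain} filetype:{ext}" for ext in RICH_FILE_TYPES]


def visibility_query(domain):
    """Query for links pointing to the domain, excluding the domain's own pages.

    The "-site:" exclusion is an assumption made for this sketch.
    """
    return f"linkdomain:{domain} -site:{domain}"


# Hypothetical repository domain, used only for illustration.
domain = "repository.example.edu"
print(size_query(domain))
for query in rich_file_queries(domain):
    print(query)
print(visibility_query(domain))
```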
Log-normalization
[Figure: SCORE plotted against RANK; labels: WR, log-norm, z-score, QS, ARWU, HEEACT, CWTS]
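The log-normalization ln(a_i + 1) / ln(a_max + 1) and the indicator weights (Size 20%, Rich Files 15%, Scholar 15%, Visibility 50%) can be combined as in the Python sketch below. This is a sketch under stated assumptions, not the Lab's actual procedure: it assumes the weights apply to the normalized values, and that the two Scholar counts are averaged before normalization. All names and figures in the example are invented.

```python
import math

# Weights from the methodology slide.
WEIGHTS = {"size": 0.20, "rich_files": 0.15, "scholar": 0.15, "visibility": 0.50}


def log_normalize(value, max_value):
    """ln(a_i + 1) / ln(a_max + 1); returns 0.0 when the maximum is 0."""
    if max_value <= 0:
        return 0.0
    return math.log(value + 1) / math.log(max_value + 1)


def scholar_count(total, recent):
    """50% full count + 50% 2006-2010 count (assumed to be mixed before normalizing)."""
    return 0.5 * total + 0.5 * recent


def composite_scores(raw):
    """raw maps repository name -> {indicator: raw count}; returns name -> score in [0, 1]."""
    maxima = {ind: max(counts[ind] for counts in raw.values()) for ind in WEIGHTS}
    return {
        name: sum(WEIGHTS[ind] * log_normalize(counts[ind], maxima[ind]) for ind in WEIGHTS)
        for name, counts in raw.items()
    }


# Invented figures for two hypothetical repositories.
raw = {
    "repo_a": {"size": 1000, "rich_files": 500,
               "scholar": scholar_count(300, 100), "visibility": 10000},
    "repo_b": {"size": 100, "rich_files": 50,
               "scholar": scholar_count(30, 10), "visibility": 100},
}
scores = composite_scores(raw)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

Because of the logarithm, a repository with ten times fewer pages loses far less than 90% of its Size score, which dampens the effect of a few very large outliers.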
Top Repositories
Top Institutional Repositories
Top “Portals”
Final comments
• Providers and end-users of repositories are scientists and their
institutions
– For them papers are the most important asset they produce
– Granting increased access and visibility is universally acknowledged
– But some practices are detaching deposited material from its authors, making it
difficult to cite (link) the papers and penalizing the “prestige” of the scientists and
their academic employers
• Ranking Web of Repositories intends to promote OA initiatives and
support best practices
– The current classification still does not reflect the diversity of repositories, but further
efforts will be made in the future
– The methodology is also evolving, but overall results are not changing abruptly between
consecutive editions
Thank you!
Questions?
 [email protected]
 repositories.webometrics.info