Transcript Microsoft PowerPoint - the NCRM EPrints Repository
Virtual Knowledge Studio (VKS)
What is Webometrics?
Mike Thelwall
Statistical Cybermetrics Research Group, University of Wolverhampton, UK
Information Studies
1. Introduction
□ Webometrics is concerned with gathering data on and measuring aspects of the Web:
  □ web sites
  □ web pages
  □ hyperlinks
  □ web search engine results
  □ YouTube video commenter networks
  □ MySpace Friend networks
□ …for very varied social science purposes
New problems: Web-based phenomena
□ Webometrics can be applied to understanding web-based phenomena
  □ Why do web sites interlink?
  □ Which web sites interlink?
  □ What interlinking patterns exist?
  □ What topics are frequently blogged about?
Old problems: Offline phenomena reflected online
□ Some offline phenomena have measurable online reflections
  □ International communication
  □ Inter-university collaboration
  □ University-business collaboration
  □ The impact or spread of ideas
  □ Public opinion
2. Examples
Blog searching - blogpulse.com
Example: Identifying and tracking public science concerns in blogs
□ Over 100,000 blogs and other sources tracked daily via RSS feeds
□ Objective: to identify and track public concerns about science
□ E.g., “Schiavo” identified and tracked as a potential public science concern
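A minimal sketch of how a term might be tracked across daily blog posts and flagged when it spikes. This is an illustration only, not the method used in the project: the function names `daily_share` and `spike_days`, the `factor` threshold, and the toy posts and dates are all assumptions made for the example.

```python
from collections import Counter
from datetime import date


def daily_share(posts, term):
    """Return {day: fraction of that day's posts that mention `term`}."""
    totals, hits = Counter(), Counter()
    needle = term.lower()
    for day, text in posts:
        totals[day] += 1
        if needle in text.lower():
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


def spike_days(shares, factor=2.0):
    """Days whose share exceeds `factor` times the mean share over all days."""
    mean = sum(shares.values()) / len(shares)
    return [day for day, s in shares.items() if mean > 0 and s > factor * mean]


# Toy data invented for illustration (not real blog posts).
posts = [
    (date(2005, 3, 1), "Notes on a new phone"),
    (date(2005, 3, 1), "Weekend football roundup"),
    (date(2005, 3, 2), "Holiday photos"),
    (date(2005, 3, 2), "Recipe of the week"),
    (date(2005, 3, 3), "The Schiavo case and medical ethics"),
    (date(2005, 3, 3), "Thoughts on Schiavo"),
]
shares = daily_share(posts, "schiavo")
print(spike_days(shares))
```

Using the share of posts rather than the raw count keeps the measure comparable on days when more or fewer feeds are collected.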
Example: The online impact of research groups (NetReAct)
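Example: Links between EU universities
[Figure: network diagram of normalised linking between EU universities by country, smallest countries removed. Labels: Austria, Switzerland, Belgium, Germany, Norway, France, UK, Spain, Finland, Sweden, Poland, Italy, NL; annotation: "Geopolitical connected".]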
International biofuels research network
Example: MySpace age profiles
Percentage of profiles containing swearing
Columns: US males 16-19, US females 16-19, UK males 16-19, UK females 16-19
Rows: moderate, strong, very strong, sample size
Percentages in slide order: 10%, 11%, 33%, 18%, 47%, 38%, 33%, 38%, 2%, 2%, 8%, 3%
Sample sizes: 1,530; 1,287; 171; 130 (typical sample size 20-148 for non-web swearing research)
Emphatic adverb/adjective OR adverbial booster OR premodifying intensifying negative adjective (36% of swearing)
□ and we r guna go to town again n make a ryt fuckin nyt of it again lol
□ see look i'm fucking commenting u back lol
□ and stop fucking tickleing me!!
□ Thanks for the party last night it was fucking good and you are great hosts.
□ That 50's rock and roll weekender was fucking mint!
□ Fuckin my space, my arse
□ 1/2 d ppl cudnt even speak fuckin english!
□ yeah so me and sarah broke up and everythings fucking shit
YouTube – Video poster ages
YouTube friend network
Online impact - Keywords in web pages mentioning IWRM
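One way such a keyword profile could be produced is to count, for each word, how many of the pages mentioning the focus term contain it. The sketch below assumes page texts have already been downloaded; the function name `keyword_page_counts`, the tiny stopword list, and the sample pages are hypothetical and not taken from the study.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "of", "and", "in", "for", "a", "to", "is", "on"}


def keyword_page_counts(pages, focus="iwrm"):
    """Count, per word, the number of pages mentioning `focus` that contain it."""
    counts = Counter()
    for text in pages:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if focus not in words:
            continue  # skip pages that never mention the focus term
        counts.update(words - STOPWORDS - {focus})
    return counts


# Toy page texts invented for illustration.
pages = [
    "IWRM and water governance in the basin",
    "Training course on IWRM for water managers",
    "Football results",
]
counts = keyword_page_counts(pages)
print(counts.most_common(1))  # [('water', 2)]
```

Counting pages rather than total occurrences stops a single long page from dominating the keyword list.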
Data Gathering/Processing Tools
□ Blogpulse.com – blog network diagrams
□ LexiURL Searcher – links, web text, YouTube, Flickr, Technorati
□ Issue Crawler, Google TouchGraph links
Discussion points for online data
□ Validity – is the underlying meaning of the text/video/picture readily apparent to the researcher?
  □ Possibly not to any great degree for teenagers’ MySpace comments or very personal YouTube videos
□ Reliability – are search engines accurate/good at returning the correct results?
  □ Google blog search shows unreliability – very variable over time
  □ Researchers can triangulate across similar search engines, or over time, to test reliability
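A rough sketch of two simple triangulation checks, assuming result URLs and hit counts have already been collected: the Jaccard overlap between two engines' result sets, and the coefficient of variation of one engine's hit counts over time. The function names, engine labels, and numbers are made up for illustration; other reliability measures could equally be used.

```python
from statistics import mean, pstdev


def jaccard(a, b):
    """Overlap of two result sets: shared URLs divided by distinct URLs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def coefficient_of_variation(counts):
    """Standard deviation of hit counts relative to their mean (0 = stable)."""
    m = mean(counts)
    return pstdev(counts) / m if m else float("nan")


# Hypothetical results for the same query from two engines.
engine_a = ["example.org/1", "example.org/2", "example.org/3"]
engine_b = ["example.org/2", "example.org/3", "example.org/4"]
print(jaccard(engine_a, engine_b))  # 2 shared of 4 distinct -> 0.5

# Hypothetical weekly hit counts for one query on one engine.
stable_hits = [100, 100, 100, 100]
variable_hits = [1200, 300, 2500, 900]
print(coefficient_of_variation(stable_hits))  # 0.0
print(coefficient_of_variation(variable_hits))
```

A low overlap between engines or a high variation over time would suggest treating the counts from any single search with caution.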
Discussion points for online data
□ Coverage – to what extent are all the phenomena of interest covered by the source (e.g., search engine) used?
□ Sample bias – are certain types of people over-represented? (e.g., the more literate, the more vocal, the more politically active, youth, educated, creative types…)
Summary
□ The web contains a wide variety of interesting web and “web 2.0” content posted by many different people in many different formats
□ Webometric methods can give insights into this data
Books
□ Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.
□ Rogers, R. (2005). Information politics on the Web. Massachusetts: MIT Press.
□ http://lexiurl.wlv.ac.uk
□ http://webometrics.wlv.ac.uk
□ http://www.issuecrawler.net
Important considerations
□ Data accuracy
□ Data cleaning
□ Context to help interpret results
□ Report results carefully
Example: Analysis of the accuracy of search engine results
Live Search results analysis