
Game Theoretic Modeling of Online Knowledge Creation in Wikipedia

Narayan B. Mandayam

WINLAB

Collaborators: S. Anand*, O. Arazy†, and O. Nov*
*NYU Poly; †University of Haifa & Alberta School of Business
Acknowledgment: NAKFI

DNA of Silicon Brains?

  

Peer Production

- Large number of individuals co-create knowledge
- Examples: citizen science projects, citizen journalism, open source software (Linux kernel development), Wikipedia

From Individual Informed Brains to a Society-Scale Informed Brain

Uncovering the DNA of Social Knowledge Creation

Inspired by similar efforts in the life sciences, such as the Human Genome Project, we seek to explore:

- basic patterns or “building blocks” of the process through which individual human brains co-create society-scale silicon brains
- the relationship between sequential patterns of these building blocks and attributes of the resulting silicon brains

Wikipedia as a Silicon Brain

- Among the most popular information repositories on the web
  - 18B page views (500M unique visitors/month) and counting
  - 6th most popular, after Google, Microsoft, Facebook and Yahoo
- The largest “collaborative effort” in human history
  - Content: ~34M articles (~5M in English)
  - Editors: ~23M editors/contributors (130K active in the last month, 30K with at least 5 edits)
  - Administrators: ~1400
- Wikipedia is a silicon-based “brain”: a large-scale knowledge repository created collaboratively
  - people contribute their knowledge, expertise and energy
  - a common pool (information good) accessible to everybody

DNA of Wikipedia

- Wikipedia is built on wiki technology
  - A web-based collaborative authoring tool
  - A contributor can add content, or add to or delete existing content
  - Similar to a Google Doc or multi-user editing with ‘Track Changes’ in MS Word
- Each “edit” made by a user creates a new version of the wiki page
- All versions are tracked in the “History” page (adopting version-control principles from software development)

(Figure: WikiDNA)


Characteristics of Collaboration in Wikipedia

- Contributors' motivations
  - Reputation enhancement, ego, expressing one's opinions
  - Contributors' goal is to increase the content that they “own”
  - “Competitive” in nature; “refactoring” of others' contributions
- Comprehensive array of governance mechanisms
  - Automated tools
  - Social norms and procedures: the “5 pillars”
  - Quality management, conflict management, status management
  - Act to ensure information credibility and to guard against unintentional biases, intentional biases (“Wiki lobbying”), and vandalism
- Product quality
  - Information quality, accuracy, completeness, objectivity, representation

Modeling Contributor Activity and Governance Mechanisms in Wikipedia

- Noncooperative game for modeling contributor interactions
- Stackelberg game for modeling the interactions of governance mechanisms with contributor actions
- Derive insights from the model
- Validate the model with data from Wikipedia

Non-cooperative game amongst contributors

- Contributors (users) are driven by selfish motives
  - Each user wishes to maximize her fractional ownership in the page
- Let $x_i$ be the content owned by user $i$ in the page (e.g., the number of sentences owned by the user in the page)
- The “utility” of user $i$ is her fractional ownership of the page, i.e., $u_i = x_i / \sum_{j=1}^{N} x_j$
- Each user expends an “effort” $\beta_i$ to make a unit contribution
- Governance mechanisms impose an additional cost on users
  - $t$ is the unit incremental effort, or governance factor
- Define a net utility (value function) $n_i = u_i - (\beta_i + t)\,x_i \;\; \forall i$
- Users noncooperatively determine the $x_i$ that maximize $n_i$

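The payoff defined above can be evaluated directly. A minimal sketch, assuming contents are given as a list of non-negative numbers (e.g., sentence counts) with at least one positive entry; the function names are hypothetical, chosen for illustration.

```python
from typing import Sequence


def fractional_ownership(x: Sequence[float], i: int) -> float:
    """u_i = x_i / sum_j x_j: the share of the page owned by user i."""
    total = sum(x)
    if total <= 0:
        raise ValueError("at least one user must own some content")
    return x[i] / total


def net_utility(x: Sequence[float], i: int, beta: Sequence[float], t: float) -> float:
    """n_i = u_i - (beta_i + t) * x_i: ownership minus effort and governance cost."""
    return fractional_ownership(x, i) - (beta[i] + t) * x[i]
```

For example, with illustrative (not measured) values x = [2, 1, 1], β = [0.05, 0.1, 0.1] and t = 0.05, user 0 owns half the page and pays 0.1 per unit on 2 units, so n_0 = 0.5 - 0.2 = 0.3.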
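Nash equilibrium of the noncooperative game

- The net utility $n_i$ is a concave function of $x_i$
- The noncooperative game has a unique Nash equilibrium (NE)
- The NE can be obtained from the first-order necessary conditions, which give a system of nonlinear equations:
$$\Big(\sum_{j=1}^{N} x_j\Big)^2 = \alpha_i \sum_{j \neq i} x_j, \qquad \alpha_i = \frac{1}{t + \beta_i}, \qquad i = 1, \dots, N$$
- Rewrite the system of equations as
$$\big(\mathbf{x}^T \mathbf{1}\mathbf{1}^T \mathbf{x}\big)\,\mathbf{1} - \boldsymbol{\Lambda}\,\big(\mathbf{1}\mathbf{1}^T - \mathbf{I}\big)\,\mathbf{x} = \mathbf{0}$$
  where $\boldsymbol{\Lambda} = \mathrm{diag}(\alpha_1, \dots, \alpha_N)$ is the diagonal matrix, $(\cdot)^T$ is the transpose of a vector or matrix, $\mathbf{1}$ is the vector of all ones, $\mathbf{0}$ is the vector of all zeroes, and $\mathbf{I}$ is the identity matrix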
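The first-order condition and the concavity claim follow from differentiating the net utility. Writing $S = \sum_j x_j$ (a shorthand introduced here) and summing the $N$ conditions, under the assumption that every contributor is active, also gives a closed form for the NE:

```latex
\begin{aligned}
n_i &= \frac{x_i}{S} - (\beta_i + t)\,x_i, \qquad S = \sum_{j=1}^{N} x_j \\
\frac{\partial n_i}{\partial x_i} &= \frac{S - x_i}{S^2} - (\beta_i + t) = 0
  \;\Longrightarrow\; S^2 = \alpha_i\,(S - x_i) = \alpha_i \sum_{j \neq i} x_j \\
\frac{\partial^2 n_i}{\partial x_i^2} &= -\frac{2\,(S - x_i)}{S^3} < 0
  \quad \text{whenever } \textstyle\sum_{j \neq i} x_j > 0 \\
\sum_{i=1}^{N} \frac{S^2}{\alpha_i} &= \sum_{i=1}^{N} (S - x_i) = (N-1)\,S
  \;\Longrightarrow\; S = \frac{N-1}{\sum_{j=1}^{N} (t + \beta_j)}, \qquad
  x_i = S - (t + \beta_i)\,S^2
\end{aligned}
```

Dividing by $S$ gives $u_i^* = 1 - (t + \beta_i)\,S$; a non-positive value signals that contributor $i$ is not active, and the system must be re-solved without that contributor.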

Solution and Implications of Nash equilibrium

- Write $\mathbf{x} = \mathbf{P}\mathbf{y}$, where the columns of $\mathbf{P}$ are the orthonormal eigenvectors of $\mathbf{1}\mathbf{1}^T$
- The orthogonal transformation yields $N$ quadratic equations in $\mathbf{y}$; equation $k$ depends on $y_k$ through $y_N$, so $\mathbf{y}$ is solved by backward substitution
- Use $\mathbf{x} = \mathbf{P}\mathbf{y}$ to obtain the NE, and the fractional ownership at the NE (with all $N$ contributors active) as
$$u_i^{*} = 1 - \frac{(N-1)(t + \beta_i)}{\sum_{j=1}^{N} (t + \beta_j)}$$
- Implications of the Nash equilibrium (for large $N$)
  - For a contributor to have non-zero ownership, $\beta_i < E[\beta]$
  - Only contributors who expend less than the average effort survive edit wars and governance mechanisms
  - Governance: strives for a fair and balanced viewpoint?
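The survival implication can be checked numerically. The sketch below does not follow the eigenvector and backward-substitution route above; it uses the standard sort-and-truncate closed form for this kind of contest with linear costs $t + \beta_i$: sort users by cost and keep the largest prefix whose costliest member still gets positive ownership. Given the uniqueness of the NE, it should coincide with the slide's solution. The function name `nash_ownership` is hypothetical.

```python
from typing import List, Sequence


def nash_ownership(beta: Sequence[float], t: float) -> List[float]:
    """Equilibrium fractional ownership u_i* for per-unit costs beta_i + t."""
    n = len(beta)
    if n < 2:
        raise ValueError("need at least two contributors")
    cost = [b + t for b in beta]
    order = sorted(range(n), key=lambda i: cost[i])  # cheapest contributors first
    # Shrink the active set while its costliest member would get zero ownership,
    # i.e. while cost_k * (k - 1) >= sum of the k smallest costs.
    k = n
    while k > 2 and cost[order[k - 1]] * (k - 1) >= sum(cost[i] for i in order[:k]):
        k -= 1
    s = (k - 1) / sum(cost[i] for i in order[:k])  # total content S at the NE
    shares = [0.0] * n
    for i in order[:k]:
        shares[i] = 1.0 - cost[i] * s  # u_i* = x_i / S = 1 - (t + beta_i) S
    return shares
```

With every contributor active, the survival condition is $(N-1)(t + \beta_i) < \sum_j (t + \beta_j)$, i.e., $\beta_i < (t + \sum_j \beta_j)/(N-1)$, which tends to $\beta_i < E[\beta]$ as $N$ grows.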

How to measure quantities in model from data?

Analyzing the history of each wiki page yields the following:

- The list of contributors ($N$ contributors)
- The total number of edits made by each user, $T_i$
- The final content owned by each user, $s_i$ (in sentences)
- The size of all edits made by each contributor, $e_i$ (measured as a Levenshtein distance)
  - The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

- The effort per unit contribution, $\beta_i$, is computed as $e_i / T_i$
- The governance factor per unit contribution, $t$, is computed similarly from administrator and system edits

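The quantities $e_i$, $T_i$ and $\beta_i = e_i / T_i$ can be sketched in a few lines. This assumes, hypothetically, that a page history is available as a chronological list of `(user, page_text)` revisions starting from an empty page, and that the size of each edit is the character-level Levenshtein distance between consecutive revisions; the function names are illustrative, not the authors' tooling. The distance uses the classic two-row dynamic program.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = curr
    return prev[-1]


def effort_per_unit(revisions: List[Tuple[str, str]]) -> Dict[str, float]:
    """beta_i = e_i / T_i from a chronological list of (user, page_text) revisions."""
    size: Dict[str, int] = {}   # e_i: summed Levenshtein size of the user's edits
    count: Dict[str, int] = {}  # T_i: number of edits made by the user
    previous = ""               # assumption: the page starts empty
    for user, text in revisions:
        size[user] = size.get(user, 0) + levenshtein(previous, text)
        count[user] = count.get(user, 0) + 1
        previous = text
    return {u: size[u] / count[u] for u in size}
```

For long articles this O(mn) dynamic program is slow; faster algorithms for edit distance and approximate string matching are surveyed in Navarro (2001).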
How to model governance?

- Recall the “5 pillars” emphasizing unbiasedness
  - Ensuring no one group takes over ownership of content:
    “Wikipedia is free content that anyone can edit, use, modify, and distribute: Since all editors freely license their work to the public, no editor owns an article and any contributions can and will be mercilessly edited and redistributed.”
- Recall the “product quality” goals
  - Ensuring information quality, accuracy, completeness, objectivity, representation
  - How do we measure quality?
- How does the governance factor $t$ get determined?
- What does Wikipedia data tell us?

Sample Data and Quality Assessment

- Representative sample of 89 Wikipedia articles used in (Arazy et al., 2011, 2013)
  - Stratified sampling by topic (e.g., culture, geography)
  - For each article, details of every edit made (and the contributor making it) from the article's inception to January 2007; the average article has 91 edits and 49 unique contributors
- The set includes measures of information credibility (7-point Likert scale)
  - 5-6 students independently analyze each page and produce detailed reports by comparing it to external resources
  - Senior university librarians independently analyze the articles (using the students' reports and other sources) and rate information quality, accuracy, completeness, objectivity, and representation
  - The senior librarians sit together, argue out their differences, and arrive at a consensus

Wiki pages for Quality Assessment (1/2)

1956 Trans-Canada Air Lines accident; Abreu Camp; Alcohol 120%; Alpha Iota Omicron; Ancient DNA; Anime Vegas; Antonio Inoki vs Renzo Gracie; Arms sales to Iraq 1973-1990; Art Finley; Ashton, Illinois; Australian contribution to the 1991 Gulf War; Battle of Magdhaba; Belle Glade, Florida; Bess of Hardwick; Biphenyl; Blue and white (porcelain); BMW 3 Series; Briccriu; Cameron Bright; Canadian federal election, 1930; Chandrashekarendra Saraswati; Chikkamagaluru district; Commonwealth Scientific and Industrial Research Organisation; Construction; Core mantle boundary; Dhol; Dianthus; Dragoon Sniper Rifle; Dzhezkazgan; Edouard Pingret; Electronic lock; Eleonore Duplay; Empire Theatres; Fawsley; Ferdinand III, Grand Duke of Tuscany; Fiat 1300/1500; Flying car; Frunzik Mkrtchyan; Gneisenau class battlecruiser; Graz; Great Dismal Swamp; Greenup, Illinois; Hero of the Soviet Union; High pressure area; High-end audio cables; Ier arrondissement; In a Fix; Irving Kanarek; Jacobi identity; Jay Fiedler; JE; Khopesh

Wiki pages for Quality Assessment (2/2)

K-pop; Ludwigsfelde; Magnetohydrodynamic drive; Medieval churches of York; Meow Wars; Merrimac Ferry; Mr. Potato Head; Multitrack recording; Myles Brand; Newtonian telescope; Nueces massacre; Operation Osprey; Orange Revolution; Otis, Colorado; Perfect game (bowling); Peter Jackson; Philippe, comte de Paris; Pine Lawn, Missouri; Pledge of Allegiance; Poisson algebra; Politics of Alberta; R. Lee Ermey; RBC Center; Sandkings (novelette); Self-service password reset; Shoulder strap; Smelly Cat; Spoiler effect; Stephen Goldsmith; Student unionism in Australia; System Requirements Specification; The Bad Plus; The Real World: Austin; Thrombosis; Timesplitters; Tippmann A-5; Treaty of Mutual Cooperation and Security between the United States and Japan; Uncle Vanya; Unidentified submerged object; Urbanization in Africa; WDC 65C02; Well-Tempered Clavier; William Holborne; Winged bean

Data: Governance, Quality & Unbiasedness

(Figure: Quality vs. difference between maximum and minimum fractional ownership)

- Average quality is correlated with a smaller difference between the maximum and minimum fractional ownerships (bias)
- This matches the “5 pillars” goal of ensuring that no one group takes over ownership of content
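The bias measure and its relation to quality can be computed per page. A minimal sketch, assuming (hypothetically) one list of fractional ownerships per page, taken over all listed contributors including any with zero ownership, plus one consensus quality rating per page; a negative coefficient corresponds to the pattern described above (higher quality with smaller bias). The function names are illustrative.

```python
import math
from typing import Sequence


def ownership_bias(shares: Sequence[float]) -> float:
    """Difference between the maximum and minimum fractional ownership on one page."""
    return max(shares) - min(shares)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bias_quality_correlation(pages: Sequence[Sequence[float]], quality: Sequence[float]) -> float:
    """Correlate per-page bias with per-page quality ratings (one rating per page)."""
    return pearson([ownership_bias(p) for p in pages], quality)
```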

A Stackelberg Model for Wikipedia

(Figure: Governance model and contributor model)

- The governance factor $t$ is obtained by minimizing the difference between the maximum and minimum fractional ownerships (bias)
- The optimization is constrained due to the desire for “limited” governance
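The leader-follower structure can be sketched as a grid search: for each candidate $t$, the contributors' equilibrium ownership is computed and the governance side keeps the $t$ with the smallest max-min bias. This is a sketch under stated assumptions, not the authors' optimization: the follower step uses the standard sort-and-truncate closed form for the contest, and the upper bound `t_max` is a hypothetical stand-in for the “limited governance” constraint. In this simple model, raising $t$ tends to flatten the ownership shares, which is one way to see why the optimization needs a constraint at all.

```python
from typing import List, Sequence, Tuple


def nash_ownership(beta: Sequence[float], t: float) -> List[float]:
    """Followers: equilibrium fractional ownership for per-unit costs beta_i + t."""
    if len(beta) < 2:
        raise ValueError("need at least two contributors")
    cost = [b + t for b in beta]
    order = sorted(range(len(cost)), key=lambda i: cost[i])
    k = len(cost)
    while k > 2 and cost[order[k - 1]] * (k - 1) >= sum(cost[i] for i in order[:k]):
        k -= 1  # costliest remaining user would get zero ownership; drop them
    s = (k - 1) / sum(cost[i] for i in order[:k])
    shares = [0.0] * len(cost)
    for i in order[:k]:
        shares[i] = 1.0 - cost[i] * s
    return shares


def choose_governance(beta: Sequence[float], t_max: float, steps: int = 200) -> Tuple[float, float]:
    """Leader: grid-search t in [0, t_max] to minimize max - min equilibrium ownership."""
    best_t, best_bias = 0.0, float("inf")
    for step in range(steps + 1):
        t = t_max * step / steps
        shares = nash_ownership(beta, t)  # contributors respond to t at the NE
        bias = max(shares) - min(shares)
        if bias < best_bias:  # strict: ties keep the smaller (more limited) t
            best_t, best_bias = t, bias
    return best_t, best_bias
```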

Wikipedia Articles for Empirical Data Validation

- 1000 pages from the January 2012 dump of English Wikipedia
- 219,811 distinct contributors
- Lifespan: 129-4078 days; average duration: 2681 days (~7.35 years)
- 25 topical categories: Agriculture, Arts, Business, Chronology, Concepts, Culture, Education, Environment, Geography, Health, History, Humanities, Humans, Language, Law, Life, Mathematics, Medicine, Nature, People, Politics, Science, Society, Sports, Technology
- 4 maturity strata (number of revisions): 1-10, 11-100, 101-1000, 1000+
- 40 articles in each topical category, 250 articles in each maturity stratum

Validating Analysis against Data-1

- Values of $\beta_i$ are used to obtain the equilibrium ownership $x_i$ from the game-theoretic analysis
- Fractional ownership from the analysis and from the data are compared

(Figure panels: Abreu Camp; Paris)

- Only contributors with $\beta_i < E[\beta]$ retain non-zero content ownership
- A similar trend is observed for all pages

Validating Analysis against Data-2

(Figure: Fractional ownership from analysis and data, in order of decreasing $\beta_i$)

- Only contributors with $\beta_i < E[\beta]$ retain non-zero content ownership

Validating Model: Estimation Error & Significance

- Can we perform a linear fit to match the fractional ownership obtained from the data to that obtained from the analysis?
  - Let $\mathbf{u}^{A}$ be the vector of ownership obtained from the analysis
  - Let $\mathbf{u}^{D}$ be the vector of ownership obtained from the data
  - Estimate $\mathbf{u}^{D}$ as $\rho\,\mathbf{u}^{A} + \delta$, using the $\rho$ and $\delta$ that result in the least mean square error
- Use 300 pages as training data to estimate $\rho$ and $\delta$, and compute the error for the other pages
- Error: 11-15%; significance test: p = 0.03
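The training and testing step can be sketched with ordinary least squares. This assumes, hypothetically, that each page contributes a pair of per-contributor ownership vectors (analysis, data), that the training pairs are pooled into one fit of data ≈ ρ·analysis + δ, and that held-out error is reported as root-mean-square error; the slides do not state which error metric produced the 11-15% figure, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Page = Tuple[List[float], List[float]]  # (analysis shares, data shares) for one page


def fit_linear(analysis: Sequence[float], data: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for data ~ rho * analysis + delta."""
    n = len(analysis)
    mean_a, mean_d = sum(analysis) / n, sum(data) / n
    var_a = sum((a - mean_a) ** 2 for a in analysis)
    if var_a == 0:
        raise ValueError("analysis shares must not all be equal")
    cov = sum((a - mean_a) * (d - mean_d) for a, d in zip(analysis, data))
    rho = cov / var_a
    return rho, mean_d - rho * mean_a


def pool(pages: Sequence[Page]) -> Tuple[List[float], List[float]]:
    """Concatenate the per-contributor (analysis, data) pairs of several pages."""
    analysis = [a for page in pages for a in page[0]]
    data = [d for page in pages for d in page[1]]
    return analysis, data


def rmse(analysis: Sequence[float], data: Sequence[float], rho: float, delta: float) -> float:
    """Root-mean-square error of the fitted line on (analysis, data) pairs."""
    sq = [(d - (rho * a + delta)) ** 2 for a, d in zip(analysis, data)]
    return math.sqrt(sum(sq) / len(sq))


def train_and_test(train: Sequence[Page], test: Sequence[Page]) -> Tuple[float, float, float]:
    """Fit rho and delta on training pages, then report RMSE on held-out pages."""
    rho, delta = fit_linear(*pool(train))
    return rho, delta, rmse(*pool(test), rho, delta)
```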

Conclusions and Future Directions-1

- Mathematical models for collaborative knowledge creation
  - Wikipedia as an example of a silicon brain
- Developed a game-theoretic model for knowledge creation in Wikipedia
  - Noncooperative game for contributor interactions
    - Users noncooperatively maximize their content ownership
    - Effort measured as a function of the Levenshtein distance of edits
  - Stackelberg model for the influence of governance mechanisms on editors
    - The governance factor is the implicit outcome of the emphasis on the “5 pillars” and “product quality”
    - Reducing the difference between the maximum and minimum fractional ownership, subject to objectivity/quality constraints

Conclusions and Future Directions-2

- Nash equilibrium implications
  - Only users who expend less than the “average effort” have non-zero content ownership
  - Seems counterintuitive, but is a consequence of governance
  - Unintended consequences of governance: Wiki-bureaucracy, difficult-to-navigate rules, driving away contributors
  - The model can offer guidelines on how to moderate governance
- Dynamic models that track sequential actions and interactions
  - Dynamic games
  - Evolutionary game theory
- Model improvements and validation with larger data sets
  - Improved modeling/analysis of governance; other user metrics

References-1

Anand, S., Arazy, O., Mandayam, N. B., & Nov, O. (2013). A game theoretic analysis of Wikipedia. In Proceedings of the 4th International Conference on Decision and Game Theory for Security (GameSec 2013), LNCS 8252 (pp. 29-44). Springer.
Arazy, O., Stroulia, E., Ruecker, S., Arias, C., Fiorentino, C., Ganev, V., & Yau, T. (2010). Recognizing contributions in wikis: Authorship categories, algorithms, and visualizations. Journal of the American Society for Information Science and Technology (JASIST), 61(6), 1166-1179.
Arazy, O., & Kopak, R. (2011). On the measurability of information quality. Journal of the American Society for Information Science and Technology (JASIST), 62(1), 89-99.
Arazy, O., Nov, O., Patterson, R., & Yeo, L. (2011). Information quality in Wikipedia: The effects of group composition and task conflict. Journal of Management Information Systems, 27(4), 71-98.
Arazy, O., Yeo, L., & Nov, O. (2013). Stay on the Wikipedia task: When task-related disagreements slip into personal and procedural conflicts. Journal of the American Society for Information Science and Technology.
Arazy, O., Nov, O., & Ortega, F. (2014). The [Wikipedia] world is not flat: On the organizational structure of online production communities. In the 22nd European Conference on Information Systems (ECIS 2014), Tel-Aviv, Israel, June 9-11, 2014.
Auray, N. (2012). Online communities and governance mechanisms. In E. Brousseau, M. Marzouki, & C. Méadel (Eds.), Governance, Regulation and Powers on the Internet (pp. 211-231).
Butler, B., Joyce, E., & Pike, J. (2008, April). Don't look now, but we've created a bureaucracy: The nature and roles of policies and rules in Wikipedia. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1101-1110). ACM.
Butler, B., Sproull, S., Kiesler, S., & Kraut, R. E. (2007). Community effort in online communities: Who does the work and why. In S. Weisband (Ed.), Leadership at a Distance: Research in Technologically Supported Work. London, UK: Lawrence Erlbaum Associates.
Forte, A., Larco, V., & Bruckman, A. (2009). Decentralization in Wikipedia governance. Journal of Management Information Systems, 26(1), 49-72.

References-2

Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2013). The rise and decline of an open collaboration system: How Wikipedia's reaction to popularity is causing its decline. American Behavioral Scientist, 57(5), 664-688.
Hilligoss, B., & Rieh, S. Y. (2008). Developing a unifying framework of credibility assessment: Construct, heuristics, and interaction in context. Information Processing & Management, 44(4), 1467-1484.
Kittur, A., Suh, B., Pendleton, B. A., & Chi, E. H. (2007, April). He says, she says: Conflict and coordination in Wikipedia. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 453-462). ACM.
Krieger, M., Stark, E. M., & Klemmer, S. R. (2009). Coordinating tasks on the commons: Designing for personal goals, expertise and serendipity. In CHI '09: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1485-1494). New York, NY.
Kriplean, T., Beschastnikh, I., McDonald, D. W., & Golder, S. A. (2007, November). Community, consensus, coercion, control: cs*w or how policy mediates mass participation. In Proceedings of the 2007 International ACM Conference on Supporting Group Work (pp. 167-176). ACM.
Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Surveys (CSUR), 33(1), 31-88.
Niederer, S., & Van Dijck, J. (2010). Wisdom of the crowd or technicity of content? Wikipedia as a sociotechnical system. New Media & Society, 12(8), 1368-1387.
Nov, O. (2007). What motivates Wikipedians? Communications of the ACM, 50(11), 60-64.
O'Neil, M. (2009). Cyberchiefs: Autonomy and Authority in Online Tribes. London, UK: Pluto Press.
Rafaeli, S., & Ariel, Y. (2008). Online motivational factors: Incentives for participation and contribution in Wikipedia.
Schroer, J., & Hertel, G. (2009). Voluntary engagement in an open web-based encyclopedia: Wikipedians and why they do it. Media Psychology, 12(1), 96-120.
Stvilia, B., Twidale, M., Smith, L., & Gasser, L. (2008). Information quality work organization in Wikipedia. Journal of the American Society for Information Science and Technology, 59, 983-1001.
Viégas, F. B., Wattenberg, M., & McKeon, M. M. (2007). The hidden order of Wikipedia. In Online Communities and Social Computing (pp. 445-454). Springer Berlin Heidelberg.