“Never doubt that a small group
of thoughtful, committed citizens
can change the world. Indeed, it
is the only thing that ever has.”
--Margaret Mead
Thank You R Hackers of
NYC
Harvesting & Analyzing Interaction Data in R: The Case of MyLyn
Sean P. Goggins, PhD
Drexel University
MyLyn Research Collaborators:
Peppo Valetto, PhD (PI) & Kelly Blincoe
I Study Small Groups
I use electronic trace data, interviews, field notes, electronic content & surveys for raw data
MyLyn – Software
Engineering
CANS/Sakai – Online
Learning
Small Group
Interactions
Virtual Math Teams
http://www.mathforum.org
Health Care
Communities (Under
NDA)
Coolest Open* Data to Me
• Groups emerging & evolving
• Group formation & development
• The long tail of social computing, which I describe as everything *except* Wikipedia & Facebook
• Groups constructing knowledge, creating information and forming identity.
*Available, but not always easy to get in an analyzable form
Data
Points

Harvesting Small, Open Data [MyLyn]

Analyzing


Harvest
Analyze
Temporal Changes in the MyLyn Network

Work

Talk
Libraries Used & Source Code

StatNet

iGraph

TNET

R source code and data will be available for download at http://www.groupinformatics.org. If you use the data or scripts, please cite:

Goggins, S. P., Laffey, J., Amelung, C., and Gallagher, M. 2010. Social Intelligence In Completely Online Groups. IEEE International Conference on Social Computing. 500-507. DOI=10.1109/SocialCom.2010.79.

Blincoe, K., Valetto, G., and Goggins, S. 2011. Leveraging Task Contexts for Managing Developers’ Coordination. Under Review.
Data
Harvest
Analyze
Data for R
An Example From the MyLyn Project
More About MyLyn:
http://tasktop.com/blog/
http://www.eclipse.org/mylyn/
.zip file
MyLyn Context
Uploads
Work
Bug Database
Talk
MySQL Database
Talk
Talk
HTML Parser
Talk Cues
Work
Talk
Coordination Requirements &
Dependencies
MyLyn data has 2 advantages for analysis compared to source control systems analysis:
1. You see files *viewed* together
2. Discourse on a bug is directly connected to the files read and edited
   • Closer connection between analysis of work & talk.
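The first advantage (files viewed together) is what makes a "work" network possible. A minimal sketch in base R of how such a network could be derived from an access log, assuming a table with one row per developer and file; the column names and values here are hypothetical and do not reflect the actual MyLyn schema:

```r
# Hypothetical access log: one row per (developer, file) pair.
# Names and values are illustrative only, not MyLyn data.
access <- data.frame(
  developer = c("d1", "d1", "d2", "d2", "d3"),
  file      = c("A.java", "B.java", "A.java", "B.java", "B.java"),
  stringsAsFactors = FALSE
)

# Self-join on file: every pair of developers who touched the same file.
pairs <- merge(access, access, by = "file", suffixes = c("_from", "_to"))

# Keep each unordered pair once and drop self-pairs.
pairs <- pairs[pairs$developer_from < pairs$developer_to, ]

# Edge weight = number of files the two developers have in common.
work_edges <- aggregate(
  file ~ developer_from + developer_to,
  data = pairs,
  FUN  = length
)
names(work_edges) <- c("from", "to", "weight")
print(work_edges)
```

The resulting weighted edge list is the kind of input that network packages in R generally accept.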
Talk
Work
Harvesting Data for R
An Example From the MyLyn Project
MyLyn Interaction Datamart
Talk
Bug/Task
Context
• Files Accessed &
Edited
• Bugs/Tasks
Worked on
Developer
Context
• Initial Bug
Description ->
Task
• Discussion
Related to Bugs
• Work – Bug/Task
Action
• Talk – Bug/Task
Discussion
Integrated
Repository
Work
CANS
Interaction Warehouse
Work
Talk
ETC
MyLyn
Analyzing Open Data with R
An Example From the MyLyn Project
Analysis Tools
• Eight Mylyn Releases (Temporal Analysis)
• R Packages Used
  • TNET
  • iGraph
  • Statnet
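One way the temporal analysis could be set up is to build one graph per release and compare them. A minimal sketch with igraph, assuming an edge list tagged with a release label; the edges and weights below are invented for illustration (only the release labels 2.0 and 3.3 come from the slides):

```r
library(igraph)

# Hypothetical edge list tagged by release; edges and weights are made up.
edges <- data.frame(
  from    = c("304", "304", "373", "399", "118"),
  to      = c("373", "399", "399", "118", "159"),
  weight  = c(5, 3, 4, 2, 1),
  release = c("2.0", "2.0", "2.0", "3.3", "3.3"),
  stringsAsFactors = FALSE
)

# Split the edge list by release and build one undirected graph per release.
by_release <- split(edges[, c("from", "to", "weight")], edges$release)
graphs <- lapply(by_release, graph_from_data_frame, directed = FALSE)

# Size summary per release.
summary_tbl <- data.frame(
  release  = names(graphs),
  vertices = sapply(graphs, vcount),
  edges    = sapply(graphs, ecount)
)
print(summary_tbl)
```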
Weighted Network: TNET
The Dense Graph (Work)
• Developers create a dense graph. Not a complete graph, but dense.
Work
A Sparser Graph (Talk)
• Commenters create a sparse graph
Talk
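The dense versus sparse contrast can be quantified with graph density (edges present divided by edges possible). A minimal sketch with igraph on two toy graphs; the graphs are stand-ins, not the MyLyn networks:

```r
library(igraph)

# Toy "work" graph: complete on 5 vertices minus one edge (dense, not complete).
work <- delete_edges(make_full_graph(5), 1)

# Toy "talk" graph: a star on 5 vertices (sparse).
talk <- make_star(5, mode = "undirected")

# Density = number of edges / number of possible edges.
work_density <- edge_density(work)
talk_density <- edge_density(talk)
print(c(work = work_density, talk = talk_density))
```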
Release One (2.0) Analysis
Code
Work
Discussion
Talk
iGraph
STATNET for Discussion
• StatNet
Red = Bug Commenter
Blue = Bug Opener
Talk
StatNET
Release One
Work & Talk
Release 1 (2.0) iGraph & Statnet
Talk
Red = Bug Commenter
Blue = Bug Opener
Clusters
StatNET
iGraph
In Degree &
Out Degree
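In and out degree can be computed directly in igraph. A minimal sketch, assuming the talk network is stored as directed edges from a commenter to the opener of the bug being discussed; that direction, the vertex names, and the coloring rule are assumptions for illustration rather than the encoding used in the slides:

```r
library(igraph)

# Hypothetical talk edges: commenter -> bug opener (direction is an assumption).
talk_edges <- data.frame(
  commenter = c("c1", "c2", "c1", "c3"),
  opener    = c("o1", "o1", "o2", "c1"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(talk_edges, directed = TRUE)

# In degree: comments received on one's bugs. Out degree: comments made.
in_deg  <- degree(g, mode = "in")
out_deg <- degree(g, mode = "out")

# Illustrative coloring rule: red if the person commented at all, blue otherwise.
V(g)$color <- ifelse(out_deg > 0, "red", "blue")
print(data.frame(in_deg, out_deg, color = V(g)$color))
```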
Release One (2.0): Filtered
Code
Discussion
Talk
Work
Google
Summer
Coder
304, 373, 399 & 143 form the strongest connections in both networks
Red = Bug Commenter
Blue = Bug Opener
Release One (2.0): Filtered
Code
Work
Google
Summer
Coder
304, 373, 399 & 143 form the strongest connections in both networks
Discussion
Talk
457, 391 & 159 –
Comment & Open
Red = Bug Commenter
Blue = Bug Opener
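A filtered view like this can be produced by dropping weak ties and then removing vertices left isolated. A minimal sketch with igraph; the weights, the threshold, and the vertices "x1" and "x2" are hypothetical, and the slides do not state which filtering rule was used:

```r
library(igraph)

# Hypothetical weighted edge list; weights are invented for illustration.
edges <- data.frame(
  from   = c("304", "304", "373", "399", "143", "x1"),
  to     = c("373", "399", "399", "143", "x1",  "x2"),
  weight = c(5, 4, 6, 3, 1, 1),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Arbitrary cutoff: keep only ties with weight >= threshold.
threshold <- 3
strong <- delete_edges(g, which(E(g)$weight < threshold))

# Drop vertices that no longer have any ties.
strong <- delete_vertices(strong, which(degree(strong) == 0))
print(V(strong)$name)
```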
Compare Over Time
First & Last Release
Release 1 (2.0) Compared to Release 8 (3.3)
Talk
304, 399, 143, 159, 173, 373
StatNET & ordinary plotting
399, 118, 304, 159, 391, 416
Release 1 (2.0) Compared to Release 8 (3.3)
Work
143 & 304 disengaged or missing entirely
304, 373, 399 & 143
Two disconnected graphs in release 8
iGraph
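Whether a release's network has split into disconnected graphs can be checked with connected components. A minimal sketch with igraph on a toy graph; the vertex names are hypothetical:

```r
library(igraph)

# Toy graph with two separate groups: a-b-c and d-e.
g <- graph_from_data_frame(
  data.frame(from = c("a", "b", "d"), to = c("b", "c", "e")),
  directed = FALSE
)

# components() returns group membership, component sizes, and the component count.
comp <- components(g)
n_components <- comp$no

# List the members of each component.
members <- split(V(g)$name, comp$membership)
print(members)
```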
Release Eight
Work & Talk
Release 8 (3.3): Filtered
Discussion
Talk
Code
Nobody is
“Just Blue”
Work
Red = Bug Commenter
Blue = Bug Opener
Release 8 (3.3): Filtered
Discussion
Code
Talk
Work
Notice 416 in Talk & Second Coder Graph
Red = Bug Commenter
Blue = Bug Opener
Release 8 (3.3) iGraph & Statnet
399, 118 & 159 are significant, but play with different clusters of other people.
Talk
Red = Bug Commenter
Blue = Bug Opener
Clusters
iGraph
Blue
Cluster
StatNET
In Degree &
Out Degree
Releases One → Eight
High Level Views Over Time
Discussion, Releases 1 – 8
Where there is no color, there are multiple, incomplete graphs.
Code, Releases 1 – 8
One possible explanation: a few central people slowly but observably begin to engage other contributors in an open source software development project.
Structure evolves
Key Groups Evolve
iGraph
Next Step: The Story
But that’s the research part, not the cool “R Stuff” part
The People
373
304
399
159
143
Our next step is piecing together a narrative about the groups that emerged on this project and describing each of the individuals. This is all open data. When we finish this part, we will publish one or more papers. For now, let’s look at the cool “R Stuff”.
Interaction Traces from Small
Groups: The Case of MyLyn
Sean P. Goggins, PhD
Drexel University
Collaborators:
Peppo Valetto, PhD & Kelly Blincoe
Questions? In the after session.