Combining Bibliometric and Knowledge
Elicitation Techniques to Map a
Knowledge Domain
Katherine W. McCain*, June M. Verner, Gregory W. Hislop,
William Evanco, & Vera Cole.
College of Information Science & Technology
Drexel University
KATE'S PHILADELPHIA BRAND BIBLIOMETRICS
PHILADELPHIA brand Bibliometrics
Organizations
ISI: Gene Garfield, Henry Small
Drexel: Belver Griffith, Howard White, Chaomei Chen, Xia
Lin, Carl Drott, Jackie Mancall, and a host of grad students
Center for Research Planning: Dick Klavans, Len Simon
Major themes:
citation analysis/core literatures;
aging of scholarly literatures;
single period and longitudinal studies of scholarly literatures
and fields;
real-time, on-the-fly mapping of literatures, fields, paradigm
shifts, vocabulary structures, etc.;
bibliometric applications in collection management,
competitive intelligence, institutional evaluation, etc.
AGENDA
Introduction: Domain analysis & software engineering
Mapping methods:
Author Cocitation Analysis
Knowledge Elicitation – card sorting
Results
ACA clusters & map
PFNet author network
Card sorting clusters & map
Comparisons of ACA and KE results
Conclusions
DOMAIN ANALYSIS
SYSTEMS ANALYSIS: the task of identifying the operations and
objects needed to specify information processing in a particular
application domain
INFORMATION SCIENCE: the study of the field (knowledge
domain) as a thought or discourse community. It focuses on such
topics as knowledge organization, structure, cooperation patterns,
language and communication forms, information systems, and
relevance criteria as a way of understanding these communities
(Hjørland, B., & Albrechtsen, H., 1995)
An Aside On DISCOURSE COMMUNITY
A group (likely to be geographically dispersed) who share:
a common public goal or goals
a body of specialized knowledge
mechanisms of intercommunication and participation
a genre (e.g. scholarly journal)
a specialized vocabulary
Adapted from John Swales, Genre Analysis (1990 Cambridge)
SOFTWARE ENGINEERING
The establishment and use of sound engineering
principles in order to obtain economically software that
is reliable and works efficiently on real machines.
the technological and managerial discipline concerned
with systematic production and maintenance of software
products that are developed and modified on time and
within cost estimates
DOMAIN ANALYSIS OF SOFTWARE
ENGINEERING
a study of the journal literature of software engineering, based on
both author referencing patterns and index term assignments
a study of the factors that affect the “visibility” of software
engineering authors
an INSPEC-based co-descriptor mapping of software
engineering
a conjoint study of the intellectual and cognitive structure of
software engineering
Citation content analysis of Brooks’ Mythical Man-Month
TWO APPROACHES TO MAPPING SE
BIBLIOMETRICS: Cocited author mapping uses the patterns of
co-occurrence of authors’ names in reference lists to examine
the intellectual structure of scholarly literatures and, by
extension, the fields that produce those literatures
KNOWLEDGE ELICITATION: the process of collecting from a
human source of knowledge, information that is thought to be
relevant to that knowledge. [Cooke]
Card sorting: structural analysis of mental models elicited via sorting
named cards into piles
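The co-occurrence idea behind cocited author mapping can be sketched in a few lines of Python. This is a minimal illustration, not the retrieval procedure used in the study: the function name `cocitation_counts` and the three reference lists are hypothetical, and each author is assumed to count once per citing paper.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each pair of authors, how many citing papers cite both."""
    counts = Counter()
    for cited_authors in reference_lists:
        # An author counts once per citing paper, however often that
        # paper cites them (an assumption of this sketch).
        for a, b in combinations(sorted(set(cited_authors)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical reference lists of three citing papers (illustrative only).
papers = [
    ["BROOKS FP", "BOEHM BW", "BASILI VR"],
    ["BROOKS FP", "BOEHM BW"],
    ["BASILI VR", "BOEHM BW", "BOEHM BW"],
]
counts = cocitation_counts(papers)
print(counts[("BOEHM BW", "BROOKS FP")])  # prints 2
```

Pairs are stored with the two names in alphabetical order, so each pair has a single key regardless of the order in which the names appear in a reference list.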
AUTHOR COCITATION ANALYSIS
AUTHOR SELECTION: authors highly cited in texts and in the
core SE literature = 60 authors selected for study
COCITATION DATA GATHERED: cocitation counts retrieved
from SCISEARCH, 1990 – 1997
ANALYSIS:
Raw cocitation counts -- PFNets
Correlation matrix – cluster analysis & multidimensional scaling
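The step from raw cocitation counts to a correlation matrix might look like the sketch below. The slides do not say how the diagonal cells were handled; this version leaves out the two cells involving the pair itself, which is one common convention and an assumption here. The 5 x 5 matrix is made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_matrix(raw):
    """Correlate the cocitation profiles of every pair of authors."""
    n = len(raw)
    corr = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: skip the cells involving authors i and j themselves.
            keep = [k for k in range(n) if k not in (i, j)]
            r = pearson([raw[i][k] for k in keep], [raw[j][k] for k in keep])
            corr[i][j] = corr[j][i] = r
    return corr


# Hypothetical raw cocitation counts for five authors (illustrative only).
raw = [
    [0, 30, 28, 2, 3],
    [30, 0, 25, 4, 1],
    [28, 25, 0, 3, 2],
    [2, 4, 3, 0, 20],
    [3, 1, 2, 20, 0],
]
corr = correlation_matrix(raw)
print(round(corr[0][1], 2), round(corr[0][3], 2))
```

Authors 0 and 1 are cocited with the same third parties and correlate strongly; authors 0 and 3 have opposite profiles and correlate negatively, even though their direct cocitation count is not zero.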
60 AUTHORS
Abdel-Hamid, Tarek K.
Fagan, M. E.
Kaiser, G. E.
Rombach, H. D.
Albrecht, Allan J.
Fenton, Norman E.
Kemerer, C. F.
Rumbaugh, James
Basili, Victor R.
Garlan, David
Kernighan, Brian W.
Selby, R. W.
Beizer, Boris
Ghezzi, Carlo
Kitchenham, Barbara A.
Shaw, Mary
Biggerstaff, Ted J.
Gilb, Tom
Lehman, M. M.
Shepperd, M.
Boehm, Barry W.
Glass, Robert L.
McCabe, Thomas J.
Shneiderman, Ben
Booch, Grady
Goldberg, Adele
Meyer, Bertrand
Sommerville, Ian
Brooks, Frederick P., Jr.
Gomaa, Hassan
Mills, Harlan D.
Tichy, W. F.
Card, David N.
Grady, Robert B.
Musa, John D.
Tracz, Will
Clarke, Lori A.
Harrison, W.
Myers, Glenford J.
Wasserman, A. I.
Coad, Peter
Hoare, C. A. R.
Parnas, David L.
Weiser, M.
Curtis, Bill
Humphrey, Watts S.
Pfleeger, Shari L.
Weyuker, Elaine J.
Davis, Alan M.
Jackson, Michael A.
Pressman, Roger S.
Wing, Jeannette M.
DeMarco, Tom
Jacobson, Ivar
Prieto-Diaz, R.
Yourdon, Edward
Dijkstra, Edsger W.
Jones, T. Capers
Ramamoorthy, C. V.
Zave, Pamela
Data Gathering for ACA
[Figure: the citations in source papers are searched with a retrieval strategy for each author pair (e.g., CA = BROOKS FP AND CA = PFLEEGER S), and the resulting counts fill the raw cocitation matrix.]
* Multiple forms of authors' names were used in the search strategies
Analytical Tools for Raw Cocitation counts
Analytical Tools for Proximity Matrix
ACA ANALYSES
Raw Cocitation Matrix
PFNet: links nodes (authors) based on their single highest
co-occurrence counts. The result is generally a network
structure with some authors appearing as major foci (many
links to others) representing specialties
Correlation Matrix
Hierarchical cluster analysis: 8 cluster solution identifies
major subject clusters
Multidimensional scaling: 2 dimensional map shows overall
structure and major themes
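A Pathfinder network can be sketched as follows. The slides do not give the PFNet parameters, so this sketch assumes the commonly used PFNET(r = infinity, q = n - 1) variant, in which a direct link survives only if no indirect path has a smaller largest step. Turning counts into distances as `top + 1 - count` is also an assumption, as is the made-up matrix.

```python
from math import inf


def pfnet(sim):
    """Return the links (i, j), i < j, kept by PFNET(r=inf, q=n-1)."""
    n = len(sim)
    top = max(max(row) for row in sim)
    # Assumed transform: a higher count gives a shorter link; zero gives none.
    dist = [[0 if i == j else (top + 1 - sim[i][j] if sim[i][j] > 0 else inf)
             for j in range(n)] for i in range(n)]
    # With r = infinity the length of a path is its largest single link.
    # Find the smallest such length between every pair of nodes
    # (Floyd-Warshall with max in place of addition).
    best = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = max(best[i][k], best[k][j])
                if via < best[i][j]:
                    best[i][j] = via
    # Keep a direct link only if no indirect path beats it.
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] < inf and dist[i][j] <= best[i][j]}


# Hypothetical raw cocitation counts for five authors (illustrative only).
raw = [
    [0, 30, 28, 2, 3],
    [30, 0, 25, 4, 1],
    [28, 25, 0, 3, 2],
    [2, 4, 3, 0, 20],
    [3, 1, 2, 20, 0],
]
links = pfnet(raw)
print(sorted(links))  # prints [(0, 1), (0, 2), (1, 3), (3, 4)]
```

In this toy network author 0 is a focus for authors 1 and 2, while the single link between authors 1 and 3 bridges the two groups.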
Knowledge Elicitation Methods
Interviews and observation
Process tracing (e.g. protocol analysis)
Conceptual techniques
Card sorting is a conceptual technique that can be done alone or
combined with semi-structured interviews.
Card Sorting
Software engineers contacted via e-mail, invited to
participate in study
Task: sort cards bearing authors’ names into piles,
label piles, complete short questionnaire
As many piles as desired
Piles with single authors
Pile of “don’t know” or “aren’t software engineers”
46 respondents participated in postal mail study (a few
interviews)
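Tallying the sorted piles into pairwise counts can be sketched as below. The respondents, pile labels, and pile contents are hypothetical, and excluding the "don't know" pile from the tally is an assumption of this sketch rather than a documented step of the study.

```python
from collections import Counter
from itertools import combinations

# Assumption: authors a respondent does not know are not treated as a group.
SKIP = {"don't know"}


def co_pile_counts(sorts):
    """Count how many respondents placed each pair of authors in one pile."""
    counts = Counter()
    for piles in sorts:
        for label, authors in piles.items():
            if label.lower() in SKIP:
                continue
            # A pile holding a single author contributes no pairs.
            for a, b in combinations(sorted(set(authors)), 2):
                counts[(a, b)] += 1
    return counts


# Two hypothetical respondents (illustrative only).
sorts = [
    {"Metrics": ["BASILI", "FENTON", "PFLEEGER"],
     "Formal Methods": ["DIJKSTRA", "HOARE"],
     "Don't know": ["KAISER", "TICHY"]},
    {"Measurement": ["BASILI", "FENTON"],
     "Theory": ["DIJKSTRA", "HOARE"],
     "Texts": ["PFLEEGER"]},
]
co_pile = co_pile_counts(sorts)
print(co_pile[("BASILI", "FENTON")])  # prints 2
```

The pile labels are free text and differ between respondents, so only shared pile membership is counted.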
Card Sorting Procedure
[Figure: a stack of cards with authors' names was sent to respondents with instructions; cards were sorted into piles and labeled (e.g., "Metrics", "Formal Methods", "Don't Know"), based on respondents' perceptions; the sorts were tallied into a matrix of raw "co-pile" counts.]
CARD SORTING ANALYSES
(correlation matrix)
Hierarchical cluster analysis—8 cluster level
Multidimensional scaling – 2 dimensional map
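Cutting a hierarchical clustering at a chosen number of clusters can be sketched as follows. The slides do not name the linkage method or the distance measure; average linkage and a distance of 1 minus the correlation are assumptions here, and the correlation matrix is invented for illustration.

```python
def average_linkage(corr, k):
    """Merge the two closest clusters until k remain; return index lists."""
    clusters = [[i] for i in range(len(corr))]

    def distance(a, b):
        # Assumed distance: 1 - correlation, averaged over all cross pairs.
        return sum(1.0 - corr[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(distance(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        _, p, q = min(pairs)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return sorted(sorted(c) for c in clusters)


# Hypothetical correlation matrix for five authors (illustrative only).
corr = [
    [1.0, 0.9, 0.8, -0.7, -0.6],
    [0.9, 1.0, 0.85, -0.5, -0.65],
    [0.8, 0.85, 1.0, -0.6, -0.7],
    [-0.7, -0.5, -0.6, 1.0, 0.95],
    [-0.6, -0.65, -0.7, 0.95, 1.0],
]
print(average_linkage(corr, 2))  # prints [[0, 1, 2], [3, 4]]
```

With 60 authors the same call with `k=8` would give a partition at the 8 cluster level, although the membership would depend on the linkage choice.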
Cocitation Map of 60 Highly Cited Authors in Software Engineering, 1990 - 1997
[Figure: two-dimensional map of the 60 authors. Axis labels: MICRO LEVEL, MACRO LEVEL, LOW FORMAL. Cluster labels: SW ARCHITECTURE/SW REUSE; SW PROJECT MGT; OBJECT-ORIENTED ANALYSIS & DESIGN/PROGRAMMING; SYSTEMS ANALYSIS & DESIGN; SW PERFORMANCE; SW METRICS; FORMAL APPROACHES TO DEVELOPMENT/FORMAL METHODS; SW TESTING/RELIABILITY.]
PFNet of Raw Cocitation Counts for 60 Software Engineering Authors, 1992 - 1997
[Figure: network diagram linking the 60 authors.]
Comparisons: ACA and KE
Cluster similarity – most authors in similar
clusters in terms of membership. Some
differences in labeling
There are differences between the way authors’
works are cited and the way the authors are
perceived in terms of labels (known for textbook
writing, cited for specific textbook content)
CARD SORTING CLUSTERS
JONES
BASILI
PFLEEGER
ROMBACH
SW METRICS
CARD
MCCABE
GRADY
FENTON
KITCHENHAM
HARRISON
SELBY
SHEPPERD
KEMERER
ALBRECHT
COCITATION CLUSTERS
BASILI
PFLEEGER
ROMBACH
CARD
SW METRICS
MCCABE
GRADY
FENTON
KITCHENHAM
HARRISON
SELBY
SHEPPERD
WEYUKER
KEMERER
ALBRECHT
SE MANAGEMENT
PROCESS MODELING
BOEHM
GILB
CURTIS
HUMPHREY
ABDEL-HAMID
LEHMAN
BOEHM
GILB
SE PROJECT
CURTIS
MANAGEMENT
HUMPHREY
ABDEL-HAMID
LEHMAN
BROOKS
CARD SORTING CLUSTERS
FORMAL
METHODS/
SW ARCHITECTURE
OBJECT ORIENTED
PROGRAMMING &
DESIGN
SE METHODOLOGIES/
SE TEXTS
GARLAN
RAMAMOORTHY
DIJKSTRA
HOARE
PARNAS
SHAW
WING
ZAVE
GHEZZI
KERNIGHAN
BOOCH
RUMBAUGH
JACOBSON
MEYER
COAD
GOLDBERG
PRESSMAN
SOMMERVILLE
DEMARCO
YOURDON
WASSERMAN
GOMAA
JACKSON
BROOKS
GLASS
MILLS
MYERS
DAVIS
COCITATION CLUSTERS
JONES
DAVIS
DIJKSTRA
HOARE
PARNAS
SHAW
WING
ZAVE
GHEZZI
KERNIGHAN
BOOCH
RUMBAUGH
JACOBSON
MEYER
COAD
GOLDBERG
FORMAL METHODS/
FORMAL APPROACHES
OO ANALYSIS
& DESIGN
PROGRAMMING
SHNEIDERMAN
PRESSMAN
SYSTEMS ANALYSIS & DESIGN
SOMMERVILLE
DEMARCO
YOURDON
WASSERMAN
GOMAA
JACKSON
CARD SORTING CLUSTERS
BIGGERSTAFF
SW REUSE
TRACZ
PRIETO-DIAZ
SW TOOLS &
ENVIRONMENTS
KAISER
TICHY
COCITATION CLUSTERS
BIGGERSTAFF
TRACZ
SW ARCHITECTURE/SW REUSE
PRIETO-DIAZ
KAISER
TICHY
GARLAN
Comparisons: ACA and KE
Map similarity – similar distribution of authors and
clusters along X-axis (r=0.73) but not along Y-axis
(r=-0.08)
The most important structural theme in Software
Engineering, the “micro-macro” dimension, exists
in both citation patterns and in perceptions of the field
by citing authors. Along the Y-axis, citing patterns focus
on the content of authors’ work while general
perceptions include more aspects of the authors’
personae.
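An axis-by-axis comparison of two maps can be sketched as below. The slides report the r values but not the procedure, so computing a Pearson correlation over the coordinates of the same authors on each axis is an assumption. The author labels and coordinates are invented and do not reproduce the study's maps.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def axis_agreement(map_a, map_b):
    """Correlate the X and the Y coordinates of authors present in both maps."""
    authors = sorted(set(map_a) & set(map_b))
    return tuple(
        pearson([map_a[a][axis] for a in authors],
                [map_b[a][axis] for a in authors])
        for axis in (0, 1)
    )


# Hypothetical (x, y) map coordinates for four authors (illustrative only).
aca_map = {"author_1": (-1.0, 0.5), "author_2": (-0.5, -0.4),
           "author_3": (0.6, 0.8), "author_4": (1.2, -0.6)}
ke_map = {"author_1": (-0.9, -0.2), "author_2": (-0.6, 0.6),
          "author_3": (0.4, 0.3), "author_4": (1.1, 0.1)}
r_x, r_y = axis_agreement(aca_map, ke_map)
print(round(r_x, 2), round(r_y, 2))
```

A high value on one axis and a value near zero on the other, as in this toy data, is the pattern the slide describes: the maps agree on one dimension and not on the other.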
Conclusions
Boehm, Basili, Booch, and Hoare are central figures in the
Software Engineering R&D literature; we can identify other
authors as probable linkers between research specialties.
The main organizing principle in SE is a continuum of activities
related to the process of software design, development, and
evaluation.
Key specialties in Software Engineering (in the decade of the
1990s) included Object-Oriented Programming, Analysis &
Design, Formal Methods, Software Reuse, Software Testing &
Reliability, Software Process Management, and Software
Metrics.
Conclusions
ACA (mapping, PFNets) and KE (card sorting) provide
complementary views of software engineering. KE methods
increase our understanding of the domain by capturing subjects’
mental models of the domain and providing additional
information about mapped entities.
ACA and KE provide useful cross-validation. The structure of the
literature as seen through networks of author indebtedness
(citation of previous work) is a good reflection of citing authors’
mental models of the field, the place of the (cited) authors, and the
relationships among their contributions.