Introduction: Ice Breaker


Field Profiling
 Productivity
 Top Journals
 Top Researchers
Measuring Scholarly Impact in the field of Semantic Web
Data: 44,157 papers with 651,673 citations from Scopus (1975-2009), and 22,951 papers with 571,911 citations from WOS (1960-2009)
Impact through citation
 Impact
 Top Journals
 Top Researchers
Rising Stars


In WOS, M. A. Harris (Gene Ontology-related research), T. Harris (design and implementation of programming languages), and L. Ding (Swoogle, a Semantic Web search engine) are ranked as the top three authors with the highest increase in citations.
In Scopus, D. Roman (Semantic Web Services), J. De Bruijn (logic programming), and L. Ding (Swoogle) are ranked as the top three for the most significant increase in the number of citations.
Ding, Y. (2010). Semantic Web: Who is Who in the field. Journal of Information Science, 36(3), 335-356.
Section 1
DATA COLLECTION
Steps

Step 1: Data collection
 Using journals
 Using keywords
Example

Keywords: INFORMATION RETRIEVAL, INFORMATION STORAGE and RETRIEVAL, QUERY PROCESSING, DOCUMENT RETRIEVAL, DATA RETRIEVAL, IMAGE RETRIEVAL, TEXT RETRIEVAL, CONTENT BASED RETRIEVAL, CONTENT-BASED RETRIEVAL, DATABASE QUERY, DATABASE QUERIES, QUERY LANGUAGE, QUERY LANGUAGES, and RELEVANCE FEEDBACK.
Web of Science

Go to the IU Web of Science portal: http://libraries.iub.edu/resources/wos
 For example:
  Select Core Collection
  Search "Information Retrieval" as a topic, for all years
Web of Science
Output
Python







 Download Python: https://www.python.org/downloads/
 To run Python smoothly, you may have to change certain environment settings in Windows. In short, the path is:
  My Computer ‣ Properties ‣ Advanced ‣ Environment Variables
 In this dialog, you can add or modify User and System variables. Changing System variables requires non-restricted access to your machine (i.e., Administrator rights).
 User variable: C:\Program Files (x86)\Python27\Lib;
 Alternatively, inspect the variables from the command line with "set" and "echo %path%".
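As a quick, hedged check that the PATH points at the interpreter you expect (not part of the original course materials), the following two-line probe works in any Python:

import sys

print(sys.version)     # the conversion script below assumes Python 2.7
print(sys.executable)  # full path of the interpreter that the PATH resolved to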
Python Script for conversion
#!/usr/bin/env python
# encoding: utf-8
"""
conwos.py
Convert WOS export files into tab-separated paper and reference tables.
"""
import os
import re

# Output files and the default input folder.
paper = 'paper.tsv'
reference = 'reference.tsv'
defsource = 'source'

def main():
    global defsource
    source = raw_input('What is the name of the source folder?\n')
    if len(source) < 1:
        source = defsource
    files = os.listdir(source)
    fpaper = open(paper, 'w')
    fref = open(reference, 'w')
    uid = 0  # running paper id, shared by paper.tsv and reference.tsv
    for name in files:
        if name[-3:] != "txt":
            continue  # only process the .txt exports
        fil = open(os.path.join(source, name))
        print '%s is processing...' % name
        first = True  # used to skip each file's header line
Conwos1.py
Python Script for conversion
        for line in fil:
            line = line[:-1]  # strip the trailing newline
            if first == True:
                first = False  # skip the header row of each file
            else:
                uid += 1
                record = str(uid) + "\t"
                refs = ""
                elements = line.split('\t')
                for i in range(len(elements)):
                    element = elements[i]
                    if i == 1:
                        # Column 1 holds the author list; keep at most five.
                        authors = element.split('; ')
                        for j in range(5):
                            if j < len(authors):
                                record += authors[j] + "\t"
                            else:
                                record += "\t"
                    elif i == 29:
                        # Column 29 holds the cited references.
                        refs = element
                        refz = getRefs(refs)
                        for ref in refz:
                            fref.write(str(uid) + "\t" + ref + "\n")
                        continue  # references go to reference.tsv, not paper.tsv
                    record += element + "\t"
                fpaper.write(record[:-1] + "\n")
        fil.close()
    fpaper.close()
    fref.close()
Python Script for conversion
def getRefs(refs):
    """Parse a semicolon-separated reference string into
    author \t year \t journal \t volume \t page records."""
    refz = []
    reflist = refs.split('; ')
    for ref in reflist:
        record = ""
        segs = ref.split(", ")
        author = ""
        ind = -1
        if len(segs) == 0:
            continue
        # Everything before the four-digit year is the author name.
        for seg in segs:
            ind += 1
            if isYear(seg):
                record += author[:-2] + "\t" + seg + "\t"
                break
            else:
                author += seg + ", "
        ind += 1
        # The next segment, if it is not a volume or a page, is the journal.
        if ind < len(segs):
            if not isVol(segs[ind]) and not isPage(segs[ind]):
                record += segs[ind] + "\t"
                ind += 1
            else:
                record += "\t"
        else:
            record += "\t"
Python Script for conversion
        # Volume segment (e.g. V12): strip the leading "V".
        if ind < len(segs):
            if isVol(segs[ind]):
                record += segs[ind][1:] + "\t"
                ind += 1
            else:
                record += "\t"
        else:
            record += "\t"
        # Page segment (e.g. P345): strip the leading "P".
        if ind < len(segs):
            if isPage(segs[ind]):
                record += segs[ind][1:] + "\t"
                ind += 1
            else:
                record += "\t"
        else:
            record += "\t"
        # Keep the record only if an author/year was actually parsed.
        if record[0] != "\t":
            refz.append(record[:-1])
    return refz
Python Script for conversion
def isYear(episode):
    # A four-digit year, e.g. "1998".
    return re.search(r'^\d{4}$', episode) is not None

def isVol(episode):
    # A volume segment, e.g. "V12".
    return re.search(r'^V\d+$', episode) is not None

def isPage(episode):
    # A page segment, e.g. "P345".
    return re.search(r'^P\d+$', episode) is not None

if __name__ == '__main__':
    main()
Convert output to database
 Using the Python script conwos1.py
 Output: paper.tsv, reference.tsv (a quick sanity check follows below)

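Before importing into Access, a minimal sanity check of the converted output can be run in Python; this is a sketch assuming paper.tsv sits in the working directory and follows the column layout written by conwos1.py:

import csv

# Peek at the first five converted records: uid plus up to five author columns.
with open('paper.tsv') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in list(reader)[:5]:
        print('\t'.join(row[:6]))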
Convert output to database

paper.tsv

Convert output to database

reference.tsv
Load them into Access

Import the files via External Data in Access
Access Tables

Paper table

Access Tables

Citation table
Section 2
PRODUCTIVITY & IMPACT
Productivity

 Top Authors
 Find duplicate records (Query template)

Productivity

 Top Journals
 Find duplicate records (Query template)

Productivity

 Top Organizations
 Find duplicate records (Query template)

Impact

 Highly cited authors
 Find duplicate records (Query template)

Impact

 Highly cited journals
 Find duplicate records (Query template)

Impact

 Highly cited articles
 Find duplicate records (Query template)
Other indicators

What are other indicators to measure productivity and impact?
 Time
 Journal impact factor
 Journal category
 Keyword

… think about this in depth: what are your new indicators?
Section 3
AUTHOR CO-CITATION NETWORK
Top 100 highly cited authors


 First select the set of authors on which you want to build the matrix
 Select the top 100 highly cited authors (a co-citation counting sketch follows below)
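To make the co-citation count concrete, here is a hedged Python sketch over the reference.tsv file produced earlier; the names in top_authors are placeholders for illustration, and in practice they would come from the highly-cited-authors query:

import csv
from collections import defaultdict
from itertools import combinations

top_authors = set(['SALTON G', 'ROBERTSON SE'])  # placeholder names

# Collect, per citing paper (uid), which of the selected authors it cites.
cited_by_paper = defaultdict(set)
with open('reference.tsv') as f:
    for row in csv.reader(f, delimiter='\t'):
        uid, author = row[0], row[1]
        if author in top_authors:
            cited_by_paper[uid].add(author)

# Two authors are co-cited whenever the same paper cites both of them.
cocite = defaultdict(int)
for authors in cited_by_paper.values():
    for a, b in combinations(sorted(authors), 2):
        cocite[(a, b)] += 1

print(dict(cocite))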
Author Co-citation Network

Load the network into SPSS
Section 4
CLUSTERING
Clustering Analysis
 Aim: create clusters of items that are similar to other items in the same cluster and different from those outside the cluster. In other words, maximize similarity within clusters and differences between clusters.
 Items are called cases in SPSS.
 There are no dependent variables in cluster analysis.
Clustering Analysis
 The degree of similarity or dissimilarity is measured by the distance between cases.
 Euclidean distance measures the length of a straight line between two cases.
 The numeric values of distance should be on the same measurement scale. If they are based on different measurement scales:
  Transform them to the same scale, or
  Create a distance matrix first.
Clustering
 Hierarchical clustering does not need the number of clusters decided first; it is good for a small set of cases.
 K-means does need the number of clusters first; it is good for a large set of cases.
Hierarchical Clustering
Hierarchical Clustering: Data

Data
 The variables can be quantitative, binary, or count data.
 Scaling of variables is an important issue: differences in scaling may affect your cluster solution(s).
 If your variables have large differences in scaling (for example, one variable is measured in dollars and the other is measured in years), you should consider standardizing them (this can be done automatically by the Hierarchical Cluster Analysis procedure).
Hierarchical Clustering: Data

Case Order
 The cluster solution may depend on the order of cases in the file.
 You may want to obtain several different solutions with cases sorted in different random orders to verify the stability of a given solution.
Hierarchical Clustering: Data

Assumptions
 The distance or similarity measures used should be appropriate for the data analyzed.
 Also, you should include all relevant variables in your analysis; omission of influential variables can result in a misleading solution.
 Because hierarchical cluster analysis is an exploratory method, results should be treated as tentative until they are confirmed with an independent sample.
Hierarchical Clustering: Method

 Nearest neighbor or single linkage
  The dissimilarity between clusters A and B is represented by the minimum of all possible distances between cases in A and B.
 Furthest neighbor or complete linkage
  The dissimilarity between clusters A and B is represented by the maximum of all possible distances between cases in A and B.
 Between-groups linkage or average linkage
  The dissimilarity between clusters A and B is represented by the average of all possible distances between cases in A and B.
 Within-groups linkage
  The dissimilarity between clusters A and B is represented by the average of all possible distances between the cases within the single new cluster formed by combining clusters A and B.
Hierarchical Clustering: Method

 Centroid clustering
  The dissimilarity between clusters A and B is represented by the distance between the centroid of the cases in cluster A and the centroid of the cases in cluster B.
 Ward's method
  The dissimilarity between clusters A and B is represented by the "loss of information" from joining the two clusters, with this loss of information measured by the increase in the error sum of squares.
 Median clustering
  The dissimilarity between clusters A and B is represented by the distance between the SPSS-determined median for the cases in cluster A and the median for the cases in cluster B.
 All three methods should use squared Euclidean distance rather than Euclidean distance (see the sketch below).
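Outside SPSS, the same agglomeration can be sketched with SciPy on made-up data; this is only an illustration, with method='average' corresponding to SPSS's between-groups (average) linkage:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

X = np.random.rand(10, 4)                 # demo data: 10 cases, 4 variables
d = pdist(X, metric='sqeuclidean')        # squared Euclidean distances between cases
Z = linkage(d, method='average')          # between-groups (average) linkage
labels = fcluster(Z, t=3, criterion='maxclust')  # cut the tree into 3 clusters
print(labels)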
Measure for Interval








 Euclidean distance. The square root of the sum of the squared differences between values for the items. This is the default for interval data.
 Squared Euclidean distance. The sum of the squared differences between the values for the items.
 Pearson correlation. The product-moment correlation between two vectors of values.
 Cosine. The cosine of the angle between two vectors of values.
 Chebychev. The maximum absolute difference between the values for the items.
 Block. The sum of the absolute differences between the values for the items. Also known as Manhattan distance.
 Minkowski. The pth root of the sum of the absolute differences to the pth power between the values for the items.
 Customized. The rth root of the sum of the absolute differences to the pth power between the values for the items.
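Most of these measures can be computed directly with SciPy; a small hedged example on two made-up cases x and y (note that SciPy's cosine function returns 1 minus the cosine of the angle, i.e. a dissimilarity):

import numpy as np
from scipy.spatial import distance

x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])

print(distance.euclidean(x, y))       # Euclidean distance
print(distance.sqeuclidean(x, y))     # squared Euclidean distance
print(distance.cosine(x, y))          # 1 - cosine of the angle
print(distance.chebyshev(x, y))       # Chebychev distance
print(distance.cityblock(x, y))       # block / Manhattan distance
print(distance.minkowski(x, y, p=3))  # Minkowski distance with p = 3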
Transform values






 Z scores. Values are standardized to z scores, with a mean of 0 and a standard deviation of 1.
 Range -1 to 1. Each value for the item being standardized is divided by the range of the values.
 Range 0 to 1. The procedure subtracts the minimum value from each item being standardized and then divides by the range.
 Maximum magnitude of 1. The procedure divides each value for the item being standardized by the maximum of the values.
 Mean of 1. The procedure divides each value for the item being standardized by the mean of the values.
 Standard deviation of 1. The procedure divides each value for the variable or case being standardized by the standard deviation of the values.
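Written out with NumPy as a hedged sketch, with v standing in for one variable's values (SPSS standardizes with the sample standard deviation, which is ddof=1 in NumPy):

import numpy as np

v = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical values

z = (v - v.mean()) / v.std(ddof=1)           # z scores: mean 0, sd 1
r11 = v / (v.max() - v.min())                # range -1 to 1
r01 = (v - v.min()) / (v.max() - v.min())    # range 0 to 1
maxmag = v / v.max()                         # maximum magnitude of 1
mean1 = v / v.mean()                         # mean of 1
sd1 = v / v.std(ddof=1)                      # standard deviation of 1
print(z)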
Hierarchical Clustering: Method
Hierarchical Clustering
 Identify relatively homogeneous groups of cases (or variables) based on selected characteristics, using an algorithm that starts with each case (or variable) in a separate cluster and combines clusters until only one is left.
 Distance or similarity measures are generated by the Proximities procedure.
Hierarchical Clustering: Statistics

 Agglomeration schedule
  Displays the cases or clusters combined at each stage, the distances between the cases or clusters being combined, and the last cluster level at which a case (or variable) joined the cluster.
 Proximity matrix
  Gives the distances or similarities between items.
 Cluster Membership
  Displays the cluster to which each case is assigned at one or more stages in the combination of clusters. Available options are single solution and range of solutions.
Hierarchical Clustering: Plot

 Dendrograms
  Can be used to assess the cohesiveness of the clusters formed and can provide information about the appropriate number of clusters to keep.
 Icicle plots
  Display information about how cases are combined into clusters at each iteration of the analysis. (Users can specify a range of clusters to be displayed.)
 Orientation: a vertical or horizontal plot.
Hierarchical Clustering: Plot
Hierarchical Clustering: Result

Dendrogram using average linkage (between groups)
Dendrogram using Ward linkage
K-Means Clustering
K-means can handle a large number of cases.
 But it requires users to specify the number of clusters first (see the scikit-learn sketch below).

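The same procedure can be sketched outside SPSS with scikit-learn; a hedged illustration on made-up data, where max_iter=10 mirrors the SPSS default of 10 iterations:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(500, 6)   # demo data: many cases, 6 variables
km = KMeans(n_clusters=4, max_iter=10, n_init=10, random_state=0)
labels = km.fit_predict(X)   # cluster membership for each case
print(km.cluster_centers_)   # final cluster centers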
K-Means Clustering: Method

 Iterate and classify
  Number of iterations
  Convergence criteria
 Classify only
  No iteration
K-Means Clustering: Method

 Cluster Centers
  Initial cluster centers and, if required, the file that contains the final cluster centers.
  Read initial from: specify the file that contains the initial cluster centers.
  Write final as: specify the file that will receive the final cluster centers.
K-Means Clustering: Method

 Iterate
  By default, 10 iterations and a convergence criterion of 0 are used.
  Use running means
   Yes: cluster centers are updated after the addition of each object.
   No: cluster centers are calculated after all objects have been allocated to a given cluster.
  Maximum iterations (no more than 999)
  Convergence criterion
K-Means Clustering: Method
K-Means Clustering: Statistics

The output will show the following information:
 Initial cluster centers
 ANOVA table
 Each case's distance from its cluster center
K-Means Clustering: Method
K-Means
K-Means: Result

Initial Cluster Centers
 Vectors whose values are based on the number of clustering variables.
K-Means: Result
Week 12
Courtesy: Angelina Anastasova and Natalia Jaworska, University of Ottawa
MULTIDIMENSIONAL SCALING
Multidimensional Scaling (MDS): What Is It?

 Generally regarded as exploratory data analysis.
 Reduces large amounts of data into easy-to-visualize structures.
 Attempts to find structure (a visual representation) in a set of distance measures, e.g. dis/similarities between objects/cases.
 Shows how variables/objects are related perceptually.
 How? By assigning cases to specific locations in space.
 Distances between points in space match dis/similarities as closely as possible:
  Similar objects: close points
  Dissimilar objects: far-apart points
MDS Example: City Distances
[Figure: symmetric distance matrix between cities and the resulting spatial map, with clusters; dimension 1: North/South, dimension 2: East/West]
The Process of MDS: The Data


 The data for MDS: similarities, dissimilarities, distances, or proximities reflecting the amount of dis/similarity or distance between pairs of objects.
 The distinction between similarity and dissimilarity data depends on the type of scale used:
  Dissimilarity scale: low = high similarity, high = high dissimilarity.
  Similarity scale: the opposite of the dissimilarity scale.
 E.g. "On a scale of 1-9 (1 being the same and 9 completely different), how similar are chocolate bars A and B?" This is a dissimilarity scale.
 SPSS requires dissimilarity scales.
Data Collection for MDS (1)




 Direct/raw data: proximity values directly obtained from empirical, subjective scaling.
  E.g. rating or ranking dis/similarities (Likert scales).
 Indirect/derived data: computed from other measurements, such as correlations or confusion data (based on mistakes) (Davidson, 1983).
 Data collection: pairwise comparison, grouping/sorting tasks, direct ranking, objective methods (e.g. city distances).
 Pairwise comparisons: all object pairs are randomly presented:
  # of pairs = n(n-1)/2, where n = # of objects/cases (see the example below).
  This can be a tedious and inefficient process.
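A quick illustration of how fast the pair count grows, using Python's itertools on ten hypothetical objects:

from itertools import combinations

objects = list('ABCDEFGHIJ')            # n = 10 hypothetical objects
pairs = list(combinations(objects, 2))  # every unordered pair
print(len(pairs))                       # 45 = 10 * 9 / 2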
Type of MDS Models (1)
MDS models are classified according to:
1) Type of proximities:
  Metric/quantitative: quantitative information/interval data about objects' proximities, e.g. city distances.
  Non-metric/qualitative: qualitative information/nominal data about proximities, e.g. rank order.
2) Number of proximity matrices (distance or dis/similarity matrices).
 The proximity matrix is the input for MDS.
 The above criteria yield:
  1) Classical MDS: one proximity matrix (metric or non-metric).
  2) Replicated MDS: several matrices.
  3) Weighted MDS/Individual Difference Scaling: aggregates proximities and individual differences in a common MDS space.
Types of MDS (2)



More typical in the social sciences is the classification of MDS based on the nature of responses:
 1) Decompositional MDS: subjects rate objects on an overall basis, an "impression," without reference to objective attributes.
  Produces a spatial configuration for each individual and a composite map for the group.
 2) Compositional MDS: subjects rate objects on a variety of specific, pre-specified attributes (e.g. size).
  No maps for individuals, only composite maps.
The MDS Model

 Classical MDS uses Euclidean principles to model data proximities in geometrical space, where the distance (dij) between points i and j is defined as

  dij = √( Σa (xia - xja)² )

 where xia and xja specify the coordinates of points i and j on dimension a, respectively.
 The modeled Euclidean distances are related to the observed proximities, δij, by some transformation/function f.
 Most MDS models assume that the data have the form: δij = f(dij).
 All MDS algorithms are a variation of the above (Davidson, 1983).
Output of MDS

 MDS Map/Perceptual Map/Spatial Representation:
 1) Clusters: groupings in an MDS spatial representation.
  These may represent a domain/subdomain.
 2) Dimensions: hidden structures in the data; ordered groupings that explain similarity between items.
  Axes are meaningless and orientation is arbitrary.
  In theory, there is no limit to the number of dimensions.
  In reality, the number of dimensions that can be perceived and interpreted is limited.
Diagnostics of MDS (1)





 MDS attempts to find a spatial configuration X such that the following holds: f(δij) ≈ dij(X).
 Stress (Kruskal's) function: measures the degree of correspondence between the distances among points on the MDS map and the matrix input.
 It is the proportion of variance of the disparities not accounted for by the model:

  Stress = √( Σ [f(δij) - dij(X)]² / Σ dij(X)² )

 Range 0-1: smaller stress = better representation.
 Non-zero stress: some or all distances in the map are distortions of the input data.
 Rule of thumb: ≤ 0.1 is excellent; ≥ 0.15 is not tolerable. (A small computation sketch follows below.)
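A direct reading of the stress formula in NumPy, sketched with hypothetical disparities and map distances given as flat vectors:

import numpy as np

disparities = np.array([1.0, 2.0, 3.0, 2.5])  # f(delta_ij): transformed proximities
distances = np.array([1.1, 1.9, 3.2, 2.4])    # d_ij(X): distances on the MDS map

stress = np.sqrt(((disparities - distances) ** 2).sum() / (distances ** 2).sum())
print(stress)  # rule of thumb: <= 0.1 excellent, >= 0.15 not tolerable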
Diagnostics of MDS (2)

 R² (RSQ): proportion of variance of the disparities accounted for by the MDS procedure.
  R² ≥ 0.6 is an acceptable fit.
 Weirdness Index: correspondence of a subject's map to the aggregate map; used for outlier identification.
  Range 0-1: 0 indicates that the subject's weights are proportional to the average subject's weights; as the subject's score becomes more extreme, the index approaches 1.
 Shepard Diagram: scatterplot of input proximities (X-axis) against output distances (Y-axis) for every pair of items.
  A step-line is produced. If map distances fall on the step-line, the input proximities are perfectly reproduced by the MDS model (dimensional solution).
Interpretation of Dimensions




 Squeezing data into 2-D enables "readability" but may not be appropriate: a poor, distorted representation of the data (high stress).
 Scree plot: stress vs. number of dimensions (e.g. the city-distances example).
 Primary objective in dimension interpretation: obtain the best fit with the smallest possible number of dimensions.
 How does one assign "meaning" to dimensions?
Meaning of Dimensions
Subjective procedures:
 Labelling the dimensions by visual inspection, subjective interpretation, and information from respondents.
 "Experts" evaluate and identify the dimensions.
Validating MDS Results

 Split-sample comparison:
  The original sample is divided and a correlation between the variables is conducted.
 Multi-sample comparison:
  A new sample is collected and a correlation is conducted between the old and new data.
 Comparisons are done visually or with a simple correlation of coordinates or variables.
 The aim is to assess whether the MDS solution (the dimensionality extraction) changes in a substantial way.
MDS Caveats




 Respondents may attach different levels of importance to a dimension.
 The importance of a dimension may change over time.
 Interpretation of dimensions is subjective.
 Generally, more than four times as many objects as dimensions should be compared for the MDS model to be stable.
“Advantages” of MDS



 A dimensionality "solution" can be obtained from individuals; this gives insight into how individuals differ from aggregate data.
 Reveals dimensions without the need for defined attributes.
 Dimensions that emerge from MDS can be incorporated into regression analysis to assess their relationship with other variables.
“Disadvantages” of MDS

 Provides a global measure of dis/similarity but does not provide much insight into subtleties (Street et al., 2001).
 Increased dimensionality is difficult to represent and decreases intuitive understanding of the data. As such, the model of the data becomes as complicated as the data itself.
 Determination of the meanings of dimensions is subjective.
“SPSSing” MDS
• In the SPSS Data Editor window, click: Analyze > Scale > Multidimensional Scaling.
• Select four or more variables that you want to test.
• You may select a single variable for the Individual Matrices for window (depending on the distances option selected).
• If Data are distances (e.g. city distances) is selected, click the Shape button to define the characteristics of the dissimilarity/proximity matrices.
• If Create distances from data is selected, click the Measure button to control the computation of dissimilarities, to transform values, and to compute distances.
• In the Multidimensional Scaling dialog box, click the Model button to control the level of measurement, conditionality, dimensions, and the scaling model.
• Click the Options button to control the display options, iteration criteria, and treatment of missing values.
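For readers working outside SPSS, an equivalent run can be sketched with scikit-learn on a tiny hypothetical dissimilarity matrix; 'precomputed' means the distances are passed in directly, and note that mds.stress_ is raw stress, not the normalized Kruskal stress above:

import numpy as np
from sklearn.manifold import MDS

D = np.array([[0.0, 3.0, 5.0],
              [3.0, 0.0, 4.0],
              [5.0, 4.0, 0.0]])  # symmetric dissimilarities between 3 objects

mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
coords = mds.fit_transform(D)    # one 2-D point per object
print(coords)
print(mds.stress_)               # raw stress of the fitted configuration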
MDS: A Psychological Example



 "Multidimensional scaling modelling approach to latent profile analysis in psychological research" (Ding, 2006).
 Basic premise: utilize MDS to investigate types or profiles of people.
 "Profile": from applied psychology, where test batteries are used to extract and construct distinctive features/characteristics in people.
 The MDS method was used to:
  Derive profiles (dimensions) that could provide information regarding psychosocial adjustment patterns in adolescents.
  Assess whether individuals could follow different profile patterns than those extracted from group data, i.e. deviations from the derived normative profiles.
Study Details: Methodology



 Participants: college students (mean age = 23 years, n = 208).
 Instruments:
  Self-Image Questionnaire for Young Adolescents (SIQYA). Variables:
   Body Image (BI), Peer Relationships (PR), Family Relationships (FR), Mastery & Coping (MC), Vocational-Educational Goals (VE), and Superior Adjustment (SA)
  Three mental health measures of well-being:
   Kandel Depression Scale
   UCLA Loneliness Scale
   Life Satisfaction Scale
Data for MDS

 Scored data for MDS profile analysis
 Sample data for 14 individuals:
BI = body image, PR = peer relations, FR = family relations, MC = mastery & coping, VE = vocational & educational goal, SA = superior adjustment, PMI-1 = profile match index for Profile 1, PMI-2 = profile match index for Profile 2, LS = life satisfaction, Dep = depression, PL = psychological loneliness
The Analysis: Step by Step

Step 1: Estimate the number of profiles (dimensions) from the latent variables.

[MDS map, Euclidean distance model: configuration derived in 2 dimensions, with points pr, ve, mc, sa, bi, fr plotted along Profile 1]
• Kruskal's stress = 0.00478: an excellent stress value.
• RSQ = 0.9998.
Scale values of the two MDS profiles (dimensions) in psychosocial adjustment:
[MDS map, Euclidean distance model: points pr, ve, mc, sa, bi, fr plotted in the Profile 1 x Profile 2 plane]

 Normative profiles of psychosocial adjustment in young adults.
 Each profile represents a prototypical individual.
References

 Davidson, M. L. (1983). Multidimensional Scaling. New York: J. Wiley and Sons.
 Ding, C. S. (2006). Multidimensional scaling modelling approach to latent profile analysis in psychological research. International Journal of Psychology, 41(3), 226-238.
 Kruskal, J. B., & Wish, M. (1978). Multidimensional Scaling. Sage.
 Street, H., Sheeran, P., & Orbell, S. (2001). Exploring the relationship between different psychosocial determinants of depression: A multidimensional scaling analysis. Journal of Affective Disorders, 64, 53-67.
 Takane, Y., Young, F. W., & de Leeuw, J. (1977). Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features. Psychometrika, 42(1), 7-67.
 Young, F. W., Takane, Y., & Lewyckyj, R. (1978). Three notes on ALSCAL. Psychometrika, 43(3), 433-435.
 http://www.analytictech.com/borgatti/profit.htm
 http://www2.chass.ncsu.edu/garson/pa765/mds.htm
 http://www.terry.uga.edu/~pholmes/MARK9650/Classnotes4.pdf
MDS - SPSS
A field map

 Combine MDS with clustering methods
 Draw clusters on MDS plots using MS Paint
 Identify cluster labels

Mapping the field of IR

Author Co-citation Map in the field of Information Retrieval (1992-1997)
Data: 1,466 IR-related papers selected from 367 journals, with 44,836 citations.
Examples

 McCain, K. (1990). Mapping authors in intellectual space: A technical overview. Journal of the American Society for Information Science & Technology, 41(6), 433-443.
 Ding, Y., Chowdhury, G., & Foo, S. (1999). Mapping the intellectual structure of information retrieval: An author co-citation analysis, 1987-1997. Journal of Information Science, 25(1), 67-78.
FACTOR ANALYSIS
Factor Analysis
 Identifies underlying variables, or factors, that explain the pattern of correlations within a set of observed variables.
 It is a data reduction technique: it identifies a small number of factors that explain most of the variance observed in a larger number of variables.
 Assumption: variables or cases should be independent (we can use correlations to check whether some variables are dependent).
Descriptive
 The Coefficients option produces the R-matrix, and the Significance levels option produces a matrix indicating the significance value of each correlation in the R-matrix.
 You can also ask for the Determinant of this matrix; this option is vital for testing for multicollinearity or singularity.
Extraction
 The scree plot, described earlier, is a useful way of establishing how many factors should be retained in an analysis.
 The unrotated factor solution is useful in assessing the improvement of interpretation due to rotation.
 If the rotated solution is little better than the unrotated solution, it is possible that an inappropriate (or less optimal) rotation method has been used.
Rotation




The interpretability of factors can be improved through rotation.
Rotation maximizes the loading of each variable on one of the extracted factors whilst minimizing the loading on all other factors.
Rotation works by changing the absolute values of the variables whilst keeping their differential values constant.
If you expect the factors to be independent, you should choose one of the orthogonal rotations (e.g. varimax); a sketch follows below.
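Outside SPSS, a factor analysis with varimax rotation can be sketched with the factor_analyzer package (pip install factor-analyzer); X is an assumed cases-by-variables array and the three-factor choice is purely illustrative:

import numpy as np
from factor_analyzer import FactorAnalyzer

X = np.random.rand(200, 10)      # demo data: 200 cases, 10 variables

fa = FactorAnalyzer(n_factors=3, rotation='varimax')
fa.fit(X)
print(fa.loadings_)              # rotated factor loadings
print(fa.get_factor_variance())  # variance, proportion, cumulative per factor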
Score
 This option allows you to save factor scores for each subject in the data editor.
 SPSS creates a new column for each factor extracted and places the factor score for each subject within that column.
 These scores can then be used for further analysis, or simply to identify groups of subjects who score highly on particular factors.
Options




SPSS will list variables in the order in which they are entered into the data editor. Although this format is often convenient, when interpreting factors it can be useful to list variables by size.
By selecting Sorted by size, SPSS will order the variables by their factor loadings.
There is also the option to Suppress absolute values less than a specified value (by default 0.1), which ensures that factor loadings within ±0.1 are not displayed in the output. This option is useful for assisting interpretation.
Result

 It should be clear that the first few factors explain relatively large amounts of variance (especially factor 1), whereas subsequent factors explain only small amounts.
 SPSS then extracts all factors with eigenvalues greater than 1; here that yields 23 factors.
 The eigenvalues associated with these factors are displayed in the columns labelled Extraction Sums of Squared Loadings.
Factor and its members

Pick loadings with absolute value > 0.7 to interpret the factor; report loadings > 0.4 as members of the factor.
Scree Plot
Mapping the field of IR
Ding, Y., Chowdhury, G., & Foo, S. (1999). Mapping the intellectual structure of information retrieval: An author co-citation analysis, 1987-1997. Journal of Information Science, 25(1), 67-78.
Broader Thinking

 Try other networks:
  Co-author network, journal co-citation network
 Try factor analysis on your survey data