Phipps’s Hawaiian Marriage data, reconfigured.


More on Choosing #Clusters in General
• References
– Breckenridge, James N. (2000), “Validating Cluster Analysis: Consistent Replication
and Symmetry,” Multivariate Behavioral Research, 35 (2), 261-285.
– Calinski, T. and J. Harabasz (1974), “A Dendrite Method for Cluster Analysis,”
Communications in Statistics, 3, 1-27.
– Krolak-Schwerdt, Sabine and Thomas Eckes (1992), “A Graph Theoretic Criterion for
Determining the Number of Clusters in a Data Set,” Multivariate Behavioral
Research, 27 (4), 541-565.
– Milligan, Glenn W. and Martha C. Cooper (1985), “An Examination of Procedures for
Determining the Number of Clusters in a Data Set,” Psychometrika, 50, 159-179.
– Steinley, Douglas and Michael J. Brusco (2011), “Choosing the Number of Clusters in
K-Means Clustering,” Psychological Methods, 16 (3), 285-297.
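As a rough illustration of what these criteria compute, here is a minimal sketch of the variance-ratio index usually attributed to Calinski and Harabasz (1974): between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom. The function name `calinski_harabasz` and the toy data are assumptions made for this example, not code from any of the cited papers.

```python
from statistics import fmean


def calinski_harabasz(points, labels):
    """Variance-ratio criterion: [B / (k - 1)] / [W / (n - k)].

    points: list of equal-length numeric tuples
    labels: one cluster label per point
    (Hypothetical helper for illustration only.)
    """
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n clusters")

    dim = len(points[0])
    grand = [fmean(p[d] for p in points) for d in range(dim)]

    between = 0.0  # B: size-weighted squared distance of centroids to grand mean
    within = 0.0   # W: squared distance of points to their own centroid
    for members in clusters.values():
        centroid = [fmean(p[d] for p in members) for d in range(dim)]
        between += len(members) * sum((c - g) ** 2 for c, g in zip(centroid, grand))
        within += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )

    if within == 0.0:
        return float("inf")  # perfectly tight clusters
    return (between / (k - 1)) / (within / (n - k))


# Toy data: two obvious groups; compare a sensible and a poor partition.
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = calinski_harabasz(pts, [0, 0, 1, 1])
bad = calinski_harabasz(pts, [0, 1, 0, 1])
```

In practice one would compute the index for each candidate number of clusters and favor the solution with the largest value; the comparison above only shows that a partition matching the visible groups scores higher than one that cuts across them.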
References: Articles
• Goodman, Leo A. and William H. Kruskal (1954), “Measures of Association for Cross
Classifications,” Journal of the American Statistical Association, 49, 732-764.
– Measures like correlations (r’s) but for categorical data
• Hartigan, John A. and M. A. Wong (1979), “A K-Means Clustering Algorithm,” Applied
Statistics, 28, 100-108.
– K-means and the Fortran code (hehehe, how cool & nerdy is that?!)
• Johnson, Stephen C. (1967), “Hierarchical Clustering Schemes,” Psychometrika, 32 (3),
241-254.
– “Hierarchy” is defined, single-link & complete-link are introduced
• Lance, G. N. and W. T. Williams (1967), “A General Theory of Classificatory Sorting
Strategies, I. Hierarchical Systems,” Computer Journal, 9, 373-380.
– The equation that subsumes single, complete, average, Ward’s, etc.
• Milligan, Glenn W. (1979), “Ultrametric Hierarchical Clustering Algorithms,”
Psychometrika, 44 (3), 343-346.
– Extends ultrametric distances
• Ward, Joe H., Jr. (1963), “Hierarchical Grouping to Optimize an Objective Function,”
Journal of the American Statistical Association, 58 (301, March), 236-244.
– The Ward of Ward’s method
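For orientation, the general update equation associated with Lance and Williams (1967) can be written as follows. This is a sketch in commonly used notation, which is an assumption here and may differ from the paper's own symbols: clusters i and j have just been merged, k is any other cluster, and n denotes cluster size.

```latex
% Distance from cluster k to the newly merged cluster (i \cup j)
d(k,\, i \cup j) \;=\; \alpha_i\, d(k,i) \;+\; \alpha_j\, d(k,j)
                 \;+\; \beta\, d(i,j) \;+\; \gamma\, \bigl| d(k,i) - d(k,j) \bigr|

% Parameter choices that recover familiar methods:
% single link:    \alpha_i = \alpha_j = 1/2,  \beta = 0,  \gamma = -1/2
% complete link:  \alpha_i = \alpha_j = 1/2,  \beta = 0,  \gamma = +1/2
% group average:  \alpha_i = n_i/(n_i+n_j),   \alpha_j = n_j/(n_i+n_j),  \beta = 0,  \gamma = 0
% Ward (squared Euclidean d):
%   \alpha_i = (n_i+n_k)/(n_i+n_j+n_k),  \alpha_j = (n_j+n_k)/(n_i+n_j+n_k),
%   \beta = -n_k/(n_i+n_j+n_k),  \gamma = 0
```

With the single-link parameters, for example, the right-hand side reduces to min(d(k,i), d(k,j)), and with the complete-link parameters to max(d(k,i), d(k,j)).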
References: Books
• Aldenderfer, Mark S., and Roger K. Blashfield (1984), Cluster Analysis, Newbury Park, CA:
Sage.
– Great succinct intro
• Hartigan, John (1975), Clustering Algorithms, NY: Wiley.
– Has the Fortran code for a bunch of algorithms
• Sneath, Peter H. A. and Robert R. Sokal (1973), Principles of Numerical Taxonomy, San
Francisco: Freeman.
– Solid; the examples are from a different field (bio), but refreshing at the same time
Cluster analysis also appears as a chapter in most multivariate stats books, such as:
• Seber, G.A.F. (1984), Multivariate Observations, NY: Wiley, Ch.7, pp.347-394.
References: Articles
• Arabie, Phipps, J. Douglas Carroll, Wayne DeSarbo, and Jerry Wind (1981), “Overlapping
Clustering: A New Method for Product Positioning,” Journal of Marketing Research, 18
(Aug.), 310-317.
– Cool model for non-hierarchical clustering
• Punj, Girish, and David W. Stewart (1983), “Cluster Analysis in Marketing Research:
Review and Suggestions for Application,” Journal of Marketing Research, 20 (May), 134-148.
– Illustrates a wide variety of applications of clustering
Recommendation Engines & Clustering
• Iacobucci, Dawn, Phipps Arabie and Anand Bodapati (2000), “Recommendation Agents
on the Internet,” Journal of Interactive Marketing, 14 (3), 2-11.
• Bodapati, Anand V. (2008), “Recommendation Systems with Purchase Data,” Journal of
Marketing Research, 45 (Feb.), 77-93.
Other Clustering Applications
• Parkman, Margaret A. and Jack Sawyer (1967), “Dimensions of Ethnic Intermarriage in
Hawaii,” American Sociological Review, 32 (4), 593-607.
Clustering Related
• McCutcheon, Allan L. (1987), Latent Class Analysis, Newbury Park, CA: Sage.
• Smithson, Michael and Jay Verkuilen (2006), Fuzzy Set Theory: Applications in the Social
Sciences, Thousand Oaks, CA: Sage.