Information on K-Modes Clustering and Its Original Inventors: Anil Chaturvedi, Paul Green, and Douglas Carroll. Time of invention: summer of 1993; presentation to CSNA in 1994

This site is dedicated to K-Modes clustering and its Inventors. Since its inception in Summer 1993, the phrase "K-Modes clustering" has become very widely used, and is being used as frequently as "K-Means clustering" in areas such as Marketing Science, Bio-statistics, Data Mining, Computer Science, Mathematical Sciences, Sociology, Psychometrics, Classification Sciences, Biology, etc. Send email to sumaninc@att.net for questions.

The K-Modes algorithm was invented by Anil Chaturvedi, with Paul Green and Doug Carroll, in the summer of 1993 and first presented to external audiences in 1994 at the CSNA (Classification Society of North America) meeting held in Houston. It was presented again in 1996 in Chicago at the ASA conference (where handouts of the methodology were distributed), again in 1997 at the CSNA meeting, and finally published in 2001 in the Journal of Classification. Since the authors' primary interest is Marketing Science, it was invented especially for the purposes of market segmentation. Only in 1998, four years after K-modes was first presented in 1994, did data miners and computer scientists publish it, in a paper by Huang et al. in data mining journals. Huang (Journal of Classification 2003) actually cites the three presentations by Chaturvedi et al., clearly establishing the precedence of Chaturvedi et al. (1994, 1996, and 1997) as the original inventors of the K-modes algorithm.

Anil Chaturvedi has collaborated with J. Douglas Carroll (Professor of Marketing and Psychology at Rutgers University) and Paul Green (Professor of Marketing, Wharton School, University of Pennsylvania).

Anil Chaturvedi was the first to note the separability property of Lp-norm based loss functions (such as ordinary Least Squares, Least Absolute Deviation, etc.) when estimating multilinear multivariate models. This discovery dramatically reduced the time and computing resources needed to solve complex multivariate analysis problems, including some that could not be solved before. The algorithms he has proposed (together with Doug Carroll) can be characterized as time-and-resource-greedy algorithms (time and computer resources increase only linearly with the size of the data) for estimating many popular models in the general family of multivariate models, such as Factor Analysis, Multidimensional Scaling, and Block Clustering of categorical or continuous data. The approach is described in many of the papers and publications listed below. Efficient mining of large data (hundreds of thousands to millions) has become a reality with his work, as evidenced by the popularity of the numerical methods that he has proposed and advanced. Some sample publications are:

The Overlapping K-centroids procedure for Market Segmentation (Journal of Marketing Research 1997), which improves K-means and K-Medians substantially by providing an ability to detect outlying observations.

Hybrid models, such as CLUSCALE (Chaturvedi and Carroll 2000, in review), for uncovering both continuous and discrete structure in data from consumers, capturing product differentiation due not only to continuous perceptions (such as ease-of-use of software or efficacy of drugs) but also to discrete perceptions (such as foreign vs. domestic cars).

The Hybrid Factor Analysis (HFA) and Discrete Factor Analysis (DFA) approaches for reducing data into not just continuous factor scores and loadings but also discrete factor scores or loadings, while also determining how individuals "weight" each of those factors.

Ways of simultaneously deriving latent classes, fuzzy clusters, or deterministic clusters (either overlapping or non-overlapping) from the data AND deriving factor scores/loadings from the same data.
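For readers unfamiliar with the method this site is dedicated to, the following is a minimal, generic sketch of K-modes clustering on categorical records. It is an illustration under stated assumptions, not the authors' exact published formulation: the function names (hamming, k_modes), the toy records, and the choice of initial modes are all hypothetical. Each record is assigned to the cluster whose mode vector mismatches it on the fewest attributes, and each cluster's mode is then recomputed one attribute at a time, which is a simple instance of the kind of attribute-by-attribute separability mentioned above.

```python
from collections import Counter

def hamming(a, b):
    """Count the attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, init_modes, max_iter=100):
    """Generic K-modes sketch: alternate assignment and mode updates.

    records    -- list of equal-length tuples of categorical values
    init_modes -- list of K starting mode vectors (hypothetical choice)
    """
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode by simple mismatch count.
        new_labels = [
            min(range(len(modes)), key=lambda k: hamming(r, modes[k]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes are too
        labels = new_labels
        # Update step: each attribute's mode is found independently.
        for k in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if members:  # keep the old mode if a cluster is empty
                modes[k] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return labels, modes

# Toy data (assumed for illustration only).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]
labels, modes = k_modes(records, init_modes=[records[0], records[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

Each pass touches every record once per cluster, so the cost of an iteration grows linearly with the number of records. Like K-means, a procedure of this kind can stop at a local optimum, so in practice it would typically be run from several starting points.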

Attached is a list of his published work:

PAPERS PUBLISHED

Chaturvedi, A. D. and Carroll, J. D. (2004): CLUSCALE ("CLUstering and multidimensional SCAL[E]ing"): A Three-Way Hybrid Model Incorporating Overlapping Clustering and Multidimensional Scaling Structure, Submitted for publication, 2nd revision

Carroll, J. D., Arabie, P. A., Chaturvedi, A. D., and Hubert, L. A. (2004): "Multidimensional scaling and Clustering in Marketing: Paul Green's role," in Market Research and Modeling: Progress and Prospects: A Tribute to Paul Green, Eds., Yoram Wind and Paul Green, Kluwer Academic Publishers: Netherlands, 71-102.

Chaturvedi, A. D., Green, P. E., & Carroll, J. D. (2001). K-modes clustering. Journal of Classification, 18, 35-56.

Chaturvedi, A., & Carroll, J. D. (1998). A perceptual mapping procedure for analysis of proximity data to determine common and unique product-market structures. European Journal of Operational Research, 111, 268-284.

Carroll, J. D., & Chaturvedi, A. (1998). Fitting the CANDCLUS/MUMCLUS models with partitioning and other constraints. In C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H. -H. Bock & Y. Baba (Eds.), Data Science, Classification, and Related Methods (pp. 496-505). Tokyo: Springer-Verlag.

Carroll, J. D., & Chaturvedi, A. (1998). K-midranges clustering. In A. Rizzi, M. Vichi & H. H. Bock (Eds.), Advances in Data Science and Classification (pp. 3-14). Berlin: Springer.

Chaturvedi, A., & Carroll, J. D. (1997). An L1-norm procedure for fitting overlapping clustering models to proximity data. In Y. Dodge (Ed.) L1-Statistical procedures and related topics (31, 443-456). Hayward, CA: IMS [Institute of Mathematical Statistics] Lecture Notes - Monograph Series [LNM].

Chaturvedi, A., Carroll, J. D., Green, P. E., & Rotondo, J. A. (1997). A feature based approach to market segmentation via overlapping K-centroids clustering. Journal of Marketing Research, XXXIV, 370-377.

Chaturvedi, A. D., and Green, P. (1995), "Software Review: SPSS for Windows CHAID 6.0, Chicago: SPSS, Inc., 1992," Journal of Marketing Research, Volume 32, pages 245-254.

Carroll, J. D., & Chaturvedi, A. (1995). A general approach to clustering and multidimensional scaling of two-way, three-way, or higher-way data. In R. D. Luce, M. D'Zmura, D. D. Hoffman, G. Iverson & A. K. Romney (Eds.), Geometric Representations of Perceptual Phenomena (295-318). Mahwah, NJ: Erlbaum.

Chaturvedi, A., & Carroll, J. D. (1994). An alternating combinatorial optimization approach to fitting the INDCLUS and generalized INDCLUS models. Journal of Classification, 11, 155-170.

De Soete, G., Carroll, J. D., & Chaturvedi, A. (1993). A modified CANDECOMP method for fitting the extended INDSCAL model. Journal of Classification, 10, 75-92.

PAPERS PRESENTED

Chaturvedi, Anil, Carroll, J. D., et al. (2004): A Hybrid Factor/Components Analysis Model for Three-way, Three-Mode Data. Talk presented at the IFCS 2004 conference, Chicago, July 17, 2004.

Chaturvedi, Anil, Carroll, J. D., et al. (2004): HIDCLAN: A General Approach to Hidden Cluster and Latent Class Analysis of Multivariate Data. Talk presented at the IFCS 2004 conference, Chicago, July 18, 2004.

Chaturvedi, Anil, Carroll, J. D., et al. (2004): A Hybrid Factor/Components Analysis Model for Three-way, Three-Mode Data. Invited talk presented at the Statistics Department, George Washington University, April 30, 2004.

Chaturvedi, A. D., and Carroll, J. D. (2001), "Deriving Market Structure via Additive Decomposition of Market Shares", 2001 Berkeley Invitational Choice Symposium, Asilomar Conference center, June 1-5, Monterey, California.

Chaturvedi, A. D., Carroll, J. D., and Duffy M. (2001), "Deriving Market Structure via Additive Decomposition of Market Shares", CSNA meeting, June 14-17, St. Louis, Missouri.

Chaturvedi, A. D., & Carroll, J. D. (2000). "Three-Way HYCLUS: A Hybrid Individual Differences Model Combining INDSCAL and INDCLUS Structures," presented at the Classification Society of North America (CSNA) meeting in Montreal, Canada, June 2000.

Chaturvedi, A. D., Green, P. E., and Carroll, J. D. (1997), "Empirical Findings Obtained from Evaluating K-Modes and Overlapping K-centroids Clustering", Presented at the Classification Society of North America Meeting, Washington D. C.

Rotondo, J. A., and Chaturvedi, A. D. (1997), "A preferred direction cross effects choice model," presented at the INFORMS Marketing Science conference, Berkeley, California.

Chaturvedi, A. D., Green, P. E., and Carroll, J. D. (1996), "Market Segmentation via K-Modes Clustering", Presented at the American Statistical Association Meeting, Chicago.

Rotondo, J. A., and Chaturvedi, A. D. (1996), "An Ideal Point Extension of a Cross Effects Choice Model with Individual Differences," presented at the INFORMS Marketing Science conference, Gainesville, Florida.

Chaturvedi, A. D., Carroll, J. D., Green, P., and Rotondo, J. A. (1995), "Market Segmentation via Overlapping K-Centroids Clustering," presented at the A/R/T conference held at Monterey, California.

Rotondo, J. A., and Chaturvedi, A. D. (1995), "An Ideal Point Extension of a Cross Effects Choice Model," paper presented at the A/R/T conference held at Monterey, California.

Chaturvedi, A. D., Carroll, J. D., Lakshmi-Ratan, R. A., and Rotondo, J. A. (1994), "Overlapping K-Centroids approach to Market Segmentation," TIMS Marketing Science conference, Tucson, Arizona.

Rotondo, J. A., and Chaturvedi, A. D. (1994), "An Ideal Point Extension of a

Chaturvedi, A. D., Green, P. E., and Carroll, J. D. (1994), "K-Means, K-Medians, and K-Modes: Special Cases of Partitioning Multiway Data", Presented at the Classification Society of North America Meeting, Houston.

Chaturvedi, A. D., Carroll, J. D., Wadhwani, R., Arabie, P., and Lakshmi-Ratan, R. A. (1993), "Market Structure Analysis and Segmentation via INDCLUS," TIMS Marketing Science conference, St. Louis, Missouri.

Chaturvedi, A. D., Carroll, J. D., and Lakshmi-Ratan, R. A. (1993), "MADCLUS: Minimum Absolute Deviation Clustering," presented at the Annual Meeting of the Classification Society of North America, Pittsburgh.

Carroll, J. D., and Chaturvedi, A. D. (1993), "CANDCLUS: A Multiway Generalization of the INDCLUS Model and SINDCLUS Method for Overlapping Clustering," Presented at the Annual Psychometric Society Meeting, University of California, Berkeley.

Chaturvedi, A. D., and Carroll, J. D. (1992), "ALCLUS: An Alternating Least Squares Procedure for Fitting the ADCLUS and INDCLUS Models," Classification Society of North America, East Lansing, Michigan.

Chaturvedi, A. D., Carroll, J. D., Rotondo, J. A., and Lakshmi-Ratan, R. A. (1992), "Product Uniqueness in Product Choice," TIMS Marketing Science Conference, London School of Business, London, England.

Rotondo, J. A., Lakshmi-Ratan, R. A., and Chaturvedi, A. D. (1992), "A Three-way Extension of a Contextual Choice Model," TIMS Marketing Science Conference, London School of Business, London, England.

Chaturvedi, A. D., and Carroll, J. D. (1991), "Product Differentiation: Perceived Product Uniqueness and its Importance to Individual Consumers/Market Segments", TIMS Marketing Science Conference, Wilmington, Delaware.

Rotondo, J. A., Lakshmi-Ratan, R. A., and Chaturvedi, A. D. (1991), "A Multiattribute Extension of a Cross Effects Choice Model," TIMS Marketing Science Conference, Wilmington, Delaware.

Rotondo, J. A., Lakshmi-Ratan, R. A., and Chaturvedi, A. D. (1991), "A Multidimensional Probabilistic Choice Model," Joint Meeting of the Psychometric and Classification Societies, Rutgers University, New Jersey.

Chaturvedi, A. D. and Jagpal, H. S. (1991), "On Treating the Haywood Problem in Structural Equation Models," Joint Meeting of the Psychometric and Classification Societies, Rutgers University, New Jersey.