Clustering repeated ordinal data: Model-based approaches using finite mixtures
Model-based approaches to clustering continuous and cross-sectional data are abundant and well established. In contrast, equivalent approaches for repeated ordinal data are less common and remain an active area of research. In this dissertation, we propose several models to cluster repeated ordinal data using finite mixtures. In doing so, we explore several ways of incorporating the correlation due to the repeated measurements while taking into account the ordinal nature of the data. In particular, we extend the Proportional Odds model to incorporate latent random effects and latent transitional terms. These two ways of incorporating the correlation are also known as parameter-dependent and data-dependent models in the time-series literature. In contrast to most of the existing literature, our aim is classification rather than parameter estimation; that is, we aim to provide flexible and parsimonious ways to estimate latent populations and classification probabilities for repeated ordinal data. We estimate the models using frequentist (Expectation-Maximization algorithm) and Bayesian (Markov chain Monte Carlo) inference methods, and we compare the advantages and disadvantages of both approaches using simulated and real datasets. To compare models, we use several information criteria (AIC, BIC, DIC and WAIC) as well as a Bayesian nonparametric approach (Dirichlet process mixtures). With regard to applications, we illustrate the models using self-reported health status in Australia (poor to excellent), life satisfaction in New Zealand (completely agree to completely disagree) and agreement with a reference genome of infant gut bacteria (equal, segregating and variant) from baby stool samples.
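To make the notion of classification probabilities concrete, the following is a minimal Python sketch, not the implementation used in this dissertation. It assumes a finite mixture in which each cluster shifts the linear predictor of a Proportional Odds model by a cluster effect `alpha`, and it treats a subject's repeated responses as conditionally independent given the cluster. That is a simplification which omits the latent random effects and transitional terms proposed here. The function names, cut-points, cluster effects and mixing weights are all hypothetical and chosen only for illustration.

```python
import math


def logistic(x):
    """Standard logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(cutpoints, alpha):
    """Category probabilities P(Y = k), k = 0..q-1, under a Proportional Odds
    model with P(Y <= k) = logistic(cutpoints[k] - alpha).

    `cutpoints` must be increasing; `alpha` is an assumed cluster effect.
    """
    cumulative = [logistic(c - alpha) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def classification_probs(y, cutpoints, alphas, weights):
    """Posterior cluster membership probabilities for one subject.

    `y` is the subject's sequence of ordinal responses coded 0..q-1.
    Responses are assumed independent given the cluster (illustrative only).
    This is the quantity computed in the E-step of an EM algorithm.
    """
    log_post = []
    for alpha, weight in zip(alphas, weights):
        theta = category_probs(cutpoints, alpha)
        log_lik = sum(math.log(theta[k]) for k in y)
        log_post.append(math.log(weight) + log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnormalised = [math.exp(lp - m) for lp in log_post]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical setting: three ordered categories, two clusters.
cutpoints = [-1.0, 1.0]
alphas = [-1.5, 1.5]   # cluster 0 favours low categories, cluster 1 high ones
weights = [0.5, 0.5]

low_responder = classification_probs([0, 0, 1, 0], cutpoints, alphas, weights)
high_responder = classification_probs([2, 2, 1, 2], cutpoints, alphas, weights)
```

In this toy setting a subject who mostly reports the lowest category is assigned almost entirely to the first cluster, and a subject who mostly reports the highest category to the second. In a full EM fit these probabilities would be recomputed at every iteration, alternating with updates of the cut-points, cluster effects and mixing weights.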