- Exemplar - Old Faithful eruptions - correlation -> duration of eruption and time between eruptions - Goal -> represent data in K clusters - each cluster has prototype $\mu_k$ -> representative data point - [[Expectation Maximization (EM) Algorithm]]: - Initialize prototypes (guess) - E-step -> assign each data point to a cluster based on a metric between data point and prototype (Euclidean distance) - M-step -> update prototype as cluster mean, based on E-step - Stopping criteria (number of steps, reassignment rate below threshold, cost function, etc.) - Normalize data ## Math - define *responsibilities*: $r_{nk} \in {0, 1}$ so that $\sum_k r_{nk} = 1$ - Example: 5 data points and 3 clusters $\textbf{\textit{R}} = (r_{nk}) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$ - column -> cluster - row -> data point - define cost function: $J = \sum_{n=1}^N\sum_{k=1}^K r_{nk}\lvert\lvert\textbf{x}_n-\boldsymbol{\mu}_k\rvert\rvert^2$ $r_{nk}$ - responsibilities $\mathbf{x}_n$ - data $\mu_k$ - prototypes ### E-Step - minimize $J$ wrt $r_{nk}$ - assigns each data point to nearest prototype ### M-Step - minimize $J$ wrt $\mathbf{\mu}_k$ $\mathbf{\mu}_k = \frac{\sum_n r_{nk}\mathbf{x}_n}{\sum_n r_{nk}}$ - each prototype set to mean of data points in that cluster - convergence guaranteed -> finite number of combinations of responsibilities ## Limitations - forces hard assignment of data points to clusters - use "soft" probabilistic assignments instead - represented with [[Gaussian Mixture Models|Gaussian mixture model]] - unclear to choose K