- soft assignment of data to clusters
- probabilistic treatment of clustering
- each cluster represented by a Gaussian, mixed together with mixing coefficients
# 4 Operations
1. Calculate total likelihood *p(**x**)*
2. Draw sample from distribution ***x*** ~ *p(**x**)*
3. Assign a point to a cluster
4. Determine model parameters (EM algorithm)
- Mixing coefficients
- Prototypes (mean vectors)
- Covariance matrices
## Calculate Total Likelihood
Likelihood of a data point given a specific cluster:
$
p(\textbf{x}|\mathcal{C}_k) = \mathcal{N}(\textbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)
$
Total likelihood:
$
p(\textbf{x}) = \sum_{k=1}^K\pi_k\mathcal{N}(\textbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)
$
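The total likelihood above can be sketched directly; a minimal example using SciPy, with made-up parameters for a two-component mixture in 2-D:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (assumptions, not fitted values)
pis = np.array([0.4, 0.6])                 # mixing coefficients, sum to 1
mus = [np.zeros(2), np.array([3.0, 3.0])]  # component means
sigmas = [np.eye(2), 0.5 * np.eye(2)]      # component covariances

def total_likelihood(x):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(pis, mus, sigmas))

val = total_likelihood(np.array([1.0, 1.0]))
```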
## Sampling from Gaussian Mixtures
To generate a single data point:
1. Pick one of the components using probability *π*<sub>k</sub>
2. Draw a sample from that component
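The two-step (ancestral) sampling procedure above, sketched with NumPy; the mixture parameters are the same illustrative assumptions as before:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm(n, pis, mus, sigmas):
    """Ancestral sampling from a Gaussian mixture."""
    ks = rng.choice(len(pis), size=n, p=pis)        # step 1: pick component k with prob pi_k
    samples = np.empty((n, len(mus[0])))
    for i, k in enumerate(ks):
        samples[i] = rng.multivariate_normal(mus[k], sigmas[k])  # step 2: draw from component k
    return samples, ks

pis = [0.4, 0.6]
mus = [np.zeros(2), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 0.5 * np.eye(2)]
X, labels = sample_gmm(1000, pis, mus, sigmas)
```

Over many samples the fraction drawn from component *k* approaches *π*<sub>k</sub>.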
## Assign Point to a Cluster
Labels -> latent (hidden) variables:
$
z_k \in \{0, 1\} \qquad \sum_{k=1}^K z_k = 1 \qquad \textbf{z} = \begin{bmatrix}z_1 \\ z_2 \\ \vdots \\ z_K \end{bmatrix}
$
K = number of clusters
Probability of element *z*<sub>k</sub> = 1, given in terms of the mixing coefficients:
$
p(z_k = 1)= \pi_k \quad \text{with} \quad 0\leq\pi_k\leq1,\quad\sum_{k=1}^K\pi_k = 1
$
Probability of vector ***z***:
$
p(\mathbf{z}) = \prod_{k=1}^K\pi_k^{z_k} = \pi_1^0\pi_2^0\dots\pi_k^1\dots\pi_K^0 = \pi_k
$
- Only one element of ***z*** = 1
Conditional probability of ***x*** given value for latent variable ***z***:
$
p(\mathbf{x}|z_k = 1) = \mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)
$
Rewritten as:
$
p(\mathbf{x}|\mathbf{z}) = \prod_{k=1}^K\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)^{z_k}
$
Joint distribution:
$
p(\mathbf{x}, \mathbf{z}) = p(\mathbf{z})p(\mathbf{x}|\mathbf{z})
$
Marginal distribution:
$
p(\mathbf{x}) = \sum_\mathbf{z}p(\mathbf{z})p(\mathbf{x}|\mathbf{z}) = \sum_{k=1}^K\pi_k\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)
$
Conditional probability that a point belongs to cluster *k*, given the data point (the *responsibility*):
$
\gamma(z_k)=p(z_k= 1|\mathbf{x})=\frac{p(z_k=1)p(\mathbf{x}|z_k=1)}{\sum_{j=1}^K\pi_j\mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}=\frac{p(\mathbf{z})p(\mathbf{x}|\mathbf{z})}{p(\mathbf{x})}
$
This is Bayes' theorem
Assigns each point a probability of belonging to each class
Can't maximize the log likelihood in closed form (the sum over components sits inside the logarithm)
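Computing the responsibilities γ(z<sub>k</sub>) is just the per-component weighted likelihoods normalized by p(**x**); a minimal sketch, reusing the illustrative parameters from earlier:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Same made-up two-component mixture as above (assumed parameters)
pis = np.array([0.4, 0.6])
mus = [np.zeros(2), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 0.5 * np.eye(2)]

def responsibilities(x):
    """gamma(z_k) = pi_k N(x|mu_k, Sigma_k) / sum_j pi_j N(x|mu_j, Sigma_j)."""
    weighted = np.array([pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
                         for pi, mu, cov in zip(pis, mus, sigmas)])
    return weighted / weighted.sum()   # denominator is p(x)

gamma = responsibilities(np.array([0.2, 0.1]))  # point near the first mean
```

The responsibilities sum to 1, so each point gets a soft assignment over all K clusters rather than a single hard label.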
- instead use [[Expectation Maximization (EM) Algorithm|expectation maximization (EM)]] algorithm