Low-rank Adaptation
rank - the number of linearly independent columns a matrix contains
> … there exists a low dimension reparameterization \[of LLMs] that is as effective for fine-tuning as the full parameter space
> <cite>– Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning</cite>
## In a nutshell
- Represent the update to the weight matrix $W$ as the product of two low-rank matrices $A$ and $B$
- The factors have far fewer entries than $W$, so far fewer parameters need training
- Tune only $A$ and $B$, keeping $W$ frozen
- Add $AB$ to $W$ to get the adapted weights
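A minimal PyTorch sketch of these steps (the class name `LoRALinear`, the default rank `r = 8`, and the init scale are my own illustrative choices, not from these notes):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = x (W + AB)^T: W is frozen, only the low-rank factors A and B are trained."""

    def __init__(self, d_in: int, d_out: int, r: int = 8):
        super().__init__()
        # pretrained weight W: frozen during fine-tuning (random here as a stand-in)
        self.W = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        # low-rank factors: A is Gaussian-initialized, B starts as zeros,
        # so AB = 0 and the layer initially behaves exactly like the pretrained one
        self.A = nn.Parameter(torch.randn(d_out, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, d_in))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = self.A @ self.B            # rank-r update, same shape as W
        return x @ (self.W + delta_w).T
```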
## Notes
- used for [[fine-tuning]]
- can use prompt engineering, but this has limits
- reduces the number of trainable parameters (see the parameter count after this list)
    - don't need to optimize the full-rank weight matrix
    - only need a low-rank decomposition
- represent the weight update as the product of two matrices $A$, $B$
- choose the rank $r$ as a hyperparameter
    - we don't know the true rank of the weight matrices, so $r$ has to be guessed
    - if $r$ is too small
        - the decomposition can drop linearly *independent* columns (bad!)
    - if $r$ is too big
        - it keeps too many linearly *dependent* columns (inefficient)
- initialize $A$ from a Gaussian distribution and $B$ as zeroes, so $AB = 0$ at the start and the adapted model begins identical to the pretrained one
- let backprop figure out what $A$ and $B$ should be
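A rough sense of the savings, reusing the `LoRALinear` sketch above (the 4096-dimensional weight and `r = 8` are illustrative numbers, not from these notes):

```python
d_in, d_out, r = 4096, 4096, 8
full_params = d_in * d_out          # 16,777,216 weights if we fine-tuned W directly
lora_params = r * (d_in + d_out)    # 65,536 weights in A and B (~0.4% of full)

# after training, fold the update back into W so inference has no extra cost
layer = LoRALinear(d_in, d_out, r)
with torch.no_grad():
    layer.W.add_(layer.A @ layer.B)
```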