Low-rank Adaptation
rank - the number of linearly independent columns a matrix contains
> … there exists a low dimension reparameterization \[of LLMs] that is as effective for fine-tuning as the full parameter space
> <cite>– Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning</cite>
## In a nutshell
- Represent the update to the weight matrix $W$ as the product of two low-rank matrices $A$ and $B$
- The factors have far fewer entries than $W$, so far fewer parameters need training
- Tune only $A$ and $B$, keeping $W$ frozen
- Add $AB$ to $W$ to get the adapted weights
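A minimal PyTorch sketch of these steps (the class name `LoRALinear`, the default rank `r = 8`, and the init scale are my own illustrative choices, not from these notes):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = x (W + AB)^T: W is frozen, only the low-rank factors A and B are trained."""

    def __init__(self, d_in: int, d_out: int, r: int = 8):
        super().__init__()
        # pretrained weight W: frozen during fine-tuning (random here as a stand-in)
        self.W = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        # low-rank factors: A is Gaussian-initialized, B starts as zeros,
        # so AB = 0 and the layer initially behaves exactly like the pretrained one
        self.A = nn.Parameter(torch.randn(d_out, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, d_in))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = self.A @ self.B            # rank-r update, same shape as W
        return x @ (self.W + delta_w).T
```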
## Notes
- used for [[fine-tuning]]
- can use prompt engineering, but this has limits
- reduces the number of trainable parameters (see the parameter count after this list)
    - don't need to optimize the full-rank weight matrix
    - only need a low-rank decomposition
- represent the weight update as the product of two matrices $A$, $B$
- choose the rank $r$ as a hyperparameter
    - we don't know the true rank of the weight matrices, so $r$ has to be guessed
    - if $r$ is too small
        - the decomposition can drop linearly *independent* columns (bad!)
    - if $r$ is too big
        - it keeps too many linearly *dependent* columns (inefficient)
- initialize $A$ from a Gaussian distribution and $B$ as zeroes, so $AB = 0$ at the start and the adapted model begins identical to the pretrained one
- let backprop figure out what $A$ and $B$ should be
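A rough sense of the savings, reusing the `LoRALinear` sketch above (the 4096-dimensional weight and `r = 8` are illustrative numbers, not from these notes):

```python
d_in, d_out, r = 4096, 4096, 8
full_params = d_in * d_out          # 16,777,216 weights if we fine-tuned W directly
lora_params = r * (d_in + d_out)    # 65,536 weights in A and B (~0.4% of full)

# after training, fold the update back into W so inference has no extra cost
layer = LoRALinear(d_in, d_out, r)
with torch.no_grad():
    layer.W.add_(layer.A @ layer.B)
```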