Model Merging
Fine-tuning gives you task-specific models, but how do you combine them into a single multi-task model?
TIES-Merging
Problem: Existing merging methods cause a drop in performance due to interference between parameters.
Causes:
- During finetuning, many model parameters change, but most of them have only a small impact on performance. The problem arises when a parameter that is influential for one model is merged with the same parameter from another model where it is redundant: the redundant value drags the influential one down.
- A parameter might have a positive value in some models and a negative value in others. In that case, simple averaging cancels out the conflicting updates and hurts both tasks (see the sketch below).
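A toy example of the sign conflict, with made-up numbers: when two task vectors disagree in sign on a parameter, naive averaging nearly wipes out both updates.

```python
import torch

# Two hypothetical task vectors that disagree in sign on the first entry.
tv_a = torch.tensor([ 0.8, 0.3])
tv_b = torch.tensor([-0.7, 0.4])

# Naive averaging nearly cancels the first parameter's update,
# even though both tasks consider it influential.
print((tv_a + tv_b) / 2)  # tensor([0.0500, 0.3500])
```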
Algorithm -
Given
- Fine Tuned Model Parameters
- Base Model Parameters
- A threshold Quantile - K
- A Scaling Parameter - λ
For each of the task fine-tuned models (a full sketch in code follows this list),
- Create Task Vectors - Basically the difference between the finetuned model parameters and the base model parameters
- Trim the Redundant Parameters - Keep only the top-K parameters by magnitude and reset the rest to zero.
- Create an array to store the signs of the task vectors, and another to store the magnitudes
- For each parameter, elect a sign: choose the sign with the larger total magnitude across the task vectors
- Do a disjoint merge with the elected sign for each tensor: average only the values whose sign matches the elected one
- Scale the merged task vector by λ and add it to the base model parameters.
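A minimal sketch of the whole procedure, assuming each model is a single flat parameter tensor (a real implementation would loop over every tensor in a state dict); `ties_merge`, `k`, and `lam` are illustrative names, not the paper's reference code.

```python
import torch

def ties_merge(base: torch.Tensor, finetuned: list[torch.Tensor],
               k: float = 0.2, lam: float = 1.0) -> torch.Tensor:
    # 1. Task vectors: difference between finetuned and base parameters.
    task_vectors = torch.stack([ft - base for ft in finetuned])

    # 2. Trim: keep the top-k fraction of entries by magnitude in each
    #    task vector, resetting the rest to zero.
    trimmed = torch.zeros_like(task_vectors)
    for i, tv in enumerate(task_vectors):
        keep = max(1, int(k * tv.numel()))
        idx = tv.abs().topk(keep).indices
        trimmed[i, idx] = tv[idx]

    # 3. Elect signs: per parameter, the sign with the larger total
    #    magnitude across the task vectors wins (the sign of the sum).
    elected = torch.sign(trimmed.sum(dim=0))

    # 4. Disjoint merge: average only the nonzero entries that agree
    #    with the elected sign.
    agrees = (torch.sign(trimmed) == elected) & (trimmed != 0)
    counts = agrees.sum(dim=0).clamp(min=1)
    merged = (trimmed * agrees).sum(dim=0) / counts

    # 5. Scale by lambda and add back to the base parameters.
    return base + lam * merged

# Toy usage with random flat weights standing in for three models.
base = torch.randn(1000)
models = [base + 0.1 * torch.randn(1000) for _ in range(3)]
merged = ties_merge(base, models, k=0.2, lam=1.0)
```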
The trimming removes the redundant parameters, and the sign election makes sure we only merge values pointing in the same direction as the influential ones, so opposing signs no longer cause interference either.