Editing Openai/693c0f4f-255c-8008-92e9-0cd44c6d6226 (section)

===== More radical version: =====
# Collect many tasks, each with: - LoRA weights ΔW\Delta WΔW, - A Δ-embedding vfv_fvf of that LoRA.
# Learn a generator GGG such that G(vf)≈ΔW.G(v_f) \approx \Delta W.G(vf)≈ΔW.
# For a new task, only train a low-dimensional code vvv via gradients (hundreds or thousands of params), and get its LoRA as G(v)G(v)G(v).

This is basically:
* Factorizing LoRA space into: - a big shared generator GGG (frozen at test time), - a small task code vvv that you optimize.

Compared to DnD:
* DnD: learns a direct mapping from prompts → LoRA, then no training at adaptation.
* “Train δ instead”: learns mapping code → LoRA, then you still train δ per task.

So yes, if your goal is “no training for new tasks”, this is worse than generating LoRAs à la DnD: you added optimization back into the loop.

But as a meta-learning scheme it’s actually interesting:
* You only optimize ~4k parameters instead of millions (LoRA).
* The generator encodes a strong prior over “reasonable” LoRAs (learned from many tasks).
* It could be more sample-efficient than training a fresh LoRA from scratch for each new task.

The weak points:
* The Δ embedding is very lossy by design; many different LoRAs map to similar Δ. The inverse map GGG is underdetermined and might collapse diversity.
* Delta Activations were engineered to cluster domain-level behaviors, not to preserve all fine-grained information needed for optimal performance. Using them as the only latent for reconstruction is risky.
* You’d have to show empirically that optimizing δ + using GGG can reach the same task performance as training the LoRA directly. My guess: you’d often lose some peak performance, especially for hard, narrow tasks.