Editing Openai/693c0f4f-255c-8008-92e9-0cd44c6d6226 (section)

==== So, is training δ “worse” than generating LoRA? ====

If you mean strictly in the DnD sense (“I want zero-shot prompt-to-weights”):
* Yes, training δ is worse on the axis they care about (no optimization at adaptation time).
* δ is a summary of behavior, not an adapter. You still need a path from δ → weights or δ → retrieval.

If you mean as a general design choice for adaptation/meta-learning:
* It’s not obviously worse; it’s just another trade-off: - Pros: lower-dimensional, better for clustering/retrieval, maybe more stable and regularized. - Cons: doesn’t directly control the model, lossy, needs extra machinery (retrieval or generator) to actually adapt the LLM.

A realistic “best of both worlds” design might be:
# Use Delta Activations to build a model/task embedding space and organize your LoRA zoo. 2509.04442v1
# On top of that, learn a small hypernetwork HHH that maps from Δ (plus maybe statistics of the new dataset’s prompts) to a correction LoRA (or to mixing coefficients over existing LoRAs).
# Optionally allow a low-dim δ code to be optimized per task, but keep most parameters in HHH shared.

That would sit in between:
* pure retrieval in Δ-space, and
* full DnD-style LoRA generation.

If you want, we can sketch a concrete toy experiment design to test “train δ instead of LoRA” and see where it breaks (and where it might actually help).