=== Assistant: Nice, now we're in the fun "mash up two papers and see what falls out" zone 😄 ===
Let me restate your idea in my own words first.

Short version:
* You can build a LoRA generator conditioned on Delta Activations,
* but by itself it doesn't solve "no-training adaptation for new tasks".
* It's strongest as a way to organize, compose, or refine existing LoRAs, not as a replacement for dataset-based conditioning.

Let's break that down a bit.

==== In the Delta Activations paper ====
* For each finetuned model (or LoRA), you compute a single vector δ ∈ R^d (d is a few thousand, say) summarizing how its internal activations differ from the base model's on a fixed, small probe set.
* This δ behaves like a "semantic embedding of the finetune":
** similar tasks → similar deltas,
** you can add / interpolate deltas for rough composition,
** you can cluster models or select good LoRAs based on this space.

Crucially: δ is not an adapter; it does not change the base model. It's more like a descriptor of the adapter you already trained.

==== Imagine we train a network G such that ====
G(δ) ≈ LoRA weights

using many (δ, LoRA) pairs from your LoRA zoo. Then:
* For an existing LoRA, you can encode it → δ → decode back to a LoRA (an autoencoder in behavior space).
* You can do ops like δ_combo = δ_A + δ_B and then feed G(δ_combo) to get a new, "blended" LoRA.

That's actually a cool idea:
* Delta Activations give you a nice, low-dimensional latent space over model behaviors.
* A generator G makes that latent space "actionable" by turning latents back into weights.

But notice: this is fundamentally about recombining / editing behaviors of LoRAs you already have. If you don't use datasets at all, then:
* you can sample random δ and generate random LoRAs…
* …but you don't know what those LoRAs are good for. You've lost the link to any task semantics.

So with only δ as the condition, you can:
* explore a smooth manifold of "things like the LoRAs I trained before",
* compose known domains (e.g., mix math & code LoRAs via δ_math + δ_code),
* possibly regularize / denoise noisy LoRAs via autoencoding,
…but you cannot say "give me a LoRA for this new dataset / capability", because δ doesn't know what that is unless you somehow produced δ from that task.

==== This is the key point where it becomes weaker than DnD-style conditioning ====
* DnD:
** Condition = a batch of prompts from the new dataset.
** No training is required at adaptation time: you just feed the prompts in and get a LoRA out.
* "Generate a LoRA from δ":
** Condition = a Delta Activation vector describing "what the model should do".
** To get δ for a new task, you need something that already behaves like that task: a finetuned LoRA, a few gradient steps, or some other expensive process.

So unless you have a way to compute δ directly from raw data (without training), you're back to training something first. In that sense, yes, for "no-training" adaptation it's strictly worse than DnD:
* DnD: dataset → LoRA (zero-shot).
* δ-conditioning: requires a model that already approximates the task in order to define δ.
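To make that dependency concrete, here is a minimal sketch of computing a Delta-Activations-style descriptor by contrasting a LoRA-finetuned model with its base model on a small probe set. The checkpoint names, probe prompts, and pooling choice (last-layer hidden state at the final token, averaged over probes) are illustrative assumptions rather than the paper's exact recipe; the point is simply that δ can only be computed once a finetuned adapter already exists.

```python
# Sketch: Delta-Activations-style descriptor δ for one LoRA.
# All names below (checkpoints, probe prompts, pooling) are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "base-model"        # hypothetical base checkpoint
ADAPTER_ID = "math-lora"      # hypothetical finetuned LoRA adapter

probe_prompts = [             # fixed, small probe set shared across all LoRAs
    "Explain why the sky is blue.",
    "Write a function that reverses a list.",
    "Summarize the plot of Hamlet in two sentences.",
]

tok = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID)
tuned = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(BASE_ID), ADAPTER_ID
)

@torch.no_grad()
def probe_activations(model):
    """Average the last-layer hidden state at the final token over the probe set."""
    vecs = []
    for p in probe_prompts:
        inputs = tok(p, return_tensors="pt")
        out = model(**inputs, output_hidden_states=True)
        vecs.append(out.hidden_states[-1][0, -1])   # shape: (hidden_size,)
    return torch.stack(vecs).mean(dim=0)

# δ = activations(finetuned) - activations(base): one vector per adapter.
delta = probe_activations(tuned) - probe_activations(base)
print(delta.shape)   # torch.Size([hidden_size])
```

Note that the finetuned adapter has to exist before δ does, which is exactly why δ alone cannot drive zero-training adaptation to a new dataset.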
==== Even though it's not a drop-in replacement for dataset-conditioning, there are interesting uses ====

===== A LoRA autoencoder / latent editing space =====
* Train an encoder E: LoRA → δ (basically Delta Activations).
* Train a decoder G: δ → LoRA.
* Now:
** edit δ (add, interpolate, project, run PCA, whatever),
** decode back to a LoRA.

This gives you:
* a smooth latent editing space for LoRAs, like a StyleGAN latent space but for adapters;
* a strong prior: G only outputs LoRAs similar to those seen during training, which reduces weird failure LoRAs.

Weakness:
* It's still limited to the convex-ish hull of behaviors in your LoRA zoo.
* It doesn't help you create fundamentally new capabilities you've never finetuned for.

===== If δ behaves compositionally, treat δ-space as a slot space =====
If δ is roughly compositional (e.g., δ_math + δ_code ≈ the δ of a model that does both math and code), you can:
* use δ-space as the slot space for task factors,
* let G(δ) be the generator that instantiates those factors as a concrete LoRA.

This connects nicely to the compositional paper you read earlier:
* δ plays the role of the "slots" (task factors),
* G is the generator f,
* LoRA weights are the "images" in that manifold.

You can then play generative games like:
* δ_new = δ_A (domain) + δ_B (behavior style),
* LoRA_new = G(δ_new).

Again the limitation: you're composing known high-level behaviors; you still don't get a new capability without some supervision somewhere.

===== A more powerful approach: condition on both δ and the dataset =====
Condition G on both δ and the dataset / prompts, e.g.

condition = (δ_prior, prompt stats)

* δ_prior: your best guess of "what kind of task this is like" (e.g., from retrieval in δ-space).
* Prompt stats: information from the new dataset, as in DnD.

Then you get:
* a prior over LoRA space from δ (reuse of existing skills), and
* a fine-grained, task-specific signal from the prompts.

This could make adaptation faster and more stable than using prompts alone.

==== Bottom line ====
* If you literally mean only δ, with no dataset at inference:
** You can generate some LoRA, but you don't know what it solves.
** It's good for exploring / composing existing LoRAs, not for new tasks.
** It does not give the "no-training adaptation to arbitrary new datasets" that DnD aims for.
* If you're willing to (a) use δ for existing LoRAs and (b) maybe combine δ with dataset prompts for new tasks, then it becomes a nice meta-learning & composition tool:
** δ as a low-dimensional task/behavior code,
** a generator G(δ) as a LoRA decoder,
** compositional tricks (δ_add, δ_mix) in that space.

So conceptually it's promising, but its main weakness is that δ depends on having some model that already does the task. By itself it can't replace dataset-conditioned LoRA generation if your goal is "don't train, just generate the LoRA"; it's a better fit for "don't train a new LoRA, re-mix / compress the LoRAs you already have".

If you want, as a next step we could sketch a toy experimental design (a rough code sketch follows below):
* build a δ-space over a bunch of LoRAs,
* train G(δ) → LoRA,
* test whether δ_math + δ_code gives a LoRA that actually improves on math + code benchmarks.
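Here is what that toy experiment could look like, as a minimal, hypothetical sketch: the decoder G is a small MLP hypernetwork mapping δ to a flattened LoRA weight vector, trained on (δ, LoRA) pairs from the zoo. The dimensions, the random stand-in data, and the flattened-weight representation are all assumptions for illustration; a real setup would emit per-layer A/B matrices and evaluate the decoded adapters on actual benchmarks.

```python
# Toy sketch of the δ → LoRA decoder G (shapes and data are placeholders).
# Assumption: each LoRA in the zoo is flattened into one fixed-size vector;
# a real decoder would output structured per-layer A/B matrices instead.
import torch
import torch.nn as nn

D_DELTA = 4096      # dim of the Delta Activations descriptor (illustrative)
D_LORA = 65_536     # dim of the flattened LoRA weights (illustrative)

class LoraDecoder(nn.Module):
    """G: δ → flattened LoRA weights, a small MLP hypernetwork."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(D_DELTA, 1024), nn.GELU(),
            nn.Linear(1024, 1024), nn.GELU(),
            nn.Linear(1024, D_LORA),
        )

    def forward(self, delta):
        return self.net(delta)

# (δ, LoRA) pairs from the zoo; random stand-ins here.
deltas = torch.randn(32, D_DELTA)    # one descriptor per finetuned LoRA
loras = torch.randn(32, D_LORA)      # matching flattened LoRA weights

G = LoraDecoder()
opt = torch.optim.AdamW(G.parameters(), lr=1e-4)

for step in range(200):
    pred = G(deltas)
    loss = nn.functional.mse_loss(pred, loras)   # simple reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# Composition test: blend two behavior codes and decode a new adapter.
delta_math, delta_code = deltas[0], deltas[1]    # pretend these are the math / code LoRAs
lora_blend = G(delta_math + delta_code)          # candidate "math + code" LoRA
print(lora_blend.shape)                          # (D_LORA,)
```

The composition test at the end is the interesting part: reshape lora_blend into per-layer adapters, load them onto the base model, and compare its math + code benchmark scores against the individual math and code LoRAs.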