==== This is the part where I’ll be a bit critical, since you asked me to push on weaknesses. ====

===== In the compositional paper, they prove that: =====
* If <math>\hat f \in F_{\text{int}}</math> fits the data, then it essentially must be the true generator (up to slotwise transforms).
* This is a strong structural statement about which decoders are allowed, enforced via Jacobian/Hessian constraints. (arXiv:2512.08854v1)

DnD doesn’t have anything analogous:
* The hyper-convolutional decoder is chosen mostly for efficiency, not because it is the unique class of functions that can map prompts to LoRAs “correctly.” (arXiv:2506.16406v1)
* For a given dataset, there may be many LoRAs that give similar performance; and for a given set of prompt batches, many mappings from text embeddings to weights will fit the training checkpoints.

So the “generator over weights” in DnD is more like a strong practical prior than a theoretically justified <math>F_{\text{int}}</math>-style class. If you were hoping to cleanly import the compositional identifiability result into DnD, that doesn’t work yet: they haven’t characterized the space of good hyper-networks in the same way.

===== The compositional paper is about per-example latent variables <math>z(x)</math>, with slots that can be recombined across images. =====
DnD, by contrast:
* Conditions on a batch of prompts from the dataset, not on individual examples.
* Generates one LoRA per dataset/task, not a per-input latent code.

So the “slots” here are really “tasks across datasets”, not “factors inside one datapoint”. That’s a much coarser granularity.

You could imagine a more compositional version of DnD, where the parameter generator has explicit factors such as:
* a domain slot,
* a reasoning-style slot,
* a modality slot,
* a difficulty slot,
and you combine them to synthesize adapters for entirely new mixtures (a hypothetical sketch of such a generator follows at the end of this section). The current DnD architecture doesn’t enforce or expose such a factorization; it just hopes the text encoder + hyper-conv decoder learn something reasonable.

===== Exactly like we discussed for your pseudo-label idea: =====
* Given finitely many training datasets and checkpoints, a high-capacity hyper-network can, in principle, memorize the mapping from condition embeddings to weights (a toy illustration also follows below).
* Even with MSE on tokenized weights, it can approximate that mapping arbitrarily well on the training pairs while behaving unpredictably on new datasets.

The paper shows empirically that DnD does generalize to new benchmarks (e.g., common sense → science, text → multimodal) and often beats the average of the training LoRAs. (arXiv:2506.16406v1) But unlike the generator-identifiability result, nothing theoretically rules out “memorize and interpolate poorly” solutions.
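To make the factorization idea concrete, here is a minimal, hypothetical sketch in PyTorch of what a slot-factorized parameter generator could look like. Nothing here comes from the DnD paper: the slot names (domain, reasoning, modality, difficulty), the module shapes, and the pooled prompt-embedding input are all assumptions chosen for illustration, and the real DnD decoder is a hyper-convolutional network without any such explicit slots.

<syntaxhighlight lang="python">
# Hypothetical sketch (NOT the DnD architecture): a slot-factorized parameter
# generator. Slot names, dimensions, and the single-layer LoRA target are
# illustrative assumptions.
import torch
import torch.nn as nn


class SlotFactorizedLoRAGenerator(nn.Module):
    """Map a pooled prompt-batch embedding to LoRA factors (A, B) for one
    linear layer, routing through explicit slots (domain, reasoning, ...)."""

    def __init__(self, embed_dim=768, slot_dim=128,
                 slots=("domain", "reasoning", "modality", "difficulty"),
                 d_model=1024, rank=8):
        super().__init__()
        self.slots = slots
        # One encoder per slot: each is supposed to extract a different
        # factor of the task from the same prompt embedding.
        self.slot_encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(embed_dim, slot_dim), nn.GELU())
            for name in slots
        })
        # Decoder maps the concatenated slot codes to flattened LoRA A and B.
        fused = slot_dim * len(slots)
        self.to_A = nn.Linear(fused, d_model * rank)
        self.to_B = nn.Linear(fused, rank * d_model)
        self.d_model, self.rank = d_model, rank

    def forward(self, prompt_embedding):
        # prompt_embedding: (batch, embed_dim) pooled text-encoder output.
        codes = [enc(prompt_embedding) for enc in self.slot_encoders.values()]
        z = torch.cat(codes, dim=-1)                      # (batch, fused)
        A = self.to_A(z).view(-1, self.d_model, self.rank)
        B = self.to_B(z).view(-1, self.rank, self.d_model)
        return A, B


gen = SlotFactorizedLoRAGenerator()
emb_a = torch.randn(1, 768)   # dummy embedding for, say, a science prompt batch
emb_b = torch.randn(1, 768)   # dummy embedding for a common-sense prompt batch
A_a, B_a = gen(emb_a)         # ordinary per-task adapter generation

# Recombination across tasks: take every slot code from task A but swap in
# task B's "domain" code before decoding. This is the compositional behaviour
# the current DnD decoder does not expose.
codes_a = {k: enc(emb_a) for k, enc in gen.slot_encoders.items()}
codes_b = {k: enc(emb_b) for k, enc in gen.slot_encoders.items()}
mixed = dict(codes_a, domain=codes_b["domain"])
z = torch.cat([mixed[k] for k in gen.slots], dim=-1)
A = gen.to_A(z).view(-1, gen.d_model, gen.rank)
B = gen.to_B(z).view(-1, gen.rank, gen.d_model)
print(A.shape, B.shape)  # torch.Size([1, 1024, 8]) torch.Size([1, 8, 1024])
</syntaxhighlight>

The point of the recombination at the bottom is that slot codes from two different prompt batches can be mixed before decoding, which is the kind of factor-level recomposition the compositional paper studies; whether such slots would actually emerge or remain disentangled under DnD-style training is exactly the open question.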
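And here is a toy illustration of the memorization worry. All sizes are made up, and there is no real text encoder or checkpoint data; the only point is that an over-parameterized MLP fits a handful of (condition embedding, weight-token) pairs under MSE almost perfectly, and that fit alone says nothing about its behaviour on an unseen condition.

<syntaxhighlight lang="python">
# Toy memorization demo with assumed shapes (nothing from the DnD codebase).
import torch
import torch.nn as nn

torch.manual_seed(0)
n_tasks, embed_dim, weight_tokens = 8, 64, 256     # assumed toy sizes
conds = torch.randn(n_tasks, embed_dim)            # "condition embeddings"
weights = torch.randn(n_tasks, weight_tokens)      # "tokenized checkpoint weights"

# Generously sized hyper-network relative to 8 training pairs.
hypernet = nn.Sequential(
    nn.Linear(embed_dim, 2048), nn.GELU(), nn.Linear(2048, weight_tokens)
)
opt = torch.optim.Adam(hypernet.parameters(), lr=1e-3)

for step in range(2000):
    loss = nn.functional.mse_loss(hypernet(conds), weights)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"train MSE: {loss.item():.2e}")   # should end up far below the initial loss

new_cond = torch.randn(1, embed_dim)     # embedding from an unseen "dataset"
print(hypernet(new_cond).norm().item())  # nothing constrains this output
</syntaxhighlight>

This does not show that DnD itself memorizes (the paper's transfer results argue otherwise empirically); it only shows why, absent an identifiability-style argument, the training objective by itself does not rule such solutions out.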