=== Assistant: No, <math>\gamma = 0</math> does not mean infinite work for fixed accuracy <math>\varepsilon > 0</math> ===

No: <math>\gamma = 0</math> does ''not'' mean the algorithm needs infinite work for any fixed accuracy <math>\varepsilon > 0</math>. What it means is that the particular “local work” bound you are quoting was proved only under the strict margin assumption <math>\gamma > 0</math>; if you plug in <math>\gamma = 0</math>, you are applying the theorem outside its validity range. The “<math>+\infty</math>” you get is the bound becoming vacuous, not the algorithm becoming non-terminating. Here is exactly what is going on.

==== In our derivation, the key counting step was: ====

* if an inactive coordinate <math>i \in I^\star</math> becomes nonzero at an iterate,
* then the prox input must cross the threshold by at least <math>t\gamma</math>,
* which leads to a bound like

<math display="block">|A_k| \;\le\; \frac{\|u(y_k) - u(x^\star)\|_2^2}{t^2\gamma^2} \;\le\; \left(\frac{1 + tL}{t\gamma}\right)^2 \|y_k - x^\star\|_2^2.</math>

That inequality divides by <math>\gamma^2</math>. If <math>\gamma = 0</math>, the statement “each spurious active coordinate forces a jump of at least <math>t\gamma</math>” becomes “forces a jump of at least 0,” which gives no information. The proof technique therefore cannot upper-bound the number of spurious activations, and the resulting constant <math>1/\gamma^2</math> blows up. So what fails at <math>\gamma = 0</math> is finite identification, i.e. stability of the optimal inactive set, not convergence of APG itself.

==== <math>\gamma = 0</math> means there exists an inactive coordinate <math>i \in I^\star</math> for which the KKT inequality is tight ====

<math display="block">|\nabla f(x^\star)_i| = \lambda_i.</math>

Equivalently, at the optimum the prox input lies exactly on the soft-threshold boundary:

<math display="block">|u(x^\star)_i| = t\lambda_i, \qquad u(x) := x - t\nabla f(x).</math>

When you are exactly on that boundary, arbitrarily small perturbations of <math>y</math> around <math>x^\star</math> can flip that coordinate between
* returning 0 (inactive), and
* returning a tiny nonzero value (active),
because soft-thresholding switches exactly at equality. Accelerated methods generate extrapolated points <math>y_k</math> that can oscillate around <math>x^\star</math>, so there is no reason, in the worst case, that such “tight” coordinates must become permanently zero after a finite time (the short numerical sketch below illustrates the flip). In other words:

* <math>\gamma > 0</math> ⇒ there is a buffer zone in which inactive coordinates are guaranteed to stay inactive, hence finite identification, hence the spurious-work term is a constant (independent of <math>\varepsilon</math>).
* <math>\gamma = 0</math> ⇒ there is no buffer zone, so spurious coordinates can keep flickering for as long as you run; the spurious work is no longer a constant independent of <math>\varepsilon</math>.

That is exactly why the margin matters.

==== For any fixed <math>\varepsilon > 0</math>: no. ====

Accelerated proximal gradient still has the iteration complexity

<math display="block">N_\varepsilon = O\!\left(\sqrt{1/\alpha}\,\log(1/\varepsilon)\right)</math>

for strongly convex composite objectives, regardless of <math>\gamma</math>. This is the standard statement you quoted in the open-problem note [open-problem-fountoulakis22a]. What you lose at <math>\gamma = 0</math> is the guarantee that each iteration can be implemented with local cost <math>\tilde{O}(1/\rho)</math> (or that the cumulative support outside <math>S^\star</math> is bounded by a constant). In the worst case you may have to fall back to the pessimistic bound you already mention in your note,

<math display="block">\text{Work}(N_\varepsilon) \;\le\; O\!\left(n\sqrt{\frac{1}{\alpha}}\,\log\frac{1}{\varepsilon}\right),</math>

because without a margin you cannot prove that the iterate supports stay small [open-problem-fountoulakis22a]. So: finite work, but potentially non-local.
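To make the flip concrete, here is a minimal Python sketch of soft-thresholding at and near the switching point <math>t\lambda_i</math>; the step size and weight are invented for illustration, and the code is a sketch of the mechanism rather than anything from the note. With <math>\gamma = 0</math>, arbitrarily small perturbations of the prox input toggle the coordinate between active and inactive; with <math>\gamma > 0</math>, every perturbation smaller than <math>t\gamma</math> is absorbed.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, tau):
    # Prox of tau * |.|: returns exactly 0 on [-tau, tau], shrinks otherwise.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

t, lam = 0.1, 1.0        # step size and l1 weight (hypothetical values)
boundary = t * lam       # the soft-threshold switching point t * lambda_i

# gamma = 0: the optimal prox input sits exactly at the boundary, so
# arbitrarily small perturbations flip the coordinate active/inactive.
for eps in (1e-2, 1e-6, 1e-12):
    inactive = soft_threshold(boundary - eps, boundary)  # stays 0.0
    active = soft_threshold(boundary + eps, boundary)    # tiny nonzero
    print(f"eps={eps:.0e}: {inactive} vs {active}")

# gamma > 0: the optimal prox input is t*gamma inside the boundary, so any
# perturbation smaller than t*gamma cannot spuriously activate the coordinate.
gamma = 0.2
print(soft_threshold(boundary - t * gamma + 0.5 * t * gamma, boundary))  # 0.0
</syntaxhighlight>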
==== Two different “limits” get conflated: ====

===== Case 1: you run <math>N_\varepsilon < \infty</math> iterations. =====

* Work is at most <math>nN_\varepsilon < \infty</math>.
* Nothing is infinite.

===== Case 2: you let <math>\varepsilon \to 0</math>, so <math>N_\varepsilon \to \infty</math> for any method. =====

* If <math>\gamma > 0</math>, we proved that the spurious-support contribution stays bounded even as <math>\varepsilon \to 0</math>, because identification happens in finite time.
* If <math>\gamma = 0</math>, there may be no finite identification time, so spurious activations can persist for arbitrarily many iterations, and the spurious contribution can grow with <math>N_\varepsilon</math>. In that sense, a bound that tries to be uniform (independent of <math>\varepsilon</math>) will necessarily break.

That is exactly what the <math>\tfrac{1}{\gamma^2\sqrt{\alpha}}</math> term is signaling.

==== Not a missing step, but a missing assumption: the bound you quoted is a conditional theorem. ====

* It is correct under the assumption <math>\gamma > 0</math>.
* At <math>\gamma = 0</math>, that assumption is false, and the bound is not meant to apply.

More strongly: the blow-up is not an artifact. It reflects a real phenomenon: at <math>\gamma = 0</math> you generally cannot guarantee finite-time identification or an <math>\varepsilon</math>-independent bound on the cumulative spurious support for inertial/accelerated proximal schemes.

If you want, I can also show (still without LaTeX) how to state an alternative runtime bound that remains valid even when <math>\gamma = 0</math>: it will necessarily depend on <math>n</math> (or on how many “tight” inactive coordinates there are), matching the pessimistic <math>O(n\sqrt{1/\alpha}\,\log(1/\varepsilon))</math> running time in your open-problem statement [open-problem-fountoulakis22a].
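As a complement, here is a minimal Python sketch of the work accounting; the FISTA-style loop, the random least-squares data, and the use of the final iterate’s support as a stand-in for <math>S^\star</math> are all illustrative assumptions, not details from the note. It counts cumulative activations outside the proxy optimal support, which is the quantity that stays bounded under a margin <math>\gamma > 0</math> and that the pessimistic <math>O(n\sqrt{1/\alpha}\,\log(1/\varepsilon))</math> bound gives up on controlling.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Illustrative strongly convex composite problem:
#   min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
n, m = 50, 200
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = rng.standard_normal(m)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient of f
t = 1.0 / L                     # step size

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# FISTA-style accelerated proximal gradient, recording iterate supports.
x = np.zeros(n)
y = x.copy()
theta = 1.0
supports = []
for k in range(500):
    grad = A.T @ (A @ y - b)
    x_new = soft_threshold(y - t * grad, t * lam)
    theta_new = (1.0 + np.sqrt(1.0 + 4.0 * theta**2)) / 2.0
    y = x_new + ((theta - 1.0) / theta_new) * (x_new - x)
    x, theta = x_new, theta_new
    supports.append(np.flatnonzero(x))

# Proxy for the optimal support S* (assumption: the last iterate has identified it).
S_star = supports[-1]
# Cumulative spurious work: activations outside the (proxy) optimal support.
spurious = sum(np.setdiff1d(S, S_star).size for S in supports)
print("final support size:", S_star.size,
      "| cumulative spurious activations:", spurious)
</syntaxhighlight>

On a well-conditioned instance with a margin, the spurious count saturates after finitely many iterations; a tight coordinate (<math>\gamma = 0</math>) is what would let it keep growing with the iteration count, which is the failure mode described above.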