Openai/6922876a-7988-8007-9c62-5f71772af6aa
* What’s written. In Lemma \ref{lem:dyn-W2} you set $X_t=(1-t)X+tT(X)$ with $X\sim P$, define $P_t=\mathrm{Law}(X_t)$, and then write $\frac{d}{dt}\E_{P_t}[\varphi]=\E_{P_t}\big[\varphi'(X_t)\,v_t(X_t)\big]$ with $v_t(x)=T(x)-x$. The last identity is not correct: $T(x)-x$ is defined on the Lagrangian coordinate $x\in\mathrm{supp}(P)$, not on the current Eulerian position $x\in\mathrm{supp}(P_t)$. In particular $v_t(X_t)$ is generally not equal to $T(X)-X$.
* What is correct. If we define $T_t(x)=(1-t)x+tT(x)$ and $X_t=T_t(X)$, then $\frac{d}{dt}\E[\varphi(X_t)] = \E\!\left[\varphi'(X_t)\cdot (T(X)-X)\right]$, where the expectation is w.r.t. the original $X\sim P$. Equivalently, in Eulerian form, the velocity field $v_t$ must satisfy $v_t(T_t(x))=T(x)-x$; that is, $v_t(z)=\E[T(X)-X \mid X_t=z]$, not $T(z)-z$.
* Consequence. The displayed Cauchy–Schwarz step goes through once you rewrite the derivative correctly. The correct bound is $\big|\E_Q[\varphi]-\E_P[\varphi]\big| \le \Big(\int_0^1 \E_P[\varphi'(X_t)^2]\,dt\Big)^{1/2} W_2(P,Q)$, which coincides with your statement after noting $\E_P[g(X_t)]=\E_{P_t}[g]$.
* Drop-in fix: see §III.A.

===== There are several separate problems: =====
* (a) Misuse of Lemma \ref{lem:dyn-W2} with $\varphi=\psi_\tau'$ and $\varphi=(\psi_\tau)^2$. The lemma requires $\int_0^1\E_{P_t}[\varphi'(X_t)^2]\,dt<\infty$.
For $\varphi=\psi_\tau'$ this means $\psi_\tau''\in L^2(P_t)$. In your construction $f(\cdot;\theta,\tau)$ is $C^1$ but not $C^2$ across the junctions. Indeed, with the chosen Gaussian tails, $\psi_\tau(u)=-\partial_u\log f(u) = \begin{cases} 0, & |u|\le \tfrac12-\tau,\\ \tfrac{1}{2\tau K(0)}\cdot \frac{|u|-(\tfrac12-\tau)}{2\tau K(0)}, & |u|>\tfrac12-\tau, \end{cases}$ so $\psi_\tau$ is continuous and $\psi_\tau'$ is piecewise constant (0 on the flat top; a constant $>0$ on the tails). Hence $\psi_\tau''$ is a sum of Dirac masses at the junctions and is not a bounded function. Thus $\varphi'=\psi_\tau''$ is not square-integrable and the lemma cannot be applied with $\varphi=\psi_\tau'$.
* (b) “Edge strips” of width $\tau$. The proof asserts that the derivatives “are nonzero only on the two edge strips” and that “the $\nu_{\theta,\tau}$-mass of each edge region equals $\tau$.” This is incorrect for the derivatives: while the density modification from the uniform happens on a set of $\nu$-mass $2\tau$, the score derivatives $\psi_\tau'$ and $(\psi_\tau^2)'$ are nonzero on the entire tails $|u|>\tfrac12-\tau$, not just on a bounded strip: $\psi_\tau'$ is constant (positive) on the tails, and $(\psi_\tau^2)'$ grows linearly in $|u|$. Consequently the integrals $\E_{P_t}[\psi_\tau'(X_t)^2]$ and $\E_{P_t}[((\psi_\tau^2)'(X_t))^2]$ cannot be bounded by “edge mass $\times$ $L^\infty$” with the claimed $\tau$-scalings.
* (c) The scalings in (3.3)–(3.5) are wrong.
In fact, on the tails $\psi_\tau'(u)\equiv \frac{1}{4\tau^2K(0)^2}$ (a positive constant), so even under $P_t$ with $\Pr(|X_t-\theta|>\tfrac12-\tau)=1$ one has $\E_{P_t}[\psi_\tau'(X_t-\theta)^2]=\Theta(\tau^{-4})$, not $O(\tau^{-1})$. Similarly, $(\psi_\tau^2)'=2\psi_\tau\psi_\tau'$ grows linearly in $|u|$ on the tails, so its $L^2$ norm under a general $P_t$ depends on higher moments that are not controlled by a $W_2$ constraint.
* (d) Dependence on 4th moments of $Q$. Later you use $\E_Q[\psi^4]\lesssim \tau_\star^{-2}$ and $\|\psi\|_\infty\lesssim \tau_\star^{-1/2}$. Neither is true for the current $\psi$. For the Gaussian tails we have $|\psi(u)|\asymp |u|/\tau^2$ for large $|u|$; thus $\psi$ is unbounded, and $\E_Q[\psi^4]$ can be arbitrarily large, even with $W_2(Q,\mu_\theta)\le\varepsilon$, by sending a tiny mass $\delta$ far out (cost $\sim \delta R^2$ in $W_2^2$), while $\E_Q[\psi^4]\sim \delta R^4/\tau^8\to\infty$ as $R\to\infty$. So the uniform bound in \eqref{eq:risk-decomp} that relies on $\E_Q[\psi^4]$ is not valid over the whole $W_2$-ball.
* Bottom line. Lemma \ref{lem:score-moments} as stated is false. Its proof contains (i) an inapplicable use of Lemma \ref{lem:dyn-W2}, (ii) incorrect support/scaling claims, and (iii) reliance on moment bounds that fail uniformly over the $W_2$-ball.
* Two viable fixes (choose one and adjust the rest accordingly): Fix A (localize / clip the score).
Replace $\psi_\tau$ by a clipped score $\tilde\psi_{\tau,B}$ that coincides with $\psi_\tau$ on the region where $|u|\le \tfrac12-\tau + c\,\tau$ and is truncated outside so that $\tilde\psi_{\tau,B}$ and $\tilde\psi_{\tau,B}'$ are bounded and Lipschitz uniformly in $u$. Then Lemma \ref{lem:dyn-W2} applies with $\int_0^1\E_{P_t}[\tilde\psi_{\tau,B}'(X_t)^2]\,dt \lesssim \tau^{-1}$, and one can prove $\big|\E_Q[\tilde\psi_{\tau,B}]-\E_{\nu_{\theta,\tau}}[\tilde\psi_{\tau,B}]\big| \lesssim \frac{\varepsilon}{\sqrt{\tau}}$ and $\big|\E_Q[\tilde\psi_{\tau,B}']-\E_{\nu_{\theta,\tau}}[\tilde\psi_{\tau,B}']\big| \lesssim \frac{\varepsilon}{\sqrt{\tau}}$, uniformly over $Q$ with $W_2(Q,\mu_\theta)\le\varepsilon$. The clipped estimator coincides with the unclipped one whenever no observation falls in the extreme tails; for the $W_2$-ball, the clipping error can be made negligible at the $\tau$-scales of interest, and all fourth-moment issues vanish.

Fix B (smooth everywhere, not only the edges). Replace the “flat-top + Gaussian tails” $f$ by a globally smoothed log-concave kernel (e.g. convolve $\mu_\theta$ with a $C^\infty$ compactly supported bump of width $\asymp\tau$); then $\psi_\tau\in C^\infty$, $\psi_\tau'$ and $\psi_\tau''$ are bounded and supported in a strip of width $O(\tau)$, and Lemma \ref{lem:dyn-W2} yields exactly your $\tau$-scalings. This also resolves uniqueness (see 3) below).

I include drop-in text for Fix A (clipping) in §III.B and for Fix B (global smoothing) in §III.C. Either path repairs the entire chain of arguments and preserves the claimed $\varepsilon^{2/3}/n$ rate and the constants up to $\asymp$.

* What’s written. “It follows from our analysis that this solution is unique.”
* What happens. With your $f$, $\log f(\cdot;\theta,\tau)$ is flat (constant) on $|x-\theta|\le\tfrac12-\tau$. For samples $X_1',\dots,X_n'$ with $\max_i X_i'-\min_i X_i'\le 1-2\tau$, the set $\mathcal{I}=\bigcap_{i=1}^n\big[\,X_i'-(\tfrac12-\tau),\, X_i'+(\tfrac12-\tau)\big]$ is a nonempty interval, and for all $t\in\mathcal{I}$ one has $\psi_\tau(X_i'-t)=0$ for every $i$, hence $G_n(t)=0$. Thus every $t\in\mathcal{I}$ solves \eqref{eq:M-est}; the solution need not be unique. This event has positive probability, at least $(1-2\tau)^n$, under $Q=\mu_\theta$ (the probability that all $n$ samples fall in the flat top around $\theta$).
* Fix. Either: (i) tie-break the zero set by a deterministic rule (e.g. choose the midpoint of the interval of roots, or the maximizer of the concave log-likelihood, or the smallest root), and state that uniqueness is not guaranteed; or (ii) adopt Fix B above, which makes $\log f$ strictly concave in $t$ and restores uniqueness. See §III.D for a drop-in sentence/footnote.

* As noted, $\psi_\tau'$ has a jump at the two junctions; $\psi_\tau''$ is not a bounded function, so the step $|m_n(\bar t)-m_n(\theta)|\le \|\psi''\|_\infty\,|\bar t-\theta|$ is not justified. With either Fix A (the clipped score is globally Lipschitz) or Fix B (globally smoothed $f$, so $\psi''$ is bounded), the bound becomes correct and the subsequent “absorption” step is valid as written. See §III.B/§III.C.

* You write (just after \eqref{eq:risk-decomp}): $\frac{\E_Q[\psi]^2}{m(Q)^2}\ \le\ C\,\frac{\varepsilon^2\,\tau_\star}{I_\star^2} \;=\; C\,\varepsilon^2\,\tau_\star^3 \;=\; C\,\varepsilon^4/c$. This chain does not give the right scaling. Since $I_\star=\pi/\tau_\star$ entails $1/I_\star^2=\tau_\star^2/\pi^2$, and the mean bound of order $\varepsilon/\sqrt{\tau_\star}$ stated under Fix A gives $\E_Q[\psi]^2\lesssim\varepsilon^2/\tau_\star$, the correct scaling is $\frac{\E_Q[\psi]^2}{m(Q)^2}\ \lesssim\ \varepsilon^2 \tau_\star$, i.e. $O(\varepsilon^{8/3})$ when $\tau_\star=\varepsilon^{2/3}$. (This does not hurt the final rate comparison.)
* Drop-in correction: see §III.E.
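
Two of the claims above are easy to sanity-check numerically: the interval of roots that appears when all residuals sit on the flat top, and the growth of the fourth-moment contribution from a small mass sent far out at fixed transport cost. The following is a minimal sketch, not the paper's code. It assumes a signed (odd) version of $\psi_\tau$, a placeholder value `K0 = 1.0` for $K(0)$, and the convention $G_n(t)=\sum_i \psi_\tau(X_i'-t)$; the function names are hypothetical.

```python
K0 = 1.0  # hypothetical placeholder for K(0); the true value is not given here


def psi(u, tau, k0=K0):
    """Score of the flat-top density with Gaussian tails.

    Zero on |u| <= 1/2 - tau, linear in the overshoot beyond the junction.
    The sign factor (odd extension in u) is an assumption of this sketch.
    """
    a = 0.5 - tau
    scale_sq = (2.0 * tau * k0) ** 2
    overshoot = abs(u) - a
    if overshoot <= 0.0:
        return 0.0
    return (1.0 if u > 0 else -1.0) * overshoot / scale_sq


def g_n(t, xs, tau):
    """Estimating function: sum of scores at the residuals x_i - t (assumed convention)."""
    return sum(psi(x - t, tau) for x in xs)


def root_interval(xs, tau):
    """Intersection of the intervals [x_i - (1/2 - tau), x_i + (1/2 - tau)].

    Returns (lo, hi), or None when the intersection is empty.
    """
    a = 0.5 - tau
    lo, hi = max(xs) - a, min(xs) + a
    return (lo, hi) if lo <= hi else None


def far_mass_fourth_moment(radius, eps, tau):
    """Contribution to E_Q[psi^4] from mass delta = eps^2 / radius^2 at distance radius.

    The mass is chosen so that the transport cost delta * radius^2 stays at
    eps^2 (constants ignored, as in the heuristic in item (d)).
    """
    delta = eps ** 2 / radius ** 2
    return delta * psi(radius, tau) ** 4


if __name__ == "__main__":
    tau = 0.125
    xs = [0.125, 0.25, 0.5]  # sample range 0.375 <= 1 - 2*tau
    lo, hi = root_interval(xs, tau)
    print("interval of roots:", (lo, hi))
    for t in (lo, 0.5 * (lo + hi), hi):
        print("t =", t, "G_n(t) =", g_n(t, xs, tau))
    for radius in (10.0, 100.0, 1000.0):
        print("R =", radius, "contribution =", far_mass_fourth_moment(radius, 0.1, tau))
```

Every point of the returned interval is a root of the estimating function, which is why a tie-breaking rule such as the midpoint is needed, and the far-mass contribution keeps growing with the radius even though the transport budget is held fixed.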