=== Assistant: Short answer: there’s no single fixed number. ===
Short answer: there’s no single fixed number. The amount of data Solomonoff induction needs is roughly the algorithmic (Kolmogorov) complexity of the true Newtonian model (in bits), plus the bits needed to specify the continuous parameters to the precision you want, scaled by how small you require the predictor’s error to be. And Solomonoff induction is uncomputable, so these bounds are theoretical.

Why (brief derivation and usable formula)
* Let \(K(\mu)\) be the prefix Kolmogorov complexity (in bits) of the true data-generating measure \(\mu\) (this encodes the form of Newton’s laws as a program).
* Solomonoff’s universal prior \(M\) satisfies a convergence / cumulative-loss bound of the form
\[ \sum_{t=1}^{T} \mathbb{E}\!\left[ \ln \frac{\mu(x_t \mid x_{<t})}{M(x_t \mid x_{<t})} \right] \;\le\; \big(K(\mu) + O(1)\big)\ln 2 . \]
* From this you can read off a rough sample-size estimate: to make the average per-step excess log-loss at most \(\varepsilon\),
\[ T \;\gtrsim\; \frac{K(\mu)\ln 2}{\varepsilon}. \]
* Interpreting \(\varepsilon\) as the tolerated per-step KL divergence (excess log-loss) gives a concrete, though coarse, relation between desired accuracy and data.

Concrete interpretation for Newtonian laws
* \(K(\mu)\): the program length for “Newton’s laws + solver/measurement model” is small (maybe a few hundred bits) relative to many real-world models.
* But you must also encode continuous parameters (masses, the gravitational constant, initial conditions) to the required precision. Specifying a real-valued parameter to \(d\) decimal digits takes about \(d \cdot \log_2(10) \approx 3.322\,d\) bits, so \(p\) parameters at \(d\) decimal digits each add roughly \(3.322\,p\,d\) bits to the complexity budget.
* Noise, model mismatch, and unmodelled forces increase the effective complexity and thus the required data.

Quick numeric example
* Suppose the compact Newtonian form has \(K(\mu) = 1000\) bits (including the measurement/noise model) and you want average per-step log-loss \(\varepsilon = 10^{-3}\). Then
\[ T \;\gtrsim\; \frac{1000 \cdot \ln 2}{10^{-3}} \;\approx\; 693{,}000 \]
observations (order of magnitude). If you also need to pin down 5 parameters to 10 decimal digits each, that adds \(\approx 5 \times 33.2 \approx 166\) bits to \(K(\mu)\), which is minor compared with 1000 in this toy example. A short calculation sketch follows the takeaway below.

Important caveats
* Solomonoff induction is uncomputable; the bounds are information-theoretic, not an algorithm you can run. Practical learning uses computable approximations (MDL, Bayesian model selection, algorithmic approximations, Occam-penalty priors).
* Identifiability matters: some dynamics are only learnable up to transformations unless you observe the right variables and timescales.
* The bound above is a high-level log-loss bound. Translating it into, say, parameter-estimation error requires additional statistical analysis and depends on the noise distribution, experiment design, and model class.

Takeaway
* The minimal data scales with the bit-size of the description of Newtonian mechanics plus the bits needed for parameter precision, and inversely with the tolerated per-step log-loss. For any concrete experimental precision you can plug numbers into
\[ T \;\approx\; \frac{\big(K(\text{theory}) + \text{bits for params}\big)\ln 2}{\varepsilon} \]
to get an order-of-magnitude sample-size estimate.
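Below is a minimal calculation sketch of the takeaway formula, assuming the toy figures from the example above (a 1000-bit theory, 5 parameters at 10 decimal digits, \(\varepsilon = 10^{-3}\)); the function names and the use of Python are illustrative and not part of the original answer.

<syntaxhighlight lang="python">
# Order-of-magnitude sample-size estimate from the bound
#   T >= (K(theory) + parameter bits) * ln 2 / epsilon.
# All concrete numbers are the toy values from the text, not measured data.
import math

def parameter_bits(num_params: int, decimal_digits: int) -> float:
    """Bits to encode num_params real parameters to decimal_digits digits each:
    about d * log2(10) ~ 3.322 * d bits per parameter."""
    return num_params * decimal_digits * math.log2(10)

def sample_size_estimate(k_theory_bits: float, extra_bits: float, eps: float) -> float:
    """Observations T needed so the average per-step excess log-loss is below eps (nats)."""
    return (k_theory_bits + extra_bits) * math.log(2) / eps

if __name__ == "__main__":
    k_bits = 1000.0                   # assumed complexity of "Newton's laws + noise model"
    extra = parameter_bits(5, 10)     # 5 parameters to 10 decimal digits: ~166 bits
    T = sample_size_estimate(k_bits, extra, eps=1e-3)
    print(f"parameter bits ~ {extra:.0f}")   # ~166
    print(f"sample size T  ~ {T:,.0f}")      # ~808,000 observations
</syntaxhighlight>

With the extra 166 parameter bits included, the estimate moves from roughly 693,000 to roughly 808,000 observations, the same order of magnitude, consistent with the "minor in this toy" remark above.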