{{DISPLAYTITLE:Matrix multiplication vs Hebbian}}

''This conversation was exported from ChatGPT''<ref>{{cite web|title=Matrix multiplication vs Hebbian|url=https://chatgpt.com/share/69371729-f090-8004-b69b-aa9f25ca0b40|publisher=ChatGPT|access-date=2025-12-27}}</ref>

== gpt-5-1 ==
=== User: Matrix multiplication in neural networks does not use the Hebbian "fire together wire together". ===
Matrix multiplication in neural networks does not use the Hebbian "fire together wire together" rule. Correct? There can be abundant calculations within matrices.

=== Assistant: Short answer: yes, that’s basically correct. Standard matrix multiplication in neural networks is not Hebbian learning. ===
Short answer: yes, that’s basically correct. Standard matrix multiplication in neural networks is not Hebbian learning.

Let’s unpack it a bit, because there’s an important subtlety.

==== When a neural network computes ====
: <math>\mathbf{y} = \mathbf{W}\mathbf{x}</math>

that’s just inference:
* <math>\mathbf{x}</math>: input activations
* <math>\mathbf{W}</math>: weights
* <math>\mathbf{y}</math>: output activations

This step is pure calculation. There is no learning rule here at all. You’re just combining signals efficiently using linear algebra.

So yes, you can have tons of internal matrix calculations without any Hebbian principle involved.

==== Hebbian learning is a weight update rule, not a forward computation: ====
: "Cells that fire together wire together."

In math form (classic version):

: <math>\Delta w_{ij} \propto x_i y_j</math>

Key point:
* Hebbian learning updates weights based on correlated activity
* It does not describe how activations are combined during inference

So Hebbian learning answers the question of how the weights should change, while matrix multiplication answers the question of how activations are combined during inference.

Different questions, different roles.
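To make the split between inference and learning concrete, here is a minimal pure-Python sketch. The function names <code>forward</code> and <code>hebbian_update</code>, the learning rate, and the toy numbers are illustrative assumptions, not part of the conversation. The sketch also assumes that the weight from input <math>i</math> to output <math>j</math> is stored at <code>W[j][i]</code>, so that <math>\mathbf{y} = \mathbf{W}\mathbf{x}</math> holds.

```python
def forward(W, x):
    """Inference only: y = W x. No weight is modified."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in W]


def hebbian_update(W, x, y, eta=0.1):
    """Learning only: return new weights, each moved by eta * y_j * x_i."""
    return [[w_ji + eta * y_j * x_i for w_ji, x_i in zip(row, x)]
            for row, y_j in zip(W, y)]


# Toy values (assumed for illustration)
W = [[0.5, 0.0],
     [0.0, 0.25]]
x = [1.0, 2.0]

y = forward(W, x)                # pure computation; W is left unchanged
W_new = hebbian_update(W, x, y)  # a separate, optional learning step
```

Calling <code>forward</code> any number of times leaves <code>W</code> untouched. Weights only change when an update rule such as <code>hebbian_update</code> is applied explicitly, and a different rule (for example a gradient step) could be swapped in without touching the forward pass.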
==== Most modern NNs use: ====
* Matrix multiplication for forward passes ✅
* Backpropagation + gradient descent for learning ✅
* Not Hebbian learning in the strict biological sense ❌

Backprop requires:
* Global error signals
* Weight symmetry (or approximations)
* Credit assignment across layers

All of which violate classic Hebbian locality.

==== You can rewrite parts of backprop so that individual weight updates look Hebbian-like: ====
: <math>\Delta w_{ij} \propto (\text{error}_j) \cdot (x_i)</math>

That resembles Hebbian correlation, but the error term is non-local, so it’s not truly Hebbian.

This is why backprop is sometimes described as Hebbian-like, which is fair, but only at an abstract level.

==== ✅ Matrix multiplication ≠ Hebbian learning ====
* ✅ You can have massive matrix computations with no learning rule at all
* ✅ Hebbian learning is about how weights change, not how activations are computed
* ✅ Modern deep learning mostly abandoned Hebb in favor of backprop

If you want, I can also explain:
* Why Hebbian learning struggles with deep networks
* How biologically plausible learning tries to replace backprop
* Or how STDP relates to all this

Just say the word 👀

== gpt-5-t-mini ==
=== User: Explain mathematically how to build the simplest Hebbian learning model, preferably one that could expand and delete abundancy ===
Explain mathematically how to build the simplest Hebbian learning model, preferably one that could expand and delete abundancy.

=== Assistant: Short answer first: Correct. ===
Short answer first: Correct. Standard matrix multiplications used in (most) artificial neural networks (e.g., fully connected layers computing <math>y = Wx + b</math>, with weight updates computed by backprop) are not Hebbian updates. Hebbian learning is a biologically inspired local rule like “cells that fire together wire together” that directly updates weights using pre- and postsynaptic activity (no global error gradient required).
Modern deep nets typically use global error signals and gradient descent rather than pure Hebbian updates, although Hebbian-like terms can be mixed into training.

Below I’ll (1) give the simplest mathematical Hebbian model, (2) show small stabilizing variants so weights don’t blow up, and (3) show simple, practical mechanisms to expand (grow) and delete (prune) connections (“abundancy control”). I’ll finish with a compact algorithm you can implement and a short worked numeric example.

=== 1. Basic Hebb (discrete-time, single synapse) ===
Let <math>x</math> be presynaptic activity and <math>y</math> postsynaptic activity (scalars). The simplest Hebbian update is

: <math>\Delta w = \eta \, x \, y,</math>

where <math>\eta>0</math> is the learning rate and the new weight is <math>w \leftarrow w + \Delta w</math>.

Interpretation: if pre and post are simultaneously large (same sign), the weight grows.

=== 2. Vector / matrix form ===
For a layer with presynaptic vector <math>\mathbf{x}\in\mathbb{R}^n</math> and postsynaptic vector <math>\mathbf{y}\in\mathbb{R}^m</math>, the outer-product Hebb rule is

: <math>\Delta W = \eta \,\mathbf{y}\,\mathbf{x}^\top</math>

(where <math>W</math> is an <math>m\times n</math> weight matrix mapping <math>\mathbf{x}</math> to something that influences <math>\mathbf{y}</math>). This says each synapse <math>w_{ij}</math> (from presynaptic unit <math>j</math> to postsynaptic unit <math>i</math>) gets

: <math>\Delta w_{ij} = \eta\, y_i\, x_j.</math>

=== 3. Problem: unbounded growth and trivial solutions ===
Pure Hebb tends to make weights grow without bound. Two classic remedies:

===== (a) Oja’s rule (normalization) =====
Oja’s rule is a simple modification that keeps weight norms bounded:

: <math>\Delta \mathbf{w} = \eta \big(y\mathbf{x} - y^2\mathbf{w}\big)</math>

or, in matrix form,

: <math>\Delta W = \eta\big(\mathbf{y}\mathbf{x}^\top - \operatorname{diag}(\mathbf{y}\circ \mathbf{y})\, W \big)</math>

(row <math>i</math> of <math>W</math> decays in proportion to <math>y_i^2</math>; for a single postsynaptic unit, <math>\operatorname{diag}(\mathbf{y}\circ\mathbf{y})</math> reduces to the scalar <math>y^2</math>).
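As a rough illustration of the single-neuron form above, the following pure-Python sketch compares plain Hebb with Oja’s rule on a tiny fixed data set. It assumes a linear neuron <math>y = \mathbf{w}\cdot\mathbf{x}</math>; the helper names (<code>train</code>, <code>plain_hebb</code>, <code>oja</code>, <code>norm</code>), the toy inputs, the starting weights, and the hyperparameters are all hypothetical choices made for this example.

```python
import math


def train(xs, rule, eta=0.01, epochs=500):
    """Apply an online single-neuron update rule over the data for several epochs."""
    w = [0.1, 0.1]  # small nonzero starting weights (assumed)
    for _ in range(epochs):
        for x in xs:
            # Linear neuron (assumed): postsynaptic activity y = w . x
            y = sum(wi * xi for wi, xi in zip(w, x))
            w = rule(w, x, y, eta)
    return w


def plain_hebb(w, x, y, eta):
    """Delta w = eta * y * x"""
    return [wi + eta * y * xi for wi, xi in zip(w, x)]


def oja(w, x, y, eta):
    """Delta w = eta * (y * x - y^2 * w)"""
    return [wi + eta * (y * xi - y * y * wi) for wi, xi in zip(w, x)]


def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))


# Toy inputs (assumed for illustration)
xs = [(1.0, 0.0), (0.0, 0.5), (1.0, 0.5), (-1.0, -0.5)]

hebb_norm = norm(train(xs, plain_hebb))  # keeps growing with training length
oja_norm = norm(train(xs, oja))          # settles close to 1
```

Under these assumptions the plain Hebbian weight norm grows steadily with the number of updates, while the Oja weight norm stays close to 1. The only difference between the two rules is the <math>-y^2\mathbf{w}</math> decay term.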
Oja’s rule is Hebbian with a weight-dependent decay proportional to <math>y^2</math>, which implements normalization and makes the neuron extract the first principal component of <math>\mathbf{x}</math> under mild conditions.

===== (b) BCM rule (sliding threshold) =====
Bienenstock–Cooper–Munro (BCM):

: <math>\Delta w_{ij} = \eta\, x_j\, y_i \big(y_i - \theta_i\big),</math>

where <math>\theta_i</math> is a sliding threshold that depends on the recent average of <math>y_i^2</math>. This produces both potentiation and depression depending on whether <math>y_i</math> is above or below the threshold and leads to stable competition/homeostasis.

=== 4. Additive vs multiplicative weight dependence ===
To avoid runaway growth while preserving locality, use a weight-dependent decay:

: <math>\Delta w_{ij} = \eta \, x_j\, y_i - \lambda\, f(w_{ij}),</math>

where <math>\lambda</math> controls pruning pressure and <math>f(w)</math> might be <math>w</math> (L2) or <math>\mathrm{sign}(w)</math> (approx L1). Setting <math>f(w)=w</math> is equivalent to weight decay.

=== 5. Mechanisms to expand (grow) and delete (prune) connections ===
These are local, implementable rules consistent with Hebbian ideas.

==== Pruning (delete) ====
* Threshold pruning: remove a synapse if <math>|w_{ij}| < \theta_{\text{prune}}</math> for a certain time window. Equivalent: if a connection never gets correlated activity, it vanishes.
* L1-driven shrinkage: apply an L1 penalty each update (soft-threshold), <math>w \leftarrow \text{sign}(w)\,\max(0,|w| - \lambda)</math>, which pushes small weights to zero.
* Activity-dependent pruning: accumulate a usage trace <math>u_{ij}(t)</math> (e.g. an exponential moving average of <math>|x_j y_i|</math>). If <math>u_{ij}</math> falls below a threshold for long enough, delete the synapse.

==== Growth (expand) ====
* Correlation-triggered growth: allow potential connections to be ‘dormant’ with weight <math>0</math>.
If a pair <math>(i,j)</math> shows correlation above a threshold <math>c_{\text{grow}}</math> (e.g. the exponential moving average (EMA) of <math>x_j y_i</math> exceeds <math>c_{\text{grow}}</math>), instantiate a new synapse and set its initial weight to <math>w_{ij} \leftarrow \gamma\cdot \text{corr}</math> or a small random value.
* Stochastic growth: periodically sample pairs with high pre/post activity and with some probability create a synapse.
* Resource-limited growth: limit the total number of connections to <math>K_{\max}</math>. When a candidate correlation tries to create a connection but capacity is reached, replace the weakest existing connection (or use probabilistic replacement).

==== Simple local bookkeeping traces ====
Let

: <math>c_{ij}(t) \leftarrow (1-\rho)\,c_{ij}(t-1) + \rho\, x_j(t)\,y_i(t)</math>

Then:
* If <math>c_{ij} > c_{\text{grow}}</math> and <math>w_{ij}=0</math>, create <math>w_{ij}\leftarrow \gamma\, c_{ij}</math>.
* If <math>c_{ij} < c_{\text{prune}}</math> for <math>T</math> steps and <math>|w_{ij}|<\theta_{\text{small}}</math>, remove the synapse.

=== 6. Putting it together: a compact algorithm ===
Parameters: <math>\eta</math> (Hebb rate), <math>\lambda</math> (decay), <math>\rho</math> (trace smoothing), <math>\gamma</math> (growth gain), <math>c_{\text{grow}}</math>, <math>c_{\text{prune}}</math>, <math>\theta_{\text{small}}</math>, <math>K_{\max}</math>.

For each time step with input <math>\mathbf{x}(t)</math> and resulting postsynaptic activity <math>\mathbf{y}(t)</math>:
# Update local correlation traces: <math>c_{ij} \leftarrow (1-\rho)c_{ij} + \rho\, x_j\, y_i.</math>
# Weight update (Hebbian + decay): <math>w_{ij} \leftarrow w_{ij} + \eta\, x_j\, y_i - \lambda\, w_{ij}.</math> (Or use Oja: <math>w_{ij} \leftarrow w_{ij} + \eta(x_j y_i - y_i^2 w_{ij})</math>.)
# Prune: if <math>|w_{ij}| < \theta_{\text{small}}</math> and <math>c_{ij} < c_{\text{prune}}</math> (for some time), set <math>w_{ij}=0</math> and mark the synapse as absent.
# Grow:
#* If <math>w_{ij}=0</math>, <math>c_{ij} > c_{\text{grow}}</math>, and the total number of connections is below <math>K_{\max}</math>, create the connection with <math>w_{ij} \leftarrow \gamma\, c_{ij}</math> (small <math>\gamma</math>).
#* If <math>K_{\max}</math> is reached, optionally replace the weakest existing connection if the candidate's <math>c_{ij}</math> is larger.

Apart from the global connection budget <math>K_{\max}</math>, this is fully local: each synapse needs only its <math>c_{ij}</math>, <math>w_{ij}</math>, and maybe a small timer.

=== 7. Continuous-time (differential) perspective ===
Hebbian-like plasticity can be written as ODEs:

: <math>\dot w = \eta\, x(t)\, y(t) - \lambda\, w(t).</math>

Convergence and the stationary distribution can be analyzed by averaging theory. With appropriate decay/normalization you get stable weight distributions.

=== 8. A tiny numeric example (2 presyn → 1 postsyn, scalar) ===
Let <math>\mathbf{x}=(x_1,x_2)</math>, with a single postsynaptic unit <math>y</math>. Initialize <math>w=[0.1,\,0.0]</math>. Choose <math>\eta=0.5</math>, <math>\lambda=0.1</math>.

Suppose at time <math>t</math>: <math>x=[1,1]</math>, postsynaptic activity <math>y=1</math>.

Update:
* Hebb term: <math>\eta x_j y = 0.5\cdot 1 \cdot 1 = 0.5</math> for each presynaptic input.
* Decay term: <math>\lambda w_j = 0.1\cdot w_j</math>.

So the new weights are:
* <math>w_1 \leftarrow 0.1 + 0.5 - 0.1\cdot 0.1 = 0.1 + 0.5 - 0.01 = 0.59.</math>
* <math>w_2 \leftarrow 0 + 0.5 - 0 = 0.5.</math>

If you then apply a pruning threshold <math>\theta_{\text{small}}=0.05</math>, neither gets pruned. If later <math>x_2</math> rarely correlates with <math>y</math>, its trace <math>c_2</math> decays, its weight <math>w_2</math> shrinks through the decay term, and you can prune it.

=== 9. Practical tips and choices ===
* Use Oja’s rule if you want a simple, stable single-neuron PCA behavior.
* Use BCM if you want activity-dependent potentiation/depression and sliding thresholds that lead to selectivity.
* For sparse networks, prefer L1 or activity-based pruning plus resource-limited growth.
* Choose trace smoothing <math>\rho</math> so traces average over the timescale of patterns you want to capture.
* Keep growth conservative (small initial weights) to avoid destabilizing the network.
* If mixing Hebbian with global objectives (e.g. supervised loss), combine local Hebbian updates with gradient-based updates carefully (possibly with different time scales).

If you want, I can:
* Give a short runnable Python snippet implementing the algorithm above (online Hebb with Oja normalization + grow/prune).
* Show a toy simulation (plots of weights over time) to demonstrate expand/prune behavior.

Which would you prefer?

== References ==
<references />

[[Category:ChatGPT Shares]]
[[Category:openai]]