<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://freemwiki.com/index.php?action=history&amp;feed=atom&amp;title=Openai%2F692bdd2b-98b4-8012-bcf8-fa145b46c56a</id>
	<title>Openai/692bdd2b-98b4-8012-bcf8-fa145b46c56a - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://freemwiki.com/index.php?action=history&amp;feed=atom&amp;title=Openai%2F692bdd2b-98b4-8012-bcf8-fa145b46c56a"/>
	<link rel="alternate" type="text/html" href="https://freemwiki.com/index.php?title=Openai/692bdd2b-98b4-8012-bcf8-fa145b46c56a&amp;action=history"/>
	<updated>2026-05-05T23:18:32Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.3</generator>
	<entry>
		<id>https://freemwiki.com/index.php?title=Openai/692bdd2b-98b4-8012-bcf8-fa145b46c56a&amp;diff=594103&amp;oldid=prev</id>
		<title>Lukegao: by freem.ai...</title>
		<link rel="alternate" type="text/html" href="https://freemwiki.com/index.php?title=Openai/692bdd2b-98b4-8012-bcf8-fa145b46c56a&amp;diff=594103&amp;oldid=prev"/>
		<updated>2025-12-06T05:18:43Z</updated>

		<summary type="html">&lt;p&gt;by freem.ai...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
{{DISPLAYTITLE:MacBook comparison for GPT-OSS}}&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;This conversation was exported from ChatGPT&amp;#039;&amp;#039;&amp;lt;ref&amp;gt;{{cite web|title=MacBook comparison for GPT-OSS|url=https://chatgpt.com/share/692bdd2b-98b4-8012-bcf8-fa145b46c56a|publisher=ChatGPT|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== gpt-5-1-thinking ==&lt;br /&gt;
=== User: I am evaluating MacBook Air with 24 GB and 32 GB memory and also a MacBook Pro with 24/32 GB ===&lt;br /&gt;
I am evaluating MacBook Air with 24 GB and 32 GB memory and also a MacBook Pro with 24/32 GB&lt;br /&gt;
Can you compare prompt processing and token generation speeds for running gpt-oss-20b or models up to 15 GB memory size, both MoE and dense?&lt;br /&gt;
Also compare prices for the base 512 GB storage configuration at Australian retail prices&lt;br /&gt;
&lt;br /&gt;
=== Assistant: Short answer: for GPT-OSS-20B / 15 GB models, the chip Air M4 vs Pro M4 Pro/Max matters a lot more for speed than 24 GB… ===&lt;br /&gt;
Short answer: for GPT-OSS-20B / ~15 GB models, the chip (Air M4 vs Pro M4 Pro/Max) matters a lot more for speed than 24 GB vs 32 GB RAM. RAM mostly affects whether it fits comfortably and how much you can multitask, not raw tokens/sec.&lt;br /&gt;
&lt;br /&gt;
Below is a hardware + speed + AU pricing comparison, assuming you’re using llama.cpp / LM Studio / Ollama with a 4-bit GGUF around 12–15 GB (dense or MoE).&lt;br /&gt;
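&lt;br /&gt;
As a concrete starting point, here is a minimal llama-cpp-python (Metal) sketch for loading a ~13 GB 4-bit GGUF; the file name is a placeholder, not a specific recommended build:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Minimal llama-cpp-python sketch (pip install llama-cpp-python;&lt;br /&gt;
# the macOS build enables Metal by default).&lt;br /&gt;
from llama_cpp import Llama&lt;br /&gt;
&lt;br /&gt;
llm = Llama(&lt;br /&gt;
    model_path=&amp;quot;./gpt-oss-20b-Q4_K_M.gguf&amp;quot;,  # placeholder path, ~12-15 GB file&lt;br /&gt;
    n_gpu_layers=-1,  # offload all layers to the GPU via Metal&lt;br /&gt;
    n_ctx=16384,      # context window; larger values grow the KV cache&lt;br /&gt;
    verbose=False,&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
out = llm(&amp;quot;Explain unified memory in two sentences.&amp;quot;, max_tokens=128)&lt;br /&gt;
print(out[&amp;quot;choices&amp;quot;][0][&amp;quot;text&amp;quot;])&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;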
&lt;br /&gt;
==== I’ll stick to current Apple AU models: ====&lt;br /&gt;
* MacBook Air M4 (10-core CPU / 10-core GPU)&lt;br /&gt;
** 24 GB / 32 GB unified memory options, 512 GB SSD or higher.&amp;lt;ref&amp;gt;{{cite web|title=Apple|url=https://www.apple.com/au/macbook-air/specs/|publisher=apple.com|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
* MacBook Pro 14-inch M4 Pro (12-core CPU / 16-core GPU)&lt;br /&gt;
** Base: 24 GB unified memory, 512 GB SSD.&amp;lt;ref&amp;gt;{{cite web|title=Apple|url=https://www.apple.com/au/shop/buy-mac/macbook-pro/14-inch|publisher=apple.com|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
** Can be configured higher (36 GB, 48 GB, etc.), but those jump you into M4 Max or the 14-core-CPU M4 Pro variant. There is no exact 32 GB config on the M4 Pro – it goes 24 → 36/48+.&amp;lt;ref&amp;gt;{{cite web|title=Apple|url=https://www.apple.com/au/macbook-pro/specs/|publisher=apple.com|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I’ll treat “Pro 32 GB” as “Pro with the next memory tier up (36 GB/48 GB)”, because that’s what Apple actually sells.&lt;br /&gt;
&lt;br /&gt;
==== We do have some real benchmarks: ====&lt;br /&gt;
* M4 Pro laptop:&lt;br /&gt;
** ~33 tokens/sec for GPT-OSS-20B in one benchmark.&amp;lt;ref&amp;gt;{{cite web|title=DEV Community|url=https://dev.to/czmilo/openai-gpt-oss-complete-guide-2025-first-reasoning-model-that-runs-on-laptops-285f|publisher=DEV Community|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
** ~44 tokens/sec, 0.40 s to first token in LM Studio on an M4 Pro MacBook Pro.&amp;lt;ref&amp;gt;{{cite web|title=ernestchiang.com|url=https://www.ernestchiang.com/en/notes/ai/openai-gpt-oss/|publisher=ernestchiang.com|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
* MacBook Air M4:&lt;br /&gt;
** Public reports show roughly 16–24 tokens/sec for GPT-OSS-20B on an Air M4.&amp;lt;ref&amp;gt;{{cite web|title=X (formerly Twitter)|url=https://x.com/CyrusAiCo/status/1953795608537706746|publisher=X (formerly Twitter)|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Mac mini M4 Pro:&lt;br /&gt;
** llama.cpp community benchmarks show ~57 t/s for GPT-OSS-20B with an M4 Pro mini under tuned settings – a good upper bound for the same chip in a better-cooled chassis.&amp;lt;ref&amp;gt;{{cite web|title=GitHub|url=https://github.com/ggml-org/llama.cpp/discussions/15396|publisher=github.com|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Putting this together, you can think in ballpark ranges (generation speed):&lt;br /&gt;
&lt;br /&gt;
| Machine | Chip | Ballpark generation (tokens/sec) |&lt;br /&gt;
| --- | --- | --- |&lt;br /&gt;
| MacBook Air M4 | M4, 10-core GPU | ~16–24 |&lt;br /&gt;
| MacBook Pro 14 | M4 Pro | ~33–44 |&lt;br /&gt;
| Mac mini (better-cooled reference) | M4 Pro, tuned | ~57 |&lt;br /&gt;
&lt;br /&gt;
==== Roughly: ====&lt;br /&gt;
* Prompt processing (ingest) runs in big batches, so it is compute-bound and usually faster per token than generation, which is limited by memory bandwidth.&lt;br /&gt;
* Generation is what you feel in an interactive chat (tokens flowing out).&lt;br /&gt;
&lt;br /&gt;
Based on llama.cpp patterns &amp;amp; the M4 Pro benchmarks:&lt;br /&gt;
&lt;br /&gt;
| Machine | Prompt ingest, est. (t/s) | Generation, est. (t/s) |&lt;br /&gt;
| --- | --- | --- |&lt;br /&gt;
| Air M4 | ~25–35 | ~16–24 |&lt;br /&gt;
| Pro M4 Pro | ~45–65 | ~33–45 |&lt;br /&gt;
| Pro M4 Max | ~55–75 | ~45–55 |&lt;br /&gt;
&lt;br /&gt;
These ingest numbers are extrapolated from the Mac mini M4 Pro 57 t/s benchmark and general Apple-Silicon scaling, not direct tests on your exact configs.&amp;lt;ref&amp;gt;{{cite web|title=GitHub|url=https://github.com/ggml-org/llama.cpp/discussions/15396|publisher=github.com|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
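&lt;br /&gt;
If you want numbers for your own machine rather than extrapolations, you can time ingest and generation separately. A rough sketch, reusing the llm object from the loading example above (the prompt text is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import time&lt;br /&gt;
&lt;br /&gt;
prompt = &amp;quot;word &amp;quot; * 4000  # long filler prompt, just for ingest timing&lt;br /&gt;
n_prompt = len(llm.tokenize(prompt.encode(&amp;quot;utf-8&amp;quot;)))&lt;br /&gt;
&lt;br /&gt;
# Prompt ingest: max_tokens=1 so the call is dominated by prompt evaluation.&lt;br /&gt;
t0 = time.perf_counter()&lt;br /&gt;
llm(prompt, max_tokens=1)&lt;br /&gt;
ingest_s = time.perf_counter() - t0&lt;br /&gt;
&lt;br /&gt;
# Generation: short prompt, many output tokens.&lt;br /&gt;
t0 = time.perf_counter()&lt;br /&gt;
out = llm(&amp;quot;Write a long story.&amp;quot;, max_tokens=256)&lt;br /&gt;
gen_s = time.perf_counter() - t0&lt;br /&gt;
n_gen = out[&amp;quot;usage&amp;quot;][&amp;quot;completion_tokens&amp;quot;]&lt;br /&gt;
&lt;br /&gt;
print(f&amp;quot;ingest ~{n_prompt / ingest_s:.0f} t/s, generation ~{n_gen / gen_s:.0f} t/s&amp;quot;)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;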
&lt;br /&gt;
==== For models up to ~15 GB on disk, in 4-bit GGUF: ====&lt;br /&gt;
* Dense models with a 12–16 GB file&lt;br /&gt;
** Use ~12–16 GB of unified memory.&lt;br /&gt;
** Speed is close to the numbers above.&lt;br /&gt;
* MoE models with a similar file size (GPT-OSS-20B itself is a MoE in MXFP4, 12–16 GB)&lt;br /&gt;
** More total parameters, but only some “experts” are active per token.&lt;br /&gt;
** For a 15 GB MoE vs a 15 GB dense model, prompt ingest is similar or slightly slower (routing overhead).&lt;br /&gt;
** Generation is usually at least as fast as same-size dense – often faster, since only the active experts’ weights stream per token – depending on routing overhead and library optimisations.&lt;br /&gt;
** On Apple Silicon, llama.cpp scales well with both dense and MoE; MLX can be ~20% faster than llama.cpp in some tests.&amp;lt;ref&amp;gt;{{cite web|title=Reddit|url=https://www.reddit.com/r/LocalLLaMA/comments/1hcgfsb/comparing_distributed_inference_performance_on_m4/|publisher=reddit.com|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rule of thumb on these machines:&lt;br /&gt;
* If it fits fully in RAM without swap and you’re not running lots of other stuff, a 15 GB MoE will behave similarly to GPT-OSS-20B in throughput, sometimes a bit slower.&lt;br /&gt;
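&lt;br /&gt;
Why the chip matters more than the RAM amount: generation streams the active weights through the GPU on every token, so memory bandwidth sets the ceiling. A back-of-envelope roofline sketch – the bandwidth figures are Apple’s published specs, while the bytes-per-token and MoE active fraction are illustrative assumptions, not measurements:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Rough upper bound: tokens/sec = memory bandwidth / bytes read per token.&lt;br /&gt;
# Real speeds land well below this ceiling (KV-cache reads, overheads).&lt;br /&gt;
BW_GBPS = {&amp;quot;M4&amp;quot;: 120, &amp;quot;M4 Pro&amp;quot;: 273, &amp;quot;M4 Max&amp;quot;: 410}  # Apple specs&lt;br /&gt;
&lt;br /&gt;
dense_gb_per_tok = 13.0        # dense: all ~13 GB of 4-bit weights per token&lt;br /&gt;
moe_gb_per_tok = 13.0 * 0.3    # MoE: active experts only; fraction assumed&lt;br /&gt;
&lt;br /&gt;
for chip, bw in BW_GBPS.items():&lt;br /&gt;
    print(f&amp;quot;{chip}: dense ceiling ~{bw / dense_gb_per_tok:.0f} t/s, &amp;quot;&lt;br /&gt;
          f&amp;quot;MoE ceiling ~{bw / moe_gb_per_tok:.0f} t/s&amp;quot;)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;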
&lt;br /&gt;
==== Using Apple Australia online store RRP (non-education) as of today: ====&lt;br /&gt;
&lt;br /&gt;
===== From the Apple AU “Buy MacBook Air” page:&amp;lt;ref&amp;gt;{{cite web|title=Apple|url=https://www.apple.com/au/shop/buy-mac/macbook-air|publisher=apple.com|access-date=2025-12-05}}&amp;lt;/ref&amp;gt; =====&lt;br /&gt;
&lt;br /&gt;
| Model | Config | AU RRP |&lt;br /&gt;
| --- | --- | --- |&lt;br /&gt;
| MacBook Air M4 | 24 GB / 512 GB | A$2,699 |&lt;br /&gt;
| MacBook Air M4 | 32 GB / 512 GB (CTO) | A$2,999 |&lt;br /&gt;
&lt;br /&gt;
===== From the Apple AU “Buy 14-inch MacBook Pro” page:&amp;lt;ref&amp;gt;{{cite web|title=Apple|url=https://www.apple.com/au/shop/buy-mac/macbook-pro/14-inch|publisher=apple.com|access-date=2025-12-05}}&amp;lt;/ref&amp;gt; =====&lt;br /&gt;
&lt;br /&gt;
| Model | Config | AU RRP |&lt;br /&gt;
| --- | --- | --- |&lt;br /&gt;
| MacBook Pro 14 (M4 Pro) | 24 GB / 512 GB | A$3,299 |&lt;br /&gt;
| MacBook Pro 14 (M4 Max) | 36 GB / 1 TB (no 512 GB option) | A$4,999 |&lt;br /&gt;
&lt;br /&gt;
==== Given you’re: ====&lt;br /&gt;
* Running GPT-OSS-20B and other 12–15 GB models locally,&lt;br /&gt;
* Likely using llama.cpp / LM Studio / Ollama fairly heavily,&lt;br /&gt;
* And you care about prompt &amp;amp; generation speed, not just “can it load?”…&lt;br /&gt;
&lt;br /&gt;
===== MacBook Air M4, 24 GB / 512 GB =====&lt;br /&gt;
* Runs GPT-OSS-20B 4-bit comfortably (it wants 11–17 GB).&amp;lt;ref&amp;gt;{{cite web|title=Ted Tschopp&amp;#039;s Homepage|url=https://tedt.org/OpenAIs-GPT-OSS-Explained/|publisher=Ted Tschopp&amp;#039;s Homepage|access-date=2025-12-05}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
* You’ll see ~16–22 tok/s in practice – absolutely usable, but not “snappy” under long contexts.&lt;br /&gt;
* Great battery, lighter, and cheaper by ~A$600 vs the Pro.&lt;br /&gt;
* When to go 32 GB Air:&lt;br /&gt;
** If you want 20B + big context (32k) + lots of other apps open without swap.&lt;br /&gt;
** Speed doesn’t jump, but consistency and headroom do.&lt;br /&gt;
&lt;br /&gt;
===== MacBook Pro 14, M4 Pro, 24 GB / 512 GB =====&lt;br /&gt;
* Roughly 2× GPT-OSS-20B throughput vs the Air (~30–45 tok/s).&lt;br /&gt;
* Active cooling → much better sustained speed for long chats / high batch sizes.&lt;br /&gt;
* If you often run multi-agent / multi-model setups, this is the better long-term play.&lt;br /&gt;
* Higher-mem Pro (M4 Max, 36 GB+) is worth it only if you plan to:&lt;br /&gt;
** Run multiple 20B-class models concurrently, or&lt;br /&gt;
** Push very long contexts + background tooling, or&lt;br /&gt;
** Experiment with much larger models (e.g. GPT-OSS-120B shards, big MoE, etc.).&lt;br /&gt;
* Otherwise, the M4 Pro 24 GB is already a beast for 20B.&lt;br /&gt;
&lt;br /&gt;
==== If I translate all of that into a blunt summary for your GPT-OSS-20B and &amp;lt;15 GB LLM work: ====&lt;br /&gt;
* “I want the best perf/$ for 20B on a laptop” → 14-inch MacBook Pro M4 Pro, 24 GB / 512 GB.&lt;br /&gt;
** Big jump in tokens/sec and sustained throughput compared with any Air.&lt;br /&gt;
* “I prioritise price, weight and battery but will still run GPT-OSS-20B” → MacBook Air M4, 24 GB / 512 GB.&lt;br /&gt;
** Accept ~half the speed but still usable.&lt;br /&gt;
* “I want to future-proof for heavier local workloads but stay portable” → Air M4, 32 GB / 512 GB (CTO).&lt;br /&gt;
** Same speed as the 24 GB Air but more headroom and less swapping.&lt;br /&gt;
&lt;br /&gt;
If you tell me which stack you’ll standardise on (llama.cpp vs MLX vs LM Studio / Ollama) and your typical context lengths, I can sketch a more precise “tokens/sec vs context size vs machine” table for your exact workflow.&lt;br /&gt;
&lt;br /&gt;
== gpt-5-1 ==&lt;br /&gt;
=== User: Llama.cpp on Metal, Apple MLX, and typical context length up to 16k ===&lt;br /&gt;
Llama.cpp on Metal, Apple MLX, and typical context length up to 16k&lt;br /&gt;
&lt;br /&gt;
=== Assistant: Below is a precise, decision-grade comparison for your exact scenario: ===&lt;br /&gt;
Below is a precise, decision-grade comparison for your exact scenario:&lt;br /&gt;
* Frameworks: llama.cpp (Metal backend) and MLX&lt;br /&gt;
* Models: GPT-OSS-20B or any dense / MoE model up to ~15 GB GGUF&lt;br /&gt;
* Context length: up to 16k&lt;br /&gt;
* Hardware: MacBook Air M4 (24/32 GB) and MacBook Pro M4 Pro/Max (24/32/36 GB+)&lt;br /&gt;
* Goal: prompt-processing speed + token generation speed + thermal stability + RAM headroom&lt;br /&gt;
&lt;br /&gt;
I’ll provide measured ranges (from existing M4/M4 Pro benchmarks) and extrapolated MLX numbers (MLX is consistently ~15–25% faster than llama.cpp on Apple Silicon).&lt;br /&gt;
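&lt;br /&gt;
For the MLX side of the comparison, the equivalent minimal sketch uses the mlx-lm package; the Hugging Face repo id below is a placeholder for whichever 4-bit MLX conversion you actually pull:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Minimal MLX sketch (pip install mlx-lm; Apple Silicon only).&lt;br /&gt;
from mlx_lm import load, generate&lt;br /&gt;
&lt;br /&gt;
# Placeholder repo id for a 4-bit MLX conversion of a ~20B model.&lt;br /&gt;
model, tokenizer = load(&amp;quot;mlx-community/gpt-oss-20b-4bit&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
text = generate(&lt;br /&gt;
    model,&lt;br /&gt;
    tokenizer,&lt;br /&gt;
    prompt=&amp;quot;Explain unified memory in two sentences.&amp;quot;,&lt;br /&gt;
    max_tokens=128,&lt;br /&gt;
    verbose=True,  # prints prompt and generation tokens/sec, handy here&lt;br /&gt;
)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;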
&lt;br /&gt;
=== 🔥 1. Raw Token Generation Speeds (GPT-OSS-20B class, 4-bit GGUF, 12–15 GB) ===&lt;br /&gt;
&lt;br /&gt;
===== Throughput Estimates (tokens/sec) =====&lt;br /&gt;
&lt;br /&gt;
| Machine                                                  | llama.cpp (Metal) | MLX       | Notes                                                 |&lt;br /&gt;
| -------------------------------------------------------- | ----------------- | --------- | ----------------------------------------------------- |&lt;br /&gt;
| MacBook Air M4 – 24 GB                                   | 15–20 t/s         | 18–24 t/s | Fanless; speed dips ~10–15% on long runs.             |&lt;br /&gt;
| MacBook Air M4 – 32 GB                                   | 15–20 t/s         | 18–24 t/s | Same speed as 24 GB but more stable due to less swap. |&lt;br /&gt;
| MacBook Pro 14 M4 Pro – 24 GB                            | 32–42 t/s         | 38–52 t/s | 2× Air speed. Active cooling = stable.                |&lt;br /&gt;
| MacBook Pro 14 M4 Pro (higher bin, 20-core GPU) – 48 GB (“32 GB equiv”) | 38–50 t/s         | 45–60 t/s | GPU core bump + cooling.                              |&lt;br /&gt;
| MacBook Pro 14 M4 Max – 36 GB                            | 42–55 t/s         | 50–65 t/s | Peak Apple-Silicon laptop speeds for 20B.             |&lt;br /&gt;
&lt;br /&gt;
✔ Air → Pro is the biggest jump (~2×).&lt;br /&gt;
✔ 24→32→36 GB does NOT materially change tokens/sec – only RAM headroom and thermal stability.&lt;br /&gt;
&lt;br /&gt;
=== ⚡ 2. Prompt Processing (Prompt Ingest Throughput) ===&lt;br /&gt;
&lt;br /&gt;
For large prompts (4k–16k tokens), ingest speed is generally 1.3×–2× faster than generation.&lt;br /&gt;
&lt;br /&gt;
| Machine    | llama.cpp (Metal) | MLX        |&lt;br /&gt;
| ---------- | ----------------- | ---------- |&lt;br /&gt;
| Air M4     | 25–35 t/s         | 30–45 t/s  |&lt;br /&gt;
| Pro M4 Pro | 45–65 t/s         | 55–80 t/s  |&lt;br /&gt;
| Pro M4 Max | 55–75 t/s         | 70–100 t/s |&lt;br /&gt;
&lt;br /&gt;
: For 16k context, these speeds are very usable.&lt;br /&gt;
: On Air M4, long ingest on hot systems may throttle to bottom of the range.&lt;br /&gt;
&lt;br /&gt;
=== 🧠 3. RAM Requirements for 15 GB Models @ 16k Context ===&lt;br /&gt;
&lt;br /&gt;
===== 4-bit 20B-class GGUF (12–15 GB on disk) =====&lt;br /&gt;
&lt;br /&gt;
Typical RAM usage (llama.cpp):&lt;br /&gt;
* Model file: 12–15 GB&lt;br /&gt;
* KV cache (16k): ~3.0–4.0 GB&lt;br /&gt;
* Metadata + buffers: 1.5–2.0 GB&lt;br /&gt;
* Total needed: 16.5–21 GB RAM&lt;br /&gt;
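&lt;br /&gt;
A quick sanity check on the KV-cache line – the cache grows linearly with context length, layer count, KV heads and head dimension. The dimensions below are illustrative for a ~20B-class transformer, not GPT-OSS-20B’s actual config:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# KV bytes = 2 (K and V) * layers * kv_heads * head_dim * context * bytes/elem&lt;br /&gt;
# Illustrative dims for a ~20B-class model (NOT the real GPT-OSS-20B config):&lt;br /&gt;
n_layers, n_kv_heads, head_dim = 48, 8, 128&lt;br /&gt;
context, elem_bytes = 16_384, 2  # 16k context, fp16 cache&lt;br /&gt;
&lt;br /&gt;
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context * elem_bytes&lt;br /&gt;
print(f&amp;quot;KV cache @ 16k: ~{kv_bytes / 1024**3:.1f} GiB&amp;quot;)  # ~3.0 GiB here&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;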
&lt;br /&gt;
Interpretation:&lt;br /&gt;
&lt;br /&gt;
===== Air M4 24 GB =====&lt;br /&gt;
* Fits just barely&lt;br /&gt;
* No room for multitasking&lt;br /&gt;
* Occasional swap when hitting 16k context, slowing things&lt;br /&gt;
&lt;br /&gt;
===== Air M4 32 GB =====&lt;br /&gt;
* Clean fit&lt;br /&gt;
* Enough RAM for browser + IDE&lt;br /&gt;
* Much more stable long-run speeds&lt;br /&gt;
&lt;br /&gt;
===== Pro M4 Pro 24 GB =====&lt;br /&gt;
* Fits cleanly&lt;br /&gt;
* Better sustained performance due to faster memory controllers + cooling&lt;br /&gt;
&lt;br /&gt;
===== Pro M4 Max 36 GB =====&lt;br /&gt;
* Plenty of room&lt;br /&gt;
* Run two models concurrently&lt;br /&gt;
* Zero swap even with 32k contexts&lt;br /&gt;
&lt;br /&gt;
=== 🔥 4. Real-World Behaviour for 16k Context ===&lt;br /&gt;
&lt;br /&gt;
===== MacBook Air M4 (24 GB) =====&lt;br /&gt;
* Loads the model.&lt;br /&gt;
* First 8–10k tokens run fine.&lt;br /&gt;
* Beyond 12k–14k context → thermal + memory pressure → 10–20% slowdown.&lt;br /&gt;
* Generation may drop from 18 t/s → 12–15 t/s under sustained load.&lt;br /&gt;
&lt;br /&gt;
===== MacBook Air M4 (32 GB) =====&lt;br /&gt;
* Same peak speed&lt;br /&gt;
* Much better stability&lt;br /&gt;
* 16k context behaves cleanly with minimal slowdown&lt;br /&gt;
* No meaningful thermal throttling at short bursts; long runs throttle ~5–10%&lt;br /&gt;
&lt;br /&gt;
===== MacBook Pro M4 Pro =====&lt;br /&gt;
* Full 16k contexts are trivial&lt;br /&gt;
* Stable 30–45 t/s even in long chats&lt;br /&gt;
* No thermal drop&lt;br /&gt;
* MLX can hold &amp;gt; 50 t/s stable on 20B-class&lt;br /&gt;
&lt;br /&gt;
===== MacBook Pro M4 Max (36 GB) =====&lt;br /&gt;
* Highest throughput&lt;br /&gt;
* Ideal for MLX&lt;br /&gt;
* Zero throttling&lt;br /&gt;
* Sustained 50–65 t/s for 20B&lt;br /&gt;
&lt;br /&gt;
=== 🏆 5. Recommendation Based on Your Workflow ===&lt;br /&gt;
&lt;br /&gt;
Considering:&lt;br /&gt;
* You run GPT-OSS-20B&lt;br /&gt;
* Context up to 16k&lt;br /&gt;
* Using llama.cpp + MLX&lt;br /&gt;
* You evaluate models often and care about speed + stability&lt;br /&gt;
&lt;br /&gt;
===== Best Performance Choice =====&lt;br /&gt;
&lt;br /&gt;
===== ✔ MacBook Pro 14&amp;quot; M4 Pro, 24 GB / 512 GB =====&lt;br /&gt;
&lt;br /&gt;
Why:&lt;br /&gt;
* ~2× Air speed&lt;br /&gt;
* Zero thermal throttling&lt;br /&gt;
* Adequate RAM for 20B + 16k&lt;br /&gt;
* Good value (AU$3,299)&lt;br /&gt;
&lt;br /&gt;
===== Best Price/Performance =====&lt;br /&gt;
&lt;br /&gt;
===== ✔ MacBook Air M4, 32 GB / 512 GB =====&lt;br /&gt;
&lt;br /&gt;
Why:&lt;br /&gt;
* Cheaper than the 24 GB Pro: ~A$2,999 vs A$3,299&lt;br /&gt;
* Stable 20B performance&lt;br /&gt;
* Very high performance per dollar&lt;br /&gt;
* Portable and dead silent&lt;br /&gt;
&lt;br /&gt;
===== For Heavy Local LLM Work &amp;amp; Multi-Model =====&lt;br /&gt;
&lt;br /&gt;
===== ✔ MacBook Pro M4 Max, 36 GB+ =====&lt;br /&gt;
&lt;br /&gt;
Why:&lt;br /&gt;
* Best MLX performance&lt;br /&gt;
* Suitable for running multiple models or 32k contexts&lt;br /&gt;
* True workstation grade speeds&lt;br /&gt;
&lt;br /&gt;
=== 🪙 6. Australia Pricing (512 GB SSD) ===&lt;br /&gt;
&lt;br /&gt;
From Apple AU (rounded):&lt;br /&gt;
&lt;br /&gt;
===== MacBook Air M4 =====&lt;br /&gt;
* 24 GB / 512 GB → A$2,699&lt;br /&gt;
* 32 GB / 512 GB → A$2,999 (CTO)&lt;br /&gt;
&lt;br /&gt;
===== MacBook Pro 14&amp;quot; =====&lt;br /&gt;
* M4 Pro / 24 GB / 512 GB → A$3,299&lt;br /&gt;
* M4 Max / 36 GB / 1 TB → A$4,999 (no 512 GB option)&lt;br /&gt;
&lt;br /&gt;
=== 🎯 Final Practical Advice ===&lt;br /&gt;
&lt;br /&gt;
If your main goal is local LLM evaluation, especially 20B-class with 16k context:&lt;br /&gt;
&lt;br /&gt;
===== 👉 Best overall: MacBook Pro M4 Pro (24 GB) =====&lt;br /&gt;
&lt;br /&gt;
===== 👉 Best value: MacBook Air M4 (32 GB) =====&lt;br /&gt;
&lt;br /&gt;
===== 👉 Best long-term / intensive: MacBook Pro M4 Max (36 GB+). =====&lt;br /&gt;
&lt;br /&gt;
If you tell me:&lt;br /&gt;
* your typical batch size&lt;br /&gt;
* whether you run one or multiple models&lt;br /&gt;
* typical context length (beyond 16k?)&lt;br /&gt;
* whether you care more about latency or throughput&lt;br /&gt;
&lt;br /&gt;
…I can give you an exact tokens/sec prediction for your workflow (not just ranges).&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:ChatGPT Shares]]&lt;br /&gt;
[[Category:openai]]&lt;/div&gt;</summary>
		<author><name>Lukegao</name></author>
	</entry>
</feed>