This conversation was exported from ChatGPT.[1]
gpt-5-2-thinking
User:
Why is ChatGPT 5.2 Pro not evaluated on benchmark sites? If it is, which sites?
Assistant:
“ChatGPT 5.2 Pro” is a bit of a shape-shifter as far as benchmarking goes.
Most benchmark/leaderboard sites don’t (and often can’t) evaluate the ChatGPT product experience directly. They usually evaluate an API model with a stable, scriptable interface, because benchmarks require thousands of automated runs with controlled settings.
Why you often don’t see “ChatGPT 5.2 Pro” on the big public leaderboards
- ChatGPT (the app) isn’t a clean benchmark target. The ChatGPT experience includes extra layers that benchmarks try to avoid: system instructions, safety layers, tool routing, UI behaviors, feature toggles, and sometimes “smart defaults” that aren’t exposed as simple knobs. That makes results harder to reproduce and compare apples-to-apples across vendors.
- Automated evaluation via ChatGPT is generally a no-go. OpenAI’s Terms of Use forbid automatically or programmatically extracting data or output from the services, which blocks the typical “benchmark harness logs into the website and blasts 50k prompts” approach. OpenAI[2] So serious evaluators generally stick to the API.
- GPT-5.2 Pro is API-accessible, but it’s a specialty endpoint. OpenAI’s model page states GPT-5.2 Pro is available via the Responses API (not the older Chat Completions flow), which can be an extra integration hurdle for some benchmark rigs. OpenAI Platform[3]
- It’s expensive to benchmark at scale. The official model listing shows GPT-5.2 Pro priced at $21 per 1M input tokens and $168 per 1M output tokens, so benchmarks can become hilariously pricey, fast (see the rough cost sketch after this list). OpenAI Platform[4] Even if a site could run it, many choose not to continuously retest the most expensive tiers.
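To put those prices in perspective, here is a rough back-of-the-envelope cost sketch in Python. The per-million-token prices are the ones quoted above; the benchmark size and the average token counts per prompt are illustrative assumptions, not figures from any actual leaderboard.

```python
# Back-of-the-envelope benchmark cost estimate for GPT-5.2 Pro, using the
# list prices quoted above. The benchmark size and per-prompt token counts
# are illustrative assumptions.

INPUT_PRICE_PER_M = 21.00    # USD per 1M input tokens (quoted above)
OUTPUT_PRICE_PER_M = 168.00  # USD per 1M output tokens (quoted above)

num_prompts = 50_000       # hypothetical benchmark size
avg_input_tokens = 1_000   # assumed prompt length
avg_output_tokens = 4_000  # assumed response length (reasoning output runs long)

input_cost = num_prompts * avg_input_tokens / 1_000_000 * INPUT_PRICE_PER_M
output_cost = num_prompts * avg_output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M

print(f"Input cost:  ${input_cost:,.0f}")                 # $1,050
print(f"Output cost: ${output_cost:,.0f}")                # $33,600
print(f"Total:       ${input_cost + output_cost:,.0f}")   # $34,650 for one run
```

Under those assumptions a single pass already lands in the tens of thousands of dollars, before any reruns or alternate settings, which is why the priciest tiers are often skipped or tested only once.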
So is it evaluated anywhere? Yep, just not everywhere.
Where GPT-5.2 Pro does show up (examples)
- SimpleBench: SimpleBench’s public leaderboard includes “GPT-5.2 Pro (xhigh)” (and also “GPT-5.2 (high)”). SimpleBench[5]
- ARC Prize leaderboard (ARC-AGI): ARC Prize’s leaderboard includes GPT-5.2 Pro results (it tracks performance versus cost per task for ARC-AGI-style evaluations). ARC Prize[6] Note: the page is heavily visual/interactive, so some text scrapers show less detail, but the leaderboard is a real benchmarking venue.
Where you’ll often NOT see GPT-5.2 Pro
- LMArena / Chatbot Arena (LMSYS): the Arena currently lists GPT-5.2 variants like gpt-5.2, gpt-5.2-high, and gpt-5.2-search, but not a “Pro” entry. LMArena[7] That’s consistent with the cost + access + “which exact settings did you run?” problem.
Bonus: sites that track GPT-5.2 generally (but may not label “Pro” separately)
- Artificial Analysis has a GPT-5.2 model page and comparisons, but it commonly shows GPT-5.2 variants (like xhigh) rather than always breaking out “Pro” as its own named entry. Artificial Analysis[8]
- Epoch AI’s ECI aggregates benchmark scores and reports GPT-5.2 on its capability index (again, not always split into every tier the way OpenAI markets them). Substack[9]
If you want a clean mental model: benchmark sites are usually grading the underlying API model (gpt-5.2-pro) under very specific settings, not the full ChatGPT “Pro” experience you use in the app. That’s why the label you’re looking for often seems to “vanish” like a magician’s coin, even when the model itself is being tested. 🪄
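For a concrete picture of what “grading the underlying API model under very specific settings” looks like, here is a minimal sketch using the OpenAI Python SDK’s Responses API (the endpoint mentioned above). The gpt-5.2-pro model id comes from the discussion above, and the “high” reasoning effort is an assumed setting chosen for illustration; a real harness would pin whatever configuration the leaderboard actually reports.

```python
# Minimal sketch of how a benchmark harness queries the API model directly
# (rather than the ChatGPT app), with its settings pinned explicitly.
# Assumptions: the OpenAI Python SDK (openai>=1.x), the gpt-5.2-pro model id
# discussed above, and a "high" reasoning effort chosen for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str) -> str:
    """Send one benchmark prompt through the Responses API and return the text."""
    response = client.responses.create(
        model="gpt-5.2-pro",           # the API model, not the ChatGPT "Pro" app tier
        reasoning={"effort": "high"},  # assumed setting; leaderboards report this explicitly
        input=question,
    )
    return response.output_text


if __name__ == "__main__":
    print(ask("How many r's are in 'strawberry'?"))
```

Pinning settings like the reasoning effort is what makes results reproducible, and it’s why leaderboard entries carry suffixes such as “(xhigh)” or “(high)” instead of just the marketing name.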
References
- ↑ "ChatGPT 5.2 Pro benchmarks". ChatGPT. Retrieved 2025-12-22.
- ↑ "OpenAI". openai.com. Retrieved 2025-12-22.
- ↑ "OpenAI Platform". OpenAI Platform. Retrieved 2025-12-22.
- ↑ "OpenAI Platform". OpenAI Platform. Retrieved 2025-12-22.
- ↑ "SimpleBench". simple-bench.com. Retrieved 2025-12-22.
- ↑ "ARC Prize". ARC Prize. Retrieved 2025-12-22.
- ↑ "LMArena". lmarena.ai. Retrieved 2025-12-22.
- ↑ "Artificial Analysis". Artificial Analysis. Retrieved 2025-12-22.
- ↑ "Substack". substack.com. Retrieved 2025-12-22.