{{DISPLAYTITLE:ChatGPT 5.2 Pro benchmarks}}
''This conversation was exported from ChatGPT.''<ref>{{cite web|title=ChatGPT 5.2 Pro benchmarks|url=https://chatgpt.com/share/694896a0-6814-800f-b484-4ed0ec0b6e9c|publisher=ChatGPT|access-date=2025-12-22}}</ref>

== gpt-5-2-thinking ==

=== User: Why is ChatGPT 5.2 Pro not evaluated on benchmark sites? ===
Why is ChatGPT 5.2 Pro not evaluated on benchmark sites? If it is, which sites?

=== Assistant: “ChatGPT 5.2 Pro” is a bit of a shape-shifter as far as benchmarking goes. ===
Most benchmark/leaderboard sites don’t (and often can’t) evaluate the ChatGPT product experience directly. They usually evaluate an API model with a stable, scriptable interface, because benchmarks require thousands of automated runs with controlled settings.

Why you often don’t see “ChatGPT 5.2 Pro” on the big public leaderboards:

# ChatGPT (the app) isn’t a clean benchmark target. The ChatGPT experience includes extra layers that benchmarks try to avoid: system instructions, safety layers, tool routing, UI behaviors, feature toggles, and sometimes “smart defaults” that aren’t exposed as simple knobs. That makes results harder to reproduce and compare apples-to-apples across vendors.
# Automated evaluation via ChatGPT is generally a no-go. OpenAI’s Terms of Use forbid automatically or programmatically extracting data or output from the services, which blocks the typical “benchmark harness logs into the website and blasts 50k prompts” approach.<ref>{{cite web|title=OpenAI|url=https://openai.com/policies/row-terms-of-use/|publisher=openai.com|access-date=2025-12-22}}</ref> So serious evaluators generally stick to the API.
# GPT-5.2 Pro is API-accessible, but it’s a specialty endpoint. OpenAI’s model page states GPT-5.2 Pro is available via the Responses API (not the older Chat Completions flow), which can be an extra integration hurdle for some benchmark rigs; a short call sketch follows this list.<ref>{{cite web|title=OpenAI Platform|url=https://platform.openai.com/docs/models/gpt-5.2-pro|publisher=OpenAI Platform|access-date=2025-12-22}}</ref>
# It’s expensive to benchmark at scale. The official model listing shows GPT-5.2 Pro priced at $21 per 1M input tokens and $168 per 1M output tokens, so benchmarks can become hilariously pricey, fast; a rough cost estimate also follows this list.<ref>{{cite web|title=OpenAI Platform|url=https://platform.openai.com/docs/models/gpt-5.2-pro|publisher=OpenAI Platform|access-date=2025-12-22}}</ref> Even if a site could run it, many choose not to continuously retest the most expensive tiers.
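To illustrate the integration point in item 3, a benchmark rig targeting the API model would go through the Responses API rather than Chat Completions. The sketch below is not any leaderboard’s actual harness: it assumes the official <code>openai</code> Python SDK, and only the model name <code>gpt-5.2-pro</code> comes from the platform listing cited above; the prompt handling is a placeholder.

<syntaxhighlight lang="python">
# Minimal sketch of running one benchmark item against the Responses API.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()


def run_item(prompt: str) -> str:
    # GPT-5.2 Pro is documented as a Responses API model, so this uses
    # client.responses.create rather than the older chat.completions.create flow.
    response = client.responses.create(
        model="gpt-5.2-pro",  # model name from the platform listing cited above
        input=prompt,
    )
    # output_text is the SDK's convenience accessor for the concatenated text output.
    return response.output_text
</syntaxhighlight>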
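To make the cost point in item 4 concrete, here is a back-of-the-envelope estimate at the listed prices. The suite size and per-item token counts are illustrative assumptions, not figures from any real leaderboard run.

<syntaxhighlight lang="python">
# Rough cost of one full benchmark pass at the listed GPT-5.2 Pro prices:
# $21 per 1M input tokens, $168 per 1M output tokens.
INPUT_PRICE_PER_TOKEN = 21 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 168 / 1_000_000

n_items = 10_000                 # hypothetical benchmark size
input_tokens_per_item = 2_000    # hypothetical prompt length
output_tokens_per_item = 8_000   # hypothetical reasoning + answer length

cost = n_items * (input_tokens_per_item * INPUT_PRICE_PER_TOKEN
                  + output_tokens_per_item * OUTPUT_PRICE_PER_TOKEN)
print(f"~${cost:,.0f} per full run")  # ~$13,860 under these assumptions
</syntaxhighlight>

Rerunning something on that scale after every model or settings update is exactly why many sites choose not to continuously retest the most expensive tiers.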
So is it evaluated anywhere? Yep, just not everywhere.

Where GPT-5.2 Pro does show up (examples):
* SimpleBench: SimpleBench’s public leaderboard includes “GPT-5.2 Pro (xhigh)” (and also “GPT-5.2 (high)”).<ref>{{cite web|title=SimpleBench|url=https://simple-bench.com/|publisher=simple-bench.com|access-date=2025-12-22}}</ref>
* ARC Prize leaderboard (ARC-AGI): ARC Prize’s leaderboard includes GPT-5.2 Pro results (they track performance vs cost per task for ARC-AGI-style evaluations).<ref>{{cite web|title=ARC Prize|url=https://arcprize.org/leaderboard|publisher=ARC Prize|access-date=2025-12-22}}</ref> Note: their page is heavily visual/interactive, so some text scrapers show less detail, but the leaderboard is a real benchmarking venue.

Where you’ll often NOT see GPT-5.2 Pro:
* LMArena / Chatbot Arena (LMSYS): Arena currently lists GPT-5.2 variants like gpt-5.2, gpt-5.2-high, and gpt-5.2-search, but not a “Pro” entry.<ref>{{cite web|title=LMArena|url=https://lmarena.ai/leaderboard|publisher=lmarena.ai|access-date=2025-12-22}}</ref> That’s consistent with the cost + access + “which exact settings did you run?” problem.

Bonus: sites that track GPT-5.2 generally (and may not label “Pro” separately):
* Artificial Analysis has a GPT-5.2 model page and comparisons, but it commonly shows GPT-5.2 variants (like xhigh) rather than always breaking out “Pro” as its own named entry.<ref>{{cite web|title=Artificial Analysis|url=https://artificialanalysis.ai/models/gpt-5-2|publisher=Artificial Analysis|access-date=2025-12-22}}</ref>
* Epoch AI’s ECI aggregates benchmark scores and reports GPT-5.2 on its capability index (again, not always split into every tier the way OpenAI markets them).<ref>{{cite web|title=Substack|url=https://substack.com/%40epochai/note/c-187985664|publisher=substack.com|access-date=2025-12-22}}</ref>

If you want a clean mental model: benchmark sites are usually grading the underlying API model (gpt-5.2-pro) under very specific settings, not the full ChatGPT “Pro” experience you use in the app. That’s why the label you’re looking for often seems to “vanish” like a magician’s coin, even when the model itself is being tested. 🪄

== References ==
<references />

[[Category:ChatGPT Shares]]
[[Category:openai]]