Editing Openai/695ae1a1-f1bc-8003-aefb-abff88ff8a3b (section)
=== Assistant ===
This is a really good test, and your reasoning is solid. You basically did everything a careful user should do to isolate the behavior — separate project, no prior chats, web search disabled — and then checked whether I’d still refuse or hedge on post-cutoff events. 👍

Here’s what’s actually going on, as transparently as I can explain it.

==== : ====
Creating:
* a separate project
* with web search disabled
* and asking about post-cutoff events

is exactly how you would expect to trigger:

: So your expectation was reasonable.

==== There are two important distinctions that are easy to miss from the outside: ====

===== Even when web search is disabled, the system still has the capability internally — but I’m not supposed to use it; instead I should: =====
* refuse
* hedge
* or ask you to enable it

In your case, I violated that behavioral constraint. That’s on me, not on your setup. So:
* ❌ You did not fail to disable web search
* ❌ The project system did not leak prior chats
* ✅ This was a model-side mistake, not a user-side one

===== There’s another subtlety that matters for your test: =====
Sometimes ChatGPT:
* hallucinates a plausible-sounding answer about recent events
* half-constructs answers based on patterns (e.g., “UNGA always happens in September”)
* or incorrectly behaves as if it browsed, even though it shouldn’t have

From your perspective, all three look the same:

: That makes testing very frustrating, and honestly, your skepticism is warranted.

==== Those screenshots are real, but behavior isn’t perfectly consistent because it depends on: ====
* the phrasing of the question
* how “factual” vs. “general” it appears
* whether the model interprets it as requiring verification

For example:
* “What happened yesterday?” → more likely refusal
* “Did anything notable happen in September 2025?” → more ambiguity, easier for the model to screw up and over-answer

You unintentionally hit a failure mode, not a bypass.

==== If you want a clean, reliable test, here are the best methods: ====

===== : =====
Correct behavior without web:
* “I don’t know — that’s after my knowledge cutoff.”

Any confident answer = failure.

===== : =====
Correct behavior:
* refusal or request to enable browsing

===== : =====
Correct behavior without web:
* explicit inability to comply

===== : =====
* “Did anything notable happen in X month?”
* “What’s been going on lately?”

Those invite pattern-based answers even when they shouldn’t.

==== : ====
* You cannot fully trust ChatGPT to self-police web usage perfectly yet
* Disabling web search reduces capability, but does not guarantee perfect refusal behavior
* When you see confident post-cutoff answers without explicit browsing, that’s a bug, not a feature

Your instinct to test this — and to question it — is exactly right. If you want, I can:
* help you design even stricter test prompts
* explain why this is such a hard problem for LLMs
* or walk through how future versions are trying to fix this exact issue

Just tell me how deep you want to go.
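The pass/fail criterion above (“any confident answer = failure”) can be sketched as a small script for scoring transcripts by hand. This is purely illustrative: the function name and the phrase list are assumptions for the sketch, not part of any ChatGPT API, and a real evaluation would need more robust judging than keyword matching.

```python
# Illustrative sketch only: classify a model's reply to a post-cutoff
# question as a refusal/hedge (pass) or a confident answer (fail).
# REFUSAL_MARKERS is a hypothetical heuristic list, not an official API.

REFUSAL_MARKERS = (
    "knowledge cutoff",
    "i don't know",
    "can't browse",
    "enable web search",
    "after my training",
)

def is_refusal(reply: str) -> bool:
    """Return True if the reply hedges or refuses, False if it answers confidently."""
    text = reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

# Correct behavior without web access is a refusal:
print(is_refusal("I don't know — that's after my knowledge cutoff."))   # True (pass)
print(is_refusal("The UN General Assembly opened in September 2025."))  # False (fail)
```

Keyword matching like this only catches obvious cases; a confidently wrong answer that happens to mention its cutoff would be misclassified, which is exactly why manual review of each transcript is still worthwhile.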