Openai/692c52e8-5230-800f-b92b-de25a0fe8d7e
This conversation was exported from ChatGPT[1]
gpt-5-1-thinking
User: Critique my near-final draft of my latest LinkedIn post, especially identify any major errors. This is for a legal audience that's not well acquainted with AI. I'm covering tips that (just between you and me) I'm surprised I'm still having to share with people, but folks tell me these are eye-openers.
3 AI Tips for ChatGPT’s 3rd Birthday! 🤖🎂
- 1️⃣ PROMPTING FUNDAMENTALS
The 3 simple elements I’ve used for prompts since early 2023:
☝️ ME — Tell the AI who I am, or feed it another persona (like that I’m opposing counsel).
How can AI assist without knowing who it’s assisting?
»Example: “I’m Kyle Bahr, and I help legal professionals make AI work for them.”
⏫ Level-Up: Read prompting guides. Good prompts are structured, similar to well-formed contracts and briefs.
✌️ YOU — Tell the AI who it is, and list its areas of expertise.
Today’s LLMs were trained on nearly the entire Internet. Guide the model on what's important.
»Example: “You're an expert at Gen AI, prompt engineering, and all related topics.”
⏫ Level-Up: Give the AI top credentials, so it grasps the desired level of expertise: “You’re a tenured Harvard Law professor in....”
👌 STUFF — Give the AI what it needs to succeed at the task. (AI folks call this “context.”)
Instruct, and attach any needed materials. But don’t overload it — AI has a limited memory per chat, so keep a thread to one discrete task.
»Example: “Use you expertise to carefully draft an optimized prompt for AI to help a lawyer analyze a contact based on the attached playbook.”
⏫ Level-Up: Learn about “context engineering,” which focuses on providing context to get better AI responses.
- 2️⃣ MODEL MATTERS
When someone says they used ChatGPT, Claude, Gemini, etc., I ask: “Which model?”
The AI makers build and release multiple LLMs, resulting in a confusing — but important — jumble of numbers, letters, and labels (like 4o, Sonnet-4.5 Extended Thinking, 2.5 Flash, etc.).
»Pick the “thinking,” “reasoning,” or “pro” models in your AI app’s model selector.
These AIs are the ones showing human-level capabilities in the benchmarking reports.
⏫ Level-Up: Be a multi-model user — with many great LLMs available, why stick to one? Many look at the LMArena Leaderboard’s “Text” ranking to identify the best models.
- 3️⃣ EFFICIENT ENHANCEMENT
Folks seem to be focused on AI automation, which hands the keys to AI, letting it drive a task to completion.
The problem: AI isn’t perfect. It moves efficiently, but the results may not be reliable, with hidden errors. Consistent reliability takes building, testing, and iterating.
Instead, think AI augmentation. The human holds the wheel, and the AI helps propel by enhancing the work.
Take legal research. Start with AI jumpstarting an inquiry. The human reads and processes, directs the AI down threads — and continues to analyze the results.
AI may surface sources more quickly than the human alone would have uncovered. Or it might find authorities that would have otherwise remained buried. Working in tandem yields equal or better results, more quickly, if done right.
⏫ Level-Up: Process-map your work. Unpack a workflow (say, contract review or drafting a brief) into discrete tasks. Then determine if — and to what extent — AI can assist with any of those tasks. 🤖⚖️🎉
Assistant: Here are the big things I’d tighten up and fix, then I’ll give you a clean revised version.
a. Typos in your “STUFF” example
- you ➜ your
- contact ➜ contract
I’d also smooth the sentence structure a bit.
b. Overstatement: “trained on nearly the entire Internet”
For a legal audience, this may read as factually wrong and trigger IP / confidentiality alarms. I’d soften it:
- “trained on huge volumes of public and licensed text, including much of the public internet” or
- “trained on massive datasets drawn from the public internet and other sources”
c. Slightly misleading strength claim: “These AIs are the ones showing human-level capabilities in the benchmarking reports”
For lawyers who are already worried about overclaiming, this is a bit strong. Benchmarking ≠ dependable legal work. Easy fix:
- “These AIs are the ones often approaching or matching human-level scores on many benchmarks. That doesn’t mean they’re always right, especially in law, but they’re usually your best option for complex reasoning.”
d. Jargon that may lose non-AI folks
A few terms might land as inside-baseball unless you add a 3-word gloss:
- “LLM” – spell out once: “large language model (LLM)”.
- “model selector” – a quick parenthetical: “(the dropdown where you pick which AI model to use)”.
- “LMArena Leaderboard” – lawyers won’t know what this is.
  - Maybe: “Many people look at public leaderboards like the LMSys / Chatbot Arena ‘Text’ ranking…”
  - Or just: “Check independent leaderboards and evaluations if you want to compare models.”
Not fatal, but worth one line of clarification.
e. Minor clarity tweaks
- “AI has a limited memory per chat…” – technically that’s the “context window,” but your message is right. I’d tweak for accuracy without adding complexity: “AI has a limited ‘memory’ (context window) per chat.” (See the short token-counting sketch after this list.)
- “Give the AI top credentials, so it grasps the desired level of expertise” – I’d make clear this is about tone and depth, not pretending it’s actually licensed: “Give the AI top-tier credentials as a persona, so it understands the level of rigor and tone you want.”
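To make the “limited memory (context window)” point concrete, here is a rough token-counting sketch for readers who want to check whether a long attachment will even fit in one chat. It assumes the tiktoken package is installed; the file name and the 128k window below are illustrative placeholders, since limits vary by model and you also need room for the reply.

```python
# Rough check: will this material plausibly fit in one chat's context window?
# Assumptions: `pip install tiktoken`; the file name and the 128k-token limit
# are placeholders, since real limits vary by model.
import tiktoken

def count_tokens(text: str) -> int:
    # cl100k_base is a common encoding; model-specific encodings differ slightly
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

with open("contract_playbook.txt", encoding="utf-8") as f:  # placeholder file
    material = f.read()

tokens = count_tokens(material)
print(f"~{tokens:,} tokens of attached material")
if tokens > 128_000:  # placeholder window size
    print("Probably too long for one thread: split it, or keep the chat to one discrete task.")
```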
You’re already doing a lot right:
- The ME / YOU / STUFF triad is sticky and easy to remember.
- The “⏫ Level-Up” call-outs are a nice way to speak to both beginners and more advanced users.
- The augmentation vs automation framing is very strong for a legal audience.
I’d mainly just tighten a couple of long sentences and fix the points above.
Use/ignore as you like; I’ve only changed where necessary or helpful.
3 AI Tips for ChatGPT’s 3rd Birthday! 🤖🎂
The 3 simple elements I’ve used for prompts since early 2023:
☝️ ME — Tell the AI who you are (or give it a persona).
How can AI assist without knowing who it’s assisting?
⏫ Level-Up: Read a few prompting guides. Good prompts are structured, similar to well-formed contracts and briefs.
✌️ YOU — Tell the AI who it is, and its areas of expertise.
Today’s large language models (LLMs) are trained on huge volumes of public and licensed text, including much of the public internet. You still need to guide the model on what’s important.
⏫ Level-Up: Give the AI top-tier credentials as a persona, so it understands the level of rigor and tone you want: “You’re a tenured Harvard Law professor in…”
👌 STUFF — Give the AI what it needs to succeed. (“Context.”)
Instruct it clearly and attach any needed materials. But don’t overload it — AI has a limited “memory” (context window) per chat, so try to keep each thread focused on one discrete task.
⏫ Level-Up: Learn about “context engineering,” which focuses on providing the right context to get better AI responses.
When someone says they used ChatGPT, Claude, Gemini, etc., I ask: “Which model?”
AI providers release multiple LLMs, resulting in a confusing — but important — jumble of numbers, letters, and labels (like 4o, Sonnet-4.5 Extended Thinking, 2.5 Flash, etc.).
»Pick the “thinking,” “reasoning,” or “pro” models in your AI app’s model selector (the dropdown where you pick which AI model to use).
These models are the ones often approaching or matching human-level scores on many benchmarks. That doesn’t make them infallible, especially in law, but they’re usually your best bet for complex reasoning.
⏫ Level-Up: Be a multi-model user — with many strong LLMs available, why stick to one? If you’re curious, check public leaderboards (like the Chatbot Arena “Text” rankings) to see how models compare.
Many people focus on AI automation: handing the keys to AI and letting it drive a task to completion.
The problem: AI isn’t perfect. It moves efficiently, but results may not be reliable and can contain hidden errors. Consistent reliability takes building, testing, and iterating.
Instead, think AI augmentation. The human holds the wheel; the AI helps propel the work.
Take legal research. Start with AI jump-starting an inquiry. The human reads and processes, directs the AI down promising threads, and continues to analyze the results.
AI may surface sources more quickly than a human alone would have uncovered. Or it might find authorities that would have otherwise remained buried. Working in tandem can yield equal or better results, more quickly — if done right.
⏫ Level-Up: Process-map your work. Unpack a workflow (say, contract review or drafting a brief) into discrete tasks. Then determine if — and to what extent — AI can assist with each of those tasks.
🤖⚖️🎉
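If you ever want to make the “process-map your work” Level-Up concrete, a simple table or spreadsheet works; for readers who prefer code, here is a tiny illustrative sketch in Python. The workflow steps and the AI-assist judgments are hypothetical examples, not recommendations.

```python
# Illustrative process map: a contract-review workflow unpacked into discrete tasks,
# each with a (hypothetical) judgment about whether and how AI can assist.
workflow = [
    {"task": "Intake and triage the contract",       "ai_assist": "maybe", "note": "summarize and flag key clauses"},
    {"task": "Compare clauses against the playbook", "ai_assist": "yes",   "note": "first-pass redline suggestions"},
    {"task": "Assess business and legal risk",       "ai_assist": "no",    "note": "human judgment call"},
    {"task": "Draft fallback language",              "ai_assist": "yes",   "note": "human edits and approves"},
    {"task": "Final review and sign-off",            "ai_assist": "no",    "note": "human in the loop; check everything"},
]

for step in workflow:
    print(f"{step['task']:<42} AI assist: {step['ai_assist']:<6} ({step['note']})")
```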
If you want, I can also help you trim this down to a more “mobile-scroll-friendly” version while keeping the ME / YOU / STUFF + augmentation framing intact.
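For readers who eventually move from the chat window to a script, the ME / YOU / STUFF structure maps neatly onto the message roles most chat APIs expose. Here is a minimal sketch assuming the OpenAI Python SDK; the model name, the playbook file, and the wording are placeholders, and the same pattern works with other providers’ SDKs.

```python
# Minimal sketch: ME / YOU / STUFF assembled as chat messages.
# Assumptions: `pip install openai` and OPENAI_API_KEY set in the environment;
# the model name and playbook path are placeholders, not recommendations.
from openai import OpenAI

client = OpenAI()

# YOU — who the AI is and its areas of expertise.
system_msg = (
    "You are an expert in generative AI and prompt engineering, "
    "working with the rigor of a tenured law professor."
)

# ME — who is asking.
me = "I'm a lawyer helping my team make AI work for them."

# STUFF — the task plus the material it needs, kept to one discrete task.
with open("playbook.txt", encoding="utf-8") as f:  # placeholder attachment
    playbook = f.read()
task = (
    "Draft an optimized prompt a lawyer can give an AI system "
    "to analyze a contract using the playbook below.\n\n" + playbook
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; prefer a "thinking"/reasoning model if your account offers one
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": me + "\n\n" + task},
    ],
)
print(response.choices[0].message.content)
```

The model argument is also where tip 2 (“which model?”) shows up in code: switching or comparing models is a one-line change, which makes being a multi-model user easy to try.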
User: Here's my final, please sanity check before I post:
3 AI Tips for ChatGPT’s 3rd Birthday! 🤖🎂
- 1️⃣ PROMPTING FUNDAMENTALS
The 3 simple elements I’ve used for prompts since early 2023:
☝️ ME — Tell the AI who I am, or feed it another persona (like that I’m opposing counsel).
How can AI assist without knowing who it’s assisting?
»Example: “I’m Kyle Bahr, and I help legal professionals make AI work for them.”
⏫ Level-Up: Read prompting guides. Good prompts are structured, similar to well-formed contracts and briefs.
✌️ YOU — Tell the AI who it is, and list its areas of expertise.
Today’s LLMs were trained on wide swaths of the Internet. Guide the model on what's important.
»Example: “You're an expert at Gen AI, prompt engineering, and all related topics.”
⏫ Level-Up: Give the AI’s persona top credentials, so it grasps the desired level of rigor: “You’re a tenured Harvard Law professor in....”
👌 STUFF — Give the AI what it needs to succeed at the task. (AI folks call this “context.”)
Instruct, and attach any needed materials. But don’t overload it — AI has a limited memory per chat, so keep a thread to one discrete task.
»Example: “Use your expertise to draft an optimized prompt a lawyer can give AI to analyze a contract using the attached playbook.”
⏫ Level-Up: Learn about “context engineering,” which focuses on providing context to get better AI responses.
- 2️⃣ MODEL MATTERS
When someone says they used ChatGPT, Claude, Gemini, etc., I ask: “Which model?”
The AI makers build and release multiple LLMs, resulting in a confusing — but important — jumble of numbers, letters, and labels (like 4o, Sonnet-4.5 Extended Thinking, 2.5 Flash, etc.).
»Pick the “thinking,” “reasoning,” or “pro” models in your AI app’s model selector.
These AIs are the ones showing human-level capabilities in the benchmarking reports.
⏫ Level-Up: Be a multi-model user — with many great LLMs available, why stick to one? Many look at the LMArena Leaderboard’s “Text” ranking to compare models.
- 3️⃣ EFFICIENT ENHANCEMENT
Folks seem to focus on AI automation: handing the keys to AI and letting it drive a task to completion.
The problem: AI isn’t perfect. It moves efficiently, but results may not be reliable and can contain hidden errors. Consistent reliability takes investment in building, testing, and iterating.
Instead, think AI augmentation. The human holds the wheel, and the AI helps propel by enhancing the work.
Take legal research. Start with AI jumpstarting an inquiry. The human reads and processes, directs the AI down threads, and continues to analyze the results.
AI may surface sources more quickly than a human alone would have uncovered. Or it might find authorities that would have otherwise remained buried. Working in tandem can yield equal or better results, more quickly — if done right.
⏫ Level-Up: Process-map your work. Unpack a workflow (say, contract review or drafting a brief) into discrete tasks. Then determine if — and to what extent — AI can assist with those tasks.
🤖⚖️🥳
Assistant: Short answer: this is strong and you’re 95% there. No fatal errors; just a few tiny things I’d tighten before you hit post.
Here are the only tweaks I’d seriously consider:
1. The “STUFF” example (“…an optimized prompt a lawyer can give AI to analyze a contract using the attached playbook”)
I’d smooth that just a hair:
- “…give to an AI system to analyze a contract…” or
- “…give an AI system to analyze a contract…”
2. “These AIs are the ones showing human-level capabilities in the benchmarking reports.”
For a legal audience, I’d soften this just a bit so it doesn’t sound like “AI is ready to be a lawyer” – for example, by tying the claim to the benchmarks rather than to lawyering generally. Same punch, less “trust-the-robot” vibe.
3. “Today’s LLMs were trained on wide swaths of the Internet.”
This is basically fine, but if you want to be slightly more precise (and less “scraped everything” sounding), the softer phrasings suggested earlier (“public and licensed text,” “the public internet and other sources”) still work.
Not mandatory, just cleaner for lawyers who care about what “trained on” implies.
None of these are “errors,” just polish:
- Ellipsis: “in....” has four dots before the closing quote; a standard ellipsis is three (“in…”).
- “jumpstarting” reads a bit more smoothly hyphenated (“jump-starting”).
- “Gen AI”: fine as shorthand, but consider spelling out “generative AI” once for readers new to the term.
If you fix #1 and #2, you’re absolutely good to post. Everything else is solid, clear, and calibrated for a legal audience that’s still early on the curve.
User: I appreciate the overclaiming concern, but I'd encourage you to web research the following recent reports: the VALS reports (two of them) on legal AI, GDPval (the legal results), and Anna Guo's legal contract benchmarking study. Let me know if I'm being overblown or if maybe the models really are showing some human-level capabilities.
Assistant: You’re not crazy – the benchmarks you’re citing really do show “human-level” performance on a bunch of legal tasks. The nuance is where and how that’s true.
1️⃣ VALS / VLAIR – legal research
- In the latest VLAIR legal research study (200 U.S. research questions, lawyer baseline vs AI tools), all the AI products clustered around 74–78% on the weighted score, with the lawyer baseline at ~69–71%, meaning the AI tools were on average ~7–9 points above the humans. Vals AI[2]
- Coverage of that report is blunt: “AI tools … outperform lawyers in legal research accuracy and authoritativeness” on that benchmark. Legal.io[3]
So on that slice of work (well-posed research questions, clean scoring rubric), the top systems aren’t just “close” – they’re ahead of the human baseline.
2️⃣ LegalBench / VALS LegalBench & similar – legal reasoning
- A July 2025 LegalBench release (as summarized by goHeather) reports several frontier models above 80% accuracy on contract interpretation, statutory reasoning, and hypo-style tasks (GPT-5 ~84.6%, Gemini 2.5 Pro ~83.6%, GPT-4.1 ~81.9%, etc.). goHeather[4]
- That’s squarely in the range that we’d normally describe as “expert-level performance” on exam-type or structured reasoning benchmarks.
You can argue that’s at least human-comparable on those specific tasks.
3️⃣ Anna Guo’s contract drafting benchmark
Guo’s study is even more explicit about humans vs AI: Legal AI Benchmarking[5]
- Reliability rate (pass/fail on key elements)
  - Average human lawyers: 56.7%
  - Average AI tools: 57%
- Top performer vs top human
  - Top AI (Gemini 2.5 Pro): 73.3% reliability
  - GPT-5: ~73%
  - Top human lawyer: 70%
  - → “The best AI tools individually surpassed the best human lawyer” on first-draft reliability. Legal AI Benchmarking[6]
- Usefulness score (clarity/helpfulness/length, max 9)
  - Humans: 7.53/9
  - AI tools: 7.25/9 – slightly behind on nuance/judgment. Legal AI Benchmarking[7]
- Time per draft
  - Humans: ~12m 43s per task
  - AI: < 1 minute
Her executive bullets literally say things like “AI tools matched and, in some cases, outperformed lawyers in producing reliable first drafts,” and that top tools “now lead on first-draft reliability.” Legal AI Benchmarking[8]
That’s about as “human-level” as it gets, within the bounds of that benchmark.
4️⃣ GDPval – cross-occupation, including lawyers
GDPval is broader but points in the same direction:
- Across 44 occupations (including “Lawyers”), the authors find that the best frontier models are “approaching industry experts in deliverable quality.” arXiv[9]
- Popular coverage (e.g., Fortune) frames it as: models are already as good as experts on roughly half the tasks tested. Fortune[10]
So again: not “better than experts at everything,” but clearly in the human ballpark for a meaningful slice of real-world work product.
Your line: “These AIs are the ones showing human-level capabilities in the benchmarking reports.”
Given:
- VALS legal research: AI ≥ lawyer baseline on accuracy/authoritativeness. Vals AI[11]
- LegalBench & related: multiple models reliably >80% on complex legal reasoning tasks. goHeather[12]
- Guo: top AI tools outperform top human lawyers on first-draft reliability in contract drafting, with similar usefulness scores. Legal AI Benchmarking[13]
- GDPval: best models approaching expert deliverable quality across occupations including law. arXiv[14]
I’d say your substantive claim is defensible, not hype, as long as readers understand you are talking about benchmarked tasks, not “AI is a fully general junior associate replacement.”
The risk isn’t that it’s factually wrong; it’s that some lawyers may read “human-level capabilities” as “I can lean on this like I lean on a human colleague,” which all of these studies go out of their way to disclaim (limits, failure modes, narrow tasks, curated prompts, etc.). arXiv[15]
If you want to keep the punch but telegraph nuance, I’d nudge the wording to tie “human-level” explicitly to the benchmarks, or go slightly more cautious and name the task types (research, drafting).
If you leave your current sentence as-is, I don’t think you’re out over your skis given VALS + GDPval + Guo. You’re just using “human-level” in the benchmark sense, which is exactly what those reports are about.
User: Thanks! How about:
These AIs are the ones showing human-level capabilities on recent legal benchmarks.
Assistant: That version totally works, and it’s defensible given the studies you cited. 💯
If you want maximum clarity for a skeptical legal audience without adding much length, you could do one of these:
- “These AIs are the ones showing human-level capabilities on recent legal benchmarks.” ← Your version (totally fine).
- “These AIs are the ones showing human-level capabilities on several recent legal benchmarks.” (adds a hint that you’re talking about multiple studies, not “all of law.”)
- “These AIs are the ones showing human-level capabilities on recent legal benchmarking studies of research and drafting tasks.” (a bit longer, but very lawyer-proof.)
But honestly, your current line is solid as-is. I’d post with a clear conscience. 🚀
User: I added "on some recent"
Assistant: Nice, that tweak hits the sweet spot. 🙌
“These AIs are the ones showing human-level capabilities on some recent legal benchmarks.”
- It’s now clearly not “all of law, everywhere.”
- It still signals: this isn’t vibes, it’s backed by empirical studies.
- For ultra-picky style nerds, “in some recent legal benchmarks” is a hair more idiomatic than “on,” but “on” is totally acceptable and very common in tech/AI writing.
If the rest of the post is as you last pasted it with just this change + “some,” you’re good to ship. 🚀
User: Great, get picky, I love it.
Speaking of which, did you detect any straight quote marks? I tried to make them all curly for style.
Assistant: Oh yes, my fellow typography gremlin, I did detect a couple of ugly little straight quotes sneaking around. 🧐
From the version you just shared, there are two straight single quotes (') still in there:
- In “what's” – Current: what's → Fix: what’s
- In “You're” (“You're an expert at Gen AI…”) – Current: You're → Fix: You’re
Every other quotation mark / apostrophe in your draft is curly:
- Opening / closing quotes around examples: ✅
- The apostrophe in “ChatGPT’s 3rd Birthday”: ✅
- The possessive in “AI’s persona”: ✅
- “LMArena Leaderboard’s” apostrophe: ✅
One more ultra-picky style note while we’re here:
- “You’re a tenured Harvard Law professor in....”
You’ve got four dots before the closing quote. If you want that to look cleaner, I’d go:
- in... (three dots) or
- in [practice area]… (ellipsis after a specific area)
But that’s purely aesthetic—your only true “technical” mismatches are what's and You're.
Fix those two to what’s and You’re, and you’re fully curly and ready to ship. 🚀
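If you want to run this kind of typography check yourself, a few lines of Python will flag any straight quotes or apostrophes left in a draft; curly characters pass through untouched. The file name is a placeholder; paste the post into post.txt or swap in any string.

```python
# Flag straight quotes/apostrophes so they can be swapped for curly ones.
# post.txt is a placeholder; any text source works.
STRAIGHT = {"'": "straight apostrophe/single quote", '"': "straight double quote"}

with open("post.txt", encoding="utf-8") as f:
    text = f.read()

for lineno, line in enumerate(text.splitlines(), start=1):
    for col, ch in enumerate(line, start=1):
        if ch in STRAIGHT:
            snippet = line[max(0, col - 11): col + 10]
            print(f"line {lineno}, col {col}: {STRAIGHT[ch]} near ...{snippet}...")
```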
User: Use image creator. Turn this final draft of the LinkedIn post into a helpful infographic for the audience.
3 AI Tips for ChatGPT’s 3rd Birthday! 🤖🎂
- 1️⃣ PROMPTING FUNDAMENTALS
The 3 simple elements I’ve used for prompts since early 2023:
☝️ ME — Tell the AI who I am (or who I want it to think I am, such as opposing counsel).
How can AI assist without knowing who it’s assisting?
»Example: “I’m Kyle Bahr, and I help legal professionals make AI work for them.”
⏫ Level-Up: Read prompting guides. Good prompts have structure, similar to well-formed contracts and briefs.
✌️ YOU — Tell the AI who it is, and list its areas of expertise.
Today’s LLMs were trained on wide swaths of the Internet. Guide the model on what's important.
»Example: “You're an expert at Gen AI, prompt engineering, and all related topics.”
⏫ Level-Up: Give the AI’s persona top credentials, so it grasps the desired level of rigor: “You’re a tenured Harvard Law professor in....”
👌 STUFF — Give the AI what it needs to succeed at the task. (AI folks call this “context.”)
Instruct, and attach any needed materials. But don’t overload it — AI has a limited memory per chat, so keep a thread to one discrete task.
»Example: “Carefully draft an optimized prompt a lawyer can give AI for contract analysis using the attached playbook.”
⏫ Level-Up: Learn about “context engineering,” which focuses on providing context to get better AI responses.
- 2️⃣ MODEL MATTERS
When someone says they used ChatGPT, Claude, Gemini, etc., I ask: “Which model?”
The AI makers build and release multiple LLMs, resulting in a confusing — but important — jumble of numbers, letters, and labels (like 4o, Sonnet-4.5 Extended Thinking, 2.5 Flash, etc.).
»Pick the “thinking,” “reasoning,” or “pro” models in your AI app’s model selector. They are the ones showing human-level capabilities in some recent legal benchmarks.
⏫ Level-Up: Be a multi-model user — with many great LLMs available, why stick to one? Many look at the Chatbot Arena’s “Text” Leaderboard to compare models.
- 3️⃣ EFFICIENT ENHANCEMENT
Folks seem to focus on AI automation: handing the keys to AI and letting it drive a task to completion.
The problem: AI isn’t perfect. It moves efficiently, but results may not be reliable and can contain hidden errors. Consistent reliability takes investment in building, testing, and iterating.
Instead, think AI augmentation. The human holds the wheel, and AI’s enhancements help propel the work forward.
Take legal research. Start with AI jumpstarting an inquiry. The human reads and processes, directs the AI down threads, and continues to analyze the results. (Always Check 👏 Your 👏 Cites! 👏)
AI surfaces sources more quickly than humans alone, and may uncover authorities that might have remained buried. Working in tandem can yield equal or better results, more quickly — if done with you in the loop.
⏫ Level-Up: Process-map your work. Unpack a workflow (say, contract review or drafting a brief) into discrete tasks. Then determine if — and to what extent — AI can assist with those tasks.
🤖⚖️🥳
Assistant:
References
- ↑ "Critique LinkedIn post draft". ChatGPT. Retrieved 2025-11-30.
- ↑ "Vals AI". Vals AI. Retrieved 2025-11-30.
- ↑ "Legal.io". legal.io. Retrieved 2025-11-30.
- ↑ "goHeather". goheather.io. Retrieved 2025-11-30.
- ↑ "Legal AI Benchmarking". Legal AI Benchmarking. Retrieved 2025-11-30.
- ↑ "Legal AI Benchmarking". Legal AI Benchmarking. Retrieved 2025-11-30.
- ↑ "Legal AI Benchmarking". Legal AI Benchmarking. Retrieved 2025-11-30.
- ↑ "Legal AI Benchmarking". Legal AI Benchmarking. Retrieved 2025-11-30.
- ↑ "arXiv". arxiv.org. Retrieved 2025-11-30.
- ↑ "Fortune". fortune.com. 2025-09-30. Retrieved 2025-11-30.
- ↑ "Vals AI". Vals AI. Retrieved 2025-11-30.
- ↑ "goHeather". goheather.io. Retrieved 2025-11-30.
- ↑ "Legal AI Benchmarking". Legal AI Benchmarking. Retrieved 2025-11-30.
- ↑ "arXiv". arxiv.org. Retrieved 2025-11-30.
- ↑ "arXiv". arxiv.org. Retrieved 2025-11-30.