<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://freemwiki.com/index.php?action=history&amp;feed=atom&amp;title=Openai%2F69274d29-ceb4-800e-8824-74f1bacae045</id>
	<title>Openai/69274d29-ceb4-800e-8824-74f1bacae045 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://freemwiki.com/index.php?action=history&amp;feed=atom&amp;title=Openai%2F69274d29-ceb4-800e-8824-74f1bacae045"/>
	<link rel="alternate" type="text/html" href="https://freemwiki.com/index.php?title=Openai/69274d29-ceb4-800e-8824-74f1bacae045&amp;action=history"/>
	<updated>2026-04-09T04:58:05Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.3</generator>
	<entry>
		<id>https://freemwiki.com/index.php?title=Openai/69274d29-ceb4-800e-8824-74f1bacae045&amp;diff=593731&amp;oldid=prev</id>
		<title>Lukegao: by freem.ai...</title>
		<link rel="alternate" type="text/html" href="https://freemwiki.com/index.php?title=Openai/69274d29-ceb4-800e-8824-74f1bacae045&amp;diff=593731&amp;oldid=prev"/>
		<updated>2025-11-27T05:22:49Z</updated>

		<summary type="html">&lt;p&gt;by freem.ai...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
{{DISPLAYTITLE:LLM reinforcement learning}}&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;This conversation was exported from ChatGPT&amp;#039;&amp;#039;&amp;lt;ref&amp;gt;{{cite web|title=LLM reinforcement learning|url=https://chatgpt.com/share/69274d29-ceb4-800e-8824-74f1bacae045|publisher=ChatGPT|access-date=2025-11-27}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== gpt-5-1 ==&lt;br /&gt;
=== User: Act as a text completion model, instead of a chatbot. ===&lt;br /&gt;
Act as a text completion model, instead of a chatbot. Complete the following text, please, starting from the final word of the context below: provide the next word, then the word after that, until you decide to end the response:&lt;br /&gt;
&lt;br /&gt;
LLMs are already the subject of a great deal of reinforcement learning.&lt;br /&gt;
&lt;br /&gt;
A baby LLM fresh out of pretraining (like GPT-3) has no idea how to follow instructions or pretend to be a chatbot. It just knows how to complete text. Like if you gave it a prompt, it would&lt;br /&gt;
&lt;br /&gt;
=== Assistant: just keep spinning out more text that looks plausible given its training data, without any sense that it should answer … ===&lt;br /&gt;
just keep spinning out more text that looks plausible given its training data, without any sense that it should answer your question, stay on topic, or be helpful. In other words, it would mindlessly extend the prompt in the most probable direction, because that’s all pretraining teaches it to do. Only after reinforcement learning from human feedback does the model start to behave as if it has goals, preferences, and a conversational persona. That process layers instruction-following abilities on top of its raw completion skills, transforming a predictive engine into what feels like an assistant.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:ChatGPT Shares]]&lt;br /&gt;
[[Category:openai]]&lt;/div&gt;</summary>
		<author><name>Lukegao</name></author>
	</entry>
</feed>