
===== With JSON / TOON training data: =====
* Every example carries all its structural baggage:
** Repeated field names
** Repeated braces, brackets, and punctuation
* The model has to:
*# Learn the format (JSON vs. TOON vs. YAML)
*# Learn the mapping from that format → meaning
*# Only then learn the actual task (classification, reasoning, generation…)

This spends part of your training budget on “remember how this format works” instead of on the task itself.
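To make the "structural baggage" point concrete, here is a minimal sketch that estimates how much of a compact JSON record is keys and punctuation rather than values. The records and the `structural_overhead` helper are hypothetical, and character counts are only a rough stand-in for real tokenizer counts.

```python
import json

# Hypothetical records; every one repeats the same three field names.
records = [
    {"symbol": "BTC", "signal": "buy", "confidence": 0.82},
    {"symbol": "ETH", "signal": "hold", "confidence": 0.55},
    {"symbol": "SOL", "signal": "sell", "confidence": 0.71},
]


def structural_overhead(record: dict) -> float:
    """Fraction of serialized characters that are keys or punctuation, not values."""
    # Compact separators remove optional whitespace, so this is a lower bound.
    serialized = json.dumps(record, separators=(",", ":"))
    # Characters spent on the values themselves (string quotes included).
    value_chars = sum(len(json.dumps(value)) for value in record.values())
    return 1 - value_chars / len(serialized)


overheads = [structural_overhead(record) for record in records]
for record, overhead in zip(records, overheads):
    print(f"{record['symbol']}: {overhead:.0%} of characters are structure")
```

The exact share depends on the schema and on the tokenizer, so treat the printed figures as an illustration of the effect, not a measurement of it.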

With LoreTokens as part of training:
* You teach the model once:
** "LT_MED_NEURO_DOC_V1" → “long, structured, clinically sound explanation of neurology for non-experts”
** "LT_TRADE_SIGNAL_V3" → “compressed multi-indicator trading decision with confidence + reasoning”
* After that, every single training sample that uses that LoreToken:
** Reinforces the same concept
** Adds nuance (edge cases, special phrasing, exceptions)
** Doesn’t have to re-explain the structure or instructions
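As a rough sketch of the difference, the snippet below builds the same training input twice: once with the instructions spelled out, and once with only the LoreToken handle. The `LORETOKENS` registry, both prompt builders, the sample payload, and the whitespace `count_tokens` proxy are all hypothetical. It also assumes each handle is a single entry in the tokenizer vocabulary, which a real tokenizer would need to be configured for.

```python
# Hypothetical registry: handle -> the behavior it is meant to stand for.
LORETOKENS = {
    "LT_MED_NEURO_DOC_V1": (
        "long, structured, clinically sound explanation of neurology "
        "for non-experts"
    ),
    "LT_TRADE_SIGNAL_V3": (
        "compressed multi-indicator trading decision with confidence + reasoning"
    ),
}


def verbose_prompt(handle: str, payload: str) -> str:
    """Spell the instructions out in every sample."""
    return f"Instruction: produce a {LORETOKENS[handle]}.\nInput: {payload}"


def handle_prompt(handle: str, payload: str) -> str:
    """Refer to the instructions by handle only."""
    if handle not in LORETOKENS:
        raise KeyError(f"unknown LoreToken: {handle}")
    return f"{handle} {payload}"


def count_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated pieces, with each handle as one token."""
    return len(text.split())


payload = "BTC rsi=28 macd=rising volume=high"
verbose_len = count_tokens(verbose_prompt("LT_TRADE_SIGNAL_V3", payload))
handle_len = count_tokens(handle_prompt("LT_TRADE_SIGNAL_V3", payload))
print(f"verbose: {verbose_len} tokens, handle: {handle_len} tokens")
```

The handle version is only useful if the model has actually learned what the handle means, which is the "teach the model once" step described above.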

Benefits:
# Higher semantic density per token: more gradient signal dedicated to “what this concept is” instead of “remember the punctuation.”
# Better generalization: many samples sharing the same LoreToken cause its embedding to become a dense, high-quality “concept node” in the model.
# Smaller datasets for the same capability: because instructions and structure are factored out into reusable semantic handles, you can cover more ground with fewer raw tokens.
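The “concept node” idea in the second benefit can be illustrated with a toy count. For an input embedding lookup, a sample only updates the rows of the tokens it contains, so a handle shared by every sample is touched far more often than any individual content token. The corpus and the `embedding_update_counts` helper are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter

# Hypothetical corpus: every sample starts with the same LoreToken handle.
samples = [
    "LT_TRADE_SIGNAL_V3 BTC rsi oversold macd rising",
    "LT_TRADE_SIGNAL_V3 ETH rsi neutral volume falling",
    "LT_TRADE_SIGNAL_V3 SOL macd falling volume rising",
]


def embedding_update_counts(corpus: list[str]) -> Counter:
    """Count how many samples would touch each token's embedding row."""
    counts: Counter = Counter()
    for sample in corpus:
        # A token gets one update per sample, however often it repeats in it.
        counts.update(set(sample.split()))
    return counts


counts = embedding_update_counts(samples)
for token, n in counts.most_common(3):
    print(f"{token}: updated by {n} of {len(samples)} samples")
```

This only shows that the shared handle sees every sample. Whether its embedding ends up as a high-quality representation still depends on the training setup.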

TL;DR: JSON/TOON help encode data. LoreTokens help encode behavior and understanding.