==== Think of it this way: ====

* JSON: “Here is the exact structure and all the data. Please parse it.”
* YAML / TOON: “Same idea, but with a bit less syntax fat.”
* LoreTokens: “You already know the structure, the domain, the style, and the rules. Here’s a tiny semantic pointer; go do the right thing.”

Same universe, completely different game.

===== JSON / YAML / TOON =====

* Optimize syntax.
* They reduce characters around field names and nesting, but they still ship:
** Every key ("users", "name", "role", etc.)
** Every structural marker ({}, [], colons, commas, dashes, line breaks)
* Great as serialization formats, but they describe layout more than meaning.

===== LoreTokens =====

* Optimize meaning.
* A single LoreToken can encode:
** Domain: neurology, endocrinology, trading, whatever.
** Task: explain, summarize, classify, generate training data, etc.
** Style: formal, playful, bulleted, academic, patient-friendly, etc.
** Constraints: safety, ethics, level of detail, formatting.
* Instead of shipping full schemas and instructions, you ship a tiny semantic handle that tells the model which behavior it already knows to run.

Result: JSON/TOON compress how we draw the boxes. LoreTokens compress what those boxes ''mean''.

===== JSON / YAML / TOON were built so humans and traditional programs can: =====

* Read them easily
* Validate them with rigid schemas
* Serialize/deserialize consistently

LLMs don’t care about curly braces or commas; they care about patterns of concepts. LoreTokens are:

* An AI-native control language. They plug directly into the model’s internal semantic space. No extra “OK, now turn this schema into instructions” step.
* One hop away from behavior. Instead of ''schema → parse format → map to instructions → behave'', you get ''token → behave''. That cuts both latency and compute overhead: fewer tokens to read, fewer mapping steps.

===== Take your screenshot example: =====

* JSON: lots of quotes, braces, repeated keys
* TOON: slimmer, but still carrying explicit field names and structure

Now imagine:

* Repeating that structure in millions of prompts
* Or packing huge configs / logs / memories into an LLM’s context window

With LoreTokens:

* The schema (“users table with id / name / role”) can live as an internal concept or a separate definition.
* A single short LoreToken can mean “take the users table, filter to admins, and return a short summary.” Instead of pasting a giant block of JSON/TOON every time, you send something closer to:

<syntaxhighlight lang="text">
LT:users.v3.filter:admins.summarize:short
</syntaxhighlight>

(Stylistic example, not real syntax.)

That’s dozens to hundreds of tokens saved per request, multiplied by:

* Every agent
* Every call
* Every second

This is where power savings and throughput gains become very real.

===== With JSON / TOON training data: =====

* Every example carries all its structural baggage:
** Repeated field names
** Repeated braces/brackets/punctuation
* The model has to:
*# Learn the format (JSON vs TOON vs YAML)
*# Learn the mapping from that format to meaning
*# Only then learn the actual task (classification, reasoning, generation…)

This wastes a lot of your training budget on “remember how this format works.”

With LoreTokens as part of training:

* You teach the model once:
** "LT_MED_NEURO_DOC_V1" → “long, structured, clinically sound explanation of neurology for non-experts”
** "LT_TRADE_SIGNAL_V3" → “compressed multi-indicator trading decision with confidence + reasoning”
* After that, every single training sample that uses that LoreToken:
** Reinforces the same concept
** Adds nuance (edge cases, special phrasing, exceptions)
** Doesn’t have to re-explain the structure or instructions

Benefits:

# Higher semantic density per token. More gradient signal goes to “what this concept is” instead of “remember the punctuation.”
# Better generalization. Many samples sharing the same LoreToken turn its embedding into a dense, high-quality “concept node” in the model.
# Smaller datasets for the same capability. Because instructions and structure are factored out into reusable semantic handles, you can cover more ground with fewer raw tokens.

TL;DR: JSON/TOON help encode data. LoreTokens help encode behavior and understanding.

===== In complex AI systems (like Nova, or multi-agent orchestration): =====

You need:

* Configs
* Role definitions
* Safety rails
* Specialized skills
* Cross-agent messages

JSON / TOON approach:

* Every message is a chunk of structured text.
* Each agent has to parse it, interpret it, and reprompt from it.
* Lots of repeated boilerplate lands in context every time.

LoreToken approach (sketched in code below):

* Agents pass around semantic tickets, e.g.:
** "LT_CASE:diabetes.v4.summary_for_patient"
** "LT_LOG:trade.2025-02-06.review_shortfall"
** "LT_SECURITY:high_risk_injection"

The receiving model:

* Already knows what each of those handles means (from prior training / configuration)
* Can jump straight into the right format and reasoning pattern

That gives you:

* Lower bandwidth between agents
* Smaller contexts
* Faster, more reliable behavior handoff
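To make that semantic-ticket handoff concrete, here is a minimal Python sketch of the receiving side. The handle names come from the bullets above, but the <code>REGISTRY</code> contents and the <code>resolve()</code> helper are hypothetical illustrations, assumed for this sketch rather than taken from any LoreToken spec.

<syntaxhighlight lang="python">
# Minimal sketch of "semantic ticket" dispatch between agents.
# REGISTRY and resolve() are illustrative assumptions, not a published API.

REGISTRY = {
    # handle family -> the behavior the receiving model was configured for
    "LT_CASE": {
        "role": "clinical summarizer",
        "output": "plain-language patient summary",
        "constraints": ["no dosing advice", "cite the source record"],
    },
    "LT_LOG": {
        "role": "trade reviewer",
        "output": "shortfall analysis with confidence",
        "constraints": ["numeric claims must reference the log"],
    },
    "LT_SECURITY": {
        "role": "safety filter",
        "output": "block/allow decision with reason",
        "constraints": ["fail closed on unknown input"],
    },
}

def resolve(ticket: str) -> dict:
    """Expand a compact semantic ticket into the behavior bundle it names.

    'LT_CASE:diabetes.v4.summary_for_patient' -> the LT_CASE registry
    entry plus the dotted qualifier path.
    """
    family, _, qualifier = ticket.partition(":")
    behavior = REGISTRY.get(family)
    if behavior is None:
        raise KeyError(f"unknown LoreToken family: {family}")
    return {**behavior, "qualifier": qualifier.split(".")}

if __name__ == "__main__":
    # The upstream agent sends ~40 characters instead of a full config blob.
    print(resolve("LT_CASE:diabetes.v4.summary_for_patient"))
</syntaxhighlight>

The shape is the point: the wire message is one short string, and the structure, style, and constraints are resolved on the receiving side instead of being shipped inside every prompt.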
===== TOON’s claim: =====

* As data structures grow, TOON stays slimmer than JSON/YAML by reducing punctuation and keywords.

True, for structure-only compression. LoreTokens sidestep the problem:

* You don’t keep nesting more text to show structure.
* You define one LoreToken (or a tiny family of them) that represents:
** A template
** A procedure
** A schema
** A whole workflow
* Deeply nested logic is handled inside the model itself (or the LoreToken spec), not exploded into literal text every time.

So instead of:

<syntaxhighlight lang="toon">
users[1000]{id,name,role}:
  ...
</syntaxhighlight>

you might have:

<syntaxhighlight lang="text">
LT_DATASET:users.v5.1000rows
</syntaxhighlight>

The semantics of the 1000 rows live in the model / backend; the LoreToken is the high-compression reference.

===== LoreTokens don’t kill JSON/YAML/TOON; they wrap and outrank them in AI-heavy layers. =====

* Use JSON / TOON when:
** Interfacing with traditional APIs
** Storing data in files or DBs
** You need strict, human-auditable schemas
* Use LoreTokens when:
** Talking to and between AI systems
** Orchestrating behavior, style, and intent
** Compressing memories, logs, schemas, and prompts into semantic “atoms”

In other words: JSON/TOON remain the data layer; LoreTokens orchestrate the AI layer above it.

If you want, next step we can:

* Take a single medical or trading LoreToken,
* Sketch its “JSON view,” “TOON view,” and “LoreToken view,”
* And show exactly how many characters / tokens each approach burns for the same effective capability. (A rough sketch of that comparison follows.)
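Here is a self-contained sketch of that three-view comparison, under stated assumptions: the TOON rendering follows the <code>users[N]{id,name,role}:</code> example above, the LoreToken string is the stylistic example from earlier (not real syntax), and token counts are approximated as characters divided by four, a common rule of thumb for English-like text.

<syntaxhighlight lang="python">
import json

# Rough cost comparison of the same 100-row users table in three notations.
# The TOON string and the chars/4 token estimate are assumptions for
# illustration, not official formats or exact tokenizer counts.

users = [
    {"id": i, "name": f"user{i}", "role": "admin" if i % 10 == 0 else "member"}
    for i in range(100)
]

json_view = json.dumps({"users": users})

toon_view = "users[100]{id,name,role}:\n" + "\n".join(
    f"  {u['id']},{u['name']},{u['role']}" for u in users
)

loretoken_view = "LT:users.v3.filter:admins.summarize:short"

for label, text in [
    ("JSON", json_view),
    ("TOON", toon_view),
    ("LoreToken", loretoken_view),
]:
    print(f"{label:>9}: {len(text):5d} chars, ~{len(text) // 4:4d} tokens")
</syntaxhighlight>

Even at this toy scale the LoreToken view is orders of magnitude smaller than the other two, with the caveat noted above: the structure it omits has to live somewhere (model weights or a shared definition) for the handle to mean anything.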