{{DISPLAYTITLE:Build RAG pipeline}}
''This conversation was exported from ChatGPT''<ref>{{cite web|title=Build RAG pipeline|url=https://chatgpt.com/share/6913e3a3-1ed8-800a-9a2d-932e764b3c66|publisher=ChatGPT|access-date=2025-11-12}}</ref>

== gpt-5-thinking ==

=== User ===

How could I build a simple RAG pipeline in Python, with txt-file ingestion, SQLite as the datastore, ChromaDB as the vector store, and the OpenAI API as the LLM API? It should let me ingest new text files, and I also want chunking with configurable sizing.

=== Assistant ===

I put a complete, production-ready Python script on the canvas that does exactly what you asked: it ingests .txt files, stores metadata in SQLite, persists vectors in ChromaDB, and runs RAG with the OpenAI API. It supports configurable chunk size/overlap and token-aware chunking (using tiktoken if present, otherwise falling back to character-based chunking).

==== What you’ll need ====
* Python 3.9+
* <code>OPENAI_API_KEY</code> in your environment
* <code>pip install chromadb openai tiktoken</code> (tiktoken is optional but recommended)

==== Quick start ====
<syntaxhighlight lang="bash">
# 1) install
pip install chromadb openai tiktoken

# 2) set your key
export OPENAI_API_KEY=sk-...

# 3) ingest .txt files (folder or single file)
python rag.py ingest --path ./texts --chunk-size 1200 --chunk-overlap 200

# 4) query
python rag.py query --question "Summarize the incident notes" --top-k 5
</syntaxhighlight>

===== Other commands =====
* List what’s been ingested:
<syntaxhighlight lang="bash">
python rag.py show-docs
</syntaxhighlight>

===== Notes =====
* Defaults: SQLite at <code>./.rag.sqlite3</code>, Chroma at <code>./.chroma</code>, collection <code>rag_demo</code>.
* Embeddings: <code>text-embedding-3-small</code>. Generation: <code>gpt-4o-mini</code>. Change via the env vars <code>RAG_EMBED_MODEL</code>/<code>RAG_GEN_MODEL</code>.
* Idempotent ingestion: each file’s checksum and mtime are recorded, so re-ingesting updates vectors and SQLite rows only if the file has changed.
* Each retrieved chunk is labeled with its source filename and chunk index in the prompt, for traceability.

If you want a separate “ingest daemon” or a simple REST wrapper (FastAPI) around query, say the word and I’ll add it.

== References ==
<references />

[[Category:ChatGPT Shares]]
[[Category:openai]]
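The token-aware chunking with a character-based fallback that the assistant describes could be sketched as follows. The function name, the <code>cl100k_base</code> encoding choice, and the defaults are assumptions for illustration, not the canvas script’s actual code:

```python
def chunk_text(text: str, chunk_size: int = 1200, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks, counted in tokens if tiktoken is available."""
    step = chunk_size - chunk_overlap  # how far the window advances each iteration
    try:
        import tiktoken  # optional dependency; assumed encoding choice below
        enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(text)
        return [
            enc.decode(tokens[start:start + chunk_size])
            for start in range(0, len(tokens), step)
        ]
    except ImportError:
        # Fallback: character-based chunking with the same overlap scheme.
        return [text[start:start + chunk_size] for start in range(0, len(text), step)]
```

Both branches use the same sliding-window scheme, so switching tiktoken on or off only changes the unit of measurement (tokens vs. characters), not the chunking behavior.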
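The idempotent-ingestion idea (checksum plus mtime tracked in SQLite) could look roughly like this. The table name, columns, and function name are assumptions, not necessarily what the canvas script uses:

```python
import hashlib
import os
import sqlite3


def file_changed(db: sqlite3.Connection, path: str) -> bool:
    """Return True if `path` is new or modified since it was last ingested."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(path TEXT PRIMARY KEY, checksum TEXT, mtime REAL)"
    )
    with open(path, "rb") as f:
        checksum = hashlib.sha256(f.read()).hexdigest()
    mtime = os.path.getmtime(path)
    row = db.execute(
        "SELECT checksum, mtime FROM documents WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == checksum and row[1] == mtime:
        return False  # unchanged: caller can skip re-embedding
    # New or changed: upsert the bookkeeping row (SQLite 3.24+ syntax).
    db.execute(
        "INSERT INTO documents (path, checksum, mtime) VALUES (?, ?, ?) "
        "ON CONFLICT(path) DO UPDATE SET checksum = excluded.checksum, "
        "mtime = excluded.mtime",
        (path, checksum, mtime),
    )
    db.commit()
    return True
```

Checking the checksum as well as the mtime means a file touched without modification is still skipped, while a genuinely edited file triggers re-embedding.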
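Labeling each retrieved chunk with its source filename and chunk index, as the notes mention, could be done with a prompt builder along these lines (the label format and instruction wording are illustrative assumptions):

```python
def build_prompt(question: str, retrieved: list[tuple[str, int, str]]) -> str:
    """Assemble a RAG prompt where each chunk carries a traceable source label.

    `retrieved` is assumed to be a list of (filename, chunk_index, text) tuples.
    """
    context = "\n\n".join(
        f"[source: {filename}, chunk {index}]\n{text}"
        for filename, index, text in retrieved
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their labels.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Because the labels survive into the model’s context, the generated answer can cite them, which makes it easy to trace a claim back to a specific file and chunk.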