📡 Ollama API: http://localhost:11434
🚀 RAG API: http://localhost:8000/docs
🌐 WebUI: http://localhost:3000
```bash
cd ~/rag-server
sudo pkill ollama
sudo systemctl stop ollama
docker compose down --remove-orphans --volumes --rmi all
docker rm -f webui-pi5 rag-agents-pi5 ollama-pi5 2>/dev/null
sudo mkdir -p /mnt/ragdata/ollama
sudo chown -R 1000:1000 /mnt/ragdata/ollama
```
docker-compose.yml:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-pi5
    ports: ["11434:11434"]
    volumes: [/mnt/ragdata/ollama:/root/.ollama]
    environment: [OLLAMA_KEEP_ALIVE=1h]
    restart: unless-stopped
  rag-agents:
    build: .
    container_name: rag-agents-pi5
    ports: ["8000:8000"]
    environment: [OLLAMA_BASE_URL=http://ollama:11434]
    depends_on: [ollama]
    restart: unless-stopped
  webui-pi5:
    image: ghcr.io/open-webui/open-webui:main
    container_name: webui-pi5
    ports: ["3000:8080"]
    volumes: [open-webui:/app/backend/data]
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
    depends_on: [ollama]
    restart: unless-stopped
volumes:
  open-webui:
```
Dockerfile:

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY agent_api.py .
# Quote uvicorn[standard] so the shell doesn't treat the brackets as a glob
RUN pip install -q fastapi "uvicorn[standard]" requests pydantic
EXPOSE 8000
CMD ["uvicorn", "agent_api:app", "--host", "0.0.0.0", "--port", "8000"]
```
agent_api.py:

```python
import os

import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
OLLAMA_URL = os.getenv("OLLAMA_BASE_URL", "http://ollama:11434")


class ChatRequest(BaseModel):
    messages: list


@app.get("/health")
def health():
    return {"status": "healthy"}


@app.get("/test")
def test():
    try:
        return requests.get(f"{OLLAMA_URL}/api/version", timeout=5).json()
    except requests.RequestException:
        return {"error": "Ollama unreachable"}


@app.post("/chat")
def chat(req: ChatRequest):
    try:
        r = requests.post(
            f"{OLLAMA_URL}/api/chat",
            json={
                "model": "qwen2.5:3b-instruct-q4_K_M",
                "messages": req.messages,
                "stream": False,
            },
            timeout=120,  # small models on the Pi 5 can be slow to respond
        )
        return r.json()
    except requests.RequestException as e:
        return {"error": str(e)}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run("agent_api:app", host="0.0.0.0", port=8000)
```
```bash
docker compose up -d --build
docker compose ps    # 3/3 Up
docker exec -it ollama-pi5 ollama pull qwen2.5:3b-instruct-q4_K_M
curl http://localhost:11434/api/tags    # Verify
curl http://localhost:8000/health       # ✅ healthy
curl http://localhost:8000/test         # ✅ Ollama version
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Pi5 specs?"}]}'    # ✅ Chat
```
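The same /chat call can be scripted from Python. A minimal client sketch — the `ask` helper and `RAG_API` name are illustrative, not part of agent_api.py:

```python
import requests

RAG_API = "http://localhost:8000"  # host-side URL from the compose port mapping


def build_chat_payload(question: str) -> dict:
    """Build the JSON body that the /chat endpoint's ChatRequest model expects."""
    return {"messages": [{"role": "user", "content": question}]}


def ask(question: str, timeout: int = 120) -> dict:
    """POST a single-turn question to the RAG API and return the parsed reply."""
    r = requests.post(
        f"{RAG_API}/chat", json=build_chat_payload(question), timeout=timeout
    )
    r.raise_for_status()
    return r.json()


# Usage (requires the stack to be up):
#   print(ask("Pi5 specs?"))
```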
1. http://localhost:3000 → Sign up (admin@pi5)
2. Admin Panel → Connections → Ollama → http://ollama:11434 → Connect
3. Models → qwen2.5:3b... → New Chat → Ask away!
| Model | Size | Speed | RAM |
|---|---|---|---|
| qwen2.5:3b-instruct-q4_K_M | 1.8GB | 15-25 t/s | 3GB |
| gemma2:2b | 1.5GB | 25-40 t/s | 2GB |
| Issue | Fix |
|---|---|
| Port 11434 conflict | `sudo pkill ollama` |
| ModuleNotFoundError | `docker compose build --no-cache` |
| Empty /api/tags | `docker exec ollama-pi5 ollama pull qwen2.5:3b...` |
| WebUI no models | Admin → Connections → http://ollama:11434 |
| YAML errors | `cat > docker-compose.yml` (above) |
| Timeout 30s | `timeout=120` in agent_api.py |
```bash
docker exec ollama-pi5 ollama pull gemma2:2b
# Edit agent_api.py model name
docker compose up -d --build rag-agents
```
Ebooks (OCR) + Wiki.js (Incremental) → Embeddings → ChromaDB → RAG Chat
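The pipeline above hinges on splitting documents into overlapping chunks before embedding. A minimal character-based sketch of that step — sizes are illustrative, and the real API may use a LangChain splitter instead:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with a small overlap so that
    sentences straddling a boundary still appear intact in one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# Each chunk is then embedded (e.g. with nomic-embed-text) and stored in
# ChromaDB alongside metadata such as source file and page number.
```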
```bash
cd ~/rag-server
sudo apt install tesseract-ocr poppler-utils    # OCR deps
sudo mkdir -p /mnt/ragdata/ollama
sudo chown -R 1000:1000 /mnt/ragdata/ollama
docker compose down --remove-orphans --volumes --rmi all
```
docker-compose.yml:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-pi5
    ports: ["11434:11434"]
    volumes: [/mnt/ragdata/ollama:/root/.ollama]
    environment: [OLLAMA_KEEP_ALIVE=1h]
    restart: unless-stopped
  rag-agents:
    build: .
    container_name: rag-agents-pi5
    ports: ["8000:8000"]
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WIKI_URL=http://wiki.local:3001
      - WIKI_TOKEN=your-wiki-token
    depends_on: [ollama]
    restart: unless-stopped
  webui-pi5:
    image: ghcr.io/open-webui/open-webui:main
    container_name: webui-pi5
    ports: ["3000:8080"]
    volumes: [open-webui:/app/backend/data]
    environment: [OLLAMA_BASE_URL=http://ollama:11434]
    depends_on: [ollama]
    restart: unless-stopped
volumes:
  open-webui:
```
Dockerfile (OCR RAG):

```dockerfile
FROM python:3.12-slim
RUN apt-get update && apt-get install -y tesseract-ocr poppler-utils && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY agent_api.py .
RUN pip install -q fastapi "uvicorn[standard]" requests pydantic \
    langchain-community langchain-chroma chromadb \
    "unstructured[pdf]" ollama
EXPOSE 8000
CMD ["uvicorn", "agent_api:app", "--host", "0.0.0.0", "--port", "8000"]
```
```bash
docker compose up -d --build
docker compose ps    # 3/3 Up
docker exec ollama-pi5 ollama pull qwen2.5:3b-instruct-q4_K_M
docker exec ollama-pi5 ollama pull nomic-embed-text
curl http://localhost:11434/api/tags    # Verify
curl http://localhost:8000/health       # healthy, rag_ready: true
curl -X POST http://localhost:8000/upload -F "file=@book1.pdf"
# → {"pages": 245, "chunks": 892}
curl -X POST http://localhost:8000/rag \
  -d '{"question": "Chapter 3 summary?"}' \
  -H "Content-Type: application/json"
```
```bash
# Get Wiki.js token: Admin → API Tokens
echo "WIKI_TOKEN=your-token" > .env
docker compose up -d --build rag-agents
curl -X POST http://localhost:8000/wiki    # Full wiki
curl -X POST http://localhost:8000/wiki-sync
# → {"new_pages": 2, "updated_pages": 5}
curl -X POST http://localhost:8000/rag \
  -H "Content-Type: application/json" \
  -d '{"question": "Wiki deploy + ebook ch3?"}'
# Blends wiki + books!
```
localhost:3000 → Admin → Connections → http://ollama:11434
→ Models: qwen2.5 → Chat: "Wiki + scanned book?"
wiki-cron.py:

```python
#!/usr/bin/env python3
import requests

r = requests.post("http://localhost:8000/wiki-sync")
print(r.json())
```

```bash
chmod +x wiki-cron.py
crontab -e
```

Add this line for a daily 2 AM run:

```
0 2 * * * cd ~/rag-server && ./wiki-cron.py
```
| Task | Time | Storage |
|---|---|---|
| 100-page scanned PDF | ~10 min | ~100 KB/page |
| Wiki sync (200 pages) | ~2 min | ~50 KB/page |
| Query (mixed sources) | 3-8 s | - |
| Daily cron | ~30 s | - |
```
GET    /health                # RAG status
POST   /upload                # PDF (OCR auto)
POST   /wiki                  # Full wiki ingest
POST   /wiki-sync             # Incremental wiki
POST   /rag                   # Query ebooks+wiki
DELETE /wiki-cleanup?days=90  # Optional cleanup
```
```
~/rag-server/                # Project root
├── docker-compose.yml       # Orchestrates 3 services
├── Dockerfile               # Builds rag-agents (OCR RAG)
├── agent_api.py             # FastAPI: /upload /rag /wiki-sync
├── wiki-cron.py             # Standalone wiki sync (optional)
├── .env                     # WIKI_TOKEN (gitignored)
├── ebooks/                  # 📚 Your scanned/text PDFs
│   ├── book1.pdf
│   └── book2.pdf
├── README.md                # This doc!
└── data/                    # Auto-created
    ├── chromadb/            # Vector DB (ebooks + wiki)
    └── logs/                # Optional API logs
```

Docker named volumes:

```
ragserver_ollama-data/       # /mnt/ragdata/ollama/models (models persist)
ragserver_open-webui/        # WebUI chats/settings
ragserver_chromadb/          # Embeddings (ebooks + wiki chunks)
```
docker-compose.yml:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-pi5
    ports: ["11434:11434"]
    volumes: [/mnt/ragdata/ollama:/root/.ollama]
    environment: [OLLAMA_KEEP_ALIVE=1h]
    restart: unless-stopped
  rag-agents:
    build: .
    container_name: rag-agents-pi5
    ports: ["8000:8000"]
    env_file: .env
    volumes: [./data/chromadb:/app/chromadb]
    depends_on: [ollama]
    restart: unless-stopped
  webui-pi5:
    image: ghcr.io/open-webui/open-webui:main
    container_name: webui-pi5
    ports: ["3000:8080"]
    volumes: [open-webui:/app/backend/data]
    environment: [OLLAMA_BASE_URL=http://ollama:11434]
    depends_on: [ollama]
    restart: unless-stopped
volumes:
  open-webui:
```
.env (secure tokens):

```
WIKI_URL=http://wiki.local:3001
WIKI_TOKEN=wiki_js_your_admin_api_token_here
```
wiki-cron.py (daily sync):

```python
#!/usr/bin/env python3
from datetime import datetime

import requests

r = requests.post("http://localhost:8000/wiki-sync")
print(f"{datetime.now()}: {r.json()}")
```
```bash
cd ~/rag-server
docker compose up -d --build    # 3/3 services
docker exec ollama-pi5 ollama pull qwen2.5:3b-instruct-q4_K_M
docker exec ollama-pi5 ollama pull nomic-embed-text    # one model per pull

# Ingest data
curl -X POST http://localhost:8000/wiki    # Full wiki
curl -X POST http://localhost:8000/upload -F "file=@ebooks/book1.pdf"

# Query
curl -X POST http://localhost:8000/rag \
  -H "Content-Type: application/json" \
  -d '{"question":"Wiki + book summary?"}'

# WebUI: localhost:3000 → Admin → Ollama connect → Chat!
```
```
Wiki.js (API)  ←→  /wiki-sync (daily cron)
        ↓
Ebooks (OCR PDFs)  ←→  /upload
        ↓
ChromaDB (unified embeddings)
        ↓
/rag + WebUI queries
```
| Action | Command |
|---|---|
| Full restart | docker compose up -d --build |
| Wiki full | curl -X POST http://localhost:8000/wiki |
| Wiki incremental | curl -X POST http://localhost:8000/wiki-sync |
| Upload PDF | curl -X POST /upload -F "file=@book1.pdf" |
| Query RAG | curl -X POST /rag -d '{"question":"?"}' |
| Health | curl http://localhost:8000/health |
| Cleanup | curl -X DELETE "/wiki-cleanup?days=90" |
~/rag-server/ with 5 files → full ebook/wiki RAG running 🚀
This guide explains how to switch the Pi5 RAG server from ChromaDB to Qdrant as the vector database.
It assumes you already have:

- the stack above running: ollama-pi5 + rag-agents-pi5 (FastAPI/LangChain) + ChromaDB, and
- LangChain's QdrantVectorStore integration for RAG pipelines, which exposes an API similar to Chroma's.

Because the current ChromaDB store is empty, the migration is simply "swap DBs and re-ingest".
Edit the existing docker-compose.yml in your rag-server folder.
The complete updated docker-compose.yml:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-pi5
    ports:
      - 11434:11434
    volumes:
      - ollama-data:/root/.ollama
    restart: unless-stopped

  qdrant:    # added qdrant service
    image: qdrant/qdrant:latest
    container_name: qdrant-pi5
    ports:
      - 6333:6333
    volumes:
      - /home/ssd/ragdata/qdrantdb:/qdrant/storage
    restart: unless-stopped

  rag-agents:
    build: .
    container_name: rag-agents-pi5
    ports:
      - 8000:8000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - QDRANT_URL=http://qdrant:6333
      - QDRANT_COLLECTION=rag_docs
    depends_on:
      - ollama
      - qdrant
    restart: unless-stopped

  webui-pi5:
    image: ghcr.io/open-webui/open-webui:main
    container_name: webui-pi5
    ports:
      - 3000:8080
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama-data: {}    # Declared here!
  open-webui: {}     # Declared here!
```
Notes:

- /home/ssd/ragdata/qdrantdb is a host folder on the Pi SSD where Qdrant stores all vector data persistently.
- 6333 is the default HTTP API port; keep it exposed so rag-agents-pi5 can reach it.
- If you previously ran Chroma as a separate service, you can remove it from docker-compose.yml and keep the ./data/chromadb folder as a backup, or delete it once you're sure Qdrant works.

In your rag-server .env file (or the environment section of rag-agents-pi5), add/update:
```
VECTOR_DB=qdrant
QDRANT_URL=http://qdrant-pi5:6333
QDRANT_COLLECTION=rag_docs
OLLAMA_BASE_URL=http://ollama-pi5:11434
EMBEDDING_MODEL=nomic-embed-text
LLM_MODEL=qwen2.5:0.5b-instruct-q4_K_M
```
These variables tell the RAG service to:

- use Qdrant (rather than Chroma) as the vector backend,
- reach it at QDRANT_URL, and
- store all embeddings in the rag_docs collection.

Inside the rag-agents FastAPI project (e.g., folder ./rag-agents), adjust the vector store code.
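These settings can be gathered into one place at startup. A hedged sketch of a config loader — the `RagConfig` name is an assumption, but the variable names and defaults match the .env above:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    vector_db: str
    qdrant_url: str
    qdrant_collection: str
    ollama_base_url: str
    embedding_model: str
    llm_model: str


def load_config(env=os.environ) -> RagConfig:
    """Read RAG settings from the environment, falling back to the
    defaults used elsewhere in this guide."""
    return RagConfig(
        vector_db=env.get("VECTOR_DB", "qdrant"),
        qdrant_url=env.get("QDRANT_URL", "http://qdrant-pi5:6333"),
        qdrant_collection=env.get("QDRANT_COLLECTION", "rag_docs"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://ollama-pi5:11434"),
        embedding_model=env.get("EMBEDDING_MODEL", "nomic-embed-text"),
        llm_model=env.get("LLM_MODEL", "qwen2.5:0.5b-instruct-q4_K_M"),
    )
```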
In requirements.txt:

```
langchain
langchain-community
langchain-qdrant
qdrant-client
```
Then rebuild the image (rag-agents is the compose service name; rag-agents-pi5 is only the container name):

```bash
cd ~/rag-server
docker compose build rag-agents
```
Old (Chroma-based) code might look like:
```python
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embedding,
    persist_directory="data/chromadb",
)
```
Replace with Qdrant:

```python
import os

from langchain_community.embeddings import OllamaEmbeddings
from langchain_core.embeddings import Embeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

QDRANT_URL = os.getenv("QDRANT_URL", "http://qdrant-pi5:6333")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "rag_docs")


def get_embedding() -> Embeddings:
    return OllamaEmbeddings(
        base_url=os.getenv("OLLAMA_BASE_URL", "http://ollama-pi5:11434"),
        model=os.getenv("EMBEDDING_MODEL", "nomic-embed-text"),
    )


def get_vectorstore() -> QdrantVectorStore:
    embedding = get_embedding()
    # Probe the embedding dimension once so the collection matches the model
    dim = len(embedding.embed_query("dim_probe"))

    client = QdrantClient(url=QDRANT_URL)
    # Create the collection only if it doesn't exist yet;
    # recreate_collection would wipe existing vectors on every startup
    if not client.collection_exists(QDRANT_COLLECTION):
        client.create_collection(
            collection_name=QDRANT_COLLECTION,
            vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
        )

    return QdrantVectorStore(
        client=client,
        collection_name=QDRANT_COLLECTION,
        embedding=embedding,
    )
```
Where you previously created or used Chroma, change to:

```python
vectorstore = get_vectorstore()

# Example in the /upload or /wiki endpoint:
vectorstore.add_documents(chunks)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```

You no longer need persist_directory; Qdrant handles persistence via the Docker volume.
If your /rag endpoint used Chroma’s retriever, no major changes are required.
The standard LangChain retrieval chain works the same:
```python
import os

from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama

llm = Ollama(
    base_url=os.getenv("OLLAMA_BASE_URL", "http://ollama-pi5:11434"),
    model=os.getenv("LLM_MODEL", "qwen2.5:0.5b-instruct-q4_K_M"),
)
vectorstore = get_vectorstore()
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,
)
```
Your FastAPI handler for /rag can remain the same, only now it talks to Qdrant under the hood.
From the rag-server directory:
```bash
docker compose up -d qdrant
docker compose up -d --build rag-agents
```

(Use the compose service names here; qdrant-pi5 and rag-agents-pi5 are only the container names.)
Confirm all containers are running:
docker ps
Quick health checks:

Qdrant:

```bash
curl http://localhost:6333/collections
```

Should show an empty or newly created rag_docs collection.

RAG API:

```bash
curl http://localhost:8000/docs
```

Should return the FastAPI docs page (open the same URL in a browser to view it interactively).
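The Qdrant check can also be scripted. A small sketch, assuming the response keeps Qdrant's usual `{"result": {"collections": [...]}}` shape:

```python
def list_collections(resp_json: dict) -> list[str]:
    """Extract collection names from a Qdrant /collections response,
    assuming the {"result": {"collections": [{"name": ...}]}} shape."""
    return [c["name"] for c in resp_json.get("result", {}).get("collections", [])]


# Against a live instance (qdrant-pi5 up):
#   import requests
#   resp = requests.get("http://localhost:6333/collections", timeout=5).json()
#   assert "rag_docs" in list_collections(resp)
```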
Because the old ChromaDB was empty, simply run your normal ingestion flows:
Ebooks:

```bash
curl -X POST http://localhost:8000/upload \
  -F "file=@ebooks/book1.pdf"
```

Wiki.js (full sync):

```bash
curl -X POST http://localhost:8000/wiki
```

Incremental Wiki.js sync:

```bash
curl -X POST http://localhost:8000/wiki-sync
```
Each endpoint now writes embeddings directly to Qdrant instead of Chroma.
After ingestion, test a query:
```bash
curl -X POST http://localhost:8000/rag \
  -H "Content-Type: application/json" \
  -d '{"question": "What does the Pi5 RAG server do?"}'
```
You should see an answer along with source metadata referring to ebook pages or Wiki.js articles.
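When the sources look wrong, it helps to inspect exactly what the retriever returns before the LLM sees it. A debugging sketch — `format_hits` is a hypothetical helper; `get_vectorstore()` is the function defined earlier:

```python
def format_hits(docs) -> str:
    """Render retrieved documents with their source metadata for eyeballing."""
    lines = []
    for i, d in enumerate(docs, 1):
        src = d.metadata.get("source", "unknown")
        preview = d.page_content[:80].replace("\n", " ")
        lines.append(f"{i}. [{src}] {preview}")
    return "\n".join(lines)


# Usage (requires the running stack):
#   retriever = get_vectorstore().as_retriever(search_kwargs={"k": 5})
#   docs = retriever.invoke("What does the Pi5 RAG server do?")
#   print(format_hits(docs))
```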
If answers look off:
- Check that the collection exists: curl http://localhost:6333/collections
- Verify chunks were actually ingested (log len(chunks) during ingestion).
- Make sure the same embedding model is used for ingestion and querying (nomic-embed-text in both).

If in the future you ever need to migrate an existing Chroma collection to Qdrant instead of re-ingesting, use the official Qdrant migration tool:
```bash
docker run --net=host --rm -it \
  registry.cloud.qdrant.io/library/qdrant-migration chroma \
  --chroma.url=http://localhost:8000 \
  --chroma.collection 'collection-name' \
  --qdrant.url 'http://localhost:6333' \
  --qdrant.collection 'rag_docs' \
  --migration.batch-size 64
```
In the current Pi5 setup, this is not needed because Chroma has no data yet; simply re-run ingestion.
In summary, this migration:

- added a qdrant-pi5 service and removed Chroma-specific persistence,
- set VECTOR_DB=qdrant, QDRANT_URL, and QDRANT_COLLECTION,
- switched the code from Chroma to QdrantVectorStore, with QdrantClient and VectorParams, and
- re-ingested ebooks and wiki pages into the rag_docs collection.

Your Raspberry Pi 5 RAG server now uses Qdrant as the backing vector database while keeping the rest of the architecture, endpoints, and workflows unchanged.