# Quick Start

Get up and running with Hindsight in 60 seconds.
## Start the Server
### pip (API only)

```bash
pip install hindsight-all
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
hindsight-api
```

The API is now available at http://localhost:8888.
### Docker (Full Experience)

```bash
docker run -p 8888:8888 -p 9999:9999 \
  -e HINDSIGHT_API_LLM_PROVIDER=groq \
  -e HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx \
  ghcr.io/vectorize-io/hindsight
```

- API: http://localhost:8888
- Control Plane (Web UI): http://localhost:9999
### LLM Provider

Hindsight requires an LLM provider with structured output support. Recommended: Groq with `gpt-oss-20b` for fast, cost-effective inference. OpenAI and Ollama are also supported.
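To switch providers, change the same two environment variables used above. A sketch for OpenAI, assuming the provider value is `openai` (any provider-specific model settings are outside this quick start):

```shell
# Same variables as the Groq example; the provider value `openai`
# is assumed here, and the key is a placeholder.
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
hindsight-api
```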
## Use the Client
### Python

```bash
pip install hindsight-client
```

```python
from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Retain: store information
client.retain(bank_id="my-bank", content="Alice works at Google as a software engineer")

# Recall: search memories
client.recall(bank_id="my-bank", query="What does Alice do?")

# Reflect: generate a personality-aware response
client.reflect(bank_id="my-bank", query="Tell me about Alice")
```
### Node.js

```bash
npm install @vectorize-io/hindsight-client
```

```javascript
const { HindsightClient } = require('@vectorize-io/hindsight-client');

const client = new HindsightClient({ baseUrl: 'http://localhost:8888' });

// Retain: store information
await client.retain('my-bank', 'Alice works at Google as a software engineer');

// Recall: search memories
await client.recall('my-bank', 'What does Alice do?');

// Reflect: generate a personality-aware response
await client.reflect('my-bank', 'Tell me about Alice');
```
### CLI

```bash
curl -fsSL https://raw.githubusercontent.com/vectorize-io/hindsight/refs/heads/main/hindsight-cli/install.sh | bash
```

```bash
# Retain: store information
hindsight memory retain my-bank "Alice works at Google as a software engineer"

# Recall: search memories
hindsight memory recall my-bank "What does Alice do?"

# Reflect: generate a personality-aware response
hindsight memory reflect my-bank "Tell me about Alice"
```
## What's Happening
| Operation | What it does |
|---|---|
| Retain | Content is processed, facts are extracted, entities are identified and linked in a knowledge graph |
| Recall | Four search strategies (semantic, keyword, graph, temporal) run in parallel to find relevant memories |
| Reflect | Retrieved memories are used to generate a personality-aware response |
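The recall row above describes a fan-out/merge pattern: several strategies search concurrently and their results are combined. Here is a toy Python sketch of that pattern, not Hindsight's internals; the stub strategy functions, memories, and scores are invented for illustration, and duplicates are merged by keeping each memory's best score:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative memory store and stub strategies (all invented for this sketch).
MEMORIES = ["Alice works at Google", "Alice is a software engineer"]

def semantic(query):   return [(MEMORIES[0], 0.9)]
def keyword(query):    return [(MEMORIES[1], 0.7)]
def graph(query):      return [(MEMORIES[0], 0.8)]
def temporal(query):   return []  # no time-anchored matches for this query

def recall(query):
    """Run all strategies in parallel, dedupe by best score, rank descending."""
    strategies = [semantic, keyword, graph, temporal]
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda s: s(query), strategies))
    best = {}
    for results in result_lists:
        for memory, score in results:
            best[memory] = max(score, best.get(memory, 0.0))
    return sorted(best, key=best.get, reverse=True)

print(recall("What does Alice do?"))
# → ['Alice works at Google', 'Alice is a software engineer']
```

The merge step matters: the same memory surfaced by both the semantic and graph strategies appears once, ranked by its strongest match.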
## Next Steps
- Retain — Advanced options for storing memories
- Recall — Search and retrieval strategies
- Reflect — Personality-aware reasoning
- Memory Banks — Configure personality and background
- Server Deployment — Docker Compose, Helm, and production setup