# Quick Start

Get up and running with Hindsight in 60 seconds.
## Start the Server
### pip (API only)

```bash
pip install hindsight-all
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
hindsight-api
```

The API is now available at http://localhost:8888.
### Docker (Full Experience)

```bash
docker run -p 8888:8888 -p 9999:9999 \
  -e HINDSIGHT_API_LLM_PROVIDER=groq \
  -e HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx \
  ghcr.io/vectorize-io/hindsight
```

- API: http://localhost:8888
- Control Plane (Web UI): http://localhost:9999
### LLM Provider

Hindsight requires an LLM provider with structured output support. Recommended: Groq with `gpt-oss-20b` for fast, cost-effective inference. OpenAI and Ollama are also supported.
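To switch providers, change the same two environment variables used above. A sketch for OpenAI, assuming the provider value is `openai` (any provider-specific model settings are outside this quick start):

```shell
# Same variables as the Groq example; the provider value `openai`
# is assumed here, and the key is a placeholder.
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
hindsight-api
```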
## Use the Client
### Python

```bash
pip install hindsight-client
```

```python
from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Retain: store information
client.retain(bank_id="my-bank", content="Alice works at Google as a software engineer")

# Recall: search memories
client.recall(bank_id="my-bank", query="What does Alice do?")

# Reflect: generate a personality-aware response
client.reflect(bank_id="my-bank", query="Tell me about Alice")
```
### Node.js

```bash
npm install @vectorize-io/hindsight-client
```

```javascript
const { HindsightClient } = require('@vectorize-io/hindsight-client');

const client = new HindsightClient({ baseUrl: 'http://localhost:8888' });

// Retain: store information
await client.retain('my-bank', 'Alice works at Google as a software engineer');

// Recall: search memories
await client.recall('my-bank', 'What does Alice do?');

// Reflect: generate a personality-aware response
await client.reflect('my-bank', 'Tell me about Alice');
```
### CLI

```bash
curl -fsSL https://raw.githubusercontent.com/vectorize-io/hindsight/refs/heads/main/hindsight-cli/install.sh | bash
```

```bash
# Retain: store information
hindsight memory retain my-bank "Alice works at Google as a software engineer"

# Recall: search memories
hindsight memory recall my-bank "What does Alice do?"

# Reflect: generate a personality-aware response
hindsight memory reflect my-bank "Tell me about Alice"
```
## What's Happening
| Operation | What it does |
|---|---|
| Retain | Content is processed, facts are extracted, entities are identified and linked in a knowledge graph |
| Recall | Four search strategies (semantic, keyword, graph, temporal) run in parallel to find relevant memories |
| Reflect | Retrieved memories are used to generate a personality-aware response |
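The recall row above describes a fan-out/merge pattern: several strategies search concurrently and their results are combined. Here is a toy Python sketch of that pattern, not Hindsight's internals; the stub strategy functions, memories, and scores are invented for illustration, and duplicates are merged by keeping each memory's best score:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative memory store and stub strategies (all invented for this sketch).
MEMORIES = ["Alice works at Google", "Alice is a software engineer"]

def semantic(query):   return [(MEMORIES[0], 0.9)]
def keyword(query):    return [(MEMORIES[1], 0.7)]
def graph(query):      return [(MEMORIES[0], 0.8)]
def temporal(query):   return []  # no time-anchored matches for this query

def recall(query):
    """Run all strategies in parallel, dedupe by best score, rank descending."""
    strategies = [semantic, keyword, graph, temporal]
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda s: s(query), strategies))
    best = {}
    for results in result_lists:
        for memory, score in results:
            best[memory] = max(score, best.get(memory, 0.0))
    return sorted(best, key=best.get, reverse=True)

print(recall("What does Alice do?"))
# → ['Alice works at Google', 'Alice is a software engineer']
```

The merge step matters: the same memory surfaced by both the semantic and graph strategies appears once, ranked by its strongest match.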
## Next Steps
- Retain — Advanced options for storing memories
- Recall — Search and retrieval strategies
- Reflect — Personality-aware reasoning
- Memory Banks — Configure personality and background
- Server Deployment — Docker Compose, Helm, and production setup