Models

Hindsight uses several machine learning models for different tasks.

Overview

| Model Type | Purpose | Default | Configurable |
|---|---|---|---|
| Embedding | Vector representations for semantic search | BAAI/bge-small-en-v1.5 | Yes |
| Cross-Encoder | Reranking search results | ms-marco-MiniLM-L-6-v2 | Yes |
| LLM | Fact extraction, reasoning, generation | Provider-specific | Yes |

All local models (embedding, cross-encoder) are automatically downloaded from HuggingFace on first run.
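
For offline or firewalled deployments, the two local models can be pre-fetched into the Hugging Face cache so the first run needs no network access. A sketch using the `huggingface-cli` tool from the `huggingface_hub` package; whether Hindsight reads from the default cache location is an assumption, not something this page confirms:

```shell
# Pre-fetch both local models into the default Hugging Face cache
# (~/.cache/huggingface) ahead of the first Hindsight run.
huggingface-cli download BAAI/bge-small-en-v1.5
huggingface-cli download cross-encoder/ms-marco-MiniLM-L-6-v2
```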


Embedding Model

Converts text into dense vector representations for semantic similarity search.

Default: BAAI/bge-small-en-v1.5 (384 dimensions, ~130MB)

Alternatives:

| Model | Dimensions | Use Case |
|---|---|---|
| BAAI/bge-small-en-v1.5 | 384 | Default; fast, good quality |
| BAAI/bge-base-en-v1.5 | 768 | Higher accuracy, slower |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual (50+ languages) |

Configuration:

```shell
export HINDSIGHT_API_EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
export HINDSIGHT_API_EMBEDDING_DEVICE=cuda   # or mps for Apple Silicon
export HINDSIGHT_API_EMBEDDING_BATCH_SIZE=64
```
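
To make the embedding model's role concrete, here is a minimal sketch of semantic search over dense vectors. It uses toy 4-dimensional vectors in place of real 384-dimensional bge-small embeddings, and the names are illustrative rather than Hindsight's actual API:

```python
import numpy as np

def cosine_similarity(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dim vectors standing in for 384-dim bge-small embeddings.
query = np.array([0.9, 0.1, 0.0, 0.2])
docs = {
    "doc_a": np.array([0.8, 0.2, 0.1, 0.1]),  # points roughly the same way as query
    "doc_b": np.array([0.0, 0.1, 0.9, 0.3]),  # nearly orthogonal to query
}

# Rank documents by similarity to the query vector.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # doc_a ranks first
```

In the real system, the model maps query and documents into the same 384-dimensional space, and the nearest vectors are the semantically closest texts.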

Cross-Encoder (Reranker)

Reranks initial search results to improve precision.

Default: cross-encoder/ms-marco-MiniLM-L-6-v2 (~85MB)

Alternatives:

| Model | Use Case |
|---|---|
| ms-marco-MiniLM-L-6-v2 | Default, fast |
| ms-marco-MiniLM-L-12-v2 | Higher accuracy |
| mmarco-mMiniLMv2-L12-H384-v1 | Multilingual |

Configuration:

```shell
export HINDSIGHT_API_RERANK_MODEL=cross-encoder/ms-marco-MiniLM-L-12-v2
export HINDSIGHT_API_RERANK_TOP_K=50      # How many results to rerank
export HINDSIGHT_API_RERANK_ENABLED=true  # Set to false to disable
```
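
The retrieve-then-rerank flow and the effect of `RERANK_TOP_K` can be sketched as follows. The `cross_encoder_score` stub stands in for ms-marco-MiniLM-L-6-v2 (a real cross-encoder jointly encodes the query–document pair and outputs a relevance score), and the assumption that results beyond `top_k` keep their first-stage order is illustrative, not confirmed by this page:

```python
RERANK_TOP_K = 3  # mirrors HINDSIGHT_API_RERANK_TOP_K

def cross_encoder_score(query, doc):
    # Stub scorer: fraction of query words that appear in the doc.
    # A real cross-encoder is far stronger, but slower per pair.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def rerank(query, candidates, top_k=RERANK_TOP_K):
    # Only the top_k first-stage hits are rescored; rescoring every
    # candidate with a cross-encoder would be too slow.
    head, tail = candidates[:top_k], candidates[top_k:]
    head = sorted(head, key=lambda d: cross_encoder_score(query, d), reverse=True)
    return head + tail

hits = ["cats sleep a lot", "dogs bark", "my cat likes to sleep", "stock prices"]
print(rerank("why does my cat sleep", hits))
```

This is why reranking improves precision: the cheap first stage casts a wide net, and the expensive pairwise scorer reorders only the short list that matters.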

LLM

Used for fact extraction, entity resolution, opinion generation, and answer synthesis.

Supported providers: Groq, OpenAI, Ollama

| Provider | Recommended Model | Best For |
|---|---|---|
| Groq | gpt-oss-20b | Fast inference, high throughput (recommended) |
| OpenAI | gpt-4o-mini | Good quality, cost-effective |
| OpenAI | gpt-4o | Best quality |
| Ollama | llama3.1 | Local deployment, privacy |

Configuration:

```shell
# Groq (recommended)
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b

# OpenAI
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o-mini

# Ollama (local)
export HINDSIGHT_API_LLM_PROVIDER=ollama
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:11434/v1
export HINDSIGHT_API_LLM_MODEL=llama3.1
```

Note: The LLM is the primary bottleneck for write operations. See Performance for optimization strategies.
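
All three providers expose an OpenAI-compatible API, so the environment variables above map onto one client configuration. The helper below is hypothetical (it is not Hindsight's internals); the Groq and Ollama base URLs are their documented OpenAI-compatible endpoints:

```python
# Hypothetical helper: map HINDSIGHT_API_LLM_* variables to one
# OpenAI-compatible client config. Not Hindsight's actual code.
DEFAULT_BASE_URLS = {
    "groq": "https://api.groq.com/openai/v1",
    "openai": "https://api.openai.com/v1",
    "ollama": "http://localhost:11434/v1",
}

def llm_client_config(env):
    provider = env.get("HINDSIGHT_API_LLM_PROVIDER", "openai")
    return {
        "base_url": env.get("HINDSIGHT_API_LLM_BASE_URL", DEFAULT_BASE_URLS[provider]),
        # Ollama ignores API keys, so a placeholder suffices there.
        "api_key": env.get("HINDSIGHT_API_LLM_API_KEY", "not-needed"),
        "model": env.get("HINDSIGHT_API_LLM_MODEL"),
    }

print(llm_client_config({
    "HINDSIGHT_API_LLM_PROVIDER": "ollama",
    "HINDSIGHT_API_LLM_MODEL": "llama3.1",
}))
```

Switching providers is then just a matter of swapping the three environment variables; no code changes are needed.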


Model Comparison

| Provider | Model | Speed | Quality | Cost |
|---|---|---|---|---|
| Groq | gpt-oss-20b | Fast | Good | Free tier |
| OpenAI | gpt-4o-mini | Medium | Good | $0.15 input / $0.60 output per 1M tokens |
| OpenAI | gpt-4o | Slower | Best | $2.50 input / $10.00 output per 1M tokens |
| Ollama | llama3.1 | Varies | Good | Free (local) |
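
A back-of-envelope cost estimate follows directly from the table, reading the paired figures as input / output rates per million tokens (OpenAI's pricing convention). Using the gpt-4o-mini row:

```python
# gpt-4o-mini rates from the comparison table above.
INPUT_PER_M, OUTPUT_PER_M = 0.15, 0.60  # USD per 1M tokens

def estimate_cost(input_tokens, output_tokens):
    # Total cost = input tokens at the input rate + output tokens at the output rate.
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. 2M input tokens and 500K output tokens:
print(f"${estimate_cost(2_000_000, 500_000):.2f}")  # $0.60
```

Write-heavy workloads (fact extraction over large documents) are input-dominated, so the lower input rate is the figure that matters most when comparing providers.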