
How Korvo's RAG Search Works on Your Local Files

7 min read

RAG - retrieval-augmented generation - is the technique that makes AI outputs actually useful. Instead of relying on the model's training data alone, RAG retrieves relevant chunks from your own documents and feeds them as context alongside your query. The result: grounded answers with citations from your actual source material.

Most RAG implementations require uploading your documents to a cloud vector database - Pinecone, Weaviate, Chroma Cloud. Your sensitive files get chunked, embedded, and stored on servers you don't control. Korvo does the entire pipeline locally. Here's how.

The problem with cloud RAG

Traditional RAG-as-a-service has a fundamental tension: to make AI answers better, you have to give away your data. The more documents you upload, the better the retrieval - and the more exposure you have.

  • Your documents are chunked and stored as embeddings on third-party servers.
  • The original text is often stored alongside embeddings for retrieval.
  • Vector databases are a high-value target - they contain the distilled knowledge of every customer.
  • You have no control over retention policies, geographic storage, or access controls.
  • Typical cost: $70–200/month for meaningful storage, on top of your AI subscription.

Korvo's local RAG pipeline

Korvo runs the entire RAG pipeline on your machine. No cloud vector databases. No document uploads. Here's the architecture, step by step:

1. Document ingestion

When you add files to a Korvo project - PDFs, markdown, plain text, Word documents - the app extracts the text content locally. No file is uploaded anywhere. The extracted text is stored in a local SQLite database within your project workspace.
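In Python, the ingestion step might look something like this - a simplified sketch, where the `documents` table schema is illustrative rather than Korvo's actual layout:

```python
import sqlite3

def ingest_document(conn: sqlite3.Connection, filename: str, text: str) -> int:
    """Store locally extracted text in the project's SQLite database.

    The schema here is hypothetical - it just shows the shape of the idea:
    text in, row id out, nothing leaving the machine.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )
    cur = conn.execute(
        "INSERT INTO documents (filename, content) VALUES (?, ?)",
        (filename, text),
    )
    conn.commit()
    return cur.lastrowid
```

The returned row id is what later stages (chunking, citations) would key off.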

2. Chunking

Documents are split into semantically meaningful chunks - typically 500–1000 tokens each, with overlap to preserve context across boundaries. Korvo uses a recursive splitting strategy that respects paragraph and section boundaries rather than cutting mid-sentence. Chunk metadata (source file, page number, section heading) is preserved for citation.
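A minimal version of such a splitter, using character counts as a stand-in for tokens (the real boundary rules are more involved than this sketch):

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text on paragraph boundaries, carrying a small overlap forward.

    Simplified sketch: paragraphs are packed into chunks up to max_chars,
    and the tail of each finished chunk is prepended to the next one so
    context survives the boundary. (Oversized single paragraphs are not
    sub-split here.)
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if not current:
            current = para
        elif len(current) + len(para) + 2 <= max_chars:
            current += "\n\n" + para
        else:
            chunks.append(current)
            # Carry the tail of the finished chunk into the next one.
            current = current[-overlap:] + "\n\n" + para
    if current:
        chunks.append(current)
    return chunks
```

In practice each chunk would also carry its metadata (file, page, heading) alongside the text.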

3. Embedding generation

Each chunk is converted into a vector embedding - a numerical representation of its semantic meaning. Korvo uses your configured AI provider's embedding endpoint (e.g., OpenAI's text-embedding-3-small). The embedding request goes directly from your machine to the provider. The resulting vectors are stored locally - never on our servers.
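Here's roughly what that direct-to-provider call looks like - a sketch that builds a batched request against OpenAI's public /v1/embeddings endpoint, authenticated with your own key:

```python
import json
import urllib.request

EMBEDDING_MODEL = "text-embedding-3-small"  # OpenAI's small embedding model

def build_embedding_request(chunks: list[str], api_key: str) -> urllib.request.Request:
    """Build one batched embeddings request - chunks go straight from this
    machine to api.openai.com, with no intermediary server."""
    body = json.dumps({"model": EMBEDDING_MODEL, "input": chunks}).encode()
    return urllib.request.Request(
        "https://api.openai.com/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The response contains one vector per input chunk; those vectors are what get written to the local index.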

4. Local vector storage

Embeddings are indexed in a local vector store using HNSW (Hierarchical Navigable Small World) indexing - the same algorithm used by production vector databases, but running entirely on your device. This enables fast approximate nearest-neighbor search across thousands of chunks without any cloud dependency.

5. Query-time retrieval

When you ask a question or trigger Plan Mode, Korvo embeds your query using the same embedding model, then performs a similarity search against the local vector index. The top-k most relevant chunks are retrieved, ranked by cosine similarity, and injected into the prompt as grounding context.
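Stripped to its essentials, retrieval is "rank every stored chunk by cosine similarity to the query vector." The sketch below does that with an exhaustive scan for clarity - an HNSW index returns (approximately) the same top-k neighbors without touching every vector:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def retrieve_top_k(query_vec: list[float],
                   index: list[tuple[str, list[float]]],
                   k: int = 5) -> list[tuple[str, float]]:
    """Rank stored chunks by similarity to the query embedding.

    Brute-force version for illustration; the HNSW index in the previous
    step replaces this scan with approximate nearest-neighbor search.
    """
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The chunk ids coming back (hypothetical names below) are what get resolved to text and metadata for the prompt.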

6. Grounded generation with citations

The LLM receives your query plus the retrieved context and generates a response. Korvo's prompting instructs the model to cite its sources - producing inline references like [Source: pitch-deck.pdf, p.4] that link back to the exact chunk used. The output is grounded in your actual documents, not the model's general training data.
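Korvo's actual prompt isn't public, but a grounding prompt with citation instructions looks roughly like this - each retrieved chunk is tagged with its source metadata so the model can cite it back:

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble retrieved chunks into a citable context block.

    Illustrative only - the chunk dict keys ("file", "page", "text")
    are assumptions, not Korvo's real schema.
    """
    context = "\n\n".join(
        f"[Source: {c['file']}, p.{c['page']}]\n{c['text']}" for c in chunks
    )
    return (
        "Answer using ONLY the context below. Cite every claim with its "
        "[Source: file, p.N] tag.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Because each tag maps back to a stored chunk, the citations in the model's answer can be resolved to an exact file and page.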

What this looks like in practice

Say you're doing due diligence on a deal. You add the pitch deck, financial model, cap table, and three market research reports to a Korvo project. Here's what happens:

1. Korvo ingests and chunks all 6 documents locally. ~2000 chunks, indexed in under a minute.
2. You enter Plan Mode and ask: "What are the key risks in this deal based on the financials and market data?"
3. Korvo retrieves the 15 most relevant chunks across all documents - revenue projections from the model, market sizing assumptions from the reports, team data from the deck.
4. Plan Mode builds a structured reasoning plan: identify revenue risks, compare market assumptions to third-party data, flag missing diligence areas.
5. You approve the plan. Korvo generates a risk analysis with inline citations to specific pages and sections of your source documents.
6. Every citation is clickable - jump to the exact source chunk to verify the reasoning yourself.

Everything that touches your documents - ingestion, chunking, indexing, retrieval - runs locally. The only network calls are the embedding requests and the final LLM call, both going directly to your chosen provider via your API key.

Performance considerations

A common concern with local RAG is performance. In practice, it's fast:

  • Ingestion: ~50 pages/sec (PDF extraction + chunking)
  • Embedding: ~200 chunks/sec (via OpenAI API, batched)
  • Vector search: <50ms (local HNSW index, 10k chunks)
  • Full RAG query: 2–8 sec (retrieval + LLM generation)

The bottleneck is almost always the LLM generation step - which is the same latency you'd have with a cloud RAG pipeline. The local retrieval step is actually faster because there's no network round-trip to a remote vector database.

Why citations matter

RAG without citations is just marginally better hallucination. If you can't trace an AI-generated claim back to a specific source, you can't trust it - and for high-stakes decisions, untraceable claims are worse than no claims at all.

Korvo's citation system isn't an afterthought. Every output produced through RAG includes source references. Every reference links to the actual chunk. And every chunk links back to the original file, page, and section. This is what we call full provenance - the ability to trace any conclusion back through the reasoning chain to its source material.

Local RAG vs. cloud RAG: the comparison

  • Data location - cloud RAG: third-party servers; Korvo: your device
  • Privacy - cloud RAG: provider-dependent; Korvo: enforced by architecture
  • Cost - cloud RAG: $70–200/mo on top of your AI subscription; Korvo: $0 (uses your API key)
  • Search latency - cloud RAG: 100–300ms (network round-trip); Korvo: <50ms (local)
  • Offline access - cloud RAG: none; Korvo: full index available
  • Vendor lock-in - cloud RAG: high (proprietary index); Korvo: none (standard formats)

The bottom line

RAG is what makes AI useful for real work - but the standard implementation requires surrendering your documents to cloud infrastructure. Korvo proves it doesn't have to.

Local ingestion. Local embeddings. Local vector search. Direct-to-provider generation. Full citations. Zero cloud storage.

Your documents stay on your machine. Your AI outputs are grounded in your actual sources. And every conclusion is traceable.

Try local RAG in Korvo

Add your files, ask questions, get cited answers - all on your machine. Free to start.

Download free