| title | description |
|---|---|
| Development Guide | How to set up, build, test, and contribute to go-rag: prerequisites, test patterns, coding standards, and extension points. |
This document covers everything needed to work on forge.lthn.ai/core/go-rag: setting up the required services, running tests, understanding the test architecture, and following the project's coding standards.
Go 1.26 or later. The module is part of a Go workspace (go.work) that resolves forge.lthn.ai/core/* dependencies via local paths. Ensure the sibling modules referenced in your workspace file are present and their go.mod files are consistent.
Two services are required for integration tests. Unit tests and mock-based tests run without either.
Qdrant -- vector database, gRPC on port 6334:
docker run -d \
--name qdrant \
-p 6333:6333 \
-p 6334:6334 \
qdrant/qdrant:v1.16.3

Port 6333 is the REST API (not used by the library). Port 6334 is gRPC (used by the library).
Ollama -- embedding model server, HTTP on port 11434:
# Install Ollama from https://ollama.com
ollama pull nomic-embed-text
ollama serve

The nomic-embed-text model (274 MB, F16) is the default. For AMD GPUs with ROCm, install the ROCm-enabled Ollama binary from the Ollama releases page.
go test ./...

Runs all pure-function and mock-based tests. No Qdrant or Ollama instance is needed.

go test -tags rag ./...

Runs the full suite including:
- qdrant_integration_test.go -- collection lifecycle, upsert, search, payload filtering
- ollama_integration_test.go -- model verification, single and batch embedding, determinism
- integration_test.go -- full pipeline, all helper variants, semantic similarity verification

Integration tests skip gracefully when services are unavailable (they call HealthCheck and t.Skipf on failure).
go test -v -run TestChunkMarkdown ./...
go test -v -tags rag -run TestIntegration_FullPipeline ./...

# Mock-only benchmarks (no services needed):
go test -bench=. -benchmem ./...
# GPU/service benchmarks (require Qdrant + Ollama):
go test -tags rag -bench=. -benchmem ./...

Key benchmarks include BenchmarkChunk, BenchmarkChunkWithOverlap, BenchmarkQuery_Mock, BenchmarkIngest_Mock, BenchmarkFormatResults, BenchmarkKeywordFilter, BenchmarkEmbedSingle, BenchmarkEmbedBatch, BenchmarkQdrantSearch, and BenchmarkFullPipeline.
# Mock-only coverage:
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
# Full coverage with live services:
go test -tags rag -coverprofile=coverage.out ./...

Coverage targets: ~69% without services, ~89% with live services.
golangci-lint run ./...
go vet ./...
gofmt -w .

The .golangci.yml configuration enables govet, errcheck, staticcheck, unused, gosimple, ineffassign, typecheck, gocritic, and gofmt.
Tests requiring external services carry the //go:build rag build tag:
//go:build rag
package rag

This isolates them from CI environments that lack live services. All pure-function and mock-based tests have no build tag and run unconditionally with go test ./....
mock_test.go provides two in-package test doubles:
mockEmbedder -- returns deterministic all-0.1 vectors of configurable dimension. Features:
- Call tracking: embedCalls records every text passed to Embed; batchCalls records EmbedBatch inputs
- Error injection: set embedErr or batchErr to force failures
- Custom behaviour: set embedFunc for per-test logic
- Thread-safe: all state is guarded by a mutex
mockVectorStore -- in-memory map-backed store. Features:
- Stores points per collection in map[string][]Point
- Search returns stored points with fake descending scores (1.0, 0.9, 0.8, ...)
- Supports payload filter matching (exact string comparison)
- Per-method error injection: createErr, existsErr, deleteErr, listErr, infoErr, upsertErr, searchErr
- Custom search: set searchFunc to override default behaviour
- Call tracking for all methods
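The default search behaviour described above (stored points, fake descending scores, exact-match payload filter) might look something like the sketch below. The types point, result, and sketchStore are hypothetical stand-ins, and whether the real mock numbers scores before or after filtering is not stated in this guide; the sketch numbers them after filtering.

```go
package main

import "fmt"

// point and result are hypothetical stand-ins for the library's types.
type point struct {
	ID      string
	Payload map[string]string
}

type result struct {
	ID    string
	Score float32
}

// sketchStore keeps points per collection in a plain map.
type sketchStore struct {
	points map[string][]point
}

func newSketchStore() *sketchStore {
	return &sketchStore{points: make(map[string][]point)}
}

func (s *sketchStore) upsert(collection string, pts ...point) {
	s.points[collection] = append(s.points[collection], pts...)
}

// matches reports whether payload equals filter on every filter key.
func matches(payload, filter map[string]string) bool {
	for key, want := range filter {
		if payload[key] != want {
			return false
		}
	}
	return true
}

// search ignores any query vector: it walks stored points in insertion
// order, keeps those matching the filter, and assigns scores that start
// at 1.0 and drop by 0.1 per returned result.
func (s *sketchStore) search(collection string, limit int, filter map[string]string) []result {
	var out []result
	for _, p := range s.points[collection] {
		if len(out) >= limit {
			break
		}
		if !matches(p.Payload, filter) {
			continue
		}
		score := 1.0 - 0.1*float32(len(out))
		out = append(out, result{ID: p.ID, Score: score})
	}
	return out
}

func main() {
	s := newSketchStore()
	s.upsert("docs",
		point{ID: "a", Payload: map[string]string{"source": "guide.md"}},
		point{ID: "b", Payload: map[string]string{"source": "notes.md"}},
		point{ID: "c", Payload: map[string]string{"source": "guide.md"}},
	)
	for _, r := range s.search("docs", 10, map[string]string{"source": "guide.md"}) {
		fmt.Printf("%s %.1f\n", r.ID, r.Score)
	}
}
```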
Constructors:
embedder := newMockEmbedder(768)
store := newMockVectorStore()

Error injection:
embedder.embedErr = errors.New("embed failed")
store.upsertErr = errors.New("store unavailable")

Tests use _Good, _Bad, _Ugly suffix semantics:
- _Good -- happy path
- _Bad -- expected error conditions (invalid input, service errors)
- _Ugly -- panic or edge cases

Table-driven subtests are used for pure functions with many input variants (e.g., valueToGo, EmbedDimension, FormatResults*).
Graceful skip: Integration tests call HealthCheck and skip if the service is unavailable:
if err := client.HealthCheck(ctx); err != nil {
	t.Skipf("Qdrant unavailable: %v", err)
}

Indexing latency: After upserting points to Qdrant, tests include a 500ms sleep before searching to account for Qdrant's indexing delay.
Point ID format: Qdrant requires UUID-format point IDs. Always use ChunkID() to generate IDs. Arbitrary strings like "point-alpha" are rejected by Qdrant's UUID parser.
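To illustrate what a UUID-format ID derived from a hash looks like, here is a minimal sketch. The function sketchChunkID, its parameters, and the way the inputs are joined are assumptions for illustration; the real ChunkID() may hash different fields.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// sketchChunkID is a hypothetical stand-in for ChunkID. It hashes its
// inputs with MD5 and prints the 16 bytes as 8-4-4-4-12 hex groups, the
// textual layout of a UUID.
func sketchChunkID(path string, index int, text string) string {
	sum := md5.Sum([]byte(fmt.Sprintf("%s:%d:%s", path, index, text)))
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		sum[0:4], sum[4:6], sum[6:8], sum[8:10], sum[10:16])
}

func main() {
	id := sketchChunkID("docs/guide.md", 0, "first chunk of text")
	fmt.Println(id, len(id)) // a 36-character ID; the same inputs always give the same ID
}
```

Deriving the ID from the content makes re-ingesting an unchanged chunk overwrite the existing point rather than add a duplicate, assuming the store treats an upsert with an existing ID as a replacement.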
Collection isolation: Integration tests create collections with timestamped or randomised names and delete them in t.Cleanup to avoid cross-test interference.
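One way to generate such names is sketched below. The helper uniqueCollectionName is hypothetical and not part of the library; it combines a timestamp with a few random bytes so that parallel tests started in the same nanosecond still get distinct names.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// uniqueCollectionName returns prefix-<unix nanoseconds>-<8 hex chars>.
// If the random source fails, it falls back to the timestamp alone.
func uniqueCollectionName(prefix string) string {
	now := time.Now().UnixNano()
	suffix := make([]byte, 4)
	if _, err := rand.Read(suffix); err != nil {
		return fmt.Sprintf("%s-%d", prefix, now)
	}
	return fmt.Sprintf("%s-%d-%s", prefix, now, hex.EncodeToString(suffix))
}

func main() {
	fmt.Println(uniqueCollectionName("test-search"))
	fmt.Println(uniqueCollectionName("test-search"))
}
```

In a test, the returned name would be passed to collection creation, with the matching delete call registered through t.Cleanup immediately afterwards so the collection is removed even if the test fails.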
UK English throughout -- in comments, documentation, variable names, and error messages. Use colour, organisation, initialise, serialise, behaviour, recognised. Do not use American spellings.
Error messages use the log.E("component.Method", "what failed", err) pattern from forge.lthn.ai/core/go-log. This wraps errors with component context for structured logging:
return log.E("rag.Ingest", "error resolving directory", err)

- All functions have explicit parameter and return types
- No naked returns
- Exported types and functions have doc comments
- Internal helpers are unexported with concise inline comments
- Standard gofmt/goimports formatting

Every new Go source file should include:
// Copyright (C) 2026 Host UK Ltd.
// SPDX-License-Identifier: EUPL-1.2

Conventional commits format: type(scope): description
Common types: feat, fix, test, refactor, docs, chore.
Every commit must include the co-author trailer:
Co-Authored-By: Virgil <[email protected]>
Example:
feat(chunk): add sentence-aware splitting for oversized paragraphs

When a paragraph exceeds ChunkConfig.Size, split at sentence boundaries
(". ", "? ", "! ") rather than adding the whole paragraph as an
oversized chunk. Falls back to the full paragraph when no sentence
boundaries exist.

Co-Authored-By: Virgil <[email protected]>

To add a new embedding provider:
- Create a new file (e.g., openai.go) with a config struct and constructor.
- Implement the Embedder interface: Embed, EmbedBatch, EmbedDimension.
- Add a unit test file (openai_test.go) covering config defaults and dimension lookup.
- Add an integration test file (openai_integration_test.go) with the //go:build rag tag for live API tests.

To add a new vector store backend:
- Create a new file (e.g., weaviate.go) with a config struct and constructor.
- Implement all methods of the VectorStore interface.
- Ensure CollectionInfo maps backend-specific status codes to the "green"/"yellow"/"red"/"unknown" convention.
- Add integration tests under the //go:build rag tag.

Qdrant UUID requirement: Do not pass arbitrary strings as point IDs. Always use ChunkID() or another MD5/UUID generator. Qdrant rejects non-UUID strings with Unable to parse UUID: <value>.
EmbedBatch is sequential: There is no batch endpoint in the Ollama API. EmbedBatch calls Embed in a loop. For higher throughput, parallelise calls with goroutines and limit concurrency to avoid overwhelming the Ollama process.
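A bounded-concurrency wrapper along those lines could look like the sketch below. The embedFunc type and the helper names are hypothetical; in practice embedFunc would wrap a call to the embedder's Embed method, whose exact signature is assumed here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// embedFunc is a hypothetical single-text embedding call.
type embedFunc func(ctx context.Context, text string) ([]float32, error)

// embedParallel embeds texts with at most maxInFlight concurrent calls
// and returns vectors in input order. If any call fails, it returns the
// error of the earliest failing input. It does not cancel calls that
// are already running.
func embedParallel(ctx context.Context, embed embedFunc, texts []string, maxInFlight int) ([][]float32, error) {
	if maxInFlight < 1 {
		maxInFlight = 1
	}
	vectors := make([][]float32, len(texts))
	errs := make([]error, len(texts))
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	var wg sync.WaitGroup
	for i, text := range texts {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxInFlight calls are running
		go func(i int, text string) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes only to its own index, so no lock is needed.
			vectors[i], errs[i] = embed(ctx, text)
		}(i, text)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return vectors, nil
}

// fakeEmbed is a toy embedder for the demo: it returns a one-element
// vector holding the text length and fails on empty text.
func fakeEmbed(_ context.Context, text string) ([]float32, error) {
	if text == "" {
		return nil, errors.New("empty text")
	}
	return []float32{float32(len(text))}, nil
}

func main() {
	texts := []string{"a", "bbb", "cc"}
	vectors, err := embedParallel(context.Background(), fakeEmbed, texts, 2)
	fmt.Println(vectors, err) // prints "[[1] [3] [2]] <nil>"
}
```

A small maxInFlight (for example 2 to 4) is a reasonable starting point for a single local Ollama process; the right value depends on the hardware and should be measured rather than assumed.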
Collection must exist before upsert: Ingest handles collection creation automatically. If calling UpsertPoints directly, create the collection first (or use CollectionExists to check).
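The check-then-create step can be wrapped in a small helper, sketched here under assumptions: the collectionStore interface is a hypothetical subset of VectorStore, and the CreateCollection name and both signatures are guesses rather than the library's documented API. The sketch uses fmt.Errorf to stay dependency-free; project code would use log.E instead.

```go
package main

import (
	"context"
	"fmt"
)

// collectionStore is a hypothetical subset of the VectorStore interface.
// The real method names and signatures may differ.
type collectionStore interface {
	CollectionExists(ctx context.Context, name string) (bool, error)
	CreateCollection(ctx context.Context, name string, vectorSize uint64) error
}

// ensureCollection creates the collection only when it is missing, so
// it can be called before every direct upsert.
func ensureCollection(ctx context.Context, store collectionStore, name string, vectorSize uint64) error {
	exists, err := store.CollectionExists(ctx, name)
	if err != nil {
		return fmt.Errorf("check collection %q: %w", name, err)
	}
	if exists {
		return nil
	}
	if err := store.CreateCollection(ctx, name, vectorSize); err != nil {
		return fmt.Errorf("create collection %q: %w", name, err)
	}
	return nil
}

// memStore is an in-memory stand-in used only to make the sketch runnable.
type memStore struct {
	sizes   map[string]uint64
	created int
}

func (m *memStore) CollectionExists(_ context.Context, name string) (bool, error) {
	_, ok := m.sizes[name]
	return ok, nil
}

func (m *memStore) CreateCollection(_ context.Context, name string, vectorSize uint64) error {
	m.sizes[name] = vectorSize
	m.created++
	return nil
}

func main() {
	store := &memStore{sizes: map[string]uint64{}}
	ctx := context.Background()
	for i := 0; i < 2; i++ {
		if err := ensureCollection(ctx, store, "docs", 768); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("collections created:", store.created) // prints "collections created: 1"
}
```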
Score threshold filtering: The default threshold is 0.5. Short or ambiguous queries may return zero results. Lower QueryConfig.Threshold or set it to 0.0 to return all results up to the limit.
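The effect of the threshold can be seen in a small stand-alone sketch. The scored type and filterByThreshold helper are hypothetical, and the sketch filters client-side purely for illustration; this guide does not say whether the library filters in Go or delegates to the vector store.

```go
package main

import "fmt"

// scored is a hypothetical stand-in for a query result.
type scored struct {
	Text  string
	Score float32
}

// filterByThreshold keeps results scoring at least threshold, up to
// limit. It assumes results are already sorted by descending score.
func filterByThreshold(results []scored, threshold float32, limit int) []scored {
	kept := make([]scored, 0, len(results))
	for _, r := range results {
		if len(kept) >= limit {
			break
		}
		if r.Score >= threshold {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	results := []scored{
		{Text: "install guide", Score: 0.82},
		{Text: "changelog", Score: 0.47},
		{Text: "licence", Score: 0.31},
	}
	fmt.Println(len(filterByThreshold(results, 0.5, 5))) // prints "1"
	fmt.Println(len(filterByThreshold(results, 0.0, 5))) // prints "3"
}
```

With the 0.5 default only one of the three made-up results survives; a threshold of 0.0 returns everything up to the limit.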
Convenience wrappers open connections per call: QueryDocs, IngestDirectory, and IngestSingleFile construct a new QdrantClient (and gRPC connection) on every invocation. Use the *With variants with pre-created clients for server processes or loops.
EmbedDimension fallback: Unknown model names return 768 (the nomic-embed-text dimension). If a model with a different dimension is configured and its dimension is not known to the library, the collection will be created with an incorrect vector size, causing upsert failures at the Qdrant level.
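The failure mode is easier to see in code. The sketch below is not the library's implementation: the lookup table, sketchEmbedDimension, and the checkDimension guard are hypothetical, and the model "custom-1024" is made up. Only the nomic-embed-text entry and the 768 fallback come from this guide.

```go
package main

import "fmt"

// fallbackDimension mirrors the documented fallback for unknown models.
const fallbackDimension = 768

// knownDimensions is a hypothetical lookup table. The library's actual
// list of known models is not given in this guide.
var knownDimensions = map[string]uint64{
	"nomic-embed-text":  768,
	"mxbai-embed-large": 1024,
	"all-minilm":        384,
}

// sketchEmbedDimension returns the table entry, or the fallback when
// the model is unknown.
func sketchEmbedDimension(model string) uint64 {
	if dim, ok := knownDimensions[model]; ok {
		return dim
	}
	return fallbackDimension
}

// checkDimension is a hypothetical guard: it compares the length of one
// probe embedding with the looked-up dimension before any collection is
// created with that size.
func checkDimension(model string, probe []float32) error {
	want := sketchEmbedDimension(model)
	if got := uint64(len(probe)); got != want {
		return fmt.Errorf("model %q returned %d dimensions, but the lookup says %d", model, got, want)
	}
	return nil
}

func main() {
	// "custom-1024" is a made-up model that produces 1024-dimension vectors.
	probe := make([]float32, 1024)
	fmt.Println(sketchEmbedDimension("custom-1024")) // prints "768"
	fmt.Println(checkDimension("custom-1024", probe))
}
```

Embedding one short probe string and comparing lengths before creating the collection turns a late upsert failure into an early, readable error.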
Workspace module resolution: The go.mod may contain replace directives for local development. Ensure the referenced sibling directories exist and their go.mod files are consistent. If go test reports module-not-found errors for forge.lthn.ai/core/*, verify the workspace configuration.