Option to embed ALL vault documents, not just recent ~1000 #5

@gunternowy

Context

I have a large Obsidian vault (~18,500 markdown files, ~1.8 GB) and upgraded from enzyme v0.25.3 to v0.4.6.

With v0.25.3, all documents were chunked and embedded (416,379 chunks). With v0.4.6, only ~1,000-3,000 "recent" documents get embeddings. The rest are indexed (entities, tags, links) but not embedded.

Problem

For my use case (a research vault with academic papers, project notes, and clippings spanning 10+ years), I need all documents embedded for semantic search via enzyme catalyze. Many important connections are between older papers and current projects, and limiting embeddings to recent docs misses these cross-temporal patterns.

Request

Could you add a config option or flag to embed all documents? Something like:

# enzyme-config.yaml
embedding:
  scope: all  # or "recent" (default)

Or a CLI flag:

enzyme init --embed-all

Current workaround

I'm running v0.25.3 in parallel to get full embeddings, but its DB format is incompatible with v0.4.6.

Thanks for the great tool!
