Curator-built tarballs for the Cerid AI
personal-knowledge-companion harness. Each pack is a .tar.gz archive
containing a pack.json manifest plus a content/ tree of clean
markdown, sourced from permissively-licensed third-party publishers
and processed through Cerid's seven source adapters.
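
As an illustration of the pack layout described above, here is a minimal sketch that opens a pack archive and reads its manifest. It assumes `pack.json` sits at the archive root next to `content/`. The manifest keys `id` and `license`, and the helper names `read_pack` and `make_demo_pack`, are hypothetical and not taken from the Cerid codebase.

```python
import io
import json
import tarfile


def read_pack(fileobj):
    """Return (manifest, content_paths) for a pack tarball given as a binary file object."""
    manifest = None
    content_paths = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            # Accept both "pack.json" and "./pack.json" style member names.
            name = member.name[2:] if member.name.startswith("./") else member.name
            if name == "pack.json":
                manifest = json.load(tar.extractfile(member))
            elif name.startswith("content/") and member.isfile():
                content_paths.append(name)
    if manifest is None:
        raise ValueError("archive has no pack.json manifest")
    return manifest, sorted(content_paths)


def make_demo_pack():
    """Build a tiny in-memory pack so the sketch runs without downloading a release."""
    files = {
        # Assumed manifest fields; the real pack.json schema may differ.
        "pack.json": json.dumps({"id": "demo-pack", "license": "MIT"}).encode(),
        "content/intro.md": b"# Intro\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


manifest, paths = read_pack(make_demo_pack())
print(manifest["id"], paths)
```

For a downloaded pack, pass `open("some-pack.tar.gz", "rb")` instead of the in-memory demo archive.
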
The Cerid harness installs packs by id; users browse them via the Library UI, MCP tool palette, REST endpoint, or CLI:

    docker exec ai-companion-mcp \
      python -m scripts.install_knowledge_pack list
    docker exec ai-companion-mcp \
      python -m scripts.install_knowledge_pack install rust-book

The shipped Cerid registry already points at this repo's GitHub releases, so no extra configuration is needed for end users.

Each release carries the latest tarballs for the catalog. See the
latest release for the current sha256 and file list. Catalog metadata
(license, source, adapter) lives in `config/knowledge_packs.json`
upstream.

| Pack ID | Upstream | License | Adapter |
|---|---|---|---|
| mdn-web-docs | mdn/content | CC-BY-SA-2.5 | github_zip |
| rust-book | rust-lang/book | MIT | github_zip |
| typescript-handbook | microsoft/TypeScript-Website | MIT | github_zip |
| python-stdlib-docs | docs.python.org | PSF-2.0 | python_docs_zip |
| kubernetes-website | kubernetes/website | CC-BY-4.0 | github_zip |
| helm-docs | helm/helm-www | MIT | github_zip |
| tldr-pages | tldr-pages/tldr | CC-BY-4.0 | github_zip |
| apache-spark-docs | apache/spark | Apache-2.0 | github_zip |
| learnxinyminutes | adambard/learnxinyminutes-docs | CC-BY-SA-3.0 | github_zip |
| bogleheads-wiki | bogleheads.org/wiki | CC-BY-SA-4.0 | mediawiki_api |
| irs-publications-curated | irs.gov/publications | US-gov-PD (CC0-1.0) | html_scrape |
| cfpb-ask | consumerfinance.gov/ask-cfpb | US-gov-PD (CC0-1.0) | html_scrape |
| 18f-methods-guides | 18F/methods | CC0-1.0 | github_zip |
| chaoss-metrics | chaoss/metrics | MIT | github_zip |
| wikivoyage-en | dumps.wikimedia.org/enwikivoyage | CC-BY-SA-3.0 | wiki_dump |
| gutenberg-classics-curated | gutenberg.org | Public Domain (CC0-1.0) | gutenberg |
| wikipedia-simple-en | wikimedia/wikipedia (HF) | CC-BY-SA-3.0 | hf_dataset |
| cosmopedia-khanacademy | HuggingFaceTB/cosmopedia (HF) | Apache-2.0 | hf_dataset |
| pes2o-cs-recent | allenai/peS2o (HF) | ODC-BY-1.0 | hf_dataset |

Cerid AI is the curator (build pipeline + tarball envelope). The
content in each pack remains under its upstream license, and
attribution / share-alike obligations propagate to embeddings + RAG
output. The Cerid install path enforces an SPDX license-category
gate that requires explicit operator opt-in for share_alike
content.
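
To make the opt-in gate concrete, here is a minimal sketch of the idea, not Cerid's actual implementation. Only the `share_alike` category name comes from the text above. The `SHARE_ALIKE` set, the `"other"` category, and the `license_category` and `may_install` helpers are hypothetical, and the real gate may classify licenses differently.

```python
# Hypothetical category map covering the share-alike licenses in the catalog table.
SHARE_ALIKE = {"CC-BY-SA-2.5", "CC-BY-SA-3.0", "CC-BY-SA-4.0"}


def license_category(spdx_id):
    """Map an SPDX identifier to a coarse category (assumed two-way split)."""
    return "share_alike" if spdx_id in SHARE_ALIKE else "other"


def may_install(spdx_id, allow_share_alike=False):
    """Permit share_alike packs only when the operator has explicitly opted in."""
    if license_category(spdx_id) == "share_alike":
        return allow_share_alike
    return True


print(may_install("MIT"))
print(may_install("CC-BY-SA-3.0"))
print(may_install("CC-BY-SA-3.0", allow_share_alike=True))
```

Defaulting `allow_share_alike` to `False` means a share-alike pack is rejected unless the operator passes the flag deliberately.
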
Tarballs are reproducible from the upstream sources via:

    docker exec ai-companion-mcp \
      python -m scripts.build_catalog --all

Every release pins:

- the upstream commit/snapshot via the recipe in cerid-ai's `config/knowledge_packs.json`
- a sha256 of the tarball, verified at install time

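
The install-time checksum check can be reproduced by hand. The sketch below uses only the Python standard library and hashes a temporary stand-in file rather than a real pack. The helper names `sha256_of` and `verify_tarball` are hypothetical; the harness's own verification code may be structured differently.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large tarballs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected_hex):
    """Return True when the file's sha256 matches the pinned hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Demo on a temporary stand-in file; a real check would use a downloaded tarball
# and the sha256 published with the release.
payload = b"stand-in for a pack tarball"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name
try:
    pinned = hashlib.sha256(payload).hexdigest()
    ok = verify_tarball(demo_path, pinned)
    tampered = verify_tarball(demo_path, "0" * 64)
finally:
    os.remove(demo_path)
print(ok, tampered)
```
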
The build recipes and orchestration code live in
Cerid-AI/cerid-ai under
Apache-2.0. The tarball contents in this repo are licensed per the
table above — see each pack's pack.json for the canonical SPDX
identifier.