feat(ingester): migrate CairoBookIngester to full mdbook build #101
Merged
Replace pre-summarized file approach with full mdbook build workflow, matching StarknetFoundryIngester pattern. This provides complete Cairo Book content instead of a condensed summary.

Changes:
- Download Cairo Book from GitHub releases
- Remove quiz-cairo, cairo, and gettext preprocessors from book.toml
- Build full mdbook and process all generated markdown files
- Remove legacy cairo_book_summary.md file (314KB)
- Remove readSummaryFile() and chunkSummaryFile() methods
- Remove custom process() override in favor of parent class workflow

Benefits:
- Access to complete Cairo Book content, not just summaries
- Consistent ingestion pattern across all mdbook sources
- Automatic updates from latest Cairo Book releases
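The preprocessor removal listed under Changes could be done along the lines below. This is a minimal sketch, not the PR's actual updateBookConfig(): the function name `stripPreprocessors` and the sample book.toml contents are hypothetical, and the line-based scan is a simplification that assumes each preprocessor is declared as its own `[preprocessor.<name>]` table.

```typescript
// Hypothetical helper, not the ingester's real updateBookConfig().
// Drops every [preprocessor.<name>] table (and its sub-tables) from book.toml text.
export function stripPreprocessors(toml: string, names: string[]): string {
  const kept: string[] = [];
  let skipping = false;
  for (const line of toml.split("\n")) {
    // A TOML table header looks like [section] or [[section]].
    const header = line.match(/^\s*\[\[?([^\[\]]+)\]\]?\s*(#.*)?$/);
    if (header) {
      const section = (header[1] ?? "").trim();
      // Start skipping at a listed preprocessor table; stop at any other table.
      skipping = names.some(
        (n) =>
          section === `preprocessor.${n}` ||
          section.startsWith(`preprocessor.${n}.`),
      );
    }
    if (!skipping) kept.push(line);
  }
  return kept.join("\n");
}

// Illustrative input only; the real book.toml may differ.
const sampleToml = [
  "[book]",
  'title = "Example Book"',
  "",
  "[preprocessor.cairo]",
  'after = ["links"]',
  "",
  "[preprocessor.quiz-cairo]",
  "",
  "[output.html]",
].join("\n");

console.log(stripPreprocessors(sampleToml, ["quiz-cairo", "cairo", "gettext"]));
```

Matching on the full table name keeps `cairo` from accidentally matching `quiz-cairo`. A real TOML parser would be more robust than this scan, for example if a multi-line array happens to contain a line that looks like a table header.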
[AUTOMATED] Updated the implementation to use |
Replace GitHub release download approach with direct git clone from the main branch for more up-to-date content and simpler implementation.

Changes:
- Replace axios/AdmZip download logic with git clone --depth 1
- Clone from main branch instead of latest release
- Remove axios and AdmZip dependencies from imports
- Update docstrings to reflect cloning approach
- Simplify downloadAndExtractRepo method significantly

Benefits:
- Always get the latest Cairo Book content from main
- No need to wait for releases
- Simpler implementation without zip extraction logic
- Faster with shallow clone (--depth 1)
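The shallow clone described in this commit might look roughly like the sketch below. The names `CommandRunner`, `shallowCloneArgs`, and `cloneBook` are hypothetical and not taken from the PR; the command runner is injected so the argument construction can be checked without touching the network.

```typescript
// Assumed shape of whatever actually executes the command (hypothetical).
type CommandRunner = (command: string, args: string[]) => void;

// Build the arguments for `git clone --depth 1 --branch <branch> <url> <dir>`.
export function shallowCloneArgs(
  repoUrl: string,
  branch: string,
  destDir: string,
): string[] {
  return ["clone", "--depth", "1", "--branch", branch, repoUrl, destDir];
}

// Clone only the latest commit of the given branch (defaults to main).
export function cloneBook(
  run: CommandRunner,
  repoUrl: string,
  destDir: string,
  branch: string = "main",
): void {
  run("git", shallowCloneArgs(repoUrl, branch, destDir));
}
```

In a Node or Bun environment the runner could wrap `execFileSync` from `node:child_process`, for example `(cmd, args) => { execFileSync(cmd, args, { stdio: "inherit" }); }`. Passing the arguments as an array avoids shell quoting issues with the destination path.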
48115db to 180dc64
Summary
Context
Previously, CairoBookIngester used a pre-generated summary file (cairo_book_summary.md) that required manual regeneration. This was inconsistent with other mdbook ingesters like StarknetFoundryIngester, which build the full documentation.

This migration:
- Updates book.toml to remove problematic preprocessors (quiz-cairo, cairo, gettext)
- Runs mdbook build

Technical Changes
CairoBookIngester.ts:
- downloadAndExtractRepo() - downloads from GitHub releases
- updateBookConfig() - removes preprocessors from book.toml
- buildMdBook() - builds mdbook with CLI
- readSummaryFile() and chunkSummaryFile() - no longer needed
- process() override - uses parent class workflow

Cleanup:
- python/src/cairo_coder_tools/ingestion/generated/cairo_book_summary.md

Benefits
Test Plan
cd ingesters && bun run generate-embeddings