
refactor: extract ArtifactGenerationService to drain _artifacts.py below budget #1565

Merged
teng-lin merged 3 commits into main from refactor/extract-artifact-gen
Jun 12, 2026

Conversation

@teng-lin

@teng-lin teng-lin commented Jun 12, 2026

Owner

Summary

Extracts the artifact generation kickoff paths from the 1447-line ArtifactsAPI facade into a new ArtifactGenerationService (src/notebooklm/_artifact/generation.py) — a peer to the existing ArtifactDownloadService and ArtifactPollingService. Combined with aggressive docstring/comment trimming on the facade, this drains _artifacts.py to 990 lines, below the 1000-line module-size budget, and removes its ALLOWLISTED_CEILINGS entry (one-way ratchet) — leaving exceptions.py as the sole allowlisted module.
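
The one-way ratchet mentioned above can be pictured roughly as follows. This is a minimal sketch, not the contents of tests/_guardrails/test_module_size_ratchet.py: the function name `check_module_sizes`, the shape of the allowlist mapping (relative path to recorded ceiling), and the rule that an under-budget module must drop its entry are assumptions made for illustration. Only the 1000-line budget and the `ALLOWLISTED_CEILINGS` name come from the PR text.

```python
from pathlib import Path

BUDGET = 1000  # module-size budget in lines (value from the PR description)

# Assumed shape: relative module path -> recorded line-count ceiling.
ALLOWLISTED_CEILINGS: dict[str, int] = {}


def check_module_sizes(
    root: Path, ceilings: dict[str, int], budget: int = BUDGET
) -> list[str]:
    """Return one message per module that violates the (assumed) ratchet rules."""
    violations = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        n_lines = len(path.read_text(encoding="utf-8").splitlines())
        ceiling = ceilings.get(rel)
        if ceiling is None:
            # Non-allowlisted modules must fit the budget.
            if n_lines > budget:
                violations.append(f"{rel}: {n_lines} lines exceeds budget {budget}")
        elif n_lines > ceiling:
            # Allowlisted modules may not grow past their recorded ceiling.
            violations.append(f"{rel}: {n_lines} lines exceeds ceiling {ceiling}")
        elif n_lines <= budget:
            # One-way ratchet: once under budget, the allowlist entry must go.
            violations.append(f"{rel}: under budget, remove its allowlist entry")
    return violations
```

Under these assumptions, draining a module below the budget and deleting its allowlist entry is the only way to keep the check green, which is why the ratchet only moves in one direction.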

Stacked on #1564 (refactor: drain module-size ratchet to two monoliths). This PR targets refactor/drain-module-size-ceilings; once #1564 merges, retarget the base to main.

What moved

Into ArtifactGenerationService(rpc=…, notebooks=…, note_service=…):

  • generate_audio/video/cinematic_video/report/study_guide/quiz/flashcards/infographic/slide_deck/data_table/mind_map
  • revise_slide, retry_failed
  • private _call_generate, _parse_generation_result

The facade keeps thin, signature-preserving delegators for every public method. _call_generate and _parse_generation_result also keep facade hops because tests call them directly on the API instance.
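
For readers unfamiliar with the delegator pattern, a minimal sketch of the shape described above follows. The class names, the `_generation` attribute, and the constructor keywords match the PR text; the method body, the simplified `generate_audio` signature, the `"en"` default, and the synchronous style are placeholders assumed for illustration, not the real implementation.

```python
from __future__ import annotations

from typing import Any


class ArtifactGenerationService:
    """Stand-in for the extracted service; collaborators are injected."""

    def __init__(self, rpc: Any, notebooks: Any, note_service: Any) -> None:
        self._rpc = rpc
        self._notebooks = notebooks
        self._note_service = note_service

    def generate_audio(
        self, notebook_id: str, language: str | None = None
    ) -> dict[str, Any]:
        # Placeholder body: the real method builds RPC params and dispatches.
        return {"notebook_id": notebook_id, "language": language or "en"}


class ArtifactsAPI:
    """Facade that keeps the public signature and forwards to the service."""

    def __init__(self, rpc: Any, notebooks: Any, note_service: Any) -> None:
        self._generation = ArtifactGenerationService(
            rpc=rpc, notebooks=notebooks, note_service=note_service
        )

    def generate_audio(
        self, notebook_id: str, language: str | None = None
    ) -> dict[str, Any]:
        # Thin delegator: same parameters in, same result out.
        return self._generation.generate_audio(notebook_id, language=language)
```

Because the delegator repeats the service method's parameters exactly, existing callers of `ArtifactsAPI.generate_audio` need no changes.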

Why it's seam-safe

  • No test does setattr / patch.object on _call_generate / _parse_generation_result; they're called directly (preserved by the delegators) or exercised via a mocked RPC.
  • The RPC is injected at construction; tests mutate the same shared object (mock_core.rpc_executor.rpc_call) via .side_effect / .return_value, so the extracted service sees it.
  • All generate_* patch-seam tests pass unchanged — no seam repoints needed.
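
The second bullet relies on ordinary Python reference semantics, which the sketch below illustrates. `Facade`, `GenerationService`, and `kickoff` are hypothetical names; only `rpc_call`, `.return_value`, and `.side_effect` correspond to what the PR describes.

```python
from unittest.mock import MagicMock


class GenerationService:
    """Hypothetical service holding the RPC object it was constructed with."""

    def __init__(self, rpc):
        self._rpc = rpc

    def kickoff(self, payload):
        return self._rpc.rpc_call(payload)


class Facade:
    """Hypothetical facade that hands its RPC object to the service."""

    def __init__(self, rpc):
        self._rpc = rpc
        self._generation = GenerationService(rpc)

    def kickoff(self, payload):
        return self._generation.kickoff(payload)


rpc_executor = MagicMock()
api = Facade(rpc_executor)

# The mock is configured AFTER construction. The service still sees the
# change because it holds a reference to the same object, not a copy.
rpc_executor.rpc_call.return_value = ["artifact-1"]
result = api.kickoff({"kind": "audio"})
```

Since nothing is copied at construction time, a test written against the facade keeps working when the call site moves into the service.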

What did NOT move

_is_media_ready, _extract_artifact_error, _get_artifact_type_name stay on the facade (status/polling concerns delegating to _artifact_polling). Their verbose docstrings were trimmed but the positional RPC-shape rationale (art[3]/art[5]/art[6][5]/…) was kept. Every issue/ADR ref (#1342, #1270, #1491, #1247, #1362, ADR-0019, …) was preserved.

Verification

  • pytest tests/unit tests/integration -k "artifact or generation or mind_map or generate" → 1543 passed
  • pytest tests/_guardrails/test_module_size_ratchet.py test_public_surface_manifest.py tests/unit/test_claude_md_freshness.py → 309 passed
  • Full suite green (the only 2 failures observed were a missing browser extra in the sandbox, unrelated; green once installed)
  • mypy src/notebooklm --ignore-missing-imports → clean
  • pre-commit run --all-files → pass
  • scripts/audit_public_api_compat.py → exit 0 (non-breaking; no allowlist entries added)

Docs

docs/architecture.md updated: class-index table, file-index table, repo tree, and the artifact-generation call-flow row (which previously said the service "was folded into the facade").

🤖 Generated with Claude Code


Summary by CodeRabbit

  • Refactor

    • Improved internal code organization by restructuring artifact generation services for better maintainability and separation of concerns. All public APIs and functionality remain unchanged.
  • Chores

    • Updated architecture documentation and test configurations to reflect internal service refactoring.

@teng-lin

Owner Author

@claude review

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1ddacf63-ce0c-4d1a-959d-8e356d2015fe

📥 Commits

Reviewing files that changed from the base of the PR and between 928eab3 and 1dbbee6.

📒 Files selected for processing (5)
  • docs/architecture.md
  • src/notebooklm/_artifact/generation.py
  • src/notebooklm/_artifacts.py
  • tests/_guardrails/test_module_size_ratchet.py
  • tests/_guardrails/test_no_facade_reach_in.py

📝 Walkthrough

Walkthrough

This PR extracts artifact-generation kickoff logic from the ArtifactsAPI facade into a dedicated ArtifactGenerationService module. The new service owns all generation/revise/retry RPC dispatch, parameter building, and result parsing, while ArtifactsAPI delegates generation behavior and retains polling, export, and listing responsibilities.

Changes

Artifact Generation Service Extraction

| Layer / File(s) | Summary |
| --- | --- |
| Architecture documentation: docs/architecture.md | Updated documentation to describe the extracted ArtifactGenerationService and its role in the generation kickoff flow, added to collaborators table, key files index, and repository tree. |
| ArtifactGenerationService structure and RPC dispatch: src/notebooklm/_artifact/generation.py | Implemented ArtifactGenerationService with constructor, _call_generate for CREATE_ARTIFACT RPC invocation with null-result feature gating, and _parse_generation_result for extracting artifact_id and status code from RPC responses. |
| ArtifactGenerationService generation methods: src/notebooklm/_artifact/generation.py | Implemented all generation entry points (generate_audio, generate_video with style/prompt validation, generate_cinematic_video, generate_report, generate_study_guide, generate_quiz, generate_flashcards, generate_infographic, generate_slide_deck, revise_slide, retry_failed, generate_data_table) following a pattern of defaulting language, resolving source_ids, building parameters, and delegating to _call_generate. |
| ArtifactGenerationService mind-map generation: src/notebooklm/_artifact/generation.py | Implemented generate_mind_map with RPC call, mind-map leaf unwrapping, JSON payload normalization (parsing strings and serializing non-strings), note title derivation from name field, note persistence via NoteService.create_note, and MindMapResult construction with mind_map, note_id, and created_at. |
| ArtifactsAPI refactoring and delegation: src/notebooklm/_artifacts.py | Refactored facade to delegate all generation behavior to ArtifactGenerationService: updated imports, added _generation service initialization, and replaced all generation-related implementations with delegation calls. Condensed docstrings for listing, getting, polling, and exporting while preserving semantics. |
| Module size and reach-in guardrail updates: tests/_guardrails/test_module_size_ratchet.py, tests/_guardrails/test_no_facade_reach_in.py | Removed _artifacts.py from oversized module allowlist. Added _artifact/generation.py to artifact-service module set and migrated modules list with updated reach-in documentation. |
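
The mind-map row mentions normalizing the JSON payload in both directions and deriving a note title from the `name` field. A minimal sketch of what such a step could look like is below. The helper names `normalize_mind_map` and `derive_title`, the `"Mind Map"` fallback title, and the decision to keep unparseable text unchanged are all assumptions; the PR does not show the real code.

```python
import json
from typing import Any


def normalize_mind_map(payload: Any) -> tuple[Any, str]:
    """Return (parsed_object, json_text) for a string or already-parsed payload."""
    if isinstance(payload, str):
        try:
            parsed = json.loads(payload)
        except json.JSONDecodeError:
            parsed = payload  # assumption: keep unparseable text as-is
        return parsed, payload
    # Non-string payloads are serialized so the note body is always text.
    return payload, json.dumps(payload)


def derive_title(parsed: Any, fallback: str = "Mind Map") -> str:
    """Use the top-level ``name`` field as the title when it is a non-empty string."""
    if isinstance(parsed, dict):
        name = parsed.get("name")
        if isinstance(name, str) and name:
            return name
    return fallback  # hypothetical default title
```

For example, `normalize_mind_map({"name": "Topic"})` yields the original dict plus its serialized text, and `derive_title` on that dict returns `"Topic"`.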

Sequence Diagrams

sequenceDiagram
  participant Service as ArtifactGenerationService
  participant RPC as RpcCaller
  participant Parser as _parse_generation_result
  Service->>Service: _call_generate()
  Service->>RPC: CREATE_ARTIFACT RPC
  RPC-->>Service: response or null
  alt null response with feature gate
    Service->>Service: raise ArtifactFeatureUnavailableError
  else success response
    Service->>Parser: _parse_generation_result()
    Parser-->>Service: GenerationStatus
  end
sequenceDiagram
  participant Client as Client/User
  participant Facade as ArtifactsAPI
  participant Service as ArtifactGenerationService
  Client->>Facade: generate_audio/video/report/etc
  Facade->>Service: _generation.generate_audio/video/report/etc
  Service->>Service: resolve source_ids, build params
  Service->>Service: _call_generate
  Service-->>Facade: GenerationStatus
  Facade-->>Client: GenerationStatus
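
The first diagram's null-response branch can be sketched in a few lines. Everything below is a simplified stand-in: the two-element response shape `[artifact_id, status_code]` is invented for illustration, the function is synchronous for brevity (the real client may be async), and the error message is a placeholder. Only the names `ArtifactFeatureUnavailableError`, `GenerationStatus`, and the two step names come from the diagram.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class ArtifactFeatureUnavailableError(Exception):
    """Stand-in for the error named in the diagram."""


@dataclass
class GenerationStatus:
    artifact_id: str
    status_code: int


def parse_generation_result(result: list) -> GenerationStatus:
    # Invented response shape for illustration: [artifact_id, status_code].
    artifact_id, status_code = result
    return GenerationStatus(artifact_id=artifact_id, status_code=status_code)


def call_generate(
    rpc_call: Callable[[Any], Optional[list]], params: Any
) -> GenerationStatus:
    result = rpc_call(params)
    if result is None:
        # Null result is treated as "feature not available" rather than parsed.
        raise ArtifactFeatureUnavailableError("CREATE_ARTIFACT returned no result")
    return parse_generation_result(result)
```

Raising a dedicated error on a null result keeps the parser free of `None` handling, so it only ever sees a populated response.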

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • teng-lin/notebooklm-py#910: Adjusts ArtifactGenerationService to be collaborator-injected and re-wires generation/result calls through the service/protocol-facing methods.
  • teng-lin/notebooklm-py#891: Scaffolds protocol+AST "no facade reach-in" checks and updates reach-in tests to include the new _artifact/generation.py.
  • teng-lin/notebooklm-py#673: Introduces the initial ArtifactGenerationService extraction with similar responsibilities for generation kickoff and RPC result parsing.

Poem

🐰 A service hops out of the facade so wide,
Generation methods find their rightful place,
RPC dispatch and parsing set aside,
Delegating flows in architectural grace! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: extraction of ArtifactGenerationService to reduce _artifacts.py module size below the budget threshold. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 95.92% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@claude

claude Bot commented Jun 12, 2026


Claude finished @teng-lin's task in 3m 37s


Code Review

Tasks:

  • Gather context and understand the PR
  • Read changed files
  • Analyze architecture and design decisions
  • Check for bugs, correctness issues, and security concerns
  • Review test coverage and guardrails
  • Post comprehensive review feedback

Summary

This is a clean, well-executed extraction. The ArtifactGenerationService follows the established pattern from ArtifactDownloadService and ArtifactPollingService, the facade delegators are thin and signature-preserving, and the module-size ratchet correctly removes _artifacts.py from the allowlist. I found one guardrail gap and one stale comment that should be addressed.


Issues

1. _artifact/generation.py and _source/_upload_decode.py missing from reach-in guardrail lists

File: tests/_guardrails/test_no_facade_reach_in.py

The two new extraction modules are not registered in the boundary guards that protect the other artifact/source service modules:

# line 55 — _artifact/generation.py is absent
_ARTIFACT_SERVICE_MODULES = [
    "_artifact/formatters.py",
    "_artifact/listing.py",
    "_artifact/downloads.py",  # guarded
    "_artifact/polling.py",
    # "_artifact/generation.py"  ← missing
]

# line 265 — _artifact/generation.py is absent
_REACH_IN_MIGRATED_MODULES: list[str] = [
    "_artifact/downloads.py",
    # "_artifact/generation.py"  ← missing
]

# line 62 — _source/_upload_decode.py is absent
_SOURCE_SERVICE_MODULES = [
    "_source/listing.py",
    "_source/polling.py",
    "_source/add.py",
    "_source/upload.py",
    "_source/content.py",
    # "_source/_upload_decode.py"  ← missing
]

This means:

  • test_artifact_service_modules_do_not_runtime_import_facades_or_core doesn't check _artifact/generation.py — a future drift that adds a from .._artifacts import ArtifactsAPI import would slip through silently.
  • test_artifact_services_have_no_facade_reach_in doesn't check _artifact/generation.py — a future self._api.whatever access wouldn't be caught.
  • test_source_service_modules_do_not_runtime_import_facades_or_core doesn't check _source/_upload_decode.py.

The generation service is correctly injection-based today (no self._api), so this is not a current bug — but the guardrails exist precisely to catch future drift, and skipping them here creates a blind spot for the two largest new files in this PR.
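
To make the two kinds of drift concrete, here is a rough sketch of how AST-based checks of this sort can work. It is not the code in test_no_facade_reach_in.py: `ApiReachInFinder` and `find_facade_imports` are hypothetical names, and unlike the real guard (whose test names mention runtime imports) this sketch does not distinguish `TYPE_CHECKING`-only imports.

```python
import ast


class ApiReachInFinder(ast.NodeVisitor):
    """Collect ``self._api.<name>`` attribute accesses (hypothetical checker)."""

    def __init__(self) -> None:
        self.hits: list[tuple[int, str]] = []

    def visit_Attribute(self, node: ast.Attribute) -> None:
        inner = node.value
        if (
            isinstance(inner, ast.Attribute)
            and inner.attr == "_api"
            and isinstance(inner.value, ast.Name)
            and inner.value.id == "self"
        ):
            self.hits.append((node.lineno, node.attr))
        self.generic_visit(node)


def find_facade_imports(tree: ast.AST, facade_modules: set[str]) -> list[int]:
    """Return line numbers of ``from <...>._artifacts import ...`` style imports."""
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom) and node.module in facade_modules
    ]


source = (
    "from .._artifacts import ArtifactsAPI\n"
    "\n"
    "class Service:\n"
    "    def run(self):\n"
    "        return self._api.whatever\n"
)
tree = ast.parse(source)
finder = ApiReachInFinder()
finder.visit(tree)
```

A check like this only covers the files it is pointed at, which is why a module missing from the registration lists goes unexamined.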


2. Stale docstring in _ApiReachInVisitor

File: tests/_guardrails/test_no_facade_reach_in.py, line 292–294

The _ApiReachInVisitor docstring still reads:

    """...
    ``_artifact_generation.py`` was folded into the ``ArtifactsAPI``
    facade (issue #1205) so it is no longer a separate guarded module.
    """

This was accurate when the old _artifact_generation.py was folded in, but this PR re-extracts it as _artifact/generation.py. The comment should be replaced with a reference to this new module.


Minor Observations (no action needed)

generate_study_guide resolves language before calling generate_report (generation.py:216): generate_study_guide calls get_default_language() then immediately passes the result to generate_report, which also calls get_default_language() internally when it receives None. The pre-resolve in generate_study_guide is redundant but harmless — generate_report always gets a non-None language. This matches the original facade behavior.

_upload_decode.py re-export block (upload.py:43–65): The blanket noqa: F401 star-re-import pattern is intentional (documented in the module docstring) and correct. Import-time errors will still surface if a name is removed from _upload_decode.py, so this is safe.

Facade delegators for _call_generate and _parse_generation_result (_artifacts.py:876–961): The PR description calls this out explicitly, and the delegators are correct. Tests call api._call_generate(...) and api._parse_generation_result(...) directly; the facade hops preserve those call sites without requiring any test rewrites.


Verdict

The extraction itself is correct and the module-size goals are met. Fixing the two guardrail omissions (adding the new modules to _ARTIFACT_SERVICE_MODULES, _REACH_IN_MIGRATED_MODULES, and _SOURCE_SERVICE_MODULES, plus updating the stale comment) would make this PR complete.

@teng-lin teng-lin force-pushed the refactor/extract-artifact-gen branch from 3825ae7 to c6a17ad Compare June 12, 2026 12:32

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request refactors the codebase to extract artifact generation logic into a new ArtifactGenerationService and source upload decode/validation helpers into _source/_upload_decode.py. This modularization successfully reduces the size of several oversized modules below the 1000-line budget, allowing their removal from the module size ratchet allowlist. Feedback on the changes suggests replacing instances of next(iter(sequence)) with explicit sequence unpacking in _upload_decode.py, downloads.py, and generation.py to make single-element invariants explicit and avoid potential StopIteration issues.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.


src/notebooklm/_source/_upload_decode.py (108-109)

medium

Following the project's general rules, prefer sequence unpacking over next(iter(sequence)) when retrieving the first or only element from a sequence. Since upload_ids is guaranteed to have exactly one element after the length check, we can explicitly unpack it to make the single-element invariant explicit and avoid potential StopIteration issues.

    if len(upload_ids) != 1:
        raise ValidationError("Upload URL must include exactly one non-empty upload_id")
    (upload_id,) = upload_ids
    if not upload_id:
        raise ValidationError("Upload URL must include exactly one non-empty upload_id")
References
  1. In Python, when retrieving the first or only element from a sequence, prefer sequence unpacking over next(iter(sequence)) to make the single-element invariant explicit and avoid StopIteration issues.

src/notebooklm/_artifact/downloads.py (901-902)

medium

Following the project's general rules, prefer sequence unpacking over next(iter(sequence)) when retrieving the first or only element from a sequence. Since writer_error is a one-slot exception box, we can explicitly unpack it to make the single-element invariant explicit.

                            if writer_error:
                                (err,) = writer_error
                                raise err
References
  1. In Python, when retrieving the first or only element from a sequence, prefer sequence unpacking over next(iter(sequence)) to make the single-element invariant explicit and avoid StopIteration issues.

src/notebooklm/_artifact/generation.py (535-540)

medium

To adhere to the project's guidelines of avoiding next(iter(sequence)) for retrieving the first element of a sequence (especially to prevent StopIteration issues), we can use sequence unpacking when the slice is non-empty. This also respects the custom linter constraint against direct integer indexing.

        # Best-effort debug label over the OUTGOING request body; the ``[2:3]``
        # slice-pick keeps it off ``name[int]`` (== old guarded ``params[2]``).
        descriptor = None
        if params[2:3]:
            (descriptor,) = params[2:3]
        artifact_type = "unknown"
        if isinstance(descriptor, list) and descriptor[2:3]:
            (artifact_type,) = descriptor[2:3]
References
  1. In Python, when retrieving the first or only element from a sequence, prefer sequence unpacking over next(iter(sequence)) to make the single-element invariant explicit and avoid StopIteration issues.
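
The difference in failure modes behind these three suggestions can be shown with a small standalone sketch. `first_or_default` and `only` are hypothetical helper names, not functions from the project.

```python
def first_or_default(params: list, index: int, default=None):
    """Slice-pick the element at ``index`` without integer indexing."""
    window = params[index : index + 1]  # empty list when index is out of range
    if window:
        (value,) = window  # the slice holds at most one element
        return value
    return default


def only(items):
    """Unpack exactly one element; any other length raises ValueError."""
    (item,) = items
    return item
```

With `next(iter(items))`, an empty sequence raises `StopIteration` and a sequence with extra elements silently yields the first one. Unpacking raises `ValueError` in both cases, so a broken single-element invariant is reported at the point where it is assumed.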

@teng-lin

Owner Author

Addressed in 4f7f936:

@claude (reach-in guardrail gap):

  • Registered _artifact/generation.py in both _ARTIFACT_SERVICE_MODULES (line 55) and _REACH_IN_MIGRATED_MODULES (line 266) in tests/_guardrails/test_no_facade_reach_in.py, so the facade-import / facade-reach-in drift guards now actively cover the new service (verified: 28 reach-in tests pass with it registered; the service is injection-based with no self._api).
  • Refreshed the stale _ApiReachInVisitor docstring that still claimed _artifact_generation.py "was folded into the ArtifactsAPI facade (issue #1205)"; it now describes the re-extracted, constructor-injected ArtifactGenerationService.
  • _source/_upload_decode.py registration in _SOURCE_SERVICE_MODULES belongs to the base PR #1564 (that module is introduced there, not in this PR), so I've left it for that PR.

@gemini-code-assist (next(iter(...)) → unpacking):

  • ArtifactGenerationService._call_generate now unpacks the at-most-one-element param slices ((descriptor,) = params[2:3] guarded by if params[2:3]) instead of next(iter(..., default)), making the single-element invariant explicit. The other two findings (_upload_decode.py:108-109, downloads.py:901-902) are on the base PR #1564 surface, not changed by this PR.

@gemini-code-assist

Contributor

Thanks for the update, @teng-lin. The guardrail registrations and the docstring refresh look correct and complete. Since you've verified that _source/_upload_decode.py is indeed part of the base PR #1564, leaving it out of this PR's guardrail updates is the right call to avoid merge conflicts or unnecessary churn. The shift to unpacking slices in _call_generate is a cleaner, more explicit pattern. Everything looks good to go.

claude added 2 commits June 12, 2026 08:48
…low budget

Extract the generation kickoff paths (generate_*, revise_slide, retry_failed,
plus _call_generate / _parse_generation_result) from the ArtifactsAPI facade
into a new ArtifactGenerationService (_artifact/generation.py), a peer to
ArtifactDownloadService and ArtifactPollingService. The facade keeps thin,
signature-preserving delegators (including _call_generate /
_parse_generation_result, which tests call directly) so the public API and
patch seams are unchanged.

Combined with aggressive docstring/comment trimming on the facade, this drains
_artifacts.py from 1447 to 990 lines, below the 1000-line module-size budget,
and removes its ALLOWLISTED_CEILINGS entry (one-way ratchet) — leaving
exceptions.py as the sole allowlisted module. Issue/ADR refs and positional
RPC-shape rationale were preserved.

docs/architecture.md updated (class index, file index, repo tree, and the
artifact-generation call-flow row).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ards

- Register _artifact/generation.py in _ARTIFACT_SERVICE_MODULES and
  _REACH_IN_MIGRATED_MODULES (test_no_facade_reach_in.py) so the
  facade-reach-in / facade-import drift guards actively cover the new
  service, and refresh the stale _ApiReachInVisitor docstring that still
  claimed the old _artifact_generation.py was folded into the facade (#1205).
- In ArtifactGenerationService._call_generate, unpack the at-most-one-element
  param slices instead of next(iter(...)) so the single-element invariant is
  explicit (per gemini-code-assist).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@teng-lin teng-lin force-pushed the refactor/extract-artifact-gen branch from 4f7f936 to 1dbbee6 Compare June 12, 2026 12:49
@teng-lin teng-lin changed the base branch from refactor/drain-module-size-ceilings to main June 12, 2026 12:49
Empty commit to fire the synchronize event so test.yml runs natively on the
PR now that its base is main (retargeting via API did not re-fire CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@teng-lin teng-lin merged commit b8b4191 into main Jun 12, 2026
26 checks passed
@teng-lin teng-lin deleted the refactor/extract-artifact-gen branch June 12, 2026 13:10