cwarnermm

This PR introduces a workflow for converting the static HTML output generated by Sphinx into clean, styled PDF documents for air-gapped and offline customer environments.

Key Technologies and Approach

WeasyPrint is used as the core HTML-to-PDF rendering engine. It offers:

  • Good support for modern HTML and CSS, including CSS paged media (@page) rules
  • Clean typography and print layout control
  • Offline operation (no dependency on external assets or CDNs), which suits air-gapped and restricted-access environments

BeautifulSoup is used to:

  • Strip out internet-only elements (e.g., deployment badges, external links)
  • Remove navigation components like "On This Page" sidebars
  • Normalize inline image behavior and heading structures
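
A minimal sketch of the cleanup step described above, assuming the navigation and badge elements can be matched with CSS selectors. The selectors (`nav`, `.toc-sidebar`, `img.deployment-badge`) and the `clean_html` helper are hypothetical and are not taken from generate_pdfs.py; the real theme markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical selectors for internet-only and navigation elements;
# adjust them to the markup the Sphinx theme actually emits.
STRIP_SELECTORS = ["nav", ".toc-sidebar", "img.deployment-badge"]


def clean_html(html: str) -> str:
    """Return HTML with navigation, badges, and external links removed."""
    soup = BeautifulSoup(html, "html.parser")

    for selector in STRIP_SELECTORS:
        # Re-query after each removal so nested matches are never
        # touched after their parent has already been destroyed.
        node = soup.select_one(selector)
        while node is not None:
            node.decompose()
            node = soup.select_one(selector)

    # Keep the link text of external links but drop the anchor itself,
    # since the target is unreachable in an air-gapped environment.
    for anchor in soup.find_all("a", href=True):
        if anchor["href"].startswith(("http://", "https://")):
            anchor.unwrap()

    return str(soup)
```

Unwrapping external links rather than deleting them keeps the surrounding sentence readable in print; relative links are left untouched so in-document references can still resolve.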

Custom PDF builder script (generate_pdfs.py):

  • Merges multiple HTML sections into a single printable HTML document per guide
  • Injects a styled table of contents with page estimates
  • Applies print-optimized CSS and layout rules
  • Outputs clean PDFs into a dedicated /pdfs directory
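
The builder steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of generate_pdfs.py: the function names, the `section-N` anchor scheme, and the CSS are hypothetical, and the page estimates in the table of contents are left out because the PR text does not say how they are computed. The only WeasyPrint call used is `HTML(string=...).write_pdf(...)`.

```python
from html import escape
from pathlib import Path

# Hypothetical print stylesheet: each merged section starts on a new page.
PRINT_CSS = """
@page { size: A4; margin: 2cm; }
section { break-before: page; }
nav.pdf-toc a { text-decoration: none; color: inherit; }
"""


def build_guide_html(title, sections):
    """Merge (heading, body_html) pairs into one printable HTML document.

    A linked table of contents is injected ahead of the merged sections.
    """
    toc_items = []
    bodies = []
    for index, (heading, body_html) in enumerate(sections, start=1):
        anchor = f"section-{index}"  # assumed anchor naming scheme
        toc_items.append(f'<li><a href="#{anchor}">{escape(heading)}</a></li>')
        bodies.append(
            f'<section id="{anchor}"><h1>{escape(heading)}</h1>{body_html}</section>'
        )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{escape(title)}</title><style>{PRINT_CSS}</style></head><body>"
        f"<nav class='pdf-toc'><h2>Contents</h2><ol>{''.join(toc_items)}</ol></nav>"
        f"{''.join(bodies)}</body></html>"
    )


def write_pdf(html, out_path):
    """Render the merged HTML to a PDF file with WeasyPrint."""
    # Imported lazily so the HTML-building step works without WeasyPrint.
    from weasyprint import HTML

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    HTML(string=html).write_pdf(str(out_path))
```

A call such as `write_pdf(build_guide_html("Operations", sections), "pdfs/operations.pdf")` would then produce one PDF per guide; the file name here is only an example.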

Output guides (two PDFs are generated):

  • Operations content (Deployment, Security, Administration)
  • Application content (Use Cases, End User, Integrations)

Each PDF is self-contained, styled, and optimized for offline distribution. The look and feel of the PDFs still needs further iteration.

@cwarnermm cwarnermm added Work In Progress Not yet ready for review Guidance labels Aug 1, 2025
@cwarnermm cwarnermm changed the title Convert static HTML output to PDF POC: Convert static HTML output to PDF Aug 1, 2025