Complete async rewrite with bulk export, custom fields, and JQL support #1

Open
dchuk wants to merge 127 commits into chrisbyboston:main from dchuk:refactor/cleanup-and-dedup

dchuk commented Feb 20, 2026

Summary

This PR turns jarkdown from a synchronous single-issue exporter into a fully async Jira archival tool with bulk export, JQL search, and custom field support. The work was completed across 4 development phases plus a final cleanup pass.

Highlights

  • Full async rewrite — replaced requests with aiohttp for concurrent I/O
  • Subcommand CLI: export, bulk, query, setup, with backward compatibility
  • Bulk export — concurrent multi-issue export with semaphore-based rate limiting
  • JQL search — query Jira with JQL and export all matching issues
  • Custom fields — configurable field rendering with Atlassian Document Format (ADF) parsing
  • 25+ ADF node types — tables, panels, code blocks, media, task lists, decision items, etc.
  • YAML frontmatter — machine-readable metadata in every exported markdown file
  • Retry with backoff — exponential backoff + jitter for HTTP 429/503/504
  • uv package manager — migrated from pip/setuptools to Hatchling + uv
  • 231 tests across 13 test modules

Phase 1: Standard Field Coverage

Extended the markdown output with structured sections for fields that were previously ignored:

  • YAML frontmatter with always-present schema (key, summary, status, type, priority, assignee, reporter, created, updated, resolved, labels, components, fix versions, affected versions, time tracking)
  • Linked issues section with relationship types (blocks, is blocked by, duplicates, etc.)
  • Subtasks section with status indicators
  • Worklogs section with author, time spent, and ADF-to-plaintext comment conversion
  • Environment section (HTML-to-markdown)
  • Integration tests with comprehensive JSON fixtures

Phase 2: Custom Fields & ADF

Added a full pipeline for custom Jira fields and rich document parsing:

  • ConfigManager — TOML-based config with ~/.config/jarkdown/config.toml
  • FieldMetadataCache — XDG-compliant caching of Jira field metadata with 24h TTL
  • CustomFieldRenderer — type-aware rendering (text, number, date, user, option, array, cascading select)
  • ADF parser — 25+ Atlassian Document Format node types:
    • Block nodes: paragraph, heading, blockquote, code block, rule, media single/group
    • List nodes: bullet, ordered, task list, decision list
    • Complex nodes: table (with header detection), panel, expand
    • Inline nodes: text (with marks), mention, emoji, date, status, inline card, hard break
  • Field filter: --fields, --exclude-fields, --include-all-fields CLI flags
  • setup subcommand — interactive credential configuration wizard
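
A minimal sketch of the recursive ADF-to-Markdown idea described above, covering only a handful of node types. The function names `render_inline` and `render_block` are hypothetical and are not taken from jarkdown's parser; the node shapes follow the public ADF JSON structure (`type`, `content`, `attrs`, `marks`).

```python
from typing import Any


def render_inline(node: dict[str, Any]) -> str:
    """Render one inline ADF node; unsupported inline types become ''."""
    kind = node.get("type")
    if kind == "hardBreak":
        return "\n"
    if kind == "mention":
        return node.get("attrs", {}).get("text", "")
    if kind != "text":
        return ""
    text = node.get("text", "")
    # Marks wrap the text one after another, in the order they are listed.
    for mark in node.get("marks", []):
        mark_type = mark.get("type")
        if mark_type == "strong":
            text = f"**{text}**"
        elif mark_type == "em":
            text = f"*{text}*"
        elif mark_type == "code":
            text = f"`{text}`"
        elif mark_type == "link":
            href = mark.get("attrs", {}).get("href", "")
            text = f"[{text}]({href})"
    return text


def render_block(node: dict[str, Any]) -> str:
    """Render a block-level ADF node by recursing into its children."""
    kind = node.get("type")
    children = node.get("content", [])
    if kind == "doc":
        return "\n\n".join(render_block(child) for child in children)
    if kind == "paragraph":
        return "".join(render_inline(child) for child in children)
    if kind == "heading":
        level = node.get("attrs", {}).get("level", 1)
        return "#" * level + " " + "".join(render_inline(c) for c in children)
    if kind == "bulletList":
        return "\n".join("- " + render_block(item) for item in children)
    if kind == "listItem":
        return " ".join(render_block(child) for child in children)
    return ""  # node types outside this sketch are dropped
```

A real parser for 25+ node types would presumably dispatch through a table of handlers rather than an if-chain; the chain is kept here only for brevity.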

Phase 3: uv Migration

Modernized the build and dependency toolchain:

  • Hatchling build backend (replaced setuptools)
  • uv lockfile for reproducible dependency resolution
  • CI migration — all GitHub Actions jobs (test, lint, docs, publish) updated from pip to uv
  • Updated contributing docs and developer setup instructions

Phase 4: Bulk Export & JQL

The largest phase — async rewrite and concurrent export engine:

  • aiohttp client: JiraApiClient rewritten as async context manager with TCPConnector(limit_per_host=5)
  • Retry module: RetryConfig dataclass, parse_retry_after(), exponential backoff with jitter
  • bulk subcommand: accepts multiple issue keys, exports concurrently
  • query subcommand: accepts JQL string, fetches matching issues via search_jql() with nextPageToken pagination
  • BulkExporter: asyncio.Semaphore for configurable concurrency (default 3), asyncio.gather(return_exceptions=True) for partial failure handling
  • Progress reporting: \rExporting {n}/{total}... ({key}) written to stderr
  • Index file: index.md with summary table of all exported issues (key, summary, status, type, assignee, result)
  • Backward compatibility: bare jarkdown ISSUE-KEY still works (auto-injects export)
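
The semaphore-plus-gather pattern named in the list above can be sketched roughly as follows. This is not the PR's BulkExporter: `fake_export` is a hypothetical stand-in for the per-issue export, and `ExportResult` is reduced to three fields for illustration.

```python
import asyncio
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExportResult:
    key: str
    success: bool
    error: Optional[str] = None


async def fake_export(key: str) -> None:
    """Hypothetical per-issue export; fails for one key to show partial failure."""
    await asyncio.sleep(0.01)
    if key == "TEST-2":
        raise RuntimeError("simulated failure")


async def export_bulk(keys: list[str], concurrency: int = 3) -> list[ExportResult]:
    semaphore = asyncio.Semaphore(concurrency)
    done = 0

    async def run_one(key: str) -> None:
        nonlocal done
        async with semaphore:  # at most `concurrency` exports run at once
            try:
                await fake_export(key)
            finally:
                done += 1
                print(f"\rExporting {done}/{len(keys)}... ({key})",
                      end="", file=sys.stderr, flush=True)

    # return_exceptions=True hands exceptions back as values, so one failed
    # issue does not cancel the rest; results keep the input order.
    outcomes = await asyncio.gather(*(run_one(k) for k in keys),
                                    return_exceptions=True)
    return [
        ExportResult(key, False, str(outcome))
        if isinstance(outcome, BaseException)
        else ExportResult(key, True)
        for key, outcome in zip(keys, outcomes)
    ]
```

Running `asyncio.run(export_bulk(["TEST-1", "TEST-2", "TEST-3"]))` yields one failed result between two successful ones, which is the partial-failure behaviour the bullet describes.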

Cleanup Pass

  • Removed unused requests dependency (legacy from pre-async era)
  • Extracted export_core.py with shared perform_export() function, eliminating duplicated export logic between single-issue and bulk export paths

Architecture

CLI (jarkdown.py)
 ├── export → export_core.perform_export()
 ├── bulk   → BulkExporter.export_bulk() → [Semaphore] → export_core.perform_export()
 ├── query  → search_jql() → BulkExporter
 └── setup  → interactive config wizard

export_core.perform_export():
  JiraApiClient.fetch_issue()
  → AttachmentHandler.download_all_attachments()
  → FieldMetadataCache (optional)
  → CustomFieldRenderer (optional)
  → MarkdownConverter.compose_markdown()
  → write files

New Modules (10 total)

Module                     Purpose
jarkdown.py                CLI orchestrator with subcommand dispatch
jira_api_client.py         Async aiohttp Jira REST API client
attachment_handler.py      Async attachment downloader
markdown_converter.py      HTML/ADF to Markdown with frontmatter
export_core.py             Shared single-issue export workflow
bulk_exporter.py           Concurrent multi-issue export engine
retry.py                   Exponential backoff with jitter
config_manager.py          TOML config loading and merging
field_cache.py             XDG-compliant field metadata cache
custom_field_renderer.py   Type-aware custom field rendering
exceptions.py              Exception hierarchy (JarkdownError base)

Test Coverage

  • 13 test modules, 231 tests, all passing
  • JSON fixtures for realistic API response mocking
  • Async test support via pytest-asyncio with asyncio_mode=auto
  • Session-boundary mocking (aioresponses at aiohttp level)

Dependencies

Runtime (6): aiohttp, markdownify, python-dotenv, PyYAML, platformdirs, tomli
Dev: pytest, pytest-asyncio, pytest-mock, aioresponses, ruff

Test plan

  • All 231 existing tests pass
  • Merge conflicts with upstream resolved (our fork is the superset)
  • Manual smoke test: jarkdown export <ISSUE-KEY>
  • Manual smoke test: jarkdown bulk <KEY-1> <KEY-2>
  • Manual smoke test: jarkdown query "project = TEST"

🤖 Generated with Claude Code

- Add pytest with mock and coverage plugins
- Create test structure with data fixtures
- Implement all unit tests from TEST-DESIGN.md:
  - JiraDownloader initialization tests
  - fetch_issue method tests with error handling
  - download_attachment(s) tests with conflict resolution
  - HTML to Markdown conversion tests
  - Attachment link replacement tests
  - Markdown composition tests
  - Size formatting tests
- Add E2E/CLI tests for command-line interface
- Update .gitignore to allow tests/ directory
- Achieve 87% code coverage with unit tests
- Add MIT LICENSE file for open source compliance
- Enhance README with badges and improved installation instructions
- Create comprehensive CONTRIBUTING.md guide
- Add modern Python packaging with pyproject.toml
- Implement GitHub Actions CI/CD pipeline with multi-version testing
- Add CHANGELOG.md following Keep a Changelog format
- Create GitHub issue and PR templates for better collaboration
- Update .gitignore for Python packaging artifacts
- Add MANIFEST.in for proper package distribution

This commit implements all Tier 1 and Tier 2 recommendations from
the open source readiness assessment, making the project ready for
public release and community contributions.
- Create docs/source/ for Sphinx documentation
- Create docs/design/ for internal design documents
- Move design documents to docs/design/
- Add Sphinx configuration and ReadTheDocs config
- Set up documentation build infrastructure
- Split monolithic JiraDownloader class into focused modules:
  - JiraApiClient: handles Jira API communication
  - AttachmentHandler: manages attachment downloads
  - MarkdownConverter: handles HTML to Markdown conversion
  - Custom exceptions: specific error handling hierarchy
- Update tests to match new modular structure
- Replace requirements.txt with pyproject.toml dependencies
- Update CLAUDE.md documentation to reflect new architecture
- Add jira_api_client, attachment_handler, markdown_converter, and exceptions
  to py-modules in pyproject.toml
- Fixes ModuleNotFoundError when installing the package
- Move GEMINI.md from docs/design/ to project root for better visibility
- Aligns with CLAUDE.md location for AI assistant documentation
- Document the modular architecture with separate components
- Define error handling framework with custom exceptions
- Specify testing strategy improvements
- Consolidate dependency management approach
- Remove .readthedocs.yaml and related Sphinx documentation files
- Project uses README.md for documentation instead of ReadTheDocs
- Simplifies project structure
- Remove old design documents
- Add timestamped versions for better version tracking
- Update CLAUDE.md with documentation preservation note
- Add repository cleanup plan document
- Fix component test by updating priority value in test fixture
- Remove incorrect SystemExit assertions from successful CLI tests
- Improve main() error handling to consistently use exceptions
- Patch load_dotenv in environment variable test to ensure proper isolation

All 32 tests now pass successfully.
- Implement all core components per technical design
- Add JiraApiClient for Jira Cloud REST API communication
- Add AttachmentHandler for downloading attachments with streaming
- Add MarkdownConverter for HTML to Markdown conversion
- Add proper exception hierarchy for error handling
- Add comprehensive test suite with 92% code coverage
- Add .env.example for easy configuration setup
- Update README with correct installation instructions
- Update .gitignore to exclude test output directories

The implementation follows the modular architecture specified in the
technical design with clear separation of concerns between components.
- Move all modules from root to jira_download_pkg/ directory
- Update imports to use relative imports within package
- Restructure pyproject.toml for proper package distribution
- Update test imports to work with new package structure
- Remove old standalone modules from repository root
- Updated JiraApiClient to fetch comment field from Jira API
- Added _compose_comments_section method to MarkdownConverter for formatting comments
- Comments now included in markdown export with author, date, and formatted body
- Attachment links within comments are properly replaced with local references
- Added comprehensive tests for comment functionality
- Created test fixture with sample comments and attachments

Implements Day 2 requirement from original technical design.
- Move all source files into jira_download_pkg package directory
- Add __init__.py to mark as Python package
- Update pyproject.toml to use package structure instead of py-modules
- Convert all imports to relative imports within the package
- Update test imports to reference new package structure
- Bump version to 1.1.0 to reflect significant structural change

This fixes the ModuleNotFoundError when installing with pipx and enables
the tool to be installed globally and run from any directory.
- Add Atlassian Document Format (ADF) parser for comment bodies
- Format comments with horizontal rules instead of blockquotes
- Update _compose_comments_section to handle both HTML and ADF formats
- Support markdown formatting (bold, italic, lists) in comments
- Replace attachment links in comments with local references
- Add comprehensive tests for comment formatting and ADF parsing

The comment section now appears after the description and before attachments,
with each comment showing the author, date, and properly formatted content.
Comments are separated by horizontal rules for better readability.
Adds a series of design and planning documents to the repository to guide future development and improve project quality.

- **Code Review:** A hyper-detailed code review of the entire codebase to identify strengths and areas for improvement.
- **CI Workflow Fix Plan:** A concrete plan to correct the dependency installation step in the GitHub Actions CI workflow.
- **Enhanced Documentation Plan:** A comprehensive strategy for creating and deploying a professional documentation site using Sphinx and Read the Docs to improve user experience.
….txt

The test job was trying to install dependencies from requirements.txt which doesn't exist in this project. Dependencies are properly defined in pyproject.toml following modern Python packaging standards (PEP 621).
… support

- Initialize Sphinx documentation with Furo theme
- Create comprehensive user and developer documentation
- Add Read the Docs configuration (.readthedocs.yaml)
- Update CI workflow to build and test documentation
- Add documentation badge to README
- Create detailed guides for installation, usage, configuration
- Add API reference with autodoc integration
- Include architecture documentation for developers
- Update contributing guide with correct development setup
- Remove -W flag from sphinx-build to allow expected autodoc warnings
- Fix Python syntax highlighting in architecture.md exception hierarchy
- Create _static directory to eliminate warning
- Add pre-commit as development dependency
- Configure ruff linter and pytest as pre-commit hooks
- Update contributing documentation with setup instructions
- Apply ruff formatting to all Python files
Add comprehensive code review document outlining strengths and areas for improvement including repository layout, version management, pre-commit hooks, code idioms, and documentation enhancements.
…structure

- Restructure repository to use src/ layout for cleaner separation
- Consolidate version management to single source in pyproject.toml
- Add --version flag support using importlib.metadata
- Simplify exception handling with single JiraDownloadError catch
- Improve pre-commit hooks with standard file consistency checks
- Remove redundant shell script wrapper in favor of setuptools entry point
- Update documentation to reflect new installation and usage patterns
- Expand contributing guide with detailed development workflow

BREAKING CHANGE: Package moved from jira_download_pkg to src/jira_download.
Users should reinstall with pip install -e . after pulling this update.
dchuk and others added 28 commits February 18, 2026 00:55
- Add aiohttp>=3.9.0 to project.dependencies
- Add pytest-asyncio>=0.23.0 and aioresponses>=0.7.6 to dev deps
- Set asyncio_mode = "auto" in pytest config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Parse integer seconds ("30" → 30.0) or HTTP-date format
- Clamp result to [0.0, 300.0] range
- Fallback to 5.0s on unparseable input

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
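
A rough sketch of the Retry-After parsing rules listed in this commit message (integer seconds or HTTP-date, clamped to [0.0, 300.0], falling back to 5.0). The `now` parameter and the handling of a missing header are assumptions added to keep the example testable; the real `parse_retry_after()` signature is not shown in the PR.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

DEFAULT_DELAY = 5.0   # fallback for unparseable input
MAX_DELAY = 300.0     # upper clamp


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> float:
    """Turn a Retry-After header value into a delay in seconds."""
    if value is None:
        return DEFAULT_DELAY
    text = value.strip()
    try:
        seconds = float(int(text))  # form 1: delta-seconds, e.g. "30"
    except ValueError:
        try:
            target = parsedate_to_datetime(text)  # form 2: HTTP-date
        except (TypeError, ValueError):
            return DEFAULT_DELAY
        if target.tzinfo is None:
            # Treat a date without usable zone information as UTC.
            target = target.replace(tzinfo=timezone.utc)
        current = now or datetime.now(timezone.utc)
        seconds = (target - current).total_seconds()
    return max(0.0, min(seconds, MAX_DELAY))
```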
- Retries on aiohttp.ClientResponseError with retryable status codes
- Retries on asyncio.TimeoutError
- Raises immediately on non-retryable codes (401, 404, etc.)
- Exponential backoff: min(base_delay * 2**attempt, max_delay) + jitter
- Respects Retry-After header on first retry attempt
- Raises last exception after max_retries exhausted

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
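
The backoff rule in this commit message, min(base_delay * 2**attempt, max_delay) + jitter, could look roughly like the sketch below. `HttpStatusError` is a hypothetical stand-in for aiohttp.ClientResponseError so the example runs without aiohttp, the `RetryConfig` defaults are assumed values, and Retry-After handling is left out.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class HttpStatusError(Exception):
    """Hypothetical stand-in for aiohttp.ClientResponseError."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


@dataclass
class RetryConfig:
    max_retries: int = 3          # assumed default
    base_delay: float = 1.0       # assumed default, seconds
    max_delay: float = 60.0       # assumed default, seconds
    jitter: float = 0.5           # assumed upper bound of random jitter
    retryable_statuses: tuple = (429, 503, 504)


def backoff_delay(attempt: int, config: RetryConfig) -> float:
    """min(base_delay * 2**attempt, max_delay) plus uniform jitter."""
    capped = min(config.base_delay * 2 ** attempt, config.max_delay)
    return capped + random.uniform(0.0, config.jitter)


async def retry_with_backoff(call: Callable[[], Awaitable[T]],
                             config: RetryConfig) -> T:
    attempt = 0
    while True:
        try:
            return await call()
        except HttpStatusError as exc:
            # Non-retryable codes (401, 404, ...) are raised immediately.
            if (exc.status not in config.retryable_statuses
                    or attempt >= config.max_retries):
                raise
        except asyncio.TimeoutError:
            if attempt >= config.max_retries:
                raise
        await asyncio.sleep(backoff_delay(attempt, config))
        attempt += 1
```

Once the retries are exhausted, the bare `raise` re-raises the most recent exception, matching the "raises last exception" bullet.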
- Replace requests.Session with aiohttp.ClientSession via __aenter__/__aexit__
- TCPConnector(limit_per_host=5) for connection pooling
- BasicAuth + ClientTimeout(total=30) + Accept/Content-Type headers
- await asyncio.sleep(0.250) after session close for SSL cleanup
- fetch_issue/fetch_fields: async with explicit raise_for_status()
- Error mapping: 401->AuthenticationError, 404->IssueNotFoundError, other->JiraApiError
- get_attachment_content_url: unchanged pure accessor (sync)
- download_attachment_stream included as async method

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
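
The status-to-exception mapping above can be illustrated with a small sketch. The class names come from this PR, but the inheritance between them (everything below `JiraApiError`), the `status` attribute, and the helper `map_http_error` are assumptions made for the example.

```python
from typing import Optional


class JarkdownError(Exception):
    """Base class named in the PR description."""


class JiraApiError(JarkdownError):
    def __init__(self, message: str, status: Optional[int] = None) -> None:
        super().__init__(message)
        self.status = status


class AuthenticationError(JiraApiError):
    """Raised for HTTP 401."""


class IssueNotFoundError(JiraApiError):
    """Raised for HTTP 404."""


def map_http_error(status: int, issue_key: str) -> JiraApiError:
    """Hypothetical helper: pick the exception type for an HTTP status."""
    if status == 401:
        return AuthenticationError("Authentication failed; check credentials", status)
    if status == 404:
        return IssueNotFoundError(f"Issue {issue_key} not found", status)
    return JiraApiError(f"Jira API request failed with HTTP {status}", status)
```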
- 16 test cases across TestRetryConfig, TestParseRetryAfter, TestRetryWithBackoff
- Covers: defaults, custom values, integer/HTTP-date parsing, clamping, backoff
- Async tests using pytest-asyncio auto mode + AsyncMock patching asyncio.sleep
- Verifies 429 retry, 404 immediate raise, TimeoutError retry, max-retry exhaustion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- download_attachment(): async; buffers response with await response.read()
- File writes via asyncio.to_thread(file_path.write_bytes, data) to avoid blocking
- download_all_attachments(): async, sequential iteration (concurrency managed by wave 2 semaphore)
- Filename conflict resolution remains synchronous (stat-only, non-blocking)
- _format_size(): unchanged synchronous helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
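
A small sketch of the non-blocking write described in this commit message: the bytes are already buffered, and `asyncio.to_thread` moves the blocking `Path.write_bytes` call off the event loop. `save_attachment` and `unique_path` are hypothetical names, and the `_1`, `_2` suffix scheme is an assumed conflict-resolution rule, not necessarily the one jarkdown uses.

```python
import asyncio
from pathlib import Path


def unique_path(path: Path) -> Path:
    """Synchronous, existence-check-only conflict resolution (assumed scheme)."""
    candidate = path
    counter = 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        counter += 1
    return candidate


async def save_attachment(data: bytes, directory: Path, filename: str) -> Path:
    """Write already-downloaded bytes without blocking the event loop."""
    target = unique_path(directory / filename)
    # write_bytes blocks on disk I/O, so it runs in a worker thread.
    await asyncio.to_thread(target.write_bytes, data)
    return target
```

`asyncio.to_thread` requires Python 3.9 or newer.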
…ndary

Tasks 1-4: complete jarkdown.py restructure
- Add parent_parser (add_help=False) with shared flags: --output, --verbose,
  --refresh-fields, --include-fields, --exclude-fields
- Add backward-compat shim: bare issue key on argv[1] injects "export" subcommand
  via _ISSUE_KEY_RE = re.compile(r"^[A-Z]+-\d+$")
- Add export subcommand with asyncio.run() boundary in _handle_export()
- Add async def _async_export() using async with JiraApiClient() context manager
- Change export_issue() to async def; uses await api_client.fetch_issue()
- Fix field cache refresh to await api_client.fetch_fields() directly (avoids
  broken sync wrapper against async JiraApiClient)
- Add bulk subcommand stub (nargs='+' issue_keys, --max-results, --batch-name,
  --concurrency) with _handle_bulk() printing "not yet implemented"
- Add query subcommand stub (jql positional, same flags) with _handle_query()
- Add setup subcommand invoking setup_configuration()
- Dispatch block: export→_handle_export, bulk→_handle_bulk, query→_handle_query,
  setup→setup_configuration(), no-command→help+exit(1)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
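
The shared-flag parent parser and the backward-compat shim from this commit could be sketched as below. The regex is the one quoted in the commit message; `normalize_argv` and `build_parser` are hypothetical names, and only a subset of the flags is reproduced.

```python
import argparse
import re

_ISSUE_KEY_RE = re.compile(r"^[A-Z]+-\d+$")


def normalize_argv(argv: list[str]) -> list[str]:
    """Insert 'export' when the first argument is a bare issue key."""
    if len(argv) > 1 and _ISSUE_KEY_RE.match(argv[1]):
        return [argv[0], "export", *argv[1:]]
    return argv


def build_parser() -> argparse.ArgumentParser:
    # add_help=False so the parent does not clash with each subparser's -h.
    parent = argparse.ArgumentParser(add_help=False)
    parent.add_argument("--output")
    parent.add_argument("--verbose", action="store_true")

    parser = argparse.ArgumentParser(prog="jarkdown")
    subparsers = parser.add_subparsers(dest="command")

    export = subparsers.add_parser("export", parents=[parent])
    export.add_argument("issue_key")

    bulk = subparsers.add_parser("bulk", parents=[parent])
    bulk.add_argument("issue_keys", nargs="+")
    bulk.add_argument("--concurrency", type=int, default=3)
    return parser
```

With this shape, `jarkdown TEST-123 --output out` is rewritten to `jarkdown export TEST-123 --output out` before parsing. Note that the quoted pattern only matches project keys made of uppercase letters.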
- Replace requests mock fixtures with aioresponses library
- TestJiraApiClient tests converted to async def using async context manager
- Use re.compile() URL matching to handle aiohttp query param encoding
- TestAttachmentHandler tests converted to async with AsyncMock for response.read()
- TestMarkdownConverter tests unchanged (no I/O)
- Fix test_field_cache.py::TestFetchFields: session is now None until __aenter__,
  update 3 tests to use aioresponses + async context manager
- All 50 tests pass: 29 test_components + 21 test_field_cache

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove requests import; replace requests.Session mocking with async
  JiraApiClient class-level mock (AsyncMock for __aenter__/__aexit__,
  fetch_issue, fetch_fields)
- Add _make_client_mock() helper: returns (mock_class, mock_instance) for
  async context manager pattern used by _async_export()
- Add _fake_download_all() side effect: creates real files on disk so tests
  checking file existence continue to pass without requests-level mocking
- Patch jarkdown.field_cache.FieldMetadataCache in all export tests to avoid
  real filesystem writes to ~/.config/jarkdown/ and skip async fetch_fields
- Update all sys.argv patterns to subcommand form:
  ["jarkdown", "TEST-123"] → ["jarkdown", "export", "TEST-123"]
- Add 5 new tests: test_backward_compat_bare_issue_key (shim injects "export"),
  test_bulk_subcommand_stub (exits 0), test_query_subcommand_stub (exits 0),
  test_setup_subcommand (calls setup_configuration), test_no_command_exits_one

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…piClient

- Paginates via nextPageToken until exhausted or max_results reached
- Page size capped at 50 (Jira API limit)
- Integrates retry_with_backoff for transient errors (429, 503)
- Raises AuthenticationError on 401, JiraApiError on other HTTP failures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
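
A sketch of the nextPageToken loop described in this commit message. To stay self-contained it takes a `fetch_page` coroutine instead of an HTTP session; that parameter, and the response shape with `issues` and `nextPageToken` keys, are assumptions for illustration. The page-size cap of 50 is the value the commit message states.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

PAGE_SIZE_CAP = 50  # cap stated in the commit message

# (jql, page_size, page_token) -> one page of the search response
FetchPage = Callable[[str, int, Optional[str]], Awaitable[dict[str, Any]]]


async def search_jql(fetch_page: FetchPage, jql: str,
                     max_results: Optional[int] = None) -> list[dict[str, Any]]:
    """Collect issues page by page until the token runs out or the cap is hit."""
    issues: list[dict[str, Any]] = []
    token: Optional[str] = None
    while True:
        remaining = None if max_results is None else max_results - len(issues)
        if remaining is not None and remaining <= 0:
            break
        page_size = PAGE_SIZE_CAP if remaining is None else min(PAGE_SIZE_CAP, remaining)
        page = await fetch_page(jql, page_size, token)
        issues.extend(page.get("issues", []))
        token = page.get("nextPageToken")
        if not token:  # no token means this was the last page
            break
    return issues if max_results is None else issues[:max_results]
```

In the real client, each `fetch_page` call would presumably be wrapped in the retry helper so transient 429/503 responses do not abort the whole search.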
- ExportResult dataclass tracks per-issue success/failure with output_path and error
- asyncio.Semaphore(concurrency) limits concurrent exports (default: 3)
- asyncio.gather(return_exceptions=True) for partial failure handling
- _do_export replicates export_issue() inline to avoid circular import
- Progress to stderr via \rExporting N/total... pattern with flush

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Import BulkExporter at top of jarkdown.py
- Extract _load_credentials() helper from _handle_export (DRY)
- Add _print_summary() to report success/failure counts to stderr
- Replace _handle_bulk stub with real implementation via _async_bulk
- Replace _handle_query stub with real implementation via _async_query
- query subcommand: search JQL, build issue_keys list, run BulkExporter
- Update test_cli.py: stub tests replaced with handler-routing tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TestBulkExporterInit: default/custom concurrency, batch_name subdir, cwd fallback
- TestExportBulk: all-succeed, partial-failure, all-fail, semaphore-limits, unexpected exception
- TestSearchJql: single page, two-page pagination, max_results cap, empty result, 401 error
- TestGenerateIndexMd: header count, success link, failure error, file write, sort, data columns, dash fallback
- 22 new tests; full suite: 230 passed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
QA caught missing await in export_issue() for the async
download_all_attachments call. Also fixed test mocks to use
AsyncMock so the bug is properly covered.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update 9 documentation files to reflect the current codebase after
4 phases of development (230 tests, async rewrite, subcommand CLI,
bulk export, JQL queries, custom fields, ADF parsing, retry infra).

- CHANGELOG.md: new [0.2.0] section with all phase 1-4 features
- README.md: updated features, subcommand usage, full frontmatter schema,
  corrected dependencies (aiohttp, platformdirs, tomli)
- docs/source/usage.md: full subcommand docs, field filtering, retry info
- docs/source/architecture.md: async flow, all new components, aiohttp patterns
- docs/source/installation.md: uv as recommended, setup wizard
- docs/source/configuration.md: .jarkdown.toml, field cache, setup wizard
- docs/source/beginners_guide.md: uv install, setup wizard, bulk/query mention
- CONTRIBUTING.md: uv commands, updated project structure, async test example
- docs/source/contributing.md: same updates for Sphinx version

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add --include-json to parent_parser (available on export, bulk, query)
- export_issue() conditionally writes .json only when include_json=True
- BulkExporter gains include_json param; _do_export() conditionally writes .json
- Thread include_json through _async_export, _async_bulk, _async_query
- Update README.md, docs/source/usage.md, docs/source/beginners_guide.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add TEST-*/ and VERIFICATION.md to .gitignore
- Add test output directory guidance to CLAUDE.md (use --output ./tmp)
- Scrubbed TEST-DESIGN.md and TEST-456/ from git history via filter-branch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add --limit as an alias for --max-results on query subparser (same dest)
- Add test asserting --limit 10 sets args.max_results == 10
- Update docs/source/usage.md and README.md to document --limit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
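
For reference, argparse supports this kind of alias by passing both option strings to a single `add_argument` call with a shared `dest`. The sketch below is a standalone parser, not the PR's actual query subparser; `build_query_parser` is a hypothetical name and the default of `None` is an assumption.

```python
import argparse


def build_query_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="jarkdown query")
    parser.add_argument("jql")
    # Both spellings write to args.max_results.
    parser.add_argument("--max-results", "--limit", dest="max_results",
                        type=int, default=None)
    return parser
```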
- README: replace vague concurrency feature bullet with explicit default;
  add CLI Defaults Reference table covering all flags across all subcommands
- usage.md export section: annotate every option heading with its default
  (--output, --verbose, --include-fields, --exclude-fields, --refresh-fields,
  --include-json)
- usage.md bulk section: expand options list to include all shared flags with
  defaults (--max-results, --output, --include-json, --verbose, field flags)
- usage.md query section: expand options list with full shared-flag defaults
  matching bulk section for consistency

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… logic

- Remove requests>=2.28.0 from pyproject.toml (replaced by aiohttp in async migration, never cleaned up)
- Extract shared export workflow into src/jarkdown/export_core.py::perform_export()
- jarkdown.export_issue() and BulkExporter._do_export() both delegate to perform_export()
- Eliminates ~50 lines of duplicated fetch/attach/cache/convert/write logic
- No circular imports: export_core imports only leaf modules (attachment_handler, field_cache, config_manager, markdown_converter)
- Remove unused dataclasses.field import from bulk_exporter.py
- Update test_cli.py patch targets from jarkdown.jarkdown.AttachmentHandler to jarkdown.export_core.AttachmentHandler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
QA verified all 25 checks passing for bulk-export-jql phase.
Updated uv.lock after requests dependency removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge upstream changes using -X ours strategy since our fork is a
superset of the upstream codebase (async rewrite, bulk export, custom
fields, etc).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dchuk changed the title from "Remove unused requests dep, deduplicate export logic, and v0.2.0 polish" to "Complete async rewrite with bulk export, custom fields, and JQL support" Feb 20, 2026

dchuk commented Feb 20, 2026

Hey, this is a great project. I realize this pull request is a substantial rewrite, so sorry for the abrupt move. It's working really well in my testing, so I wanted to put it here for your consideration. No pride of AI authorship here.

2 participants