Fix regex handling of nested backticks in code block extraction #5

@chigwell

Description

User Story
As a maintainer,
I want to correct the test expectations for nested code block extraction
so that our test suite accurately validates proper handling of inner backticks.

Background
The current regex pattern in mdextractor/__init__.py does not handle nested backticks: it ends the outer block at the first inner triple backticks. As a result, the test test_nested_code_blocks in tests/test_mdextractor.py asserts ["Outer", "end"] rather than the full ["Outer ```inner``` end"]. The test therefore passes while the extraction is wrong, which masks a flaw in the code block extraction logic that could affect users who rely on nested Markdown structures.
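
The split described above can be reproduced with a minimal sketch. The pattern below is an assumption (a typical non-greedy match between triple backticks), not necessarily the exact one in mdextractor/__init__.py, and `extract_blocks_naive` and the sample input are hypothetical names chosen for illustration.

```python
import re

# Assumed pattern: non-greedy match between triple backticks, with an
# optional language tag. It may differ from the pattern in mdextractor.
ASSUMED_PATTERN = re.compile(r"```(?:\w+\s+)?(.*?)```", re.DOTALL)


def extract_blocks_naive(text):
    """Return stripped block contents using the assumed non-greedy pattern."""
    return [block.strip() for block in ASSUMED_PATTERN.findall(text)]


# Hypothetical input: an outer fenced block containing inline triple backticks.
sample = "```\nOuter ```inner``` end\n```"

# The first inner ``` is taken as the closing fence of the outer block.
print(extract_blocks_naive(sample))  # ['Outer', 'end']
```

Because the lazy quantifier stops at the nearest closing delimiter, "inner" is never captured and the outer block is reported as two unrelated blocks.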

Acceptance Criteria

  • Update the regex pattern in mdextractor/__init__.py to properly capture nested backticks without premature block termination.
  • Modify the test_nested_code_blocks test case in tests/test_mdextractor.py to expect ["Outer ```inner``` end"].
  • Add a validation step to ensure the regex handles multiple levels of nesting (e.g., outer middle ```inner``````).
  • Confirm all existing tests pass after changes, including edge cases like single-line and malformed blocks.
  • Verify that extracted blocks retain their original internal whitespace; only leading and trailing whitespace is stripped.
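
One possible direction for the first two criteria, sketched under stated assumptions: treat a fence as a delimiter only when it starts a line, so inline triple backticks stay inside the block. `FENCE_PATTERN` and `extract_blocks_line_anchored` are hypothetical names, and this is not the fix adopted in the repository. It does not cover single-line blocks such as a fence opened and closed on the same line, and an inner fence on a line of its own would still end the block, so those cases would need separate handling.

```python
import re

# Hypothetical line-anchored pattern: the opening fence must start a line
# (optionally followed by a language tag), and the closing fence must sit
# on a line of its own. Inline ``` inside the block is left untouched.
FENCE_PATTERN = re.compile(
    r"^```[^\n]*\n(.*?)^```[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_blocks_line_anchored(text):
    """Return block contents with only leading/trailing whitespace stripped."""
    return [block.strip() for block in FENCE_PATTERN.findall(text)]


nested = "```\nOuter ```inner``` end\n```"
print(extract_blocks_line_anchored(nested))  # ['Outer ```inner``` end']
```

Internal whitespace is preserved because only `strip()` is applied to each captured block.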
