@konard konard commented Sep 14, 2025

Summary

  • Fixed inconsistency between C# and Python regex engines when counting inner groups
  • Replaced numbered groups with named groups for maintainable and consistent patterns
  • Updated both C# and Python implementations to use consistent named group syntax

Problem

The Python regular expression engine numbers inner (nested) capture groups differently from the C# engine, so the same numbered reference can point at different groups in the two implementations. As shown in the original issue:

C#: "$1 $2$3{$4};$5"
Python: r"\1 \2\3{\5};\6"

The numbering of capture groups diverged between the two engines, making it difficult to maintain identical transformation logic.
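
A minimal sketch of how a nested group shifts the numbering in Python's `re` module. The pattern and input below are illustrative assumptions, not the repository's actual rule: they only show why a replacement may need `\6` where `\5` was expected.

```python
import re

# Illustrative pattern (an assumption, not the repository's rule): group 4
# wraps a nested group 5, so the group after "}" is number 6 in Python.
pattern = re.compile(r"(struct|class) (\w+)(\s+){((\s|\w|;)+?)}([^;])")

source = "struct Point { int x; }\n"
match = pattern.search(source)

group_count = pattern.groups   # total number of capture groups
body = match.group(4)          # outer group: the whole struct body
trailing = match.group(6)      # the character that follows "}"

# The replacement has to skip \5 (the nested group) and use \6 instead.
fixed = pattern.sub(r"\1 \2\3{\4};\6", source)
print(repr(fixed))
```

Python numbers groups by the position of their opening parenthesis, so adding or removing any inner group renumbers everything to its right.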

Solution

  1. Updated C# regex pattern (line 286) from numbered groups to named groups:

    // Before
    (new Regex(@"(struct|class) ([a-zA-Z0-9]+)(\s+){([\sa-zA-Z0-9;:_]+?)}([^;])"), "$1 $2$3{$4};$5", 0)
    
    // After 
    (new Regex(@"(?<type>struct|class) (?<name>[a-zA-Z0-9]+)(?<whitespace>\s+){(?<body>[\sa-zA-Z0-9;:_]+?)}(?<after>[^;])"), "${type} ${name}${whitespace}{${body}};${after}", 0)
  2. Updated Python regex patterns to match C# named groups:

    # Before
    SubRule(r"(struct|class) ([a-zA-Z0-9]+)(\s+){([\sa-zA-Z0-9;:_]+?)}([^;])", r"\1 \2\3{\4};\5", max_repeat=0)
    
    # After
    SubRule(r"(?P<type>struct|class) (?P<name>[a-zA-Z0-9]+)(?P<whitespace>\s+){(?P<body>[\sa-zA-Z0-9;:_]+?)}(?P<after>[^;])", r"\g<type> \g<name>\g<whitespace>{\g<body>};\g<after>", max_repeat=0)
  3. Fixed all Python backreferences from \k<name> (C# syntax) to (?P=name) (Python syntax) throughout the entire codebase
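
The Python side of these changes can be sketched with the standard `re` module directly. `SubRule` is the project's own wrapper and is not reproduced here; the input string and the `doubled` pattern are assumptions made for illustration.

```python
import re

# Named groups from the "After" pattern above, applied with plain re.sub.
pattern = re.compile(
    r"(?P<type>struct|class) (?P<name>[a-zA-Z0-9]+)(?P<whitespace>\s+)"
    r"{(?P<body>[\sa-zA-Z0-9;:_]+?)}(?P<after>[^;])"
)
# In a replacement string, named groups are referenced as \g<name>.
replacement = r"\g<type> \g<name>\g<whitespace>{\g<body>};\g<after>"

source = "struct Point\n{\n    int x;\n}\n"
result = pattern.sub(replacement, source)

# Inside a pattern, a named backreference is written (?P=name).
doubled = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
doubled_match = doubled.search("this is is a test")

# The C# spelling \k<name> is rejected by re as a bad escape.
try:
    re.compile(r"(?P<word>\w+) \k<word>")
    k_syntax_accepted = True
except re.error:
    k_syntax_accepted = False
```

Note the two distinct Python spellings: `(?P=name)` inside the pattern and `\g<name>` inside the replacement string.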

Test Results

Benefits

  • Consistency: Both C# and Python implementations now use equivalent named group patterns
  • Maintainability: Named groups are self-documenting and easier to understand
  • Reliability: Eliminates group counting discrepancies between regex engines
  • Future-proof: Makes it easier to translate patterns between C# and Python
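
To illustrate the last point, here is a naive sketch of a textual translation from the C# named-group syntax to the Python one. Both helper functions are hypothetical and are not part of this pull request. They ignore edge cases such as an escaped `\(` followed by `?<name>` and the .NET `(?'name'...)` form.

```python
import re

# Hypothetical helpers, not part of the repository.
_NAME = r"([A-Za-z_]\w*)"


def csharp_pattern_to_python(pattern: str) -> str:
    """Rewrite C# named groups and backreferences in Python syntax."""
    # (?<name>...) -> (?P<name>...). Lookbehinds "(?<=" and "(?<!" do not
    # match because "=" and "!" cannot start a group name.
    pattern = re.sub(r"\(\?<" + _NAME + ">", r"(?P<\1>", pattern)
    # \k<name> -> (?P=name)
    pattern = re.sub(r"\\k<" + _NAME + ">", r"(?P=\1)", pattern)
    return pattern


def csharp_replacement_to_python(replacement: str) -> str:
    """Rewrite C# ${name} substitutions as Python \\g<name>."""
    return re.sub(r"\$\{" + _NAME + r"\}", r"\\g<\1>", replacement)


py_pattern = csharp_pattern_to_python(
    r"(?<type>struct|class) (?<name>[a-zA-Z0-9]+)(?<=x)\k<name>"
)
py_replacement = csharp_replacement_to_python(
    "${type} ${name}${whitespace}{${body}};${after}"
)
print(py_pattern)
print(py_replacement)
```

With numbered groups no such mechanical mapping is safe, because the numbers themselves may differ between the engines.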

🤖 Generated with Claude Code


Resolves #28

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #28
@konard konard self-assigned this Sep 14, 2025
Replace numbered groups with named groups to ensure consistent behavior
between C# and Python regex engines when counting inner groups.

Changes:
- Updated C# regex pattern from numbered groups ($1, $2, etc.) to named groups (${type}, ${name}, etc.)
- Updated Python regex patterns to use named group syntax (?P<name>) and backreferences (?P=name)
- Fixed all Python \k<name> backreferences to use (?P=name) syntax
- Both versions now use consistent named group patterns for maintainability

This resolves the issue where Python regex counts inner groups differently
than C# regex, ensuring both transformers produce identical results.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@konard konard changed the title from "[WIP] Python regular expression counts inner groups differently" to "Fix Python regex group counting inconsistency with C# (issue #28)" Sep 14, 2025
@konard konard marked this pull request as ready for review September 14, 2025 01:19

Development

Successfully merging this pull request may close these issues.

Python regular expression counts inner groups differently
