@fglock fglock commented Oct 29, 2025

This PR fixes both t/op/tr.t and t/op/multideref.t without breaking other tests.

Test Results

Baseline → After Fix:

  • tr.t: 256/318 → 277/318 (+21 tests) ✅
  • multideref.t: 2/65 → 56/65 (+54 tests) ✅
  • avhv.t: 9/40 → 40/40 (+31 tests) ✅
  • magic.t: 103 → 103 (no regression) ✅

Total improvement: +106 passing tests, 0 regressions!

Commits

1. tr/// operator improvements (commit 1)

  • Unicode character name support: \N{name} syntax with ICU4J integration (~30,000+ names)
    • Example: $s =~ tr/\N{LATIN SMALL LETTER E WITH ACUTE}/E/;
  • Fixed surrogate pair handling: Use isSupplementaryCodePoint() instead of isHighSurrogate()
    • Prevents incorrect character removal in Unicode strings
  • Empty \N{} validation: Proper "Unknown charname" error
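
A rough sketch of how a \N{name} lookup with empty-name validation could look. This is not the PR's code: the PR resolves names through ICU4J, while this sketch swaps in the JDK's own Character.codePointOf (Java 9+) so it runs without extra dependencies. The class name CharNameSketch and the exact error text are assumptions made for illustration.

```java
// Hypothetical helper, not PerlOnJava's API: resolves the name inside \N{...}.
final class CharNameSketch {

    // Returns the code point for a Unicode character name.
    // Assumption: Character.codePointOf stands in for the ICU4J-backed lookup.
    static int resolve(String name) {
        String trimmed = name.trim();
        if (trimmed.isEmpty()) {
            // Models the "Unknown charname" error for an empty \N{}
            throw new IllegalArgumentException("Unknown charname ''");
        }
        try {
            return Character.codePointOf(trimmed);
        } catch (IllegalArgumentException e) {
            // Re-throw with a Perl-style message (wording is illustrative)
            throw new IllegalArgumentException("Unknown charname '" + trimmed + "'");
        }
    }

    public static void main(String[] args) {
        int cp = resolve("LATIN SMALL LETTER E WITH ACUTE");
        System.out.printf("U+%04X%n", cp); // prints U+00E9
    }
}
```

Checking for the empty name before the lookup keeps the error path explicit instead of relying on how the underlying name table treats an empty string.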

2. Strict-refs support for hash/array dereferences (commit 2)

  • Compile-time strict-refs checking in Dereference.java
  • *NonStrict methods in RuntimeScalar and RuntimeBaseProxy
  • Fixed type checking in hashDerefNonStrict/arrayDerefNonStrict:
    • Symbolic references (STRING/BYTE_STRING/GLOB) → allowed with no strict 'refs'
    • Type errors (e.g., ARRAYREFERENCE as HASHREFERENCE) → always throw error
  • Context conversion fix: %$var in scalar context is now converted correctly, which prevents a VerifyError

Technical Details

The strict-refs fix correctly distinguishes between:

  • Symbolic references: $foo = "bar"; $foo->{key} → looks up %bar
  • Type errors: $arr = []; $arr->{key} → "Not a HASH reference"

This matches Perl's behavior and satisfies the Java bytecode verifier.
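
The distinction above can be modelled in a few lines of Java. This is a simplified sketch under stated assumptions, not the code in RuntimeScalar: the class StrictRefsSketch, the Scalar type, the GLOBAL_HASHES map and the hashDeref signature are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model; the names below are illustrative, not PerlOnJava's API.
final class StrictRefsSketch {
    enum Type { STRING, ARRAYREFERENCE, HASHREFERENCE }

    // Stand-in for the global symbol table that holds named hashes such as %bar
    static final Map<String, Map<String, String>> GLOBAL_HASHES = new HashMap<>();

    static final class Scalar {
        final Type type;
        final String name;               // used when type == STRING
        final Map<String, String> hash;  // used when type == HASHREFERENCE

        Scalar(Type type, String name, Map<String, String> hash) {
            this.type = type;
            this.name = name;
            this.hash = hash;
        }
    }

    static Map<String, String> hashDeref(Scalar s, boolean strictRefs) {
        switch (s.type) {
            case HASHREFERENCE:
                return s.hash;
            case STRING:
                if (strictRefs) {
                    throw new IllegalStateException("Can't use string (\"" + s.name
                            + "\") as a HASH ref while \"strict refs\" in use");
                }
                // Symbolic reference: look the hash up by name
                return GLOBAL_HASHES.computeIfAbsent(s.name, k -> new HashMap<>());
            default:
                // A reference of the wrong kind is an error in both modes
                throw new IllegalStateException("Not a HASH reference");
        }
    }

    public static void main(String[] args) {
        GLOBAL_HASHES.computeIfAbsent("bar", k -> new HashMap<>()).put("key", "value");
        Scalar foo = new Scalar(Type.STRING, "bar", null);
        System.out.println(hashDeref(foo, false).get("key")); // prints value
        Scalar arr = new Scalar(Type.ARRAYREFERENCE, null, null);
        try {
            hashDeref(arr, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Not a HASH reference
        }
    }
}
```

The point of the sketch is that the strict flag only affects the STRING branch; the wrong-reference-type branch never consults it.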

- Add support for \N{name} syntax with actual Unicode character names
  using UnicodeResolver integration with ICU4J (~30,000+ names supported)
- Fix empty \N{} validation to give proper 'Unknown charname' error
- Fix surrogate pair handling bug that was incorrectly removing characters
  by checking isSupplementaryCodePoint() instead of isHighSurrogate()

Test improvements: 256/318 (80.5%) -> 277/318 (87.1%)
- Fixed 21 tests (+6.6%)
- Tests now run to completion (previously died at line 1113)

Examples:
  $s =~ tr/\N{LATIN SMALL LETTER E WITH ACUTE}/E/;  # now works
  $s = "\x{d800}\x{ffff}"; $s =~ tr/\0/A/;  # now preserves both chars
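
The surrogate fix can be illustrated with a small Java sketch. The two copy loops below are a hypothetical reconstruction of the kind of bug described, not the actual tr/// code: the class and method names are assumptions. Only Character.isHighSurrogate, Character.isSupplementaryCodePoint and String.codePointAt are real JDK calls.

```java
// Hypothetical reconstruction; class and method names are illustrative only.
final class SurrogateSketch {

    // Buggy variant: skips the next char whenever the current char is a high
    // surrogate, even if it is a lone surrogate that is not part of a pair.
    static String copyCheckingHighSurrogate(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(cp);
            if (Character.isHighSurrogate(s.charAt(i))) {
                i++; // wrong for a lone U+D800: the following char is dropped
            }
        }
        return out.toString();
    }

    // Fixed variant: skips the second char only when codePointAt really
    // combined two chars into one supplementary code point.
    static String copyCheckingSupplementary(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(cp);
            if (Character.isSupplementaryCodePoint(cp)) {
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Lone high surrogate U+D800 followed by U+FFFF, as in the Perl example
        String s = new String(new char[] {0xD800, 0xFFFF});
        System.out.println(copyCheckingHighSurrogate(s).length());  // prints 1
        System.out.println(copyCheckingSupplementary(s).length());  // prints 2
    }
}
```

Both variants handle a well-formed surrogate pair identically; they differ only on unpaired high surrogates, which Perl strings may legitimately contain.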

This commit properly implements the distinction between strict and non-strict
dereference modes, fixing multideref.t and avhv.t without breaking other tests.

Key changes:
- Added compile-time strict-refs checking in Dereference.java
- Implemented *NonStrict methods in RuntimeScalar and RuntimeBaseProxy
- Fixed type checking in hashDerefNonStrict/arrayDerefNonStrict to only
  allow symbolic references for STRING/BYTE_STRING/GLOB types
- Fixed context conversion for %$var in scalar context to prevent VerifyError

Test results:
- multideref.t: 2/65 → 56/65 (+54 tests)
- avhv.t: 9/40 → 40/40 (+31 tests)
- tr.t: 277 → 277 (maintained)
- magic.t: 103 → 103 (no regression)

Total: +85 passing tests

This fixes the crash 'Modification of a read-only value attempted'
when using symbolic references like ${"!"}{ENOENT}.

For RuntimeScalarReadOnly (immutable scalars), lvalue is null.
When NonStrict methods are called, we now delegate to super
instead of calling vivify(), which allows symbolic references
to work without trying to modify read-only values.
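
A minimal sketch of the delegation idea, assuming a much simpler class layout than the real RuntimeBaseProxy and RuntimeScalarReadOnly. Every name here (ReadOnlyProxySketch, Scalar, Proxy, GLOBALS, derefViaVivify) is hypothetical; only the behaviour described above is modelled: a read-only proxy has a null lvalue, so calling vivify() on it fails, while delegating to the superclass does not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the fix; none of these names come from the PR.
final class ReadOnlyProxySketch {
    // Stand-in for the global symbol table (for example %! with an ENOENT key)
    static final Map<String, Map<String, String>> GLOBALS = new HashMap<>();

    static class Scalar {
        final String value;

        Scalar(String value) { this.value = value; }

        // Symbolic lookup that only reads; it never modifies the scalar itself
        Map<String, String> hashDerefNonStrict() {
            Map<String, String> h = GLOBALS.get(value);
            return h != null ? h : new HashMap<>();
        }
    }

    static class Proxy extends Scalar {
        final boolean readOnly;
        Scalar lvalue; // stays null for read-only scalars

        Proxy(String value, boolean readOnly) {
            super(value);
            this.readOnly = readOnly;
        }

        void vivify() {
            if (readOnly) {
                throw new IllegalStateException("Modification of a read-only value attempted");
            }
            if (lvalue == null) {
                lvalue = new Scalar(value);
            }
        }

        // Old behaviour in this model: always vivify first, which breaks read-only scalars
        Map<String, String> derefViaVivify() {
            vivify();
            return lvalue.hashDerefNonStrict();
        }

        // New behaviour in this model: with no lvalue, delegate to super
        @Override
        Map<String, String> hashDerefNonStrict() {
            if (lvalue == null) {
                return super.hashDerefNonStrict();
            }
            return lvalue.hashDerefNonStrict();
        }
    }

    public static void main(String[] args) {
        GLOBALS.computeIfAbsent("!", k -> new HashMap<>()).put("ENOENT", "1");
        Proxy name = new Proxy("!", true);
        System.out.println(name.hashDerefNonStrict().get("ENOENT")); // prints 1
    }
}
```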

Result: magic.t now passes 158/208 tests (was crashing at 103)

Known issue: Double symbolic dereference like ${"foo"}{key}
still returns empty. This needs further investigation.

The fix was actually in RuntimeBaseProxy, not in how the block
is evaluated. The original code was correct.
@fglock fglock merged commit aaa331e into master Oct 29, 2025
2 checks passed
@fglock fglock deleted the fix/tr-and-multideref branch October 29, 2025 16:43