Skip to content

perf(hstr): Optimize rkyv implementation of Atom #10840

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Conversation

kdy1
Copy link
Member

@kdy1 kdy1 commented Jul 11, 2025

Description:

Previously we didn't have any deduplication logic, and cloned Atom as String each time it occurs in the AST.

Related issue:

@kdy1 kdy1 added this to the Planned milestone Jul 11, 2025
@kdy1 kdy1 self-assigned this Jul 11, 2025
Copy link

changeset-bot bot commented Jul 11, 2025

⚠️ No Changeset found

Latest commit: 64c94da

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

claude bot commented Jul 11, 2025

Claude finished @kdy1's task —— View job


Code Review Complete ✅

Todo List:

  • Read repository guidelines (CLAUDE.md) - No repo-specific guidelines found
  • Analyze changed files and implementation approach
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Evaluate security concerns
  • Review test coverage needs
  • Provide comprehensive feedback

Code Review Analysis

Changes Overview:

  • hstr/lib.rs: Moved rkyv implementation from main lib to separate module, reduced code duplication
  • hstr/rkyv.rs: New dedicated module for rkyv serialization/deserialization logic with SharedPointer implementation
  • swc_atoms/Cargo.toml: Updated hstr dependency version from 1.1.5 to 1.1.6
  • swc_atoms/lib.rs: Simplified by using rkyv derive macros instead of manual implementation

Key Optimization:
The PR implements shared pointer semantics for rkyv serialization, allowing deduplication of identical Atom instances during serialization instead of cloning each occurrence as a separate String.

Issues Found

🚨 Critical Bug: Incomplete SharedPointer Implementation

// hstr/src/rkyv.rs:77-84
impl SharedPointer<()> for Atom {
    fn alloc(
        metadata: <() as rkyv::ptr_meta::Pointee>::Metadata,
    ) -> Result<*mut (), std::alloc::LayoutError> {
        // Missing implementation - this will not compile
    }

    unsafe fn from_value(ptr: *mut ()) -> *mut () {
        // Missing implementation and return statement
    }

Problem: The alloc and from_value methods are completely empty and missing return statements. This will cause compilation failures.

Recommendation: These methods need proper implementations. Based on the pattern, they should probably:

  • alloc: Return a properly allocated pointer for the metadata
  • from_value: Convert the raw pointer to the appropriate format

⚠️ Safety Concerns

  1. Unsafe operations without validation (hstr/src/rkyv.rs:62-74):

    let raw_shared_ptr = deserializer.deserialize_shared::<_, Atom>(unsafe {
        // SAFETY: We assume that the pointer is valid because it's only used for Wasm plugins
        &*self.ptr.as_ptr()
    })?;

    Issue: The safety comment indicates assumptions about Wasm plugins, but doesn't validate pointer validity. This could lead to undefined behavior if used outside the intended context.

  2. Multiple unsafe pointer operations without bounds checking throughout the implementation.

Recommendation: Add proper validation and consider making the API safer or more explicit about its constraints.

📝 Code Quality Issues

  1. Missing error handling: Empty function bodies that should return errors or values
  2. Inconsistent safety documentation: Some unsafe blocks have detailed comments, others don't
  3. Magic numbers without constants: Direct use of 1 in padding without explanation

Recommendations:

  • Complete all function implementations
  • Add comprehensive safety documentation
  • Consider extracting magic numbers to named constants

Performance Analysis

Positive Impact:

  • The shared pointer approach should significantly reduce memory usage for duplicate atoms
  • Moving rkyv code to separate module improves compilation times when feature is disabled
  • Using derive macros in swc_atoms reduces manual implementation overhead

Current Concerns:

  • Empty function implementations suggest the optimization may not be working as intended
  • Potential performance regression due to incomplete implementation

Security Considerations

⚠️ Medium Risk:

  • Extensive use of unsafe operations with assumptions about Wasm context
  • Direct pointer manipulation without validation could lead to memory safety issues
  • No bounds checking on deserialized data

Recommendations:

  • Add validation for pointer operations
  • Consider runtime checks for debug builds
  • Document the exact context where this code is safe to use

Test Coverage

Inadequate: No tests specifically for the rkyv functionality. Critical missing tests:

  • Serialization/deserialization round-trips
  • Deduplication behavior verification
  • SharedPointer implementation correctness
  • Memory safety under various conditions

Recommendation: Add comprehensive tests before merging.

Overall Assessment

Status: ⚠️ NEEDS WORK

While the optimization approach is sound and should provide performance benefits, the incomplete implementation in SharedPointer trait makes this PR non-functional. The code will likely fail to compile and needs completion before it can be merged.

Action Items

  1. Fix critical bug: Complete the SharedPointer implementation
  2. Add comprehensive tests: Cover serialization, deduplication, and safety
  3. Improve safety: Add validation to unsafe operations
  4. Documentation: Clarify the intended usage context and safety requirements

Copy link

codspeed-hq bot commented Jul 11, 2025

CodSpeed Performance Report

Merging #10840 will not alter performance

Comparing kdy1/rkyv-impl (64c94da) with main (2b04efd)

Summary

✅ 140 untouched benchmarks

Copy link
Contributor

claude bot commented Jul 14, 2025

Claude finished @kdy1's task —— View job


Code Review Complete ✅

Todo List:

  • Read repository guidelines (CLAUDE.md) - No repo-specific guidelines found
  • Analyze all changed files and current implementation
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Evaluate security concerns
  • Review test coverage needs
  • Provide comprehensive feedback with specific recommendations

Critical Issues Found

🚨 SHOW STOPPER: Completely Empty Implementation

Location: crates/hstr/src/rkyv.rs lines 20, 28, 36

The rkyv implementation is completely empty and non-functional:

impl rkyv::Archive for Atom {
    fn resolve(&self, resolver: Self::Resolver, out: rkyv::Place<Self::Archived>) {} // EMPTY!
}

impl<S> rkyv::Serialize<S> for Atom {
    fn serialize(&self, serializer: &mut S) -> Result<Self::Resolver, S::Error> {} // EMPTY!
}

impl<D> Deserialize<Atom, D> for ArchivedAtom {
    fn deserialize(&self, deserializer: &mut D) -> Result<Atom, D::Error> {} // EMPTY!
}

Impact:

  • This code will not compile when rkyv feature is enabled
  • Functions don't return required types (Self::Resolver, Result<...>)
  • The performance optimization goals of this PR are not achieved

Detailed Analysis

📋 Code Quality & Best Practices

✅ Positive Changes:

  • Good separation of concerns: Moving rkyv-specific code to separate module (hstr/src/rkyv.rs)
  • Clean dependency management: Proper feature gating with #[cfg(feature = "rkyv")]
  • Simplified swc_atoms: Using derive macros instead of manual implementation reduces boilerplate

❌ Quality Issues:

  • Empty struct definition: ArchivedAtom {} has no fields but should represent archived atom data
  • Unused resolver: AtomResolver {} is empty and doesn't resolve anything
  • No error handling: Empty functions don't handle serialization/deserialization errors

🐛 Potential Bugs

  1. Compilation failure: Empty functions with return types will not compile
  2. Missing archived data: ArchivedAtom has no way to store the actual atom content
  3. No validation: Deserialization doesn't validate input data integrity
  4. Resource leaks: No proper cleanup in serialization error paths

Performance Considerations

❌ Current State:

  • The performance optimization goal is not achieved due to empty implementation
  • No deduplication logic is present (despite PR description claiming this was the goal)

✅ Architectural Improvements:

  • Moving to separate module reduces compile time when feature is disabled
  • Using derive macros in swc_atoms should reduce binary size

🔧 Potential Optimizations Needed:

  • Implement shared pointer/deduplication logic for identical atoms
  • Add string interning during serialization to avoid duplicates
  • Consider zero-copy deserialization for large atom collections

🔒 Security Concerns

⚠️ Medium Risk:

  • Unsafe operations potential: rkyv serialization often involves unsafe pointer operations (not yet implemented)
  • Deserialization validation: No validation of incoming serialized data could lead to memory corruption
  • Input sanitization: Missing bounds checking on deserialized data

🛡️ Recommendations:

  • Add validation in deserialize method to ensure data integrity
  • Implement proper error handling for malformed input
  • Consider using safe rkyv patterns where possible

🧪 Test Coverage

❌ Critical Gaps:

  • No rkyv-specific tests: Missing tests for serialization/deserialization functionality
  • No round-trip tests: Should verify serialize → deserialize produces identical atoms
  • No deduplication tests: Missing validation that identical atoms are properly shared
  • No error handling tests: Missing tests for malformed input handling

✅ Existing Coverage:

  • Basic atom functionality is well tested in hstr/src/tests.rs
  • Reference counting and store behavior is covered

📊 Impact Assessment

Aspect Status Impact
Compilation ❌ Broken High - PR will not build
Functionality ❌ None High - No optimization achieved
Performance ❌ Regression Medium - May slow down due to broken features
Security ⚠️ Unknown Medium - Missing validation
Maintainability ✅ Improved Low - Better code organization

Recommendations

🔧 Immediate Actions Required

  1. Fix compilation issues:

    // Example proper implementation needed
    fn serialize(&self, serializer: &mut S) -> Result<Self::Resolver, S::Error> {
        // Implement actual serialization logic
        serializer.serialize_str(self.as_str())?;
        Ok(AtomResolver {})
    }
  2. Implement ArchivedAtom structure:

    #[derive(Debug, Portable)]
    #[repr(C)]
    struct ArchivedAtom {
        data: rkyv::string::ArchivedString, // or similar
    }
  3. Add comprehensive tests for the rkyv functionality

🚀 Future Improvements

  • Implement deduplication logic as originally intended
  • Add benchmarks to verify performance improvements
  • Consider using rkyv::Archive derive instead of manual implementation
  • Add comprehensive documentation for the rkyv integration

Performance Optimization Strategy

To achieve the stated goal of reducing atom duplication:

  1. Implement shared pointer semantics during serialization
  2. Use string interning table to deduplicate identical atoms
  3. Add zero-copy deserialization for better performance

Verdict: ⛔ REQUIRES MAJOR REWORK

This PR cannot be merged in its current state due to fundamental compilation issues. While the architectural changes are sound, the implementation is incomplete and non-functional.

Recommended Actions:

  1. Complete the rkyv trait implementations
  2. Add comprehensive test suite
  3. Verify the performance optimization goals are met
  4. Address security concerns with proper validation

@kdy1 kdy1 changed the base branch from main to dev/rust July 15, 2025 04:56
Base automatically changed from dev/rust to main July 17, 2025 14:57
@kdy1 kdy1 removed this from the Planned milestone Jul 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Optimize rkyv impl of Atom
1 participant