Skip to content

Conversation

@ChuyueSun
Copy link
Owner

@ChuyueSun ChuyueSun commented Nov 6, 2025

  • Implement surgical insertion in view_inference (prevents spec keyword deletion)
  • Add pattern detection for all 5 View types
  • Create 4 educational examples for abstraction level teaching
  • Update spec_inference with smart example selection
  • Validated on 13 benchmarks: 84% success rate, 100% spec preservation

Primary bug FIXED: bitmap_2_todo achieves 100% verification (V=8/8)


Note

Overhauls view/spec inference with surgical insertion and abstraction-aware examples, adds repair-round timeouts, experiments/analysis tooling, CI/pre-commit, and comprehensive docs with VeriStruct rebrand.

  • Core (verification):
    • View Inference: Implement surgical insertion; detect existing spec fn view/impl View TODOs; robust brace matching; insert body/impl safely with delimiter/safety checks.
    • Spec Inference: Add low-level pattern detection (bit-vectors/packed structs); prioritize concrete postcondition examples; inject targeted guidance; syntax fixups for spec clauses.
    • Proof/Repairs: Regex-based syntax fixes; strengthened invariant/proof guidance; multiple modules streamlined; improved safety checks.
  • Repair orchestration:
    • Add repair round timeout plumbing (config, main, RepairRegistry) with round start/elapsed checks; tests and verifier script.
  • CLI/Runner updates:
    • Enhance run_agent.py, run_bench*.py, run_all_benchmarks.py (parallelism, no-cache, UX/messages); add analyze_results.py (root) and analysis tooling.
  • Experiments:
    • Add experiments/ framework (runner, analyzer, quick script, sample corpus, results reporting & visualizations).
  • CI/Tooling:
    • Add pre-commit config, flake8/markdownlint, GitHub Actions workflow, shellcheck config, ignore patterns.
  • Docs & Rebrand:
    • Rebrand to VeriStruct; major README and documentation additions (tutorials, modules, planner/workflow, examples); config/setup guides.
  • Misc:
    • Add lemma macros (bit.rs); numerous small refactors, logging, and robustness improvements.

Written by Cursor Bugbot for commit 6090459. This will update automatically on new commits. Configure here.

- Implement surgical insertion in view_inference (prevents spec keyword deletion)
- Add pattern detection for all 5 View types
- Create 4 educational examples for abstraction level teaching
- Update spec_inference with smart example selection
- Validated on 13 benchmarks: 84% success rate, 100% spec preservation

Primary bug FIXED: bitmap_2_todo achieves 100% verification (V=8/8)
- Remove reflection and analysis markdown files
- Add .gitignore patterns to prevent future tracking
- Keep files locally for reference but not in repository
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Broken checkpoint best-tracking overwrites better scores

Incorrect checkpoint best tracking logic that prevents proper score comparison. After retrieving checkpoint_best_score and checkpoint_best_code from context (lines 732-735), the code initializes them if they're None (lines 738-745). However, this initialization logic never compares the new best_score against the existing checkpoint_best_score to determine which is actually better. The code unconditionally overwrites both values with the current best_code and best_score on lines 747-748, even if the checkpoint best was superior. This means the global best tracking will always use the most recent result rather than the best result across all attempts, defeating the purpose of checkpoint best tracking.

The correct logic should compare scores and only update if the new score is better (or if checkpoint_best_score is None). This bug causes the system to lose track of the best code encountered during the workflow.

src/modules/view_inference.py#L731-L749

# Use cache only for first attempt
responses = self._get_llm_responses(
instruction,
code,
examples,
retry_attempt=retry_attempt,
use_cache=True,
context=context, # Pass context for tracking
# use_cache=(retry_attempt == 0)
)
if not responses and retry_attempt == max_retries - 1:
return code
safe_responses.extend(self._process_responses(responses, original_code))
if safe_responses:
self.logger.info(
f"Found {len(safe_responses)} safe responses after {retry_attempt + 1} attempts"
)

Fix in Cursor Fix in Web


Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Remove TIMEOUT_IMPLEMENTATION_SUMMARY.txt
- Remove benchmark_summary_*.txt
- Remove check_benchmark_status.sh
- Update .gitignore to prevent future tracking
- Add _find_matching_brace helper that properly handles strings, comments, and nested braces
- Update has_spec_fn_view to use improved brace matching for accurate function body extraction
- Update has_view_trait_with_todo to correctly locate View trait implementations
- Improve _extract_view_impl and _extract_view_implementation for more reliable parsing
- Fix pre-commit workflow formatting

These changes significantly improve the robustness of parsing Verus code when detecting and extracting view implementations, especially in complex code with nested braces or comments.
…ction

- Fixed insert_view_body to detect and strip minimum indentation before adding 8 spaces
  This prevents double-indentation when LLM-generated view bodies already contain proper indentation
- Improved detect_spec_fn_view_todo to search all impl blocks instead of requiring struct and impl adjacency
  Makes detection more robust for various code layouts
fn main() {}


+
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Dangling Plus Line Causes Rust Compile Error

The file ends with an extra "+\n" line after the closing brace and main function. This appears to be a diff artifact that was accidentally committed as part of the file content. This will cause a compilation error as it's not valid Rust syntax.

Fix in Cursor Fix in Web

Major Changes:
- Rename project from VerusAgent to VeriStruct throughout
  - Updated README title and all references
  - Updated script banners and help text
  - Consistent with paper title (VeriStruct)

Script Consistency Improvements:
- run_agent.py: Enhanced help text with examples
- run_bench.py: Added comprehensive examples and clarifications
- run_bench_no_cache.py: Improved documentation
- run_all_benchmarks.py: Added argparse support and --configs argument

Documentation Enhancements:
- Added quick reference table for script selection
- Added design rationale section explaining argument patterns
- Fixed all usage examples with correct arguments
- Updated repository URL to ChuyueSun/VeriStruct
- Enhanced project structure with inline argument hints

Argument Consistency:
- run_agent.py: --test-file (path) + --config (singular)
- run_bench.py: --benchmark (name) + --configs (plural)
- run_bench_no_cache.py: --benchmark (name) + --configs (plural)
- run_all_benchmarks.py: --configs (plural)

All changes maintain backward compatibility and pass linting checks.
Updated all remaining references to use VeriStruct consistently:

Documentation:
- All tutorial files (README, 01-04)
- Technical documentation (workflow, planner, modules)
- Example documentation (bitmap, ringbuffer)
- Repair module documentation

Source Code:
- src/main.py
- src/modules/baserepair.py
- src/modules/statistics_collector.py
- src/modules/progress_logger.py
- src/modules/utils.py
- src/modules/repair_registry.py

Experiments:
- experiments/README.md
- experiments/experiment_runner.py
- experiments/analyze_results.py
- experiments/run_quick_experiment.sh
- experiments/sample_corpus.json

Root Files:
- README_BASELINE.md
- README_modules.md
- YOUR_CONFIG_SETUP.md
- run_baseline_bench.py
- run_repair_effectiveness_experiment.py
- run_all_benchmarks.py
- customize.fish

LaTeX Files:
- tex/pseudocode.tex
- tex/main_execution.tex

Configuration:
- src/configs/README.md

All changes verified with grep - zero instances of 'VerusAgent' remain.
Project now uses VeriStruct as the sole name throughout.
- Modified verus_common.md: Changed instruction from 'use vector.len()' to
  'use [email protected]()' for consistency with other view operations
- Updated spec_inference.py: Added explicit guidance to use [email protected]() for
  vectors/collections in spec contexts
- Added len_syntax_analysis.md: Documentation showing both v.len() and
  [email protected]() work correctly in Verus, but standardizing on [email protected]() for
  consistency with other @ operations (v@[i], [email protected])

Both syntaxes verify successfully, but [email protected]() provides:
- Consistency with other view operations
- Explicit indication of spec-level view usage
- Clearer mental model for specifications
@ChuyueSun ChuyueSun merged commit d2f79a0 into main Nov 7, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants