-
Notifications
You must be signed in to change notification settings - Fork 0
Fix view_inference spec deletion bug + add abstraction level examples #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Implement surgical insertion in view_inference (prevents spec keyword deletion) - Add pattern detection for all 5 View types - Create 4 educational examples for abstraction level teaching - Update spec_inference with smart example selection - Validated on 13 benchmarks: 84% success rate, 100% spec preservation Primary bug FIXED: bitmap_2_todo achieves 100% verification (V=8/8)
- Remove reflection and analysis markdown files - Add .gitignore patterns to prevent future tracking - Keep files locally for reference but not in repository
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Broken checkpoint best-tracking overwrites better scores
Incorrect checkpoint best tracking logic that prevents proper score comparison. After retrieving checkpoint_best_score and checkpoint_best_code from context (lines 732-735), the code initializes them if they're None (lines 738-745). However, this initialization logic never compares the new best_score against the existing checkpoint_best_score to determine which is actually better. The code unconditionally overwrites both values with the current best_code and best_score on lines 747-748, even if the checkpoint best was superior. This means the global best tracking will always use the most recent result rather than the best result across all attempts, defeating the purpose of checkpoint best tracking.
The correct logic should compare scores and only update if the new score is better (or if checkpoint_best_score is None). This bug causes the system to lose track of the best code encountered during the workflow.
src/modules/view_inference.py#L731-L749
VeriStruct/src/modules/view_inference.py
Lines 731 to 749 in dc974bc
| # Use cache only for first attempt | |
| responses = self._get_llm_responses( | |
| instruction, | |
| code, | |
| examples, | |
| retry_attempt=retry_attempt, | |
| use_cache=True, | |
| context=context, # Pass context for tracking | |
| # use_cache=(retry_attempt == 0) | |
| ) | |
| if not responses and retry_attempt == max_retries - 1: | |
| return code | |
| safe_responses.extend(self._process_responses(responses, original_code)) | |
| if safe_responses: | |
| self.logger.info( | |
| f"Found {len(safe_responses)} safe responses after {retry_attempt + 1} attempts" | |
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Remove TIMEOUT_IMPLEMENTATION_SUMMARY.txt - Remove benchmark_summary_*.txt - Remove check_benchmark_status.sh - Update .gitignore to prevent future tracking
- Add _find_matching_brace helper that properly handles strings, comments, and nested braces - Update has_spec_fn_view to use improved brace matching for accurate function body extraction - Update has_view_trait_with_todo to correctly locate View trait implementations - Improve _extract_view_impl and _extract_view_implementation for more reliable parsing - Fix pre-commit workflow formatting These changes significantly improve the robustness of parsing Verus code when detecting and extracting view implementations, especially in complex code with nested braces or comments.
…ction - Fixed insert_view_body to detect and strip minimum indentation before adding 8 spaces This prevents double-indentation when LLM-generated view bodies already contain proper indentation - Improved detect_spec_fn_view_todo to search all impl blocks instead of requiring struct and impl adjacency Makes detection more robust for various code layouts
| fn main() {} | ||
|
|
||
|
|
||
| + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major Changes: - Rename project from VerusAgent to VeriStruct throughout - Updated README title and all references - Updated script banners and help text - Consistent with paper title (VeriStruct) Script Consistency Improvements: - run_agent.py: Enhanced help text with examples - run_bench.py: Added comprehensive examples and clarifications - run_bench_no_cache.py: Improved documentation - run_all_benchmarks.py: Added argparse support and --configs argument Documentation Enhancements: - Added quick reference table for script selection - Added design rationale section explaining argument patterns - Fixed all usage examples with correct arguments - Updated repository URL to ChuyueSun/VeriStruct - Enhanced project structure with inline argument hints Argument Consistency: - run_agent.py: --test-file (path) + --config (singular) - run_bench.py: --benchmark (name) + --configs (plural) - run_bench_no_cache.py: --benchmark (name) + --configs (plural) - run_all_benchmarks.py: --configs (plural) All changes maintain backward compatibility and pass linting checks.
Updated all remaining references to use VeriStruct consistently: Documentation: - All tutorial files (README, 01-04) - Technical documentation (workflow, planner, modules) - Example documentation (bitmap, ringbuffer) - Repair module documentation Source Code: - src/main.py - src/modules/baserepair.py - src/modules/statistics_collector.py - src/modules/progress_logger.py - src/modules/utils.py - src/modules/repair_registry.py Experiments: - experiments/README.md - experiments/experiment_runner.py - experiments/analyze_results.py - experiments/run_quick_experiment.sh - experiments/sample_corpus.json Root Files: - README_BASELINE.md - README_modules.md - YOUR_CONFIG_SETUP.md - run_baseline_bench.py - run_repair_effectiveness_experiment.py - run_all_benchmarks.py - customize.fish LaTeX Files: - tex/pseudocode.tex - tex/main_execution.tex Configuration: - src/configs/README.md All changes verified with grep - zero instances of 'VerusAgent' remain. Project now uses VeriStruct as the sole name throughout.
- Modified verus_common.md: Changed instruction from 'use vector.len()' to 'use [email protected]()' for consistency with other view operations - Updated spec_inference.py: Added explicit guidance to use [email protected]() for vectors/collections in spec contexts - Added len_syntax_analysis.md: Documentation showing both v.len() and [email protected]() work correctly in Verus, but standardizing on [email protected]() for consistency with other @ operations (v@[i], [email protected]) Both syntaxes verify successfully, but [email protected]() provides: - Consistency with other view operations - Explicit indication of spec-level view usage - Clearer mental model for specifications
Primary bug FIXED: bitmap_2_todo achieves 100% verification (V=8/8)
Note
Overhauls view/spec inference with surgical insertion and abstraction-aware examples, adds repair-round timeouts, experiments/analysis tooling, CI/pre-commit, and comprehensive docs with VeriStruct rebrand.
spec fn view/impl ViewTODOs; robust brace matching; insert body/impl safely with delimiter/safety checks.run_agent.py,run_bench*.py,run_all_benchmarks.py(parallelism, no-cache, UX/messages); addanalyze_results.py(root) and analysis tooling.experiments/framework (runner, analyzer, quick script, sample corpus, results reporting & visualizations).bit.rs); numerous small refactors, logging, and robustness improvements.Written by Cursor Bugbot for commit 6090459. This will update automatically on new commits. Configure here.