Add automatic problem creation for scheduler failures #4

Merged: nlr-ai merged 1 commit into main from scheduler-problems on Jun 27, 2025

nlr-ai commented Jun 27, 2025

Summary

Completes the self-healing infrastructure loop by automatically creating problem records when scheduled tasks fail, enabling Arsenale to detect and fix these issues autonomously.

What's New

Automatic Problem Creation

When any scheduled task fails, the scheduler now:

  1. Sends a Telegram notification (existing behavior)
  2. Creates a problem record in the PROBLEMS table (new)

Problem Details Include:

  • Unique Problem ID: scheduler_failure_YYYYMMDD_HHMMSS_task_name
  • Type: scheduler_task_failure
  • Severity: High (scheduler failures are critical)
  • Description: Script path, error message, and the last 1000 characters of log output
  • Suggested Solutions: Common fixes such as checking paths, dependencies, and environment variables
  • Assignment: ConsiglioDeiDieci (system problems)
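
As a rough illustration of how a record with the fields above might be assembled, here is a minimal sketch. The function name `build_problem_record`, the `AssignedTo` field name, the slug rule for the task name, and the wording of the suggested solutions are assumptions made for this example; the actual scheduler code may differ.

```python
from datetime import datetime, timezone

LOG_TAIL_CHARS = 1000  # the PR keeps the last 1000 characters of log output


def build_problem_record(task_name, script_path, error, log_output, now=None):
    """Assemble a problem record dict (hypothetical helper, not the PR's code)."""
    now = now or datetime.now(timezone.utc)
    # Assumed slug rule: lowercase the task name and join words with underscores.
    slug = "_".join(task_name.lower().split())
    return {
        # Follows the documented pattern scheduler_failure_YYYYMMDD_HHMMSS_task_name.
        "ProblemId": f"scheduler_failure_{now:%Y%m%d_%H%M%S}_{slug}",
        "Type": "scheduler_task_failure",
        "Title": f"Scheduler Task Failed: {task_name}",
        "Description": (
            f"The scheduled task '{task_name}' failed.\n\n"
            f"Script: {script_path}\n\n"
            f"Error: {error}\n\n"
            f"Last log output:\n{log_output[-LOG_TAIL_CHARS:]}"
        ),
        "Status": "active",
        "Severity": "High",
        "AssignedTo": "ConsiglioDeiDieci",  # field name is an assumption
        # Placeholder wording; the real suggestions are not shown in full in the PR.
        "Solutions": [
            "Check that the script path exists",
            "Check for missing dependencies",
            "Check environment variables",
        ],
    }
```

Passing `now` explicitly keeps the ID deterministic, which makes the helper easy to test.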

Error Types Handled:

  • ✅ Non-zero return codes
  • ✅ FileNotFoundError (missing scripts)
  • ✅ General exceptions
  • ✅ Both 5-minute and hourly tasks
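
The three failure classes listed above could be distinguished roughly as follows. This is a sketch under stated assumptions, not the scheduler's actual code: `run_task` and its return shape are hypothetical, and the explicit `is_file()` check is added here because launching the interpreter on a missing script would otherwise surface as a non-zero exit code rather than a `FileNotFoundError`.

```python
import subprocess
import sys
from pathlib import Path


def run_task(script_path, timeout=300):
    """Run a script and return (ok, error, log_output). Hypothetical helper."""
    try:
        # Assumed pre-check so that a missing script is reported explicitly.
        if not Path(script_path).is_file():
            raise FileNotFoundError(f"Script not found: {script_path}")
        result = subprocess.run(
            [sys.executable, str(script_path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        if result.returncode != 0:
            # Non-zero return code: keep both streams for the problem description.
            return False, f"Exit code {result.returncode}", result.stdout + result.stderr
        return True, None, result.stdout
    except FileNotFoundError as exc:
        return False, f"FileNotFoundError: {exc}", ""
    except Exception as exc:  # general exceptions, including timeouts
        return False, f"Exception: {exc}", ""
```

The same function can serve both the 5-minute and the hourly tasks, since the handling does not depend on the schedule.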

Duplicate Prevention

  • Checks for similar problems created in the last hour
  • Prevents spam when a task repeatedly fails
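
A one-hour duplicate check might look like the sketch below. The PR does not say how "similar" is defined, so this example assumes two problems are similar when they share the same `Type` and `Title`; the `CreatedAt` field (a timezone-aware `datetime`) and the name `is_duplicate` are also assumptions.

```python
from datetime import datetime, timedelta, timezone

DEDUP_WINDOW = timedelta(hours=1)  # window stated in the PR


def is_duplicate(existing_problems, title, now=None):
    """Return True if a matching scheduler problem was created within the window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - DEDUP_WINDOW
    return any(
        p["Type"] == "scheduler_task_failure"
        and p["Title"] == title  # assumed similarity rule: identical title
        and p["CreatedAt"] >= cutoff  # hypothetical timestamp field
        for p in existing_problems
    )
```

The scheduler would call this before creating a record and skip creation when it returns `True`, so a task that fails every five minutes yields at most one problem per hour.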

Integration with Arsenale

With this change, Arsenale can now:

  1. OBSERVE: Detect scheduler failures in the PROBLEMS table
  2. ASSESS: Analyze the error and design a fix
  3. EXECUTE: Implement the solution (fix scripts, add missing files, etc.)
  4. DOCUMENT: Verify the scheduler is working again

This creates a complete feedback loop where the infrastructure can heal itself!
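
To make the OBSERVE step concrete, the sketch below filters a list of problem records for open scheduler failures. It relies only on the `Type` and `Status` fields shown in this PR; the function name `pending_scheduler_failures` is hypothetical, and how Arsenale actually queries the PROBLEMS table is not described here.

```python
def pending_scheduler_failures(problems):
    """Return active problem records created for scheduler task failures."""
    return [
        p
        for p in problems
        if p.get("Type") == "scheduler_task_failure" and p.get("Status") == "active"
    ]
```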

Example Problem Record

{
  "ProblemId": "scheduler_failure_20250627_0315_citizen_activity_creation",
  "Type": "scheduler_task_failure",
  "Title": "Scheduler Task Failed: Citizen activity creation",
  "Description": "The scheduled task 'Citizen activity creation' failed...\n\nScript: /path/to/engine/createActivities.py\n\nError: Exception: No module named 'pyairtable'",
  "Status": "active",
  "Severity": "High",
  "Solutions": ["Check if the script exists...", "Review the error message...", "Check for missing dependencies..."]
}

This enhancement enables La Serenissima's infrastructure to autonomously detect and repair its own scheduled task failures, demonstrating true self-healing capabilities.

🤖 Generated with Claude Code

When scheduled tasks fail, the scheduler now automatically creates problem records
in the PROBLEMS table for Arsenale to detect and fix:

- Creates problems for all error types: return code failures, missing scripts, exceptions
- Applies to both 5-minute and hourly tasks
- Includes task name, script path, error details, and last log output
- Suggests common solutions (missing files, environment variables, etc.)
- Prevents duplicate problems within a 1-hour window
- Assigns to ConsiglioDeiDieci as system problems

This enables Arsenale to autonomously detect and fix scheduler failures,
completing the feedback loop for self-healing infrastructure.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

nlr-ai merged commit 26d6cb2 into main on Jun 27, 2025
1 check failed