feat(vermeer): support task priority based scheduling #336
base: master
Pull Request Overview
This PR introduces a comprehensive scheduler system with priority, dependency, and cron-based scheduling capabilities. The new scheduler replaces the previous simple queue-based system with a sophisticated task management framework supporting advanced scheduling algorithms.
Key changes:
- New scheduler architecture with modular components for algorithm, resource, task, and cron management
- Priority-based scheduling with support for task dependencies (preorders)
- Cron expression support for recurring tasks
- Worker group management and resource allocation improvements
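The interaction between priority ordering and preorder dependencies can be illustrated with a minimal sketch. The `Task` type, its field names, the state constants, and the "higher value runs first" convention are all assumptions made for illustration; they are not taken from the PR's `structure.TaskInfo` or its scheduler algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// TaskState is a hypothetical stand-in for the scheduler's task states.
type TaskState string

const (
	StateWaiting  TaskState = "waiting"
	StateComplete TaskState = "complete"
	StateError    TaskState = "error"
)

// Task is an illustrative type; the field names are assumptions.
type Task struct {
	ID        int32
	Priority  int32   // assumed convention: higher value runs first
	Preorders []int32 // IDs of tasks that must complete before this one
	State     TaskState
}

// depsComplete reports whether every preorder of t exists and is complete.
func depsComplete(t *Task, tasks map[int32]*Task) bool {
	for _, id := range t.Preorders {
		dep, ok := tasks[id]
		if !ok || dep.State != StateComplete {
			return false
		}
	}
	return true
}

// NextRunnable returns the waiting tasks whose preorders are all complete,
// sorted by descending priority and then by ascending ID.
func NextRunnable(tasks map[int32]*Task) []*Task {
	var ready []*Task
	for _, t := range tasks {
		if t.State == StateWaiting && depsComplete(t, tasks) {
			ready = append(ready, t)
		}
	}
	sort.Slice(ready, func(i, j int) bool {
		if ready[i].Priority != ready[j].Priority {
			return ready[i].Priority > ready[j].Priority
		}
		return ready[i].ID < ready[j].ID
	})
	return ready
}

func main() {
	tasks := map[int32]*Task{
		1: {ID: 1, Priority: 1, State: StateComplete},
		2: {ID: 2, Priority: 5, Preorders: []int32{1}, State: StateWaiting},
		3: {ID: 3, Priority: 9, Preorders: []int32{4}, State: StateWaiting},
		4: {ID: 4, Priority: 2, State: StateError},
		5: {ID: 5, Priority: 5, State: StateWaiting},
	}
	for _, t := range NextRunnable(tasks) {
		fmt.Println("runnable task", t.ID, "priority", t.Priority)
	}
}
```

In this sketch, task 3 has the highest priority but stays blocked because its preorder (task 4) ended in an error state rather than completing, so a dependency check takes precedence over priority.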
Reviewed Changes
Copilot reviewed 40 out of 41 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| vermeer/test/scheduler/*.go | Test framework for scheduler functionality including priority, dependency, and cron testing |
| vermeer/apps/master/schedules/*.go | Core scheduler implementation with algorithm, resource, task, and cron managers |
| vermeer/client/*.go | Client API extensions for batch operations and task sequence tracking |
| vermeer/apps/structure/task.go | Task structure enhancements for scheduler fields |
| vermeer/apps/master/bl/*.go | Business logic updates to integrate with new scheduler |
| vermeer/config/*.ini | Configuration additions for scheduler parameters |
e93187d to
d2efa3e
Summary
This PR introduces a comprehensive scheduler redesign with priority-based scheduling, task dependencies, and cron support. The architecture is well-designed, but there are critical concurrency issues that must be addressed before merging.
Verdict: REQUEST CHANGES
Architecture Changes
Major Refactoring ✅
The PR completely redesigns the scheduler from a simple queue-based system to a sophisticated multi-component architecture:
Old Architecture:
New Architecture:
New Features:
Critical Issues 🚨
1. TOCTTOU Race Condition (scheduler_bl.go:421-423)
Severity: CRITICAL
// TODO: Is here need a lock? TOCTTOU
if taskInfo.State != structure.TaskStateWaiting {
	logrus.Errorf("task state is not in 'Waiting' state, taskID: %v", taskInfo)
	return
}
The TODO correctly identifies a Time-Of-Check-Time-Of-Use vulnerability. State check and operations are not atomic.
Fix:
defer taskInfo.Unlock(taskInfo.Lock())
if taskInfo.State != structure.TaskStateWaiting {
	// ...
}
2. Resource Leak: Channel Not Closed (scheduler_bl.go:62)
Severity: HIGH
The
Fix: Add cleanup:
func (s *ScheduleBl) Shutdown() {
	close(s.startChan)
}
3. Dependency State Validation Missing (scheduler_bl.go:234-244)
Severity: HIGH
No check if dependency tasks are completed. Tasks could depend on failed/canceled tasks.
Fix:
if depTask.State != structure.TaskStateComplete {
	return false, fmt.Errorf("dependency task %d is not complete (state: %s)", depTaskID, depTask.State)
}
4. Potential Deadlock Pattern (scheduler_bl.go:97-99)
Severity: HIGH
Multiple functions acquire locks then call
Fix: Use consistent locking strategy, consider
High Priority Issues
	}
}()

// TODO: Is here need a lock? TOCTTOU
Race condition risk in task state checking
There's a TOCTTOU (Time-of-check to time-of-use) issue here. The comment indicates this concern but it should be addressed. Between checking the task state and starting the task, another goroutine could modify the state.
Suggested fix:
// Acquire lock before checking and starting
s.Lock()
defer s.Unlock(nil)
if taskInfo.State != structure.TaskStateWaiting {
	logrus.Errorf("task state is not in 'Waiting' state, taskID: %v", taskInfo)
	return
}
taskStarter, err := NewTaskStarter(taskInfo, agent)
if err != nil {
	logrus.Errorf("failed to create new TaskStarter err: %v", err)
	taskMgr.SetError(taskInfo, err.Error())
	return
}
taskInfo.StartTime = time.Now()
err = taskStarter.StartTask()
This will take more time to fix, because it requires changing every API that goes through task.setstate/taskmgr.setstate.
	return nil
}
logrus.Debugf("all tasks: %d, workerGroups: %d/%d", len(allTasks), len(idleWorkerGroups), len(concurrentWorkerGroups))
Commented TODO needs resolution
This TODO indicates an important design decision that's unresolved. The scheduler needs to determine if tasks can run concurrently based on both user settings AND scheduler policies.
Action required:
- Either implement the logic to check concurrent execution capability
- Create a tracking issue for this work and reference it in the comment
- Document the temporary behavior and its limitations
Example:
// TODO(issue #XXX): Implement comprehensive concurrent execution check
// Currently only checks user settings. Need to add:
// 1. Resource availability check
// 2. Conflict detection with running tasks
// 3. Worker group capacity limits
// Temporary behavior: Tasks marked as non-exclusive may still be blocked
// if resources are insufficient
Perhaps we need to raise a new issue to solve this problem
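For such an issue, a rough sketch of what a combined check might look like could help frame the discussion. Everything here is hypothetical: the `Task` and `GroupStatus` types, the `Exclusive` flag, and the `Capacity` limit are assumptions standing in for the user setting and the scheduler policy, not the PR's actual types.

```go
package main

import "fmt"

// Task is an illustrative type; Exclusive models the user-level setting.
type Task struct {
	ID        int32
	Exclusive bool
}

// GroupStatus is a hypothetical snapshot of one worker group.
type GroupStatus struct {
	Running  []*Task
	Capacity int // assumed scheduler policy: max tasks running at once
}

// CanRunConcurrently combines the scheduler policy (capacity) with the
// user setting (exclusivity) and returns a reason alongside the decision.
func CanRunConcurrently(t *Task, g GroupStatus) (bool, string) {
	if len(g.Running) >= g.Capacity {
		return false, "worker group capacity reached"
	}
	if len(g.Running) == 0 {
		return true, "worker group is idle"
	}
	if t.Exclusive {
		return false, "task is exclusive and the worker group is busy"
	}
	for _, r := range g.Running {
		if r.Exclusive {
			return false, fmt.Sprintf("running task %d is exclusive", r.ID)
		}
	}
	return true, "task can share the worker group"
}

func main() {
	group := GroupStatus{
		Running:  []*Task{{ID: 1, Exclusive: false}},
		Capacity: 2,
	}
	for _, t := range []*Task{{ID: 2, Exclusive: false}, {ID: 3, Exclusive: true}} {
		ok, reason := CanRunConcurrently(t, group)
		fmt.Printf("task %d: %v (%s)\n", t.ID, ok, reason)
	}
}
```

Returning a reason string alongside the boolean keeps the temporary behavior visible in logs while the full policy is still being designed.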
4fc8b45 to
eda7af8
Purpose of the PR
Main Changes
Verifying these changes
I have written some test scripts in the /test folder.
Does this PR potentially affect the following parts?
Documentation Status
- Doc - TODO
- Doc - Done
- Doc - No Need