Track active tasks' data separately from "archived" ones' #24

ms705 · 2015-06-19T07:33:48Z

We currently keep track of all tasks known to a coordinator in the TaskMap_t data structure owned by the Coordinator. This contains tasks in new, runnable, running, completed, failed and various other states. We use it for the web UI, scheduling and the management of task-specific data structures.

However, the flow graph (and, consequently, the cost models) sometimes needs to iterate over all tasks that are currently of interest to the scheduler (i.e., those which are still eligible for scheduling: runnable, running and failed ones), and can get tripped up by "archived" tasks that are still in the task map.
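To illustrate the problem, here is a minimal sketch of the filtering that a single shared map forces on every such iteration. The `TaskState` values, the `TaskDescriptor` struct, the `TaskMap` alias and both function names are hypothetical stand-ins, not the actual definitions in the code base.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; the real task descriptor and
// TaskMap_t are not reproduced here.
enum class TaskState { kCreated, kRunnable, kRunning, kCompleted, kFailed, kAborted };

struct TaskDescriptor {
  uint64_t id;
  TaskState state;
};

using TaskMap = std::unordered_map<uint64_t, TaskDescriptor>;

// True for the states described above as still eligible for scheduling.
bool IsOfInterestToScheduler(TaskState s) {
  return s == TaskState::kRunnable || s == TaskState::kRunning ||
         s == TaskState::kFailed;
}

// Visits every entry, archived ones included, just to pick out the active
// tasks. The cost grows with the total number of tasks ever seen.
std::vector<uint64_t> ActiveTaskIds(const TaskMap& tasks) {
  std::vector<uint64_t> ids;
  for (const auto& entry : tasks) {
    if (IsOfInterestToScheduler(entry.second.state)) {
      ids.push_back(entry.first);
    }
  }
  return ids;
}
```

Any caller that forgets the state check ends up processing archived tasks, which is the failure mode described above.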

In order to increase the efficiency of such iterations and clear up the semantics, we should de-conflate the two purposes of the task map. There are several options for this:

1. Establish a separate data structure in the flow scheduler that keeps track of all tasks that are of interest to it.
   - Pros: easy, not a breaking change, compatible with factoring the flow scheduler into a standalone module
   - Cons: duplication of bookkeeping, need to manage another data structure, memory overhead
2. Re-designate the task map to only contain active tasks, and have an archival map for those that are no longer active.
   - Pros: no memory overhead, clear separation of concerns
   - Cons: major architectural change, need to still manage two data structures, potential for inconsistency
3. Garbage-collect finished tasks' state at some time after they finish (as in Mesos), and retire any information we want to retain to the knowledge base.
   - Pros: clean solution, also addresses state accumulation issues, clear separation of concerns
   - Cons: invasive change that touches assumptions, needs state migration logic
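As a rough sketch of what the second option (an active map plus an archival map) could look like: the `TaskStore` class, its method names and the simplified `TaskDescriptor` below are assumptions made for illustration only, not existing code.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical, simplified types; not the actual definitions.
enum class TaskState { kRunnable, kRunning, kCompleted, kFailed };

struct TaskDescriptor {
  uint64_t id;
  TaskState state;
};

class TaskStore {
 public:
  using Map = std::unordered_map<uint64_t, TaskDescriptor>;

  // Registers a task the scheduler should consider.
  void AddActive(const TaskDescriptor& td) { active_[td.id] = td; }

  // Moves a task from the active map to the archive in one step, so that it
  // is never present in both. Returns false if the id is not active.
  bool Archive(uint64_t id, TaskState final_state) {
    auto it = active_.find(id);
    if (it == active_.end()) return false;
    it->second.state = final_state;
    archived_[id] = it->second;
    active_.erase(it);
    return true;
  }

  // The scheduler iterates only over this map; no state checks are needed.
  const Map& active() const { return active_; }
  // The web UI can consult this map for finished tasks.
  const Map& archived() const { return archived_; }

 private:
  Map active_;
  Map archived_;
};
```

Funnelling every transition through a single `Archive` call is one way to limit the inconsistency risk listed under the cons, since no other code path moves entries between the two maps.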

Interested in views on what the best way forward is.

ICGog · 2015-07-06T02:05:39Z

My feeling is that option 2 is the right way to do it. However, it depends on how much refactoring the change would require. If it turns out to be relatively difficult, then we can go for option 1, which looks like a good in-between solution: not too hacky and not difficult to get in.

Option 3 is the least appealing to me because it does not simplify the cost models' code at all. We would still have to check whether each task is active. Moreover, we would have additional knobs to tune (e.g., the interval at which to GC tasks, or the number of inactive tasks stored before GC is triggered).

ms705 added scheduling infrastructure improvement labels Jun 19, 2015
