Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#7157: Inject syscall templates dynamically in scheduler #7277

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

abhinav92003
Copy link
Contributor

@abhinav92003 abhinav92003 commented Feb 14, 2025

Adds support to the drmemtrace scheduler for injecting system call trace templates dynamically during scheduling. This obviates the need to create a separate statically-injected trace with system call trace templates.

Reuses context switch trace injection code to the extent possible.

Adds a new analyzer flag -sched_syscall_file and new scheduler_options_t fields to allow specifying the system call trace template file.

Adds a mock system call template file generated using the burst_syscall_inject test (slightly modified to use sysnums that match the checked-in test trace).

Adds various unit tests: a new scheduler unit test that verifies dynamic syscall trace injection and its effect on scheduling, analyzer tests that use -sched_syscall_file for dynamic injection in core-sharded and non-core-sharded modes, and an invariant checker test on the added mock system call template file.

Issue: #7157

Adds support to the drmemtrace scheduler for injecting system call trace templates dynamically during scheduling. This obviates the need to create a separate statically-injected trace with system call trace templates.

Reuses context switch trace injection code to the extent possible.

Adds a new analyzer flag -sched_syscall_file and new scheduler_options_t fields to allow specifying the system call trace template file.

Issue: #7157
"Av0is1ii1,Cv0is1ii1,Aiis2iii2,Ciis2iii2,Aiis1ii1,Ciis1ii1,Aiis2iii2,"
"Ciis2iii2,Aiis1ii1,Ciis1ii1");
assert(sched_as_string[1] ==
"Bv0is1ii1iis2iii2iis1ii1iis2iii2iis1ii1____________________________"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derekbruening: Should core 2 have stolen either A or C from core 1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a migration threshold that is almost certainly what is preventing that: it's assumed the cost of A migrating and having cold caches is higher than waiting to run on core0.

@@ -324,13 +324,21 @@ analyzer_tmpl_t<RecordType, ReaderType>::init_scheduler_common(
sched_ops = sched_type_t::make_scheduler_parallel_options(verbosity_);
sched_ops.replay_as_traced_istream = options.replay_as_traced_istream;
sched_ops.read_inputs_in_init = options.read_inputs_in_init;
sched_ops.kernel_syscall_trace_path = options.kernel_syscall_trace_path;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analyzer.h docs for init_scheduler() say:

// For core-sharded, all of "options" is used; otherwise, only the
// read_inputs_in_init field is preserved.

Please add that kernel_syscall_trace_path is preserved.

Also, please update to match HEAD while at it:

  • Looks like that should also include replay_as_traced_istream for the current code.
  • Plus a comment saying the same rules apply should be on the other init_scheduler and init_scheduler_common I would think.

@@ -324,13 +324,21 @@ analyzer_tmpl_t<RecordType, ReaderType>::init_scheduler_common(
sched_ops = sched_type_t::make_scheduler_parallel_options(verbosity_);
sched_ops.replay_as_traced_istream = options.replay_as_traced_istream;
sched_ops.read_inputs_in_init = options.read_inputs_in_init;
sched_ops.kernel_syscall_trace_path = options.kernel_syscall_trace_path;
sched_ops.kernel_syscall_reader = std::move(options.kernel_syscall_reader);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 reader fields are not set by analyzer nor promised to be propagated so no reason to copy here (while not copying every other field). (Maybe worth a comment here repeating that we only preserve certain fields.)

if (worker_count_ <= 0)
worker_count_ = std::thread::hardware_concurrency();
output_count = worker_count_;
} else {
sched_ops = sched_type_t::make_scheduler_serial_options(verbosity_);
sched_ops.replay_as_traced_istream = options.replay_as_traced_istream;
sched_ops.read_inputs_in_init = options.read_inputs_in_init;
sched_ops.kernel_syscall_trace_path = options.kernel_syscall_trace_path;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point it seems worth sharing these preservation lines by joining the two parallel and serial in one else at line 323 and only splitting for the calls to make_scheduler_*_options and setting worker_count.

"sequences. The file can contain multiple sequences each with regular trace "
"headers and the sequence proper bracketed by "
"TRACE_MARKER_TYPE_CONTEXT_SWITCH_START and TRACE_MARKER_TYPE_CONTEXT_SWITCH_END "
"markers.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any actual change here: revert?

* indicating the system call it corresponds to. Sequences for
* multiple system calls are concatenated into a single file.
* Each sequence should be in the regular offline drmemtrace format.
* The sequence is inserted into the output stream after the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/The/Each/
s/after the/after any/

return sched_type_t::STATUS_ERROR_INVALID_PARAMETER;
}
}
if (switch_type != sched_type_t::SWITCH_INVALID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this check lost?

trace_marker_type_t marker_type;
uintptr_t marker_value;
// Good to queue the injected records at this point, because we now surely will
// be done with TRACE_MARKER_TYPE_SYSCALL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved inside process_marker()? It feels out of place here. If for some reason it has to be here: please make a new helper function as this containing next_record() function is already long.

Basic counts tool results:
Total counts:
[1-9][0-9][0-9][0-9][0-9][0-9] total \(fetched\) instructions
5971 total unique \(fetched\) instructions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we hardcoded this number, should we hardcode the others as well as they don't vary? Or you're just trying to have fewer values to change if we update the data file? Ditto below.

sched_as_string[i] += ',';
sched_as_string[i] +=
'A' + static_cast<char>(memref.instr.tid - TID_BASE);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 6118-here plus some below look identical to test_kernel_switch_sequences() -- could we share this code via a new helper that lets each test customize the middle of the loop?

"Av0is1ii1,Cv0is1ii1,Aiis2iii2,Ciis2iii2,Aiis1ii1,Ciis1ii1,Aiis2iii2,"
"Ciis2iii2,Aiis1ii1,Ciis1ii1");
assert(sched_as_string[1] ==
"Bv0is1ii1iis2iii2iis1ii1iis2iii2iis1ii1____________________________"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a migration threshold that is almost certainly what is preventing that: it's assumed the cost of A migrating and having cold caches is higher than waiting to run on core0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants