-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf event array map kernel-side implementation #4144
base: main
Are you sure you want to change the base?
Conversation
7623a4f
to
10d4f86
Compare
d5f2cea
to
a5f9711
Compare
Moving to draft until the ring buffer refactoring is merged in #4204 |
This is currently blocked on #4204, as it uses the new ring buffer for wait-free reserve (at dispatch) in the per-cpu rings. Most of the refactoring to use the new ring buffer has been done, but after the new ring buffer implementation is merged I'll update+publish this again. |
6513f71
to
547a6e1
Compare
I split the context header support requirement out into #4267. After that passes all CICD tests I'll rebase this PR split into two commits - context header support then perf event array impl. |
ebpf_map_create(&map_name, &map_definition, (uintptr_t)ebpf_handle_invalid, &local_map) == EBPF_SUCCESS); | ||
map.reset(local_map); | ||
} | ||
// TODO (before merge): implementme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment to track porting the necessary platform unit tests over to execution context unit tests using the new ebpf_maps based definition (vs. the old ebpf_perf_event_array based implementation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From lines 1608 and 1610, it looks like this PR should still be in Draft?
case BIND_OPERATION_BIND: | ||
if (ctx->app_id_end > ctx->app_id_start) { | ||
(void)bpf_perf_event_output( | ||
ctx, &process_map, flags, ctx->app_id_start, ctx->app_id_end - ctx->app_id_start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bind context already has app_id_start and app_id_end as data begin and data end, which bpf_perf_event_output should prepend (assuming thats what the flags does). Why not add another random event payload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now passes the ctxlen flag to copy the app id and outputs a random value for the data argument.
*/ | ||
_Must_inspect_result_ ebpf_result_t | ||
ebpf_ring_buffer_reserve_exclusive( | ||
_Inout_ ebpf_ring_buffer_t* ring_buffer, _Outptr_result_bytebuffer_(length) uint8_t** data, size_t length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these rings "pinned" to a specific core?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There is one ring per core and the whole reserve_exclusive+memcpy+submit is done at dispatch to the ring for the current cpu -- so there can only be one producer writing to a single ring at a time.
I followed the linux flag semantics - below dispatch you must specify the EBPF_MAP_FLAG_CURRENT_CPU
flag, and at dispatch you can either use auto-cpu or manually specify the cpu you are currently running on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest updating the routine description to reflect that - what the word "exclusive" means.
ebpf_list_entry_t contexts; | ||
} ebpf_core_map_async_contexts_t; | ||
|
||
typedef struct _ebpf_core_ring_buffer_map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider:
typedef struct _ebpf_core_ring_buffer_map | |
typedef struct _ebpf_core_perf_ring | |
{ | |
ebpf_ring_buffer_t ring; | |
ebpf_core_map_async_contexts_t async; | |
} ebpf_core_ring_t; | |
typedef struct _ebpf_core_ring_buffer_map | |
{ | |
ebpf_core_map_t core_map; | |
ebpf_core_ring_t ring; | |
} ebpf_core_ring_buffer_map_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did try that out, but it doesn't currently save any lines of code, just adds a few lines/makes lines longer.
With some higher level restructuring it could maybe save a few lines, but the common functionality between perf array and ringbuf already uses shared code so the extra nesting doesn't simplify things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its just that it will look more symmetrical. I suggest you create an issue with these nit-pickish comments to be resolved a later time.
* @retval EBPF_NO_MEMORY Out of memory. | ||
*/ | ||
_Must_inspect_result_ ebpf_result_t | ||
ebpf_perf_event_array_map_write( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can user mode write into perf_event_array map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support this yet via the libbpf interface (for ringbuf or perf array).
We already had the core functionality though for ringbuf, so I added the corresponding code for perf array.
|
||
Exit: | ||
EBPF_OBJECT_RELEASE_REFERENCE((ebpf_core_object_t*)map); | ||
if (reference_taken) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: EBPF_OBJECT_RELEASE_REFERENCE
already handles nullptr, do we need this check in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I saw that some of the existing code used that pattern -- so I added it for the other updated ioctls. If there wasn't a perf/analyze reason for that though I can remove the reference_taken
bool and checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recommended pattern is to not add if checks here, to reduce the number of code paths. I.e., reduce code complexity and improve code coverage.
static _Requires_lock_held_(ring_buffer_map->lock) void _ebpf_ring_buffer_map_signal_async_query_complete( | ||
_Inout_ ebpf_core_ring_buffer_map_t* ring_buffer_map) | ||
typedef void | ||
map_async_query_complete_t(_In_ const ebpf_core_map_t* map, uint64_t index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Try again to update SAL to cover locking.
_ebpf_ring_buffer_map_cancel_async_query(_In_ _Frees_ptr_ void* cancel_context) | ||
{ | ||
EBPF_LOG_ENTRY(); | ||
ebpf_core_map_async_query_context_t* context = (ebpf_core_map_async_query_context_t*)cancel_context; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the cancel context is cast to the same type ebpf_core_map_async_query_context_t*
both here and in line 2273, why not change the parameter type to ebpf_core_map_async_query_context_t*
?
@@ -8,7 +8,17 @@ | |||
|
|||
CXPLAT_EXTERN_C_BEGIN | |||
|
|||
typedef struct _ebpf_ring_buffer ebpf_ring_buffer_t; | |||
typedef struct _ebpf_ring_buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non blocking: consider creating a issue to make this struct opaque with "getter" functions. You can resolve this comment.
ebpf_perf_event_array_map_output( | ||
_In_ void* ctx, _Inout_ ebpf_map_t* map, uint64_t flags, _In_reads_bytes_(length) uint8_t* data, size_t length) | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider ebpf_perf_event_array_map_output
calling ebpf_perf_event_array_map_output_simple
.
/** | ||
* @brief Write out a variable sized record to the perf event array. | ||
* | ||
* Writes a simple record to the ring for the current CPU. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider not using the word simple, and if you must, the routing description must explain what this means.
*/ | ||
EBPF_INLINE_HINT | ||
_Must_inspect_result_ ebpf_result_t | ||
ebpf_perf_event_array_map_output( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ebpf_perf_event_array_map_output( | |
ebpf_perf_event_array_map_output_with_capture( |
* @retval EBPF_OUT_OF_SPACE Unable to output to ring buffer due to inadequate space. | ||
*/ | ||
_Must_inspect_result_ ebpf_result_t | ||
ebpf_perf_event_array_map_output_simple( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ebpf_perf_event_array_map_output_simple( | |
ebpf_perf_event_array_map_output( |
docs/eBpfExtensions.md
Outdated
passed to the eBPF program. To support this feature, the extension can use the macro `EBPF_CONTEXT_HEADER` to include | ||
the context header at the start of the program context structure. Even when the context header is added, the pointer | ||
passed to the eBPF program is after the context header. | ||
This is required for all extensions to support so the core can store runtime state needed by helpers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's required, why do we need a flag? When would the flag be false?
@@ -1,12 +1,12 @@ | |||
# Perf Event Array | |||
|
|||
This document proposes support for a simplified version of the linux bpf map type BPF_MAP_TYPE_PERF_EVENT_ARRAY. | |||
This document describes the in-progress support for the bpf map type BPF_MAP_TYPE_PERF_EVENT_ARRAY. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's only "in progress" then can you reference an issue number here that tracks the completion of it?
Want to make sure this line gets updated by the time it's complete.
#define EBPF_MAP_FLAG_INDEX_MASK 0xffffffffULL | ||
#define EBPF_MAP_FLAG_INDEX_SHIFT 0 | ||
#define EBPF_MAP_FLAG_CURRENT_CPU EBPF_MAP_FLAG_INDEX_MASK | ||
/* BPF_FUNC_perf_event_output flags for program types with data pointer in context */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* BPF_FUNC_perf_event_output flags for program types with data pointer in context */ | |
/* BPF_FUNC_perf_event_output flags for program types with data pointer in context. */ |
nit: For consistency with line 417
|
||
Exit: | ||
EBPF_OBJECT_RELEASE_REFERENCE((ebpf_core_object_t*)map); | ||
if (reference_taken) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recommended pattern is to not add if checks here, to reduce the number of code paths. I.e., reduce code complexity and improve code coverage.
* | ||
* Flags is used to select the CPU to write to and the context data to copy. | ||
* - EBPF_MAP_FLAG_INDEX_MASK - Mask for specifying cpu. | ||
* - EBPF_MAP_FLAG_CURRENT_CPU - Select current cpu. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* - EBPF_MAP_FLAG_CURRENT_CPU - Select current cpu. | |
* - EBPF_MAP_FLAG_CURRENT_CPU - Select current cpu. |
* Flags is used to select the CPU to write to and the context data to copy. | ||
* - EBPF_MAP_FLAG_INDEX_MASK - Mask for specifying cpu. | ||
* - EBPF_MAP_FLAG_CURRENT_CPU - Select current cpu. | ||
* - EBPF_MAP_FLAG_CTXLEN_MASK - Mask for specifying context length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is "LEN" abbreviated when (say) "length" is spelled out as an argument?
The coding style guidelines in this project say to avoid abbreviations (unless it's for parity with Linux or to match abbreviations in the language or a library being called).
/** | ||
* @brief Get the data start and end pointers from the program context. | ||
* | ||
* @note Extension must support context headers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @note Extension must support context headers. | |
* @note Extension must support context headers. |
Make doxygen line up. Also in line 495 and perhaps elsewhere.
ebpf_map_create(&map_name, &map_definition, (uintptr_t)ebpf_handle_invalid, &local_map) == EBPF_SUCCESS); | ||
map.reset(local_map); | ||
} | ||
// TODO (before merge): implementme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From lines 1608 and 1610, it looks like this PR should still be in Draft?
Description
Implements the kernel side of perf event array maps.
This includes the core and kernel changes for #658, with part of the user-side API skeleton.
Testing
Includes platform and execution context tests.
Documentation
Documented in code and PerfEventArray.md.
Installation
N/A