Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf event array map kernel-side implementation #4144

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

mikeagun
Copy link
Contributor

@mikeagun mikeagun commented Jan 16, 2025

Description

Implements the kernel side of perf event array maps.

This includes the core and kernel changes for #658, with part of the user-side API skeleton.

Testing

Includes platform and execution context tests.

Documentation

Documented in code and PerfEventArray.md.

Installation

N/A

@mikeagun mikeagun marked this pull request as ready for review February 4, 2025 19:58
@mikeagun mikeagun changed the title DRAFT Perf event array map implementation. Perf event array map kernel-side implementation. Feb 4, 2025
@mikeagun
Copy link
Contributor Author

Moving to draft until the ring buffer refactoring is merged in #4204

@mikeagun mikeagun marked this pull request as draft February 12, 2025 21:54
@mikeagun
Copy link
Contributor Author

This is currently blocked on #4204, as it uses the new ring buffer for wait-free reserve (at dispatch) in the per-cpu rings. Most of the refactoring to use the new ring buffer has been done, but after the new ring buffer implementation is merged I'll update+publish this again.

@mikeagun
Copy link
Contributor Author

mikeagun commented Mar 7, 2025

I split the context header support requirement out into #4267. After that passes all CICD tests I'll rebase this PR split into two commits - context header support then perf event array impl.

@mikeagun mikeagun changed the title Perf event array map kernel-side implementation. Perf event array map kernel-side implementation Mar 7, 2025
ebpf_map_create(&map_name, &map_definition, (uintptr_t)ebpf_handle_invalid, &local_map) == EBPF_SUCCESS);
map.reset(local_map);
}
// TODO (before merge): implementme
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment to track porting the necessary platform unit tests over to execution context unit tests using the new ebpf_maps based definition (vs. the old ebpf_perf_event_array based implementation).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From lines 1608 and 1610, it looks like this PR should still be in Draft?

case BIND_OPERATION_BIND:
if (ctx->app_id_end > ctx->app_id_start) {
(void)bpf_perf_event_output(
ctx, &process_map, flags, ctx->app_id_start, ctx->app_id_end - ctx->app_id_start);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bind context already has app_id_start and app_id_end as data begin and data end, which bpf_perf_event_output should prepend (assuming thats what the flags does). Why not add another random event payload?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now passes the ctxlen flag to copy the app id and outputs a random value for the data argument.

*/
_Must_inspect_result_ ebpf_result_t
ebpf_ring_buffer_reserve_exclusive(
_Inout_ ebpf_ring_buffer_t* ring_buffer, _Outptr_result_bytebuffer_(length) uint8_t** data, size_t length);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these rings "pinned" to a specific core?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

There is one ring per core and the whole reserve_exclusive+memcpy+submit is done at dispatch to the ring for the current cpu -- so there can only be one producer writing to a single ring at a time.

I followed the linux flag semantics - below dispatch you must specify the EBPF_MAP_FLAG_CURRENT_CPU flag, and at dispatch you can either use auto-cpu or manually specify the cpu you are currently running on.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest updating the routine description to reflect that - what the word "exclusive" means.

ebpf_list_entry_t contexts;
} ebpf_core_map_async_contexts_t;

typedef struct _ebpf_core_ring_buffer_map
Copy link
Collaborator

@shankarseal shankarseal Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider:

Suggested change
typedef struct _ebpf_core_ring_buffer_map
typedef struct _ebpf_core_perf_ring
{
ebpf_ring_buffer_t ring;
ebpf_core_map_async_contexts_t async;
} ebpf_core_ring_t;
typedef struct _ebpf_core_ring_buffer_map
{
ebpf_core_map_t core_map;
ebpf_core_ring_t ring;
} ebpf_core_ring_buffer_map_t;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try that out, but it doesn't currently save any lines of code, just adds a few lines/makes lines longer.

With some higher level restructuring it could maybe save a few lines, but the common functionality between perf array and ringbuf already uses shared code so the extra nesting doesn't simplify things.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its just that it will look more symmetrical. I suggest you create an issue with these nit-pickish comments to be resolved a later time.

* @retval EBPF_NO_MEMORY Out of memory.
*/
_Must_inspect_result_ ebpf_result_t
ebpf_perf_event_array_map_write(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can user mode write into perf_event_array map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't support this yet via the libbpf interface (for ringbuf or perf array).

We already had the core functionality though for ringbuf, so I added the corresponding code for perf array.


Exit:
EBPF_OBJECT_RELEASE_REFERENCE((ebpf_core_object_t*)map);
if (reference_taken) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: EBPF_OBJECT_RELEASE_REFERENCE already handles nullptr, do we need this check in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I saw that some of the existing code used that pattern -- so I added it for the other updated ioctls. If there wasn't a perf/analyze reason for that though I can remove the reference_taken bool and checks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recommended pattern is to not add if checks here, to reduce the number of code paths. I.e., reduce code complexity and improve code coverage.

static _Requires_lock_held_(ring_buffer_map->lock) void _ebpf_ring_buffer_map_signal_async_query_complete(
_Inout_ ebpf_core_ring_buffer_map_t* ring_buffer_map)
typedef void
map_async_query_complete_t(_In_ const ebpf_core_map_t* map, uint64_t index);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Try again to update SAL to cover locking.

_ebpf_ring_buffer_map_cancel_async_query(_In_ _Frees_ptr_ void* cancel_context)
{
EBPF_LOG_ENTRY();
ebpf_core_map_async_query_context_t* context = (ebpf_core_map_async_query_context_t*)cancel_context;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the cancel context is cast to the same type ebpf_core_map_async_query_context_t* both here and in line 2273, why not change the parameter type to ebpf_core_map_async_query_context_t* ?

@@ -8,7 +8,17 @@

CXPLAT_EXTERN_C_BEGIN

typedef struct _ebpf_ring_buffer ebpf_ring_buffer_t;
typedef struct _ebpf_ring_buffer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non blocking: consider creating a issue to make this struct opaque with "getter" functions. You can resolve this comment.

ebpf_perf_event_array_map_output(
_In_ void* ctx, _Inout_ ebpf_map_t* map, uint64_t flags, _In_reads_bytes_(length) uint8_t* data, size_t length)
{

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider ebpf_perf_event_array_map_output calling ebpf_perf_event_array_map_output_simple.

/**
* @brief Write out a variable sized record to the perf event array.
*
* Writes a simple record to the ring for the current CPU.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider not using the word simple, and if you must, the routing description must explain what this means.

*/
EBPF_INLINE_HINT
_Must_inspect_result_ ebpf_result_t
ebpf_perf_event_array_map_output(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ebpf_perf_event_array_map_output(
ebpf_perf_event_array_map_output_with_capture(

* @retval EBPF_OUT_OF_SPACE Unable to output to ring buffer due to inadequate space.
*/
_Must_inspect_result_ ebpf_result_t
ebpf_perf_event_array_map_output_simple(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ebpf_perf_event_array_map_output_simple(
ebpf_perf_event_array_map_output(

passed to the eBPF program. To support this feature, the extension can use the macro `EBPF_CONTEXT_HEADER` to include
the context header at the start of the program context structure. Even when the context header is added, the pointer
passed to the eBPF program is after the context header.
This is required for all extensions to support so the core can store runtime state needed by helpers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's required, why do we need a flag? When would the flag be false?

@@ -1,12 +1,12 @@
# Perf Event Array

This document proposes support for a simplified version of the linux bpf map type BPF_MAP_TYPE_PERF_EVENT_ARRAY.
This document describes the in-progress support for the bpf map type BPF_MAP_TYPE_PERF_EVENT_ARRAY.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's only "in progress" then can you reference an issue number here that tracks the completion of it?
Want to make sure this line gets updated by the time it's complete.

#define EBPF_MAP_FLAG_INDEX_MASK 0xffffffffULL
#define EBPF_MAP_FLAG_INDEX_SHIFT 0
#define EBPF_MAP_FLAG_CURRENT_CPU EBPF_MAP_FLAG_INDEX_MASK
/* BPF_FUNC_perf_event_output flags for program types with data pointer in context */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* BPF_FUNC_perf_event_output flags for program types with data pointer in context */
/* BPF_FUNC_perf_event_output flags for program types with data pointer in context. */

nit: For consistency with line 417


Exit:
EBPF_OBJECT_RELEASE_REFERENCE((ebpf_core_object_t*)map);
if (reference_taken) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recommended pattern is to not add if checks here, to reduce the number of code paths. I.e., reduce code complexity and improve code coverage.

*
* Flags is used to select the CPU to write to and the context data to copy.
* - EBPF_MAP_FLAG_INDEX_MASK - Mask for specifying cpu.
* - EBPF_MAP_FLAG_CURRENT_CPU - Select current cpu.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* - EBPF_MAP_FLAG_CURRENT_CPU - Select current cpu.
* - EBPF_MAP_FLAG_CURRENT_CPU - Select current cpu.

* Flags is used to select the CPU to write to and the context data to copy.
* - EBPF_MAP_FLAG_INDEX_MASK - Mask for specifying cpu.
* - EBPF_MAP_FLAG_CURRENT_CPU - Select current cpu.
* - EBPF_MAP_FLAG_CTXLEN_MASK - Mask for specifying context length.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is "LEN" abbreviated when (say) "length" is spelled out as an argument?
The coding style guidelines in this project say to avoid abbreviations (unless it's for parity with Linux or to match abbreviations in the language or a library being called).

/**
* @brief Get the data start and end pointers from the program context.
*
* @note Extension must support context headers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @note Extension must support context headers.
* @note Extension must support context headers.

Make doxygen line up. Also in line 495 and perhaps elsewhere.

ebpf_map_create(&map_name, &map_definition, (uintptr_t)ebpf_handle_invalid, &local_map) == EBPF_SUCCESS);
map.reset(local_map);
}
// TODO (before merge): implementme
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From lines 1608 and 1610, it looks like this PR should still be in Draft?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

eBPF for Windows should support perf_event_array
5 participants