Skip to content

Conversation

sundb
Copy link
Owner

@sundb sundb commented Sep 30, 2025

No description provided.

Copy link

codecov bot commented Sep 30, 2025

Codecov Report

❌ Patch coverage is 97.35577% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.22%. Comparing base (791a5a7) to head (56a71bc).

Files with missing lines Patch % Lines
src/networking.c 97.61% 6 Missing ⚠️
src/cluster.c 88.88% 4 Missing ⚠️
src/acl.c 93.75% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #413      +/-   ##
============================================
+ Coverage     74.83%   75.22%   +0.38%     
============================================
  Files           130      130              
  Lines         73937    74167     +230     
============================================
+ Hits          55332    55793     +461     
+ Misses        18605    18374     -231     
Files with missing lines Coverage Δ
src/aof.c 82.09% <100.00%> (+0.10%) ⬆️
src/blocked.c 98.79% <100.00%> (ø)
src/cluster.h 100.00% <ø> (ø)
src/config.c 85.42% <ø> (ø)
src/db.c 91.43% <100.00%> (+0.70%) ⬆️
src/iothread.c 87.87% <100.00%> (+0.03%) ⬆️
src/memory_prefetch.c 99.34% <100.00%> (+2.65%) ⬆️
src/module.c 82.63% <100.00%> (+0.15%) ⬆️
src/multi.c 95.76% <100.00%> (ø)
src/replication.c 87.15% <100.00%> (+0.09%) ⬆️
... and 7 more

... and 13 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can provide a short list of things that exists in the ROF code that you didn't take here and why. it'll be easier to review and discuss.
e.g. i see you did take some portions in multi.c but not all, and it seems you didn't take the ones in cluster.c

src/server.h Outdated
#define CLIENT_REEXECUTING_COMMAND (1ULL<<50) /* The client is re-executing the command. */
#define CLIENT_REPL_RDB_CHANNEL (1ULL<<51) /* Client which is used for rdb delivery as part of rdb channel replication */
#define CLIENT_INTERNAL (1ULL<<52) /* Internal client connection */
#define CLIENT_IN_PREFETCH (1ULL<<53) /* The client is in the prefetching batch. */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we already "burn" the term "prefetch" into our code base for this (cpu cache warmup)? maybe we can rename to avoid teminology collision with ROF

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh, i see we did (memory_prefetch.c). so maybe we can just try to avoid using the term "prefetch" without "memory" 🤷

anyway, i'm not sure what this flag is used for. and if it's about "prefetch" or "preprocess"..

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag is only used for IO-threads.
When the client moves from the io thread to the main thread (memory prefetch is not performed), The main thread will one by one to the client for the memory prefetch (in processClientsFromIOThread()), but when the pipeline is larger than the lookahead, when querybuf still has data we will enter processInputBuffer() again.
because there may be multiple clients queuing up for prefetch at the top level, we need to avoid starting a new prefetch in it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we already "burn" the term "prefetch" into our code base for this (cpu cache warmup)? maybe we can rename to avoid teminology collision with ROF

we can change it to memory_prefetch

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so consider renaming or commenting on the purpose or use of that flag.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with e0ff33a

src/networking.c Outdated
__thread int thread_reusable_qb_used = 0; /* Avoid multiple clients using reusable query
* buffer due to nested command execution. */

/* COMMAND_QUEUE_MIN_CAPACITY no longer needed with linked list implementation */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with 4de9dd9

src/networking.c Outdated

/* Search for end of line */
newline = strchr(c->querybuf+c->qb_pos,'\n');
newline = memchr(c->querybuf+c->qb_pos,'\n',sdslen(c->querybuf) - c->qb_pos);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that a bug in ROF or any other version?
we do know sds is always null terminated.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from VK, I will revert them.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe they had a reason.. i'm just wondering what it was

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change was from valkey-io/valkey#1485.

Changed parsing code to use memchr instead of strchr:
During parsing command, ASAN got stuck for unknown reason when called to strchr to look for the next \r
Adding assert for null-terminated querybuf didn't resolve the issue.
Switched to memchr as it's more secure and resolves the issue

feel like it is caused by some race issues, and we don't need it.

src/networking.c Outdated
if (c->multibulklen == 0) {
/* The client should have been reset */
serverAssertWithInfo(c,NULL,c->argc == 0);
/* TODO: The client should have been reset */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that a TODO now?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment is outdated, i'll update it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in 4de9dd9

* 2) When the requested size is less than the current size, because
* we always allocate argv gradually with a maximum size of 1024,
* Therefore, if argv_len exceeds this limit, we always reallocate. */
if (unlikely(c->multibulklen > c->argv_len || c->argv_len > 1024)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if we're losing some efficiency for clients without pipeline here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll verify it. I think prefetch fills this gap.

src/networking.c Outdated
}
}

void parseInputBuffer(client *c) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you extracted a portion of processInputBuffer to a function.
it'll be harder to review, and also to merge to ROF.
maybe you can provide a list of bullets explaining the differences?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just in order to avoid this method is too big, I deleted it in 742cb79, consistency with ROF.

src/networking.c Outdated
Comment on lines 2949 to 2950
/* Parse up to lookahead commands */
while (c->pending_cmds.ready_len < lookahead && c->querybuf && c->qb_pos < sdslen(c->querybuf)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a key difference from ROF, right?
you always parse a full batch and then execute a full batch, whereas in ROF we're greedy and parse more commands on every one we execute.
for ease of merges, maybe it's a good idea to refactor the code in a way that it can serve both purposes.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will do it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added variable parse_more to determine whether it's needed to parse more commands.
in ROF we can set it to 1 to parse more commands all the time.

        /* Determine if we need to parse more commands from the query buffer.
         * Only parse when there are no ready commands waiting to be processed. */
        const int parse_more = !c->pending_cmds.ready_len;

Comment on lines +1116 to +1117
pendingCommand mc;
pendingCommand *mcp = &mc;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see that we're using the pre-calculated slot number.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with 8cfae3f

src/server.c Outdated
Comment on lines 4074 to 4076
if (server.cluster_enabled) {
getKeysResult result = (getKeysResult)GETKEYS_RESULT_INIT;
int numkeys = getKeysFromCommand(pcmd->cmd, pcmd->argv, pcmd->argc, &result);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is conditional here, it means ACL can't re-use it and if both cluster mode and ACL are in used, this part is done twice.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is for the command filter. I will try to optimize it at the end and ensure that the command filter is available.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my thinking is that this command filter feature isn't used anywhere AFAIK, and if we have a possibility for optimizations, we can consider dropping it, or making one of these aspects break in the presence of the other.

#define GETKEYS_RESULT_INIT { 0, MAX_KEYS_BUFFER, {{0}}, NULL }

/* Parser state and parse result of a command from a client's input buffer. */
struct pendingCommand {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd like to understand why we don't have to store the client* here, unlike OSS.
probably it's because the prefetch is synchronous and in ROF it's async.
right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's the same reason why we don't need PENDING_CMD_FLAG_MULTI (in ROF it was used only for statistics tracking, right?)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd like to understand why we don't have to store the client* here, unlike OSS. probably it's because the prefetch is synchronous and in ROF it's async. right?

yes, these two are all for BigRedis, no needed for OSS.

src/blocked.c Outdated
Comment on lines 204 to 206
/* Free the current pending command to prevent it from being executed again
* when the client is unblocked from shutdown state. */
freeClientPendingCommands(c, 1);
Copy link
Owner Author

@sundb sundb Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra please take a look this fix.
when we unblock a shutdown blocked client, before this PR, we would re-enter the processInputBuffer to obtain the next command. But now, because the first command in the Pending command list still exists, we forgot to remove the first command in the pending command list, which will result in the shutdownCommand being executed again.
I'm not sure if this fix is enough. If there are other types of blockages that don't require reprocess, we all need to manually call freeClientPendingCommand().

/* This function will execute any fully parsed commands pending on
 * the client. Returns C_ERR if the client is no longer valid after executing
 * the command, and C_OK for all other cases. */
int processPendingCommandAndInputBuffer(client *c) {
    /* Notice, this code is also called from 'processUnblockedClients'.
     * But in case of a module blocked client (see RM_Call 'K' flag) we do not reach this code path.
     * So whenever we change the code here we need to consider if we need this change on module
     * blocked client as well */
    if (c->flags & CLIENT_PENDING_COMMAND) {
        c->flags &= ~CLIENT_PENDING_COMMAND;
        if (processCommandAndResetClient(c) == C_ERR) {
            return C_ERR;
        }
    }

    /* Now process client if it has more data in it's buffer.
     *
     * Note: when a master client steps into this function,
     * it can always satisfy this condition, because its querybuf
     * contains data not applied. */
    if ((c->querybuf && sdslen(c->querybuf) > 0) || c->pending_cmds.ready_len > 0) {
        return processInputBuffer(c);
    }
    return C_OK;
}

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make noopt REDIS_CFLAGS='-Werror -DLOG_REQ_RES'
./runtest --log-req-res --no-latency --dont-clean --force-resp3 --tags -slow --verbose --dump-logs --single integration/shutdown --only "Shutting down master waits for replica then fails"

then we can see two -ERR Errors trying to SHUTDOWN. Check logs. in the stdout.reqres.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other types of blockages that don't require reprocess

sorry for my lack of focus, so isn't this bug exactly because this else is skipping the actions of the if which call
prepareForNextCommand, who calls resetClientInternal that handles that?
i.e. we don't have to worry about other types of blocked commands?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. we don't have to worry about other types of blocked commands?

you're right, i'm wrong.
if a client dones't have CLIENT_PENDING_COMMAND flag, we should reset the client and remove the first command from the pending command list.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue also exists in unstable, so i made a PR to fix it.
redis#14420

src/blocked.c Outdated
* which calls reqresAppendResponse) */
reqresAppendResponse(c);
resetClient(c);
prepareForNextCommand(c);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra follow #413 (comment)
The reason why I don't add clusterSlotStatsAddNetworkBytesInForUserClient() into prepareForNextCommand() is that we don't call clusterSlotStatsAddNetworkBytesInForUserClient() in the original code.
The previous commands of this client have already been processed once through commandProcessed().
Would it be duplicated if this statistic were also calculated here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be sure i understand.
so you're saying there's a bug in the per-slot metrics in the ROF branch, right? specifically for blocked commands.
we can mirror this change there ASAP, or wait till this one is merged and we'll handle the conflict (ROF doesn't currently run in cluster enabled).

maybe it's a good idea to add an argument to prepareForNextCommand and let it do that from there conditionally?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with ed08a67

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the reason i created this prepareForNextCommand was exactly that, that i was afraid that if i'll add code on all the random places that do this (resetClient, reqresAppendResponse), and some day the upstream will do some change in all these places, i might not get a merge conflict.
so i moved these into a dedicated "prepare" function, unifying all these places, which reduces the chances something will be overlooked by a merge.

i.e. it was a change aimed for being more explicit, and also hoping conflicts 😄
in that regard, the last commit you added really serves that purpose.

size_t argv_len_sum; /* Sum of lengths of objects in argv list. */
unsigned long long input_bytes;
struct redisCommand *cmd;
getKeysResult keys_result;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see you added it, but don't see it being set or used

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm still trying to add it and not introducing the break change of the command filter.

src/server.h Outdated
/* pendingCommand flags */
enum {
PENDING_CMD_FLAG_INCOMPLETE = 1 << 0, /* Command parsing is incomplete, still waiting for more data */
PENDING_CMD_KEYRESULT_INVALID = 1 << 1, /* Key result is invalid, needs to be recomputed */
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra i added a new flag to indicate that this key result is invalid.
not sure if there is a better way to do it.

src/cluster.c Outdated
Comment on lines 1182 to 1183
if (pcmd->flags & PENDING_CMD_KEYRESULT_INVALID)
getKeysFromCommand(mcmd,margv,margc,&result);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want to do the same in ACL?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with 0250415 (#413)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still a few other places to obtain getKeysResult, but I think those are not hot paths and we can leave it as it is.

Copy link

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the superficial comments. i assume you know this code better than me at this point, so you can look into the concerns and disprove them without me taking a closer look.

src/acl.c Outdated
aclKeyResultCache cache;
initACLKeyResultCache(&cache);
if (key_result) {
cache.keys = *key_result;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't recall how this cache works, but don't we risk mixing that cache with the one from pcmd and causing a mess? please look into it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's to avoid repeatedly obtaining getKeysResult in the ACLSelectorCheckCmd(), now, if we can get the cache from pcm, we can init it directly outside.

* causes the failure, either 0 if the command itself fails or the idx of the key/channel
* that causes the failure */
int ACLCheckAllUserCommandPerm(user *u, struct redisCommand *cmd, robj **argv, int argc, int *idxptr) {
int ACLCheckAllUserCommandPerm(user *u, struct redisCommand *cmd, robj **argv, int argc, getKeysResult *key_result, int *idxptr) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have two concerns that i'd like you to look into:

  1. maybe even if the command pre-processing it disabled, we can somehow let ACL, and Cluster share the same getkeys result, i.e. by using lazy creation of the result. i.e. the first one that needs it, gets it, and stores it so that the second one can use it. that is assuming the command pre-processing isn't always enabled.
  2. i don't recall how far we took this effort in ROF, but i understand we did take a different path (e.g. not passing it as an explicit argument to all these functions, and using a different way / flag to detect if it was computed or not). i don't mind merging your version when we get to it, replacing what we have in ROF (it'll be some extra work, but it'll be a one time effort). so i just wanna be sure what the approach you're taking here is also suitable there, and had only benefits, and no drawbacks.

@sundb sundb requested a review from Copilot October 10, 2025 05:56
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements lookahead pre-fetching functionality to optimize command processing pipeline by parsing and pre-processing multiple commands ahead of time, improving performance through better memory prefetching and reduced processing overhead.

  • Introduces a configurable lookahead parameter to control how many commands are parsed in advance
  • Refactors command parsing to use a pending command queue system instead of immediate processing
  • Updates multi-exec transaction handling to work with the new pending command architecture

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/server.h Defines new data structures for pending commands and lookahead configuration
src/networking.c Implements the core pending command queue logic and lookahead parsing
src/server.c Adds command preprocessing and lookahead initialization
src/multi.c Updates transaction handling to work with pending commands
src/cluster.c Modifies cluster routing to use cached key results from preprocessing
src/acl.c Updates ACL checking to use cached key results and atomic flag operations
tests/unit/memefficiency.tcl Disables lookahead for defrag tests to avoid interference
src/config.c Adds lookahead configuration parameter
Other files Various updates to support the new pending command system

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

sundb and others added 2 commits October 10, 2025 14:02
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
@sundb sundb requested a review from Copilot October 10, 2025 06:04
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@sundb sundb requested a review from Copilot October 10, 2025 06:07
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants