chore: flush after serializing big string #5401
Conversation
src/server/rdb_save.cc
Outdated
// We flush here because if the next element in the bucket we are serializing is a container,
// it will first serialize its first entry and only then flush the internal buffer, even if
// it has already crossed the limit.
FlushIfNeeded(FlushState::kFlushEndEntry);
@adiholden This is a general issue, not only limited to strings, right? The same applies to other data types that are not chunked (treated as big values). For example, we don't flush after we serialize a json. What if all the elements in the bucket are json data types? Shall I take care of that as well?
P.S. I could also add a test, but 🤷
Why here and not in RdbSerializer::SaveEntry, at the end of the function? That way you can be sure it actually runs after you finished saving the entry and not in the middle, in case there is some other serialization after SaveValue, which I see we don't have today but might have in the future. By moving it to the end of SaveEntry it will also apply to all data types.
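For illustration, a minimal self-contained sketch of the idea (hypothetical, simplified stand-ins rather than the actual Dragonfly classes): placing the check at the very end of the per-entry path makes it independent of the value type and of anything added after SaveValue later.

```cpp
#include <string>
#include <system_error>

// Sketch only: simplified stand-ins for the real serializer pieces.
enum class FlushState { kFlushEndEntry };

struct EntrySerializerSketch {
  std::error_code SaveKey(const std::string&) { return {}; }
  std::error_code SaveValue(const std::string&) { return {}; }  // string, hash, json, ...
  void FlushIfNeeded(FlushState) { /* flush the internal buffer if it crossed the limit */ }

  std::error_code SaveEntry(const std::string& key, const std::string& value) {
    if (auto ec = SaveKey(key); ec) return ec;
    if (auto ec = SaveValue(value); ec) return ec;
    // At the very end of SaveEntry the check runs once per entry, after the whole
    // entry has been written, so it covers every data type and any serialization
    // steps that might be added after SaveValue in the future.
    FlushIfNeeded(FlushState::kFlushEndEntry);
    return {};
  }
};
```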
P.S. Add a test.
@@ -64,7 +64,7 @@ size_t SliceSnapshot::GetThreadLocalMemoryUsage() {
 }

 bool SliceSnapshot::IsSnaphotInProgress() {
-  return tl_slice_snapshots.size() > 0;
+  return !tl_slice_snapshots.empty();
I could not resist; I will refactor all those cases eventually.
Signed-off-by: kostas <[email protected]>
tests/dragonfly/replication_test.py
Outdated
await wait_for_replicas_state(c_replica)

# make sure capacity hasn't changed after seeding
new_capacity = await get_memory(c_master, "prime_capacity")
Let's move this right after line 3419.
tests/dragonfly/replication_test.py
Outdated
@dfly_args({"proactor_threads": 1})
async def test_big_strings(df_factory):
    master = df_factory.create(proactor_threads=1, serialization_max_chunk_size=1)
    replica = df_factory.create(proactor_threads=1, serialization_max_chunk_size=1)
nit: serialization_max_chunk_size has no effect on the replica.
tests/dragonfly/replication_test.py
Outdated
# in between. This is not a great test for this, but we are limited because we can't fill
# a bucket with big strings, as the memory in the gh runner is fairly limited. We at
# least check for correctness and *some* improvement in the memory footprint.
assert peak_memory < used_memory + five_mb
I feel this is not very stable. I think that if we want to check that we actually flushed the data, we can add a statistic for serializer peak bytes (updated every time we call SerializerBase::FlushToSink), which would be printed at the end of replication as part of the statistics. You could then read it in this test to make sure the buffered bytes are no more than a single entry's size.
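For what it's worth, a self-contained sketch of what such a statistic could look like (names are hypothetical; in the real code the counter would presumably live in SerializerBase and be updated inside FlushToSink, then surfaced with the replication stats):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: every flush records how many bytes were buffered at that
// moment, and the maximum is exposed as a statistic the test can assert on.
class SerializerStatsSketch {
 public:
  void Write(const char* data, size_t len) { buf_.insert(buf_.end(), data, data + len); }

  void FlushToSink(/* io::Sink* sink */) {
    peak_flush_bytes_ = std::max(peak_flush_bytes_, buf_.size());
    // ... the real code would write buf_ to the sink here ...
    buf_.clear();
  }

  // The test could read this to check the buffer never held much more than one entry.
  size_t peak_flush_bytes() const { return peak_flush_bytes_; }

 private:
  std::vector<char> buf_;
  size_t peak_flush_bytes_ = 0;
};
```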
Entries of a bucket serialized during a snapshot might contain large strings that cross the serialization threshold imposed by FlushIfNeeded(). If the next element in the bucket is a big value (say, a hash table), we would first serialize the first element of that container and only then flush, which can lead to memory pressure. To avoid that, we call FlushIfNeeded() after we serialize each string.

Resolves #5394
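To make the ordering concrete, here is a toy illustration (not the real serializer; the 1 MiB threshold and the sizes are arbitrary) of how the buffer can momentarily hold the big string plus the next container's first element when the check only happens later:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Toy buffer used only to illustrate the ordering described above.
struct ToyBuffer {
  std::vector<char> bytes;
  size_t peak = 0;

  void Append(const std::string& s) {
    bytes.insert(bytes.end(), s.begin(), s.end());
    peak = std::max(peak, bytes.size());
  }

  void FlushIfNeeded() {
    if (bytes.size() >= (1 << 20))  // pretend 1 MiB is the serialization limit
      bytes.clear();                // stands in for writing the buffer to the sink
  }
};

int main() {
  std::string big_string(4 << 20, 'x');      // a 4 MiB string, already over the limit
  std::string container_elem(1 << 20, 'y');  // first element of the next (container) entry

  // Without a check after the string: the container's first element piles on top
  // of the big string before any flush happens.
  ToyBuffer before;
  before.Append(big_string);
  before.Append(container_elem);
  before.FlushIfNeeded();
  std::printf("peak without end-of-entry flush: %zu MiB\n", before.peak >> 20);  // 5 MiB

  // With the fix: the check right after the string flushes it before the next entry.
  ToyBuffer after;
  after.Append(big_string);
  after.FlushIfNeeded();
  after.Append(container_elem);
  after.FlushIfNeeded();
  std::printf("peak with end-of-entry flush: %zu MiB\n", after.peak >> 20);  // 4 MiB
}
```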