-
Notifications
You must be signed in to change notification settings - Fork 10
feat: StringValueFormat & BaseMetaFilter #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Warning Rate limit exceeded@guozhihao-224 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 20 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughThis update introduces a new abstraction for value encoding and parsing in the storage module by replacing concrete structs with trait-based designs for internal and parsed values. It adds a dedicated module for string value format handling, implements a compaction filter and its factory for RocksDB, and revises error handling for data type conversions. The changes also update module imports, expose new public constants, and add dependencies on the Changes
Sequence Diagram(s)sequenceDiagram
participant RocksDB
participant BaseMetaFilterFactory
participant BaseMetaFilter
participant ParsedStringsValue
RocksDB->>BaseMetaFilterFactory: create_compaction_filter()
BaseMetaFilterFactory->>BaseMetaFilter: new()
RocksDB->>BaseMetaFilter: filter(key, value)
BaseMetaFilter->>ParsedStringsValue: parse value
ParsedStringsValue->>BaseMetaFilter: filter_decision(current_time)
BaseMetaFilter->>RocksDB: return CompactionDecision
Possibly related PRs
Suggested reviewers
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (6)
src/storage/mod.rs (1)
15-15
: Consider updating comment or removing entirely.The commented-out module (
base_data_value_format
) suggests a transition to the new trait-based design. If this module is no longer needed, consider removing it entirely rather than leaving it as a comment to maintain code clarity.src/storage/base_filter.rs (2)
92-108
: Test might be flaky due to sleep dependency.The test relies on
std::thread::sleep
to verify the TTL expiration, which can make it flaky in CI environments with variable performance. Consider using a time-independent approach where you can directly manipulate the current time or the expiration time for testing.#[test] fn test_strings_filter() { let mut filter = BaseMetaFilter::default(); let string_val: &'static [u8] = b"filter_val"; let mut string_val = crate::storage::strings_value_format::StringValue::new(string_val); - let ttl = 1_000_000; // 1 秒 = 1,000,000 微秒 + let ttl = 1_000_000; // 1 second = 1,000,000 microseconds crate::storage::base_value_format::InternalValue::set_relative_timestamp(&mut string_val, ttl); - let decision = filter.filter(0, b"filter_key", &crate::storage::base_value_format::InternalValue::encode(&string_val)); + // Test when current time is before expiration + let encoded_val = crate::storage::base_value_format::InternalValue::encode(&string_val); + let decision = filter.filter(0, b"filter_key", &encoded_val); assert!(matches!(decision, CompactionDecision::Keep)); - std::thread::sleep(std::time::Duration::from_secs(2)); - - let decision = filter.filter(0, b"filter_key", &crate::storage::base_value_format::InternalValue::encode(&string_val)); + // Create a separate test for expiration + // This eliminates the need for sleep and makes the test more reliable + } + + #[test] + fn test_strings_filter_expired() { + let mut filter = BaseMetaFilter::default(); + + // Create a value that's already expired + let string_val: &'static [u8] = b"filter_val"; + let mut string_val = crate::storage::strings_value_format::StringValue::new(string_val); + + // Set expiration time to 1 microsecond in the past + let now = Utc::now().timestamp_micros() as u64; + string_val.set_etime(now - 1); + + let encoded_val = crate::storage::base_value_format::InternalValue::encode(&string_val); + let decision = filter.filter(0, b"filter_key", &encoded_val); assert!(matches!(decision, CompactionDecision::Remove)); }Note: The above implementation assumes you can add a
set_etime
method to directly set the expiration time for testing purposes. If such a method doesn't exist, you might need to modify the approach.
98-98
: Consider translating Chinese comment.For consistency and clarity in an open-source project, consider translating the Chinese comment to English. This will help ensure all contributors can understand the code.
- let ttl = 1_000_000; // 1 秒 = 1,000,000 微秒 + let ttl = 1_000_000; // 1 second = 1,000,000 microsecondssrc/storage/base_value_format.rs (2)
37-53
: Consider leveragingnum_enum::TryFromPrimitive
to shrink boilerplateThe handcrafted
impl TryFrom<u8>
exhaustively matches every variant. Using a derive‑based helper likenum_enum::TryFromPrimitive
(orstrum
) would:
- remove this 13‑line match,
- guarantee the mapping always stays in sync when the enum is extended,
- and still return a custom
StorageError
for unknown values.-use super::error::{ - Result, - StorageError::{self, InvalidFormat}, -}; +use num_enum::TryFromPrimitive; + +use super::error::{Result, StorageError::InvalidFormat}; + +#[derive(Debug, Clone, Copy, PartialEq, Eq, TryFromPrimitive)] +#[repr(u8)] +pub enum DataType { + String = 0, + Hash = 1, + Set = 2, + List = 3, + ZSet = 4, + None = 5, + All = 6, +}This trims maintenance overhead and removes the lingering
TODO: use unified Result
.
74-81
: Expose an immutable buffer to avoid unnecessary cloning
encode(&self) -> BytesMut
forces every caller to clone/ freeze the buffer before storage. Returningbytes::Bytes
(or aBytesMut
by value plus a helper to freeze) reduces foot‑guns and better conveys immutability of the encoded payload.-pub trait InternalValue { - fn encode(&self) -> BytesMut; +pub trait InternalValue { + /// Returns an immutable, ready‑to‑persist slice. + fn encode(&self) -> Bytes;Adapting concrete impls is trivial (
BytesMut::freeze
).src/storage/strings_value_format.rs (1)
96-100
: Avoid an extra copy when wrapping the input slice
Bytes::copy_from_slice
always allocates and copies. When the lifetime allows,Bytes::from(input_data.to_vec())
performs the same copy, but if the slice already originates from aBytes
/BytesMut
, you could instead accept aBytes
directly or useBytes::from_static
for zero‑copy.This is a micro‑optimisation, but worthwhile in hot paths like compaction filtering.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (6)
Cargo.toml
(1 hunks)src/storage/base_filter.rs
(1 hunks)src/storage/base_value_format.rs
(3 hunks)src/storage/mod.rs
(2 hunks)src/storage/storage_define.rs
(2 hunks)src/storage/strings_value_format.rs
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/storage/base_filter.rs (3)
src/storage/base_key_format.rs (1)
key
(108-110)src/storage/strings_value_format.rs (5)
new
(30-39)new
(80-109)data_type
(111-113)set_relative_timestamp
(65-67)encode
(43-55)src/storage/base_value_format.rs (5)
new
(86-88)data_type
(91-91)try_from
(41-52)set_relative_timestamp
(80-80)encode
(77-77)
src/storage/base_value_format.rs (3)
src/storage/strings_value_format.rs (12)
encode
(43-55)set_etime
(57-59)etime
(123-125)set_ctime
(61-63)ctime
(119-121)set_relative_timestamp
(65-67)new
(30-39)new
(80-109)filter_decision
(131-136)data_type
(111-113)user_value
(115-117)reserve
(127-129)src/storage/redis_multi.rs (1)
ttl
(188-219)src/kstd/slice.rs (2)
new
(30-32)data
(43-45)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: cargo miri (ubuntu-latest)
- GitHub Check: cargo miri (windows-latest)
- GitHub Check: cargo miri (macos-latest)
🔇 Additional comments (8)
Cargo.toml (1)
21-22
: Dependencies added correctly for new functionality.The additions of
bytes
for buffer management andchrono
for timestamp handling align well with the new string value format and compaction filter implementation. Both are established libraries with stable versions.src/storage/mod.rs (1)
25-27
: New modules properly registered.The new modules for string value format, error handling, and compaction filtering are correctly registered. The organization follows the project structure and makes the new functionality available within the storage module.
src/storage/storage_define.rs (2)
24-24
: Appropriate exposure of TYPE_LENGTH constant.Making
TYPE_LENGTH
public is appropriate as it's now used by the new string value format implementation across module boundaries.
35-35
: Well-defined constant for string value suffix.The
STRING_VALUE_SUFFIXLENGTH
constant is well-defined as the sum of double timestamp length and suffix reserve length, which matches the string value encoding format implemented instrings_value_format.rs
.src/storage/base_filter.rs (2)
20-24
: Implementation of CompactionFilter name method.The
name
method is correctly implemented to provide a C-style string name for the filter. This is required by the RocksDB API.
73-86
: CompactionFilterFactory implementation looks good.The implementation of
CompactionFilterFactory
is correct and follows the RocksDB API requirements, creating a new filter instance and providing a name for the factory.src/storage/strings_value_format.rs (2)
27-39
:ttl
unit is ambiguous – microseconds vs seconds
set_relative_timestamp
addsttl
toUtc::now().timestamp_micros()
, implying that the caller must supply micro‑seconds:self.etime = Utc::now().timestamp_micros() as u64 + ttl;Most APIs expose TTL in seconds. A mismatch will either expire keys ~1 000 000× later than intended or instantly.
Please document the expected unit and consider enforcing seconds for ergonomics:
- fn set_relative_timestamp(&mut self, ttl: u64) { - self.etime = Utc::now().timestamp_micros() as u64 + ttl; + /// `ttl_secs` – time‑to‑live in seconds + fn set_relative_timestamp(&mut self, ttl_secs: u64) { + self.etime = + (Utc::now().timestamp_micros() as u64) + ttl_secs * 1_000_000; }
131-136
: The time base forcurrent_time
should matchetime
filter_decision
comparesetime
—encoded in micro‑seconds—to acurrent_time
value that the compaction filter passes in. Verify that the caller also supplies micro‑seconds; RocksDB’sCompactionFilter::FilterV2
typically deals in seconds.If units differ, entries may never be purged. Please add a helper to fetch
Utc::now().timestamp_micros()
and use it consistently, or convert inside this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/storage/strings_value_format.rs (1)
79-109
:⚠️ Potential issueType check needed in ParsedStringsValue::new
The method converts the first byte into a
DataType
but doesn't verify that it's actually aDataType::String
. This could lead to incorrect parsing of non-string values, causing field offsets to be computed on the wrong layout and producing incorrect data.Add a type check:
let data = Bytes::copy_from_slice(input_data); let data_type = DataType::try_from(data[0])?; + if data_type != DataType::String { + return Err(super::error::StorageError::InvalidFormat( + format!("Expected string type, got {:?}", data_type) + )); + }
🧹 Nitpick comments (6)
src/storage/strings_value_format.rs (6)
115-117
: Remove unnecessary Range::clone in accessorsThe
clone()
operation on the range is unnecessary and creates a new Range object on each method call.fn user_value(&self) -> &[u8] { - &self.data[self.user_value_range.clone()] + &self.data[self.user_value_range.clone()] }
127-129
: Remove unnecessary Range::clone in reserve methodSimilar to the user_value method, the clone is unnecessary here.
fn reserve(&self) -> &[u8] { - &self.data[self.reserve_range.clone()] + &self.data[self.reserve_range.clone()] }
20-25
: Add documentation for timestamp fieldsThe ctime and etime fields lack documentation explaining what they represent and their units (microseconds since epoch).
#[derive(Debug, Clone)] pub struct StringValue { user_value: Bytes, + /// Expiration time in microseconds since epoch (0 means no expiration) etime: u64, + /// Creation time in microseconds since epoch ctime: u64, }
28-29
: Address TODO commentThere's a TODO comment to remove the dead code attribute. Either remove the attribute if the function is now being used, or add a more specific TODO with a ticket reference.
- /// TODO: remove allow dead code - #[allow(dead_code)] + // This function will be used when implementing string operations
44-46
: Check for potential overflow in buffer size calculationThe buffer size calculation does not check for potential integer overflow when adding the lengths.
- let needed = - TYPE_LENGTH + self.user_value.len() + SUFFIX_RESERVE_LENGTH + 2 * TIMESTAMP_LENGTH; + let needed = TYPE_LENGTH + .checked_add(self.user_value.len()) + .and_then(|sum| sum.checked_add(SUFFIX_RESERVE_LENGTH)) + .and_then(|sum| sum.checked_add(2 * TIMESTAMP_LENGTH)) + .ok_or_else(|| panic!("Buffer size calculation overflow"))?;
51-52
: Document endianness assumptionsThe code uses little-endian encoding/decoding methods but doesn't document this assumption explicitly.
+ // Store timestamps in little-endian format buf.put_u64_le(self.ctime); buf.put_u64_le(self.etime);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/storage/base_filter.rs
(1 hunks)src/storage/mod.rs
(1 hunks)src/storage/storage_define.rs
(2 hunks)src/storage/strings_value_format.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/storage/mod.rs
- src/storage/storage_define.rs
- src/storage/base_filter.rs
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: cargo miri (macos-latest)
- GitHub Check: cargo miri (windows-latest)
- GitHub Check: cargo miri (ubuntu-latest)
🔇 Additional comments (1)
src/storage/strings_value_format.rs (1)
131-136
:❌ Incorrect review comment
Verify data type in filter_decision method
The filter_decision method doesn't check that the data type is String before applying expiration logic, which might lead to incorrect compaction decisions for non-string types.
🏁 Script executed:
#!/bin/bash # Check if filter_decision is called on non-string types rg -A 5 "filter_decision" src/Length of output: 1351
We need full context from
src/storage/base_filter.rs
—particularly how differentValueFormat
implementations (includingStringsValueFormat
) are selected and howfilter_decision
is invoked. This will confirm whether non‐string types ever invoke the string‐specific expiration logic. Please run:Then we can determine if an explicit type check is necessary.
🏁 Script executed:
#!/bin/bash # Show how filter_decision is dispatched in base_filter.rs sed -n '1,200p' src/storage/base_filter.rsLength of output: 3427
filter_decision only applies to strings
The expiration logic in
ParsedStringsValue::filter_decision
is only ever invoked whenBaseMetaFilter
has already matchedDataType::String
(see thematch data_type
inbase_filter.rs
). Non‑string values never reach this code path, so no additional type check is needed insidefilter_decision
.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check comment
@guozhihao-224 代码冲突了 需要解决一下 |
ba682d4
to
cd40e78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (8)
Cargo.toml
(1 hunks)src/storage/base_filter.rs
(1 hunks)src/storage/base_value_format.rs
(3 hunks)src/storage/coding.rs
(3 hunks)src/storage/error.rs
(1 hunks)src/storage/mod.rs
(1 hunks)src/storage/storage_define.rs
(2 hunks)src/storage/strings_value_format.rs
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/storage/storage_define.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- src/storage/mod.rs
- src/storage/error.rs
- Cargo.toml
- src/storage/coding.rs
- src/storage/base_filter.rs
- src/storage/strings_value_format.rs
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: cargo miri (macos-latest)
- GitHub Check: cargo miri (ubuntu-latest)
- GitHub Check: cargo miri (windows-latest)
- GitHub Check: cargo clippy (ubuntu-latest)
- GitHub Check: cargo clippy (windows-latest)
- GitHub Check: cargo clippy (macos-latest)
🔇 Additional comments (4)
src/storage/base_value_format.rs (4)
16-22
: Good choice of structured imports.The imports are well-organized, bringing in the necessary external dependencies (
bytes::BytesMut
,rocksdb::CompactionDecision
) and the project's error types. This approach clearly separates external and internal dependencies.
26-26
: Good practice adding standard derivations to DataType enum.Adding these derive traits (
Debug
,Clone
,Copy
,PartialEq
,Eq
) to theDataType
enum follows Rust best practices for type definitions. This provides important functionality for debugging, comparisons, and using the enum in various contexts.
74-81
:❓ Verification inconclusive
Good abstraction with the InternalValue trait.
The transition to a trait-based approach for value encoding and timestamp management provides better abstraction and flexibility. This will allow for different implementations tailored to specific data types.
I notice there's a TODO comment about removing the
#[allow(dead_code)]
annotation. Do you have a plan for addressing these TODOs?
🏁 Script executed:
#!/bin/bash # Count all #[allow(dead_code)] annotations with TODOs in the codebase rg "#\[allow\(dead_code\)\]" -B 1 --glob "*.rs" | grep -c "TODO"Length of output: 67
Plan for removing
#[allow(dead_code)]
annotations?I’ve found that this trait definition in src/storage/base_value_format.rs (lines 74–81) still carries a
// TODO: remove allow dead code
and, in total, there are 38 such annotations across the repository. It’d be great to know how you plan to track and remove these once the implementations land—e.g., opening an issue to clean them up, relying on coverage/lints, or another process.• src/storage/base_value_format.rs (lines 74–81)
• 38 occurrences of// TODO: remove allow dead code
in*.rs
filesLet me know your strategy for cleaning up these dead‐code allowances!
83-96
:✅ Verification successful
ParsedInternalValue trait provides good flexibility.
The trait-based approach for parsed values provides a clean interface with methods for compaction decisions, accessing data types, and timestamp management. The default implementation for
version()
is a nice touch for backward compatibility.Based on previous review comments, there was concern about validation in the
ParsedInternalValue::new
method. While the concrete implementation isn't shown here, make sure all implementations perform proper validation in both debug and release builds.
🏁 Script executed:
#!/bin/bash # Find implementations of the ParsedInternalValue trait rg "impl\s+ParsedInternalValue<" --glob "*.rs" -A 20 | grep -B 5 -A 15 "fn new"Length of output: 81
🏁 Script executed:
#!/bin/bash # Broad search for any impl of ParsedInternalValue to locate fn new implementations rg -n "ParsedInternalValue<'a> for" --glob "*.rs" -A5 # Fallback search for any impl blocks mentioning the trait rg -n "impl.*ParsedInternalValue" --glob "*.rs" -A5Length of output: 1290
ParsedInternalValue trait provides good flexibility.
The trait-based approach for parsed values provides a clean interface with methods for compaction decisions, accessing data types, and timestamp management. The default implementation for
version()
is a nice touch for backward compatibility.I verified that
ParsedStringsValue::new
(in src/storage/strings_value_format.rs) uses both adebug_assert!
and a runtimeif
check returning an error for invalid input length, ensuring proper validation in both debug and release builds. Continue applying this pattern to any other implementations to guarantee robust error handling.
@marsevilspirit 这里合并merge pr时,是不是能够将多个commit合并成一个(Squash and Merge) |
yes. |
这是pr完成了吗? |
是的。base_data_key_format 和 base_filter里面的BaseDataFilter我单独另外提一个PR吧 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Summary by CodeRabbit
New Features
Refactor
Chores