Conversation

@jeffxiang jeffxiang commented Nov 3, 2025

offsetsForTimes() is now implemented using the metadata file in S3 to perform a more efficient search process. The mechanism is as follows:

  • Retrieve metadata file from S3 for the particular topic-partition
  • Get the topic-partition's timeindex from the metadata, which provides each segment's last modified timestamp on Kafka
  • Locate the segment with the greatest last-modified timestamp less than or equal to the target timestamp. This gives us the earliest segment that we should search from. For example, if we have:
000100.log --> timestamp = 100
000200.log --> timestamp = 200
000300.log --> timestamp = 300
...

and we query offsetsForTimes(timestamp=250), we start searching from 000200.log. Even though 000200.log may not contain our target timestamp (its last modified timestamp is 200 while we're searching for timestamp=250), we still want to start the search here in case the timeindex has a gap immediately following this segment.

  • We search linearly from this segment onward by loading each segment's .timeindex file and examining the last entry, which corresponds to the last record in that segment. If that entry's timestamp is less than the target timestamp, we continue to the next segment. Usually the very next segment is the target segment, unless there is a gap in the metadata timeindex entries
  • If the last entry in the segment's .timeindex file has a timestamp greater than or equal to the target timestamp, we have found the segment
  • Once we have located the segment, we extract the greatest segment timeindex entry whose timestamp is less than or equal to the target timestamp
  • Starting from that entry's offset, we perform a linear scan forward through the records to locate the first offset with a timestamp >= the target timestamp
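The two-phase lookup above can be sketched with a plain `NavigableMap` standing in for the metadata timeindex (a hypothetical simplification — `TimestampSearchSketch` and `findStartSegment` are illustrative names, not the PR's actual classes; the real implementation reads the metadata file from S3):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of phase 1 of the search described above: a floor lookup on the
// metadata timeindex, which maps each segment's last-modified timestamp
// to that segment's base offset.
public class TimestampSearchSketch {

    // Returns the base offset of the earliest segment to search from:
    // the segment with the greatest last-modified timestamp <= target,
    // or the first segment if the target precedes every entry.
    static long findStartSegment(NavigableMap<Long, Long> timeIndex, long targetTimestamp) {
        var entry = timeIndex.floorEntry(targetTimestamp);
        return entry != null ? entry.getValue() : timeIndex.firstEntry().getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Long, Long> timeIndex = new TreeMap<>();
        timeIndex.put(100L, 100L);  // 000100.log, last modified at t=100
        timeIndex.put(200L, 200L);  // 000200.log, last modified at t=200
        timeIndex.put(300L, 300L);  // 000300.log, last modified at t=300

        // Query for timestamp=250: start from 000200.log, as in the example above.
        System.out.println(findStartSegment(timeIndex, 250L)); // prints 200
    }
}
```

Phase 2 (the linear scan across segment .timeindex files and then records) proceeds from the returned segment onward, as the bullets describe.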

@jeffxiang jeffxiang requested a review from a team as a code owner November 3, 2025 23:30
@jeffxiang jeffxiang changed the title from "Consumer seek using metadata implementation" to "Consumer timestamp search using metadata implementation" Nov 20, 2025
Comment on lines +513 to +533
S3Records segmentRecords = null;
try {
segmentRecords = S3Records.open(
logObject.getLeft(),
logObject.getMiddle(),
startPosition,
false,
true,
logObject.getRight().intValue(),
true);

Iterator<S3ChannelRecordBatch> batches = segmentRecords.batchesFrom(startPosition).iterator();
while (batches.hasNext()) {
S3ChannelRecordBatch batch = batches.next();
for (Record record : batch) {
if (record.timestamp() >= targetTimestamp) {
return Optional.of(new OffsetAndTimestamp(record.offset(), record.timestamp(), Optional.empty()));
}
}
}
} catch (IOException e) {


Any chance of resource leak in case of an exception? Should this be a try-with-resources block instead?

Contributor Author

In the finally block we close the segmentRecords.
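For reference, the two cleanup patterns under discussion are equivalent when the resource is `AutoCloseable`. A minimal sketch using a stand-in resource (`FakeRecords` is hypothetical; whether `S3Records` itself implements `AutoCloseable` is not shown in this thread):

```java
// Sketch of the two equivalent cleanup patterns from the review thread.
public class CleanupSketch {
    static class FakeRecords implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // Pattern described in the reply: explicit close in a finally block.
    static FakeRecords withFinally() {
        FakeRecords records = null;
        try {
            records = new FakeRecords();
            // ... scan batches ...
        } finally {
            if (records != null) {
                records.close();
            }
        }
        return records;
    }

    // Equivalent try-with-resources form suggested by the reviewer;
    // close() runs automatically on every exit path, including exceptions.
    static FakeRecords withTryWithResources() {
        try (FakeRecords records = new FakeRecords()) {
            // ... scan batches ...
            return records;
        }
    }

    public static void main(String[] args) {
        System.out.println(withFinally().closed);          // true
        System.out.println(withTryWithResources().closed); // true
    }
}
```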

return Optional.empty();
}

TimeIndex timeIndex = metadataOptional.get().getTimeIndex();


Do we need a null check for metadataOptional.get() to avoid NPE?

Contributor Author

We never return null for metadataOptional or put a null value into the Optional&lt;TopicPartitionMetadata&gt;. The only check needed is Optional.isPresent(), which is done above.
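This invariant is enforced by `java.util.Optional` itself: `Optional.of` rejects null at construction time, so a present Optional can never hold null (a null value would have to go through `Optional.ofNullable`, which produces an empty Optional instead). A small standalone illustration:

```java
import java.util.Optional;

public class OptionalNullSketch {
    public static void main(String[] args) {
        // Optional.of(...) throws NullPointerException on a null argument,
        // so if the code only ever builds the metadata Optional with
        // Optional.of, isPresent() is the only check needed before get().
        boolean threw = false;
        try {
            Optional.of(null);
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println(threw); // true

        // Optional.ofNullable(null) yields an empty Optional instead of
        // a present-but-null one.
        System.out.println(Optional.ofNullable(null).isPresent()); // false
    }
}
```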

* 2. Find the largest timestamp in the segment's timeindex file
* 3. If the largest timestamp in the segment's timeindex file is less than the target timestamp, we continue to the next segment.
* 4. If the largest timestamp in the segment's timeindex file is greater than or equal to the target timestamp, we have found the segment.
* 5. Find the segment timeindex entry with a timestamp less than or equal to the target timestamp.


For step 5, do we perform a binary search?

Contributor Author

Implicitly, yes. We do this by putting the entries into a sorted ConcurrentSkipListSet within the TimeIndex.load() method and then calling ConcurrentSkipListSet.floor() method. This is done in https://github.com/pinterest/tiered-storage/blob/consumer_seek/ts-common/src/main/java/com/pinterest/kafka/tieredstorage/common/metadata/TimeIndex.java#L296
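The `floor()` semantics relied on here are standard `ConcurrentSkipListSet` behavior — the greatest element less than or equal to the query, found via the skip list's logarithmic-time probe. A quick illustration with bare timestamps standing in for full timeindex entries:

```java
import java.util.concurrent.ConcurrentSkipListSet;

public class FloorSketch {
    public static void main(String[] args) {
        // Timeindex entry timestamps, kept sorted by the skip list.
        ConcurrentSkipListSet<Long> timestamps = new ConcurrentSkipListSet<>();
        timestamps.add(100L);
        timestamps.add(200L);
        timestamps.add(300L);

        // floor(t) returns the greatest element <= t: exactly the
        // "greatest entry less than or equal to the target" step above.
        System.out.println(timestamps.floor(250L)); // 200
        System.out.println(timestamps.floor(300L)); // 300
        System.out.println(timestamps.floor(50L));  // null (no entry at or below 50)
    }
}
```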

@jeffxiang jeffxiang merged commit a19031f into main Nov 20, 2025
1 check passed
@jeffxiang jeffxiang deleted the consumer_seek branch November 20, 2025 03:14