[pull] dev from redpanda-data:dev #39
Open
pull wants to merge 10,000 commits into Mu-L:dev from redpanda-data:dev
Conversation
Current version fails to build on Fedora 41. Co-authored-by: Noah Watkins <[email protected]>
cloud_storage: Move s3_imposter to cloud_io
CORE-8937 admin: swagger docs for patch cluster config body
bazel: Update krb5
Check that with redpanda.iceberg.delete=false old table data remains available even before we recreate the topic.
And switch back to normal admin after disruptions are over.
add log lines, fix typos
If we unmount the topic before this, the table may lack metadata.
Introduce "offline mode" that cuts all ties to the topic in Redpanda cluster. It carries on querying the query engine and verifying results using info cached before going into offline mode.
so that the functionality is tested while the topic is being actively used
Make it possible to configure the number of messages produced by the stream.
Add scenarios: 1) On unmount, all messages that made their way into the topic eventually become available via the query engine. 2) Upon remount and further produce, both old and new messages are present in the topic and in the table.
to prevent archiver shutdown while waiting
This is mostly to preserve iceberg properties, but also to make sure any newly introduced topic properties are preserved by default.
Allows it to be used for subscriptions where feedback from the called function is necessary, such as a future or an error code. All functions are supposed to return the same type.
Make offset_monitor more universal so that it can be used for different data types.
Also create and subscribe one such action: flushing data to the cloud.
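To make the described generalization concrete, here is a minimal, self-contained sketch; the class, its `subscribe`/`notify` names, and the `errc` result type are assumptions for illustration, not Redpanda's actual `offset_monitor` API. Subscribers register actions keyed by offset, every action returns the same result type, and the monitor collects that feedback once an offset is reached.

```cpp
// Hypothetical illustration of an offset monitor whose subscribers return a
// value (e.g. an error code) instead of void. Names and types are assumed.
#include <cstdint>
#include <functional>
#include <iostream>
#include <map>
#include <vector>

enum class errc { success, shutting_down };

template <typename Result>
class offset_monitor {
public:
    using callback = std::function<Result(int64_t)>;

    // Register an action to run once 'offset' has been reached.
    void subscribe(int64_t offset, callback cb) {
        _waiters.emplace(offset, std::move(cb));
    }

    // Notify that 'offset' is now reached; run and collect every callback
    // registered at or below it so the caller can inspect the feedback.
    std::vector<Result> notify(int64_t offset) {
        std::vector<Result> results;
        auto end = _waiters.upper_bound(offset);
        for (auto it = _waiters.begin(); it != end; ++it) {
            results.push_back(it->second(offset));
        }
        _waiters.erase(_waiters.begin(), end);
        return results;
    }

private:
    std::multimap<int64_t, callback> _waiters;
};

int main() {
    offset_monitor<errc> mon;
    // One such subscribed action: flush data to the cloud.
    mon.subscribe(10, [](int64_t o) {
        std::cout << "flushing up to offset " << o << " to cloud\n";
        return errc::success;
    });
    auto results = mon.notify(15);
    std::cout << "collected " << results.size() << " result(s)\n";
}
```

In practice the returned value could equally be a future that the caller awaits; the point is only that the subscription now carries feedback back to the subscriber.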
Adds the book-keeping variables `_dirty/closed_segment_bytes` to `disk_log_impl`, as well as some getter/setter functions. These functions will be used throughout `disk_log_impl` where required (segment rolling, compaction, segment eviction) to track the bytes contained in dirty and closed segments.
Uses the added functions `update_dirty/closed_segment_bytes()` in the required locations within `disk_log_impl` in order to bookkeep the dirty ratio. Bytes can be either removed or added by rolling new segments, compaction, and retention enforcement.
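As a rough illustration of this bookkeeping, the sketch below tracks the two counters and derives a dirty ratio from them. The stand-in class and the exact formula (dirty bytes over closed-segment bytes) are assumptions; the real logic lives in `disk_log_impl`.

```cpp
// Stand-in for the dirty/closed segment byte bookkeeping; not disk_log_impl.
#include <cstdint>
#include <iostream>

class segment_byte_tracker {
public:
    // Called when bytes become dirty (e.g. a segment rolls and has not been
    // compacted yet) or are cleaned/removed (compaction, retention).
    void update_dirty_segment_bytes(int64_t delta) { _dirty_segment_bytes += delta; }
    void update_closed_segment_bytes(int64_t delta) { _closed_segment_bytes += delta; }

    int64_t dirty_segment_bytes() const { return _dirty_segment_bytes; }
    int64_t closed_segment_bytes() const { return _closed_segment_bytes; }

    double dirty_ratio() const {
        return _closed_segment_bytes == 0
                 ? 0.0
                 : static_cast<double>(_dirty_segment_bytes) / _closed_segment_bytes;
    }

private:
    int64_t _dirty_segment_bytes = 0;
    int64_t _closed_segment_bytes = 0;
};

int main() {
    segment_byte_tracker t;
    t.update_closed_segment_bytes(1000); // a segment rolled and closed
    t.update_dirty_segment_bytes(1000);  // all of it is still uncompacted
    t.update_dirty_segment_bytes(-400);  // compaction cleaned some bytes
    std::cout << "dirty ratio: " << t.dirty_ratio() << "\n"; // 0.6
}
```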
We were missing a type check in `_make_java_properties()` that would cause CDT to fail for other cloud storage providers.
pandaproxy/sr: Relax restrictions for bundled json schemas
Update and enable large_messages_test (LMT)
This seems to already happen and isn't needed.
Sort of amazing, but I found this and it helps improve caching because debug symbols are now relative to the redpanda repo instead of using absolute paths. I've been using this for a few days and it's been great.
The `parse_rest_error_response` method tries to read fields from the XML response and constructs a `rest_error_response`. If a field is not found or is empty, it defaults to an empty string. https://github.com/redpanda-data/redpanda/blob/d3c2f00c4071c2cbce1e1babdfc2291e3c9898ba/src/v/cloud_storage_clients/s3_client.cc#L411 Google Cloud Storage gives one of these replies, which we parse, but the Error.Code path is not present in the response, so we trip over a bad lexical cast, which results in an error log line. With this commit we default to an unknown error code in that case. We already do the same for codes we don't recognize in the operator>>. lexical_cast does not call operator>> at all for empty strings.
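The shape of the fix can be sketched as follows; the enum, `parse_error_code`, and the surrounding types are illustrative stand-ins rather than the real `s3_client.cc` code, but the guard mirrors the described behavior: an empty Error.Code field short-circuits to an unknown code instead of reaching `boost::lexical_cast`.

```cpp
// Illustrative sketch: skip boost::lexical_cast entirely when the XML field
// is empty and fall back to an "unknown" error code. Names are assumptions.
#include <boost/lexical_cast.hpp>
#include <iostream>
#include <istream>
#include <string>

enum class s3_error_code { unknown, no_such_key, slow_down };

// operator>> is only invoked by lexical_cast for non-empty input, so an empty
// Error.Code field has to be handled before the cast.
std::istream& operator>>(std::istream& in, s3_error_code& code) {
    std::string token;
    in >> token;
    if (token == "NoSuchKey") {
        code = s3_error_code::no_such_key;
    } else if (token == "SlowDown") {
        code = s3_error_code::slow_down;
    } else {
        code = s3_error_code::unknown; // already the behavior for unrecognized codes
    }
    return in;
}

s3_error_code parse_error_code(const std::string& field) {
    if (field.empty()) {
        // GCS responses may omit Error.Code; lexical_cast would throw
        // bad_lexical_cast here because operator>> is never called.
        return s3_error_code::unknown;
    }
    return boost::lexical_cast<s3_error_code>(field);
}

int main() {
    std::cout << static_cast<int>(parse_error_code("")) << "\n";          // unknown
    std::cout << static_cast<int>(parse_error_code("NoSuchKey")) << "\n"; // no_such_key
}
```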
In CI this test was pretty close to the timeout before the switch to debug-mode Seastar; now it has become flaky. Increase the timeout.
bazel: use relative paths for debug symbols
…est-bump-timeout
`storage`: bookkeep `dirty_ratio` in `disk_log_impl`
rpk/chore: Bump Go dependencies.
The Java implementation expects snapshot removal updates to be serialized one at a time[1]. This meant that with Java REST catalogs, we could trigger catalog-side errors like:

```
2025-02-10T19:09:53.865 WARN [org.eclipse.jetty.server.HttpChannel] - /v1/namespaces/redpanda/tables/test
java.lang.IllegalArgumentException: Invalid set of snapshot ids to remove. Expected one value but received: [2205058756266803285, 6389287107599228031, 837858603806954013, 8429544376017231169]
    at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
    at org.apache.iceberg.MetadataUpdateParser.readRemoveSnapshots(MetadataUpdateParser.java:530)
    at org.apache.iceberg.MetadataUpdateParser.fromJson(MetadataUpdateParser.java:300)
    at org.apache.iceberg.rest.requests.UpdateTableRequestParser.lambda$fromJson$2(UpdateTableRequestParser.java:105)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.apache.iceberg.rest.requests.UpdateTableRequestParser.fromJson(UpdateTableRequestParser.java:105)
    at org.apache.iceberg.rest.RESTSerializers$UpdateTableRequestDeserializer.deserialize(RESTSerializers.java:354)
    at org.apache.iceberg.rest.RESTSerializers$UpdateTableRequestDeserializer.deserialize(RESTSerializers.java:349)
    at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4825)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3785)
    at org.apache.iceberg.rest.RESTCatalogServlet$ServletRequestContext.from(RESTCatalogServlet.java:179)
    at org.apache.iceberg.rest.RESTCatalogServlet.doPost(RESTCatalogServlet.java:78)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:554)
```

This updates the snapshot removal action to return its removed snapshots individually. Note that this doesn't mean there are multiple calls to the catalog for removal, only that we serialize multiple lists each with a single removal instead of a single list with multiple removals.

[1] https://github.com/apache/iceberg/blob/3e6da2e5437ffb3f643275927e5580cb9620256b/core/src/main/java/org/apache/iceberg/MetadataUpdateParser.java#L550-L553
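A hedged sketch of the serialization change: rather than one remove-snapshots update carrying every snapshot id, emit one update per id. The JSON layout below follows the Iceberg REST `remove-snapshots` metadata update, but the helper functions are illustrative, not Redpanda's actual serializer.

```cpp
// Illustrative only: emit one "remove-snapshots" table update per snapshot id,
// rather than a single update listing every id, so Java REST catalogs (which
// expect exactly one id per update) accept the request.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

std::string make_remove_snapshot_update(int64_t snapshot_id) {
    return R"({"action":"remove-snapshots","snapshot-ids":[)"
           + std::to_string(snapshot_id) + "]}";
}

// Before: one update with N ids. After: N updates with one id each.
std::vector<std::string> make_removal_updates(const std::vector<int64_t>& ids) {
    std::vector<std::string> updates;
    updates.reserve(ids.size());
    for (auto id : ids) {
        updates.push_back(make_remove_snapshot_update(id));
    }
    return updates;
}

int main() {
    for (const auto& u :
         make_removal_updates({2205058756266803285, 6389287107599228031})) {
        std::cout << u << "\n";
    }
}
```

There is still a single catalog request; it simply carries several single-snapshot updates instead of one multi-snapshot update.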
Parameterizes test_remove_expired_snapshots on catalog type. This would have caught an incompatibility in which we serialized multiple snapshots per removal update, when the Java impl expected multiple removal updates, each with a single snapshot[1]. This required a change to the Spark call used to view snapshots, since Spark's REST catalog integration doesn't appear to support the "{table_name}.snapshot" system table. [1] https://github.com/apache/iceberg/blob/3e6da2e5437ffb3f643275927e5580cb9620256b/core/src/main/java/org/apache/iceberg/MetadataUpdateParser.java#L550-L553
…ing-conventions Updating search filter based on new naming conventions
We added a batch_size metric, but it has a latency_metric label, which doesn't make sense for this metric. Remove it.
iceberg: serialize snapshot removals individually
cloud_storage_clients: skip lexical_cast on empty strings
Fixing output parsing in check azure instances
kafka-probe: remove latency label on batch_size
Because we now schedule adjacent segment compaction after sliding window compaction, this test was having trouble trying to reach the desired number of segments while producing. Increase the timeout as well as the `log_compaction_interval_ms` to allow the test to reach the desired number of segments.
`rptest`: add credentials type check in `NessieService`
The `datalake_staging` folder was recently moved under the redpanda data directory. It should not be considered a namespace in `compute_size()`.
…_fix [CORE-8848] `rptest`: adjust compaction settings in `datalake/compaction_test`
See Commits and Changes for more details.
Created by pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )