Skip to content

[STREAM-841] Perf improvements for kafka connector#163

Closed
amdox-karl wants to merge 2 commits into
datastax:1.xfrom
amdox-karl:1.x
Closed

[STREAM-841] Perf improvements for kafka connector#163
amdox-karl wants to merge 2 commits into
datastax:1.xfrom
amdox-karl:1.x

Conversation

@amdox-karl

Copy link
Copy Markdown
Contributor

Hi, as requested, related to ticket TS022015413

@RomainAnselin

RomainAnselin commented Jun 3, 2026

Copy link
Copy Markdown

Adding for context: running a profiler of the kafka connector on a table leveraging some UDT showed a large amount of time spent on StructToUDTCodec.externalToInternal with 30% of time spent analysing the metadata. Flame graph available below:
jfr-noex.html

Quoting the OP, the changes can be summarized as follow

Heatmap showed 30% of compute was being spent in mapAndQueueRecord, specifically mapping struct->UDT.
kafka-sink/sink/src/main/java/com/datastax/oss/kafka/sink/KafkaStruct.java

  • added kafkaSchema() accessor for stable schema-key caching.

kafka-sink/sink/src/main/java/com/datastax/oss/kafka/sink/codecs/StructToUDTCodec.java

  • precomputed UDT metadata in fields (size, field names/types/internal names),
  • added schema-keyed plan cache (ConcurrentMap<Schema, StructPlan>),
  • moved expensive schema/type/codec setup into createPlan(...),
  • per-record path now just runs cached bindings

Will follow up with a CQL schema if available

@RomainAnselin

Copy link
Copy Markdown

Adding a CQL DDL example as well as an example of the UDT structure and tables leveraged

-- Structure summary:
--   udt_01:        125 fields (text, int, bigint, boolean, double)
--   table_01:       35 columns + composite PK (4 partition + 3 clustering) + secondary index on one column
--   table_02:       29 columns + composite PK (4 partition + 3 clustering)
--   table_03:        7 columns + composite PK (2 partition + 4 clustering)
--   table_04:        6 columns + composite PK (4 partition + 2 clustering)

udt_structure.txt

Comment thread sink/src/main/java/com/datastax/oss/kafka/sink/KafkaStruct.java Outdated
Comment thread sink/src/main/java/com/datastax/oss/kafka/sink/codecs/StructToUDTCodec.java Outdated
Comment thread sink/src/main/java/com/datastax/oss/kafka/sink/codecs/StructToUDTCodec.java Outdated
…ructPlan → StructToUdtPlan, and switched the plan cache to a bounded Caffeine cache
@amdox-karl amdox-karl requested a review from sandeep-ctds June 16, 2026 18:19
@sandeep-ctds sandeep-ctds marked this pull request as ready for review June 17, 2026 05:45

@sandeep-ctds sandeep-ctds left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the cache size configurable could be a future improvement.

@amdox-karl

Copy link
Copy Markdown
Contributor Author

Hi @sandeep-ctds I note the integration test failures. I am trying to understand if they are related in any way with the PR content? It does not look like that to me, but please let me know if you need further changes from my side.

@sandeep-ctds

Copy link
Copy Markdown
Collaborator

Hi @sandeep-ctds I note the integration test failures. I am trying to understand if they are related in any way with the PR content? It does not look like that to me, but please let me know if you need further changes from my side.

They are not related. We are trying to mitigate them.

@sandeep-ctds

Copy link
Copy Markdown
Collaborator

As repository variables are not accessible to PRs from forked repositories, the PR checks are failing as expected.
We will raise an internal PR with the same changes and verify if the checks pass there.

@sandeep-ctds sandeep-ctds changed the title Perf improvements for kafka connector [STREAM-841] Perf improvements for kafka connector Jun 19, 2026
@sandeep-ctds

Copy link
Copy Markdown
Collaborator

Closing this in favour of the internal PR with same changes.
#166

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants