NIFI-15027 - adjust AvroWriter handling of invalid payloads; ConsumeKafka impact #10366
Conversation
try {
    dataFileWriter.append(rec);
} catch (final DataFileWriter.AppendWriteException e) {
    throw new IOException(e);
}
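For context, a minimal standalone sketch (assumed schema and setup; this is not NiFi code) of when Avro raises the unchecked DataFileWriter.AppendWriteException: appending a record whose payload violates the schema, such as a null value for a required string field, together with the wrapping pattern under review.

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AppendWriteExceptionDemo {
    public static void main(final String[] args) throws IOException {
        final Schema schema = new Schema.Parser().parse("""
                {"name": "test", "type": "record",
                 "fields": [{"name": "text", "type": "string"}]}""");

        try (DataFileWriter<GenericRecord> writer =
                new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
            writer.create(schema, new ByteArrayOutputStream());

            // Required field "text" is left null: an invalid payload
            final GenericRecord rec = new GenericData.Record(schema);
            try {
                writer.append(rec);
            } catch (final DataFileWriter.AppendWriteException e) {
                // The pattern under review: wrap Avro's unchecked exception
                // in a checked IOException
                throw new IOException("Failed to write Avro Record", e);
            }
        }
    }
}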
Thanks for working on this ticket.
This changed line breaks other writeRecord() callers that explicitly catch DataFileWriter.AppendWriteException, such as this example:
https://github.com/jrsteinebrey/nifi/blob/b0f29ef94e95be8160ec2cd5fbdfbef373451f90/nifi-extension-bundles/nifi-extension-utils/nifi-database-utils/src/main/java/org/apache/nifi/util/db/JdbcCommon.java#L466
They would need to be changed to catch IOException instead of AppendWriteException.
Instead of this change here in WriteAvroResultWithSchema.java,
I suggest that you consider changing the Kafka code here
https://github.com/apache/nifi/blob/1457950040d0fe86ade53770def6c5a95b6f0252/nifi-extension-bundles/nifi-kafka-bundle/nifi-kafka-processors/src/main/java/org/apache/nifi/kafka/processors/consumer/convert/AbstractRecordStreamKafkaMessageConverter.java#L112-L120
to catch (Exception) instead of specific exception classes. Then the ticket is resolved, and any future exception classes also route to failure.
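A minimal sketch of that suggestion (the class and method names below are placeholders, not the actual AbstractRecordStreamKafkaMessageConverter API): a broad catch routes every conversion failure, including exception types introduced later, to the parse-failure path.

import java.util.function.Consumer;

class BroadCatchSketch {
    // Placeholder for the converter's per-message conversion step
    static void convert(final Runnable conversion, final Consumer<Exception> onParseFailure) {
        try {
            conversion.run();
        } catch (final Exception e) {
            // Previously only specific exception classes were caught here;
            // catching Exception routes any failure to parse failure
            onParseFailure.accept(e);
        }
    }
}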
That's reasonable; thanks.
I'm not familiar with the reason for the "catch all" in AbstractRecordStreamKafkaMessageConverter.
To me, the problem seems to be that the Avro writer implementation throws a particular exception (class) that is not visible in the classpath of the Kafka implementation. So we can't act based on that particular exception.
Another variation would be for AvroWriter to throw MalformedRecordException instead of IOException, as that better conveys the particular problem (bad data).
There are potential side effects to either of these paths forward; hopefully others in the community will chime in.
I think I'd also go with the change only in the Kafka class, where we would catch all exceptions and route them to parse failure. Thoughts, @exceptionfactory @markap14?
Reviewing the call structure, I favor the proposed approach that catches the AppendWriteException and throws something more specific. Wrapping it and throwing an IOException seems appropriate based on the description of AppendWriteException, although I would add a message to the IOException.
For broader context, the JdbcCommon handling of dataWriter.append() is not directly related, and in that case, catching AppendWriteException only serves to allow for more specific exception messaging.
The contract of RecordReaderFactory.createRecordReader() defines the three checked exceptions, which the KafkaMessageConverter handles as parse failures. Any other exceptions propagate to ConsumeKafka.onTrigger(), where the transaction is rolled back. For this reason, catching a general Exception as a parse failure could mask other issues that indicate a programming bug, versus a problem with the record or schema.
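As a sketch of that contract (simplified; not the actual converter code, and the helper below is hypothetical): only the three checked exceptions declared by createRecordReader() are treated as parse failures, so anything else propagates and rolls back the transaction.

import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

import org.apache.nifi.logging.ComponentLog;
import org.apache.nifi.schema.access.SchemaNotFoundException;
import org.apache.nifi.serialization.MalformedRecordException;
import org.apache.nifi.serialization.RecordReader;
import org.apache.nifi.serialization.RecordReaderFactory;

class ParseFailureSketch {
    void read(final RecordReaderFactory readerFactory, final Map<String, String> attributes,
            final InputStream in, final long length, final ComponentLog logger) {
        try (RecordReader reader = readerFactory.createRecordReader(attributes, in, length, logger)) {
            // ... convert and write records ...
        } catch (final MalformedRecordException | SchemaNotFoundException | IOException e) {
            // The three checked exceptions from createRecordReader(): parse failure.
            // Any other exception propagates to ConsumeKafka.onTrigger(),
            // where the transaction is rolled back.
            handleParseFailure(e); // hypothetical helper
        }
    }

    private void handleParseFailure(final Exception e) {
        // Placeholder for routing to the parse-failure relationship
    }
}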
Thanks @jrsteinebrey @pvillard31 @exceptionfactory for your input!
I propose making this update to the changeset:
Another variation would be for AvroWriter to throw MalformedRecordException instead of IOException, as that better conveys the particular problem (bad data).
Does that work for everyone?
Although MalformedRecordException is the most precise, it does not align with the writeRecord method signature, since MalformedRecordException extends the base Exception class.
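To make the mismatch concrete, a minimal sketch (interface abbreviated; not the full NiFi RecordSetWriter API): a checked exception that extends Exception rather than IOException cannot be thrown from a method that declares only IOException.

import java.io.IOException;

class SignatureMismatchSketch {
    interface Writer {
        void write() throws IOException; // abbreviates: WriteResult write(Record) throws IOException
    }

    // Mirrors the NiFi hierarchy: MalformedRecordException extends Exception
    static class MalformedRecordException extends Exception {
        MalformedRecordException(final String message) {
            super(message);
        }
    }

    static final Writer WRITER = () -> {
        // throw new MalformedRecordException("bad data"); // would not compile:
        // not assignable to the IOException declared by write()
        throw new IOException("Failed to write Avro Record");
    };
}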
Sure, that makes sense. I will leave things as is.
Thanks, with that determined, the only other change I recommend is including a message for the IOException, such as "Failed to write Avro Record".
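Applied to the catch block under review, that recommendation amounts to (sketch):

} catch (final DataFileWriter.AppendWriteException e) {
    throw new IOException("Failed to write Avro Record", e);
}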
try {
    dataFileWriter.append(rec);
} catch (final DataFileWriter.AppendWriteException e) {
    throw new IOException(e);
}
Non-binding: I am good with IOException being thrown here like @exceptionfactory recommended.
private static final String RESOURCE_AVRO_SCHEMA_NULLABLE = "src/test/resources/org/apache/nifi/kafka/reader/schemaNullable.avsc.json";
private static final String RESOURCE_AVRO_SCHEMA_REQUIRED = "src/test/resources/org/apache/nifi/kafka/reader/schemaRequired.avsc.json";
It might be useful to make these JSON files multiline strings within the test class, but I will defer to the current implementation if you prefer to leave it as is.
Thanks; my opinion is that embedded multiline resources (especially JSON) can be harder to read when the needed escapes are present. So, I'd like to retain the current implementation of those.
Yes, I agree that escaped JSON is harder to read. Multiline strings do not need escaping, which is the reason for the suggestion, but I'm fine with leaving the current approach for now.
Maybe I misunderstood? Are you saying that lines 64, 65 would be better as four lines?
I meant that the JSON could be defined as follows with a multiline string:
private static final String SCHEMA_JSON = """
        {
          "name": "test",
          "type": "record",
          "fields": [
            {
              "name": "text",
              "type": "string"
            },
            {
              "name": "ordinal",
              "type": "long"
            }
          ]
        }
        """;
TIL! I'll make that change.
Thanks for the updates to the tests @greyp9. If you can just add a message to the IOException, that should address the remaining feedback.
Summary
NIFI-15027

Tracking
Please complete the following tracking steps prior to pull request creation.

Issue Tracking
- Apache NiFi Jira issue created

Pull Request Tracking
- Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
- Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting
- Pull Request based on the current revision of the main branch

Verification
Please indicate the verification steps performed prior to pull request creation.

Build
- Build completed using ./mvnw clean install -P contrib-check

Licensing
- New dependencies are documented in applicable LICENSE and NOTICE files

Documentation
- Documentation formatting appears as expected in rendered files