I am developing an Apache Storm (v2.5.0) topology that reads events from a spout (`BaseRichSpout`), counts the events in tumbling windows (`BaseWindowedBolt`), and prints the count (`BaseRichBolt`). The topology works fine, but my dataset contains some out-of-order events. `BaseWindowedBolt` provides a `withLateTupleStream` method to route late events to a separate stream. However, when I try to process the late events, I get a serialization exception:
Caused by: com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Class is not registered: org.apache.storm.shade.com.google.common.collect.SingletonImmutableBiMap
Note: To register this class use: kryo.register(org.apache.storm.shade.com.google.common.collect.SingletonImmutableBiMap.class);
Serialization trace:
defaultResources (org.apache.storm.task.WorkerTopologyContext)
context (org.apache.storm.tuple.TupleImpl)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101) ~[kryo-4.0.2.jar:?]
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508) ~[kryo-4.0.2.jar:?]
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575) ~[kryo-4.0.2.jar:?]
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79) ~[kryo-4.0.2.jar:?]
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508) ~[kryo-4.0.2.jar:?]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651) ~[kryo-4.0.2.jar:?]
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) ~[kryo-4.0.2.jar:?]
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) ~[kryo-4.0.2.jar:?]
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:557) ~[kryo-4.0.2.jar:?]
at org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:38) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:40) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.daemon.worker.WorkerState.checkSerialize(WorkerState.java:613) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.executor.ExecutorTransfer.tryTransferLocal(ExecutorTransfer.java:101) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:66) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.executor.LocalExecutor$1.tryTransfer(LocalExecutor.java:36) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.boltEmit(BoltOutputCollectorImpl.java:112) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.executor.bolt.BoltOutputCollectorImpl.emit(BoltOutputCollectorImpl.java:65) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:93) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:93) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.task.OutputCollector.emit(OutputCollector.java:42) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.topology.WindowedBoltExecutor.execute(WindowedBoltExecutor.java:313) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:212) ~[storm-client-2.5.0.jar:2.5.0]
at org.apache.storm.executor.Executor.accept(Executor.java:294) ~[storm-client-2.5.0.jar:2.5.0]
... 6 more
I have defined my topology as below:
public class TestTopology {
public static void main(String[] args) throws Exception {
Config config = new Config();
config.put(Config.TOPOLOGY_TESTING_ALWAYS_TRY_SERIALIZE, true);
config.registerSerialization(TupleImpl.class);
config.registerSerialization(WorkerTopologyContext.class);
config.registerSerialization(Fields.class);
LocalCluster cluster = new LocalCluster();
And `PrintBolt` just prints the `windowBolt` output. (`LatePrintBolt` is similar)
If I don't set the `LatePrintBolt` in `TopologyBuilder`, I get the correct results.
However, when I try to print the late-events stream, I get the same output until the first late event arrives, at which point the exception above is thrown.
I have debugged the issue. When `WindowedBoltExecutor` receives a late tuple, it emits that tuple as a value, and `BoltOutputCollectorImpl` wraps it in a new tuple. The new tuple therefore carries a `WorkerTopologyContext`, which is not serializable, hence the error.
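The failure mode described above can be reproduced in miniature without Storm. The sketch below is only an analogy: `FakeContext` and `FakeTuple` are hypothetical stand-ins for `WorkerTopologyContext` and `TupleImpl`, and plain Java serialization stands in for Kryo. It shows that emitting a whole tuple as a value drags the context along, while emitting only the payload does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in model, not Storm code: every type here is hypothetical.
class LateTupleSketch {

    // Plays the role of WorkerTopologyContext: deliberately NOT Serializable.
    static final class FakeContext {
        final String workerId = "worker-1";
    }

    // Plays the role of TupleImpl: payload values plus a reference to the context.
    static final class FakeTuple implements Serializable {
        final List<Object> values;
        final FakeContext context;

        FakeTuple(List<Object> values, FakeContext context) {
            this.values = values;
            this.context = context;
        }
    }

    // Returns true if the outgoing value list can be serialized, false otherwise.
    static boolean canSerialize(List<Object> outgoingValues) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new ArrayList<>(outgoingValues));
            return true;
        } catch (IOException e) {
            // NotSerializableException lands here when the context is reached.
            return false;
        }
    }

    public static void main(String[] args) {
        FakeTuple late = new FakeTuple(Arrays.asList("event-7", 42L), new FakeContext());

        // Emitting the tuple itself as the single value: serialization fails.
        System.out.println(canSerialize(Collections.singletonList(late)));

        // Emitting only the payload values: serialization succeeds.
        System.out.println(canSerialize(late.values));
    }
}
```

In the real topology the serializer is Kryo and the rejected class is `SingletonImmutableBiMap`, reached through the `defaultResources` field of the context, but the shape of the problem is the same: the value being emitted holds a reference to worker-local state.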
Thank you for your response, and sorry for the late reply; I have been away from my machine.
I just checked it, and the new version does not solve the problem.
As far as I understand, the problem is not with serialization itself but with how late tuples are handled. The input to `WindowedBoltExecutor` is already a `Tuple`. That tuple contains a `WorkerTopologyContext`, which is not serializable (it has some volatile attributes). Hence the error. In my opinion, we should change the line to
I have been trying to fix this bug for the past two days. Upon closer inspection, I found that it is a much bigger problem than just changing the parameters of the emit function.
We cannot change it to input.getValues() as we define only one output field. By design, it expects a tuple. However, a tuple can never be serialized due to some volatile attributes. Hence, lateTupleStream will only work when there is no serialization.
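To make the single-field constraint concrete, here is a small stand-alone sketch. It does not use Storm; the field name `late_tuple` and the helper names are assumptions made for illustration. It shows why emitting `input.getValues()` directly does not fit a stream declared with one field, and one conceivable direction (not a fix agreed on in this thread): packing the payload into a single list value so the context is left behind while the stream still has one field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in model, not Storm code: names and the field layout are assumptions.
class LateStreamArity {

    // The late stream is assumed to declare exactly one output field.
    static final List<String> LATE_STREAM_FIELDS = Collections.singletonList("late_tuple");

    // True if the emitted values line up one-to-one with the declared fields.
    static boolean matchesDeclaredFields(List<Object> emitted) {
        return emitted.size() == LATE_STREAM_FIELDS.size();
    }

    // Hypothetical option: wrap the whole payload in one value.
    // The result has a single element, which is itself a copy of the payload list.
    static List<Object> packAsSingleValue(List<Object> payload) {
        List<Object> single = new ArrayList<>();
        single.add(new ArrayList<>(payload));
        return single;
    }

    public static void main(String[] args) {
        List<Object> payload = Arrays.asList("event-7", 42L);

        // Two values against one declared field: mismatch.
        System.out.println(matchesDeclaredFields(payload));

        // One list-valued field: matches, and carries no context reference.
        System.out.println(matchesDeclaredFields(packAsSingleValue(payload)));
    }
}
```

The trade-off is that downstream bolts would receive a list of values rather than a `Tuple`, so they would lose access to the field names and source metadata of the original tuple, which is part of why this is a design question rather than a one-line change.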
I think we need the input of original authors kosii and arunm on how to solve this bug.
Given the recent state of the community's health and the discussion earlier this year about moving the project to the Attic, I doubt that there will be much traction from the original authors. If you can think of a good solution, feel free to provide a PR or send a mail to the dev@ list to discuss a proposal in more depth.
Originally reported by jawadtahir, imported from: Processing late tuples from BaseWindowedBolt results in serialization exception