How does Spark calculate the difference between two pieces of data (Kafka data source)?

The data source is Kafka, and one field is a timestamp. We want to calculate the difference between the timestamps of two consecutive records, add a new field to store this value, and send the result out.
I looked into this. Should I use reduceByKeyAndWindow? With that, can I just set the window size to two batch intervals? Could the window then be too small, so that there is no corresponding previous record?
Or, if I set it to 10 batch intervals, would the result be 10 differences?
Is there another way?


You can use a Spark DataFrame (or Spark SQL) window function.
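
A minimal sketch of the window-function approach using lag(), assuming the Kafka value is a small JSON payload with a key and a millisecond timestamp (the broker address, topic name, and field names are placeholders, not from the question). Note that ordered window functions such as lag() are not supported on a streaming DataFrame, so this sketch reads the topic as a batch DataFrame; it is meant to illustrate the technique, not a drop-in streaming job.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object TimestampDiffSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("timestamp-diff-sketch")
      .getOrCreate()
    import spark.implicits._

    // Batch read from Kafka (requires the spark-sql-kafka package);
    // broker and topic are placeholder values.
    val raw = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Assume each Kafka value is JSON like {"key":"a","ts":1700000000000}.
    val parsed = raw.select(
      get_json_object($"value".cast("string"), "$.key").as("key"),
      get_json_object($"value".cast("string"), "$.ts").cast("long").as("ts")
    )

    // lag() pairs each record with the previous record for the same key,
    // ordered by timestamp, so the difference is simply ts - lag(ts).
    val w = Window.partitionBy($"key").orderBy($"ts")
    val withDiff = parsed.withColumn("ts_diff", $"ts" - lag($"ts", 1).over(w))

    withDiff.show(truncate = false)
    spark.stop()
  }
}
```

Compared with reduceByKeyAndWindow, this pairs every record with exactly the previous record for the same key, so there is no dependence on the window size or batch interval; the first record per key simply gets a null ts_diff.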
