I am trying to improve an event-driven processing system, which is having some problems because events are not guaranteed to arrive at the correct chronological sequence. This is due to the batch processes and the upstream caching that is currently out of my control.
The sequence errors are not a complete disaster for my processor, mainly because it is oriented to FSM and, therefore, faces "strange" transitions and, for the most part, ignores them. In addition, events are marked with the time and are expected to be delayed, so we are used to reconstructing the story. However, there are some cases that cause unwanted behavior or unnecessary double processing.
I'm looking for a technique that can make a queue, group events and sort them before processing them, or maybe identify "transactions" in the flow.
An example of a correct flow of events would be like this, taking into account that all events have the timestamp:
10:01 User login (user A) 10:02 device event 1 10:03 device event 2 10:10 End of user session (user A) 10:15 User login (user A) 10:16 End of user session (user A) 10:32 User session start (user B) 10:34 event device 3 10:35 device event 4 10:50 End of user session (user B)
My subsequent processor keeps track of who is using a device, but also needs to correlate the other events on the device with the users. It does this by maintaining the state of the sessions, while receiving the other events.
Each event is processed in practice by different workers in the message queue, with a central database. So there are potential career risks, but that is not the focus of this question.
The problems arise when the same flow arrives in this way, where … they indicate the gaps between the three "lots", since they are received much later.
10:10 End of user session (user A) 10:01 User login (user A) 10:02 device event 1 10:03 device event 2 ... 10:16 End of user session (user A) 10:15 User login (user A) 10:34 event device 3 10:35 device event 4 ... 10:50 End of user session (user B) 10:32 User session start (user B)
I am particularly interested in "the final event of the device in a session". So here I need the session 10:10 and + the event 10:03 Device 2, to complete the image. I know that any device event scheduled between 10:01 and 10:10 is "owned" by user A, therefore, when I receive event 2 of the device, I can correlate it – OK. When I receive the start event at 10:01, I can ignore it, since I saw the corresponding ending (simply annoying). When I receive event 1 of the device, I can not know if it is the last or not, so I process it. Then I receive the device event 2 immediately after and do the same work again, update the status to assume that this is the last one. I can not predict if there will be more events on the device, so the WSF has to follow that assumption, which in this case is correct.
The next batch is more difficult to deal with. I get a second "empty" session from user A, it's not a problem in itself. Then I get some device events out of sequence that are for user B session that I have not yet received. This is not a critical problem, I can update the device model associated with this information, but I still can not complete the processing.
Eventually, user B events arrive and I can correlate again with device events, again ignoring "older" events when possible.
We hope you can see that this adds a lot of difficulty to the prosecution and is probably leading to some missing cases.
What can I do to massage this flow of events to make it more actionable?
I've been thinking about:
- provision of events (but requires correct sequence)
- re-buffering the queue for X minutes (but I still can not be sure how long)
- implementing something like Nagle's algorithm for fragmentation and pause / space detection
- Combine all the workers in one, with an FSM (reflecting the boxing session) that then generates the events once they have complied with the interdependence sequence controls
- Do not fix the queue and implement a resistant random order processor
Because I can make some assumptions about the likely contents of the flow, I can consider a "transaction detector" or not make assumptions just a more generalized "reorder the flow" approach.
I know that sequence numbers would solve it easily, but as mentioned above I can not modify the upstream editor.
I'm not looking for a complete solution, just pointers to algorithms or techniques used for this kind of problem, so I can investigate more.