What is a worker node's internal shutdown sequence?

G H · February 2024

I haven't found any documentation for this - please point me at it if it exists - but I wonder what the shutdown procedure of a worker is?

I assume it stops all active listeners, then finishes processing any in-flight events before shutting each thread down?

Does it disconnect from the leader before processing those events, or otherwise remove itself from the pool of workers available to run collections?

Does it attempt to drain its persistent queues? If so, for how long, and what happens if it fails?

Do destination connections stay up till the very end, or does each get disconnected as soon as there are no more events destined for it?

Plus lots of other considerations I'm sure - it would be useful to have a bullet list or flow chart of the shutdown sequence for reference.

Jon Rust · February 2024

Stop receiving new data
Flush anything in memory either to PQ or to the destinations if they're available
After flush, close the connections to destinations
Since it's in a shutdown state, the Leader will stop allocating new collection jobs to it
Current/configured jobs will be orphaned and the Leader will reassign them
PQ drain is not attempted. The point of PQ is to be a durable store, and draining a large PQ could take a long time, which is not great when you're trying to get on with whatever work prompted the shutdown.

There are refinements in the process in development, but that's the basic idea.
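
For reference, the worker-side steps above can be sketched as a toy model in Python. This is only an illustration of the ordering described in this thread, not the product's actual implementation: the `Worker` and `Destination` classes, their fields, and the in-memory `pq` list standing in for the on-disk persistent queue are all hypothetical names made up for the example. The Leader-side behaviour (no new collection jobs, reassignment of orphaned jobs) is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """Hypothetical destination; `available` models whether it is reachable."""
    name: str
    available: bool = True
    connected: bool = True
    delivered: list = field(default_factory=list)

    def send(self, event):
        self.delivered.append(event)

    def close(self):
        self.connected = False


@dataclass
class Worker:
    """Toy model of a worker process (assumed structure, for illustration only)."""
    destinations: dict
    accepting: bool = True
    memory: list = field(default_factory=list)  # (destination name, event) pairs
    pq: list = field(default_factory=list)      # stand-in for the on-disk persistent queue
    log: list = field(default_factory=list)

    def receive(self, dest_name, event):
        if not self.accepting:
            return False  # already shutting down, new data is refused
        self.memory.append((dest_name, event))
        return True

    def shutdown(self):
        # 1. Stop receiving new data.
        self.accepting = False
        self.log.append("stop_receiving")

        # 2. Flush in-memory events: to the destination if available, otherwise to PQ.
        for dest_name, event in self.memory:
            dest = self.destinations[dest_name]
            if dest.available:
                dest.send(event)
            else:
                self.pq.append((dest_name, event))
        self.memory.clear()
        self.log.append("flush_memory")

        # 3. Only after the flush, close the destination connections.
        for dest in self.destinations.values():
            dest.close()
        self.log.append("close_destinations")

        # 4. No PQ drain is attempted: whatever is in `pq` is left in place.
        return self.log


if __name__ == "__main__":
    worker = Worker(destinations={
        "s3": Destination("s3"),
        "splunk": Destination("splunk", available=False),
    })
    worker.receive("s3", "event-a")
    worker.receive("splunk", "event-b")
    print(worker.shutdown())
    print(worker.pq)
```

In this sketch, an event bound for an unreachable destination ends up in `pq` and stays there after shutdown, which mirrors the "durable store, no drain" point above.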
