What is a worker node's internal shutdown sequence?

G H Posts: 10

I haven't found any documentation for this - please point me at it if it exists - but I wonder what the shutdown procedure of a worker is?

I assume it stops all active listeners, then finishes processing any in-flight events before shutting each thread down?

Does it disconnect from the leader before processing those events, or otherwise remove itself from the pool of workers available to run collections?

Does it attempt to drain its persistent queues? If so, for how long, and what happens if it fails?

Do destination connections stay up till the very end, or does each get disconnected as soon as there are no more events destined for it?

Plus lots of other considerations, I'm sure. It would be useful to have a bullet list or flow chart of the shutdown sequence for reference.

Comments

  • Jon Rust Posts: 487 mod

    Referencing the Slack thread (thanks @Brandon McCombs)

    • Stop receiving new data
    • Flush anything in memory either to PQ or to the destinations if they're available
    • After flush, close the connections to destinations
    • Since it's in a shutdown state, the Leader will stop allocating new collection jobs to it
    • Current/configured jobs will be orphaned and the Leader will reassign them
    • PQ drain is not attempted. The point of PQ is to be a durable store, and draining a large PQ could take a long time, which is not great when other work is waiting on the shutdown.

    Refinements to the process are in development, but that's the basic idea.
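
For reference, the sequence described in the reply above can be sketched as a small toy model. This is not the product's actual implementation: the `Destination` and `Worker` classes, the `reassign_orphaned_jobs` helper, the state names, and the round-robin reassignment are all hypothetical, chosen only to illustrate the ordering of the steps. It also assumes one reading of the flush step, namely that an in-memory event goes to its destination when that destination is available and to PQ otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """Hypothetical stand-in for an output connection."""
    name: str
    available: bool = True
    connected: bool = True
    delivered: list = field(default_factory=list)


@dataclass
class Worker:
    """Hypothetical model of a worker; not a real API."""
    destinations: dict                                   # name -> Destination
    in_memory: list = field(default_factory=list)        # (destination name, event)
    persistent_queue: list = field(default_factory=list)
    accepting_input: bool = True
    state: str = "running"

    def shutdown(self) -> list:
        steps = []

        # 1. Stop receiving new data.
        self.accepting_input = False
        self.state = "shutting_down"
        steps.append("stop_inputs")

        # 2. Flush in-memory events: deliver if the destination is
        #    available, otherwise persist to PQ (assumed interpretation).
        for dest_name, event in self.in_memory:
            dest = self.destinations[dest_name]
            if dest.available:
                dest.delivered.append(event)
            else:
                self.persistent_queue.append((dest_name, event))
        self.in_memory.clear()
        steps.append("flush_memory")

        # 3. After the flush, close the destination connections.
        for dest in self.destinations.values():
            dest.connected = False
        steps.append("close_destinations")

        # 4. PQ is deliberately left on disk; no drain is attempted.
        steps.append("skip_pq_drain")
        self.state = "stopped"
        return steps


def reassign_orphaned_jobs(assignments: dict, workers: dict) -> dict:
    """Leader-side sketch: move jobs off workers that are not running.

    `assignments` maps job id -> worker id. Round-robin is an assumption.
    """
    healthy = [wid for wid, w in workers.items() if w.state == "running"]
    if not healthy:
        return dict(assignments)  # nowhere to move the jobs yet
    result = {}
    next_idx = 0
    for job, wid in assignments.items():
        if workers[wid].state == "running":
            result[job] = wid
        else:
            result[job] = healthy[next_idx % len(healthy)]
            next_idx += 1
    return result
```

In this sketch, events already sitting in `persistent_queue` before `shutdown()` is called are left untouched, which mirrors the "PQ drain is not attempted" point; anything that was still in memory for an unavailable destination is appended to it.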