Failure of worker node

Silambarasan Selvamani · June 2023

When a worker node fails (goes down for whatever reason), what happens to the data that worker had already received for processing (for example, in-memory data or PQ data)?
Worker nodes receive data for processing in a distributed manner. How does the system ensure that data delivered to a third-party destination (e.g. Splunk) arrives in the chronological order of the events?
1. For example, if a worker node is lagging in processing data because of hardware issues or slowness, should we expect latency in the data sent to output destinations?

Raanan Dagan · July 2023

The Worker Nodes and Worker Processes are stateless. Therefore, if one of them dies, the Splunk Forwarder is notified that the TCP connection has died, and it resubmits the request to the next available Worker. Furthermore, to increase the system’s resiliency, the leader process also acts as a watchdog for Worker Processes, restarting any that exit or crash.
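To make the failover idea concrete, here is a minimal sketch in Python. It is only a toy model of the behavior described above, not Cribl or Splunk code: the `Worker` class and the `send_with_failover` function are hypothetical names, and a dead worker is simulated by raising `ConnectionError` rather than by a real TCP disconnect.

```python
from typing import List


class Worker:
    """Hypothetical stand-in for a stateless worker endpoint."""

    def __init__(self, name: str, alive: bool = True) -> None:
        self.name = name
        self.alive = alive
        self.received: List[str] = []

    def send(self, event: str) -> None:
        # A dead worker looks like a dropped connection to the sender.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.received.append(event)


def send_with_failover(event: str, workers: List[Worker]) -> str:
    """Try each worker in turn; return the name of the one that accepted."""
    for worker in workers:
        try:
            worker.send(event)
            return worker.name
        except ConnectionError:
            # The sender still holds the event, so it can resend elsewhere.
            continue
    raise RuntimeError("no worker available")


workers = [Worker("worker-1", alive=False), Worker("worker-2")]
accepted_by = send_with_failover("event-A", workers)
print(accepted_by)
```

The point of the sketch is that the sender, not the worker, keeps the event until some worker accepts it, which is why a stateless worker can fail without the request being lost in this model.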
As for PQ: for both Source PQ and Destination PQ, once the receiver is ready, the output starts draining the queues in FIFO (First In, First Out) fashion, so order is maintained.
There is another option while draining a Destination PQ: if Strict ordering is disabled, Cribl Stream prioritizes new events over draining the queue. This behaves somewhat like LIFO (Last In, First Out).
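The difference between the two drain modes can be sketched with a small Python model. This is a simplified illustration under stated assumptions, not Cribl Stream's implementation: the `drain` function and its `strict_ordering` flag are hypothetical, and the model assumes all new events are already waiting when the receiver recovers.

```python
from collections import deque
from typing import Deque, List


def drain(queued: Deque[str], incoming: Deque[str], strict_ordering: bool) -> List[str]:
    """Return the order in which events reach the destination."""
    delivered: List[str] = []
    while queued or incoming:
        if strict_ordering:
            # Queued events go out first, oldest first, then new events.
            source = queued if queued else incoming
        else:
            # New events are prioritized; the backlog drains when idle.
            source = incoming if incoming else queued
        delivered.append(source.popleft())
    return delivered


strict = drain(deque(["q1", "q2", "q3"]), deque(["n1", "n2"]), strict_ordering=True)
relaxed = drain(deque(["q1", "q2", "q3"]), deque(["n1", "n2"]), strict_ordering=False)
print(strict)   # ['q1', 'q2', 'q3', 'n1', 'n2']
print(relaxed)  # ['n1', 'n2', 'q1', 'q2', 'q3']
```

In this model, strict ordering preserves the overall arrival order at the cost of delaying new events behind the backlog, while the relaxed mode delivers fresh events sooner but lets older queued events arrive after them.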
