Recurring UI problem with one worker group. Leader restart helps temporarily.
We are seeing this issue quite often: the leader UI shows a spinning wheel whenever one particular worker group is selected, and it never loads the pipelines or routes. This happens for only one worker group; the others load fine. A restart of the leader fixes the problem temporarily. I noticed the following error on the leader, but I'm not clear on what it indicates. The CPU usage of the leader is below average.
Answers
-
Is the leader node trying to connect to the worker and failing? Is that what the message means?
0 -
<@U0110PHRCSX> What version of Cribl Stream are you running?
0 -
4.1.2
0 -
How many worker processes are connecting to the leader node?
0 -
And what is the size (CPU and RAM) of the leader node?
0 -
The leader has 8 vCPUs and 16 GB RAM.
0 -
A total of 184 worker processes
0 -
but the problematic worker group only has 6 worker processes
0 -
Only the API process from each node connects to the leader. RPC comms run over that connection.
0 -
This is a known issue with the RPC process in 4.1.1 and 4.1.2. Please upgrade to 4.1.3.
0 -
Do you have 184 processes or nodes, <@U0110PHRCSX>?
0 -
Worker processes. We have a total of 6 workers.
0 -
OK, I wanted to ensure we're using the proper terms.
0 -
I will get the upgrade done and monitor.
0 -
Are you using collectors or pull-based sources like Office365 or Kinesis Streams?
0 -
<@U012ZP93EER> Sorry for the delay in responding; I had to step out for a while.
0 -
Yes, there are REST API collectors.
0 -
OK, that increases the likelihood of encountering the known issue that Eric mentioned, so upgrading is highly recommended, as he said.
0 -
ok, thank you
0
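For anyone landing here later, a minimal sketch of the version check implied by this thread. It assumes only the two versions named above (4.1.1 and 4.1.2) are affected and that 4.1.3 carries the fix; the function names `parse_version` and `needs_upgrade` are hypothetical helpers, not part of any Cribl tooling, and the version string has to be supplied by you (for example, read from the leader UI).

```python
# Versions named in this thread as affected by the RPC issue.
# Assumption: no other releases are affected; the thread does not say.
AFFECTED_VERSIONS = {(4, 1, 1), (4, 1, 2)}
FIXED_VERSION = (4, 1, 3)  # upgrade target suggested in the thread


def parse_version(text):
    """Turn '4.1.2' into (4, 1, 2), ignoring any suffix after a hyphen."""
    core = text.strip().split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def needs_upgrade(version_text):
    """Return True if the given version is one the thread names as affected."""
    return parse_version(version_text) in AFFECTED_VERSIONS


if __name__ == "__main__":
    target = ".".join(str(n) for n in FIXED_VERSION)
    for version in ("4.1.2", "4.1.3"):
        if needs_upgrade(version):
            print(f"{version}: affected, upgrade to {target} recommended")
        else:
            print(f"{version}: not in the affected list from this thread")
```

This only encodes what was said in the replies above; it does not detect the RPC problem itself, so monitoring the worker group after the upgrade is still needed.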