One of the Workers in one of the Worker Groups seems to be stuck deploying a version
Hello all, I am facing an issue where one of the Workers in one of the Worker Groups seems to be stuck deploying a version, while the other Worker updated to the version correctly, as shown in the attached screenshot. Has anyone faced the same issue, and what kind of troubleshooting or workaround can be done? Worth mentioning that I have tried rebooting the stuck Worker twice and rebooted the Leader as well.
Answers
-
Go to the Worker CLI and type 'ps -ef | grep cribl'. There is a good chance a runaway process is still refusing to go down. Kill the process, then stop and start the Worker. That normally does the trick.
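For anyone following along, here is a rough sketch of that check as a small script. The /opt/cribl default for CRIBL_HOME and the helper name list_cribl_procs are assumptions for illustration, not something confirmed in this thread; adjust them to your install.

```shell
#!/bin/sh
# Assumed install path; override by exporting CRIBL_HOME before running.
CRIBL_HOME="${CRIBL_HOME:-/opt/cribl}"

# Hypothetical helper: list processes whose command line mentions "cribl".
# The [c] bracket keeps grep from matching its own command line.
# "|| true" keeps the exit status at 0 when nothing matches.
list_cribl_procs() {
    ps -ef | grep '[c]ribl' || true
}

list_cribl_procs

# If a process is still listed after stopping the Worker, end it by PID:
#   kill <PID>       # polite SIGTERM first
#   kill -9 <PID>    # only if it still refuses to exit
# Then start the Worker again, for example:
#   "$CRIBL_HOME/bin/cribl" start
```

The kill and start steps are left as comments on purpose, so the script only reports what is running and you decide which PID to terminate.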
0 -
Thanks Raanan, will give it a try, but can you help me understand what the runaway process is?
0 -
Check if there is any PQ data or dst staging directory in CRIBL_HOME that is many GB in size. If so, relocate those directories outside of CRIBL_HOME. It's possible the backup is taking too long, or outright failing, which causes the config to not get loaded in a timely manner.
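A quick way to spot oversized directories is to rank the top-level directories under CRIBL_HOME by size. This is only a sketch: the /opt/cribl default and the helper name largest_dirs are assumptions, and it does not know which directories hold PQ or staging data, so you still need to judge the output yourself.

```shell
#!/bin/sh
# Assumed install path; override by exporting CRIBL_HOME before running.
CRIBL_HOME="${CRIBL_HOME:-/opt/cribl}"

# Hypothetical helper: print the size in KB of each top-level directory
# under the given path, largest first, limited to the top 10.
largest_dirs() {
    du -sk "$1"/*/ 2>/dev/null | sort -rn | head -n 10
}

largest_dirs "$CRIBL_HOME"
```

Anything in the multi-GB range (several million KB) near the top of the list is a candidate for relocating outside CRIBL_HOME.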
0 -
The runaway process would be a cribl process.
0 -
Thanks <@U012ZP93EER> and <@U01J549PR6Y>. Just for the record, we found that inputs.yaml was owned by the root user, while the whole application runs as a functional user.
0 -
This was the main reason the Worker got stuck in the version allocation process after restart.
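In case it helps someone else, an ownership mismatch like this can be found with find. The sketch below assumes the functional user is named cribl and that CRIBL_HOME is /opt/cribl; both are placeholders, as is the helper name not_owned_by.

```shell
#!/bin/sh
# Assumed values; override by exporting them before running.
CRIBL_HOME="${CRIBL_HOME:-/opt/cribl}"
CRIBL_USER="${CRIBL_USER:-cribl}"

# Hypothetical helper: list every file or directory under $1
# that is NOT owned by user $2 (a user name or numeric UID).
not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Only run the check when both the user and the directory exist.
if id "$CRIBL_USER" >/dev/null 2>&1 && [ -d "$CRIBL_HOME" ]; then
    not_owned_by "$CRIBL_HOME" "$CRIBL_USER"
fi

# To fix any files that show up (run as root):
#   chown -R "$CRIBL_USER" "$CRIBL_HOME"
```

An empty result means everything under CRIBL_HOME already belongs to the functional user. The chown is left as a comment because you may also want to set the group, which depends on your environment.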
0 -
Good find, and happy to hear the issue has been fixed.