One of the Workers in one of the Worker Groups seems to be stuck in deploying a version

Hello all, I am facing an issue where one of the Workers in one of the Worker Groups seems to be stuck deploying a version, while the other Worker updated to the version correctly, as shown in the attached screenshot. Has anyone faced the same issue, and what kind of troubleshooting or workaround can be done? Worth mentioning that I have tried rebooting the stuck Worker twice and have rebooted the Leader as well.

Answers

  • Raanan Dagan

    Go to the Worker CLI and run 'ps -ef | grep cribl'. There is a good chance a runaway process is still refusing to go down. Kill the process, then do a stop followed by a start. That normally does the trick.

  • Mina Yacoub

    Thanks Raanan, I will give it a try, but can you help me understand what the runaway process is?

  • Brandon McCombs

    Check whether any PQ data or destination staging directory inside CRIBL_HOME is many GB in size. If so, relocate those directories outside of CRIBL_HOME. It's possible the backup is taking too long, or outright failing, which prevents the config from being loaded in a timely manner.

  • Brandon McCombs

    The runaway process would be a cribl process.

  • Mina Yacoub

    Thanks <@U012ZP93EER> and <@U01J549PR6Y>. Just for the record, we found that inputs.yaml was owned by the root user, while the whole application runs as a functional user.

  • Mina Yacoub

    This was the main reason the Worker was stuck in the version allocation process after the restart.

  • Raanan Dagan

    Good find, and happy to hear the issue has been fixed.
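
For anyone landing on this thread with the same symptom, two of the checks discussed above (files under CRIBL_HOME owned by the wrong user, and oversized directories inside CRIBL_HOME) can be scripted. The following is only a rough sketch, not an official Cribl tool. The function names are made up for illustration, and the install path and service user shown in the comments (/opt/cribl and cribl) are assumptions that need to be replaced with your own values.

```python
import os


def find_ownership_mismatches(root, expected_uid):
    """Return every path at or below `root` whose owner UID differs from
    `expected_uid`. Symlinks are inspected themselves, not followed."""
    mismatches = []
    if os.lstat(root).st_uid != expected_uid:
        mismatches.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:
                mismatches.append(path)
    return sorted(mismatches)


def directory_size_bytes(root):
    """Return the total size in bytes of all files below `root`.
    Files that vanish or cannot be read while scanning are skipped."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue
    return total


# Example usage (assumed values, adjust to your environment):
#   import pwd
#   uid = pwd.getpwnam("cribl").pw_uid        # hypothetical service user
#   for path in find_ownership_mismatches("/opt/cribl", uid):
#       print("wrong owner:", path)
#   print(directory_size_bytes("/opt/cribl") / 1024**3, "GiB")
```

In the case described in this thread, the first check would have flagged inputs.yaml as owned by root. Running the script as root (or with sudo) helps avoid permission errors while walking the tree.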