
Opening Worker Group configuration gives error

Oliver Hoppe Posts: 50 ✭✭

I am seeing this error when trying to access the worker group configuration from the UI:
“The Config Helper service is not available because a configuration file doesn’t exist or the settings are invalid. Please fix it and restart Cribl server”:

Best Answer

  • Robbert Hink Posts: 17
    Answer ✓

    This means that some socket files were deleted from the leader node. Stream uses socket files for inter-process communication (IPC) between the Leader and its distributed processes and services. Each worker group has a unique socket file on the leader node, and there is one for the metrics service as well as for other services.
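    If you want to see whether one of these IPC endpoints is reachable, you can try connecting to the socket file directly. The sketch below is only illustrative: the path is a placeholder, not the actual file name Cribl uses, so point it at whichever .sock file you find on your leader node.

        import socket
        import sys

        # Placeholder path -- substitute an actual .sock file found on your leader node.
        SOCKET_PATH = "/tmp/example-worker-group.sock"

        def probe(path: str) -> bool:
            """Try to open a Unix domain socket connection; return True if it accepts."""
            s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            s.settimeout(2)
            try:
                s.connect(path)
                return True
            except OSError as exc:  # missing file, refused connection, timeout, etc.
                print(f"Cannot reach {path}: {exc}", file=sys.stderr)
                return False
            finally:
                s.close()

        if __name__ == "__main__":
            path = sys.argv[1] if len(sys.argv) > 1 else SOCKET_PATH
            print("reachable" if probe(path) else "not reachable")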

    If this issue happens intermittently, any of the following scenarios can cause it:

    • Multiple configuration changes and deploys in quick succession.
    • The socket files were removed by a cleanup of the /tmp folder on the leader node (a quick way to check for this is sketched after this list).
    • Kubernetes Pod restarts after memory limits are reached, which delete the temporary folder that is the default location for the socket files.
    • Resource starvation on the leader node, which prevents connections to the underlying socket files from being established in time.
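    A quick way to confirm the /tmp cleanup scenario is to list which socket files are still present on the leader. This is a minimal sketch that assumes the default /tmp location and a .sock naming convention; adjust the directory and pattern to whatever your deployment actually uses.

        import stat
        from pathlib import Path

        # Assumptions: default /tmp location and a *.sock naming convention.
        TMP_DIR = Path("/tmp")

        def list_socket_files(directory: Path) -> list[Path]:
            """Return paths under `directory` that are Unix domain socket files."""
            sockets = []
            for path in directory.glob("*.sock"):
                try:
                    if stat.S_ISSOCK(path.stat().st_mode):
                        sockets.append(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
            return sockets

        if __name__ == "__main__":
            found = list_socket_files(TMP_DIR)
            for p in found:
                print(p)
            if not found:
                print("No *.sock files found in /tmp -- the IPC sockets may have been cleaned up.")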

    Restarting the leader node should resolve the issue in all of these scenarios, even if only temporarily. The steps outlined in our documentation (here) will help make the resolution permanent by preventing the socket files from being deleted in the first place.
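    For illustration only: on Linux systems where /tmp is cleaned by systemd-tmpfiles, one general way to keep a time-based cleanup from removing socket files is an exclusion drop-in like the one below. This is a sketch of that generic technique, not necessarily the exact steps the linked documentation recommends, and the /tmp/*.sock glob is an assumption about where and how the socket files are named in your deployment.

        # /etc/tmpfiles.d/cribl-sockets.conf
        # 'x' lines tell systemd-tmpfiles to ignore matching paths during cleanup.
        x /tmp/*.sock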

