Guidance on when to add additional processors due to a large number of consecutive tasks?

The sizing pages discuss scaling with regard to data volumes in and out. However, is there any guidance on when to add processors because of a large number of consecutive tasks (i.e., Collection Jobs)? Does each Collection Job run in its own worker process? On a 2-node worker group with 8 vCPUs each, could we run into queuing issues if 50 or 100 collectors attempted to run at the same time?

Answers

  • Jobs are broken into tasks, which are placed in a job queue on the leader node. As worker processes in the group request work, tasks are taken off the queue and handed to them. In this way, the data that was discovered is distributed as evenly as possible across the worker group.

  • Brendan Dalpe
    Posts: 201 mod

    Something to consider is the limits page regarding the number of jobs/tasks that can be run concurrently: https://docs.cribl.io/stream/collectors-job-limits/

  • It's best to avoid scheduling jobs so that they run simultaneously. Some overlap may be unavoidable, but the more worker processes that are available, the more tasks can be executed in parallel to finish a job.

  • Brendan Dalpe
    Posts: 201 mod

    Thank you all

  • Jon Rust
    Posts: 475 mod

    For larger collection use cases, I'd encourage a separate worker group dedicated to collection.
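
To get a rough feel for the queuing question, the pull-based queue described in the first answer can be sketched as a small simulation. This is only an illustration, not Cribl's scheduler: the function name `simulate_pull_queue`, the 30-second task duration, the one-task-per-job simplification, and the assumption that all 16 vCPUs act as worker processes are hypothetical, and the real concurrency limits on the limits page linked above are not modeled.

```python
import heapq
from collections import deque


def simulate_pull_queue(task_durations, worker_count):
    """Simulate idle workers pulling tasks from one shared FIFO queue.

    Returns (makespan_seconds, tasks_per_worker).
    Hypothetical model; it ignores any product-specific limits.
    """
    queue = deque(task_durations)
    # Min-heap of (time the worker becomes free, worker id); all idle at t=0.
    free_at = [(0.0, w) for w in range(worker_count)]
    heapq.heapify(free_at)
    tasks_per_worker = [0] * worker_count
    makespan = 0.0
    while queue:
        # The next worker to become idle requests the next queued task.
        now, worker = heapq.heappop(free_at)
        finish = now + queue.popleft()
        tasks_per_worker[worker] += 1
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, worker))
    return makespan, tasks_per_worker


# Assumed scenario: 2 nodes x 8 worker processes, 100 collectors,
# each producing a single 30-second task, all starting at once.
makespan, counts = simulate_pull_queue([30.0] * 100, worker_count=16)
print(f"all tasks done after {makespan:.0f}s; per-worker counts: {counts}")
```

Under these assumptions the work spreads evenly (each process handles 6 or 7 tasks), but later tasks wait in the queue for several rounds before a process frees up. Rerunning with a larger `worker_count`, or with fewer tasks arriving at once, shows how adding processes or staggering schedules shortens that wait.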