REST Collector Strategies - pull all events since last run

Hi,

For most of my SaaS services, I have to use the REST Collector to pull events periodically, because they do not support sending their events directly to Cribl.

I wonder what the recommended strategy is to ensure that all events are collected.

  • What schedule is common for running the REST Collector? Every few hours, or every few minutes?
  • Most APIs accept a "from" timestamp parameter, but which variable can I use within Cribl to reference "the last run time of a job"?
  • In general: is it better to give the API precise instructions up front, so that it does not return events that were already collected, or is it better to collect duplicate events and then drop them in Cribl? (I did the latter with a Suppression rule, but the thresholds need to be very long.)

Thanks a lot.

Comments

  • Jon Rust (moderator)

    The REST Collector currently does not have a checkpointing mechanism. Most implementations use time windows instead: for example, run every 5 minutes and grab the last 5 minutes of logs. I usually recommend offsetting that window by a minute or two, so that late-arriving events are not missed (e.g., run every 5 minutes, but set the earliest time in the API request to -6m and the latest to -1m).
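
To make the window arithmetic concrete, here is a minimal sketch in plain Python. It is not Cribl configuration; the function name `collection_window` and its parameters are made up for illustration. It assumes the job fires on a fixed schedule and that the API treats the earliest bound as inclusive and the latest as exclusive.

```python
from datetime import datetime, timedelta, timezone


def collection_window(run_time, interval_minutes=5, offset_minutes=1):
    """Return (earliest, latest) for one scheduled run.

    Hypothetical helper: with the defaults it mirrors "run every 5,
    earliest = -6m, latest = -1m".
    """
    # Truncate to the minute so that consecutive runs produce windows
    # that touch exactly, even if the scheduler fires a few seconds late.
    aligned = run_time.replace(second=0, microsecond=0)
    latest = aligned - timedelta(minutes=offset_minutes)
    earliest = latest - timedelta(minutes=interval_minutes)
    return earliest, latest


if __name__ == "__main__":
    earliest, latest = collection_window(datetime.now(timezone.utc))
    print("from:", earliest.isoformat(), "to:", latest.isoformat())
```

Because each window ends where the next one starts, two consecutive runs neither overlap nor leave a gap, as long as no run is skipped. A skipped run still leaves a hole, which is what checkpointing would address.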

    Checkpointing is in development, so this will change in the future. I don't have an ETA, though.

    Clients that have required it up to now have used an external store to retrieve the latest time in the Discover phase. Redis is a common choice.
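
The external-store idea can be sketched roughly as follows. This is plain Python showing only the logic, not how Cribl's Discover phase is actually configured. `InMemoryStore`, `next_window`, and `commit_checkpoint` are hypothetical names, and the dict-backed store stands in for Redis or any key-value store with get/set semantics.

```python
from datetime import datetime, timedelta, timezone


class InMemoryStore:
    """Stand-in for an external key-value store such as Redis (assumption)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def next_window(store, key, now, default_lookback=timedelta(minutes=5)):
    """Return (earliest, latest) starting from the stored checkpoint."""
    raw = store.get(key)
    if raw is None:
        # First run: no checkpoint yet, so fall back to a fixed lookback.
        earliest = now - default_lookback
    else:
        earliest = datetime.fromisoformat(raw)
    return earliest, now


def commit_checkpoint(store, key, latest):
    """Persist the end of the window; call only after a successful pull."""
    store.set(key, latest.isoformat())


if __name__ == "__main__":
    store = InMemoryStore()
    earliest, latest = next_window(store, "saas-a", datetime.now(timezone.utc))
    # ... call the API with earliest/latest here, then:
    commit_checkpoint(store, "saas-a", latest)
```

The checkpoint is written only after the pull succeeds, so a failed or skipped run makes the next window longer instead of leaving a gap.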