Does Cribl have a checkpoint feature, similar to Splunk's fishbucket feature?

Robbert Hink
Robbert Hink Posts: 17
edited September 2023 in General Discussions

Hi all,

Does Cribl have a checkpoint feature, similar to Splunk's fishbucket? For example: when ingesting from a source like Kafka (which doesn't know what it has sent) via Cribl to Splunk, if Cribl is down for a period of time, how does Cribl know what it has already ingested and what it hasn't?

Answers

  • Brendan Dalpe
    Brendan Dalpe Posts: 201 mod

    The Kafka Source (and by extension the Azure Event Hubs and Amazon Kinesis Sources) keeps a checkpoint of the last position it read from. This information is synced with the Leader on a periodic basis.

  • Only the Kinesis input uses the Leader in this regard. Kafka and Event Hubs commit offsets to the broker, which is standard Kafka functionality: when another consumer starts reading from a partition, the broker tells the new consumer the last offset the previous consumer committed. That way no data is missed, and it won't necessarily duplicate data either.
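  • To make the commit mechanics concrete, here is a minimal plain-Python simulation of broker-side offset commits. The `Broker` class and all names are hypothetical (this is not the Kafka client API or anything Cribl-specific); it only illustrates how a replacement consumer resumes from the last committed offset:

    ```python
    class Broker:
        """Simulated broker: one partition's records plus committed offsets per consumer group."""

        def __init__(self, records):
            self.records = records
            self.committed = {}  # consumer group -> next offset to read

        def fetch(self, group, max_records):
            # Resume from the group's committed offset (0 if nothing committed yet).
            start = self.committed.get(group, 0)
            return start, self.records[start:start + max_records]

        def commit(self, group, next_offset):
            self.committed[group] = next_offset

    broker = Broker(records=["e0", "e1", "e2", "e3", "e4"])

    # First consumer reads three records, commits its progress, then "crashes".
    start, batch = broker.fetch("cribl-workers", 3)
    broker.commit("cribl-workers", start + len(batch))

    # A replacement consumer in the same group resumes at the committed offset,
    # so nothing is missed and committed records aren't re-read.
    start, batch = broker.fetch("cribl-workers", 3)
    print(start, batch)  # resumes at offset 3 with ["e3", "e4"]
    ```

    If the first consumer had crashed *before* committing, the replacement would re-read from the older committed offset instead, which is why this gives at-least-once rather than exactly-once behavior.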

  • See also consumer groups in kafka. We've had success sending data to multiple locations that way.

  • Consumer groups don't affect the use of commits though.
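  • A quick sketch of that point, again in plain Python with hypothetical names (not a real Kafka client): each consumer group keeps its own committed offset on the same partition, so one group's reads and commits never affect another's. This is how the same topic can feed multiple destinations:

    ```python
    class Partition:
        """Simulated partition with per-group committed offsets."""

        def __init__(self, records):
            self.records = records
            self.committed = {}  # group -> next offset

        def poll(self, group, n):
            start = self.committed.get(group, 0)
            batch = self.records[start:start + n]
            self.committed[group] = start + len(batch)  # commit after each read
            return batch

    p = Partition(["a", "b", "c"])

    # Two groups read the same partition independently.
    print(p.poll("to-splunk", 2))  # ["a", "b"]
    print(p.poll("to-s3", 3))      # ["a", "b", "c"] - unaffected by the other group
    print(p.poll("to-splunk", 2))  # ["c"] - resumes from its own offset
    ```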

  • We don't have dedicated topics in Kafka, hence the original question. We use Cribl to drop what's not needed, but we obviously need that checkpoint feature between the two so that we don't re-ingest if the destination is unavailable for some reason.

  • Seems like <@U012ZP93EER> and <@U01LSBF5953> are saying opposite things?

  • The Kafka protocol inherently uses checkpoints (committed offsets). Those are sent back to the leader and used when another consumer reads from a partition. Regardless of what we do on our side, we still perform commits.