Does Cribl have a checkpoint feature, similar to Splunk's fishbucket feature?

Robbert Hink
Robbert Hink Posts: 17
edited September 2023 in General Discussions

Hi all,

Does Cribl have a checkpoint feature, similar to Splunk's fishbucket? For example: when ingesting from a source like Kafka (which doesn't know what it has sent) via Cribl to Splunk, if Cribl is down for a period of time, how does Cribl know what it has already ingested and what it hasn't?

Answers

  • Brendan Dalpe
    Brendan Dalpe Posts: 201 mod

    The Kafka Source (and by extension the Azure Event Hubs and Amazon Kinesis Sources) keeps a checkpoint of the last position it read from. This information is synced with the Leader on a periodic basis.

  • Only the Kinesis input uses the Leader in this regard. Kafka and Event Hubs commit offsets to the broker, which is standard Kafka functionality: when another consumer starts reading from a partition, the broker tells the new consumer the last offset the previous consumer committed. That way no data is missed, and it won't necessarily duplicate data either.
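  • To make the commit mechanics concrete, here is a minimal plain-Python simulation of broker-side offset commits. The `Broker` class and all names are hypothetical (this is not the Kafka client API or anything Cribl-specific); it only illustrates how a replacement consumer resumes from the last committed offset:

    ```python
    class Broker:
        """Simulated broker: one partition's records plus committed offsets per consumer group."""

        def __init__(self, records):
            self.records = records
            self.committed = {}  # consumer group -> next offset to read

        def fetch(self, group, max_records):
            # Resume from the group's committed offset (0 if nothing committed yet).
            start = self.committed.get(group, 0)
            return start, self.records[start:start + max_records]

        def commit(self, group, next_offset):
            self.committed[group] = next_offset

    broker = Broker(records=["e0", "e1", "e2", "e3", "e4"])

    # First consumer reads three records, commits its progress, then "crashes".
    start, batch = broker.fetch("cribl-workers", 3)
    broker.commit("cribl-workers", start + len(batch))

    # A replacement consumer in the same group resumes at the committed offset,
    # so nothing is missed and committed records aren't re-read.
    start, batch = broker.fetch("cribl-workers", 3)
    print(start, batch)  # resumes at offset 3 with ["e3", "e4"]
    ```

    If the first consumer had crashed *before* committing, the replacement would re-read from the older committed offset instead, which is why this gives at-least-once rather than exactly-once behavior.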

  • See also consumer groups in kafka. We've had success sending data to multiple locations that way.

  • Consumer groups don't affect the use of commits though.
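  • A quick sketch of that point, again in plain Python with hypothetical names (not a real Kafka client): each consumer group keeps its own committed offset on the same partition, so one group's reads and commits never affect another's. This is how the same topic can feed multiple destinations:

    ```python
    class Partition:
        """Simulated partition with per-group committed offsets."""

        def __init__(self, records):
            self.records = records
            self.committed = {}  # group -> next offset

        def poll(self, group, n):
            start = self.committed.get(group, 0)
            batch = self.records[start:start + n]
            self.committed[group] = start + len(batch)  # commit after each read
            return batch

    p = Partition(["a", "b", "c"])

    # Two groups read the same partition independently.
    print(p.poll("to-splunk", 2))  # ["a", "b"]
    print(p.poll("to-s3", 3))      # ["a", "b", "c"] - unaffected by the other group
    print(p.poll("to-splunk", 2))  # ["c"] - resumes from its own offset
    ```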

  • We don't have dedicated topics in Kafka, hence the original question. We use Cribl to drop what's not needed, but we obviously need that checkpoint feature between the two so that we don't re-ingest if the destination is unavailable for some reason.

  • Seems like <@U012ZP93EER> and <@U01LSBF5953> are saying opposite things?

  • The Kafka protocol inherently uses checkpoints (committed offsets). Those are sent back to the leader and used when another consumer reads from a partition. Regardless of what we do on our side, we still perform commits.