Does Cribl have a checkpoint feature, similar to Splunk's fishbucket feature?
Hi all, does Cribl have a checkpoint feature, similar to Splunk's fishbucket? E.g., when ingesting from a source like Kafka (which doesn't know what it has sent) via Cribl to Splunk, if Cribl is down for a period of time, how does Cribl know what it has already ingested and what it hasn't?
Answers
-
The Kafka Source (and, by extension, the Azure Event Hubs and Amazon Kinesis Sources) keeps checkpoints of the last timestamp it read from. This information is synced with the Leader on a periodic basis.
0 -
Only the Kinesis input uses the Leader in this regard. Kafka and Event Hubs commit offsets to the broker, which is normal Kafka functionality: when another consumer starts reading from a partition, the broker tells the new consumer the last offset the previous consumer read. This avoids missing data, without necessarily duplicating any.
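The commit-and-resume behavior described above can be sketched with a minimal in-memory simulation. The broker's offset store is modeled as a plain dict; the `Broker` and `consume` names are illustrative only, not Cribl internals or the Kafka client API:

```python
# Minimal simulation of Kafka-style offset commits: the broker stores the
# last committed offset per (group, partition), so a replacement consumer
# resumes where the previous one left off instead of re-reading data.

class Broker:
    def __init__(self, messages):
        self.log = messages        # one partition's message log
        self.committed = {}        # (group, partition) -> next offset to read

    def fetch(self, group, partition, max_records):
        start = self.committed.get((group, partition), 0)
        return start, self.log[start:start + max_records]

    def commit(self, group, partition, offset):
        self.committed[(group, partition)] = offset


def consume(broker, group, partition=0, max_records=10):
    """Read a batch, process it, then commit the new offset to the broker."""
    start, batch = broker.fetch(group, partition, max_records)
    broker.commit(group, partition, start + len(batch))
    return batch


broker = Broker([f"event-{i}" for i in range(6)])

first = consume(broker, "cribl-workers", max_records=4)   # event-0..event-3
# ...consumer goes down here; a new consumer in the same group picks up
# from the committed offset, so nothing is re-ingested:
second = consume(broker, "cribl-workers", max_records=4)  # event-4..event-5
```

In the real protocol the commit is sent to the broker's group coordinator rather than a local dict, but the resume-from-committed-offset logic is the same.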
0 -
See also consumer groups in Kafka. We've had success sending data to multiple locations that way.
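Fan-out to multiple locations works because each consumer group tracks its own committed offsets, so two groups can read the same topic independently. A small self-contained sketch of that idea (the `poll` function and group names like `to-splunk` are illustrative assumptions, not a real client API):

```python
# Sketch: each consumer group has independent committed offsets on the
# broker, so two groups (e.g. one feeding Splunk, one feeding S3) each
# receive the full stream without interfering with each other.

log = [f"event-{i}" for i in range(5)]   # one partition's message log
offsets = {}                             # group -> next offset to read

def poll(group, max_records=3):
    start = offsets.get(group, 0)
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)  # commit advances this group only
    return batch

a1 = poll("to-splunk")   # both groups start at offset 0...
b1 = poll("to-s3")       # ...and read the same events independently
a2 = poll("to-splunk")   # to-splunk continues from its own offset
```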
0 -
Consumer groups don't affect the use of commits though.
0 -
We don't have dedicated topics in Kafka, hence the original question. We're using Cribl to drop what's not needed, but we obviously need that checkpoint feature between the two so that we don't re-ingest if the destination is unavailable for some reason.
0 -
Seems like <@U012ZP93EER> and <@U01LSBF5953> are saying opposite things?
0 -
The Kafka protocol inherently uses checkpoints. Those are sent back to the leader for use when another consumer reads from a partition. Regardless of what we do on our side, we still perform commits.
0