How are you handling token persistence?
I imagine a lot of people are using the Splunk Load Balanced destination with indexer discovery. How are you handling token persistence? We patch/recycle our Cluster Master monthly, so I'm looking for a method to "restore" the tokens without impacting data delivery. I believe tokens are stored in the kvstore, $SPLUNK_HOME/etc/passwd, and $SPLUNK_HOME/etc/auth/splunk.secret files.
Answers
-
<@U02QJ374Z3R> there are many who use the Splunk LB destination, but as far as indexer discovery goes, I'm not sure how often most customers actually update the token.
0 -
With Cribl there are two ways (that I can think of) to update the indexer discovery token: manually in the UI, or via the Cribl API (rough sketch below).
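For the API route, here's roughly what that could look like. The auth and outputs endpoints come from the Cribl REST API, but the worker group name, destination id, and the "authToken" field name are assumptions, so pull your actual destination config first and verify against your version's API docs:
```
# Rough sketch: rotate the indexer discovery token on a Splunk LB destination
# via the Cribl leader's REST API. Group name, output id, and the "authToken"
# field are assumptions -- check your own exported config before relying on this.
import requests

LEADER = "https://cribl-leader.example.com:9000"   # hypothetical leader URL
GROUP = "default"                                   # hypothetical worker group
OUTPUT_ID = "splunk_lb_prod"                        # hypothetical destination id

# 1. Authenticate and grab a bearer token
login = requests.post(f"{LEADER}/api/v1/auth/login",
                      json={"username": "admin", "password": "changeme"})
login.raise_for_status()
headers = {"Authorization": f"Bearer {login.json()['token']}"}

# 2. Fetch the current destination config so only the token changes
resp = requests.get(f"{LEADER}/api/v1/m/{GROUP}/system/outputs/{OUTPUT_ID}",
                    headers=headers)
resp.raise_for_status()
conf = resp.json()["items"][0]

# 3. Swap in the freshly generated indexer discovery token
conf["authToken"] = "NEW-INDEXER-DISCOVERY-TOKEN"

# 4. Write it back; in distributed mode you still commit/deploy afterwards
requests.patch(f"{LEADER}/api/v1/m/{GROUP}/system/outputs/{OUTPUT_ID}",
               json=conf, headers=headers).raise_for_status()
```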
0 -
Once you update the token, Cribl will not impact data delivery, since (quoting the docs on Worker Process rolling restart): "During a restart, to minimize ingestion disruption and increase availability of network ports, Worker Processes on a Worker Node are restarted in a rolling fashion. 20% of running processes – with a minimum of one process – are restarted at a time. A Worker Process must come up and report as started before the next one is restarted. This rolling restart continues until all processes have restarted. If a Worker Process fails to restart, configurations will be rolled back."
0 -
I don't use Indexer Discovery, nor do I recommend it to clients. Just say no.
0 -
<@U01J549PR6Y> - it's not that I want to update the token regularly in Cribl (if at all, unless there's a breach or something). But when I recycle my cluster manager, the auth tokens get wiped on the Splunk side and would therefore be invalidated in Cribl indexer discovery. Granted this is more of a Splunk question, but I figured enough Cribl customers leverage the Splunk LB destination that someone has a solution for it.
0 -
<@UEGNG8MJB> any particular reason? Curious what better method exists to maintain a working list of online indexers.
0 -
I used DNS: one name, all the indexer IPs behind it. I'm curious why a CM restart trashes the token. Never seen that before.
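To make the "one name, all the IPs" idea concrete: Stream resolves that single record and load balances across every address it returns. A quick way to see what a name expands to (the hostname here is hypothetical):
```
# List the A records behind a single indexer DNS name.
# Hostname is hypothetical; substitute your own record and S2S port.
import socket

infos = socket.getaddrinfo("indexers.example.com", 9997, proto=socket.IPPROTO_TCP)
ips = sorted({sockaddr[0] for *_, sockaddr in infos})
print(f"{len(ips)} indexers behind one name: {ips}")
```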
0 -
Not a restart, but swapping out the EC2 instance for an updated AMI, etc.
0 -
We also recycle the indexers for the same reasons. (security compliance)
0 -
As Jon said, the most common alternative I have seen is DNS.
0 -
DNS. Cribl handles DNS round robin waaaay better than Splunk. And being an old-school network guy who prefers an NLB over any round robin DNS, that says something. The programmers did an excellent job of leveraging DNS entries that are loaded with IPs. And the Cribl load balance approach is better than Splunk's.
0 -
I've found DNS is just reliable with Stream.
0 -
Thanks. So then the 'discovery' config would look something like this?
0 -
correct!
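For anyone reading along without the screenshot, a minimal sketch of the shape of such a destination: point the Splunk LB output at the DNS name and leave indexer discovery off. The field names below are assumptions, so compare against a destination exported from your own instance:
```
# Illustrative only: a Splunk LB destination that leans on DNS instead of
# indexer discovery. Field names are assumptions -- verify before using.
splunk_lb_output = {
    "id": "splunk_lb_prod",           # hypothetical destination id
    "type": "splunk_lb",
    "indexerDiscovery": False,        # no cluster manager token to maintain
    "hosts": [
        # one DNS record that resolves to every indexer's S2S port
        {"host": "indexers.example.com", "port": 9997, "tls": "off"},
    ],
    "dnsResolvePeriodSec": 600,       # re-resolve so new/retired indexers are picked up
}
```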
0 -
great....stay tuned.
0 -
Worked like a charm. Thanks for the guidance on this, much appreciated.
0 -
One follow-up: which backpressure method best suits the Splunk LB destination? The way I understand this setting, if the destination (i.e. the Splunk indexers) cannot receive data:
» block - buffer in memory on the Cribl workers
» drop - /dev/null it
» PQ - queue it on disk on the Cribl workers until the destination is accepting data again
0 -
enable PQ
0 -
and set up some constraints around the storage
0 -
I would enable compression too (not on by default, IIRC)
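As a rough sketch of the knobs being discussed here (queue as the backpressure behavior, a size constraint, and compression), with field names assumed rather than confirmed:
```
# Hedged sketch of the PQ-related destination settings. Field names are
# assumptions -- check the Splunk LB destination docs for your Cribl version.
pq_settings = {
    "onBackpressure": "queue",             # persist to disk instead of block/drop
    "pqMaxSize": "5GB",                    # the storage constraint mentioned above
    "pqMaxFileSize": "1 MB",               # size of each queue file on disk
    "pqPath": "$CRIBL_HOME/state/queues",  # where queue files land on the worker
    "pqCompress": "gzip",                  # compression, not enabled by default
}
```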
0 -
Reading https://docs.cribl.io/stream/persistent-queues/#persistent-queue-details-and-constraints, it looks like the PQs use the workers' storage. Hard to put a number on the amount of storage a worker can allocate for PQ, since it's relative to the amount of data you'd be sending to the destination, right?
0 -
Yes: sending rate * expected downtime / number of workers * expected compression ratio
0 -
if you were doing 240 GB per day, that's 10 GB per hour; for an hour of downtime, divided by 2 workers == 5 GB per worker; then factor in compression
0 -
and add 50% :slightly_smiling_face:
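The napkin math from the last few messages, written out with the thread's numbers:
```
# PQ sizing napkin math: sending rate * expected downtime / workers, plus headroom.
daily_volume_gb = 240        # GB/day toward the Splunk LB destination
expected_downtime_hr = 1     # how long you expect the indexers to be unreachable
workers = 2                  # worker nodes sharing the load
compression_ratio = 1.0      # refine once you know your actual gzip ratio (<1.0 shrinks it)
headroom = 1.5               # "and add 50%"

per_worker_gb = (daily_volume_gb / 24) * expected_downtime_hr / workers * compression_ratio
print(f"~{per_worker_gb:.1f} GB per worker, ~{per_worker_gb * headroom:.1f} GB with headroom")
# -> ~5.0 GB per worker, ~7.5 GB with headroom
```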
0 -
Napkin math FTW, thanks a lot Jon. I did set it to 5 GB, since our workers are on Fargate and have 20 GB of ephemeral storage by default. And since PQ is a worst-case sort of scenario, i.e. the entire indexer cluster is :dumpsterfire:, it seems to be a safe setting.
0 -
eggzactly
0 -
don't overthink it
0 -
or over-provision it
0 -
> don't overthink it
too late for that :laughing:
0 -
One more note: max queue size is per Worker Process (not per Worker Node)
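So the divisor in the napkin math above is really the total count of Worker Processes, not Worker Nodes. For example (the processes-per-node count is hypothetical and depends on your worker CPU sizing):
```
# Max queue size applies per Worker Process, so size the limit accordingly.
hourly_volume_gb = 10          # 240 GB/day from the earlier example
worker_nodes = 2
processes_per_node = 4         # hypothetical; depends on worker CPU count
total_processes = worker_nodes * processes_per_node

per_process_gb = hourly_volume_gb / total_processes
print(f"~{per_process_gb:.2f} GB max queue size per Worker Process")   # ~1.25 GB
```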
0