How are you handling token persistence?
I imagine a lot of people are using the Splunk Load Balanced destination with indexer discovery. How are you handling token persistence? We patch/recycle our Cluster Master monthly, so I'm looking for a method to "restore" the tokens without impacting data delivery. I believe tokens are stored in the kvstore, $SPLUNK_HOME/etc/passwd, and $SPLUNK_HOME/etc/auth/splunk.secret files.
Answers
-
<@U02QJ374Z3R> there are many who use the Splunk LB destination, but as far as indexer discovery goes, I'm not sure how often most customers actually update the token.
0 -
With Cribl there are two ways (that I can think of) to update the indexer discovery token: manually in the UI, or via the Cribl API (rough sketch below).
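For the API route, here's roughly what that could look like. The auth and outputs endpoints come from the Cribl REST API, but the worker group name, destination id, and the "authToken" field name are assumptions, so pull your actual destination config first and verify against your version's API docs:
```
# Rough sketch: rotate the indexer discovery token on a Splunk LB destination
# via the Cribl leader's REST API. Group name, output id, and the "authToken"
# field are assumptions -- check your own exported config before relying on this.
import requests

LEADER = "https://cribl-leader.example.com:9000"   # hypothetical leader URL
GROUP = "default"                                   # hypothetical worker group
OUTPUT_ID = "splunk_lb_prod"                        # hypothetical destination id

# 1. Authenticate and grab a bearer token
login = requests.post(f"{LEADER}/api/v1/auth/login",
                      json={"username": "admin", "password": "changeme"})
login.raise_for_status()
headers = {"Authorization": f"Bearer {login.json()['token']}"}

# 2. Fetch the current destination config so only the token changes
resp = requests.get(f"{LEADER}/api/v1/m/{GROUP}/system/outputs/{OUTPUT_ID}",
                    headers=headers)
resp.raise_for_status()
conf = resp.json()["items"][0]

# 3. Swap in the freshly generated indexer discovery token
conf["authToken"] = "NEW-INDEXER-DISCOVERY-TOKEN"

# 4. Write it back; in distributed mode you still commit/deploy afterwards
requests.patch(f"{LEADER}/api/v1/m/{GROUP}/system/outputs/{OUTPUT_ID}",
               json=conf, headers=headers).raise_for_status()
```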
0 -
Once you update the token, Cribl will not impact data delivery, since (quoting the docs on Worker Process rolling restart): "During a restart, to minimize ingestion disruption and increase availability of network ports, Worker Processes on a Worker Node are restarted in a rolling fashion. 20% of running processes – with a minimum of one process – are restarted at a time. A Worker Process must come up and report as started before the next one is restarted. This rolling restart continues until all processes have restarted. If a Worker Process fails to restart, configurations will be rolled back."
0 -
I don't use Indexer Discovery, nor do I recommend it to clients. Just say no.
0 -
<@U01J549PR6Y> - it's not that I want to update the token regularly in Cribl (if at all, unless there's a breach or something). But when I recycle my cluster manager, the auth tokens get wiped on the Splunk side and would therefore be invalidated in Cribl indexer discovery. Granted this is more of a Splunk question, but I figured enough Cribl customers leverage the Splunk LB destination that someone has a solution for it.
0 -
<@UEGNG8MJB> any particular reason? Curious what better method exists to maintain a working list of online indexers.
0 -
I used DNS: one name, all the indexer IPs behind it. I'm curious why a CM restart trashes the token. Never seen that before.
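To make the "one name, all the IPs" idea concrete: Stream resolves that single record and load balances across every address it returns. A quick way to see what a name expands to (the hostname here is hypothetical):
```
# List the A records behind a single indexer DNS name.
# Hostname is hypothetical; substitute your own record and S2S port.
import socket

infos = socket.getaddrinfo("indexers.example.com", 9997, proto=socket.IPPROTO_TCP)
ips = sorted({sockaddr[0] for *_, sockaddr in infos})
print(f"{len(ips)} indexers behind one name: {ips}")
```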
0 -
Not a restart, but swapping out the EC2 instance for an updated AMI, etc.
0 -
We also recycle the indexers for the same reasons. (security compliance)
0 -
As Jon said, the most common alternative I have seen is DNS.
0 -
DNS. Cribl handles DNS round robin waaaay better than Splunk. And being an old-school network guy who prefers an NLB over any round robin DNS, that says something. The programmers did an excellent job of leveraging DNS entries that are loaded with IPs. And the Cribl load balance approach is better than Splunk's.
0 -
I've found DNS is just reliable with Stream.
0 -
Thanks. So then the 'discovery' config would look something like this?
0 -
correct!
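For anyone reading along without the screenshot, a minimal sketch of the shape of such a destination: point the Splunk LB output at the DNS name and leave indexer discovery off. The field names below are assumptions, so compare against a destination exported from your own instance:
```
# Illustrative only: a Splunk LB destination that leans on DNS instead of
# indexer discovery. Field names are assumptions -- verify before using.
splunk_lb_output = {
    "id": "splunk_lb_prod",           # hypothetical destination id
    "type": "splunk_lb",
    "indexerDiscovery": False,        # no cluster manager token to maintain
    "hosts": [
        # one DNS record that resolves to every indexer's S2S port
        {"host": "indexers.example.com", "port": 9997, "tls": "off"},
    ],
    "dnsResolvePeriodSec": 600,       # re-resolve so new/retired indexers are picked up
}
```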
0 -
great....stay tuned.
0 -
Worked like a charm. Thanks for the guidance on this, much appreciated.
0 -
One follow-up: which backpressure method best suits the Splunk LB destination? The way I understand this setting, if the destination (i.e. the Splunk indexers) cannot receive data:
» block - buffer in memory on the Cribl workers
» drop - /dev/null it
» PQ - queue it on disk on the Cribl workers until the destination is accepting data again
0 -
enable PQ
0 -
and set up some constraints around the storage
0 -
I would enable compression too (not on by default, IIRC)
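As a rough sketch of the knobs being discussed here (queue as the backpressure behavior, a size constraint, and compression), with field names assumed rather than confirmed:
```
# Hedged sketch of the PQ-related destination settings. Field names are
# assumptions -- check the Splunk LB destination docs for your Cribl version.
pq_settings = {
    "onBackpressure": "queue",             # persist to disk instead of block/drop
    "pqMaxSize": "5GB",                    # the storage constraint mentioned above
    "pqMaxFileSize": "1 MB",               # size of each queue file on disk
    "pqPath": "$CRIBL_HOME/state/queues",  # where queue files land on the worker
    "pqCompress": "gzip",                  # compression, not enabled by default
}
```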
0 -
Reading https://docs.cribl.io/stream/persistent-queues/#persistent-queue-details-and-constraints, it looks like the PQs use the workers' storage. Hard to put a number on the amount of storage a worker can allocate for PQ, since it's relative to the amount of data you'd be sending to the destination, right?
0 -
Yes: sending rate * expected downtime / number of workers * expected compression ratio
0 -
if you were doing 240 GB per day, that's 10 GB per hour; for an hour of downtime, divided by 2 workers == 5 GB per worker; then factor in compression
0 -
and add 50% :slightly_smiling_face:
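The napkin math from the last few messages, written out with the thread's numbers:
```
# PQ sizing napkin math: sending rate * expected downtime / workers, plus headroom.
daily_volume_gb = 240        # GB/day toward the Splunk LB destination
expected_downtime_hr = 1     # how long you expect the indexers to be unreachable
workers = 2                  # worker nodes sharing the load
compression_ratio = 1.0      # refine once you know your actual gzip ratio (<1.0 shrinks it)
headroom = 1.5               # "and add 50%"

per_worker_gb = (daily_volume_gb / 24) * expected_downtime_hr / workers * compression_ratio
print(f"~{per_worker_gb:.1f} GB per worker, ~{per_worker_gb * headroom:.1f} GB with headroom")
# -> ~5.0 GB per worker, ~7.5 GB with headroom
```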
0 -
Napkin math FTW, thanks a lot Jon. I did set it to 5 GB, since our workers are on Fargate and have 20 GB of ephemeral storage by default. And since PQ is a worst-case sort of scenario, i.e. the entire indexer cluster is :dumpsterfire:, it seems to be a safe setting.
0 -
eggzactly
0 -
don't overthink it
0 -
or over-provision it
0 -
> don't overthink it
too late for that :laughing:
0 -
One more note: max queue size is per Worker Process (not per Worker Node)
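So the divisor in the napkin math above is really the total count of Worker Processes, not Worker Nodes. For example (the processes-per-node count is hypothetical and depends on your worker CPU sizing):
```
# Max queue size applies per Worker Process, so size the limit accordingly.
hourly_volume_gb = 10          # 240 GB/day from the earlier example
worker_nodes = 2
processes_per_node = 4         # hypothetical; depends on worker CPU count
total_processes = worker_nodes * processes_per_node

per_process_gb = hourly_volume_gb / total_processes
print(f"~{per_process_gb:.2f} GB max queue size per Worker Process")   # ~1.25 GB
```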
0