While tuning isn’t strictly required, users sometimes have trouble getting data into Stream from Splunk universal forwarders (UFs). Usually this presents as a performance issue that results in the forwarders getting blocked by Stream. Why would you need to change anything on the UF when the forwarders can successfully send the same data to Splunk indexers without tuning?
The short answer is: Stream and Splunk are architected differently.
The work Stream does with events is quite different from what indexer processes do. Stream sends events to indexers as cooked but not compressed, and it must do significantly more processing on events than an indexer does. The default settings for Splunk forwarders work well for most Splunk environments. However, because they’re built for Splunk, they won’t necessarily apply as-is for optimal transmission to Stream. Note that while this article is focused on how to tune Splunk Universal Forwarders, the sizing calculations carry over to any input.
Which Knobs Do I Turn?
The goal is to tune the Splunk forwarders to maximize throughput without overwhelming the individual Stream Worker processes with too much data. To that end, here is the list of settings to tune in your Splunk forwarders (a consolidated configuration sketch follows the list):
- AutoLBFrequency (outputs.conf) – Use this setting to help prevent excessive TCP pinning (as Cribl calls it), also known as sticky sessions. The default value is 30 seconds, and you want to ensure your setting is no higher than the default. While there is no magic number here, the higher the setting, the worse the performance will be, because a high-volume connection can saturate a worker process if the process receives too much data over too long a period. How long is “too long” depends on your events and your Stream configuration. The utility of this setting depends on another aspect: event breaking.
- Event breaking (props.conf) – Event breaking converts the data stream into discrete events. Forwarders won’t switch to another output server if they’re sending unbroken events because they wait for an event boundary (or EOF) to close a TCP connection. Otherwise, you would get truncated events. So, if the forwarders don’t know where the event boundary is, they won’t close the connection, which renders the AutoLBFrequency setting irrelevant.
Note: Technically, the forceTimebasedAutoLB setting can be used in place of event boundaries or an EOF to switch output servers. However, this introduces a risk of truncated events. Because we have seen truncated events when customers use this setting, we explicitly note in our Splunk TCP docs that it should be turned off when sending to Stream. While Splunk indexers can mitigate the truncation, Stream has no equivalent mechanism. Because the integrity of your events is important, use this setting at your own risk. Incidentally, with Splunk v6.5+, Splunk itself recommends using Event Breakers rather than this setting.
- MaxKBps (limits.conf) – Use this setting to adjust the forwarder throughput. While it defaults to 256, you don’t need to limit throughput here as long as you have optimized all the other settings, so go ahead and set it to 0. Because the setting applies to each forwarder’s ingestion pipeline, not the tcpoutput processor, it doesn’t affect how fast data is sent to Stream, but rather how fast the forwarder ingests the data.
- pipelineSetSelectionPolicy (server.conf) – Use this setting to ensure optimal distribution of events across Splunk ingestion pipelines, and thus across outgoing TCP connections. The default is round_robin, but we’ve seen pipeline usage become quite unbalanced with the default algorithm. Changing this setting to weighted_random yields better results.
- parallelIngestionPipelines (server.conf) – This setting defaults to 1. We often ask users to increase it; doing so helps leverage more of those Worker processes in your Stream environment. Each pipeline handles data from ingress to egress in a forwarder, which dictates how many outbound TCP connections are used. The more connections over which data can be sent to Stream, the higher throughput you’ll achieve in Stream by employing more worker processes. That said, there are some caveats:
- The benefits of this setting are heavily dependent on the quantity of inputs. Because each pipeline handles both ingress and egress, additional pipelines won’t be used by a given forwarder if there aren’t enough inputs configured to require them. For example, if you have only one input stanza in your inputs.conf, using a value of two for this setting won’t leverage the extra pipeline.
Many Cribl customers have syslog devices sending events to syslog-ng running on the same host as a UF. Syslog-ng is writing events to files, and a UF is monitoring these files. High-volume firewalls can generate multiple terabytes of daily data written to multiple files in time order. However, each file set is going to be treated as a single input by a UF, so a UF will only use one connection to send each file set’s data. This can quickly saturate a worker process for high-volume sources. Your best solution for this problem is to eliminate the UF from the path, at least for the high-volume syslog sources, so they can send their data directly to Stream (via a load balancer). An alternative solution is to ensure that the forwarder’s TCP connections to Stream are shut down frequently so that they are sent to a different Stream process when they reopen; event breakers aid in allowing these connections to rotate.
- The second caveat is the added CPU cost per pipeline. If your forwarder host doesn’t have the CPU resources to add more pipelines, you risk having too few connections to distribute the data volume, which increases the chances that the Stream processes will be overloaded. In this case, you may need to add more UFs or consider eliminating them altogether, as mentioned in the syslog example.
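Pulling these together, here is a minimal sketch of how the settings above might look on a UF. The tcpout group my_stream_group and sourcetype my_sourcetype are placeholders, the event breaker regex is just a common newline-based example, and the pipeline count should come from the sizing exercise later in this article (pipelineSetSelectionPolicy also requires a Splunk version that supports it):

```
# outputs.conf
[tcpout:my_stream_group]
# No higher than the 30-second default
autoLBFrequency = 30
# Avoid the truncation risk described above when sending to Stream
forceTimebasedAutoLB = false

# props.conf (on the UF, so events are broken before forwarding)
[my_sourcetype]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)

# limits.conf
[thruput]
# 0 = unlimited; rely on the other settings to govern flow
maxKBps = 0

# server.conf
[general]
# Example value; size this per the math later in this article
parallelIngestionPipelines = 2
pipelineSetSelectionPolicy = weighted_random
```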
What is the recommended number for ingestion pipelines? Before detailing that, we need to understand how we measure throughput in Stream. If you know our sizing information, skip the next section and proceed to Pipeline Engineering.
Throughput Under a Microscope
As documented here, there is a limit to the data volume that can be processed per worker process per unit of time, so we’ve created general guidelines to help ensure your worker processes aren’t pushed beyond their limits; the exact real-world limits vary case by case. The value of 400 GB/day (x64) in Cribl documentation is based on minimal data processing with a single destination. A single destination implies ingress and egress are split evenly at 200 GB each. The more copies on egress, the lower the throughput of each TCP stream, including the ingress stream, to keep the total around 400 GB/day. The 400 GB/day recommendation is slightly higher when using Graviton (ARM) processors, but to keep this discussion simple, we'll leave the Graviton math (detailed below for x64) up to you. To simplify this analysis, take a look at the throughput introspection Splunk dashboard provided in our Splunk app hosted on GitHub, which shows throughput per worker process as well as per input and output.
Although our sizing guidance is based on a per-day volume, it is imperative to focus on volumes at smaller time scales. For example, 400 GB/day per worker process is the daily limit, but that does not mean 400 GB can be received all within, for example, one hour and nothing else for another 23 hours. You must consider physical limitations because processing isn’t free. So, in reality, be mindful of the per-second threshold to best ensure your processes aren’t overloaded at any given moment of the day. Of course, you can’t plan for every spike in traffic, but doing some level of planning helps mitigate the risk of processes displaying unexpected behavior because they are overwhelmed. In our example, 400 GB/day translates to 4.74 MB/s or 2.37 MB/s on ingress and 2.37 MB/s on egress. These are the numbers we’ll be referencing below.
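For reference, here is the arithmetic behind those numbers, as a quick sketch that assumes 1 GB = 1024 MB and perfectly even traffic across the day:

```python
# Back-of-the-envelope conversion of the 400 GB/day guideline to MB/s.
# Assumes 1 GB = 1024 MB and an even spread across 86,400 seconds,
# which real-world traffic rarely achieves.
GB_PER_DAY = 400
total_mb_per_sec = GB_PER_DAY * 1024 / 86_400      # ~4.74 MB/s total
ingress = egress = total_mb_per_sec / 2            # ~2.37 MB/s each with a single destination
print(f"{total_mb_per_sec:.2f} MB/s total, {ingress:.2f} in / {egress:.2f} out")
```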
Continuing with our example, we are striving to ensure the total throughput of a given worker process at any time does not exceed 4.74 MB/s, or, if it does, then we must strive not to let it exceed that threshold for too long. Remember, the 400 GB/day is a guideline rather than a hard limit. No processing at all (as you see with a default passthru Pipeline that has 0 functions in it) allows for a higher throughput ceiling. There is also the possibility that, in some environments, the processes won’t reach 400 GB/day, let alone exceed it. It just depends on what’s in your event processing configuration.
So, our throughput (in and out) rate is 4.74 MB/s. This is the aggregate ingress rate for all network sockets (TCP and UDP) from which a process is receiving data, plus the aggregate rate of all egress network sockets. In the simplest scenario, a process receives data over just one connection. That isn’t exactly realistic, but we’re going to simplify to make calculations easier in this exercise. We’ll assume one connection will constitute the entire ingress rate.
The other side of throughput is the egress rate. Users typically have at least some, if not all, events sent to multiple destinations. This type of configuration requires additional CPU and networking resources because each copy of data sent out to one or more destinations must be put on the wire separately in its own data stream along with the related overhead. As a result, although the overall throughput stays at 400 GB/day, the ratio of ingress vs egress traffic must be adjusted in our calculations. If we account for three outbound copies (and to simplify further, we’ll assume three full copies with event size on ingress equivalent to that of egress), that now drops the 200 GB ingress down to 100 GB/day so that we can have 300 GB/day on egress. This helps us stay within the 400 GB/day.
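To generalize that arithmetic: the ingress share of the per-process budget is the total divided by one plus the number of full egress copies. A quick sketch (the helper name is ours, and it assumes egress events are the same size as ingress):

```python
# Illustrative split of a per-process daily budget between ingress and
# N full-size egress copies (assumes equal event size in and out).
def split_budget(total_gb_per_day: float, egress_copies: int) -> tuple[float, float]:
    ingress = total_gb_per_day / (1 + egress_copies)
    return ingress, ingress * egress_copies

print(split_budget(400, 3))   # -> (100.0, 300.0): 100 GB/day in, 300 GB/day out
```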
Pipeline Engineering
We can return to the discussion of determining how many Splunk ingestion pipelines you need. Estimating this requires three inputs: the total daily data volume generated by a given forwarder, the 4.74 MB/s per-process rate (derived from the 400 GB/day/process number used previously), and the number of separate data streams (that is, copies of events) when factoring in both ingress and egress.
In other words:
- max_ingress_rate = 4.74 MB/s / #_of_tcp_streams.
- #_parallel_pipelines = forwarder’s_average_rate / max_ingress_rate (where the forwarder’s average rate is its total daily volume expressed in MB/s).
The max throughput of each event stream copy is calculated with equation 1. To simplify all these calculations, our first assumption is that the entire ingress data volume is also being sent on egress. This is in contrast to a subset of those events being sent to some destinations while a full set is being sent to others. That mixed scenario is beyond the scope of this discussion. As a result of that assumption, the max throughput of the ingress stream can be considered the same as the throughput of any of the ingress or egress streams. This greatly simplifies our calculations.
Secondly, these calculations assume the event size of each egress stream is the same as on ingress. Keep in mind that JSON Unroll, enrichment, reduction, and so forth will cause egress volume to be much different than ingress.
Let’s work through an example: one copy of data on ingress and three full copies on egress, for a total of four full copies.
1.185 MB/s per stream = 4.74 MB/s / 4.
Now, let’s assume a UF is sending 800 GB/day among many different inputs (to fully leverage multiple ingestion pipelines). This translates into an average of 9.48 MB/s.
8 pipelines = 9.48 MB/s / 1.185 MB/s.
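Putting the two equations together, here is a small sketch of that calculation. Function and parameter names are illustrative, and it inherits all of the assumptions above:

```python
import math

# Minimal sketch of equations 1 and 2 above; plug in your own
# forwarder volume and copy counts.
def min_ingestion_pipelines(forwarder_gb_per_day: float,
                            egress_copies: int,
                            per_process_gb_per_day: float = 400) -> int:
    process_rate = per_process_gb_per_day * 1024 / 86_400   # ~4.74 MB/s for 400 GB/day
    streams = 1 + egress_copies                              # one ingress + N egress copies
    max_ingress_rate = process_rate / streams                # equation 1: ~1.185 MB/s here
    forwarder_rate = forwarder_gb_per_day * 1024 / 86_400    # UF's average rate in MB/s
    return math.ceil(forwarder_rate / max_ingress_rate)      # equation 2

print(min_ingestion_pipelines(800, 3))   # -> 8 pipelines for the example above
```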
What does 8 pipelines mean? It’s the minimum number of ingestion pipelines that should be used on this particular UF to help reduce (but maybe not eliminate) the likelihood that the Stream processes receiving data from this UF will be overwhelmed, accounting for all ingress and egress data streams related to the UF’s data. These calculations can’t guarantee anything because this is only an estimate. In reality, the pipelines aren’t used perfectly evenly within a UF (the aforementioned weighted_random setting helps address that). Another reason there is no guarantee of zero processing lag is that we’re using the average of the forwarder’s daily volume; when a traffic spike occurs, there could still be some short-lived processing lag from the Stream processes.
We’re also simplifying the configuration by assuming these worker processes are not processing events from any other sources. Remember that some of your UFs will be processing differing amounts of events due to collecting different data sets, so you’ll need to do this calculation for each forwarder group.
If your configuration involves sending subsets of the ingress event stream to one or more destinations or your events on egress are larger/smaller compared to ingress, then these calculations will need to be modified accordingly. It’s difficult to account for all variables, so use this information as guidance for items to consider, and be prepared to encounter scenarios not yet accounted for and, potentially, future tuning.
Conclusion
Splunk forwarders are complicated beasts, but the same settings that make them complicated also make them as flexible as possible. Many users may never need to touch those settings in a homogeneous Splunk environment, but these settings are heavily relied upon for optimizing performance with Cribl Stream. We encourage you to review your Splunk forwarder configurations if you experience slower performance when sending to Stream than to Splunk indexers. Usually the solution requires only simple tuning of the forwarder configurations to interoperate with Cribl Stream.