Handling System Metrics overwhelming 4200 port

William Chen · November 2023

Has anyone experienced system metrics overwhelming port 4200? Are there best practices or documentation for dealing with this, and for configuring system metrics in general?

Tony Reinke - Cribl · November 2023

What are you using to send the metrics to Stream? Are you using Cribl Edge or another Stream instance?

Brandon McCombs · November 2023

The specific action depends on the actual problem with the metrics. Determining that may require more analysis or simply some trial and error.

In no particular order:

Disable the Full Fidelity option in the Cribl internal metrics source. When it is enabled, full-fidelity metrics are sent not only through the pipelines but also to the leader.
If disk IO is high then disable Metrics Persistence on the leader node. This doesn't prevent the leader node from receiving the metrics from clients, but it minimizes the impact of writing them to disk. The side effect is that if you restart the leader node (or use leader HA and have a failover) then the metrics will not be retained.
Reduce the cardinality limit in Group Settings->General->Limits->Metrics. It can be difficult to know whether to modify this, so if the metrics are being sent to an analytics tool downstream, use that tool to check whether cardinality is high for any particular metric.
In a fleet's or worker group's settings: modify the Metrics Never Drop List by removing anything that you may have added in the past. By default only total.* and system.* are specified; these populate the Event In/Out and CPU Load graphs on the leader's Monitoring page, so if you remove them (which you may need to) you'll lose those graphs. If you've added more to this list, more metrics may be reaching the leader that the clients would previously have dropped.
Add more entries to the Disable field metrics list to exclude metrics from being sent to the leader. By default this contains host, source, sourcetype, index, and project.
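
If you don't have a downstream analytics tool handy, a rough way to reason about cardinality is to count the distinct dimension combinations per metric name in a sample of events. The sketch below is only an illustration: the event shape, the "name" key, and the app.* metric names are made up for the example and are not Cribl's actual schema. It also shows how ignoring a dimension (similar in spirit to adding it to the Disable field metrics list) shrinks the number of series.

```python
from collections import defaultdict

# Hypothetical sample of metric events. The "name" key and the app.* metric
# names are assumptions for illustration, not Cribl's real event format.
SAMPLE_EVENTS = [
    {"name": "app.requests", "host": "web-1", "source": "syslog"},
    {"name": "app.requests", "host": "web-2", "source": "syslog"},
    {"name": "app.requests", "host": "web-1", "source": "http"},
    {"name": "app.requests", "host": "web-2", "source": "http"},
    {"name": "app.cpu", "host": "web-1"},
    {"name": "app.cpu", "host": "web-2"},
]


def series_cardinality(events, ignore_dims=()):
    """Count distinct dimension-value combinations per metric name.

    Dimensions listed in ignore_dims are left out of the combination,
    which approximates the effect of not tracking that field.
    """
    series = defaultdict(set)
    for event in events:
        combo = tuple(sorted(
            (key, value)
            for key, value in event.items()
            if key != "name" and key not in ignore_dims
        ))
        series[event["name"]].add(combo)
    return {name: len(combos) for name, combos in series.items()}


if __name__ == "__main__":
    # All dimensions kept: each host/source pair is its own series.
    print(series_cardinality(SAMPLE_EVENTS))
    # Ignoring "host" collapses the per-host series together.
    print(series_cardinality(SAMPLE_EVENTS, ignore_dims=("host",)))
```

With this sample, app.requests goes from 4 series to 2 once host is ignored, and app.cpu goes from 2 to 1. Real deployments multiply this effect across many hosts and sources, which is why trimming dimensions can noticeably cut what gets sent to the leader.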

If there are numerous Edge clients (more than Stream workers) then you may need to modify the Metrics to send from Edge Nodes setting to reduce the metrics being sent. By default it's set to Basic. If you've increased the metrics in the past by modifying this setting then change it back to Basic or even try Minimal.
