Observability Challenges with Kubernetes
Kubernetes can be a particularly challenging data source to manage. The transient, ephemeral nature of its workloads makes collecting observability metrics difficult. Furthermore, platform administrators frequently lack control over the specific logging configurations of each application running in Kubernetes. These applications often generate complex events, sending high volumes of data to the SIEM or logging platform. The result is expensive increases in licensing and analyst hours spent on data that doesn’t always provide significant value.
Cribl Edge and Kubernetes, a Perfect Match
While there are many tools that can be used to collect Kubernetes logs and metrics, installing, configuring, and maintaining them requires additional resources and IT costs. Moreover, these products introduce an additional layer of complexity as they lack visibility into the events occurring at the source level.
Cribl Edge is the perfect solution for monitoring Kubernetes data. Using Edge reduces the complexity of your tech stack by unifying collection and processing into a single tool, rather than relying on one tool for data ingestion and another for processing before sending data to its ultimate destination.
Why Cribl Edge for Kubernetes:
- Helm charts allow for quick and easy deployments of Cribl Edge (see the example commands after this list)
- Collect data at the actual source
- Deploy in cluster to collect the right amount of logs and metrics
- Capture console logs from containerized applications
- Capture metrics with enhanced metadata
- Filter rules allow you to easily filter out the data you do not want
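As a quick illustration of the Helm-based deployment mentioned above, a minimal install can look like the following sketch. The repo URL and chart name reflect Cribl’s published Helm charts, but the release name, namespace, and leader connection string are illustrative placeholders; confirm the current chart options in the Cribl Edge docs.

helm repo add cribl https://criblio.github.io/helm-charts
helm install cribl-edge cribl/edge --namespace cribl --create-namespace \
  --set "cribl.leader=tls://<token>@<leader-hostname>:4200"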
Making use of native functionality in Cribl Edge can provide additional benefits like optimizing metric output, reducing log/alert fatigue, and facilitating troubleshooting of issues.
Now that you know some of the benefits of using Cribl Edge and Kubernetes together, let's take a look at a specific use case.
Monitoring Kubernetes Metrics with the Sampling Function
There are several Kubernetes metrics that can be monitored:
- Kubernetes cluster metrics
- Control plane metrics
- Kubernetes node metrics
- Pod metrics
- Application metrics
Many of the metrics produced by Kubernetes pertain to the same cluster, but from different perspectives or levels of depth. This makes them ideal candidates for reduction techniques that still maintain the fidelity necessary for troubleshooting. Cribl Edge’s Sampling Function provides greater control over output than other approaches, like Filter Expressions or even increasing the polling interval.
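To make the idea concrete, here is a minimal JavaScript sketch of what N:1 sampling does. This is purely an illustration of the technique, not Cribl's actual implementation:

// Keep 1 of every N matching events; drop the rest.
const N = 30; // mirrors the Sample Rate used later in this walkthrough
let seen = 0;

function sample(event) {
  seen += 1;
  return seen % N === 1 ? event : null; // null means the event is dropped
}

Cribl's Sampling Function additionally supports multiple rules, each with its own filter and rate, which is what the walkthrough below configures.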
Let's walk through a simple use case involving monitoring memory.
Use Case: You are receiving Out Of Memory (OOM) errors because a pod is utilizing more memory than anticipated. To optimize node status output concerning memory, reducing less critical information while still powering dashboards and alerts, leverage the Sampling function: sample ‘kube_node_status_condition’ events where the condition ‘MemoryPressure’ is false.
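For reference, kube-state-metrics exposes node conditions in the Prometheus format as one series per condition/status pair, with a value of 1 marking the status currently in effect (the node name below is illustrative):

kube_node_status_condition{node="node-1",condition="MemoryPressure",status="false"} 1
kube_node_status_condition{node="node-1",condition="MemoryPressure",status="true"} 0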
Step 1: Add Source and Destination -> Create a Pipeline to Work With
First, make sure you have the source and destination of choice configured. Then, create or select the route between the source and the destination of choice.
Manage -> Fleet -> Collect
In our example, you will be working with the Kubernetes Metrics source, which collects kube-state-metrics. This data provides live information about the resource utilization of objects like pods, nodes, and deployments at a high level.
You will use Prometheus as the destination since the output schema is based on the Prometheus format, but any supported destination can be used.
Pro TIP: Other metric sources, such as cAdvisor, provide low-level stats from containers (e.g., RAM and CPU usage) and can also provide valuable insights. It’s easy to collect this information with the Prometheus Edge Scraper source.
Step 2: Add the Sampling Function and Modify the Sample Rate
Once you have created or selected the pipeline, add the Sampling Function to your Route. Change the Sample Rate from 1 to 30 so you can begin to understand how this function impacts your results.
Pipeline -> Add Function -> Search for Sampling
Leaving the filter set to ‘true’ allows all events to pass through the filter, but since you have set the Sample Rate to 30, the output will be 1 event of every 30 passing through the filter (for example, 9,000 matching events per minute become roughly 300). However, increasing the Sample Rate does not increase the fidelity of the output, and likely decreases it, because you are sampling critical and non-critical metrics alike. Continue on to see how you can get the best results.
Step 3: Set Sample Data Parameters
Capture sample data to provide the base data model for creating and testing the Sampling function.
Sample Data -> Capture Data -> Capture -> Modify Capture Time (sec), Capture up to N Events -> Start
Capture Time (sec) = 60; Capture Up to N Events = 100
Modify the Capture Time (sec) and the Capture Up to N Events values based on how large you want the sample data model to be.
Pro TIP: If you receive a “request entity too large” error, you can adjust the default sample size of 256 KB up to a maximum of 3 MB. Learn more about controlling the sample size in the Cribl Edge docs.
Step 4: Validate and Filter Sample Data
Now that you have the sample data time and event parameters set, select the fields to work with. This is a great way to home in on only the fields you are interested in sampling.
Filter Expression -> Fields -> Save as a Sample File
Add Filter Expression: ‘kube_node_status_condition >= 0’
To focus on node status fields, you can filter for the presence of a field in the event by adding ‘kube_node_status_condition >= 0’ to the Filter Expression. Because filter expressions are JavaScript, events that lack the field evaluate this to undefined >= 0, which is false, so the expression creates a sample containing only events that include ‘kube_node_status_condition’.
If you find that you need more data to work with, based on the Filter Expression, adjust your time and event parameters.
Step 5: Add Sample Rules
By adding Sampling Rules you can apply specific criteria while maintaining the desired level of fidelity for metrics.
Select -> Add Rule -> Advanced Mode
Add a rule that samples the less critical information at a rate that will still power dashboards and alerting at the destination.
Filter: status == 'false' && condition == 'MemoryPressure'
Sample Rate: 3
A sample rate of 3 was selected due to the high volume of metrics ingested from the source. This will output 1 of every 3 node status metrics where ‘MemoryPressure’ is ‘false’, indicating the node is NOT under memory pressure. For this sample, a rate of 3 reduces the output of less critical metrics while still powering dashboards and alerts.
Pro TIP: You can increase or decrease the sampling rate based on the confidence you have in the system’s performance and how closely it should be monitored. For example, if a cluster has been performing well and no changes have been made recently, you may feel comfortable increasing the sampling rate. However, if changes have been made to the cluster, or there are performance issues that need closer monitoring, decreasing the sampling rate may be appropriate.
Step 6: Preview Changes
Run the sample data model through the pipeline and see the Sampling function in action.
Sample Data -> Select File Name -> Run
Take a closer look.
Preview -> Simple -> Pipeline Diagnostics
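A useful check at this point: compare the event counts into and out of the Sampling function in the diagnostics view. With the filter matching all captured events and a Sample Rate of 30, the out count should be roughly 1/30th of the in count; a large deviation usually means the filter is not matching what you expect. (Exact panel names and layout may differ slightly between versions.)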
Step 7: Modify and Test the Sample Data Model
Further modify the output of your pipeline by adjusting the sampling rate and/or adding filters to achieve the desired reduction of data without missing critical information.
Choose to leave the Filter Expression set to ‘true’ to allow all data to pass through, or add Filter Expressions to specify relevant events. In the case of node status conditions, you do not want to limit the output to only events that contain ‘kube_node_status_condition >= 0’, as you did with the sample data. However, you may want to add more rules that filter other types of metrics to achieve the best visibility into other conditions.
Example:
status == 'false' && condition == 'DiskPressure'
status == 'false' && condition == 'PIDPressure'
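Since filter expressions are JavaScript, you could also combine these checks into a single rule with standard boolean grouping:

status == 'false' && (condition == 'DiskPressure' || condition == 'PIDPressure')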
Step 8: Check Your Work!
Validate the output of your pipeline by capturing live data at the destination.
Destination -> Capture -> Show Internal Fields
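If the rule is behaving as intended, ‘MemoryPressure’/‘false’ events should now arrive at roughly one-third of their original rate. Events passed by the Sampling function should also carry a sampled field recording the rate that was applied (verify the exact field name in the Sampling function docs), which is another quick way to confirm the function ran.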
Wrap Up!
This same process can be used to optimize many other aspects of Kubernetes-related metrics while still maintaining visibility into the state of the system as a whole. You can use the sampling technique for optimization, testing, and more. To learn more about optimization techniques, check out the Cribl Guide to Tools Optimization.