Looking into some dropped events when sending to one of my Splunk sources
Hey team! Looking into some dropped events when sending to one of my Splunk sources. Throughput is heavy and we send a large number of events, but I'm trying to figure out why some are missing. I caught an instance where I had the live data stream open, so I saw the event come in, and it never made it through. There were no error logs in Cribl stating the destination was having issues, and no error logs in Splunk stating it did something bad with the event. When I run it through the full preview, it looks like it should come out as expected on the Splunk side. After those steps, I'm trying to understand where I might even start looking for a minuscule number of dropped events.
Answers
-
Usually if you can't find it, then the timestamp was not correct.
0 -
You mean sending to your Splunk Destinations right?
0 -
And all the obvious metadata was there when it was sent, such as index, source, and sourcetype?
0 -
I can prove the timestamp was correct. I have the raw event stored from the live data capture in Cribl, and if I run it through the preview, everything seems fine.
0 -
Yessir! Top event here went through, second one did not. No resource contention, no error logs. (These are post-processing)
0 -
Hmm, upon further investigation it seems that it's actually making it to Splunk, but the field that _should_ be searchable since it's indexed (tkg_instance) appears on the event yet isn't searchable in Splunk. Seems like not a y'all problem! Sorry for raising false flags here.
0 -
Are you trying with the double colon syntax for indexed fields? `tkg_instance::dc2-okc-dmz`
0 -
Like in Splunk? I've never used anything but `=`. Let me go try that and see.
0 -
Yes, in Splunk.
0 -
To google, but tomorrow!
0 -
It worked, but I guess I need to understand why. Thanks for the quick help (that was completely not something you needed to do and very appreciated).
0 -
What kind of black magic is this?
0 -
Index-time field values in Splunk are not indexed as plain search terms. The only thing you can use for naked term search is data in `_raw`. Otherwise, you have to query the indexed fields themselves, and I believe that only works with exact term matches (no wildcards).
0 -
I know "index time" sounds like it's being indexed, but the individual terms in those field values do not make it into the index as standalone search terms. The field values themselves are stored directly in the index. Note that this does make those fields available to `| tstats`, which can give you very fast reporting on that data.
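A minimal sketch of what that could look like, assuming the events land in an index called `my_index` (a placeholder name, not from this thread) and using the `tkg_instance` value mentioned above:

```
| tstats count where index=my_index tkg_instance="dc2-okc-dmz" by sourcetype
```

Because `tstats` reads the indexed field data directly and never touches `_raw`, a query like this should still work when the field has been dropped from the raw event.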
0 -
Yeah, while confusing, I think I got that for the most part. The issue for me is why `::` works in the exact same query where `=` drops some fields. I'd also like to know how widespread this is and where else it's impacting our environment.
0 -
See `fields.conf`
0 -
Same sourcetype, same logs with a 3-4 character difference. One works and one doesn't. The inconsistency makes me less sure than ever about what's actually happening. Not that you guys have any obligation to know the answer or even help, just putting words on paper about how frustrating this specific situation is. I'll read into `fields.conf`, but it's definitely bizarre that sometimes it works and sometimes it doesn't. There are also no messages about search-time extractions, which is the only reason I'd think it would "fail" in some situations. Not everything is black and white, but I feel like I need to understand a lot more to assess whether this issue exists anywhere else and is causing ghost pains or invisible inaccuracies.
0 -
This is a global setting in Splunk that specifies which index-time fields should be treated as fields you can use the `=` syntax with. It is not scoped to applications, so be careful: it will create a field in every app.
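As a rough sketch, a `fields.conf` stanza on the search head for the field discussed here would look something like the following. The stanza name is just the field name; treat this as an illustration to check against the Splunk docs for your version rather than a drop-in config:

```
# Tell search that tkg_instance is an index-time field,
# so tkg_instance=<value> is resolved against the indexed field.
[tkg_instance]
INDEXED = true
```

With that in place, `tkg_instance=dc2-okc-dmz` should behave like `tkg_instance::dc2-okc-dmz`.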
0 -
Unfortunately it's not in there. I specifically drop it before reserializing since it exists on all fields. It wasn't necessary and took up screen space for our users trying to quickly scroll through. But we do need it to filter, so I passed it in outside of `_raw`, which to my understanding was fine. This is the first time I've noticed it having issues. We could totally put it in `_raw` though, if we think it would be more consistent, since we could use `spath` to explicitly define it or something :man-shrugging: I had only heard that indexed fields cause higher storage costs in exchange for faster searching.
0 -
If you're seeing it work with `=`, it's likely because Splunk sees the same field name in the JSON that's in `_raw` and you're extracting those fields at search time.
0 -
I guess I just thought I knew more than I did, and I need to understand a little more before we continue down this route.
0