Extracting values from a NetScaler log
This is beyond my skillset, so I was hoping someone could help me. I have NetScaler logs that come in as key<space>value pairs, and I was hoping there was a regex to extract them all. They look like this: ```Source 1.2.3.4:18356 - Vserver 4.5.6.7:389 - NatIP 8.9.0.1:18356 - Destination 6.6.6.6:389 -``` That's just one sample; the rest arrive in a similar format. I'm looking for one regex to extract the pairs and put them in fields.
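For readers who want to see what such a regex could look like, here is a minimal Python sketch run against the sample above. It assumes every value is an IPv4 address followed by a port; the pattern and the `extract_pairs` helper are illustrative names, not something posted in the thread.

```python
import re

# Matches "<Key> <ip>:<port>" pairs such as "Source 1.2.3.4:18356".
# Assumption: every value is an IPv4 address followed by a colon and a port.
PAIR_RE = re.compile(r"(?P<key>[A-Za-z]\w*) (?P<value>\d{1,3}(?:\.\d{1,3}){3}:\d+)")

def extract_pairs(line: str) -> dict:
    """Return a dict mapping each key to its "ip:port" value."""
    return {m["key"]: m["value"] for m in PAIR_RE.finditer(line)}

sample = ("Source 1.2.3.4:18356 - Vserver 4.5.6.7:389 - "
          "NatIP 8.9.0.1:18356 - Destination 6.6.6.6:389 -")
fields = extract_pairs(sample)
print(fields)
```

If other events carry values that are not `ip:port`, the value part of the pattern would need loosening, for example to `\S+`.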
Answers
hey <@ULBGHDPNY> - thanks for the help here. Was wondering if I could run another one by you. You seem to be the regex expert on this channel. I have a feed coming in that uses key-value events. The string values are enclosed in quotes ("), which makes them useless in Splunk for the TERM command. I'd like to remove the quotes, if they exist, from each key-value pair. Also, if possible, replace spaces in the value with underscores. Is that possible? Here's an example: ```1.2.3.4 devname="DEVNAME" devid="AZEVTM21000028" eventtime=1681220885920164816 tz="-0400" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" srcip=4.5.6.7 srcport=50239 srcintf="port2" srcintfrole="undefined" dstip=7.8.9.0 dstport=10000 dstintf="port5" dstintfrole="undefined" srccountry="Reserved" dstcountry="Reserved" sessionid=1316660221 proto=6 vrf=1 action="timeout" policyid=42 policytype="policy" poluuid="65eAAA79a-88Z1-51ec-87EGF8-61fd6d21915b" policyname="AWS-POLICHY-TO-MADEUP-Subnets" service="TCP-1000" trandisp="noop" duration=10 sentbyte=52 rcvdbyte=0 sentpkt=1 rcvdpkt=0 appcat="unscanned" testfield="SOME TEXT"```
wow. Thanks, so much!
No regex required for the first ask. Just use the Parser function with Operation mode set to Reserialize, Type K=V, and source field `_raw`. Boom. Quotes gone.
To combine the second ask, change the Parser to Extract mode, saving to a new field; call it `parsed`. Then use the Mask function to clean up spaces, then the Serialize function to change back to K=V. Finally, add an Eval to drop the `parsed` field.
```
{
  "conf": {
    "output": "default",
    "streamtags": [],
    "groups": {},
    "asyncFuncTimeout": 1000,
    "functions": [
      {
        "filter": "true",
        "conf": {
          "mode": "extract",
          "type": "kvp",
          "srcField": "_raw",
          "cleanFields": false,
          "allowedKeyChars": [],
          "allowedValueChars": [],
          "dstField": "parsed"
        },
        "id": "serde"
      },
      {
        "filter": "true",
        "conf": {
          "rules": [
            { "matchRegex": "/\\s+/g", "replaceExpr": "''" }
          ],
          "fields": ["parsed"],
          "depth": 5
        },
        "id": "mask"
      },
      {
        "filter": "true",
        "conf": {
          "type": "kvp",
          "fields": ["*"],
          "dstField": "_raw",
          "cleanFields": false,
          "srcField": "parsed"
        },
        "id": "serialize"
      },
      {
        "filter": "true",
        "conf": { "remove": ["parsed"] },
        "id": "eval"
      }
    ]
  },
  "id": "scottB"
}
```
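To make the extract, mask, and serialize steps concrete outside of Cribl, here is a rough Python sketch of the same flow. It is not how the Parser, Mask, or Serialize functions are implemented; `KV_RE` and `clean_kv` are hypothetical names. Note that the posted Mask rule uses `''` as its replacement, which deletes whitespace, whereas this sketch substitutes `_` to match the original ask.

```python
import re

# key=value where the value is either double-quoted or a run of non-space characters.
KV_RE = re.compile(r'(?P<key>\w+)=(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+))')

def clean_kv(raw: str) -> str:
    """Strip quotes from values, replace whitespace in values with '_', rejoin as k=v."""
    pairs = []
    for m in KV_RE.finditer(raw):
        # "Extract": take the quoted value without its quotes, else the bare value.
        value = m["quoted"] if m["quoted"] is not None else m["bare"]
        # "Mask": collapse each whitespace run inside the value into one underscore.
        value = re.sub(r"\s+", "_", value)
        pairs.append(f"{m['key']}={value}")
    # "Serialize": write the pairs back out as space-separated key=value.
    return " ".join(pairs)

sample = 'devname="DEVNAME" srcip=4.5.6.7 srcport=50239 action="timeout" testfield="SOME TEXT"'
cleaned = clean_kv(sample)
print(cleaned)
```

One difference worth knowing: this sketch silently drops any text that is not part of a key=value pair, such as the leading `1.2.3.4` in the sample event.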
awesome. I'll try this out. Thanks so much.
result sample:
did the first part. Amazing. Love this thing!
```
{
  "conf": {
    "output": "default",
    "streamtags": [],
    "groups": {},
    "asyncFuncTimeout": 1000,
    "functions": [
      {
        "filter": "true",
        "conf": {
          "mode": "extract",
          "type": "kvp",
          "srcField": "_raw",
          "cleanFields": false,
          "allowedKeyChars": [],
          "allowedValueChars": [],
          "fieldFilterExpr": "value != null && value != 'undefined'",
          "dstField": "parsed"
        },
        "id": "serde"
      },
      {
        "filter": "true",
        "conf": {
          "srcField": "parsed.eventtime",
          "dstField": "_time",
          "defaultTimezone": "local",
          "timeExpression": "time.getTime() / 1000",
          "offset": 0,
          "maxLen": 150,
          "defaultTime": "now",
          "latestDateAllowed": "+1week",
          "earliestDateAllowed": "-420weeks"
        },
        "id": "auto_timestamp"
      },
      {
        "filter": "true",
        "conf": {
          "rules": [
            { "matchRegex": "/\\s+/g", "replaceExpr": "''" }
          ],
          "fields": ["parsed"],
          "depth": 5
        },
        "id": "mask"
      },
      {
        "filter": "true",
        "conf": {
          "type": "kvp",
          "fields": ["!eventtime", "!tz", "*"],
          "dstField": "_raw",
          "cleanFields": false,
          "srcField": "parsed"
        },
        "id": "serialize"
      },
      {
        "filter": "true",
        "conf": { "remove": ["parsed"] },
        "id": "eval"
      }
    ]
  },
  "id": "scottB"
}
```
(tz doesn't make sense in the context of epoch-style timestamps)
^^^ my simple cleanup recs:
» Ditch fields that are empty or 'undefined'
» Use eventtime for _time, but then drop it and the pointless tz field
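The two cleanup recommendations can be sketched in Python as follows. This is only an illustration of the logic, not the pipeline itself: `tidy` is a hypothetical helper, and it assumes `eventtime` is an epoch value in nanoseconds, which the 19-digit sample value suggests but the thread does not state.

```python
def tidy(parsed: dict) -> dict:
    """Drop empty/'undefined' fields, move eventtime into _time, drop tz."""
    event = {k: v for k, v in parsed.items() if v not in ("", "undefined")}
    if "eventtime" in event:
        # Assumption: eventtime is epoch nanoseconds; keep whole seconds for _time.
        event["_time"] = int(event.pop("eventtime")) // 1_000_000_000
    # tz adds nothing once the timestamp is an epoch value.
    event.pop("tz", None)
    return event

parsed = {
    "eventtime": "1681220885920164816",
    "tz": "-0400",
    "srcintfrole": "undefined",
    "action": "timeout",
}
tidied = tidy(parsed)
print(tidied)
```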
yeah, I'm debating dropping the time. Need to run it by the analysts. They'll probably want to keep it until they feel more comfortable.
you're keeping it :slightly_smiling_face: in _time
Didn't realize just how many undefined values were in this feed. Dropped. Gotta love it.
Between the quotes, the undefined fields, and the time field, I'd guess you'll whack 20%+ off the overall volume.
hey <@ULBGHDPNY> - sorry to keep bugging you. I have one last one, if that's ok. I have a key<space>value feed and I want to turn it into a key=value format. I tried what you mentioned above and a few other things, but they didn't work. Any suggestions?
sure, give me a few minutes to wrap up a call
sure
can you provide a sample event with this format?
```
Apr 11 17:29:09 host1 shd_logs_bdc1nx: Status: CPULd 3.4 DskUtil 7.4 RAMUtil 17.7 Reqs 171 Band 82695 Latency 68 CacheHit 7 CliConn 19821 SrvConn 20379 MemBuf 98 SwpPgOut 54243 ProxLd 31 Wbrs_WucLd 0.0 LogLd 0.0 RptLd 0.0 WebrootLd 1.1 SophosLd 15.7 McafeeLd 0.0 WTTLd 0.0
Apr 11 17:29:03 host2 shd_logs_bdc3nx: Status: CPULd 4.9 DskUtil 8.0 RAMUtil 18.4 Reqs 184 Band 401701 Latency 147 CacheHit 4 CliConn 19516 SrvConn 20169 MemBuf 98 SwpPgOut 56092 ProxLd 45 Wbrs_WucLd 0.0 LogLd 0.0 RptLd 0.0 WebrootLd 0.0 SophosLd 19.2 McafeeLd 0.0 WTTLd 0.0
Apr 11 17:29:00 host3 shd_logs_ndc2nx: Status: CPULd 1.3 DskUtil 5.0 RAMUtil 14.6 Reqs 0 Band 0 Latency 4 CacheHit 0 CliConn 7 SrvConn 10 MemBuf 63 SwpPgOut 0 ProxLd 0 Wbrs_WucLd 0.0 LogLd 0.0 RptLd 0.0 WebrootLd 0.0 SophosLd 0.0 McafeeLd 0.0 WTTLd 0.0
```
My take on it:
» Extract the part that has the KV pairs into `payload`
» Use Regex Extract on `payload` with the `_KEY_0` and `_VALUE_0` shenanigans
» Use Serialize to push those extracted fields back into `_raw` as K=V
» Use Eval to clean up the mess
```
{
  "conf": {
    "output": "default",
    "streamtags": [],
    "groups": {},
    "asyncFuncTimeout": 1000,
    "functions": [
      {
        "filter": "true",
        "conf": {
          "source": "_raw",
          "iterations": 100,
          "overwrite": false,
          "regex": "/Status: (?<payload>.*)/"
        },
        "id": "regex_extract"
      },
      {
        "filter": "true",
        "conf": {
          "source": "payload",
          "iterations": 100,
          "overwrite": false,
          "regex": "/(?<_KEY_0>\\S+)\\s+(?<_VALUE_0>\\S+)/"
        },
        "id": "regex_extract"
      },
      {
        "filter": "true",
        "conf": {
          "type": "kvp",
          "fields": ["!_*", "!cribl_breaker", "!host", "!source", "!payload", "!index", "*"],
          "dstField": "_raw",
          "cleanFields": false
        },
        "id": "serialize"
      },
      {
        "filter": "true",
        "conf": {
          "keep": ["_raw", "_time", "source", "index"],
          "remove": ["*"]
        },
        "id": "eval"
      }
    ]
  },
  "id": "scottb2"
}
```
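The same idea in a short Python sketch, for anyone who wants to check the regexes against a sample before building the pipeline. It reuses the two patterns from the config above (with Python's `(?P<name>...)` group syntax); the `status_to_kv` helper is a hypothetical name, and the sample is a shortened version of the host3 event.

```python
import re

def status_to_kv(raw: str) -> str:
    """Turn the 'key value key value ...' text after 'Status: ' into 'key=value ...'."""
    m = re.search(r"Status: (?P<payload>.*)", raw)
    if m is None:
        # No Status payload found: leave the event unchanged.
        return raw
    # Each match consumes one key and the value that follows it.
    pairs = re.findall(r"(\S+)\s+(\S+)", m["payload"])
    return " ".join(f"{key}={value}" for key, value in pairs)

sample = "Apr 11 17:29:00 host3 shd_logs_ndc2nx: Status: CPULd 1.3 DskUtil 5.0 RAMUtil 14.6 Reqs 0"
converted = status_to_kv(sample)
print(converted)
```

Like the pipeline, this keeps only the pairs after `Status:`; the syslog timestamp and host would have to be carried separately, for example in `_time` and `host`.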
thanks. Perfect. I can learn a lot from this.