Creating New Fields from _raw
I want to create new fields from values in _raw; see below. I want to create a series of folders based on the year, month, day, and hour.
2025 (parent folder) - 01 (subfolder) - 12 (subfolder under 01) - 23 (subfolder under 12).
_raw:"2025-01-12 23:59:56","Another Value","Dummy Data","","Something Really Cool"
I'm using Regex Extraction and I can see the selection in the preview as Group 1, Group 2, etc.
However, I cannot convert these into individual fields. Any suggestions?
Comments
-
In the Regex Extraction function, you can use named groups, which you can then refer to as fields. Example:
source:_raw
regex:^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) (?<hour>\d{2})
This will result in 4 new fields in your event.
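For example, assuming the timestamp sits at the very start of _raw (if your raw line actually begins with a quote character, adjust the ^ anchor accordingly), your sample event would produce:
year=2025
month=01
day=12
hour=23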
Another option would be to parse the time, which should have happened automatically in this case. The _time field should have the epoch time in it. You can use that to "extract" the year, month, day, and hour as needed. In an Eval function you can use C.Time.strftime. For example, add a field:
mypath = C.Time.strftime(_time,"%Y/%m/%d/%H")
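Assuming the timestamp was parsed into _time correctly, your sample event would then get:
mypath = 2025/01/12/23
which lines up with the 2025 / 01 / 12 / 23 folder hierarchy you described.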
0 -
I feel like there is a next step, because when I do Full Preview, I still don't see those fields or values. I can see them in the advanced view of the Regex Extract function, but nowhere else.
0 -
What happens if you use Simple Preview? Can you share screencaps?
Edit: Full Preview is usually unneeded complexity.
0 -
Wow… It's in Simple Preview but not Full….
0 -
Full Preview takes everything into account: Source, Routes, filters, Pipelines, etc., so it requires you to consider all of those steps. It's not usually required and takes a few extra steps to use. Simple Preview is where you want to be in most cases.
1 -
John, you are awesome. So the pipeline can now pass these fields on to the destination? In my case, an S3-compatible bucket?
0 -
Yep. If you're trying to set the path for the bucket, you may want to do this in the S3 destination configs.
0 -
So, I'm parsing DNS logs from Cisco Umbrella. Cisco keeps a copy in a managed S3 bucket, but only for a rolling 30 days. I'm trying to preserve the folder structure as closely as possible, but I wanted to segment it further by adding an hour folder so I don't have as much scrolling. I'm sending a copy to my Splunk Cloud instance and a copy to some on-premises S3-compatible storage for longer-term retention.
UPDATE: I forgot to mention that when I tried to do this at the Destination, I got a bunch of unexpected token errors.
0 -
Can you share the partition expression you're using? And a sample event (post-processing) would be great too.
0 -
/${event_year}/${event_month}/${event_day}/${event_hour}/
0 -
The key prefix field is not the path field. They serve different purposes. From the docs:
> Key prefix: Root directory to prepend to path before uploading. Enter either a constant, or a JS expression (enclosed in single quotes, double quotes, or backticks) that will be evaluated only at init time.

The setting you have shown in the picture should go in the Partitioning expression, and it will also ONLY work if you have event_year, event_month, etc. fields available in the event. You'd be better off using _time as I showed above. Note the backticks are absolutely required.
`/${C.Time.strftime(_time,"%Y/%m/%d/%H")}/`
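To make that concrete: with that Partitioning expression, an event stamped 2025-01-12 23:59:56 would be written under 2025/01/12/23/ in the bucket, appended to whatever Key prefix you set (for example, a hypothetical umbrella-dns prefix would give you umbrella-dns/2025/01/12/23/...), which matches the year/month/day/hour layout you're after.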
I'd recommend reading this blog post from Cribler Ahmed Kira, too.
I have office hours available today at 08:30 PST, and Friday at 10:00 and 10:30 PST. Let me know if you'd like to get on a call to sort this out.
0