How to Ignore Timestamps in S3 Collector?
With the S3 collector, suppose I have events whose _time field is 3/1/2022, but they are in a file created today, so the file sits under a path like /dir1/<today’s_date>/dir2/. Will the earliest/latest values filter those events out if I select -24h? I do not want them filtered out; I want to be able to collect older events that get put into newly created paths.
Answers
-
Add a filter so that it only applies to the directories of interest (e.g. source.startsWith('dir1') && source.includes('dir2')).
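As a rough sketch of how a filter expression like the one above behaves (with the quotes balanced), the snippet below evaluates it against a few made-up paths in plain Node.js. The helper name matchesFilter and the sample paths are hypothetical and only there for illustration; inside the collector the expression would be evaluated against the real source value.

```javascript
// Hypothetical helper mirroring the filter expression from the answer.
// Here `source` is just a plain string so the logic can be tried locally.
function matchesFilter(source) {
  return source.startsWith('dir1') && source.includes('dir2');
}

// Made-up sample paths, for illustration only.
const samples = [
  'dir1/2024-05-01/dir2/events.json',
  'dir1/2024-05-01/other/events.json',
  '/dir1/2024-05-01/dir2/events.json',
];

// Pair each path with the filter result and print a keep/skip decision.
const results = samples.map((path) => [path, matchesFilter(path)]);
for (const [path, keep] of results) {
  console.log(`${keep ? 'keep' : 'skip'}  ${path}`);
}
```

Note that the third sample is skipped: startsWith('dir1') is false when the value begins with a slash, so it is worth checking what the source value actually looks like before relying on the expression.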
Change the timestamp portion of the Event Breaker so that the timestamp is NOT extracted.
Instead, leave the timestamp processing to a pipeline.
To force no timestamp extraction in the Event Breaker, scan with a depth of 20.
-
I'd recommend changing the way the files are written so that the file path uses the event date and time, not the current date and time. Unless the event time is in the path, there is no way to use the timestamp constraints in the collector. You'd have to rely instead on pattern matching inside the event, which is much more costly performance-wise.
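To make the recommendation concrete, here is a minimal sketch of a writer-side helper that builds the directory from the event's own time rather than the wall clock. Everything in it is an assumption for illustration: the helper name pathForEvent, the YYYY-MM-DD date layout (the original path format is not given), and _time being epoch seconds in UTC.

```javascript
// Hypothetical writer-side helper: derive the date directory from the
// event's own time instead of the current date.
// Assumes event._time is epoch seconds (UTC) and a YYYY-MM-DD layout.
function pathForEvent(event) {
  const day = new Date(event._time * 1000).toISOString().slice(0, 10);
  return `/dir1/${day}/dir2/`;
}

// 1646092800 is 2022-03-01T00:00:00Z.
console.log(pathForEvent({ _time: 1646092800 })); // prints "/dir1/2022-03-01/dir2/"
```

With a layout like this, an event from 3/1/2022 lands under a 2022-03-01 directory no matter when the file is written, so the date in the path reflects event time.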