Cribl S3 Collector collects all log files from the S3 bucket every time instead of only new ones
Hi Mates,
I'm using the Cribl S3 Collector to collect logs from an AWS S3 bucket. The bucket contains Akamai DataStream logs, with file names in the format below.
e.g.: s3://BUCKETNAME/APPNAME/ENV/ak-913478-1701139960-008071-ds.gz
ak - Akamai file prefix
913478 - random string
170xxxxxxxx - epoch timestamp
008071 - random string
ds - file suffix
When I schedule the S3 Collector to run every 15 minutes, it collects all the log files from the bucket on every run.
I'm looking for a suggestion: how do I collect only the new files from S3, rather than all the files every time?
Answers
Hi @Dinesh Raja, what does the epoch timestamp in the filename represent? When the file was created? The first event timestamp? If you can provide this info, we can help with a solution.
Hello @Brendan Dalpe
Thanks for the response. Yes, the epoch timestamp represents the file creation time.
@Dinesh Raja, you can use a path extractor in the Collector to set _time from the timestamp in the filename.
Something like this should work for you based on this path:
s3://BUCKETNAME/APPNAME/ENV/ak-913478-1701139960-008071-ds.gz
Path: /${appname}/${env}/${file}
Token: file
Extractor Expression: {"_time": value.match(/ak-\d+-(\d+)-\w+/)[1]}
This assumes the file path format is consistent, but it should get you going in the right direction. With _time set from the timestamp in the filename, the earliest/latest time range in your collection job has something to key off, and the job won't need to grab all files. Hope this helps.
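To sanity-check the regex outside Cribl, here is a minimal standalone JavaScript sketch. The function names extractTime and inWindow are hypothetical and not part of Cribl's API. The null check and the Number() conversion are additions to the expression above, and the window comparison (earliest inclusive, latest exclusive, epoch seconds) is only an assumption about how a time-range filter might behave, not a description of Cribl's internals.

```javascript
// Hypothetical helper mirroring the extractor expression; not a Cribl API.
function extractTime(fileName) {
  const m = fileName.match(/ak-\d+-(\d+)-\w+/);
  // Return null instead of throwing when the name does not follow the expected pattern.
  return m ? { _time: Number(m[1]) } : null;
}

// Hypothetical time-window check in epoch seconds.
// Assumption: earliest is inclusive and latest is exclusive.
function inWindow(fileName, earliest, latest) {
  const t = extractTime(fileName);
  return t !== null && t._time >= earliest && t._time < latest;
}

const sample = "ak-913478-1701139960-008071-ds.gz";
console.log(extractTime(sample));
console.log(inWindow(sample, 1701139500, 1701140400));
```

One thing this makes visible: if a file in the bucket does not match the ak-... pattern, the original expression would fail on the [1] index because match returns null, so a guard like the one above may be worth considering.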