Does the S3 collector pull every file from the S3 bucket on each run?

Igor Gifrin · September 2023

Good morning all. Does the S3 collector pull every file from the S3 bucket on each run? Or does it keep some state to track what has and hasn't been pulled?

Igor Gifrin · September 2023

Looks like it grabs everything, but you can specify a prefix, and if you're lucky, your prefixes are timestamps: https://community.cribl.io/discussion/comment/158#Comment_158

Paul Dott · September 2023

Hi <@U02TBJ3P6CD>, the S3 collector pulls every file that matches the criteria you enter when running an ad hoc collection or scheduling a recurring collection. Are you looking for an option to "collect only new records since last job"?
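
"Collect only new records since last job" comes down to keeping state between runs. The Python below is not how the Cribl collector works internally; it is a minimal sketch of a generic high-water-mark approach, assuming a hypothetical `S3Object` type that stands in for one entry of a bucket listing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one entry of an S3 listing."""
    key: str
    last_modified: datetime


def new_since_checkpoint(objects, checkpoint):
    """Return objects modified after `checkpoint`, plus the advanced checkpoint.

    The checkpoint is a high-water mark on LastModified: anything at or
    before it is assumed to have been collected by an earlier run.
    """
    fresh = sorted(
        (o for o in objects if o.last_modified > checkpoint),
        key=lambda o: o.last_modified,
    )
    # Advance to the newest object seen; keep the old mark if nothing is new.
    next_checkpoint = fresh[-1].last_modified if fresh else checkpoint
    return fresh, next_checkpoint
```

Note that this still lists the whole prefix on every run; it only avoids re-downloading. The strict `>` comparison can also skip an object that lands in the same second as the checkpoint, so a real job would need some overlap or de-duplication.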

Igor Gifrin · September 2023

Hey Ryan! We have a client who wants to download Cisco Umbrella data from a Cisco-hosted S3 bucket, and to be honest, we're looking for reasons to explain why it's a bad idea.

Paul Dott · September 2023

The Cribl S3 collector is optimized for replaying data that you previously partitioned and sent to S3, because in that case you control the prefix structure. That said, there is still a lot of flexibility for collecting and working with data that isn't ideally partitioned; it just takes a little trickery. Path extractors should also help: https://docs.cribl.io/stream/collectors-s3/#path-extractors
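
A path extractor pulls values out of an object's key so they can be used for filtering or as event fields. The Python sketch below only mimics that idea with a regular expression; Cribl's own path extractors are configured in the collector (see the docs link above), not written like this. The key layout `<prefix>/<log type>/YYYY-MM-DD/<file>` is an assumption for illustration.

```python
import re
from datetime import date

# Assumed key layout: <anything>/<log type>/<YYYY-MM-DD>/<file name>
KEY_PATTERN = re.compile(
    r"(?P<log_type>[^/]+)/(?P<day>\d{4}-\d{2}-\d{2})/(?P<file>[^/]+)$"
)


def extract_fields(key):
    """Return the fields encoded in `key`, or None if the key doesn't match."""
    m = KEY_PATTERN.search(key)
    if m is None:
        return None
    return {
        "log_type": m["log_type"],
        "day": date.fromisoformat(m["day"]),
        "file": m["file"],
    }


def keys_in_range(keys, earliest, latest):
    """Keep keys whose extracted day falls in [earliest, latest)."""
    selected = []
    for key in keys:
        fields = extract_fields(key)
        if fields is not None and earliest <= fields["day"] < latest:
            selected.append(key)
    return selected
```

Filtering on extracted fields like this narrows what gets read, but the keys still have to be listed first; only a well-chosen prefix reduces the listing itself.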

Franky Laarits · September 2023

We set up an SQS notification for every object added to the S3 bucket, so only new data was pulled.
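
With the SQS approach, S3 publishes an event notification for each new object, and the consumer fetches only the keys named in those messages. The sketch below parses the standard S3 event notification JSON (`Records[].eventName`, `Records[].s3.bucket.name`, `Records[].s3.object.key`); receiving from the queue and downloading the objects are left out, and the function name `new_objects` is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def new_objects(message_body):
    """Return (bucket, key) pairs for ObjectCreated events in one SQS message body.

    S3 URL-encodes object keys in event notifications (spaces become '+'),
    so keys are decoded with unquote_plus before use.
    """
    event = json.loads(message_body)
    pairs = []
    # The test event S3 sends when notifications are first configured has no Records.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs
```

If the notifications are routed through SNS before reaching SQS, the S3 event arrives wrapped in the SNS envelope's `Message` field and has to be unwrapped first.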

Raanan Dagan · September 2023

<@U02TBJ3P6CD> is there a way to pass the "earliest" and "latest" times as part of the bucket structure? The last time we tried to access it, for example at this location: s3://cisco-managed-us-west-1/2069997_6ff2802af17337def701c2e7816cf14913zf848a/

we created a Cribl S3 collector with these settings:
» Bucket name = cisco-managed-us-west-1
» Path = 2069997_6ff2802af17337def701c2e7816cf14913zf848a/
» Region = same region as the Cisco-managed bucket
» Authentication = manual (key / secret)
» Verify bucket permissions = off
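
If the objects under that path are partitioned by day, one way to honor an earliest/latest window is to list only the day prefixes inside it instead of walking the whole path. Cribl's S3 collector docs describe time-based tokens in the collector Path for this purpose; the Python below only illustrates the idea. The `<path><log type>/YYYY-MM-DD/` layout and the `day_prefixes` helper are assumptions, not a confirmed description of the Cisco-managed bucket, so check the real key names first.

```python
from datetime import date, timedelta


def day_prefixes(path, log_type, earliest, latest):
    """List one S3 prefix per day in [earliest, latest).

    Assumed layout: <path><log_type>/YYYY-MM-DD/
    Listing only these prefixes keeps a run from walking the whole path.
    """
    if not path.endswith("/"):
        path += "/"
    prefixes = []
    day = earliest
    while day < latest:
        prefixes.append(f"{path}{log_type}/{day.isoformat()}/")
        day += timedelta(days=1)
    return prefixes
```

Each resulting prefix could then be used as the `Prefix` of a ListObjectsV2 request, so a one-day window touches one day's objects rather than the bucket's full history.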
