Looking for insight on the *Time Range* selection when running against an Amazon S3 bucket

Hi Cribl community. Could anyone provide some additional insight on the Time Range selection when running against an Amazon S3 bucket? The S3 bucket is full of .gz CSV logs which are updated periodically (8 files per 10-minute period). See attached.

Beginner's naivety suggested that setting a Relative time (-30m, for example) would only pull files in S3 last updated within that relative time. However, it pulls all files, as if I had not set a time filter at all. Another thought is that this time range does not apply to the files themselves, but to the EVENT times contained within the files. In that case, would Cribl need to pull all files before it could filter on event time? https://docs.cribl.io/stream/collectors-schedule-run

Thank you! (As an alternative, we could use the Amazon S3-specific collector, which uses event notifications/SQS, but we would like to at least understand the above.)

Answers

  • David Maislin Posts: 230 mod

    Jon, what does the Path look like to get to this folder?

  • Raanan Dagan Posts: 101 mod

    `/s3_folder/dnslogs/YYYY-MM-DD/` (one folder per day)

  • David Maislin Posts: 230 mod

    So in your Collector, your Path is set to something like this: `/s3_folder/dnslogs/${_time:%Y}-${_time:%m}-${_time:%d}/`?
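
    To make the token expansion concrete, here's a rough stand-in in plain JavaScript for what a -30m relative range would resolve to (my sketch, not Cribl internals; assumes UTC-named folders):

    ```js
    // Emulate how a -30m Time Range maps onto date-partitioned folders.
    // The ${_time:%Y}-${_time:%m}-${_time:%d} tokens render as YYYY-MM-DD.
    const now = Date.now();
    const earliest = now - 30 * 60 * 1000; // relative -30m

    const day = (ms) => new Date(ms).toISOString().slice(0, 10); // YYYY-MM-DD (UTC)

    // Distinct day folders covered by the range (two if it spans midnight).
    for (const d of new Set([day(earliest), day(now)])) {
      console.log(`/s3_folder/dnslogs/${d}/`);
    }
    ```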

  • Raanan Dagan Posts: 101 mod

    It is not... yet 🙂 Currently working on the best way to collect only the most recent logs while learning Cribl along the way. I suspect this is the correct thing to do. But what, then, does the Time Range do?

  • Raanan Dagan Posts: 101 mod

    I believe the path supplied in Cribl ended in `dnslogs/`

  • David Maislin Posts: 230 mod

    IIRC, it will go to the correct folder based on your relative time and start collecting the events in that folder. If, upon uncompressing the .gz files, it sees an event with a timestamp outside that range, it will not collect that specific event.
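
    In other words, a rough sketch of that two-stage behavior (illustrative only, not Cribl source):

    ```js
    // Stage 1 (discovery): path tokens limit which day folders get listed.
    // Stage 2 (collection): each decompressed event is checked individually,
    // so an out-of-range event inside an in-range folder is dropped.
    const earliest = Date.parse('2024-01-15T09:30:00Z') / 1000; // hypothetical range
    const latest   = Date.parse('2024-01-15T10:00:00Z') / 1000;

    const events = [
      { _time: 1705311000, msg: 'inside range'  }, // 2024-01-15T09:30:00Z
      { _time: 1705224600, msg: 'outside range' }, // 2024-01-14T09:30:00Z
    ];
    console.log(events.filter(e => e._time >= earliest && e._time <= latest)
                      .map(e => e.msg)); // [ 'inside range' ]
    ```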

  • Raanan Dagan Posts: 101 mod

    THAT makes more sense... which is why ALL files needed to be pulled down to be checked. Not that we need this, but is there a way to use tokens to only pull files based on last modified time?

  • David Maislin Posts: 230 mod

    Does the Destination path to the S3 bucket look like this: `/s3_folder/dnslogs/${C.Time.strftime(_time ? _time : Date.now()/1000, '%Y-%m-%d')}`?
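
    As an illustration of what that expression evaluates to, here's a plain-JavaScript stand-in for `C.Time.strftime` (assumes UTC; Cribl evaluates the real thing inside the Destination's partitioning expression):

    ```js
    // Stand-in for C.Time.strftime(secs, '%Y-%m-%d').
    const ymd = (epochSecs) => new Date(epochSecs * 1000).toISOString().slice(0, 10);

    const _time = undefined; // an event with no parsed timestamp falls back to "now"
    const path = `/s3_folder/dnslogs/${ymd(_time ? _time : Date.now() / 1000)}`;
    console.log(path); // e.g. /s3_folder/dnslogs/2024-01-15
    ```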

  • Raanan Dagan Posts: 101 mod

    We haven't specified anything with tokens, etc. This is our first connection to S3, and we just started using Cribl this month.

  • David Maislin Posts: 230 mod

    The examples above are not using tokens, just JavaScript expressions.

  • Raanan Dagan Posts: 101 mod

    We haven't used JavaScript for any paths.

  • David Maislin Posts: 230 mod

    The Destination just writes out the event to the correct folder using the values extracted from `_time`, and the Collector does similar.

  • Raanan Dagan Posts: 101 mod

    Okay. We are only ingesting data from the S3 bucket. Do you know if there's a way to only pull in files based on last modified time?

  • Raanan Dagan Posts: 101 mod

    I didn't see anything like that
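
    (For reference, filtering on last-modified is possible at the S3 API level itself; here is a sketch with the AWS SDK for JavaScript v3, outside Cribl. The bucket name and region are placeholders.)

    ```js
    import { S3Client, ListObjectsV2Command } from '@aws-sdk/client-s3';

    const s3 = new S3Client({ region: 'us-east-1' }); // placeholder region
    const cutoff = Date.now() - 30 * 60 * 1000;       // "last 30 minutes"

    // List objects under the prefix, then keep only recently modified ones.
    const { Contents = [] } = await s3.send(new ListObjectsV2Command({
      Bucket: 'my-bucket',            // placeholder bucket
      Prefix: 's3_folder/dnslogs/',
    }));
    console.log(Contents
      .filter(o => o.LastModified.getTime() >= cutoff)
      .map(o => o.Key));
    ```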

  • Raanan Dagan Posts: 101 mod

    Also wondering: assuming we configure Cribl to pull in only the current day's logs, and then narrow down to the current timeframe (an hour, or a 10-minute period), how would we go about ensuring that 1) duplicate events are not processed and 2) events are not missed?

  • Raanan Dagan Posts: 101 mod

    (I think this is where the Amazon S3 connector comes into play... using event notifications/SQS instead of the standard S3 collector.)
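
    Conceptually, that model hands the collector one queue message per new object, so it never has to re-scan folders. A sketch of the raw flow (AWS SDK for JavaScript v3; the queue URL and region are placeholders, and Cribl's Amazon S3 source does this for you):

    ```js
    import { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } from '@aws-sdk/client-sqs';

    const sqs = new SQSClient({ region: 'us-east-1' }); // placeholder region
    const QueueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/s3-events'; // placeholder

    // Each S3 event notification names the new object; deleting the message
    // after it's processed is what avoids reprocessing the same file.
    const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({ QueueUrl, WaitTimeSeconds: 10 }));
    for (const m of Messages) {
      for (const r of JSON.parse(m.Body).Records ?? []) {
        console.log('new object:', r.s3.object.key);
      }
      await sqs.send(new DeleteMessageCommand({ QueueUrl, ReceiptHandle: m.ReceiptHandle }));
    }
    ```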

  • Raanan Dagan Posts: 101 mod

    Thanks for the assistance, David! Hopping off Slack and will put in a ticket for the remaining Qs.