
Filesystem Collector and Event Breaker Inconsistencies

amiller
amiller Posts: 21
edited September 2023 in Stream

I created an event breaker rule that works in the Knowledge area of Cribl Stream, but when I run a Filesystem Collector job to pick up the same file I used to build the breaker, it does not work.

Event breaker rules in

Answers

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod

    Andrew,
    I have a feeling your header line regex is matching the first #separator line. I can test it, but I think you want to change your header line to ^#[Ff] so that we ignore the lines before the #fields line.
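
To see the difference between the two header patterns, here is a minimal Python sketch. It only shows which lines of a Zeek-style header each regex matches; it does not reproduce how Stream's event breaker applies the header line setting. The sample lines are abbreviated from the file posted in this thread, and `matching_lines` is a hypothetical helper name.

```python
import re

# Abbreviated header lines from the sample file in this thread (tabs as \t).
header_lines = [
    "#separator \\x09",
    "#set_separator\t,",
    "#empty_field\t(empty)",
    "#unset_field\t-",
    "#path\tconn",
    "#open\t2021-11-12-12-45-00",
    "#fields\tts\tuid\tid.orig_h",
    "#types\ttime\tstring\taddr",
]


def matching_lines(pattern, lines):
    """Return the lines in which the regex finds a match."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


all_comments = matching_lines(r"^#", header_lines)
fields_only = matching_lines(r"^#[Ff]", header_lines)

print(len(all_comments))  # count of lines that start with '#'
print(fields_only)        # lines that start with '#f' or '#F'
```

With this sample, `^#` matches every header line while `^#[Ff]` matches only the `#fields` line.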

  • Jon Rust
    Jon Rust Posts: 443 mod

    Can you confirm you've committed and deployed?

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod

    Yeah, sorry, I think the header line is actually excluding everything that starts with #. What do your first few events look like on the file import?

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod

    Sorry, I mean what do the first few events look like in Stream when you run the job? You had a screenshot above, but it started at event 7. I'm just wondering how the first few events look.

  • amiller
    amiller Posts: 21

    The breaker works for me in the preview, but not when I run an actual collection.

    sample.tsv: UTF-8 Unicode (with BOM) text, with CRLF line terminators

  • amiller
    amiller Posts: 21

    It is being pulled from an NFS mount (Synology NAS) on a Linux (Ubuntu 20.04) VM.

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod

    Which is strange since it is showing you are hitting the tsv-bro breaker.

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod

    How big is the file you are collecting?

  • amiller
    amiller Posts: 21

    The filetype is just a variable on the path for a filesystem. I have about 15 different "filetypes"… I'm hitting the right breaker, as seen in my collection. I'm on Cribl 3.4.1 for both leader and worker.

  • amiller
    amiller Posts: 21

    It's a 10,000-line sample, but when the file actually comes in it's about 10-50 GB.

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod

    Can you try bumping your max event size to the maximum, 134217728 (128 MB)? How big are your working IIS logs?

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod

    You could also try to bump up the event breaker buffer timeout on the file system collector.

  • amiller
    amiller Posts: 21

    The IIS logs would have been about the same size, 10,000 line sample. Each event about the same size. Those whole files are anywhere from 100KB to 2-3GB. Any recommendation on the buffer timeout?

    I will recreate the whole event breaker tomorrow using the same config and the max size you recommend, and test again in case I flubbed something else up somewhere. I'll let you know!

  • amiller
    amiller Posts: 21

    I'm still having the issue. This was my order of operations to fix/recreate the issue:

    1. Increased the event breaker buffer timeout on the collector source to 600000, commit, deploy. Unsuccessful.
    2. Deleted the event breaker, commit, deploy.
    3. Recreated the event breaker with the settings you recommended, commit, deploy.
    4. Filesystem collection has the same results/symptoms.
    5. Connected to the worker UI from the leader and verified the breaker exists on the worker. The file preview with the breaker on the worker works.

    I checked logs for the specific adhoc run and I only see 1 error.

    {
      "time": "2022-05-04T13:29:35.668Z",
      "cid": "api",
      "channel": "Job",
      "level": "error",
      "message": "failed to cancel task",
      "jobId": "1651670972.72.adhoc.NDCA-collector",
      "taskId": "collect.0",
      "reason": {
        "message": "Instance 1651670972.72.adhoc.NDCA-collector|collect.0 not registered",
        "stack": "RpcInstanceNotFoundError: Instance 1651670972.72.adhoc.NDCA-collector|collect.0 not registered\n at /opt/cribl/bin/cribl.js:14:13169102\n at /opt/cribl/bin/cribl.js:14:11427356\n at runMicrotasks ()\n at processTicksAndRejections (internal/process/task_queues.js:95:5)\n at async k.handleRequest (/opt/cribl/bin/cribl.js:14:13168338)",
        "name": "RpcInstanceNotFoundError",
        "req": {
          "instanceId": "1651670972.72.adhoc.NDCA-collector|collect.0",
          "method": "cancel",
          "args": []
        }
      },
      "source": "/opt/cribl/state/jobs/default/1651670972.72.adhoc.NDCA-collector/logs/job/job.log"
    }

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod

    That just looks like the worker was restarting. Have you tried collecting a small file (~20 events)?

  • amiller
    amiller Posts: 21

    Yeah, the error looked benign, but just providing info.

    I edited the file down to about 20 lines, changed EOL to LF from CRLF. Same symptoms.

  • amiller
    amiller Posts: 21
    Answer ✓

    I "solved" the issue. This may need to become an engineering ticket.

    I looked back at the file encoding…
    sample.tsv: UTF-8 Unicode (with BOM) text, with CRLF line terminators

    There may be an issue with the filesystem collector interpreting the byte order mark?

    I used Notepad++ to remove the BOM and re-encode the file as plain UTF-8, and it worked.
    This is a band-aid fix for me, as converting the encoding of 50-100GB of files each day prior to ingest is not particularly scalable or effective.

    Thanks Dan for the assist!
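
For anyone scripting the same workaround instead of using an editor, here is a minimal Python sketch that drops a leading UTF-8 BOM while copying the file in chunks, so a multi-GB file never has to fit in memory. The function name `strip_utf8_bom` and the paths in the usage comment are hypothetical, and this still rewrites every file, so it does not remove the scaling concern mentioned above.

```python
import codecs
import shutil


def strip_utf8_bom(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src to dst, dropping a leading UTF-8 BOM if present.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(len(codecs.BOM_UTF8))
        had_bom = head == codecs.BOM_UTF8
        if not had_bom:
            dst.write(head)  # not a BOM, so keep these bytes
        # Stream the remainder of the file in fixed-size chunks.
        shutil.copyfileobj(src, dst, chunk_size)
    return had_bom


# Hypothetical usage; the paths are placeholders:
# strip_utf8_bom("/mnt/nas/sample.tsv", "/mnt/nas/sample.nobom.tsv")
```

Only the first three bytes are inspected; everything after them is copied unchanged, including the CRLF line terminators.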

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod

    Yes, encoding has hit me a few times. My next suggestion was to run head on your file and see if you had any extra stuff at the beginning. I will add this to a feature request I already had in for supporting additional encodings on the Filesystem Collector.
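
As an illustration of how "extra stuff at the beginning" can defeat an anchored pattern, the Python sketch below decodes the first header line with and without a UTF-8 BOM and tests it against `^#`. This is a general demonstration of the effect; whether the Filesystem Collector behaves the same way internally is an assumption that this thread does not confirm. `starts_with_hash` is a hypothetical helper name.

```python
import codecs
import re

first_line = b"#separator \\x09"
with_bom = codecs.BOM_UTF8 + first_line


def starts_with_hash(raw):
    """Decode as plain UTF-8 (BOM is kept) and test the text against ^#."""
    return re.match(r"^#", raw.decode("utf-8")) is not None


print(starts_with_hash(first_line))  # line without a BOM
print(starts_with_hash(with_bom))    # BOM decodes to U+FEFF ahead of '#'
print(with_bom[:4])                  # raw leading bytes, as head would expose them
```

The plain `utf-8` codec keeps the BOM as the character U+FEFF, so the second check fails even though the line looks identical in most editors.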

  • amiller
    amiller Posts: 21
    edited July 2023

    Filesystem collection results (events do not have correct fields/values)

  • amiller
    amiller Posts: 21
    edited July 2023

    Event breaker rules out

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod
    edited July 2023

    UTF-8 should be fine.

    This is using the breaker I posted above and pulling with a file system collector.

    The only difference is my filter. When I add a 'filetype' field on the collector and use bro with your filter, it breaks. Where are you adding filetype?

  • Daniel Jordan
    Daniel Jordan Posts: 29 mod
    edited July 2023

    I ran what you sent above through a collector with what I think is the same breaker as you and it worked.

    The next thing I would check is what type of encoding you have on your file. Is this pulling from a Linux machine? If so, run file testfile.tsv on your test file.
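
If the file utility is not available on the host, a rough equivalent can be sketched in Python. The function below only looks for a leading UTF-8 BOM and counts line terminators in the first part of the file; it is not a replacement for file and detects no other encodings. The name `describe_text_file` and the `sample_size` default are assumptions made for illustration.

```python
import codecs


def describe_text_file(path, sample_size=64 * 1024):
    """Report a leading UTF-8 BOM and the line terminators seen in a sample."""
    with open(path, "rb") as fh:
        sample = fh.read(sample_size)

    has_bom = sample.startswith(codecs.BOM_UTF8)
    crlf = sample.count(b"\r\n")
    lf_only = sample.count(b"\n") - crlf  # newlines not preceded by \r

    if crlf and not lf_only:
        terminators = "CRLF"
    elif lf_only and not crlf:
        terminators = "LF"
    elif crlf and lf_only:
        terminators = "mixed"
    else:
        terminators = "none found"

    return {"utf8_bom": has_bom, "line_terminators": terminators}


# Hypothetical usage:
# print(describe_text_file("testfile.tsv"))
```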

  • amiller
    amiller Posts: 21
    edited July 2023

    Sorry, the first few events are the commented rows from the log. Exactly as they appear in the log.

  • amiller
    amiller Posts: 21
    edited July 2023

    I want it to exclude the # lines since those are not events. The first real events are tab separated, and the field names are in that field list. When I tried changing the header line to "^#[Ff]", the event breaker preview completely failed.

    Here are the first few lines of the file. With my original settings, the import looks fine and the field/value pairs look good. But when run with a filesystem collector, it fails. I'm going to try with the header line changes that you recommended.

    #separator \x09
    #set_separator  ,
    #empty_field    (empty)
    #unset_field    -
    #path   conn
    #open   2021-11-12-12-45-00
    #fields ts  uid id.orig_h   id.orig_p   id.resp_h   id.resp_p   id.vlan id.vlan_inner   proto   service duration    orig_bytes  resp_bytes  conn_state  local_orig  local_resp  missed_bytes    history orig_pkts   orig_ip_bytes   resp_pkts   resp_ip_bytes   tunnel_parents  orig_cc resp_cc suri_ids    community_id
    #types  time    string  addr    port    addr    port    int int enum    string  interval    count   count   string  bool    bool    count   string  count   count   count   count   set[string] string  string  set[string] string
    2022-05-01 00:00:00.000012  CUaMDI3N3CtEwGXbX9  128.83.27.4 46210   170.114.10.87   443 4020    \N  tcp \N  65.207075   0   6218    SHR 1   0   0   ^hdf    0   0   9   6590    \N  US  US  \N  1:1nbEONdQpmuQtjlL3SSQbc28Wyo=
    2022-05-01 00:00:00.000320  CAZzJv4QRVv5Yek7Oh  128.83.130.204  54935   58.247.212.36   53  4020    \N  udp dns \N  \N  \N  SHR 1   0   0   ^d  0   0   1   156 \N  US  CN  \N  1:KJjQRZuB5bkT7+ebSf4FW7RJiL8=
    2022-05-01 00:00:00.000432  CdRza81SzhESDDyhI9  128.83.72.175   58632   192.111.4.106   443 4020    \N  tcp ssl 376.280685  1458    6534    S1  1   0   0   ShDd    3   1590    7   6826    \N  US  US  \N  1:ZqDFOlfGk/8wlEO1gmawxhE6YBg=
    2022-05-01 00:00:00.001140  CAcMyE40njQ2DatMNc  128.83.28.30    59755   205.251.197.3   53  4020    \N  udp dns \N  \N  \N  S0  1   0   0   D   1   140 0   0   \N  US  US  \N  1:SeSWa3fEVB/I60glsRug0PmDPys=
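
The intended result described above (skip the # lines and use the #fields list as field names for the tab-separated rows) can be sketched in Python as follows. This is an illustration of the expected parsing, not how Stream's event breaker is implemented; `parse_zeek_tsv` is a hypothetical helper, the separator is assumed to be a tab, and the sample is trimmed to the first four columns.

```python
def parse_zeek_tsv(lines, separator="\t"):
    """Yield one dict per data row, using the '#fields' header for keys."""
    fields = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#fields"):
                # Drop the '#fields' token itself; the rest are column names.
                fields = line.split(separator)[1:]
            continue  # other '#' lines are metadata, not events
        if fields is None:
            raise ValueError("data row seen before a #fields header")
        yield dict(zip(fields, line.split(separator)))


# Trimmed sample based on the file above (first four columns only).
sample = [
    "#separator \\x09\r\n",
    "#path\tconn\r\n",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\r\n",
    "#types\ttime\tstring\taddr\tport\r\n",
    "2022-05-01 00:00:00.000012\tCUaMDI3N3CtEwGXbX9\t128.83.27.4\t46210\r\n",
]

events = list(parse_zeek_tsv(sample))
print(events[0]["uid"])
```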
    

  • amiller
    amiller Posts: 21
    edited July 2023

    Dan, that does make sense. I'll reconfigure that one and reply back. I have an example where that would be inconsistent if that's the case.

    This event breaker works, collects, and extracts correctly, even though the first few lines match ^# as well.

    Jon, yes I have. On multiple occasions, for each attempt. I had restarted the worker too just in case.