Sending JSON data to Splunk from Cribl causing .30:1 index-to-raw size ratio

I'm sending JSON data (Cloudflare logs) to Splunk from Cribl, and it results in a horrendous .30:1 index size to raw size ratio. I'm sending _raw as text. There are no props for that sourcetype on the indexer side, so there are no indexed extractions. Any help appreciated.

Answers

  • David Maislin
    David Maislin Posts: 230 mod

    May I see a screenshot of the event in the OUT of Cribl?

  • David Maislin
    David Maislin Posts: 230 mod

    As an object it is more, but Splunk will restringify it anyway.

  • SonOfBuzi
    SonOfBuzi Posts: 12

    Update: with the help of <@U01C35EMQ01> and <@U020E25MRU1>, we found a bug in Splunk: if `_raw` contains `::`, Splunk creates indexed fields from it.

  • David Maislin
    David Maislin Posts: 230 mod

    And did Splunk support validate it too?

  • SonOfBuzi
    SonOfBuzi Posts: 12

    No response yet from Splunk support, but I've validated with a few other people that it's happening in their environments too.

  • David Maislin
    David Maislin Posts: 230 mod

    Did you try the escaped colons?

  • David Maislin
    David Maislin Posts: 230 mod

    To summarize: if there are ten pairs of `sometext::randomtext` in a single event in `_raw`, then Splunk creates 10 additional indexed fields. Does that sound about right, <@U020VPXGT34>?

  • David Maislin
    David Maislin Posts: 230 mod

    And does the same behavior happen when events are not sent with Cribl?

  • SonOfBuzi
    SonOfBuzi Posts: 12

    From my testing so far: I didn't try escaping it, but I did try replacing it with a single colon, and the issue didn't appear. The issue was observed when sending via HF, UF, and Cribl.

  • Louise Tang
    Louise Tang Posts: 13

    Very late here, but wanted to say this is seen in many proxy logs, as several advertisers use double colons. Splunk treats "::" as a field/value separator and indexes the data as such. This is a situation where it would be awesome if tools like Cribl would flag and fix the data automatically for us.
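
As a rough illustration of the "flag and fix" idea discussed in this thread, here is a minimal Python sketch that counts `sometext::randomtext` style pairs in an event and collapses each `::` to a single colon, which is the workaround reported above. The function names, the sample event, and the regular expression are all assumptions made for illustration; the pattern only approximates what Splunk might treat as a pair and is not taken from Splunk or Cribl documentation.

```python
import re

# Assumption: a "pair" is a '::' with a non-space, non-colon, non-quote
# character directly on each side. Splunk's actual rule may differ.
DOUBLE_COLON = re.compile(r'(?<=[^\s:"])::(?=[^\s:"])')


def count_double_colon_pairs(raw: str) -> int:
    """Flag step: how many '::' separators sit between two tokens in raw."""
    return len(DOUBLE_COLON.findall(raw))


def replace_double_colons(raw: str) -> str:
    """Fix step: collapse each matched '::' to a single ':'."""
    return DOUBLE_COLON.sub(":", raw)


if __name__ == "__main__":
    # Made-up sample event, not real Cloudflare output.
    event = '{"ClientRequestURI":"/ad?k=foo::bar&x=a::b","Status":200}'
    print(count_double_colon_pairs(event))
    print(replace_double_colons(event))
```

Note that this pattern would also rewrite compressed IPv6 addresses such as `2001:db8::1`, so in practice the replacement would probably need to be limited to specific fields rather than applied to the whole of `_raw`.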