
Why is my parsed data different from my original _raw data?

Brian Yearwood Posts: 14 ✭✭
edited October 2023 in Stream

When I use a Parser function to parse my JSON _raw field, my original data changes, and that breaks my dashboards.

Why does it change?

Best Answer

  • Robbert Hink
    Robbert Hink Posts: 17
    edited October 2023 Answer ✓

    This is because the Parser function uses JavaScript to parse the string into JSON. The field holds a numeric value and is recognized as a number, but JavaScript has a known limitation with integers larger than Number.MAX_SAFE_INTEGER, a constant of about 9 quadrillion (precisely 2^53 − 1), so the number is rounded. We briefly mention this in our documentation here.

    From your example, the field report_id parsed from _raw has a value of 638325393026460898. This exceeds Number.MAX_SAFE_INTEGER (9007199254740991), so it is rounded.
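
The rounding can be reproduced outside of Cribl with plain JavaScript. The snippet below is only a minimal sketch: the one-field event is made up for illustration, and only the report_id value comes from the question.

```javascript
// Sketch: why a large numeric ID changes when JSON is parsed in JavaScript.
// The event is hypothetical; only the report_id value is taken from the thread.
const raw = '{"report_id": 638325393026460898}';

const parsed = JSON.parse(raw);

// Integers above Number.MAX_SAFE_INTEGER (2^53 - 1) cannot all be stored exactly.
console.log(Number.MAX_SAFE_INTEGER);                // 9007199254740991
console.log(Number.isSafeInteger(parsed.report_id)); // false
console.log(String(parsed.report_id));               // no longer ends in ...898

// The same ID sent as a quoted string survives parsing unchanged.
const quoted = JSON.parse('{"report_id": "638325393026460898"}');
console.log(quoted.report_id);                       // 638325393026460898
```

Comparing the two printed IDs shows that only the unquoted one loses its trailing digits.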

    In this case the field is actually an ID, but its values are sent as bare numbers, whereas identifiers should typically be enclosed in quotes so that they are treated as strings. If you can change the data before sending it to Cribl so that these fields arrive as strings, that would be best. It also saves compute resources, because Cribl would otherwise have to inspect all of your original data to make sure the identifying fields do not contain big integers.

    If there is no way of changing the original data coming in, a workaround is to use a Masking function before parsing. The rule in this example matches fields ending in _id and encloses their numeric values in quotes. Once again, this is not the preferred method, considering the additional overhead it may introduce.
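
The quoting idea behind that workaround can be sketched in plain JavaScript. This is an assumption about what such a rule might look like, not the exact rule from the answer: the helper name quoteIdFields, the regular expression, and the sample event are all hypothetical.

```javascript
// Hypothetical helper: wrap bare integer values of keys ending in _id in quotes,
// working on the raw JSON text before it is parsed.
function quoteIdFields(raw) {
  // Group 1: a key ending in _id, the colon, and any whitespace.
  // Group 2: an integer value that is not already quoted.
  // The lookahead requires a following , or } so decimals such as 12.5 are left alone.
  const pattern = /("\w*_id"\s*:\s*)(-?\d+)(?=\s*[,}])/g;
  return raw.replace(pattern, '$1"$2"');
}

// Sample event (made up, apart from the report_id value).
const sample = '{"report_id": 638325393026460898, "count": 5}';
const fixed = quoteIdFields(sample);
console.log(fixed); // {"report_id": "638325393026460898", "count": 5}

// After quoting, parsing keeps the ID intact as a string.
const event = JSON.parse(fixed);
console.log(event.report_id); // 638325393026460898
```

A pattern like this works on the text alone, so it would also rewrite a matching sequence that happens to appear inside a string value. That is one more reason to prefer fixing the data at the source.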
