Why is my parsed data different from my original _raw data?
Best Answer
This happens because the Parser function uses JavaScript to parse the string into JSON. The field has a numerical value and is recognized as a number, but because of a known limitation with numbers larger than `Number.MAX_SAFE_INTEGER`, a constant of about 9 quadrillion (precisely 2^53 - 1), the number gets rounded. We briefly mention this in our documentation here.

From your example we see the field `report_id` parsed from `_raw` with a value of 638325393026460898. This value is larger than 9007199254740991, so it is being rounded.

In this case the field is actually an ID field, but its values are numeric, whereas ID values should typically be enclosed in quotes to make them strings. If you can change the data before sending it to Cribl so that these fields arrive as strings, that would be best. This saves compute resources, because Cribl would otherwise have to check all your original data to make sure that your identifying fields do not contain big integers.
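The rounding can be reproduced in plain JavaScript, outside of Cribl. The snippet below is a minimal sketch: the sample event strings are assumptions built from the `report_id` value quoted above, and the variable names are only illustrative.

```javascript
// Sample event with report_id sent as an unquoted number (assumed shape).
const raw = '{"report_id": 638325393026460898}';
const parsed = JSON.parse(raw);

// The value exceeds Number.MAX_SAFE_INTEGER (9007199254740991),
// so the parsed number is no longer guaranteed to be exact.
const isSafe = Number.isSafeInteger(parsed.report_id);

// Converting back to text shows digits that differ from the original.
const parsedAsText = String(parsed.report_id);

// The same ID sent as a quoted string is kept exactly as written.
const rawQuoted = '{"report_id": "638325393026460898"}';
const parsedQuoted = JSON.parse(rawQuoted);

console.log(isSafe);                 // false
console.log(parsedAsText);           // differs from 638325393026460898
console.log(parsedQuoted.report_id); // 638325393026460898
```

The quoted variant avoids the problem entirely because a string never goes through number conversion.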
If there is no way to change the original incoming data, a workaround is to use a Masking function. For example, a rule can match fields ending in `_id` and enclose any values found in quotes. Once again, this is not the preferred method, considering the additional overhead it may introduce.
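The idea behind that workaround can be sketched in plain JavaScript. This is not the Masking function configuration itself: the helper name `quoteIdValues` and the regular expression are assumptions for illustration, and a naive pattern like this one would also rewrite matching text that happens to appear inside a string value.

```javascript
// Hypothetical helper: wrap unquoted integer values of keys ending in "_id"
// in double quotes, so that JSON.parse treats them as strings.
function quoteIdValues(raw) {
  // Group 1: a quoted key ending in _id, the colon, and any whitespace.
  // Group 2: the integer digits (optionally negative).
  // The lookahead requires the number to end at a comma, brace, or bracket,
  // so decimals and already-quoted values are left untouched.
  const pattern = /("\w*_id"\s*:\s*)(-?\d+)(?=\s*[,}\]])/g;
  return raw.replace(pattern, '$1"$2"');
}

// Sample event (assumed shape) with one ID field and one ordinary number.
const sample = '{"report_id": 638325393026460898, "count": 5}';
const fixed = quoteIdValues(sample);
const event = JSON.parse(fixed);

console.log(fixed);           // {"report_id": "638325393026460898", "count": 5}
console.log(event.report_id); // 638325393026460898 (as a string)
```

Running the replacement before parsing is what matters here: once the text has been parsed into a number, the original digits are already lost.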