Error while reading file, malloc of size N failed
I have an S3 Source configured to collect Parquet files, but I end up getting the following error:
{ "time": "2023-03-31T00:00:00.000Z", "cid": "w1", "channel": "input:my_parquet_source", "level": "error", "message": "failed to process file", "file": "s3://super-awesome-s3bucket/folder/year=2023/month=03/day=30/hour=00/part-0abcde12-456f-78a9-b01c-de23f4a56b0c.c000.parquet", "receiveCount": 1, "messageId": "0abcde12-456f-78a9-b01c-de23f4a56b0c", "error": { "message": "Error while reading file.", "stack": "Error: malloc of size 65536 failed" } }
How can I address this?
Answers
This issue occurs when the Parquet library fails to allocate the memory it needs: it miscalculates the required size and the allocation fails. Under normal conditions you shouldn't run into these errors, but other system-level factors can come into play. We recommend setting the following environment variable to switch to a different allocator, then restarting the Cribl service.
With it set to "system", Arrow uses the standard system malloc instead of jemalloc. The variable needs to be set in Cribl's environment, or in the systemd unit file for the cribl.service. At a minimum it will surface more detail about the memory-pool errors, and in most scenarios it resolves them.
ARROW_DEFAULT_MEMORY_POOL="system"
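As a minimal sketch of how this could look on a Linux install where Cribl runs under systemd (assuming the unit is named cribl.service, as mentioned above; adjust for your install), you could create a drop-in override:

sudo systemctl edit cribl

Then add the following to the override file it opens (systemd stores it at a path like /etc/systemd/system/cribl.service.d/override.conf):

[Service]
# Switch Arrow from jemalloc to the system allocator
Environment="ARROW_DEFAULT_MEMORY_POOL=system"

Finally, apply the change and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart cribl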