Error while reading file, malloc of size N failed

Jeremy Prescott Posts: 33 mod

I have an S3 Source configured to collect Parquet files, but I keep getting the following error:

{
  "time": "2023-03-31T00:00:00.000Z",
  "cid": "w1",
  "channel": "input:my_parquet_source",
  "level": "error",
  "message": "failed to process file",
  "file": "s3://super-awesome-s3bucket/folder/year=2023/month=03/day=30/hour=00/part-0abcde12-456f-78a9-b01c-de23f4a56b0c.c000.parquet",
  "receiveCount": 1,
  "messageId": "0abcde12-456f-78a9-b01c-de23f4a56b0c",
  "error": {
    "message": "Error while reading file.",
    "stack": "Error: malloc of size 65536 failed"
  }
}

How can I address this?

Answers

  • Martin Prado Posts: 27 mod
    edited April 24

    This error occurs when the Parquet library fails to allocate the memory it needs: it miscalculates the required size and the allocation fails. Under normal conditions you shouldn't run into it, but other system-level factors can come into play. We recommend setting the following environment variable to switch to a different memory allocator, then restarting the Cribl service.

    With it set to "system", the library uses the basic system malloc instead of jemalloc. The variable needs to be set in Cribl's environment or in the systemd unit file for cribl.service. At a minimum it will provide more detail about the memory pool errors, and in most scenarios it resolves them.

    ARROW_DEFAULT_MEMORY_POOL="system"
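For a systemd-managed install, one way to apply this is a drop-in override rather than editing the unit file directly. The fragment below is a minimal sketch, assuming the unit is named cribl.service; the drop-in file name and path are illustrative choices, not something the answer above specifies.

```ini
# Hypothetical drop-in path (adjust to your unit name):
#   /etc/systemd/system/cribl.service.d/arrow-memory-pool.conf
# Running "systemctl edit cribl.service" opens an equivalent override file.

[Service]
# Use the system malloc for the Arrow memory pool instead of jemalloc.
Environment="ARROW_DEFAULT_MEMORY_POOL=system"
```

After saving the file, run "systemctl daemon-reload" and then "systemctl restart cribl.service" so the new environment takes effect. "systemctl show cribl.service --property=Environment" should then list the variable.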