Is there an easy way to analyze the memory usage of specific procs and pipelines?

I have been chasing an issue with a few workers that are getting hit with GC / JVM OOMs. I suspect there might be some room for improvement in some of my pipelines, but I wasn't sure if there are any metrics or debug logs we can enable to investigate.

Answers

  • You can profile CPU for specific pipelines, but not memory; memory snapshots are currently taken at the process level.

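Since snapshots are per-process, one rough way to spot growth is to instrument the worker process directly with Node's built-in APIs. This is a minimal sketch, assuming you can load a small script into the same Node runtime; the file name, 30-second logging interval, and SIGUSR2 trigger are illustrative choices, not part of any product API:

```typescript
// memory-spotcheck.ts -- hypothetical helper, not part of any product API.
// Periodically logs process memory and writes a V8 heap snapshot on demand.
import { writeHeapSnapshot } from "node:v8";
import process from "node:process";

function logMemory(label: string): void {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const mb = (n: number) => (n / 1024 / 1024).toFixed(1);
  console.error(
    `[${label}] rss=${mb(rss)}MB heapTotal=${mb(heapTotal)}MB ` +
      `heapUsed=${mb(heapUsed)}MB external=${mb(external)}MB`
  );
}

// Sample every 30s so growth over time shows up in stderr.
setInterval(() => logMemory(new Date().toISOString()), 30_000);

// On SIGUSR2, synchronously dump a heap snapshot (this pauses the process);
// the generated Heap-*.heapsnapshot file can be opened in Chrome DevTools.
process.on("SIGUSR2", () => {
  const file = writeHeapSnapshot();
  console.error(`heap snapshot written to ${file}`);
});
```
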
  • For example, in my cluster I've actually managed to cut the process count in half and double the allocated JVM memory (I'm impressed that my ingest/throughput has remained the same, and actually improved in some cases). But I'm catching a few workers logging this to stderr:

    ```
    [662762:0x4b96770] 5906506 ms: Scavenge (reduce) 4084.1 (4100.7) -> 4083.4 (4101.7) MB, 7.5 / 0.0 ms (average mu = 0.087, current mu = 0.031) allocation failure
    [662762:0x4b96770] 5906521 ms: Scavenge (reduce) 4084.1 (4100.7) -> 4083.3 (4101.9) MB, 14.8 / 0.0 ms (average mu = 0.087, current mu = 0.031) allocation failure
    ```

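Those stderr lines are V8's GC trace: each `Scavenge` entry shows heap used before `->` after (committed size in parentheses) in MB, and repeated `allocation failure` with usage stuck around 4084 MB suggests the worker is pressed up against its heap size limit. If you want to confirm what that limit actually is for a given Node runtime, a quick check with `v8.getHeapStatistics()` works; this is a generic Node sketch, not a product-specific command:

```typescript
// heap-limit.ts -- reads the V8 heap ceiling and current usage for this process.
// Run it under the same Node version the workers use so the numbers are comparable.
import { getHeapStatistics } from "node:v8";

const stats = getHeapStatistics();
const mb = (n: number): number => Math.round(n / 1024 / 1024);

console.log(`heap_size_limit : ${mb(stats.heap_size_limit)} MB`); // hard ceiling; raised with --max-old-space-size
console.log(`total_heap_size : ${mb(stats.total_heap_size)} MB`); // committed by V8 right now
console.log(`used_heap_size  : ${mb(stats.used_heap_size)} MB`);  // live objects
```
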
  • Thanks <@U012ZP93EER> - yeah, I had a feeling. Probably need a JVM heap dump to really get into the weeds.

  • Even then it can be difficult to isolate the issue. Sometimes the type of issue is obvious, but a heap dump still won't tell you where the problem lies within the context of the application, such as which pipeline a low-level NodeJS function is executing within.

  • File lookups, aggregations, bad regexes in general, suppressions, and similar function types can use up a lot of memory.

  • Yeah, makes sense. I'm going to take a peek at my two largest routes and their connected pipelines and just do a spot check.
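
One way to turn that spot check into rough numbers is to measure heap growth around a specific operation, such as loading a lookup table into memory. A minimal sketch under stated assumptions: the `lookup.csv` path and its two-column shape are hypothetical, and the forced GC only happens if the script is started with `--expose-gc`:

```typescript
// lookup-cost.ts -- rough heap cost of holding a lookup table in memory.
// "lookup.csv" and its two-column shape are hypothetical stand-ins.
import { readFileSync } from "node:fs";
import process from "node:process";

function heapUsedMB(): number {
  (globalThis as any).gc?.(); // only runs when started with --expose-gc; otherwise a no-op
  return process.memoryUsage().heapUsed / 1024 / 1024;
}

const before = heapUsedMB();

// Load the whole lookup into a Map, roughly what an in-memory lookup does.
const table = new Map<string, string>();
for (const line of readFileSync("lookup.csv", "utf8").split("\n")) {
  const [key, value] = line.split(",");
  if (key) table.set(key, value ?? "");
}

const after = heapUsedMB();
console.log(`${table.size} rows held, ~${(after - before).toFixed(1)} MB of heap`);
```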