Script source in a distributed environment - Run once to avoid duplicates?

Jeremy Prescott · September 2023

I created a script source in a distributed environment that pulls the past x minutes of logs from an API. Does it run on multiple worker nodes? If yes, is there a way to make sure it only runs on one node at a time to avoid duplicates?

Harry Gardner · September 2023

This depends on what the discover step returns. If it returns multiple items, multiple collect tasks can run in parallel on different workers. If it returns only a single item, you will end up with a single task running on one worker. See this for more info: https://docs.cribl.io/stream/collectors-script/#how-the-collector-pulls-data-1

Jeremy Prescott · September 2023

I didn't actually put anything in the discover phase. It's just echoing a string.

Harry Gardner · September 2023

OK, good. As long as the string is one line, it should result in a single collect task and you should be good to go.
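
To make the one-line rule concrete, here is a rough Python model of the behavior described above. It is only a sketch and not Cribl's actual code: the function name plan_collect_tasks and the sample strings are made up for illustration, and it assumes that each non-empty line of discover output becomes one collect task.

```python
from typing import List


def plan_collect_tasks(discover_output: str) -> List[str]:
    """Hypothetical model: one collect task per non-empty line of discover output."""
    return [line.strip() for line in discover_output.splitlines() if line.strip()]


# A discover step that echoes a single line yields a single task.
single = plan_collect_tasks("pull-logs\n")

# A discover step that prints several lines yields several tasks,
# which could then be spread across different workers.
multi = plan_collect_tasks("host-a\nhost-b\nhost-c\n")

print(len(single), len(multi))
```

Under that assumption, checking that your discover command prints exactly one line is enough to confirm you will get a single collect task per run.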
