I created a script source in a distributed environment that pulls the past x minutes of logs from an API. Does it run on multiple worker nodes? If yes, is there a way to make sure it only runs on one node at a time to avoid duplicates?
This depends on what the discover step returns. If it returns multiple items, multiple collect tasks can run in parallel on different workers. If it returns only a single item, you will end up with a single task running on one worker.
See this for more info: https://docs.cribl.io/stream/collectors-script/#how-the-collector-pulls-data-1
I didn't actually put anything in the discover phase, it's just echoing a string.
OK good. As long as the string is a single line, it should result in a single collect task and you should be good to go.
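To illustrate why a one-line discover output means one collect task, here is a rough Python model of the mapping described above. It is only a sketch, not Cribl's implementation: `plan_collect_tasks` is a made-up helper, and treating blank lines as ignored is an assumption.

```python
def plan_collect_tasks(discover_stdout: str) -> list[str]:
    """Model the rule 'one discover output line -> one collect task'.

    Hypothetical helper for illustration only. Skipping blank lines
    is an assumption, not documented behavior.
    """
    return [line.strip() for line in discover_stdout.splitlines() if line.strip()]


# A discover step that echoes a single string yields one task,
# so only one worker ends up running the collect script.
single = plan_collect_tasks("collect-logs\n")

# A discover step that prints several lines yields several tasks,
# which may be spread across workers and run in parallel.
multi = plan_collect_tasks("host-a\nhost-b\nhost-c\n")

print(len(single), len(multi))
```

So if the discover command is a plain `echo` of one string, the model above gives exactly one task, which matches the advice in this thread.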