
Is there any way to force a pipeline to ONLY run on a single worker?

Abby Strong Posts: 12 mod
edited September 2023 in General Discussions

I think I know the answer to this, but I need to ask anyway. I have an aggregation that MUST run over the entire event stream. Is there any way to force a pipeline to ONLY run on a single worker?

Answers

  • Brandon McCombs
    Brandon McCombs Posts: 150 mod
    edited September 2023

    No. All workers can process any input and all routes and pipelines. The only way to force it is to ensure the data always arrives over a single connection.

    And relying on a single connection is risky.

  • Louise Tang
    Louise Tang Posts: 13

    Can you use a single worker group with only a single worker as a member?

  • Raanan Dagan
    Raanan Dagan Posts: 101 mod

    In addition to the above, a few more ideas:

    Edge - If you have all of your events coming from a single server, you may be able to do such a thing using Cribl Edge.

    Redis - Put all of the events into Redis and run the aggregation based on the output from Redis.

  • Jon Rust
    Jon Rust Posts: 455 mod

    building the mousetrap(tm): Multi-tiered Stream. First level does most of the work. Second level for the target agg stream, single process so everything is on one pid. I don't recommend it, but it sure is something

  • Abby Strong
    Abby Strong Posts: 12 mod

    Cheers. The input is a script getting data from Redis: a single Redis item for each unique person + skill. I need to aggregate values(skill) by person, and the output MUST contain the complete list. It sounds like the ONLY guaranteed way to get this to work properly is to have an instance with a single worker. That is not really possible or scalable, as Cribl is doing more than just a single option. I guess I need to think more on whether Cribl is the right tool for this.
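To illustrate why the complete list is hard to guarantee across workers, here is a minimal plain-Python sketch (not Cribl code). The row layout, the names, and the round-robin split over two workers are all assumptions made for illustration. Each worker only sees part of a person's skills, so its aggregate is partial until a later merge step combines the results.

```python
from collections import defaultdict

# Hypothetical rows: one (person, skill) pair per event.
ROWS = [
    ("alice", "RCR"), ("alice", "CL"), ("alice", "Map"),
    ("bob", "COX"), ("bob", "CS"),
]


def aggregate(rows):
    """Collect the set of skills seen for each person."""
    skills = defaultdict(set)
    for person, skill in rows:
        skills[person].add(skill)
    return dict(skills)


def merge(partials):
    """Union per-worker partial results into one complete result."""
    merged = defaultdict(set)
    for partial in partials:
        for person, skills in partial.items():
            merged[person] |= skills
    return dict(merged)


# Assumption: events are spread round-robin over two workers.
worker_rows = [ROWS[0::2], ROWS[1::2]]
partials = [aggregate(rows) for rows in worker_rows]

# Each partial is incomplete; only the merged result matches a single pass.
complete = merge(partials)
```

In this sketch the first worker sees only two of alice's three skills, and `merge` restores the full set. That merge is the role a second, single-process tier would play.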

  • What do you mean by "doing more than just a single option"?

  • Abby Strong
    Abby Strong Posts: 12 mod
    edited September 2023

    Sorry. Cribl is doing a lot more than just this single process flow: multiple other inputs, pipelines, etc.

    At the moment, there is a Python script that queries MS Dynamics 365 and produces a CSV lookup. I was looking to move to a Cribl pipeline and a Redis lookup.

  • Ok. That's what I assumed you meant, but I wanted to confirm. My proposal would then be to have something similar to what weeb mentioned, but as a separate worker node dedicated to this Redis script collection. It can forward to the existing worker nodes if necessary, or not. It would be in a group by itself. This is why groups exist: each group has a separate configuration.

  • Abby Strong
    Abby Strong Posts: 12 mod

    Thanks Brandon. Unfortunately we only have the capacity for a single VM, and I need to ensure the solution is simple. Having multiple nodes, or even Edge + Stream, results in a level of complexity that won't be maintainable for this environment. I will have to look at how we can move forward, or whether remaining with the current Python script is the better option.

  • Louise Tang
    Louise Tang Posts: 13

    single instance installation is definitely possible (non-distributed mode)

  • Abby Strong
    Abby Strong Posts: 12 mod

    Hi weeb. Unfortunately, single instance does not equal single worker thread.

  • Louise Tang
    Louise Tang Posts: 13

    true, unless you size it that way, which may impact performance

  • Abby Strong
    Abby Strong Posts: 12 mod
    edited September 2023

    On a side note, I might have a fairly brutal workaround. If I change the input from a collector using redis-cli to a REST API call directly on the Dynamics Dataverse, I can force no event breaking at ingest time. If I then parse and unroll, this may (yet to test) force the processing into a single worker thread.

    Testing this idea now.

  • Louise Tang
    Louise Tang Posts: 13
    edited September 2023

    … I think at this point we may all be curious as to why you want to do this?

  • Abby Strong
    Abby Strong Posts: 12 mod

    This is for a non-profit volunteer emergency service. We have Dynamics 365, which has various entities (volunteer, qualification, qual_type). Each user has a phone number and a slack_id. We have a phone that you can prank call, or Slack commands, to indicate response to an event.

    Pipe 1 - slack_command -> lookup slack_id for name+skills -> post to slack
    Pipe 2 - phone call (cli) -> lookup phone number for name+skills -> post to slack

    i.e. I ring in or use the Slack command, and there is a post with my skills (Crew Leader, Road Crash Rescue, Swift Water Rescue, etc).

    At the moment the lookup is CSV based and maintained by a stand-alone Python script run by cron. I am trying to get the flow into Cribl so that others can maintain it if something changes.

    The data coming out of Dynamics (or Redis) is a single entry per skill/member combination. So if I look at myself, there are around 15 events, one for each skill. The post into Slack MUST be a single line containing ALL skills for the member, e.g. "4wd CL CM2 COX CS EVS LBSWR Map RCR SWAHS-H Truck"
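For reference, the lookup described above (one row per member, all skills on one line) could be built roughly as follows. This is only a sketch of the idea, not the existing cron script. The field names (`slack_id`, `phone`, `name`, `skill`), the sample values, and the alphabetical skill ordering are assumptions, and the rows are taken as already fetched from Dynamics or Redis.

```python
import csv
import io

# Hypothetical export: one row per skill/member combination.
ROWS = [
    {"slack_id": "U001", "phone": "555-0100", "name": "Member A", "skill": "RCR"},
    {"slack_id": "U001", "phone": "555-0100", "name": "Member A", "skill": "4wd"},
    {"slack_id": "U001", "phone": "555-0100", "name": "Member A", "skill": "CL"},
    {"slack_id": "U002", "phone": "555-0101", "name": "Member B", "skill": "Map"},
]

FIELDS = ["slack_id", "phone", "name", "skills"]


def build_lookup(rows):
    """Return one lookup row per member with all skills on a single line."""
    members = {}
    for row in rows:
        entry = members.setdefault(
            row["slack_id"],
            {"slack_id": row["slack_id"], "phone": row["phone"],
             "name": row["name"], "skills": set()},
        )
        entry["skills"].add(row["skill"])
    return [
        # Assumption: skills are listed alphabetically, ignoring case.
        {**entry, "skills": " ".join(sorted(entry["skills"], key=str.casefold))}
        for entry in members.values()
    ]


def to_csv(lookup):
    """Serialise the lookup rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(lookup)
    return buf.getvalue()


lookup = build_lookup(ROWS)
csv_text = to_csv(lookup)
```

Because the whole export is processed in one pass, every member's row carries the complete skill list, which is the property the single-worker question is trying to preserve.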