Persistent Queue Question and Upstream Block Signals

I have a question about persistent queues (PQs). My understanding was that when a destination has a persistent queue, blocking signals are only sent upstream to sources once that PQ is full. What I am seeing does not match that. A TCP JSON destination keeps flapping into a blocked state with "output is experiencing increased load" errors. It has a PQ configured, and the PQ is empty. This is causing the syslog and elasticsearch sources to drop data.

What should we expect to see when a destination is blocking and has a PQ configured? Should the sources be blocked?

Best Answer

  • Clint Sharp
    Clint Sharp Posts: 27 mod
    Answer ✓

    Persistent queueing only engages if the destination is marked down. Blocked is not the same as down. Backpressure will happen normally until a destination is marked down, then PQ will engage and sources will be unblocked until the PQ becomes full or the destination is available again.

    Note that we rolled out source-side queueing in 3.4, which does not suffer from the same limitation. If you configure persistent queueing on the syslog and elasticsearch sources, you should see PQ engage even on backpressure from the destination.
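
    To make the distinction between "blocked" and "down" concrete, here is a minimal sketch of the decision logic described above. It is a conceptual model only, not Cribl Stream code: the names `DestState`, `destination_pq_action`, and `source_pq_action` are hypothetical, and the behavior of a full source-side PQ is an assumption.

```python
from enum import Enum


class DestState(Enum):
    HEALTHY = "healthy"
    BLOCKED = "blocked"  # slow or backpressuring, but still considered up
    DOWN = "down"        # marked down


def destination_pq_action(state, pq_bytes, pq_max_bytes):
    """What happens to a new event with a destination-side PQ (illustrative model)."""
    if state is DestState.HEALTHY:
        return "send"
    if state is DestState.BLOCKED:
        # Blocked is not down: the PQ does not engage, so sources feel backpressure.
        return "backpressure"
    # Destination is marked down: the PQ absorbs events until it fills up.
    if pq_bytes < pq_max_bytes:
        return "queue"
    return "backpressure"


def source_pq_action(downstream_blocked, pq_bytes, pq_max_bytes):
    """What happens with a source-side PQ (3.4+), which engages on backpressure too."""
    if not downstream_blocked:
        return "forward"
    if pq_bytes < pq_max_bytes:
        return "queue"
    # Assumption: once the source PQ is full, backpressure reaches the sender.
    return "backpressure"


if __name__ == "__main__":
    # The situation in the question: destination blocked, PQ empty.
    print(destination_pq_action(DestState.BLOCKED, 0, 1_000_000))
    print(source_pq_action(True, 0, 1_000_000))
```

    In this model, the case from the question (a blocked destination with an empty PQ) yields backpressure with a destination-side PQ, while a source-side PQ would queue the events instead.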

Answers

  • jccurtis
    jccurtis Posts: 3

    Thanks for the response, Clint! We are upgrading next week specifically for this feature, so I hope that helps the situation. From the destination logs in Cribl and from packet captures, I observed that the "output is experiencing increased load" message was due to TCP receive window exhaustion (bytes in flight = calculated window size, without an ACK from the receiver). When this happened, Stream would generate the backpressure log and the destination would go into blocking mode, but the PQ would not engage. So this all aligns with what you are saying about the PQ not kicking in. I wonder why TCP congestion would not trigger the PQ to engage; it seems to me that it would be a good indicator for doing so. Just curious, really. Looking forward to 3.4.
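
    The window-exhaustion condition mentioned above can be checked against packet-capture values roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the inputs are sequence and ACK numbers, the advertised window, and the window scale shift read from a capture.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def bytes_in_flight(next_seq_to_send, last_ack_received):
    """Bytes sent but not yet acknowledged, allowing for sequence wraparound."""
    return (next_seq_to_send - last_ack_received) % SEQ_SPACE


def window_exhausted(next_seq_to_send, last_ack_received,
                     advertised_window, window_scale=0):
    """True when unacknowledged data fills the receiver's scaled window."""
    effective_window = advertised_window << window_scale
    return bytes_in_flight(next_seq_to_send, last_ack_received) >= effective_window


if __name__ == "__main__":
    # 4000 unacknowledged bytes against a 4000-byte window: sender must wait.
    print(window_exhausted(5000, 1000, 4000))
    # Same data with a window scale shift of 1 (8000-byte window): room remains.
    print(window_exhausted(5000, 1000, 4000, window_scale=1))
```

    When `window_exhausted` is true, the sender cannot put more data on the wire until an ACK arrives, which matches the point at which the backpressure log appeared.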