
Webhooks at Scale: Lessons from Delivering Millions of Events

Tags: webhooks, engineering, reliability, scale

Sending a webhook is straightforward. Sending millions reliably, quickly, and without dropping any is a different kind of problem. The patterns that work fine for the first hundred consumer endpoints start to break down long before you reach the millionth event, and the failure modes shift from individual delivery errors to systemic ones where a single bad endpoint can starve every other delivery in flight.

This post covers the architectural decisions that matter at scale: how events flow through the system, how to keep failures contained, how to manage connections and concurrency, and what to monitor once there’s too much traffic to read individual logs.

Decouple Event Creation from Delivery

The most common starting point is also the one that breaks first. An event happens in the application, the application makes an HTTP request to the consumer, the application waits for the response before continuing.

This works for a handful of consumers. Past that, the application thread is coupled to the slowest consumer in the list. A consumer that takes ten seconds to respond stalls the application for ten seconds. A consumer whose domain stops resolving causes timeouts that cascade through the rest of the code. There’s no retry, no isolation, no backpressure.

The fix is to separate event creation from delivery. Write the event to a durable store the moment it occurs (a database row, a queue entry, often both) and return success to the application immediately. A separate worker pool reads from that store and makes the actual HTTP requests.

event occurs -> persist to event store -> return success to app
                       |
                       v
              delivery workers pick up events
                       |
                       v
              deliver to each consumer independently

Once the event store is in the path, three things change. The application is no longer blocked by a slow consumer. Events survive a worker crash because they were persisted before delivery was attempted. Workers can scale horizontally without touching the application code.
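The split can be sketched in a few lines. This is an in-memory stand-in (names like `record_event` and `delivery_worker` are illustrative), and in production the store would be a database table or durable queue rather than a process-local `queue.Queue`:

```python
import queue
import threading

# In-memory stand-in for the durable event store.
event_store = queue.Queue()

def record_event(event: dict) -> None:
    """Persist the event and return immediately; never call the consumer here."""
    event_store.put(event)  # a durable write in a real system

def delivery_worker(deliver, stop: threading.Event) -> None:
    """Read persisted events and attempt delivery, independently of the app."""
    while not stop.is_set():
        try:
            event = event_store.get(timeout=0.1)
        except queue.Empty:
            continue
        deliver(event)  # an HTTP POST to the consumer in a real system
        event_store.task_done()
```

The application only ever touches `record_event`; everything downstream of the store can crash, retry, or scale without the application noticing.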

This is the foundation everything else sits on. Get it wrong and no amount of clever retry logic upstream will save you.

Per-Consumer Queues

A single shared delivery queue is easy to build and a problem at scale. One slow consumer can starve every other delivery. Twenty workers pulling from a shared queue can all end up blocked on requests to the same slow endpoint within seconds, while consumers responding in 50 ms wait their turn.

Per-consumer queues fix this. Each consumer endpoint gets its own queue, its own concurrency budget, and its own rate limit. A misbehaving endpoint can only ruin its own queue. Pausing or replaying for one customer becomes a lookup rather than a full-table scan.

The trade-off is operational complexity. You’re now managing potentially thousands of queues, balancing workers across them fairly, and avoiding the opposite failure mode where small queues never get any worker attention. The isolation is worth the complexity at any meaningful scale.

If the contract requires strict ordering per entity, where every event for order_123 arrives in sequence, partition further by entity ID inside the per-consumer queue. You give up parallelism for that entity, but the rest of the queue keeps moving. For most webhook use cases ordering isn’t actually required: consumers can sort by timestamp or sequence number on receipt. Don’t pay the throughput cost unless your contract demands it.
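The core of the per-consumer model is small: one queue and one concurrency counter per endpoint, and a worker only receives an event when that endpoint has budget left. A minimal sketch (class and method names are hypothetical; a real implementation would persist the queues):

```python
from collections import defaultdict, deque

class PerConsumerQueues:
    """One queue and one concurrency budget per consumer endpoint."""

    def __init__(self, default_concurrency: int = 5):
        self.queues: dict[str, deque] = defaultdict(deque)
        self.in_flight: dict[str, int] = defaultdict(int)
        self.limits: dict[str, int] = {}  # per-consumer overrides
        self.default_concurrency = default_concurrency

    def enqueue(self, consumer_id: str, event: dict) -> None:
        self.queues[consumer_id].append(event)

    def next_deliverable(self, consumer_id: str):
        """Hand out an event only if this consumer has budget left."""
        limit = self.limits.get(consumer_id, self.default_concurrency)
        if self.in_flight[consumer_id] >= limit or not self.queues[consumer_id]:
            return None
        self.in_flight[consumer_id] += 1
        return self.queues[consumer_id].popleft()

    def done(self, consumer_id: str) -> None:
        """Called when a delivery attempt finishes, success or failure."""
        self.in_flight[consumer_id] -= 1
```

A slow endpoint simply stops yielding deliverable events once its in-flight count hits the limit; every other consumer's queue keeps draining.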

Connection Management

Every webhook crosses a DNS lookup, a TCP connection, and a TLS handshake. Doing that fresh on every request adds 50 to 100 ms of overhead before the consumer’s server has even seen the payload.

Pooling connections per consumer endpoint flips that calculation. Most HTTP clients support keep-alive out of the box. The harder part is sizing the pool. Too small and requests serialize behind a few sockets. Too large and idle connections start getting reaped by the consumer’s load balancer, and the reaping itself can surface as errors on the sender side. Start with a small pool per endpoint, watch wait times, grow as needed.

Concurrency limits per consumer matter just as much. A burst that turns into 500 simultaneous requests against a single endpoint will knock most consumers over even if the endpoint is otherwise healthy. Five to ten in-flight requests is a reasonable default. Make it configurable so customers can request more once they’ve sized their own infrastructure for it.

Set timeouts that are aggressive enough to matter:

  • Connection timeout around 5 seconds. If TCP doesn’t establish in 5 seconds, the host is unreachable for practical purposes.
  • Response timeout in the 15 to 30 second range, depending on what consumers actually need.
  • A hard cap on the entire attempt. A worker stuck on a single 60-second response is a worker not delivering anything else.
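The hard cap on the whole attempt deserves its own enforcement, since per-phase timeouts can each pass individually while the total still runs long. One way to sketch it (the helper name is hypothetical) is to run the delivery in a thread and reclaim the worker slot when the deadline passes:

```python
import threading
import time

def attempt_with_cap(deliver, event, hard_cap_s: float = 30.0):
    """Run deliver(event); give up on waiting after hard_cap_s seconds."""
    result = {}

    def run():
        try:
            result["status"] = deliver(event)
        except Exception as exc:
            result["error"] = exc

    t = threading.Thread(target=run, daemon=True)
    start = time.monotonic()
    t.start()
    t.join(hard_cap_s)
    if t.is_alive():
        # The request thread keeps running in the background, but this
        # worker slot is freed to deliver other events.
        return ("timeout", time.monotonic() - start)
    if "error" in result:
        return ("error", result["error"])
    return ("ok", result["status"])
```

In a real client you would also cancel the underlying socket rather than abandon the thread, but the principle is the same: no single response can hold a worker past the cap.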

Fan-Out

When a single event needs to land on a thousand consumer endpoints, sequential delivery isn’t viable. At 200 ms per request, that’s past three minutes of latency for the last consumer in the list, and that’s only if every delivery succeeds on the first try.

Parallel delivery is the obvious answer, with the per-consumer concurrency limits described above. The less obvious problems show up around the edges:

  • Don’t enqueue all 1,000 delivery jobs in the same instant. Batch the fan-out so the queue subsystem doesn’t choke on the burst.
  • Some consumers care more about latency than others. Internal fan-out (a billing system that needs the event in milliseconds) and external fan-out (an end customer that can wait a few seconds) usually want different priority lanes.
  • Track tail latency, not average. Average looks fine when 999 of 1,000 deliveries took 100 ms and one took two minutes; the customer waiting on the slow one doesn’t care about your average.
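Batched fan-out is mostly bookkeeping. A minimal sketch (the `fan_out` helper and its `enqueue` callback are illustrative, assuming a per-consumer queue layer like the one above):

```python
def fan_out(event: dict, consumer_ids: list[str], enqueue,
            batch_size: int = 100) -> int:
    """Enqueue one delivery job per consumer, in bounded batches.

    Returns the number of batches used."""
    batches = 0
    for start in range(0, len(consumer_ids), batch_size):
        for consumer_id in consumer_ids[start:start + batch_size]:
            enqueue(consumer_id, event)
        batches += 1
        # In a real system, pause between batches (a sleep or a
        # rate-limiter token) so the queue subsystem absorbs the burst.
    return batches
```

Priority lanes fit naturally on top: run the internal consumer list through `fan_out` first, or give it a separate queue with more workers.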

Backpressure

What happens when events arrive faster than they can be delivered? In a well-designed system the queue grows for a while and workers catch up. In a poorly designed one the queue grows until memory runs out and the pipeline fails.

Three mechanisms work well together:

  • Hard limits on queue depth, with sensible behavior when the limit is hit: slow the producer, drop low-priority events, or reject new ones outright, depending on what consumers can tolerate.
  • Adaptive delivery rates that respond to consumer health, since queueing more requests at a consumer that’s already responding slowly only makes the recovery longer.
  • Circuit breakers per endpoint that stop delivery when an endpoint has failed enough times in a row, with periodic probes to determine when to resume.

The breaker logic itself is straightforward: closed means normal delivery, open means stop, half-open means try one and see what happens. The exact numbers (failure threshold, open duration, probe cadence) need tuning per system, but the shape is the same everywhere.
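That shape fits in a small class. The thresholds below are illustrative defaults, not recommendations, and a production breaker would also guard against multiple concurrent probes in the half-open state:

```python
import time

class CircuitBreaker:
    """Per-endpoint breaker: closed -> open after N consecutive failures,
    half-open after a cooldown, when the next attempt acts as the probe."""

    def __init__(self, failure_threshold: int = 5, open_seconds: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.clock = clock  # injectable for testing
        self.failures = 0
        self.opened_at = None  # None means closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: normal delivery
        if self.clock() - self.opened_at >= self.open_seconds:
            return True  # half-open: let a probe through
        return False  # open: stop delivering

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # trip, or re-trip a failed probe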

Failure Isolation

At small scale, a single misbehaving consumer is annoying. At large scale, it’s an existential threat to every other delivery in the system.

A poison pill is the classic version. An event that consistently fails at a specific consumer, perhaps because the handler crashes on a particular payload shape, will keep getting retried forever if nothing stops it. After a configurable number of attempts, move it to a dead letter queue and stop pretending it’s going to succeed. Replay from the DLQ once the underlying issue is fixed.
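The retry-then-DLQ path can be sketched briefly. Full-jitter exponential backoff is one common choice (the function names and the eight-attempt cap here are illustrative):

```python
import random

MAX_ATTEMPTS = 8  # illustrative; make it configurable per consumer

def backoff_seconds(attempt: int, base: float = 1.0,
                    cap: float = 300.0) -> float:
    """Full-jitter backoff: a random delay up to min(cap, base * 2^attempt)."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def handle_failure(event: dict, retry_queue: list, dead_letter_queue: list) -> None:
    """Schedule a retry, or park the event in the DLQ once attempts run out."""
    attempts = event.get("attempts", 0) + 1
    event["attempts"] = attempts
    if attempts >= MAX_ATTEMPTS:
        dead_letter_queue.append(event)  # stop retrying; replay after the fix
    else:
        retry_queue.append((backoff_seconds(attempts), event))
```

The jitter matters: without it, every event that failed during the same consumer outage retries in lockstep and recreates the spike that caused the failures.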

Resource isolation matters too. Separate worker pools for high-volume and low-volume customers prevent a single big customer from monopolizing infrastructure during a spike. Per-consumer rate limits prevent any single endpoint from consuming disproportionate worker time. None of this is glamorous, and it’s also what stops a bad day for one customer from becoming a bad day for every other customer.

If the platform is multi-tenant, isolation extends another layer up. Tenant A’s traffic spike must not affect tenant B’s delivery latency. That’s a planning problem (capacity headroom per tenant) as much as a runtime one (per-tenant rate caps and worker quotas).

Monitoring

At a few thousand events per day, logs are enough. At a few million they aren’t, and the question shifts from “what does this log line say” to “is this minute different from the last minute.”

Four categories of monitoring stop being optional at scale:

  • Per-consumer success rate, latency, retry rate, and queue depth. Customer-by-customer is the only resolution that lets you answer support tickets quickly.
  • Per-system throughput, worker utilization, and error rate broken out by error type. This is where you see whether the platform itself is healthy.
  • Anomaly detection on the per-consumer numbers. A customer dropping from 99.9% to 50% success might only be a few hundred events in absolute terms and is still a strong signal that something has changed.
  • Trend analysis on overall volume so capacity decisions get made before the next burst rather than during it.
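The per-consumer numbers in the first bullet don't require heavy machinery; the minimum viable version is a counter per consumer per minute, from which success rate and anomaly checks fall out. A hypothetical sketch:

```python
from collections import defaultdict

class ConsumerMetrics:
    """Per-consumer delivery outcomes, rolled up into per-minute buckets."""

    def __init__(self):
        # (consumer_id, minute) -> [successes, failures]
        self.buckets = defaultdict(lambda: [0, 0])

    def record(self, consumer_id: str, minute: int, ok: bool) -> None:
        self.buckets[(consumer_id, minute)][0 if ok else 1] += 1

    def success_rate(self, consumer_id: str, minute: int):
        """Fraction of successful deliveries, or None with no traffic."""
        ok, fail = self.buckets.get((consumer_id, minute), (0, 0))
        total = ok + fail
        return ok / total if total else None
```

Comparing `success_rate` for this minute against the last handful of minutes per consumer is exactly the "is this minute different from the last minute" question, answered at the resolution support tickets need.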

Dashboards are a starting point. Alerts on the right thresholds are what keep an outage from being something a customer reports first.

Quick Reference

If you only remember a few things from this post:

  • Persist events before attempting delivery; never couple application threads to consumer response times
  • Give every consumer its own queue, concurrency budget, and rate limit
  • Pool connections per endpoint, set strict timeouts, cap concurrent in-flight requests
  • For fan-out, parallelize aggressively but track tail latency rather than averages
  • Use queue-depth limits, adaptive delivery rates, and circuit breakers to keep producers from outrunning consumers
  • Send events that can’t be delivered to a dead letter queue and replay from it later
  • Monitor per-consumer health, not just system-wide throughput

Build webhook infrastructure that does all of this and it will hold up under production load. Hookbridge ships with these patterns built in: per-endpoint concurrency and rate limits, exponential backoff with jitter, circuit breakers, dead letter queues, tenant isolation, and per-consumer monitoring. If webhook delivery isn’t the part of your product where you want to be spending engineering time, Hookbridge handles it.