Yesterday's event-driven vs polling piece left an open question: if webhooks are inherently more efficient than polling, why do production systems still produce duplicate side-effects? The answer: idempotency, or the lack of it.
If you've ever shipped a webhook handler and gotten a customer support ticket two months later that says "I was charged twice", you've met this pattern. This piece is the field guide.
The problem in one paragraph
Most webhook providers in production - Stripe, Shopify, Twilio, Slack - retry failed deliveries. They have to. Networks fail, your server reboots, your handler times out. Without retries, webhook delivery would be best-effort, which is unacceptable for billing, inventory, or auth events. So providers retry, sometimes aggressively (Stripe retries for up to 3 days for failed events). GitHub is a notable exception: it does not redeliver automatically, but failed deliveries can be redelivered manually or through its API, so duplicates are still possible there.
The trap: a delivery that "failed" from the provider's perspective often didn't fail at all. Your handler completed the side-effect but then returned a 5xx because of a transient internal error. Or it finished the work but took longer than the provider's timeout, so the provider gave up waiting. Or the response was lost in transit. In all these cases, the provider retries - and your handler runs the side-effect a second time.
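To make the trap concrete, here is a minimal simulation, not any provider's real retry logic. The names naive_handler and deliver_with_retries are hypothetical, and the "provider" is just a loop that retries until it sees a 2xx status.

```python
charges = []  # stands in for a real side-effect, e.g. a card charge


def naive_handler(event, fail_after_side_effect):
    """Run the side-effect, then report a status code to the provider."""
    charges.append(event["amount"])
    # The side-effect is already done, but the provider only sees the status.
    return 500 if fail_after_side_effect else 200


def deliver_with_retries(event, max_attempts=3):
    """Hypothetical provider loop: retry until the handler returns 2xx."""
    for attempt in range(1, max_attempts + 1):
        # In this simulation only the first attempt hits the transient error.
        status = naive_handler(event, fail_after_side_effect=(attempt == 1))
        if 200 <= status < 300:
            return attempt
    return max_attempts


attempts = deliver_with_retries({"id": "evt_1", "amount": 100})
print(attempts, charges)
```

One logical event, two deliveries, two charges: the handler did nothing unusual, it simply was not safe to run twice.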
What idempotency means in this context
An idempotent operation produces the same result regardless of how many times it's executed. charge_card($100) is not idempotent - call it twice, customer charged twice. set_user_status('active') is idempotent - call it ten times, status is still 'active'.
The webhook design goal: make your handler idempotent so retries are safe.
Three implementation patterns from production
Pattern 1: Event ID deduplication (the gold standard)
Every well-designed webhook payload includes a unique event ID. Stripe sends id: "evt_xyz123". Shopify sends X-Shopify-Webhook-Id. GitHub sends X-GitHub-Delivery. The pattern: maintain a table of processed event IDs, and refuse to process any ID you've already seen.
    def webhook_handler(payload, headers):
        event_id = payload['id']  # or headers['X-Webhook-Id']
        # Atomic insert-or-fail
        try:
            db.execute(
                "INSERT INTO processed_events (id, processed_at) VALUES (?, NOW())",
                [event_id]
            )
        except IntegrityError:
            # Already processed. Return success silently.
            return 200
        # Now run the side-effect, knowing it's the first time
        process_event(payload)
        return 200
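This is the cleanest pattern. It relies on processed_events.id having a primary-key or unique constraint: the database rejects the second INSERT for the same ID, so even concurrent retries can't slip through.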
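For readers who want to run the idea end to end, here is a self-contained sketch using Python's built-in sqlite3 module with an in-memory database. The table layout, the side_effects list, and the body of process_event are assumptions made for illustration; a production handler would use its own database and driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY is what makes the second insert for the same ID fail.
conn.execute(
    "CREATE TABLE processed_events ("
    "id TEXT PRIMARY KEY, "
    "processed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)

side_effects = []  # hypothetical stand-in for the real work


def process_event(payload):
    side_effects.append(payload["id"])


def webhook_handler(payload):
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "INSERT INTO processed_events (id) VALUES (?)",
                (payload["id"],),
            )
    except sqlite3.IntegrityError:
        return 200  # duplicate delivery: acknowledge and do nothing
    process_event(payload)
    return 200


event = {"id": "evt_xyz123"}
first = webhook_handler(event)
second = webhook_handler(event)  # simulated provider retry
print(first, second, side_effects)
```

Both calls return 200, so the provider stops retrying, but the side-effect runs only once.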
Pattern 2: Natural-key idempotency
Sometimes you don't have a stable event ID, or your business logic makes the natural key better. Example: an "order paid" webhook. Even if you receive it three times, the result should be: order #1234 has status "paid". Your handler reads the order's current status before changing it - if it's already "paid", do nothing.
    def order_paid_handler(order_id):
        order = db.get_order(order_id)
        if order.status == 'paid':
            return 200  # Already paid, nothing to do
        db.update_order(order_id, status='paid')
        send_receipt_email(order)
        return 200
The risk: side-effects (sending the receipt email) need to be guarded too. If you send the email before the status update, retries will spam the customer.
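One way to guard the side-effect is to fold the status check into the UPDATE itself, so that only the delivery that actually changes the row sends the email. The sketch below assumes sqlite3 and a hypothetical orders table; send_receipt_email is a stub that records calls instead of sending mail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO orders (id, status) VALUES (1234, 'pending')")
conn.commit()

sent_receipts = []  # stub: records which orders got a receipt


def send_receipt_email(order_id):
    sent_receipts.append(order_id)


def order_paid_handler(order_id):
    with conn:
        # Conditional update: matches only if the order is not yet paid.
        cur = conn.execute(
            "UPDATE orders SET status = 'paid' WHERE id = ? AND status <> 'paid'",
            (order_id,),
        )
    # rowcount is 1 only for the delivery that performed the transition.
    if cur.rowcount == 1:
        send_receipt_email(order_id)
    return 200


for _ in range(3):  # the same webhook delivered three times
    order_paid_handler(1234)
print(sent_receipts)
```

Because the check and the write happen in a single statement, two concurrent deliveries cannot both see "not paid". Note that an unknown order ID also yields a rowcount of 0 here, so a real handler would probably want to distinguish that case.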
Pattern 3: Idempotency keys (Stripe-style outbound)
This is the inverse pattern - when your service makes outbound calls (e.g., charging a card), you generate an idempotency key and pass it. The downstream service uses it to dedupe. This protects you from your own retries.
    key = f"webhook-{event_id}"
    stripe.Charge.create(
        amount=10000,
        currency='usd',
        source='tok_xxx',
        idempotency_key=key  # Stripe will return the same charge if called twice
    )
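To see why the key has to be derived from the event rather than generated fresh on each attempt, here is a toy model of the receiving side. FakePaymentService is entirely hypothetical and is not how Stripe implements idempotency; it only shows the contract: same key in, same response out.

```python
import uuid


class FakePaymentService:
    """Hypothetical downstream API that dedupes on an idempotency key."""

    def __init__(self):
        self._responses = {}  # idempotency key -> stored response

    def create_charge(self, amount, currency, idempotency_key):
        # A repeated key returns the stored response without charging again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        charge = {
            "id": f"ch_{uuid.uuid4().hex[:8]}",
            "amount": amount,
            "currency": currency,
        }
        self._responses[idempotency_key] = charge
        return charge

    def charge_count(self):
        return len(self._responses)


def charge_for_event(service, event_id, amount):
    # Derive the key from the event ID so every retry reuses the same key.
    key = f"webhook-{event_id}"
    return service.create_charge(amount, "usd", idempotency_key=key)


service = FakePaymentService()
first = charge_for_event(service, "evt_xyz123", 10000)
retry = charge_for_event(service, "evt_xyz123", 10000)
print(first["id"] == retry["id"], service.charge_count())
```

A random key per attempt would defeat the purpose: each retry would look like a brand-new request to the downstream service.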
Three subtle bugs to avoid
- Race conditions in your dedup table. If two webhook deliveries hit your service at the exact same millisecond, both could pass the "have I seen this?" check before either inserts. Use atomic insert-or-fail (as in pattern 1), not check-then-insert.
- Side-effects outside your transaction. If you commit the dedup record to the DB and then send an email, a crash between the two means the next retry hits the existing dedup record, is treated as already processed, and the email is never sent. Wrap critical side-effects so they're tied to the dedup record's commit, or use an outbox pattern.
- Retention that's too short. Stripe's retry window of up to 3 days means your dedup table needs at least 3 days of history, plus some margin. If you prune to 1 day, a day-2 retry will be treated as a new event.
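The last two bullets can be sketched together. In this sqlite3 example the dedup row and an outbox row are written in one transaction, and pruning respects a retention window. The outbox table, the task name, and the 4-day RETENTION value are assumptions for illustration; a separate worker (not shown) would read the outbox and send the email.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: a 3-day provider retry window plus one day of margin.
RETENTION = timedelta(days=4)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_events (id TEXT PRIMARY KEY, processed_at TEXT NOT NULL)"
)
conn.execute("CREATE TABLE outbox (event_id TEXT NOT NULL, task TEXT NOT NULL)")


def record_event(event_id, task, now):
    """Insert the dedup row and the outbox row in a single transaction."""
    try:
        with conn:  # both inserts commit together or not at all
            conn.execute(
                "INSERT INTO processed_events (id, processed_at) VALUES (?, ?)",
                (event_id, now.isoformat()),
            )
            conn.execute(
                "INSERT INTO outbox (event_id, task) VALUES (?, ?)",
                (event_id, task),
            )
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: nothing new is queued
    return True


def prune_processed_events(now):
    """Delete dedup rows older than the retention window."""
    cutoff = (now - RETENTION).isoformat()
    with conn:
        cur = conn.execute(
            "DELETE FROM processed_events WHERE processed_at < ?", (cutoff,)
        )
    return cur.rowcount


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
created = record_event("evt_1", "send_receipt", t0)
duplicate = record_event("evt_1", "send_receipt", t0 + timedelta(days=2))
pruned_early = prune_processed_events(t0 + timedelta(days=3))
pruned_late = prune_processed_events(t0 + timedelta(days=5))
print(created, duplicate, pruned_early, pruned_late)
```

Since the outbox row commits with the dedup row, a crash can no longer leave an event marked as processed with its email silently dropped. The timestamps are stored as UTC ISO-8601 strings in one consistent format, which is what makes the string comparison in the DELETE safe.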
The pattern, in one sentence
Treat every webhook delivery as if it might arrive twice, because it will.
Tomorrow
We cover Pattern #5: Webhook signing and verification. Why HMAC matters, why it's almost always implemented wrong on first try, and the three checks every production handler should run before doing anything with a payload.