
Webhooks Explained the Way Nobody Explains Them

I have built a lot of automations. And there is one thing that breaks more projects than bad code, bad tools, or bad planning combined.

It is not understanding webhooks properly.

Not just what they are. Anyone can read a definition. I mean actually understanding how they work under the hood, why they fail in ways that look random but aren't, and what separates a webhook handler that holds up in production from one that quietly causes duplicate records, missed payments, and corrupted data.

This is going to start from the very beginning. If you already know what a webhook is, stick around, because the second half of this post is where most developers, even experienced ones, get things wrong.

Let's start with the dumb way

Before webhooks, there was polling.

Imagine you ordered a package and you want to know when it arrives. You have two options.

Option one: you go check your front door every five minutes. Every five minutes, you walk downstairs, open the door, look outside. Nothing there. Come back. Wait five minutes. Check again. Still nothing. You do this all day.

Option two: the delivery person rings your doorbell when the package arrives. You only go downstairs once. When it actually matters.

Polling is option one. Webhooks are option two.

In software terms, polling means your system sends a request to another system repeatedly on a schedule. "Hey, did anything change?" The other system responds. Usually with nothing. Then five minutes later you ask again. And again. And again.

This is how a lot of early automations work. You set up a workflow that checks your CRM every 15 minutes for new contacts. Every 15 minutes, regardless of whether anyone added a contact or not, your system sends a request. Most of the time the answer is no. Most of those requests are wasted.

Polling is not evil. There are places where it makes sense, like when the service you are connecting to does not support webhooks at all. But it has real problems. It is slow because you only find out about new data on your next check cycle. It is expensive because you are making API calls constantly, and most APIs have rate limits. And at scale it gets worse, not better.
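To put rough numbers on that waste, here is a small sketch. The function name and the figure of four new contacts per day are made up for illustration; only the 15-minute interval comes from the example above.

```javascript
// Rough cost of polling on a fixed schedule.
// pollingCost and its inputs are illustrative, not part of any real API.
function pollingCost(intervalMinutes, changesPerDay) {
  const requestsPerDay = Math.floor((24 * 60) / intervalMinutes);
  // Best case: every change is picked up by a different request.
  const usefulRequests = Math.min(changesPerDay, requestsPerDay);
  return {
    requestsPerDay,
    // At least this many requests come back with nothing new.
    wastedRequests: requestsPerDay - usefulRequests,
    // A change that lands right after a check waits a full interval.
    worstCaseDelayMinutes: intervalMinutes,
  };
}

console.log(pollingCost(15, 4));
// { requestsPerDay: 96, wastedRequests: 92, worstCaseDelayMinutes: 15 }
```

Shortening the interval cuts the delay but multiplies the request count, which is the trade-off polling never escapes.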

Webhooks flip this around: instead of you asking, the other system tells you when something changes.

Polling vs. Webhooks (diagram): with polling, your system keeps asking "anything changed?" over and over; with a webhook, the event fires and your endpoint is instantly notified.

Okay so what is a webhook actually

A webhook is a way for one system to tell another system that something just happened.

When a client pays an invoice in Stripe, Stripe does not wait for you to ask. It immediately sends a message to a URL you configured saying "hey, this payment just happened, here are the details." Your system receives that message and does whatever it needs to do with it.

That is it. That is a webhook.

The URL that receives the message is your webhook endpoint. It is just a web address that accepts incoming data. And the message Stripe sends is an HTTP POST request with a JSON payload containing everything about the event.

Here is a simple example of what Stripe sends when a payment succeeds:

json
{
  "id": "evt_1OoX2tBzX2x9Y",
  "type": "payment_intent.succeeded",
  "data": {
    "object": {
      "id": "pi_3OoX2tBzX2x9Y",
      "amount": 24500,
      "currency": "usd",
      "status": "succeeded",
      "customer": "cus_Px9kLm2QrT",
      "metadata": {
        "order_id": "ORD-8821",
        "client_name": "Acme Corp"
      }
    }
  },
  "created": 1709234567
}

Your endpoint receives this. You read the type field to know what happened. You read the data to get the details. You do your thing, whether that is updating a Notion database, sending a Slack message, creating an invoice, whatever. And then you respond with a 200 OK to tell Stripe that you got it.
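A minimal sketch of that "read the type, then act" step might look like the following. The handler table and the strings it returns are hypothetical placeholders for whatever your system actually does; the event shape follows the sample payload above.

```javascript
// Dispatch on the event's type field.
// The handlers and their return values are made-up placeholders.
const handlers = new Map([
  ['payment_intent.succeeded', (obj) => `mark order ${obj.metadata.order_id} as paid`],
  ['payment_intent.payment_failed', (obj) => `flag payment ${obj.id} for follow-up`],
]);

function routeEvent(event) {
  const handler = handlers.get(event.type);
  if (!handler) {
    // Event types you do not care about are still acknowledged,
    // so the provider does not keep retrying them.
    return { status: 200, action: 'ignored' };
  }
  return { status: 200, action: handler(event.data.object) };
}

const sampleEvent = {
  id: 'evt_1OoX2tBzX2x9Y',
  type: 'payment_intent.succeeded',
  data: { object: { id: 'pi_3OoX2tBzX2x9Y', metadata: { order_id: 'ORD-8821' } } },
};

console.log(routeEvent(sampleEvent));
// { status: 200, action: 'mark order ORD-8821 as paid' }
```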

Simple, right?

The basics are simple. The problems start when you actually run this in production.

The anatomy of what is really happening

Let me slow this down because most people skim past the parts that matter.

When Stripe fires a webhook, it is making an HTTP POST request to your URL. That request has:

A body containing the JSON payload with the event data.

Headers containing metadata about the request. The most important one is the Stripe-Signature header, which we will get to in a minute.

A timeout. Stripe, like most webhook providers, expects your endpoint to respond within about 30 seconds. If it does not respond in time, Stripe considers the delivery failed and will retry.

And this is where the first real problem shows up.

Why webhooks fail and what actually happens when they do

There are a few common failure modes that catch people off guard.

Your server is down. If your webhook endpoint is unreachable when Stripe tries to send the event, Stripe will retry. Most providers retry with exponential backoff, meaning they try again after 1 minute, then 5 minutes, then 30 minutes, and so on. This sounds fine until you realize that when your server comes back online, you might receive the same event multiple times as retries come in alongside new events.

Your handler times out. If you are doing heavy processing synchronously inside your webhook handler, and it takes more than 30 seconds, Stripe will think the delivery failed and retry. Even if you actually processed the event successfully. Now you process it again.

Network hiccup. Sometimes Stripe sends the event, your server processes it, but the 200 OK response gets lost on the way back. Stripe never gets confirmation, so it retries. You process the same event twice.

The point here is this: webhooks are delivered at least once, not exactly once.

You will receive duplicates. Not always. Not on every event. But you will receive them, and you need to be ready for it.
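Here is a toy simulation of the lost-acknowledgement case. Nothing in it is a real provider API; deliverWithRetries and the naive handler are invented to show why the sender has no choice but to retry.

```javascript
// Toy model of at-least-once delivery (not a real provider API).
// ackLostOnAttempts lists the attempts whose 200 response never arrives.
function deliverWithRetries(event, handler, ackLostOnAttempts, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    handler(event); // the receiver does its work
    const ackArrived = !ackLostOnAttempts.includes(attempt);
    if (ackArrived) return attempt; // sender saw the 200 and stops
    // The sender cannot tell "ack lost" from "never delivered", so it retries.
  }
  return maxAttempts;
}

let emailsSent = 0;
const naiveHandler = () => { emailsSent += 1; };

const attempts = deliverWithRetries({ id: 'evt_1' }, naiveHandler, [1]);
console.log(attempts, emailsSent); // 2 2
```

One event, one lost response, two emails. The handler did nothing wrong on either attempt, which is exactly why this class of bug looks random.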

This is why idempotency exists, and we are going to cover it in depth. But first, let's talk about something more urgent.

Security: anyone can send a fake webhook to your endpoint

Here is something that surprises a lot of people.

Your webhook endpoint is just a URL on the internet. Anyone who knows that URL can send a POST request to it. They can make the payload look exactly like a legitimate Stripe event. They can claim a payment was made when it was not. They can trigger your automation with fake data.

Without verification, your system has no way to know if the request actually came from Stripe or from someone pretending to be Stripe.

This is not a theoretical problem. It is a real attack vector, and it has caused real damage to real systems.

The solution is cryptographic signature verification.

How HMAC signature verification works

When you set up a webhook in Stripe, Stripe gives you a signing secret. It is a long random string that only you and Stripe know. Something like whsec_7xKjLp9mNqR2vT...

When Stripe sends a webhook, before sending it, it takes the raw payload body and your signing secret and runs them through an algorithm called HMAC-SHA256. This produces a signature string. Stripe includes this signature in the Stripe-Signature header of the request.

On your end, when you receive the request, you do the same calculation. You take the raw body and your signing secret and run HMAC-SHA256. If your result matches what is in the header, the request genuinely came from Stripe and the payload has not been tampered with. If it does not match, you reject it.

This works because HMAC-SHA256 is a keyed one-way function: the output depends on both the message and the secret. You cannot produce the correct signature without knowing the secret, so an attacker who does not have your secret cannot produce a signature that will pass verification.

Here is how you actually implement this in Node.js:

HMAC-SHA256 Signature Verification (diagram): Stripe runs payload + secret through HMAC-SHA256 and sends the result in the Stripe-Signature header of the HTTP POST request; your server runs raw body + secret through HMAC-SHA256 and compares the signatures.
node.js
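const crypto = require('crypto');

function verifyStripeSignature(rawBody, signatureHeader, secret) {
  // Stripe sends the signature as "t=timestamp,v1=signature".
  // The header can carry more than one v1 entry, so collect them by key
  // instead of relying on their position.
  let timestamp;
  const receivedSignatures = [];
  for (const part of String(signatureHeader || '').split(',')) {
    const [key, value] = part.trim().split('=');
    if (key === 't') timestamp = value;
    if (key === 'v1') receivedSignatures.push(value);
  }

  if (!timestamp || receivedSignatures.length === 0) {
    throw new Error('Malformed signature header');
  }

  // Reconstruct the signed payload string
  const signedPayload = `${timestamp}.${rawBody}`;

  // Compute the expected signature
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(signedPayload, 'utf8')
    .digest('hex');
  const expected = Buffer.from(expectedSignature, 'utf8');

  // Use timing-safe comparison to prevent timing attacks.
  // timingSafeEqual throws if the lengths differ, so check length first.
  const signaturesMatch = receivedSignatures.some((signature) => {
    const received = Buffer.from(signature, 'utf8');
    return received.length === expected.length &&
      crypto.timingSafeEqual(received, expected);
  });

  if (!signaturesMatch) {
    throw new Error('Signature verification failed');
  }

  // Check timestamp to prevent replay attacks
  const currentTime = Math.floor(Date.now() / 1000);
  const webhookTimestamp = parseInt(timestamp, 10);
  const tolerance = 300; // 5 minutes

  if (Number.isNaN(webhookTimestamp) || Math.abs(currentTime - webhookTimestamp) > tolerance) {
    throw new Error('Webhook timestamp too old, possible replay attack');
  }

  return true;
}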

There are a few things worth noting here.

Use crypto.timingSafeEqual, not ===. Regular string comparison in JavaScript can exit early the moment it finds a mismatch. This means an attacker can figure out how many characters they got right by measuring how long the comparison took. timingSafeEqual always takes the same amount of time regardless of where the mismatch is. One catch: it throws if the two buffers are different lengths, so compare lengths first and treat a length mismatch as a failed verification.

Verify the raw body, not the parsed JSON. This is a mistake I have seen more than once. If you parse the body into a JavaScript object before verification and then turn it back into a string, the result will usually differ from what Stripe signed, even if only in whitespace or escaping. A single changed character is enough for verification to fail. You need to capture the raw body string before any parsing happens.

Check the timestamp. Stripe includes a timestamp in the signature header. This lets you reject webhooks that are too old. If someone intercepts a valid webhook and replays it hours later, the timestamp check will catch it. A tolerance of 5 minutes is standard.
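The raw-body point is easy to see for yourself. The sketch below signs a payload the same way the verifier above expects, then signs the parse-and-stringify version of the same payload. The secret and the signPayload helper are made up for local experiments and are not part of Stripe's SDK.

```javascript
const crypto = require('crypto');

// Build a Stripe-style "t=...,v1=..." header for a payload.
// Hypothetical helper for local tests only; the secret is a made-up value.
function signPayload(rawBody, secret, timestamp) {
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`, 'utf8')
    .digest('hex');
  return `t=${timestamp},v1=${signature}`;
}

const secret = 'whsec_test_only';
const timestamp = 1709234567;

// What arrived on the wire, whitespace included.
const rawBody = '{ "id": "evt_1", "amount": 24500 }';
// What you get if you parse first and stringify again.
const reserialized = JSON.stringify(JSON.parse(rawBody));

console.log(rawBody === reserialized); // false
console.log(
  signPayload(rawBody, secret, timestamp) ===
  signPayload(reserialized, secret, timestamp)
); // false
```

Same data, different bytes, different signature. The same helper is handy for writing tests against your own verifier without waiting for a real event.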

How Attio does it

Attio's signature verification follows the same idea with a different header format. Attio sends the signature in an X-Attio-Signature header, and the underlying mechanism is identical: HMAC-SHA256 with a shared secret.

node.js
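const crypto = require('crypto');

function verifyAttioSignature(rawBody, signatureHeader, secret) {
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(rawBody, 'utf8')
    .digest('hex');

  const receivedSignature = String(signatureHeader || '').replace('sha256=', '');

  const received = Buffer.from(receivedSignature, 'hex');
  const expected = Buffer.from(expectedSignature, 'hex');

  // timingSafeEqual throws on buffers of different length,
  // so a missing or malformed header is rejected here instead.
  if (received.length !== expected.length) {
    return false;
  }

  return crypto.timingSafeEqual(received, expected);
}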

I built bidirectional syncs with Attio for production client environments. The signature verification is the first thing that runs when a webhook arrives, before any processing happens at all. If verification fails, the request is rejected immediately with a 400. Nothing else runs.

Idempotency: the concept most people skip

Remember how I said webhooks are delivered at least once, not exactly once?

This is where idempotency comes in.

Idempotency means that running the same operation multiple times produces the same result as running it once. It is the property that makes it safe to process duplicate webhook events.

Think about what happens without it.

A client pays an invoice. Stripe fires a webhook. Your handler processes it, marks the invoice as paid, sends a confirmation email to the client, and triggers the subcontractor payout.

Then Stripe fires the same webhook again because your 200 response got lost in transit.

Your handler processes it again. Marks the invoice as paid again. Sends another confirmation email to the client. Triggers the subcontractor payout again.

The client gets two emails. The subcontractor gets paid twice. Your records are a mess.

Idempotency prevents this. Here is how you implement it:

node.js
async function handleWebhook(event) {
  // Every Stripe event has a unique ID
  const eventId = event.id;

  // Check if we have already processed this event
  const existingEvent = await db.query(
    'SELECT id, processed_at FROM webhook_events WHERE event_id = $1',
    [eventId]
  );

  if (existingEvent.rows.length > 0 && existingEvent.rows[0].processed_at) {
    // Already processed, skip silently
    console.log(`Event ${eventId} already processed, skipping`);
    return { status: 'already_processed' };
  }

  // Record the event before processing.
  // With a unique constraint on event_id, only one of two
  // simultaneous requests actually inserts a row.
  const insertResult = await db.query(
    `INSERT INTO webhook_events (event_id, event_type, payload, received_at)
     VALUES ($1, $2, $3, NOW())
     ON CONFLICT (event_id) DO NOTHING`,
    [eventId, event.type, JSON.stringify(event)]
  );

  // Assumes a node-postgres style client, where rowCount is 0 if nothing
  // was inserted. If we saw no row a moment ago but could not insert one,
  // another request claimed this event in between, so let it do the work.
  if (insertResult.rowCount === 0 && existingEvent.rows.length === 0) {
    console.log(`Event ${eventId} is being handled by another request, skipping`);
    return { status: 'in_progress' };
  }

  try {
    // Do the actual processing here
    await processEvent(event);

    // Mark as successfully processed
    await db.query(
      'UPDATE webhook_events SET processed_at = NOW(), status = $1 WHERE event_id = $2',
      ['completed', eventId]
    );

  } catch (error) {
    // Mark as failed so you can retry or investigate
    await db.query(
      'UPDATE webhook_events SET status = $1, error = $2 WHERE event_id = $3',
      ['failed', error.message, eventId]
    );
    throw error;
  }
}

A few things to notice here.

Every Stripe event has a globally unique ID. That evt_1OoX2tBzX2x9Y string is unique. No two events will ever have the same ID. This is your idempotency key.

You record the event before processing, not after. If your server crashes halfway through processing, you want a record that the event was received. When Stripe retries, you can see it was received but not completed and handle accordingly.

The ON CONFLICT DO NOTHING clause. If two simultaneous requests come in for the same event (which can happen), only one insert succeeds. The other is a silent no-op at the database level. For this to actually prevent a race condition, the handler has to check whether its own insert wrote a row and stop if it did not.

Log everything. Every event that comes in, every event you skip, every failure. You want to be able to trace exactly what happened for any event ID. When something goes wrong in production, these logs are what you investigate.
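If it helps to see the states without a database in the way, here is an in-memory sketch of the same idea. The Map stands in for the webhook_events table, and claimEvent and finishEvent are hypothetical names. It is one way to handle "received but not completed", not the only one, and a real system still needs a database with a unique constraint on the event ID.

```javascript
// In-memory stand-in for the webhook_events table, for illustration only.
const eventStore = new Map();

function claimEvent(eventId) {
  const existing = eventStore.get(eventId);
  if (!existing) {
    eventStore.set(eventId, { status: 'received' });
    return 'claimed'; // first time we see it: go process
  }
  if (existing.status === 'completed') return 'already_processed';
  if (existing.status === 'failed') {
    existing.status = 'received';
    return 'claimed'; // earlier attempt failed: a retry may try again
  }
  return 'in_progress'; // received but not finished: leave it alone
}

function finishEvent(eventId, succeeded) {
  eventStore.get(eventId).status = succeeded ? 'completed' : 'failed';
}

console.log(claimEvent('evt_1')); // claimed
console.log(claimEvent('evt_1')); // in_progress
finishEvent('evt_1', true);
console.log(claimEvent('evt_1')); // already_processed
```

The 'in_progress' branch is the judgment call. An event stuck there might mean a crashed worker, so in practice you would also want a timeout after which a retry is allowed to claim it.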

Queue systems: why you should never process synchronously

Here is something that surprised me the first time I understood it properly.

Your webhook endpoint should do almost no real work.

I mean this literally. When a webhook arrives, your endpoint should do four things. Verify the signature. Check idempotency. Put the event on a queue. Acknowledge receipt with a 200. Then it should stop.

The actual processing should happen in a queue, asynchronously, after the HTTP response has been sent.

Here is why.

Stripe has a timeout of roughly 30 seconds. If your handler takes 35 seconds, Stripe considers it failed and retries. More importantly, heavy work inside the handler, like querying a CRM, updating a Notion database, or sending emails, means more network requests, and any of those can fail or be slow. If one fails, your handler throws an error, you respond with a 500, and Stripe retries.

Now you have a partially processed event and an incoming retry.

The solution is this pattern:

Async Queue Architecture (diagram): Stripe → Endpoint (verify · queue · 200) → Queue → Worker → ✓ Done
node.js
app.post('/webhooks/stripe', async (req, res) => {
  // 1. Verify signature
  // req.rawBody is not set by Express itself; this assumes middleware
  // that stores the unparsed request body on the request.
  try {
    verifyStripeSignature(req.rawBody, req.headers['stripe-signature'], process.env.STRIPE_WEBHOOK_SECRET);
  } catch (err) {
    return res.status(400).json({ error: 'Invalid signature' });
  }

  const event = req.body;

  try {
    // 2. Check idempotency
    const alreadyProcessed = await checkIfProcessed(event.id);
    if (alreadyProcessed) {
      return res.status(200).json({ received: true, status: 'duplicate' });
    }

    // 3. Queue the event for async processing
    await queue.add('process-stripe-event', {
      eventId: event.id,
      eventType: event.type,
      payload: event
    });
  } catch (err) {
    // Could not check or queue the event: answer with a 500 so Stripe retries later
    console.error(`Failed to queue event ${event.id}`, err);
    return res.status(500).json({ error: 'Could not queue event' });
  }

  // 4. Respond immediately
  res.status(200).json({ received: true });
});

// In a separate worker process:
queue.process('process-stripe-event', async (job) => {
  const { eventId, eventType, payload } = job.data;

  // Now do the actual heavy work here
  // This runs in the background, no timeout from Stripe
  // If any step throws, the job fails and the queue can retry it,
  // so each step should be safe to run more than once
  await updateNotionDatabase(payload);
  await sendConfirmationEmail(payload);
  await triggerSubcontractorPayout(payload);
  await markEventAsProcessed(eventId);
});

The webhook endpoint responds in milliseconds. Stripe is happy. The actual work happens in the background at whatever pace it needs to.

This is also where bulk edits become manageable. In the Attio sync work I have built, when someone updates 50 contacts simultaneously in the CRM, 50 webhooks arrive in rapid succession. Without a queue, you are trying to make 50 simultaneous Notion API calls. Rate limits kick in. Some fail. Retries cause more load. The whole thing starts to crumble.

With a queue, all 50 events get queued and processed one by one (or in small controlled batches). Nothing gets dropped. Nothing overwhelms the Notion API. Everything lands correctly.
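The "small controlled batches" part is nothing fancy. A sketch, assuming a batch size of 5 (an arbitrary number, pick whatever the downstream rate limit tolerates) and a hypothetical toBatches helper:

```javascript
// Split a burst of events into fixed-size batches.
// toBatches and the batch size are illustrative choices.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 50 webhook events arriving at once, as in the bulk edit example.
const contactEvents = Array.from({ length: 50 }, (_, i) => `evt_${i + 1}`);
const batches = toBatches(contactEvents, 5);

console.log(batches.length); // 10
console.log(batches[0]); // [ 'evt_1', 'evt_2', 'evt_3', 'evt_4', 'evt_5' ]
```

Most queue libraries give you a concurrency or rate-limit setting that does this for you. The point is that the pace is now set by the worker, not by however fast the webhooks happen to arrive.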

What happens when things go very wrong: dead letter queues

Even with queues, individual events can fail to process. Maybe the Notion API is down for 10 minutes. Maybe there is a bug in your code for a specific edge case. Maybe the payload has a field you did not expect.

When a queued job fails, most queue systems will retry it with exponential backoff. Fail once, retry in 30 seconds. Fail again, retry in 2 minutes. And so on.

But if a job fails too many times, you do not want it retrying forever. You want it to land somewhere you can inspect and manually handle.

That place is called a dead letter queue.

A dead letter queue is just a separate queue where failed events go after exhausting all retries. You get alerted, you look at what failed, you fix the underlying issue, and you reprocess the event manually.

For anything touching payments or critical business data, you need a dead letter queue. A failed event that silently disappears is far more dangerous than one that sits in a dead letter queue waiting for you to deal with it.
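Retries and the dead letter queue are really one decision made after every failure: try again later, or give up and park the job. Here is a sketch of that decision. The base delay of 30 seconds and factor of 4 reproduce the "30 seconds, then 2 minutes" example above; the limit of 5 attempts and the afterFailure name are arbitrary choices for illustration, not defaults of any particular queue library.

```javascript
// Decide what happens to a job after a failed attempt.
// All defaults are illustrative assumptions.
function afterFailure(job, { baseSeconds = 30, factor = 4, maxAttempts = 5 } = {}) {
  const attempts = job.attempts + 1;
  if (attempts >= maxAttempts) {
    // Out of retries: park the job where a human can inspect it.
    return { eventId: job.eventId, attempts, queue: 'dead-letter' };
  }
  // Exponential backoff: each retry waits `factor` times longer than the last.
  const delaySeconds = baseSeconds * Math.pow(factor, attempts - 1);
  return { eventId: job.eventId, attempts, queue: 'retry', delaySeconds };
}

let job = { eventId: 'evt_1', attempts: 0 };
const history = [];
while (job.queue !== 'dead-letter') {
  job = afterFailure(job);
  history.push(job.queue === 'retry' ? job.delaySeconds : 'dead-letter');
}

console.log(history); // [ 30, 120, 480, 1920, 'dead-letter' ]
```

Four retries spread over roughly 40 minutes is enough to ride out a short outage. Anything still failing after that is probably a bug or a bad payload, and no amount of retrying will fix it.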

If you are running this pattern inside n8n, error handling in n8n covers how to configure retries, wire up dead letter behavior, and keep idempotency intact across workflow executions.

Putting it all together: what production-grade actually looks like

A webhook handler that will hold up in production does all of the following:

Verifies the signature before doing anything else. If verification fails, it rejects the request with a 400. No exceptions.

Checks idempotency using the event ID. If the event has been processed before, it returns 200 and stops. No exceptions.

Acknowledges receipt immediately with a 200. Does not wait for processing to complete.

Queues the event for async processing. Does not block the HTTP response on any business logic.

Processes in a worker, not in the request handler. The worker does the actual work: database updates, API calls, notifications, whatever the system requires.

Logs everything. Every event received, every event skipped, every failure. With enough detail to trace any event ID from arrival to completion.

Has a dead letter queue for failed events. Failed processing never silently disappears.

That is the full picture. From an incoming HTTP request to a reliably processed event with no duplicates, no missing events, and no security vulnerabilities.

Why this matters more than most people realize

When you are connecting two tools with a basic workflow builder, you usually do not have to think about any of this. The platform handles it for you, more or less.

But when you are building production-grade systems, the kind that handle actual payments, actual CRM records, actual business operations, getting these wrong is not just a technical problem. It is a business problem.

A billing automation that processes a payment twice charges a client twice. A CRM sync that handles duplicates incorrectly gives you a database you cannot trust. A contract automation that fires twice sends a client two copies of the same contract.

I have built these systems for clients where the stakes are real. The reason I am thorough about signature verification, idempotency, queue systems, and dead letter queues is not because I like writing extra code. It is because I have seen what happens when you skip them.

The automation works fine in testing. It breaks in production in ways that are embarrassing and sometimes expensive.

The architecture described in this post is how you avoid that.

A quick summary if you are skimming

Webhooks are push-based notifications. A service sends you a message when something happens, rather than you asking repeatedly.

Webhooks are delivered at least once. You will receive duplicates. Your system must be built to handle them.

Always verify signatures. Use HMAC-SHA256 with your signing secret. Use timingSafeEqual. Check the timestamp. Verify against the raw body.

Implement idempotency using event IDs. Record every event before processing. Skip events you have already processed.

Acknowledge first, process second. Return 200 immediately. Do the actual work in a queue worker.

Use a dead letter queue. Failed events should never silently disappear.