when the bell rings

Emergency playbook

What to do when payments break, data is exposed, a mass email goes wrong, the site goes down, a deploy goes bad, or unsafe content is published.

In an emergency, polished explanations matter less than order: stop the harm, understand the scope, communicate honestly, roll back, and record the lesson.

owner

Stop harm

Pause the broken flow, hide unsafe copy, disable the bad experiment, or roll back the deploy.

Do not keep accepting payments through a broken flow.

owner

Communicate

Tell affected users what is known, what is being checked, and what they should do.

Do not promise details that are not yet verified.

technical owner

Learn

After the fix, write down the cause, scope, fix, prevention, and monitoring rule.

Do not close the incident without a prevention note.

First 15 minutes

Do not look for someone to blame. Stop the harm first.

  • Name an incident owner.
  • Pause the affected flow, or roll back if needed.
  • Open an incident log and record facts only.

First hour

Once the harm is stopped, determine the scope and the customer message.

  • List affected routes, users, payments, and emails, if known.
  • Draft the response for customers and support.
  • Decide on refunds, resends, a rollback, or a maintenance banner.

Aftercare

An incident should leave behind not just a fix, but a protective trace.

  • Write a prevention rule.
  • Add a monitoring signal or checklist item.
  • Update the support macro if customers were affected.

checklist

  • Incident owner named.
  • Affected flow paused or verified safe.
  • Customer message is factual and not defensive.
  • Prevention note written.
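If incidents are tracked in a small internal tool, the checklist can act as a gate: the incident cannot be closed until every item holds, and that includes the prevention note. The sketch below is a minimal Python version that makes this assumption. The `Incident` record, its fields, and the `close_incident` helper are hypothetical and come from no real tool.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative, not from a real tool.
    owner: str = ""
    flow_paused_or_safe: bool = False
    customer_message_reviewed: bool = False
    prevention_note: str = ""


def missing_items(incident: Incident) -> list[str]:
    """Return the checklist items that still block closing the incident."""
    missing = []
    if not incident.owner.strip():
        missing.append("Incident owner named")
    if not incident.flow_paused_or_safe:
        missing.append("Affected flow paused or verified safe")
    if not incident.customer_message_reviewed:
        missing.append("Customer message is factual and not defensive")
    if not incident.prevention_note.strip():
        missing.append("Prevention note written")
    return missing


def close_incident(incident: Incident) -> None:
    """Refuse to close while any checklist item is unmet."""
    missing = missing_items(incident)
    if missing:
        raise ValueError("Cannot close incident; missing: " + ", ".join(missing))
```

Suppose an incident has an owner, a safe flow, and a reviewed message but no prevention note. Calling `close_incident` on it raises an error that names "Prevention note written". That is the same rule as "Do not close the incident without a prevention note."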

handoff

  • An emergency handoff includes the timeline, impact, current state, next decision, and owner.
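Keeping the handoff as a structured record is one way to make sure none of the five parts gets dropped under pressure. The sketch below is in Python and is only an illustration. The `Handoff` class and its plain-text format are hypothetical, and timeline entries are assumed to be short timestamped facts listed oldest first.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    # The five parts the playbook asks for; names and format are illustrative.
    timeline: list[str] = field(default_factory=list)  # timestamped facts, oldest first
    impact: str = ""
    current_state: str = ""
    next_decision: str = ""
    owner: str = ""

    def gaps(self) -> list[str]:
        """Name any required part that is still empty."""
        parts = {
            "timeline": bool(self.timeline),
            "impact": bool(self.impact.strip()),
            "current state": bool(self.current_state.strip()),
            "next decision": bool(self.next_decision.strip()),
            "owner": bool(self.owner.strip()),
        }
        return [name for name, present in parts.items() if not present]

    def render(self) -> str:
        """Plain-text handoff note; refuses to render an incomplete handoff."""
        gaps = self.gaps()
        if gaps:
            raise ValueError("Handoff incomplete: " + ", ".join(gaps))
        lines = ["Timeline:"]
        lines += [f"  - {entry}" for entry in self.timeline]
        lines += [
            f"Impact: {self.impact}",
            f"Current state: {self.current_state}",
            f"Next decision: {self.next_decision}",
            f"Owner: {self.owner}",
        ]
        return "\n".join(lines)
```

Refusing to render an incomplete note is deliberate. A handoff that is missing its owner or its next decision is the kind of gap that stalls an incident.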

red flags

  • Data exposure suspected.
  • Payments continue while checkout is broken.
  • Mass email went to the wrong audience.
  • Unsafe content is still public after being reported.
