black book of errors

Incident log

How to record failures so that a week later it is clear what happened, who was affected, how it was fixed, and what must not repeat.

Goal: turn launch chaos into a learning log, not a pile of phone messages.

what to ask first

Date and time.
Affected path: payment, reading, email, account, mobile, AI, admin.
Severity: low, medium, high, stop-launch.
How many users were affected.
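
The four intake answers can be held in one small record so that no entry starts without them. A minimal sketch, assuming Python; the names Path, Severity and Intake are hypothetical, and the sample values are for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Path(Enum):
    """Affected paths, taken from the list above."""
    PAYMENT = "payment"
    READING = "reading"
    EMAIL = "email"
    ACCOUNT = "account"
    MOBILE = "mobile"
    AI = "AI"
    ADMIN = "admin"


class Severity(Enum):
    """Severity levels, taken from the list above."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    STOP_LAUNCH = "stop-launch"


@dataclass(frozen=True)
class Intake:
    """The four answers collected first (hypothetical type)."""
    occurred_at: datetime
    path: Path
    severity: Severity
    users_affected: int

    def __post_init__(self) -> None:
        # A negative count is always a typing mistake, so reject it early.
        if self.users_affected < 0:
            raise ValueError("users_affected cannot be negative")


# Illustrative values only, not a real incident.
intake = Intake(
    occurred_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    path=Path.PAYMENT,
    severity=Severity.HIGH,
    users_affected=3,
)
```

Using enums instead of free text keeps entries searchable later: 'payment' is always spelled the same way.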

Record the fact

Write the observable fact, not a guess: 'webhook failed 12 times', not 'Stripe broke'.

  • Timestamp.
  • Scope.
  • Evidence link.
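
One way to keep the entry factual is to build the statement from counted events rather than from memory. A minimal sketch, assuming the failure timestamps have already been pulled from logs; Fact and fact_from_failures are hypothetical names, and the example.com link is a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Fact:
    """An observable fact: timestamp, scope, evidence link, statement."""
    timestamp: datetime
    scope: str
    evidence_link: str
    statement: str


def fact_from_failures(
    event: str,
    failures: list[datetime],
    scope: str,
    evidence_link: str,
) -> Fact:
    """Turn counted failures into a statement such as 'webhook failed 12 times'."""
    if not failures:
        raise ValueError("no failures observed, nothing to record")
    if not evidence_link:
        raise ValueError("an evidence link is required")
    count = len(failures)
    times = "time" if count == 1 else "times"
    # The earliest failure is used as the incident timestamp.
    return Fact(min(failures), scope, evidence_link, f"{event} failed {count} {times}")


# Illustrative data: twelve failures, one per minute.
failures = [datetime(2024, 1, 15, 9, minute, tzinfo=timezone.utc) for minute in range(12)]
fact = fact_from_failures(
    "webhook",
    failures,
    scope="payment",
    evidence_link="https://example.com/logs/webhook",  # placeholder link
)
print(fact.statement)  # webhook failed 12 times
```

The function refuses to produce a fact without evidence, which is the point: no link, no entry.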

Assess impact

Main question: did the incident involve money, private data, lost access, or widespread blocking?

  • Money impact.
  • Privacy impact.
  • User-facing impact.
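
The main question can be answered as four yes/no flags. A minimal sketch; the Impact type is hypothetical, and treating any single 'yes' as serious is an assumption of this example, not a rule stated in this log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impact:
    """Answers to the main impact question (hypothetical type)."""
    money: bool
    private_data: bool
    lost_access: bool
    widespread_blocking: bool

    def flagged(self) -> list[str]:
        """Return the names of the impact areas that were hit."""
        areas = {
            "money": self.money,
            "private data": self.private_data,
            "lost access": self.lost_access,
            "widespread blocking": self.widespread_blocking,
        }
        return [name for name, hit in areas.items() if hit]

    def is_serious(self) -> bool:
        # Assumption: one flagged area is enough to call the incident serious.
        return bool(self.flagged())


impact = Impact(money=True, private_data=False, lost_access=True, widespread_blocking=False)
print(impact.flagged())     # ['money', 'lost access']
print(impact.is_serious())  # True
```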

Close and prevent

Every serious incident should end with a fix, an owner note, and a prevention item.

  • Fix deployed.
  • Affected users contacted if needed.
  • Regression test added or checklist updated.

reply templates

Internal entry

Use for any repeated failure.

[Incident] path / severity / date

What happened: ...
Who was affected: ...
Cause: unknown/confirmed ...
Action: ...
Owner: ...
Clients contacted: yes/no ...
Prevention: ...
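
If entries are written by a script or a form, the template can be rendered from fields so that every entry has the same shape. A minimal sketch; Entry and render are hypothetical names, the Owner line is included because the closing criteria ask for an owner, and the sample values are for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Fields of the internal entry template (hypothetical type)."""
    path: str
    severity: str
    date: str
    what_happened: str
    who_was_affected: str
    cause: Optional[str]  # None while the cause is still unknown
    action: str
    owner: str
    clients_contacted: bool
    prevention: str


def render(entry: Entry) -> str:
    """Render the entry as plain text, one field per line."""
    cause = "unknown" if entry.cause is None else f"confirmed, {entry.cause}"
    contacted = "yes" if entry.clients_contacted else "no"
    lines = [
        f"[Incident] {entry.path} / {entry.severity} / {entry.date}",
        "",
        f"What happened: {entry.what_happened}",
        f"Who was affected: {entry.who_was_affected}",
        f"Cause: {cause}",
        f"Action: {entry.action}",
        f"Owner: {entry.owner}",
        f"Clients contacted: {contacted}",
        f"Prevention: {entry.prevention}",
    ]
    return "\n".join(lines)


# Illustrative values only, not a real incident.
text = render(Entry(
    path="payment",
    severity="high",
    date="2024-01-15",
    what_happened="webhook failed 12 times",
    who_was_affected="3 users",
    cause=None,
    action="failed events replayed",
    owner="on-call",
    clients_contacted=True,
    prevention="regression test added",
))
print(text)
```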

Client-facing issue acknowledgement

Use when a failure affected a client and the issue has been verified.

We found the access issue

We found a technical issue that may have prevented you from opening your scroll. We are restoring access now. You do not need to pay again. We will write to you as soon as access is restored.

red flags

  • Payment taken but no access, for more than one user.
  • A private reading is visible to the wrong person.
  • Checkout or account breaks right after a deploy.
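
The three red flags above are simple enough to check mechanically. A minimal sketch; Signals and red_flags are hypothetical names, and how the input numbers are collected is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Inputs for the red-flag check (hypothetical type)."""
    paid_without_access_users: int       # users who paid but cannot get in
    private_reading_exposed: bool        # a reading was shown to the wrong person
    broken_after_deploy: frozenset[str]  # paths that broke right after a deploy


def red_flags(signals: Signals) -> list[str]:
    """Return the red flags that apply, in the order listed above."""
    flags = []
    if signals.paid_without_access_users > 1:
        flags.append("money + no access for more than one user")
    if signals.private_reading_exposed:
        flags.append("private reading visible to wrong person")
    # Only checkout and account count for the third flag.
    if signals.broken_after_deploy & {"checkout", "account"}:
        flags.append("checkout or account broke after deploy")
    return flags


signals = Signals(
    paid_without_access_users=2,
    private_reading_exposed=False,
    broken_after_deploy=frozenset({"checkout"}),
)
print(red_flags(signals))
# ['money + no access for more than one user', 'checkout or account broke after deploy']
```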

closed when

  • Incident entry includes cause, action, owner, and prevention.
  • Affected users contacted when needed.
  • Checklist updated so the same error is harder to repeat.
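
The closing criteria can be turned into a check that lists what is still missing. A minimal sketch, assuming the entry is stored as a plain dictionary of text fields; missing_for_closure and its parameters are hypothetical.

```python
REQUIRED_FIELDS = ("cause", "action", "owner", "prevention")


def missing_for_closure(
    entry: dict[str, str],
    contact_needed: bool,
    users_contacted: bool,
    checklist_updated: bool,
) -> list[str]:
    """Return what still blocks closing the incident; an empty list means closed."""
    # A field counts as missing when it is absent or blank.
    missing = [name for name in REQUIRED_FIELDS if not entry.get(name, "").strip()]
    if contact_needed and not users_contacted:
        missing.append("affected users contacted")
    if not checklist_updated:
        missing.append("checklist updated")
    return missing


# Illustrative entry: owner is blank and prevention is not filled in yet.
entry = {"cause": "expired signing secret", "action": "secret rotated", "owner": " "}
print(missing_for_closure(entry, contact_needed=True, users_contacted=False, checklist_updated=True))
# ['owner', 'prevention', 'affected users contacted']
```

Returning the list of gaps, rather than a bare yes/no, tells whoever closes the incident exactly what to finish.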
