This summary is created by Generative AI and may differ from the actual content.
Overview
On February 11, from 04:51 UTC to 06:40 UTC, PagerDuty experienced an incident affecting customers in the US service region, resulting in out-of-date or missing incident details due to replication delays in the database cluster.Impact
Customers in the US service region experienced out-of-date or missing incident details for nearly two hours.Trigger
A database migration started on February 10, combined with the decommissioning of a database server, led to replication lag detection failures.Detection
Replication lag monitors began alerting after 20 minutes of sporadic delays.Resolution
The migration was safely stopped, and within ten minutes, the backlog of writes was processed, resolving the issue.Root Cause
The replication lag was caused by the migration process interacting with infrastructure changes, leading to silent failures in lag detection.