Delayed Incident Notifications

Severity: Minor
Category: Scalability
Service: PagerDuty
This summary was generated by generative AI and may differ from the actual content.
Overview
On June 21st, between 06:25 UTC and 07:08 UTC, PagerDuty experienced delays in incident notifications and status updates. The delays were caused by a sudden surge in traffic from a widespread internet event. The system caught up shortly after the surge subsided, and all delayed notifications were delivered by 07:08 UTC.
Impact
The incident affected customers in the US service region: 1.33% of incident notifications and 5.73% of status updates were delayed.
Trigger
A widespread internet event caused PagerDuty customers' monitoring systems to generate significantly more alerts than expected, producing a sudden increase in traffic.
Detection
The issue was detected through monitoring systems that identified delays in incident notification delivery.
Resolution
Processing returned to expected levels shortly after the surge subsided. All queued incident notifications and status updates were processed and sent by 07:08 UTC.
Root Cause
The elevated traffic pushed the services responsible for scheduling incident notifications and status updates beyond the surge levels they were expected to handle.
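To illustrate why notifications stayed delayed for a short time even after the surge began to subside, here is a minimal Python sketch of a first-in, first-out queue drained at a fixed rate. It assumes, purely for illustration, that arrivals temporarily exceed a fixed per-tick processing capacity; the function name `simulate_backlog`, the arrival counts, and the capacity value are hypothetical and do not describe PagerDuty's actual scheduling services.

```python
from collections import deque


def simulate_backlog(arrivals, capacity):
    """Simulate a FIFO queue drained at a fixed capacity per tick.

    Hypothetical model: `arrivals[t]` notifications arrive at tick t, and at
    most `capacity` pending notifications are sent per tick.
    Returns (backlog_per_tick, delays), where delays[i] is how many ticks
    the i-th sent notification waited in the queue.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive or the queue never drains")
    queue = deque()  # arrival tick of each pending notification
    backlog_per_tick = []
    delays = []
    tick = 0
    # Keep running until every arrival has been enqueued and the queue is empty.
    while tick < len(arrivals) or queue:
        if tick < len(arrivals):
            queue.extend([tick] * arrivals[tick])
        # Send up to `capacity` of the oldest pending notifications.
        for _ in range(min(capacity, len(queue))):
            delays.append(tick - queue.popleft())
        backlog_per_tick.append(len(queue))
        tick += 1
    return backlog_per_tick, delays


# Hypothetical surge: arrivals jump above capacity for two ticks.
backlog, delays = simulate_backlog([5, 5, 20, 20, 5, 5, 5, 5], capacity=10)
print(backlog)  # [0, 0, 10, 20, 15, 10, 5, 0]
delayed = sum(1 for d in delays if d > 0)
print(f"{delayed}/{len(delays)} delayed, max wait {max(delays)} ticks")
# prints "45/70 delayed, max wait 2 ticks"
```

In this toy model the backlog keeps draining for several ticks after arrivals drop back below capacity, which mirrors the report's description of the system catching up shortly after the surge subsided.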