Monitoring Issue
This summary is created by Generative AI and may differ from the actual content.
Overview
PagerDuty experienced an incident from January 27th, 17:18 UTC to January 30th, 19:46 UTC that affected fewer than 1% of accounts in the US and EU regions. Affected accounts lost UI and API access to Response Plays. The cause was a code change that was applied to more accounts than intended.
Impact
Fewer than 1% of accounts in the US and EU regions lost UI and API access to Response Plays.
Trigger
A code change deployed to upgrade accounts from Response Plays to Incident Workflows was applied to a wider range of accounts than intended.
Detection
The issue was detected after a customer reported being unable to access Response Plays, which prompted an investigation.
Resolution
Engineers reverted the code change and reversed the upgrade on affected accounts, restoring API and UI access by January 30th, 19:46 UTC.
Root Cause
The code change was applied to more accounts than intended because the guard rails and documentation around it were insufficient.
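As an illustration only, one common form of guard rail for a bulk account change is a scope check that runs before anything is modified. The sketch below is not PagerDuty's code. The names `check_upgrade_scope`, `GuardRailError`, `intended_ids`, and `max_accounts` are hypothetical, and it assumes the change can list the account IDs it has selected before it is applied.

```python
class GuardRailError(Exception):
    """Raised when a bulk change would touch accounts outside its intended scope."""


def check_upgrade_scope(selected_ids, intended_ids, max_accounts):
    """Validate the scope of a bulk change before it is applied.

    All parameters are hypothetical:
      selected_ids  -- account IDs the change is about to modify
      intended_ids  -- account IDs explicitly approved for the change
      max_accounts  -- hard upper bound on how many accounts one run may touch
    Returns the sorted list of account IDs that are safe to modify.
    """
    selected = set(selected_ids)
    unexpected = selected - set(intended_ids)
    if unexpected:
        # Refuse to proceed if even one selected account was not approved.
        raise GuardRailError(
            f"{len(unexpected)} account(s) outside the intended set: {sorted(unexpected)}"
        )
    if len(selected) > max_accounts:
        # Second, independent limit in case the approved list itself is too broad.
        raise GuardRailError(
            f"{len(selected)} accounts selected, limit is {max_accounts}"
        )
    return sorted(selected)


if __name__ == "__main__":
    approved = ["acct-1", "acct-2", "acct-3"]
    print(check_upgrade_scope(["acct-2", "acct-1"], approved, max_accounts=5))
    try:
        check_upgrade_scope(["acct-1", "acct-9"], approved, max_accounts=5)
    except GuardRailError as err:
        print("blocked:", err)
```

The two checks are deliberately independent. The first catches a selection that is wider than the approved list, and the second limits the damage if the approved list is itself wrong.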
