Monitoring Issue

Severity: Minor
Category: Change Process
Service: PagerDuty

This summary is created by Generative AI and may differ from the actual content.

Overview

PagerDuty experienced an incident from January 27th, 17:18 UTC to January 30th, 19:46 UTC, during which fewer than 1% of accounts in the US and EU regions lost UI and API access to Response Plays. The cause was a code change that was applied to more accounts than intended.

Impact

Fewer than 1% of accounts, across both the US and EU regions, lost UI and API access to Response Plays.

Trigger

A code change deployed to upgrade accounts from Response Plays to Incident Workflows was applied to a wider range of accounts than intended.

Detection

The issue was detected after a customer reported being unable to access Response Plays, which prompted an investigation.

Resolution

Engineers reverted the code change and reversed the upgrade on affected accounts, restoring API and UI access by January 30th, 19:46 UTC.

Root Cause

The root cause was a code change that was applied to more accounts than intended because the change process lacked sufficient guard rails and documentation.
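
As an illustration only, one common form of guard rail for a bulk account change is a scope check that runs before anything is modified. The sketch below is not PagerDuty's implementation; the names plan_migration, ScopeError, MigrationPlan, intended_ids, and max_accounts are all hypothetical. It assumes the operator supplies an explicit list of intended accounts and an upper bound on how many accounts the change may touch, and it refuses to proceed if the selected set falls outside either limit.

```python
from dataclasses import dataclass


class ScopeError(Exception):
    """Raised when a bulk change would touch accounts outside its intended scope."""


@dataclass(frozen=True)
class MigrationPlan:
    account_ids: tuple  # accounts the change is allowed to touch, sorted
    dry_run: bool       # True means "report only, modify nothing"


def plan_migration(selected_ids, intended_ids, max_accounts, dry_run=True):
    """Validate the scope of a bulk change before it runs (hypothetical helper).

    selected_ids: accounts the change's selection logic actually matched.
    intended_ids: accounts the operator explicitly approved for the change.
    max_accounts: hard upper bound on how many accounts may be touched.
    """
    selected = set(selected_ids)

    # Guard rail 1: every selected account must be on the approved list.
    unexpected = selected - set(intended_ids)
    if unexpected:
        raise ScopeError(
            f"{len(unexpected)} account(s) selected outside the intended scope: "
            f"{sorted(unexpected)}"
        )

    # Guard rail 2: the change must not exceed the agreed blast radius.
    if len(selected) > max_accounts:
        raise ScopeError(
            f"{len(selected)} accounts selected, limit is {max_accounts}"
        )

    # Dry run is the default, so applying the change requires an explicit opt-in.
    return MigrationPlan(account_ids=tuple(sorted(selected)), dry_run=dry_run)
```

In this sketch, a selection that matches an account absent from intended_ids raises ScopeError before any account is changed, and the default dry_run=True means a real run has to be requested explicitly.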