Issues with Workflow Automations (formerly Catalytic), Scribe Agent

Severity: Critical
Category: Dependencies
Service: PagerDuty

This summary is created by Generative AI and may differ from the actual content.

Overview

PagerDuty Workflow Automation (formerly Catalytic) experienced an incident from October 20, 2025, 07:00 UTC to October 21, 2025, 05:55 UTC, a span of about 23 hours. The incident initially caused load failures on pages that list components such as workflows and runs, and new runs could fail to start. The web interface then became unavailable. Workflow Automation's APIs remained functional but were slower than usual. The Scribe Agent feature also had issues being added to conference bridge calls and recovered by Oct 21, 2025, 5:52 AM GMT+9 (Oct 20, 2025, 20:52 UTC).

Impact

All Workflow Automation users were impacted. They could not reliably use the application portal: pages and list components failed to load, and new runs failed to start. The web interface then became unavailable, and Workflow Automation APIs, while functional, processed requests more slowly. The Scribe Agent feature also had issues being added to conference bridge calls.

Trigger

The incident was triggered by a significant internet outage on October 20, 2025, which led to higher failure rates and slower response times across several internal services.

Detection

The initial page load failures were first recognized at 09:52 UTC on October 20, 2025. Investigation into the incident was under way by 10:18 PM GMT+9 on Oct 20, 2025 (13:18 UTC).

Resolution

On October 21, 2025, at 05:55 UTC, a default set of feature flags was deployed. This bypassed the initial failing checks and resolved the web interface outage. With system stability prioritized, the application then began serving all web traffic normally. The Scribe Agent feature had fully recovered by Oct 21, 2025, 5:52 AM GMT+9 (Oct 20, 2025, 20:52 UTC).

Root Cause

The initial application issues were rooted in a significant internet outage that caused internal services to experience higher failure rates and slower response times. The subsequent web interface outage was caused by the Workflow Automation web portal's inability to resolve feature flags due to its dependency on an unavailable external subsystem, compounded by the lack of a graceful degradation mechanism to handle such a scenario.
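
The graceful degradation described as missing above could look roughly like the following. This is a minimal sketch, not PagerDuty's implementation: the flag names, the DEFAULT_FLAGS table, and the resolve_flags and provider functions are all hypothetical, chosen only to illustrate falling back to a default set of flags when the flag provider cannot be reached.

```python
from typing import Callable, Dict, Mapping

# Hypothetical defaults; the real flag names are not given in this summary.
DEFAULT_FLAGS: Dict[str, bool] = {
    "new_run_list": False,
    "beta_editor": False,
}


def resolve_flags(
    fetch_remote: Callable[[], Mapping[str, bool]],
    defaults: Mapping[str, bool] = DEFAULT_FLAGS,
) -> Dict[str, bool]:
    """Return flag values, falling back to defaults if the provider fails."""
    flags = dict(defaults)
    try:
        remote = fetch_remote()
    except Exception:
        # Provider unreachable or erroring: serve the page with defaults
        # instead of letting the failure propagate to the web interface.
        return flags
    # Accept only known flags with boolean values; ignore anything else.
    for name, value in remote.items():
        if name in flags and isinstance(value, bool):
            flags[name] = value
    return flags


def unavailable_provider() -> Mapping[str, bool]:
    """Stand-in for an external flag service that is down (assumption)."""
    raise ConnectionError("flag service unreachable")


def healthy_provider() -> Mapping[str, bool]:
    """Stand-in for a reachable flag service (assumption)."""
    return {"beta_editor": True}


if __name__ == "__main__":
    print(resolve_flags(unavailable_provider))
    print(resolve_flags(healthy_provider))
```

In this sketch the caller always receives a complete flag set, so a page render never blocks on the external dependency. A production version would likely also bound the fetch with a timeout and log the fallback so the degraded state is visible to operators.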