Latency Issues in Web and Mobile Applications

Severity: Major
Category: Misconfiguration
Service: PagerDuty

This summary is created by Generative AI and may differ from the actual content.

Overview

Between 21:01 UTC and 23:25 UTC on March 24, 2026, PagerDuty customers in the US service region experienced delays and intermittent timeouts across the REST API, Web, and Mobile platforms. The issue was caused by authentication services reaching resource limits, resulting in increased latency for login and session validation requests. The PagerDuty EU region and core notification/event ingestion pipelines remained fully operational.

Impact

PagerDuty customers in the US service region experienced increased latency and intermittent timeouts in the Web and Mobile applications and the REST API. The affected operations were login and session validation requests, which slowed as authentication services reached resource limits.

Trigger

On March 24, the cumulative traffic associated with a gradual migration of session timeout handling grew past the point that the default memory configuration of a downstream service could support. That downstream service is responsible for managing the new session timeout capabilities.

Detection

Internal monitoring detected increased response times for login and session validation requests starting at 21:01 UTC. By 22:20 UTC, latency had risen to the threshold that triggers a Major Incident response.
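
The escalation logic described here, where elevated latency is observed for some time before it crosses a Major Incident threshold, can be illustrated with a minimal sketch. It is not PagerDuty's actual monitoring: the function names, the nearest-rank p95 calculation, the threshold, and the "sustained for N windows" rule are all assumptions made for illustration.

```python
import math


def p95(samples_ms):
    """Nearest-rank 95th percentile of latency samples (milliseconds)."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]


def should_escalate(windows, threshold_ms, sustained_windows):
    """Return True when the most recent `sustained_windows` windows all
    have a p95 latency above `threshold_ms`.

    `windows` is a list of per-interval sample lists, oldest first.
    Both the threshold and the window count are hypothetical tuning knobs.
    """
    if len(windows) < sustained_windows:
        return False
    recent = windows[-sustained_windows:]
    return all(p95(w) > threshold_ms for w in recent)


# Illustrative data only: one healthy window followed by two degraded ones.
healthy = [100] * 20
degraded = [900] * 20
history = [healthy, degraded, degraded]
escalate_now = should_escalate(history, threshold_ms=500, sustained_windows=2)
```

Requiring several consecutive windows above the threshold trades detection speed for fewer false alarms. A lower threshold or a shorter window count would page earlier at the cost of more noise.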

Resolution

After an initial precautionary rollback of authentication pipeline changes failed to resolve the issue, responders identified high memory consumption in a downstream service. They increased the number of computing instances for the service and cycled out unhealthy instances, resulting in a full recovery by 23:25 UTC.

Root Cause

A change to session timeout handling was migrated gradually over several weeks. As the migration progressed, the cumulative traffic volume eventually exceeded what the default memory configuration of a downstream service could support. This led to memory exhaustion, sustained high CPU usage, and a high rate of service restarts.
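
One way to reason about this failure mode is to project a service's memory use at each stage of a gradual rollout and compare it with the configured limit. The sketch below assumes a simple linear model (a fixed baseline plus a per-session cost scaled by the migrated share of traffic). The model, the function names, and every number in it are hypothetical; the incident summary does not publish actual figures.

```python
def projected_memory_mb(baseline_mb, per_session_kb, active_sessions, migrated_fraction):
    """Estimate memory use (MB) under an assumed linear per-session cost."""
    return baseline_mb + per_session_kb * active_sessions * migrated_fraction / 1024


def first_unsafe_stage(baseline_mb, per_session_kb, active_sessions,
                       limit_mb, headroom=0.8,
                       stages=(0.1, 0.25, 0.5, 0.75, 1.0)):
    """Return the first rollout stage whose projected memory exceeds
    `headroom * limit_mb`, or None if every stage fits.

    `headroom` leaves a safety margin below the hard limit; 0.8 is an
    arbitrary illustrative choice.
    """
    budget_mb = limit_mb * headroom
    for fraction in stages:
        used = projected_memory_mb(baseline_mb, per_session_kb,
                                   active_sessions, fraction)
        if used > budget_mb:
            return fraction
    return None


# Illustrative numbers only: 200 MB baseline, 2 KB per session,
# 262,144 active sessions, and a 512 MB default memory limit.
unsafe_at = first_unsafe_stage(200, 2, 262_144, limit_mb=512)
```

With these made-up inputs the projection first exceeds the budget at the 50% stage, which mirrors how a rollout can run cleanly for weeks before a default limit is reached.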