Incident with Actions

Severity: Major
Category: Change Process
Service: GitHub

This summary was created by generative AI and may differ from the actual content.

Overview

On October 21, 2025, between 07:55 UTC and 12:20 UTC, GitHub Actions experienced degraded performance. During this window, 2.11% of workflow runs failed to start within 5 minutes, and affected runs had an average delay of 8.2 minutes. The incident was triggered by increased latency on a node in one of the Redis clusters, caused by resource contention from a patching event that had become stuck. Recovery began once the stuck patching process was cleared and normal connectivity to the Redis cluster was restored at 11:45 UTC. The incident was fully resolved at 12:20 UTC, after the backlog of queued work had been cleared.

Impact

GitHub Actions performance was degraded for about 4 hours 25 minutes (07:55 to 12:20 UTC). During that time, 2.11% of workflow runs failed to start within 5 minutes, and affected runs had an average delay of 8.2 minutes.
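The two impact figures measure different things: the 2.11% is a share of all workflow runs, while the 8.2-minute average covers only affected runs. The sketch below illustrates that relationship on made-up delays. It assumes that "affected runs" means the runs that missed the 5-minute threshold. The function `impact_stats`, the `ImpactStats` type, and the input format are hypothetical, and this summary does not describe how GitHub actually computes these metrics.

```python
from dataclasses import dataclass

# From the summary: a run counts as affected if it does not start within 5 minutes.
START_THRESHOLD_MIN = 5.0


@dataclass
class ImpactStats:
    affected_share: float       # fraction of all runs that exceeded the threshold
    mean_affected_delay: float  # mean start delay (minutes) over affected runs only


def impact_stats(start_delays_min: list[float]) -> ImpactStats:
    """Summarize hypothetical per-run start delays, given in minutes."""
    if not start_delays_min:
        raise ValueError("need at least one run")
    # A delay of exactly 5 minutes still counts as starting within 5 minutes.
    affected = [d for d in start_delays_min if d > START_THRESHOLD_MIN]
    share = len(affected) / len(start_delays_min)
    mean_delay = sum(affected) / len(affected) if affected else 0.0
    return ImpactStats(share, mean_delay)


# Four made-up runs: two start promptly, two exceed the threshold.
stats = impact_stats([1.0, 2.0, 6.0, 10.0])
print(stats.affected_share, stats.mean_affected_delay)  # 0.5 8.0
```

Because the average is taken only over affected runs, a small affected share can coexist with a large average delay.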

Trigger

The incident was triggered by resource contention on a node in one of the Redis clusters. The contention was caused by a patching event that had become stuck.

Detection

The incident was detected through reports of degraded Actions performance and through internal monitoring. Monitoring observed delays in starting some Actions runs, with approximately 10% of runs taking longer than 5 minutes to start.

Resolution

The incident was resolved by unblocking the stuck patching process and restoring normal connectivity to the Redis cluster at 11:45 UTC. The backlog of queued work was then cleared by 12:20 UTC.
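The reported timestamps split the incident into two phases: a degraded period lasting until Redis connectivity was restored, and a recovery period spent clearing queued work. The short calculation below assumes that all times fall on October 21, 2025, in UTC. The helper names `utc` and `minutes` are illustrative only.

```python
from datetime import datetime, timezone


def utc(hhmm: str) -> datetime:
    """Parse an HH:MM time on 2025-10-21 as a UTC datetime."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2025, 10, 21, hour, minute, tzinfo=timezone.utc)


def minutes(a: datetime, b: datetime) -> int:
    """Whole minutes elapsed from a to b."""
    return int((b - a).total_seconds() // 60)


start = utc("07:55")        # degraded performance begins
reconnected = utc("11:45")  # normal Redis connectivity restored
resolved = utc("12:20")     # backlog of queued work cleared

print("until connectivity restored:", minutes(start, reconnected), "min")  # 230 min
print("backlog drain:", minutes(reconnected, resolved), "min")             # 35 min
print("total incident window:", minutes(start, resolved), "min")           # 265 min
```

Clearing the backlog after 11:45 UTC took 35 minutes of the 265-minute total.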

Root Cause

The root cause was increased latency on a node in one of the Redis clusters. This latency resulted from resource contention triggered by a patching event that had become stuck.