Incident with CodeQL, Webhooks, Notifications, and Slack Integration

Severity: Major
Category: Change Process
Service: GitHub

This summary is created by Generative AI and may differ from the actual content.

Overview

CodeQL, webhooks, notifications, and the Slack integration experienced processing delays caused by replication lag and insufficient worker capacity.

Impact

53% of CodeQL check runs took longer than 15 minutes. Notification delivery averaged 22 minutes, and Slack webhook delivery averaged 20 minutes. The affected services experienced degraded performance.

Trigger

Replication lag from an internal database migration, which left worker capacity insufficient for the high job enqueue rate.

Detection

Degraded performance surfaced as delays in CodeQL actions, notifications, webhooks, and the Slack integration, which prompted an investigation and updates.

Resolution

Processing workers were scaled up and capacity was added to handle the increased load. Dedicated worker pools for high-usage queues are planned.
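
As an illustration of the dedicated worker pool idea, the sketch below routes jobs to per-queue pools so that a backlog in one high-usage queue does not delay the others. It is a minimal model and not a description of the actual implementation: the queue names, pool sizes, and the one-job-per-worker-per-tick assumption are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical queue names and pool sizes, chosen only for illustration.
DEDICATED_POOLS = {"code_scanning": 4, "notifications": 4}
SHARED_POOL_SIZE = 2


def pool_for(queue_name):
    """Return the pool serving a queue: its own if dedicated, else 'shared'."""
    return queue_name if queue_name in DEDICATED_POOLS else "shared"


def group_jobs_by_pool(jobs):
    """Group (queue_name, payload) jobs into per-pool FIFO queues."""
    pools = defaultdict(deque)
    for queue_name, payload in jobs:
        pools[pool_for(queue_name)].append((queue_name, payload))
    return pools


def ticks_to_drain(pools):
    """Ticks each pool needs to drain, assuming one job per worker per tick."""
    result = {}
    for pool, backlog in pools.items():
        workers = DEDICATED_POOLS.get(pool, SHARED_POOL_SIZE)
        result[pool] = -(-len(backlog) // workers)  # ceiling division
    return result


# A burst of 10 jobs on one dedicated queue plus two jobs on shared queues.
jobs = [("code_scanning", i) for i in range(10)] + [("webhooks", 0), ("slack", 0)]
print(ticks_to_drain(group_jobs_by_pool(jobs)))
```

In this model the burst on the hypothetical code_scanning queue only lengthens its own pool's drain time; jobs routed to the shared pool are unaffected.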

Root Cause

Replication lag from the database migration caused queue backlogs and a shortage of workers, which delayed processing across services.
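
The relationship between enqueue rate, worker capacity, and delay can be sketched with simple arithmetic. The model below is a rough illustration: it assumes a constant enqueue rate and a fixed per-worker throughput, and all function names and numbers are hypothetical, not figures from the incident.

```python
def backlog_after(enqueue_per_min, workers, jobs_per_worker_per_min,
                  minutes, initial_backlog=0):
    """Backlog size after `minutes`, stepping one minute at a time."""
    capacity = workers * jobs_per_worker_per_min
    backlog = initial_backlog
    for _ in range(minutes):
        # The backlog grows by the shortfall, or shrinks toward zero
        # when capacity exceeds the enqueue rate.
        backlog = max(0, backlog + enqueue_per_min - capacity)
    return backlog


def estimated_wait_minutes(backlog, workers, jobs_per_worker_per_min):
    """Approximate wait for a job enqueued behind `backlog` jobs."""
    capacity = workers * jobs_per_worker_per_min
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return backlog / capacity


# Hypothetical numbers: 1200 jobs/min enqueued, 10 workers at 100 jobs/min each.
backlog = backlog_after(1200, 10, 100, minutes=10)
print(backlog, estimated_wait_minutes(backlog, 10, 100))

# Doubling the workers lets the same backlog drain while enqueues continue.
print(backlog_after(1200, 20, 100, minutes=3, initial_backlog=backlog))
```

Under these assumptions, any sustained shortfall between capacity and enqueue rate accumulates as backlog, and the delay persists until capacity is raised above the enqueue rate for long enough to drain it.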