Disruption with some GitHub services

Severity: Critical
Category: Change Process
Service: GitHub
This summary was generated by generative AI and may differ from the original content.
Overview
On August 27, 2025, between 20:35 and 21:17 UTC, GitHub's Copilot, Web, and REST API services experienced degraded performance. The incident was triggered by a production database migration that dropped a column. The column was no longer in direct use, but the ORM still referenced it, which led to a high volume of 5xx errors. This was a recurrence of a similar incident on August 5th, whose preventative repairs had not been completed in time. The issue was resolved by applying a fix to the production schema. As an immediate preventative measure, all drop-column operations have been temporarily blocked. Further plans include adding more safeguards and implementing graceful degradation so that Copilot issues do not impact other product features.
Impact
An average of 36% of Copilot requests failed, with a peak failure rate of 77%. Approximately 2% of all non-Copilot Web and REST API requests also failed. The incident lasted 42 minutes (20:35 to 21:17 UTC) and affected Copilot, Web, and REST API traffic.
Trigger
The incident was triggered by initiating a production database migration to drop a column from a table backing Copilot functionality. Although the column was no longer in direct use, the Object-Relational Mapper (ORM) continued to reference it.
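The summary does not name the ORM or the table. The failure mode it describes can still be reproduced in miniature: an ORM that caches a table's column list at startup keeps naming a dropped column in its generated SQL, so every query fails once the migration runs. The sketch below uses SQLite. `TinyModel`, `drop_column`, `copilot_settings`, and `legacy_flag` are all hypothetical names invented for illustration.

```python
import sqlite3


class TinyModel:
    """Hypothetical minimal ORM that caches a table's columns once, at startup."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table
        # The column list is read once and cached; later schema changes go unnoticed.
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        self.columns = [row[1] for row in rows]

    def find(self, row_id: int) -> dict:
        # Every cached column is named explicitly in the generated SELECT.
        cols = ", ".join(self.columns)
        rows = self.conn.execute(
            f"SELECT {cols} FROM {self.table} WHERE id = ?", (row_id,)
        ).fetchall()
        return dict(zip(self.columns, rows[0]))


def drop_column(conn: sqlite3.Connection, table: str, keep: list) -> None:
    """Remove every column not in `keep` by rebuilding the table (works on any SQLite version)."""
    cols = ", ".join(keep)  # identifiers are trusted constants in this demo
    conn.executescript(
        f"""
        CREATE TABLE {table}_new AS SELECT {cols} FROM {table};
        DROP TABLE {table};
        ALTER TABLE {table}_new RENAME TO {table};
        """
    )


def simulate() -> str:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE copilot_settings "
        "(id INTEGER PRIMARY KEY, user_id INTEGER, legacy_flag INTEGER)"
    )
    conn.execute("INSERT INTO copilot_settings VALUES (1, 42, 0)")
    model = TinyModel(conn, "copilot_settings")  # app boots and caches 3 columns
    model.find(1)  # works: legacy_flag still exists
    drop_column(conn, "copilot_settings", keep=["id", "user_id"])  # migration runs
    try:
        model.find(1)  # the generated SELECT still names legacy_flag
        return "200"
    except sqlite3.OperationalError as exc:
        return f"500: {exc}"  # in a web app this surfaces as a 5xx response


print(simulate())
```

The second lookup raises `sqlite3.OperationalError` because the column no longer exists. The application code never used `legacy_flag` directly; only the ORM's cached schema did.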
Detection
The team became aware of the degradation through internal monitoring, which most likely detected an elevated rate of 5xx responses shortly after the issue began. An 'Investigating' status was posted at 20:41 UTC. It was followed by an update at 20:55 UTC indicating that the team was aware of the root cause, and a later update at 21:25 UTC reporting the cause.
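The summary does not describe GitHub's monitoring. As a hedged illustration of the kind of signal it suggests, the sketch below is a sliding-window alert on the share of 5xx responses. The class name `ErrorRateMonitor` and its defaults are hypothetical: the 60-second window, the 10% threshold, and the 20-request minimum are not taken from the incident.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window alert on the share of 5xx responses."""

    def __init__(self, window_seconds=60.0, threshold=0.10, min_requests=20):
        self.window_seconds = window_seconds  # how far back to look
        self.threshold = threshold            # alert when the 5xx share is >= this
        self.min_requests = min_requests      # avoid alerting on tiny samples
        self.events = deque()                 # (timestamp, is_5xx) pairs, oldest first
        self.errors = 0                       # number of 5xx events currently in the window

    def record(self, timestamp, status):
        """Record one response (timestamps assumed non-decreasing); return whether to alert."""
        is_error = 500 <= status <= 599
        self.events.append((timestamp, is_error))
        self.errors += int(is_error)
        # Evict responses that are older than the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, was_error = self.events.popleft()
            self.errors -= int(was_error)
        return self.should_alert()

    def error_rate(self):
        return self.errors / len(self.events) if self.events else 0.0

    def should_alert(self):
        return len(self.events) >= self.min_requests and self.error_rate() >= self.threshold


# Usage: every third response fails, so the window's error rate is about 33%.
monitor = ErrorRateMonitor()
for i in range(30):
    firing = monitor.record(i * 0.1, 500 if i % 3 == 0 else 200)
print(firing, round(monitor.error_rate(), 2))
```

The `min_requests` floor keeps a single early failure from firing the alert. Real systems usually compute this rate from aggregated metrics rather than per-request events.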
Resolution
At 21:15 UTC, a fix was applied to the production schema, and all services had fully recovered by 21:17 UTC. As an immediate measure, all drop-column operations have been temporarily blocked. Further plans include adding safeguards to prevent similar issues and implementing graceful degradation so that Copilot issues do not impact other product features.
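The summary does not say how the temporary block is enforced. One possible shape, assuming migrations are available as raw SQL before they run, is a pre-flight check that rejects any `ALTER TABLE` statement that drops a column. `check_migration` and `BlockedMigrationError` are hypothetical names. The regular expression is a conservative heuristic, not a SQL parser.

```python
import re

# Statements that alter an existing table.
_ALTER_TABLE = re.compile(r"^\s*ALTER\s+TABLE\b", re.IGNORECASE)

# "DROP COLUMN c" or MySQL's shorthand "DROP c", but not drops of other objects
# (indexes, keys, constraints, partitions) or of column attributes (DEFAULT, NOT NULL).
_DROP_COLUMN = re.compile(
    r"\bDROP\s+"
    r"(?!(?:INDEX|KEY|PRIMARY|FOREIGN|CONSTRAINT|CHECK|PARTITION|DEFAULT|NOT)\b)"
    r"(?:COLUMN\s+)?`?\w+",
    re.IGNORECASE,
)


class BlockedMigrationError(Exception):
    """Raised when a migration contains an operation that is currently blocked."""


def check_migration(sql):
    """Raise BlockedMigrationError if any statement in `sql` drops a column."""
    # Naive split on ';': semicolons inside string literals or comments are not handled.
    for statement in sql.split(";"):
        if _ALTER_TABLE.match(statement) and _DROP_COLUMN.search(statement):
            raise BlockedMigrationError(
                f"drop-column operations are temporarily blocked: {statement.strip()!r}"
            )


# Usage: adding a column passes; dropping one raises.
check_migration("ALTER TABLE copilot_settings ADD COLUMN plan TEXT")
try:
    check_migration("ALTER TABLE copilot_settings DROP COLUMN legacy_flag")
except BlockedMigrationError as exc:
    print(exc)
```

The check errs toward blocking: a column whose name happens to look like a keyword is still treated as a drop. It also deliberately ignores `DROP TABLE`, which the summary does not mention.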
Root Cause
The root cause was a production database migration that dropped a column from a table backing Copilot functionality. Although the column was no longer in direct use, the Object-Relational Mapper (ORM) still referenced it, which caused a large number of 5xx responses. A contributing factor was that a similar issue had caused an incident on August 5th, and the repairs intended to prevent a recurrence had not been completed in time.
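A common way to avoid this failure mode is a two-phase removal. First, ship application code that stops the ORM from referencing the column. Then, once every running process has that code, drop the column. Rails' ActiveRecord offers an `ignored_columns` setting for the first phase. The summary does not say which ORM GitHub uses or what its planned safeguards look like, so the sketch below is only an illustration. `TinyModel`, `assert_unreferenced`, `two_phase_drop`, and the table and column names are hypothetical.

```python
import sqlite3


class TinyModel:
    """Hypothetical minimal ORM: caches columns at startup but honours an ignore list."""

    def __init__(self, conn, table, ignored_columns=()):
        self.conn = conn
        self.table = table
        ignored = set(ignored_columns)
        # Ignored columns never appear in generated SQL, so dropping them later is safe.
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        self.columns = [row[1] for row in rows if row[1] not in ignored]

    def find(self, row_id):
        cols = ", ".join(self.columns)
        rows = self.conn.execute(
            f"SELECT {cols} FROM {self.table} WHERE id = ?", (row_id,)
        ).fetchall()
        return dict(zip(self.columns, rows[0]))


def assert_unreferenced(models, table, column):
    """Safeguard: refuse to drop a column that any loaded model still references."""
    for model in models:
        if model.table == table and column in model.columns:
            raise RuntimeError(f"{table}.{column} is still referenced by the ORM")


def drop_column(conn, table, keep):
    """Remove every column not in `keep` by rebuilding the table (works on any SQLite version)."""
    cols = ", ".join(keep)  # identifiers are trusted constants in this demo
    conn.executescript(
        f"""
        CREATE TABLE {table}_new AS SELECT {cols} FROM {table};
        DROP TABLE {table};
        ALTER TABLE {table}_new RENAME TO {table};
        """
    )


def two_phase_drop():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE copilot_settings "
        "(id INTEGER PRIMARY KEY, user_id INTEGER, legacy_flag INTEGER)"
    )
    conn.execute("INSERT INTO copilot_settings VALUES (1, 42, 0)")
    # Phase 1: deploy code whose ORM no longer references the column.
    model = TinyModel(conn, "copilot_settings", ignored_columns=["legacy_flag"])
    before = model.find(1)
    # Phase 2: verify nothing references the column, then run the migration.
    assert_unreferenced([model], "copilot_settings", "legacy_flag")
    drop_column(conn, "copilot_settings", keep=["id", "user_id"])
    after = model.find(1)  # still works: the SELECT never named legacy_flag
    return before, after


print(two_phase_drop())
```

Splitting the change into two deploys means the schema change only lands after the ORM has already stopped depending on the column. The `assert_unreferenced` check turns "no longer in direct use" into something the migration itself verifies.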