Disruption with Grok Code Fast 1 in Copilot

Severity: Critical
Category: Dependencies
Service: GitHub

This summary is created by Generative AI and may differ from the actual content.

Overview

The Copilot service was degraded for 2 hours and 30 minutes on October 20th, from 14:10 UTC to 16:40 UTC. The incident affected only the Grok Code Fast 1 model: errors spiked, initially affecting 30% of users and later dropping to 6%. Other Copilot models were not affected. The cause was an outage at an upstream provider.
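As a quick sanity check on the stated window, the duration can be recomputed from the two timestamps. This is a minimal sketch that assumes both times fall on the same day, as the overview indicates; the variable names are illustrative only.

```python
from datetime import datetime

# Start and end times as stated in the overview (both UTC, same day).
start = datetime.strptime("14:10", "%H:%M")
end = datetime.strptime("16:40", "%H:%M")

# Subtracting two datetimes yields a timedelta.
duration = end - start
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours} hours {minutes} minutes")  # prints "2 hours 30 minutes"
```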

Impact

The degradation was limited to the Grok Code Fast 1 model. Errors spiked, affecting up to 30% of users at first and 6% later in the incident. No other Copilot models were impacted.

Trigger

The incident was triggered by an infrastructure issue affecting the Grok Code Fast 1 model. That issue was a direct result of an outage at an upstream provider.

Detection

The team became aware of the incident through reports of degraded Copilot performance and opened an investigation.

Resolution

The team worked with the upstream model provider to resolve the incident. The provider implemented fixes, after which the Grok Code Fast 1 model gradually improved and eventually stabilized in Copilot Chat, VS Code, and other Copilot products.

Root Cause

The root cause was an outage at an upstream model provider, which caused an infrastructure issue affecting the Grok Code Fast 1 model within the Copilot service.