Cloudflare outage on December 5, 2025

Severity: Critical
Category: Bug
Service: Cloudflare

This summary was created by generative AI and may differ from the actual content.

Overview

On December 5, 2025, a portion of Cloudflare's network experienced significant failures for approximately 25 minutes, impacting about 28% of HTTP traffic. The incident was triggered by a configuration change made to disable an internal WAF testing tool. This change was a secondary action taken during the rollout of an increased WAF buffer size, which was intended to mitigate an industry-wide vulnerability in React Server Components (CVE-2025-55182). Because the change to disable the testing tool was propagated globally rather than gradually, it exposed a latent bug in the FL1 proxy's rules module across the entire fleet at once, causing HTTP 500 errors for affected customers. The issue was resolved by reverting the problematic configuration change.

Impact

The incident lasted approximately 25 minutes (08:47 UTC to 09:12 UTC). A subset of customers, accounting for approximately 28% of all HTTP traffic served by Cloudflare, was impacted. Specifically, customers whose web assets were both served by the older FL1 proxy AND protected by the Cloudflare Managed Ruleset experienced HTTP 500 errors. Customers not meeting both criteria, and those served by Cloudflare's China network, were not impacted.
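The two-part impact criteria can be sketched as a simple predicate. This is an illustrative model only; the field names are not Cloudflare's actual data model:

```python
def was_impacted(proxy_version: str,
                 managed_ruleset_deployed: bool,
                 on_china_network: bool) -> bool:
    """Illustrative predicate: a customer saw HTTP 500s only if their
    traffic was served by the older FL1 proxy AND the Cloudflare Managed
    Ruleset was deployed, and they were not on the China network."""
    return (proxy_version == "FL1"
            and managed_ruleset_deployed
            and not on_china_network)

print(was_impacted("FL1", True, False))   # True: FL1 + Managed Ruleset
print(was_impacted("FL2", True, False))   # False: newer proxy unaffected
print(was_impacted("FL1", False, False))  # False: no Managed Ruleset
```

Both conditions had to hold simultaneously, which is why only a subset of FL1-served traffic was affected.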

Trigger

The incident was triggered by changes being made to Cloudflare's body parsing logic to detect and mitigate an industry-wide vulnerability (CVE-2025-55182) in React Server Components. An initial change to increase the WAF buffer size to 1MB was being rolled out. During this rollout, an internal WAF testing tool was found not to support the increased buffer size. A secondary change was then made to turn off this internal WAF testing tool using a global configuration system, which does not perform gradual rollouts. This second change, upon propagation, caused an error state in the FL1 version of the proxy.
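The distinction between the two rollout paths is the crux of the trigger. The following hypothetical sketch contrasts a staged rollout with a global configuration push; host model, stage fractions, and the health check are illustrative, not Cloudflare internals:

```python
class Host:
    """Toy model of one edge host holding a set of applied config changes."""
    def __init__(self, name):
        self.name = name
        self.config = set()

    def apply(self, change):
        self.config.add(change)

    def revert(self, change):
        self.config.discard(change)

def gradual_rollout(change, fleet, stages=(0.01, 0.1, 0.5, 1.0),
                    healthy=lambda hosts: True):
    """Widen a change stage by stage, reverting everything on a bad
    health signal before the whole fleet is exposed."""
    applied = 0
    for fraction in stages:
        cutoff = int(len(fleet) * fraction)
        for host in fleet[applied:cutoff]:
            host.apply(change)
        applied = max(applied, cutoff)
        if not healthy(fleet[:applied]):
            for host in fleet[:applied]:
                host.revert(change)
            return False
    return True

def global_config_push(change, fleet):
    """The killswitch-style path: every host at once, no staging and
    no health gate -- a bad change hits the entire fleet immediately."""
    for host in fleet:
        host.apply(change)
```

Under the staged path, an error rate spike at the 1% stage would have halted and reverted the change; the global path offered no such checkpoint.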

Detection

Cloudflare became aware of the incident through automated alerts, which declared an incident at 08:50 UTC, shortly after the problematic configuration change had fully propagated across the network. The alerts fired on a significant increase in HTTP 500 errors served by the network.

Resolution

The incident was resolved by identifying the problematic configuration change and reverting it. The revert process began at 09:11 UTC and was fully propagated by 09:12 UTC, at which point all traffic was restored. A well-defined Standard Operating Procedure for the killswitch subsystem was followed during the resolution.

Root Cause

The root cause was a latent bug in the FL1 version of Cloudflare's proxy, specifically within the rulesets system. The bug manifested when a killswitch was applied to a rule with an 'execute' action. The code expected the rule_result.execute object to exist whenever the action was 'execute', but because the rule had been skipped by the killswitch, the object did not exist, raising a Lua exception: "attempt to index field 'execute' (a nil value)". This code error had existed undetected for many years. A contributing factor was the use of a global configuration system for the problematic change, which lacks gradual rollout and health validation features, allowing the bug to impact the entire fleet rapidly.
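The failure mode can be reproduced in a minimal Python analogue of the rules-evaluation logic. The original code was Lua running inside the FL1 proxy; the class and function names here are illustrative, and Python's AttributeError on None stands in for Lua's nil-index error:

```python
from types import SimpleNamespace

class RuleResult:
    """Illustrative stand-in for the proxy's per-rule evaluation result."""
    def __init__(self, action, execute=None):
        self.action = action      # the rule's configured action, e.g. "execute"
        self.execute = execute    # sub-ruleset details; None if the rule was skipped

def evaluate_buggy(rule_result):
    # The latent bug: assumes action == "execute" implies the 'execute'
    # field is populated. A killswitched rule is skipped during matching,
    # so the field stays None and this line raises -- the Lua equivalent
    # was: attempt to index field 'execute' (a nil value)
    if rule_result.action == "execute":
        return rule_result.execute.ruleset_id

def evaluate_fixed(rule_result):
    # Guard against a skipped (killswitched) rule before indexing.
    if rule_result.action == "execute" and rule_result.execute is not None:
        return rule_result.execute.ruleset_id
    return None

normal = RuleResult("execute", SimpleNamespace(ruleset_id="cf-managed"))
killswitched = RuleResult("execute", execute=None)

print(evaluate_fixed(normal))        # cf-managed
print(evaluate_fixed(killswitched))  # None -- skipped rule handled safely
```

In the buggy path the unhandled exception aborted request processing, which the proxy surfaced to clients as an HTTP 500; the guard makes the killswitched rule a no-op instead.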