Incomplete pull request results in repositories
This summary is created by Generative AI and may differ from the actual content.
Overview
On April 28, 2026 at 14:07 UTC, a manually invoked repair job intended for a single repository was run without the required safety flags. Although the MySQL query was correctly scoped, the Elasticsearch reconciliation logic treated the minimum and maximum PR IDs as the bounds of a continuous range, which caused the deletion of 1,789,756,838 pull-request documents (~49% of indexed PRs) belonging to other repositories. The incident manifested as missing pull-request entries on the global and repository /pulls pages. The on-call team identified the problem ~10 minutes after the first customer reports. Mitigation involved a MySQL-backed search fallback for high-traffic repositories, a snapshot restore and reindex of Elasticsearch, and a degradation notice to users. The reindex completed and full service was restored on May 1, 2026 at 04:15 UTC.
Impact
The impact was limited to the search and indexing layer for pull requests. Approximately half of the indexed PR documents were removed, so PR search results and PR list pages were incomplete. Primary storage remained intact; opening, updating, and merging PRs were unaffected. No revenue loss was reported, but users experienced degraded PR discoverability.
Trigger
The incident was triggered by a manual execution of a repair job without safety flags. The job's Elasticsearch reconciliation component incorrectly treated the minimum and maximum PR IDs returned for the target repository as the bounds of a continuous range. Because that range also contained PR IDs belonging to other repositories, the job mass-deleted PR documents for unrelated repositories from the search index.
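The failure mode can be illustrated with a minimal sketch. It assumes, for illustration only, that PR IDs are allocated from a shared sequence and that reconciliation deletes indexed documents that are absent from the scoped MySQL result. The data, function names, and repository labels are hypothetical and do not reproduce the actual repair framework.

```python
# Hypothetical search index: PR ID -> owning repository. IDs from different
# repositories are interleaved because they come from a shared sequence.
INDEXED_DOCS = {
    101: "repo-a",
    102: "repo-b",
    103: "repo-c",
    104: "repo-a",  # stale: no longer present in the primary store
    105: "repo-b",
    106: "repo-a",
}

# Hypothetical result of the correctly scoped primary-store query for repo-a.
PRIMARY_IDS_REPO_A = [101, 106]


def stale_ids_range_scoped(index, primary_ids):
    """Flawed: treats min..max of the scoped IDs as a continuous range."""
    low, high = min(primary_ids), max(primary_ids)
    live = set(primary_ids)
    # Every indexed ID inside the range that the scoped query did not return
    # looks "stale", including documents owned by other repositories.
    return sorted(i for i in index if low <= i <= high and i not in live)


def stale_ids_repo_scoped(index, primary_ids, repo):
    """Safer: the index-side query carries the same repository scope."""
    live = set(primary_ids)
    return sorted(i for i, owner in index.items() if owner == repo and i not in live)


range_result = stale_ids_range_scoped(INDEXED_DOCS, PRIMARY_IDS_REPO_A)
scoped_result = stale_ids_repo_scoped(INDEXED_DOCS, PRIMARY_IDS_REPO_A, "repo-a")
```

In this toy data set the range-scoped variant marks documents from repo-b and repo-c for deletion, while the repository-scoped variant marks only the one genuinely stale repo-a document.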
Detection
The issue was first reported by customers noticing missing PRs on search pages. The problem was identified by the on‑call team roughly 10 minutes after the initial reports. Existing monitoring did not flag the incident because it affected search completeness rather than service availability.
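A completeness check of the kind that was missing could look roughly like the following sketch. It assumes per-repository document counts are available from both the primary store and the search index; the function names, the input shape, and the 0.99 threshold are hypothetical choices, not part of the existing monitoring.

```python
from typing import Dict


def completeness_ratio(primary_count: int, indexed_count: int) -> float:
    """Fraction of primary-store PRs that are present in the search index."""
    if primary_count == 0:
        return 1.0  # nothing to index, so nothing can be missing
    return min(indexed_count / primary_count, 1.0)


def repos_below_threshold(
    primary_counts: Dict[str, int],
    indexed_counts: Dict[str, int],
    threshold: float = 0.99,  # hypothetical alerting threshold
) -> Dict[str, float]:
    """Return repositories whose index completeness is below the threshold."""
    flagged = {}
    for repo, primary in primary_counts.items():
        ratio = completeness_ratio(primary, indexed_counts.get(repo, 0))
        if ratio < threshold:
            flagged[repo] = ratio
    return flagged


flagged = repos_below_threshold(
    {"repo-a": 100, "repo-b": 200, "repo-c": 0},
    {"repo-a": 100, "repo-b": 102},
)
```

A signal like this measures search completeness directly, so it can fire even when request success rates and latency look healthy.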
Resolution
Resolution was achieved through three parallel actions: (1) deployment of a MySQL-backed search fallback for the most active repositories to restore PR visibility; (2) initiation of a snapshot restore followed by a full reindex of the Elasticsearch PR documents; and (3) addition of a degradation notice on PR pages to inform users of incomplete results. The reindex completed and validation confirmed full restoration by May 1, 2026 at 04:15 UTC.
Root Cause
The root cause was a flaw in the search document repair framework that allowed a scoped reconciliation to run without enforcing a matching Elasticsearch query scope. This mismatch, combined with the ability to trigger the job from the production console without safety defaults, caused destructive deletions. Insufficient testing (only safe backfill scenarios) and lack of automated detection for large‑volume deletions in Elasticsearch further contributed to the incident.
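The contributing factors above suggest three guards: enforce matching scopes, cap the deletion volume, and default to a dry run. The following is a minimal sketch of such guards, not the framework's actual design; `RepairPlan`, `UnsafeRepairError`, the scope strings, and the 0.1% deletion cap are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class UnsafeRepairError(Exception):
    """Raised when a repair plan fails a safety check."""


@dataclass(frozen=True)
class RepairPlan:
    source_scope: str             # scope of the primary-store query, e.g. "repo:1"
    index_scope: str              # scope of the search-index query
    delete_ids: Tuple[int, ...]   # documents the job intends to delete
    indexed_total: int            # documents currently in the index


def validate_plan(plan: RepairPlan, max_delete_fraction: float = 0.001) -> None:
    """Reject plans with mismatched scopes or an unusually large deletion."""
    if plan.source_scope != plan.index_scope:
        raise UnsafeRepairError("primary-store and search-index scopes differ")
    allowed = int(plan.indexed_total * max_delete_fraction)
    if len(plan.delete_ids) > allowed:
        raise UnsafeRepairError(
            f"plan deletes {len(plan.delete_ids)} documents; cap is {allowed}"
        )


def run_repair(
    plan: RepairPlan,
    delete_fn: Callable[[int], None],
    dry_run: bool = True,  # destructive mode must be requested explicitly
) -> int:
    """Validate the plan, then delete only when dry_run is False."""
    validate_plan(plan)
    if dry_run:
        return 0
    for doc_id in plan.delete_ids:
        delete_fn(doc_id)
    return len(plan.delete_ids)
```

With a safe default, invoking the job from a console without any flags would perform validation only, and a plan the size of the one in this incident would be rejected by the deletion cap before any document is removed.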
