Long Jeli loading times

Severity: Critical
Category: Dependencies
Service: PagerDuty

This summary is created by Generative AI and may differ from the actual content.

Overview

On October 20, 2025, from 19:17 to 22:33 UTC, the Jeli product experienced long load times, timeouts, and errors across various features due to third-party vendor outages. This impacted the Jeli web app, public API, Post-Incident Review creation, Slack channel data import, and Slack app commands.

Impact

All Jeli customers attempting to use features such as the web app, public API, Post-Incident Review creation, Slack channel import, and Jeli Slack commands were impacted by long load times, timeouts, and errors. Most critical features failed consistently.

Trigger

The incident was triggered by an outage at a critical third-party vendor that Jeli uses to check customer entitlements and permissions across its infrastructure.

Detection

An initial monitoring issue prevented earlier detection, so the team first became aware of the errors at 19:49 UTC, when a customer reported failures with the Jeli Slack app.

Resolution

Recovery depended on the third-party vendor's mitigation efforts, which the team monitored throughout. The vendor posted a status update at 21:06 UTC, with partial recovery observed by 21:43 UTC and full recovery by 22:33 UTC. After additional testing and monitoring, the incident was officially resolved at 23:48 UTC.
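
The timestamps reported in this summary can be turned into intervals with a little arithmetic. The sketch below is illustrative only: the dictionary keys and helper names are hypothetical labels chosen here, not terms from the incident record, and it assumes every timestamp falls on October 20, 2025 (UTC).

```python
from datetime import datetime, timezone


def utc(hhmm: str) -> datetime:
    """Build a UTC datetime on October 20, 2025 from an 'HH:MM' string."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2025, 10, 20, hour, minute, tzinfo=timezone.utc)


# Timestamps taken from the summary; the key names are hypothetical labels.
TIMELINE = {
    "impact_start": utc("19:17"),
    "detected": utc("19:49"),
    "vendor_status_update": utc("21:06"),
    "partial_recovery": utc("21:43"),
    "full_recovery": utc("22:33"),
    "officially_resolved": utc("23:48"),
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes elapsed between two named timeline events."""
    delta = TIMELINE[end] - TIMELINE[start]
    return int(delta.total_seconds() // 60)


time_to_detect = minutes_between("impact_start", "detected")
impact_duration = minutes_between("impact_start", "full_recovery")
time_to_resolve = minutes_between("impact_start", "officially_resolved")

print(f"Time to detect:      {time_to_detect} min")
print(f"Customer impact:     {impact_duration} min")
print(f"Time to resolution:  {time_to_resolve} min")
```

Under these assumptions, detection came 32 minutes after impact began, customer impact lasted 196 minutes (3 hours 16 minutes), and the incident was officially resolved 271 minutes (4 hours 31 minutes) after it started.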

Root Cause

The root cause was an outage experienced by a critical third-party vendor that Jeli relies on for customer entitlements and permissions. This dependency meant that when the vendor's service failed, most Jeli product features also failed.