Publishing problem 2014-02-18

Severity: Major
Category: Dependencies
Service: npm

This summary is created by Generative AI and may differ from the actual content.

Overview

For ~35 minutes, from 4.14pm to 4.50pm, anyone attempting to publish an updated version of an existing package had a 1 in 3 chance of seeing an error.

Impact

Users publishing an updated version of an existing package had a 1 in 3 chance of seeing an error something like this: http 409 <url> error Error: conflict Document update conflict.: ftp-deploy

Trigger

Attempting to publish an updated version of an existing package.

Detection

Per our previous blog post, we had set up alerting on replication status, so we were already addressing the issue when the first user reported the problem.
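
As a rough illustration of what a replication-status alert can check, the sketch below compares the `update_seq` value that CouchDB reports in its database info document (`GET /{db}`) on the master and on a replica. The function names, the threshold, and the sample values are hypothetical and not taken from npm's actual monitoring. It also assumes integer sequence numbers, as in CouchDB 1.x.

```python
def replication_lag(master_info: dict, replica_info: dict) -> int:
    """Return how many updates the replica is behind the master.

    Both arguments are database info documents as returned by GET /{db};
    only the "update_seq" field is used.
    """
    lag = int(master_info["update_seq"]) - int(replica_info["update_seq"])
    return max(0, lag)


def should_alert(master_info: dict, replica_info: dict, max_lag: int = 100) -> bool:
    """Alert when the replica has fallen more than max_lag updates behind."""
    return replication_lag(master_info, replica_info) > max_lag


# Hypothetical sample values; in practice these would be fetched over HTTP.
master = {"db_name": "registry", "update_seq": 120500}
replica = {"db_name": "registry", "update_seq": 118200}

print(replication_lag(master, replica))
print(should_alert(master, replica))
```

A check like this only shows that a replica is behind. It does not say why replication stopped, so it complements rather than replaces inspecting the replication task itself.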

Resolution

We resolved user-facing errors by taking the affected read-only replica out of rotation, and then 15 minutes later we permanently resolved the issue by replacing the version of Pound used on our write master, which allowed replication to resume. We then returned the read-only replica to rotation.

Root Cause

This was caused by a data inconsistency between our master CouchDB server and one of the read-only replicas. The root cause was a known bug in Couch replication (which has bitten us before).
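
One plausible way a stale replica turns into a publish error, sketched under assumptions: CouchDB rejects an update to an existing document with a 409 unless the request carries the document's current `_rev`. If a client reads the revision from a replica that has stopped replicating and then writes to the master, the revision is out of date and the write conflicts. The class, the integer revisions, and the version strings below are a toy model for illustration, not npm's or CouchDB's code.

```python
class Conflict(Exception):
    """Stands in for CouchDB's HTTP 409 'Document update conflict'."""


class ToyMaster:
    """Toy write master: accepts an update only if rev matches the current one."""

    def __init__(self):
        self.docs = {}  # doc id -> (revision number, body)

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"Document update conflict.: {doc_id}")
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = (new_rev, body)
        return new_rev


master = ToyMaster()
rev1 = master.put("ftp-deploy", {"version": "0.1.0"})

# Snapshot of the master taken here; replication to this replica then stops.
stale_replica = dict(master.docs)

# The master moves on while the replica stays behind.
rev2 = master.put("ftp-deploy", {"version": "0.2.0"}, rev=rev1)

# A publish that reads its revision from the stale replica is rejected.
stale_rev = stale_replica["ftp-deploy"][0]
try:
    master.put("ftp-deploy", {"version": "0.3.0"}, rev=stale_rev)
    outcome = "ok"
except Conflict as exc:
    outcome = str(exc)

print(outcome)
```

In this model, only requests that happen to read from the out-of-date replica fail, which would be consistent with an intermittent error rate rather than a total outage.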