Crates.io is down [fixed]

I was wondering why cargo gave me the following error message: error: unable to get packages from source.
The reason is that crates.io is down, at least for me.
This is the server error:

I've created a GitHub issue.

Same here...

Something Went Wrong!

The proper people have been alerted, and they're looking into it. It would have been fixed already, but @alexcrichton is on a plane right now, and we didn't realize it, so our initial pings weren't answered. @brson is on the case!

Everything's back to normal.

OK, a quick post-mortem:

At 9:45 AM PST I got a ping that crates.io was down and started looking into it. Connections via the website and from the 'cargo' command were timing out. From Heroku's logs it looks like the timeouts began around 9:10 AM.

From looking at the logs (1, 2), it's clear that connections were timing out, and that a number of Postgres queries were blocked while updating the download statistics. These queries were occupying all available connections.

After killing the outstanding queries, the site is working again. It's not yet clear what the original cause was.
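
For the curious, "killing outstanding queries" amounts to finding the stuck backends in pg_stat_activity and calling pg_terminate_backend on them. A rough sketch using the postgres crate; the connection string, the version_downloads filter, and the one-minute threshold are only illustrative, not the actual crates.io setup:

```rust
// Sketch: terminate backends stuck on the download-statistics update.
// Connection string, query filter, and threshold are illustrative only.
use postgres::{Client, NoTls};

fn main() -> Result<(), postgres::Error> {
    let mut client = Client::connect("postgres://admin@localhost/cargo_registry", NoTls)?;

    // Find backends that have been running an UPDATE on version_downloads for a while.
    let stuck = client.query(
        "SELECT pid, query FROM pg_stat_activity
         WHERE query ILIKE 'update version_downloads%'
           AND state <> 'idle'
           AND now() - query_start > interval '1 minute'",
        &[],
    )?;

    for row in &stuck {
        let pid: i32 = row.get("pid");
        println!("terminating backend {pid}");
        // pg_terminate_backend closes the connection and rolls back its transaction.
        client.execute("SELECT pg_terminate_backend($1)", &[&pid])?;
    }
    Ok(())
}
```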

More information from Heroku:

Around 16:15 UTC (9:15am PDT) or so, the underlying instance of the web.1 dyno started having some serious network issues. We detected the issue, evicted that instance, and moved the web.1 dyno to another server. This part worked well, and the web.1 dyno started running normally on the new server; however, likely during the process, some stale Postgres connections were left holding a lock in the middle of their work. This can unfortunately happen very occasionally when we can't terminate the dyno cleanly (meaning the app/dyno likely didn't have a chance to close all of its connections before dying).

Such a stale connection was causing any update version_downloads queries to get stuck, which means that requests involving them would result in H12 (request timeout) errors, whereas other requests might have gone through fine.

As I mentioned in the previous comment, further dyno restarts don't help in this situation because they won't do anything about the already-stale connections. To solve this, you'd either need to kill those stale connections (using heroku pg:kill if you can figure out their pids), or just kill all connections (which is the option I took, because it was faster than trying to figure out the exact blocking query at that point).

So it seems that in migrating a dyno, there was a lock left held on the database.
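
If you want to see which backend is actually holding things up before reaching for heroku pg:kill, Postgres can report which pids each backend is waiting on. A hedged sketch of that (assumes Postgres 9.6+ for pg_blocking_pids; the connection string is a placeholder):

```rust
// Sketch: list which backends are blocked and by whom, so the blocking pid
// can be passed to `heroku pg:kill` instead of killing every connection.
use postgres::{Client, NoTls};

fn main() -> Result<(), postgres::Error> {
    let mut client = Client::connect("postgres://admin@localhost/cargo_registry", NoTls)?;

    // pg_blocking_pids(pid) returns the set of backends blocking the given backend.
    let rows = client.query(
        "SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
         FROM pg_stat_activity
         WHERE cardinality(pg_blocking_pids(pid)) > 0",
        &[],
    )?;

    for row in &rows {
        let pid: i32 = row.get("pid");
        let blocked_by: Vec<i32> = row.get("blocked_by");
        let query: &str = row.get("query");
        println!("backend {pid} is waiting on {blocked_by:?}: {query}");
    }
    Ok(())
}
```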

Isn't there a connection timeout set?

@Diggsey I don't know enough about how Postgres works to say, but the connections all did time out on the client side; the server, though, did not release the resources involved in the deadlocked queries.
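
One thing that would have limited the damage here is a server-side statement timeout, so a blocked UPDATE gets aborted (and releases its connection) rather than waiting forever. A rough sketch of setting it per session; the 30s value, the connection string, and the table/column names are made up for illustration:

```rust
// Sketch: ask the server to abort statements that run too long, so a blocked
// UPDATE releases its locks and its connection instead of hanging indefinitely.
use postgres::{Client, NoTls};

fn main() -> Result<(), postgres::Error> {
    let mut client = Client::connect("postgres://app@localhost/cargo_registry", NoTls)?;

    // Session-level setting; it can also be applied per role or per database with
    // ALTER ROLE ... SET statement_timeout / ALTER DATABASE ... SET statement_timeout.
    client.batch_execute("SET statement_timeout = '30s'")?;

    // Any statement on this connection now errors out after 30 seconds
    // instead of sitting behind a stale lock forever.
    client.execute(
        "UPDATE version_downloads SET downloads = downloads + 1 WHERE id = $1",
        &[&1_i32],
    )?;
    Ok(())
}
```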

Apparently it's not supported out of the box, but you could use one of the solutions here: timeout - How to close idle connections in PostgreSQL automatically? - Stack Overflow
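
The answers there mostly boil down to a periodic job that terminates connections which have been idle too long; something along these lines (the 10-minute threshold and connection string are placeholders):

```rust
// Sketch: periodically terminate connections that have been idle for too long.
use postgres::{Client, NoTls};

fn main() -> Result<(), postgres::Error> {
    let mut client = Client::connect("postgres://admin@localhost/cargo_registry", NoTls)?;

    // Terminate backends that have been idle for more than 10 minutes,
    // skipping the connection running this query.
    let killed = client.query(
        "SELECT pg_terminate_backend(pid)
         FROM pg_stat_activity
         WHERE state = 'idle'
           AND now() - state_change > interval '10 minutes'
           AND pid <> pg_backend_pid()",
        &[],
    )?;
    println!("terminated {} idle connections", killed.len());
    Ok(())
}
```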

I don't think Heroku gives apps access to cron or other scheduled events, so it'd need to be scheduled outside of the server application. I don't know if a connection pooler or bouncer is available on Heroku, however.

Heroku actually recommends using pgbouncer in their documentation: Heroku Postgres Database Tuning | Heroku Dev Center
