At approximately 2:14pm PT on Oct 24, 2013, Tddium’s DB master server experienced a CPU usage spike that cascaded into a server stoppage. No data was lost.
After examining monitoring data (thanks, New Relic!) and logs, we concluded that although average CPU usage hovers around 20-30%, our DB master sees bursts close to 100%. Once Postgres crosses into “queue backup” territory, it never comes back.
Tonight, we will upgrade our DB cluster to use faster servers. This upgrade should only take a few minutes, but it will require the app to be down.
We appreciate your patience as we address these infrastructure issues.
– The Solano Labs Team