First and foremost, we want to apologize for delays that some customers have recently experienced with Solano CI. We are in the process of addressing a chain of related issues that emerged over the last week. We’d like to explain what these issues are and how we’ve been addressing them.
The first issue emerged on Tuesday, Nov. 1, when Solano CI began to experience capacity constraints that limited the number of sessions we could support in our SaaS production environment. To the best of our understanding, this capacity issue was due to an unusually high demand for the AWS instance type that Solano CI commonly uses for its production workloads. For some time, Solano was unable to allocate new, healthy AWS instances with our preferred type and region.
In an attempt to mitigate this issue, the Solano team migrated capacity to different instance types that we had identified as suitable backups for when our preferred types are unavailable. These instance types required enough additional storage that running many of them pushed us up against an account-level volume storage limit imposed by AWS, which killed some instances that were in use. Because of decreased supply (fewer instances available) and increased demand (restarting killed sessions), build queues backed up. We manually managed the queues during peak hours to limit slowness and worked with AWS to increase the storage limit, but the process took multiple days to complete. With the higher limit, we are again able to spin up sufficient capacity for our full peak load.
In addition, the high number of restarts affected other parts of the Solano system. Last week’s unusual load stressed Solano’s infrastructure, including the database, in unprecedented ways. Although most issues are resolved, we are still working on one component that appears to be causing some customers’ builds to restart unexpectedly. This continues to affect queue throughput and is now the top priority for our engineering team.
Again, we apologize for the effect this has had. We understand how valuable fast builds are to your productivity, and we are working to correct the problem as quickly as possible. We will continue to update this blog as more information becomes available. In the meantime, please don’t hesitate to contact firstname.lastname@example.org if you have any questions or concerns.
The Solano Support Team