Summary
On August 7th, 2023, between 10:23 PDT and 11:14 PDT, AutoFi experienced a service disruption due to a database connectivity failure. The disruption caused service timeouts & errors for 100% of our API and tier-3 customers.
What Happened
At 10:24 PDT, AutoFi’s operations team was alerted about a service failure of one of our core services and began an initial triage. At 10:28 PDT, our team was notified of several additional service alarms for increased service latency of both front-end and back-end services. At that time, our operations team initiated an internal incident and escalated the issue to our engineers. During troubleshooting, our team identified an increase in application-level database errors for several of our backend systems, with a corresponding drop in database connections to one of our core databases starting at 10:23 PDT. Further analysis identified a missing virtual private cloud (VPC) peering connection between our backend service networks and our data network. The team determined that the VPC connection was accidentally removed as it was misclassified as no longer in use. The team began recovery of the network configuration at 11:03 PDT, and all services fully functional by 11:14 PDT.
Mitigation Steps
As a result of this incident, AutoFi is taking the following actions: