Platform timeouts
Incident Report for AutoFi
Postmortem

Summary

On August 7th, 2023, between 10:23 PDT and 11:14 PDT, AutoFi experienced a service disruption caused by a database connectivity failure. The disruption caused service timeouts and errors for 100% of our API and tier-3 customers.

What Happened

At 10:24 PDT, AutoFi’s operations team was alerted to a failure in one of our core services and began initial triage. At 10:28 PDT, the team received several additional alarms for increased latency in both front-end and back-end services. At that point, our operations team opened an internal incident and escalated the issue to our engineers.

During troubleshooting, the team identified an increase in application-level database errors across several of our backend systems. This coincided with a drop in connections to one of our core databases, which began at 10:23 PDT. Further analysis revealed that the virtual private cloud (VPC) peering connection between our backend service networks and our data network was missing. The team determined that this peering connection had been removed by mistake after it was misclassified as no longer in use. The team began restoring the network configuration at 11:03 PDT, and all services were fully functional by 11:14 PDT.

Mitigation Steps

As a result of this incident, AutoFi is taking the following actions:

  • We have added further technical review controls for future maintenance of core infrastructure
  • We are adding monitors to improve triage of data-layer issues and reduce time to resolution
  • We are updating our incident response procedure to reduce time between initial triage and notifications to our customers
Posted Aug 07, 2023 - 18:20 PDT

Resolved
This incident has been resolved.
Posted Aug 07, 2023 - 11:20 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Aug 07, 2023 - 11:12 PDT
Identified
The problem has been identified, and we are working on a fix.
Posted Aug 07, 2023 - 11:03 PDT
Investigating
Users are experiencing issues logging in.
Posted Aug 07, 2023 - 10:53 PDT
This incident affected: Consumer Apps (Website CTAs, Deal Maker) and Dealer Apps (Deal Center, Admin Center).