What happened?
Our upstream provider deployed an update to a database that was running a different version from the one used on their other platforms. The update included performance fixes that relied on a new database function, which was not available in the version they had running for our HotDrops.
Deploying this update broke read requests to their API, so all data appeared to be missing from the platform.
How was it fixed?
The database was upgraded to the newer version, which includes the function required for reads. Once the upgrade was complete, the API started returning data again.
What data was lost?
The only data lost was from the period while the database upgrade was underway: approximately 20 minutes, or about 1-2 readings per HotDrop. All other HotDrop data was saved during the outage window and is available.
What steps are being taken to stop this from happening again?
We have been making major infrastructure changes to HotDrops over the past few months. These changes move HotDrops off this fragile system and onto a new platform that will be faster and more reliable. The migration is due to happen in the second half of this year. More details on timelines will be shared closer to the time.