24th May 2022
Between 09:00 UTC and 10:38 UTC on 24th May, a large number of incorrect emails were sent to merchants. The emails contained incorrect notifications of:
Mandate failed
Mandate cancelled
Payment failed
Payment cancelled
Payment charged back
The incorrect emails caused significant confusion for many of our merchants and resulted in our support teams handling a much larger volume of queries than usual, which in turn led to delays in responding to our customers.
We understand that payment notification emails are a critical part of many customers’ workflows and we apologise for the disruption caused as a result of this incident.
The incorrect emails were stopped at 10:38 UTC on 24th May. We notified all affected merchants of the error and sent corrected notifications by:
English locales: 25th May 15:30 UTC
French and German locales: 26th May 16:20 UTC
Both our sandbox and production environments were affected.
Payments, uptime and payer emails were unaffected by this incident.
As part of efforts to support increasing transaction volumes, we are making changes to our infrastructure for handling payment events.
On the afternoon of 23rd May, we updated the job responsible for sending daily payment notification emails to use the new events infrastructure. This change contained a bug that caused the dates defining the notification period to be ignored.
The job next ran on the morning of 24th May. Because of the bug, the job did not retrieve the event counts for the previous 24 hours; instead it retrieved the total historic count for each event. This resulted in merchants receiving incorrect and misleading emails.
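The failure mode can be illustrated with a minimal sketch. This is not the production code: the `Event` type, the `count_events` helper and the `created_after`/`created_before` parameter names are hypothetical, and the sketch assumes the date bounds were dropped because the query did not recognise them. It shows how a lenient query that silently ignores unknown filters returns all-time totals instead of the last 24 hours.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    kind: str             # e.g. "payment_failed"
    created_at: datetime


def count_events(events, **filters):
    """Hypothetical lenient query: unknown filter names are silently dropped."""
    start = filters.get("start")  # only "start" and "end" are recognised
    end = filters.get("end")
    counts = Counter()
    for event in events:
        if start is not None and event.created_at < start:
            continue
        if end is not None and event.created_at >= end:
            continue
        counts[event.kind] += 1
    return counts


now = datetime(2022, 5, 24, 9, 0, tzinfo=timezone.utc)
events = [
    Event("payment_failed", now - timedelta(days=30)),
    Event("payment_failed", now - timedelta(days=7)),
    Event("payment_failed", now - timedelta(hours=3)),
]

# The caller uses names the helper does not recognise, so no date filter applies
# and every historic event is counted.
buggy = count_events(events, created_after=now - timedelta(days=1), created_before=now)

# The caller uses the recognised names, so only the last 24 hours are counted.
intended = count_events(events, start=now - timedelta(days=1), end=now)
```

In this sketch `buggy` counts all three historic events while `intended` counts only the one from the last 24 hours, which mirrors the gap between the emails merchants received and the emails they should have received.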
We stopped sending emails at 10:38 UTC. We removed all incorrect pending emails from the queue. These actions prevented any further impact on merchants.
We reverted the change that contained the bug at 11:32 UTC. This ensured the job ran correctly the next day (25th May).
A large number of merchants were affected by this incident and we wanted to be certain we sent the correct data in our follow-up communications. It therefore took us some time to put together an individual response for each merchant. We notified all affected merchants of the error and sent corrected notifications by:
English locales: 25th May 15:30 UTC
French and German locales: 26th May 16:20 UTC
As a result of this incident we are making a number of changes:
Changing the default behaviour of our events queries to require start and end dates
Adding validations so unexpected parameters cause errors rather than being ignored
Running the new infrastructure and old infrastructure in parallel for a period of time to check that they behave identically before switching traffic to the new infrastructure. In the past we have used this approach on a case-by-case basis. As a result of this incident we have made it the standard for doing migrations at GC.
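The three changes above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the actual implementation: `count_events_strict`, `legacy_count` and `compare_counts` are hypothetical names, and events are modelled as simple `(kind, created_at)` tuples. The strict query requires both dates and rejects unexpected parameters, and the comparison helper reports any event type on which the old and new code paths disagree.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def count_events_strict(events, *, start, end):
    """Hypothetical strict query over (kind, created_at) tuples.

    `start` and `end` are required keyword-only arguments, so omitting them
    or passing any other keyword raises TypeError instead of being ignored.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    return Counter(kind for kind, created_at in events if start <= created_at < end)


def legacy_count(events, start, end):
    """Stand-in for the old code path, used only for comparison."""
    counts = Counter()
    for kind, created_at in events:
        if start <= created_at < end:
            counts[kind] += 1
    return counts


def compare_counts(old, new):
    """Return the event kinds on which the old and new paths disagree."""
    return {k: (old[k], new[k]) for k in old.keys() | new.keys() if old[k] != new[k]}


now = datetime(2022, 5, 25, 9, 0, tzinfo=timezone.utc)
events = [
    ("payment_failed", now - timedelta(days=10)),
    ("payment_failed", now - timedelta(hours=2)),
    ("mandate_cancelled", now - timedelta(hours=5)),
]
window = {"start": now - timedelta(days=1), "end": now}

new_counts = count_events_strict(events, **window)
old_counts = legacy_count(events, **window)
mismatches = compare_counts(old_counts, new_counts)  # empty when the paths agree

try:
    # An unexpected parameter (and a missing `end`) now fails loudly.
    count_events_strict(events, created_after=window["start"])
    rejected = False
except TypeError:
    rejected = True
```

In a parallel run of this kind, emails would continue to be driven by the old path while any non-empty `mismatches` result is investigated before traffic is switched.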
All times in UTC
2022-05-23
15:01 We merged and deployed the change containing the bug
2022-05-24
09:00 The daily payment notification job started running in sandbox
10:00 The daily payment notification job started running in production
10:02 We were alerted that the job was sending incorrect emails
10:38 We stopped sending any further incorrect emails
11:32 We reverted the change that caused the problem
2022-05-25
09:00 The daily payment notification job started running correctly in sandbox
10:00 The daily payment notification job started running correctly in production
15:30 We notified all affected English-speaking merchants
2022-05-26
16:08 We notified all affected German-speaking merchants
16:20 We notified all affected French-speaking merchants