The incident is now resolved and the system is fully operational. The issue was caused by the part of the system that powers the debounce feature. The internal event backlog is fully caught up, and the two mitigations we deployed have addressed the issue. The team is preparing a post-mortem to ensure this issue does not recur.
After an extended monitoring period, we are resolving this incident. The system is fully operational.
After an extended monitoring period, function execution has returned to normal rates across all queue shards. During this issue, only a subset of users on part of our infrastructure was affected. Our infrastructure team is rolling out additional system capacity.
Run, trace, and event data have all caught up from their temporary backlog. Dashboard metrics are being processed with no backlog.