Delays in function scheduling should now be caught up. Now that events have been processed and new runs scheduled, you may still see backlogs depending on your function's flow control (e.g. concurrency) configuration and your account's concurrency limit.
We are still seeing backlogs in processing step.waitForEvent and cancelOn event expressions. We are continuing to work on this.
We are also continuing the rollout of isolated batch processing, as previously mentioned, to further isolate parts of our system.
Identified
Delays in function scheduling should now be caught up. Now that events have been processed and new runs scheduled, you may still see backlogs depending on your function's flow control (e.g. concurrency) configuration and your account's concurrency limit.
We are still seeing backlogs in processing step.waitForEvent and cancelOn event expressions. We are continuing to work on this.
We are also continuing the rollout of isolated batch processing, as previously mentioned, to further isolate parts of our system.
Identified
The system is consuming the event backlog as fast as possible, with an ETA of ~10-15 minutes until function scheduling is caught up.
After function scheduling is caught up, function execution in your account may still be limited by your account's concurrency limit or by a given function's own flow control settings (concurrency, rate limit, etc.).
We will continue to share more updates as soon as we can.
Identified
Throughput has increased since 15:37 UTC (~15 minutes ago). We are continuing to apply changes and are preparing a larger change to decouple parts of the system.
Event observability: Events may take longer to appear in the dashboard, because database ingestion for these events is tied to the same part of the system that handles function scheduling.
Events continue to be ingested and the Event API remains unaffected.
Identified
The changes have increased throughput, but not yet to typical levels. We are actively testing the new system change to decouple batch processing before enabling it for all accounts.
Identified
We're deploying an in-memory optimization within the part of the system that schedules new function runs. This optimization will alleviate pressure on underlying systems and increase throughput. The change will be rolled out momentarily.
In parallel, we're also working on a system change that creates a dedicated service for event batch processing, which is the cause of the overall backlog on the system.
Identified
We have scaled up several resources across the system to handle a large increase in load. Services have been scaled up, and we have also added new function state shards, but rolling out those new shards can take up to ~30 minutes.
We are also working on networking improvements to make the system more efficient under this significantly higher load.
Investigating
We are actively investigating delays with function run scheduling for a subset of customers. We will provide further updates as we identify the cause and resolve the issue.