The fix has been running stably for 30 minutes, and all system metrics are healthy. Run trace and metrics ingestion should be near real time again.
The incident is now resolved and the system is fully operational.
During this incident, the system experienced a backlog in scheduling new runs, and event data ingestion for the Inngest dashboard was delayed.
The root cause was an issue with our internal event stream brokers, which power internal event ingestion, run trace data ingestion, and metrics ingestion. This also delayed other data shown in the dashboard.