Resolved
Earlier today, the run trace data pipeline in one of our regional clusters had problems writing data. This pipeline asynchronously ingests run traces and statuses into our database. Runs were executing successfully, but their traces could not be ingested, which left outdated run statuses in the Inngest dashboard (stuck in 'queued'). These runs continued to execute, but their observability data in the dashboard is out of date and, in some cases, incorrect.
The system is fully operational now.
We are analyzing our system's data to determine the most effective way to prevent this failure in every part of our system that relies on this or similar data pipelines.
Monitoring
Over the last 90 minutes, the team released a fix for run statuses and has been monitoring the system. Run status looks to be up to date for new runs. Earlier runs that were affected may not have their status updated yet. We are working on a fix for this.
Investigating
Run status in the Inngest dashboard is delayed for some runs. We have identified an issue in our observability data pipeline.