Event API partial downtime, run history pipeline delay

Resolved · Partial outage

The incident is now resolved and the system is fully operational.

The Event API partial downtime occurred between 06:26 UTC and 07:58 UTC, with degraded performance continuing until 09:20 UTC (see incident component timeline).

During this window, some events may not have been sent successfully.

Thu, Oct 16, 2025, 09:20 AM


Affected components

Oct 16, 2025, 06:26 AM to 09:20 AM

Event API

Function execution

Inngest Dashboard

Updates

Resolved

The incident is now resolved and the system is fully operational.

The Event API partial downtime occurred between 06:26 UTC and 07:58 UTC, with degraded performance continuing until 09:20 UTC (see incident component timeline).

During this window, some events may not have been sent successfully.

Thu, Oct 16, 2025, 09:20 AM

Monitoring

Some brokers remain in a broken state, and we are actively working to restore them.

We’re also observing increased execution lag, caused in part by the current pressure on the system. We expect latency to start decreasing once the issue is resolved and will continue to monitor the situation closely.

Thu, Oct 16, 2025, 08:45 AM (35 minutes earlier)

Monitoring

We have spun up two additional Kafka brokers and are working to rebalance the cluster to reduce pressure on the existing brokers.

We’re closely monitoring the system to ensure it remains fully operational.

Thu, Oct 16, 2025, 07:58 AM (47 minutes earlier)

Identified

We’ve identified the root cause of the issue: two of our Kafka brokers are experiencing disk pressure, which is preventing our Event API from publishing to Kafka. We are actively working to restore the affected brokers as quickly as possible.

Thu, Oct 16, 2025, 07:57 AM

Investigating

We are actively investigating increased error rates in our Event API and in the part of our data pipeline that delivers function run statuses.

Our investigation shows that errors first appeared at about 06:26 UTC and increased from there.

We will provide further updates as we identify the cause and resolve the issue.

Thu, Oct 16, 2025, 07:28 AM (28 minutes earlier)
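
If you want to check whether your own event sends overlapped with this incident, the windows reported in the resolved update can be encoded as a small helper. This is only a sketch: the function name `classify_send_time` and the labels it returns are hypothetical, and the boundary handling is an assumption, since the report gives times to the minute only.

```python
from datetime import datetime, timezone

# Windows taken from the incident summary (all times UTC, Oct 16, 2025).
OUTAGE_START = datetime(2025, 10, 16, 6, 26, tzinfo=timezone.utc)
OUTAGE_END = datetime(2025, 10, 16, 7, 58, tzinfo=timezone.utc)
DEGRADED_END = datetime(2025, 10, 16, 9, 20, tzinfo=timezone.utc)


def classify_send_time(sent_at: datetime) -> str:
    """Return 'outage', 'degraded', or 'unaffected' for a send timestamp.

    Hypothetical helper: treating 07:58 as the start of the degraded
    window and 09:20 as still degraded is an assumption.
    """
    if sent_at.tzinfo is None:
        raise ValueError("sent_at must be timezone-aware")
    sent_at = sent_at.astimezone(timezone.utc)  # normalize before comparing
    if OUTAGE_START <= sent_at < OUTAGE_END:
        return "outage"
    if OUTAGE_END <= sent_at <= DEGRADED_END:
        return "degraded"
    return "unaffected"
```

Sends classified as `outage` or `degraded` are the ones worth cross-checking against your own logs, since events in this window may not have been delivered.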