Incident Timeline

Every incident has an immutable timeline that logs all events, actions, and state changes. Use the timeline for forensic analysis, compliance, and postmortems.

What is the Timeline?

The incident timeline is an immutable, append-only log of all events related to an incident:

  • Immutable: Events cannot be edited or deleted after creation
  • Append-Only: New events are always added to the end
  • Timestamped: Every event has a precise timestamp with millisecond resolution
  • Attributed: Tracks who/what triggered each event

Event Types

Event Type       Description                      Triggered By
created          Incident created                 System (auto) or User (manual)
acknowledged     Engineer acknowledged incident   User
note_added       Investigation note added         User
status_changed   Component status updated         System or User
alert_sent       Alert notification sent          System
escalated        Alert escalated to next level    System
resolved         Incident resolved                System (auto) or User (manual)
reopened         Incident reopened                System or User
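The event types above can be mirrored in client code as an enumeration, so that unknown types fail loudly instead of being silently mishandled. This is a hypothetical client-side sketch, not part of the API itself:

```python
from enum import Enum

class TimelineEventType(str, Enum):
    """Client-side mirror of the timeline event types (hypothetical helper)."""
    CREATED = "created"
    ACKNOWLEDGED = "acknowledged"
    NOTE_ADDED = "note_added"
    STATUS_CHANGED = "status_changed"
    ALERT_SENT = "alert_sent"
    ESCALATED = "escalated"
    RESOLVED = "resolved"
    REOPENED = "reopened"

def is_user_event(event_type: str) -> bool:
    """True for event types that can be triggered by a user (per the table above)."""
    return TimelineEventType(event_type) not in {
        TimelineEventType.ALERT_SENT,
        TimelineEventType.ESCALATED,
    }
```

Constructing `TimelineEventType` from an unrecognized string raises `ValueError`, which is usually what you want when parsing timeline data.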

Timeline Example

Incident: Production API Down (inc_abc123)

┌─────────────────────────────────────────────────────────────┐
│ 10:30:00.245  [created]                                     │
│ Actor: System                                               │
│ Details: Incident created after 3 consecutive failures      │
│ Severity: critical                                          │
├─────────────────────────────────────────────────────────────┤
│ 10:30:00.567  [alert_sent]                                  │
│ Actor: System                                               │
│ Details: Alert sent to Slack (#incidents)                  │
│ Channel: chan_slack_123                                     │
├─────────────────────────────────────────────────────────────┤
│ 10:30:00.892  [alert_sent]                                  │
│ Actor: System                                               │
│ Details: Alert sent to Email (team@example.com)            │
│ Channel: chan_email_456                                     │
├─────────────────────────────────────────────────────────────┤
│ 10:32:15.123  [acknowledged]                                │
│ Actor: John Doe (usr_john_123)                              │
│ Details: Incident acknowledged                              │
│ Time to Acknowledge: 2m 14s                                 │
├─────────────────────────────────────────────────────────────┤
│ 10:33:00.456  [note_added]                                  │
│ Actor: John Doe                                             │
│ Note: "Checking API service logs, seeing 500 errors"       │
├─────────────────────────────────────────────────────────────┤
│ 10:34:30.789  [note_added]                                  │
│ Actor: John Doe                                             │
│ Note: "Database connection pool exhausted (50/50)"         │
├─────────────────────────────────────────────────────────────┤
│ 10:35:00.012  [note_added]                                  │
│ Actor: John Doe                                             │
│ Note: "Root cause: long-running queries blocking pool"     │
├─────────────────────────────────────────────────────────────┤
│ 10:36:00.345  [note_added]                                  │
│ Actor: Jane Smith (usr_jane_456)                            │
│ Note: "Restarting service to clear connection pool"        │
├─────────────────────────────────────────────────────────────┤
│ 10:38:42.678  [resolved]                                    │
│ Actor: System (auto-recovery)                               │
│ Details: Monitor recovered after 3 successful checks        │
│ Downtime: 8m 42s                                            │
│ MTTR: 6m 27s                                                │
└─────────────────────────────────────────────────────────────┘

Accessing the Timeline

Via Dashboard

  1. Navigate to Incidents
  2. Click on an incident
  3. Scroll to Timeline section
  4. View chronological event log

Via API

GET /v1/incidents/:id/timeline

# Response:
{
  "events": [
    {
      "type": "created",
      "timestamp": "2026-02-13T10:30:00.245Z",
      "actor": "System",
      "details": "Incident created after 3 consecutive failures",
      "metadata": {
        "severity": "critical",
        "monitorId": "mon_abc123"
      }
    },
    {
      "type": "acknowledged",
      "timestamp": "2026-02-13T10:32:15.123Z",
      "actor": "John Doe",
      "actorId": "usr_john_123",
      "details": "Incident acknowledged",
      "metadata": {
        "timeToAcknowledge": 134
      }
    }
  ]
}
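A response like the one above can be turned into a chronologically ordered event list with a few lines of client code. This sketch assumes only the response shape shown; the helper name is our own:

```python
import json
from datetime import datetime

def parse_timeline(response_body: str) -> list[dict]:
    """Parse a GET /v1/incidents/:id/timeline response body into events
    with real datetime objects, sorted chronologically."""
    events = json.loads(response_body)["events"]
    for event in events:
        # Convert the trailing "Z" to an explicit UTC offset so
        # datetime.fromisoformat accepts it on all Python versions.
        event["timestamp"] = datetime.fromisoformat(
            event["timestamp"].replace("Z", "+00:00")
        )
    return sorted(events, key=lambda e: e["timestamp"])
```

Sorting is defensive: the API returns events in append order, but client-side merging of multiple pages or incidents can scramble it.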

Use Cases

1. Postmortem Analysis

Review the timeline to understand what happened:

  • When was the incident first detected?
  • How long until someone acknowledged?
  • What investigation steps were taken?
  • How long to resolve?
  • Were alerts escalated?

2. Compliance & Audit

The timeline provides an immutable audit trail:

  • Who took each action on the incident?
  • What actions were taken?
  • When was each status change made?
  • Helps satisfy SOC 2 and ISO 27001 audit-trail requirements

3. Performance Metrics

Calculate SLA metrics from timeline:

  • MTTD (Mean Time to Detect): detection timestamp − first failure timestamp
  • MTTA (Mean Time to Acknowledge): acknowledgment timestamp − detection timestamp
  • MTTR (Mean Time to Resolve): resolution timestamp − acknowledgment timestamp
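The acknowledgment and resolution metrics follow directly from event timestamps, as a sketch (the function name is ours; MTTD is omitted because the first-failure timestamp comes from monitor check data, not the timeline):

```python
from datetime import datetime

def _ts(iso: str) -> datetime:
    return datetime.fromisoformat(iso.replace("Z", "+00:00"))

def incident_metrics(events: list[dict]) -> dict:
    """Compute per-incident timing metrics (in seconds) from timeline events:
    time to acknowledge = acknowledged - created,
    time to resolve     = resolved - acknowledged,
    downtime            = resolved - created."""
    by_type = {e["type"]: _ts(e["timestamp"]) for e in events}
    created = by_type["created"]
    acked = by_type["acknowledged"]
    resolved = by_type["resolved"]
    return {
        "time_to_acknowledge": (acked - created).total_seconds(),
        "time_to_resolve": (resolved - acked).total_seconds(),
        "downtime": (resolved - created).total_seconds(),
    }
```

Run against the example timeline above, this reproduces its figures: time to acknowledge ≈ 2m 14s, time to resolve ≈ 6m 27s, downtime ≈ 8m 42s.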

4. Forensic Investigation

Debug complex incidents:

  • Correlate events across multiple incidents
  • Identify patterns in failures
  • Understand cascading failures

Timeline Metadata

Each event includes rich metadata:

{
  "type": "alert_sent",
  "timestamp": "2026-02-13T10:30:00.567Z",
  "actor": "System",
  "details": "Alert sent to Slack",
  "metadata": {
    "channelId": "chan_slack_123",
    "channelName": "Engineering Team",
    "channelType": "slack",
    "webhookUrl": "https://hooks.slack.com/...",
    "deliveryStatus": "success",
    "responseTime": 234
  }
}
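The `deliveryStatus` field in alert metadata is useful for spotting notifications that never reached a channel. A minimal sketch, assuming the metadata shape shown above:

```python
def failed_alert_deliveries(events: list[dict]) -> list[str]:
    """Return channel IDs of alert_sent events whose metadata reports
    a non-success deliveryStatus."""
    return [
        e["metadata"]["channelId"]
        for e in events
        if e["type"] == "alert_sent"
        and e.get("metadata", {}).get("deliveryStatus") != "success"
    ]
```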

Event Filters

Filter timeline events by type:

# Show only notes
GET /v1/incidents/:id/timeline?type=note_added

# Show only status changes
GET /v1/incidents/:id/timeline?type=status_changed

# Show events by specific user
GET /v1/incidents/:id/timeline?actor=usr_john_123

# Time range filter
GET /v1/incidents/:id/timeline?from=2026-02-13T10:30:00Z&to=2026-02-13T11:00:00Z
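Building these filtered URLs by hand is error-prone once values need escaping. A small helper sketch (the function is ours, not part of any SDK; `from` is a Python keyword, so it is passed as `from_` and renamed in the query string):

```python
from urllib.parse import urlencode

def timeline_url(incident_id: str, **filters: str) -> str:
    """Build a timeline request path with the filter query parameters
    documented above (type, actor, from, to)."""
    base = f"/v1/incidents/{incident_id}/timeline"
    # Strip the trailing underscore used to dodge Python keywords (from_ -> from)
    params = {k.rstrip("_"): v for k, v in filters.items() if v}
    return f"{base}?{urlencode(params)}" if params else base
```

Note that `urlencode` percent-encodes the colons in ISO timestamps (`10%3A30%3A00Z`), which is valid in a query string.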

Exporting Timeline

JSON Export

GET /v1/incidents/:id/timeline?format=json

# Downloads timeline.json with full event log

CSV Export

GET /v1/incidents/:id/timeline?format=csv

# CSV format:
timestamp,type,actor,details
2026-02-13T10:30:00.245Z,created,System,"Incident created"
2026-02-13T10:32:15.123Z,acknowledged,John Doe,"Incident acknowledged"
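The CSV export parses cleanly with the standard library, since the header row names each column. A minimal sketch:

```python
import csv
import io

def parse_timeline_csv(csv_text: str) -> list[dict]:
    """Parse a ?format=csv timeline export into a list of event dicts
    keyed by the header row (timestamp, type, actor, details)."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```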

Markdown Export

GET /v1/incidents/:id/timeline?format=markdown

# Markdown format for postmortems:
## Incident Timeline

**10:30:00** - Incident created (System)
- Severity: critical
- Monitor: Production API

**10:32:15** - Acknowledged (John Doe)
- Time to acknowledge: 2m 14s

**10:38:42** - Resolved (System)
- Downtime: 8m 42s
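On plans without Markdown export (see the retention table below), a similar postmortem skeleton can be rendered client-side from the JSON timeline. A rough sketch, assuming the event shape shown earlier; the exact output of the built-in export may differ:

```python
def timeline_to_markdown(events: list[dict]) -> str:
    """Render timeline events in a format similar to the Markdown export."""
    lines = ["## Incident Timeline", ""]
    for e in events:
        # Extract HH:MM:SS from the ISO timestamp (e.g. 2026-02-13T10:30:00.245Z)
        time = e["timestamp"].split("T")[1][:8]
        lines.append(f"**{time}** - {e['details']} ({e['actor']})")
        for key, value in e.get("metadata", {}).items():
            lines.append(f"- {key}: {value}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```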

Timeline Retention

Plan         Retention    Export
Free         30 days      JSON only
Developer    90 days      JSON, CSV
Pro          1 year       JSON, CSV, Markdown
Enterprise   Unlimited    All formats + API access

Best Practices

1. Add Context-Rich Notes

Include the commands you ran, log excerpts, and findings:

Good Note:
"Database connection pool at 50/50. Ran `SHOW PROCESSLIST`, found 12 queries >30s. 
Top offender: analytics dashboard query (45s avg). Killed PID 12345."

Bad Note:
"Checked database"

2. Link to External Resources

  • Datadog dashboard URLs
  • Sentry error IDs
  • CloudWatch log streams
  • GitHub PR/commit links

3. Export for Postmortems

Use Markdown export as postmortem starting point:

  1. Export timeline as Markdown
  2. Add root cause analysis
  3. Add impact metrics
  4. Add action items
  5. Publish to wiki/docs

4. Review Metrics Regularly

Analyze timeline data to improve:

  • Time to acknowledge trending up? → Improve on-call
  • Frequent escalations? → Adjust alert thresholds
  • Long MTTR? → Better runbooks needed

Next Steps