
Alert Rules

Alert Rules define when and how you should be notified about monitor failures. Configure conditions, thresholds, and notification channels.

Creating an Alert Rule

1. Navigate to Alert Rules

Go to Alerts → Alert Rules in the sidebar and click New Rule.

2. Select Monitors

Choose which monitors this rule applies to. You can select:

  • Specific monitors (e.g., "Production API")
  • All monitors with a tag (e.g., tag:production)
  • All monitors in the team

3. Configure Conditions

Set the conditions that trigger the alert. Common conditions include:

  • Consecutive failures threshold
  • Response time threshold
  • SSL certificate expiry days

4. Select Alert Channels

Choose where notifications should be sent (Email, Slack, PagerDuty, etc.). You must have at least one alert channel configured.

Rule Conditions

| Condition | Type | Description |
| --- | --- | --- |
| `consecutiveFailures` | Number | Alert after N failed checks in a row (recommended: 3) |
| `latencyThresholdMs` | Number | Alert if response time exceeds the threshold (milliseconds) |
| `notifyOnRecovery` | Boolean | Send a notification when the monitor recovers |
| `sslExpiryDays` | Number | Alert N days before the SSL certificate expires |
| `muteUntil` | Timestamp | Silence alerts until a specific time (maintenance window) |

Example Configurations

Basic Alert Rule

Simple rule that alerts after 3 consecutive failures:

Basic Configuration
json
{
  "name": "Production API Down",
  "monitorIds": ["550e8400-e29b-41d4-a716-446655440000"],
  "conditions": {
    "consecutiveFailures": 3,
    "notifyOnRecovery": true
  },
  "channels": ["email-channel-id", "slack-channel-id"]
}

Performance Degradation Alert

Alert when response times are slow, even if the service is still up:

Latency Alert
json
{
  "name": "API Slow Response",
  "monitorIds": ["550e8400-e29b-41d4-a716-446655440000"],
  "conditions": {
    "latencyThresholdMs": 2000,
    "consecutiveFailures": 2
  },
  "channels": ["slack-channel-id"]
}

SSL Certificate Expiry

Get notified before your SSL certificate expires:

SSL Expiry Alert
json
{
  "name": "SSL Certificate Expiring",
  "monitorIds": ["all-https-monitors"],
  "conditions": {
    "sslExpiryDays": 30
  },
  "channels": ["email-channel-id"]
}

Critical Service with Escalation

Route critical alerts to PagerDuty with on-call escalation:

Critical Alert with Escalation
json
{
  "name": "Payment API Critical",
  "monitorIds": ["payment-api-id"],
  "conditions": {
    "consecutiveFailures": 2,
    "notifyOnRecovery": true
  },
  "channels": ["pagerduty-oncall-id", "slack-incidents-id"],
  "priority": "critical"
}

Consecutive Failures

The consecutiveFailures condition is the most important for preventing false alarms. It ensures your service has truly failed, not just experienced a transient network issue.

How It Works

  1. Monitor check fails (e.g., timeout or 500 error)
  2. Counter increments: 1 failure
  3. Monitor check fails again (e.g., 60 seconds later)
  4. Counter increments: 2 failures
  5. Monitor check fails again
  6. Counter reaches 3 → Alert triggered

If any check succeeds, the counter resets to 0. This means the service must fail N times in a row before an alert is sent.
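The counter-and-reset behavior above can be sketched in a few lines. This is an illustrative model, not the service's actual implementation; the function name and `threshold` parameter (mirroring `consecutiveFailures`) are assumptions.

```python
def should_alert(check_results, threshold=3):
    """Return True once `threshold` check failures occur in a row.

    `check_results` is a chronological list of booleans
    (True = check passed, False = check failed).
    """
    streak = 0
    for ok in check_results:
        if ok:
            streak = 0          # any success resets the counter
        else:
            streak += 1
            if streak == threshold:
                return True     # streak reached N -> alert fires
    return False

# A transient blip never alerts, but three failures in a row do:
print(should_alert([True, False, True, False, False, False]))  # True
print(should_alert([False, False, True, False, False]))        # False
```

Note that the reset on any success is what filters out one-off network blips while still catching sustained outages.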

Notify on Recovery

The notifyOnRecovery option sends a notification when your monitor recovers after being down. This is useful for:

  • Knowing when an issue is resolved without checking manually
  • Measuring Mean Time To Recovery (MTTR)
  • Confirming deployments didn't break anything
  • Closing incident loops

Recovery Notification
json
{
  "event": "monitor.up",
  "monitor": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Production API",
    "status": "up"
  },
  "downtime": "5m 32s",
  "timestamp": "2024-02-14T10:05:32Z"
}

Rule Priority

Assign priority levels to rules for better organization and routing:

| Priority | Use Case | Example |
| --- | --- | --- |
| Critical | Revenue-impacting services | Payment API, Checkout flow |
| High | Core product functionality | User authentication, Core API |
| Medium | Important but not critical | Analytics, Background jobs |
| Low | Non-critical services | Marketing site, Blog |

Tag-Based Rules

Instead of selecting specific monitors, you can create rules based on tags. This is powerful for managing many monitors:

Tag-Based Rule
json
{
  "name": "All Production Services",
  "tags": ["production", "critical"],
  "conditions": {
    "consecutiveFailures": 2,
    "notifyOnRecovery": true
  },
  "channels": ["pagerduty-oncall-id"]
}

When you add a new monitor with the production tag, this rule automatically applies to it. No need to update the rule!
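Tag matching can be modeled as a set check. The sketch below assumes AND semantics (a monitor must carry every tag the rule lists, as the `["production", "critical"]` example suggests); whether the service also supports OR matching is not stated here.

```python
def rule_applies(monitor_tags, rule_tags):
    """A rule applies when the monitor carries every tag the rule lists."""
    return set(rule_tags) <= set(monitor_tags)  # subset test = AND semantics

# A monitor tagged production + critical + api matches the rule:
print(rule_applies(["production", "critical", "api"], ["production", "critical"]))  # True
# A staging-only monitor does not:
print(rule_applies(["staging"], ["production", "critical"]))                        # False
```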

Maintenance Windows

Use the muteUntil condition to temporarily silence alerts during planned maintenance:

Muted Rule
json
{
  "name": "Database Maintenance",
  "monitorIds": ["database-monitor-id"],
  "conditions": {
    "consecutiveFailures": 3,
    "muteUntil": "2024-02-15T02:00:00Z"
  }
}

Alerts will be silenced until the specified timestamp. After that, alerting automatically resumes.
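The mute check itself is a simple timestamp comparison. This is a hypothetical sketch of the behavior, parsing the ISO 8601 `muteUntil` value and suppressing alerts while the current time is still before it.

```python
from datetime import datetime, timezone

def is_muted(mute_until, now=None):
    """True while the current time is before the rule's muteUntil timestamp."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat in older Pythons doesn't accept a trailing 'Z', so map it to +00:00.
    deadline = datetime.fromisoformat(mute_until.replace("Z", "+00:00"))
    return now < deadline

# During the maintenance window the alert is suppressed:
print(is_muted("2024-02-15T02:00:00Z",
               now=datetime(2024, 2, 15, 1, 0, tzinfo=timezone.utc)))   # True
# After the timestamp passes, alerting resumes automatically:
print(is_muted("2024-02-15T02:00:00Z",
               now=datetime(2024, 2, 15, 3, 0, tzinfo=timezone.utc)))   # False
```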

Rule Testing

Before deploying an alert rule, test it to ensure notifications are working:

  1. Create the alert rule with your desired conditions
  2. Click Send Test Alert in the rule details
  3. Verify you receive the test notification in all configured channels
  4. Check that the message format and content are correct

Best Practices

Start Conservative

Begin with higher consecutive failure thresholds (5-10) and gradually reduce them as you gain confidence in your monitoring setup. It's better to miss an alert initially than to wake up the on-call engineer at 3 AM for a false alarm.

Use Multiple Rules

Create different rules for different scenarios:

  • Rule 1: Critical failures (2 consecutive) → PagerDuty
  • Rule 2: Performance degradation (5s latency) → Slack
  • Rule 3: SSL expiry (30 days) → Email

Review Alert History

Regularly check your alert history to identify:

  • False positives (adjust consecutive failures threshold)
  • Missing alerts (lower thresholds or add more rules)
  • Noisy monitors (silence or adjust intervals)

API Access

You can manage alert rules programmatically via the API:

Create Alert Rule via API
bash
curl -X POST "https://api.blacktide.xyz/v1/alert-rules" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production API Down",
    "monitorIds": ["550e8400-e29b-41d4-a716-446655440000"],
    "conditions": {
      "consecutiveFailures": 3,
      "notifyOnRecovery": true
    },
    "channels": ["email-channel-id"]
  }'

See the Alert Rules API Reference for full documentation.
