Alert Rules
Alert Rules define when and how you should be notified about monitor failures. Configure conditions, thresholds, and notification channels.
Creating an Alert Rule
Navigate to Alert Rules
Go to Alerts → Alert Rules in the sidebar and click New Rule.
Select Monitors
Choose which monitors this rule applies to. You can select:
- Specific monitors (e.g., "Production API")
- All monitors with a tag (e.g., tag:production)
- All monitors in the team
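These options map to fields in the rule payloads shown throughout this page: specific monitors via monitorIds, tag selection via tags (team-wide selection isn't shown in the payload examples). For instance:
{
  "monitorIds": ["550e8400-e29b-41d4-a716-446655440000"]
}
or, for tag-based selection:
{
  "tags": ["production"]
}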
Configure Conditions
Set the conditions that trigger the alert. Common conditions include:
- Consecutive failures threshold
- Response time threshold
- SSL certificate expiry days
Select Alert Channels
Choose where notifications should be sent (Email, Slack, PagerDuty, etc.). You must have at least one alert channel configured.
Multiple Channels
A single rule can notify several channels at once; the Basic Alert Rule example below sends to both email and Slack.
Rule Conditions
| Condition | Type | Description |
|---|---|---|
| consecutiveFailures | Number | Alert after N failed checks in a row (recommended: 3) |
| latencyThresholdMs | Number | Alert if response time exceeds the threshold, in milliseconds |
| notifyOnRecovery | Boolean | Send a notification when the monitor recovers |
| sslExpiryDays | Number | Alert N days before the SSL certificate expires |
| muteUntil | Timestamp | Silence alerts until a specific time (maintenance windows) |
Example Configurations
Basic Alert Rule
Simple rule that alerts after 3 consecutive failures:
{
  "name": "Production API Down",
  "monitorIds": ["550e8400-e29b-41d4-a716-446655440000"],
  "conditions": {
    "consecutiveFailures": 3,
    "notifyOnRecovery": true
  },
  "channels": ["email-channel-id", "slack-channel-id"]
}
Performance Degradation Alert
Alert when response times are slow, even if the service is still up:
{
  "name": "API Slow Response",
  "monitorIds": ["550e8400-e29b-41d4-a716-446655440000"],
  "conditions": {
    "latencyThresholdMs": 2000,
    "consecutiveFailures": 2
  },
  "channels": ["slack-channel-id"]
}
Latency Alerts
In this example, the alert fires only after the response time exceeds 2000 ms on 2 consecutive checks, so a single slow response won't trigger a notification.
SSL Certificate Expiry
Get notified before your SSL certificate expires:
{
  "name": "SSL Certificate Expiring",
  "monitorIds": ["all-https-monitors"],
  "conditions": {
    "sslExpiryDays": 30
  },
  "channels": ["email-channel-id"]
}
Critical Service with Escalation
Route critical alerts to PagerDuty with on-call escalation:
{
  "name": "Payment API Critical",
  "monitorIds": ["payment-api-id"],
  "conditions": {
    "consecutiveFailures": 2,
    "notifyOnRecovery": true
  },
  "channels": ["pagerduty-oncall-id", "slack-incidents-id"],
  "priority": "critical"
}
Consecutive Failures
The consecutiveFailures condition is the most important for preventing false alarms. It ensures your service has truly failed, not just experienced a transient network issue.
How It Works
- Monitor check fails (e.g., timeout or 500 error)
- Counter increments: 1 failure
- Monitor check fails again (e.g., 60 seconds later)
- Counter increments: 2 failures
- Monitor check fails again
- Counter reaches 3 → Alert triggered
If any check succeeds, the counter resets to 0. This means the service must fail N times in a row before an alert is sent.
Recommended Thresholds
- Critical production services: 2-3 failures (2-3 minutes with 60s interval)
- Standard services: 3-5 failures (3-5 minutes)
- Non-critical services: 5-10 failures (5-10 minutes)
Notify on Recovery
The notifyOnRecovery option sends a notification when your monitor recovers after being down. This is useful for:
- Knowing when an issue is resolved without checking manually
- Measuring Mean Time To Recovery (MTTR)
- Confirming deployments didn't break anything
- Closing incident loops
A recovery notification includes how long the monitor was down:
{
  "event": "monitor.up",
  "monitor": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Production API",
    "status": "up"
  },
  "downtime": "5m 32s",
  "timestamp": "2024-02-14T10:05:32Z"
}
Rule Priority
Assign priority levels to rules for better organization and routing:
| Priority | Use Case | Example |
|---|---|---|
| Critical | Revenue-impacting services | Payment API, Checkout flow |
| High | Core product functionality | User authentication, Core API |
| Medium | Important but not critical | Analytics, Background jobs |
| Low | Non-critical services | Marketing site, Blog |
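Priority is set as a top-level field on the rule, as in the escalation example above. A sketch, with placeholder monitor and channel IDs:
{
  "name": "Auth Service Down",
  "monitorIds": ["auth-api-id"],
  "conditions": {
    "consecutiveFailures": 3
  },
  "channels": ["slack-channel-id"],
  "priority": "high"
}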
Tag-Based Rules
Instead of selecting specific monitors, you can create rules based on tags. This is powerful for managing many monitors:
{
  "name": "All Production Services",
  "tags": ["production", "critical"],
  "conditions": {
    "consecutiveFailures": 2,
    "notifyOnRecovery": true
  },
  "channels": ["pagerduty-oncall-id"]
}
When you add a new monitor with the production tag, this rule automatically applies to it. No need to update the rule!
Maintenance Windows
Use the muteUntil condition to temporarily silence alerts during planned maintenance:
{
  "name": "Database Maintenance",
  "monitorIds": ["database-monitor-id"],
  "conditions": {
    "consecutiveFailures": 3,
    "muteUntil": "2024-02-15T02:00:00Z"
  }
}
Alerts will be silenced until the specified timestamp. After that, alerting automatically resumes.
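You can also set muteUntil from a deploy script just before maintenance starts. A minimal sketch, assuming rules can be updated with PATCH at /v1/alert-rules/{id} (an assumption; confirm the endpoint in the API reference):
curl -X PATCH "https://api.blacktide.xyz/v1/alert-rules/RULE_ID" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"conditions": {"muteUntil": "2024-02-15T02:00:00Z"}}'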
Rule Testing
Before deploying an alert rule, test it to ensure notifications are working:
- Create the alert rule with your desired conditions
- Click Send Test Alert in the rule details
- Verify you receive the test notification in all configured channels
- Check that the message format and content are correct
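To script this instead of clicking through the UI, a per-rule test endpoint is one plausible shape (the path here is hypothetical; check the Alert Rules API Reference for the actual endpoint):
curl -X POST "https://api.blacktide.xyz/v1/alert-rules/RULE_ID/test" \
  -H "Authorization: Bearer YOUR_API_TOKEN"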
Best Practices
Start Conservative
Begin with higher consecutive failure thresholds (5-10) and gradually reduce them as you gain confidence in your monitoring setup. It's better to miss an alert initially than to wake up the on-call engineer at 3 AM for a false alarm.
Use Multiple Rules
Create different rules for different scenarios:
- Rule 1: Critical failures (2 consecutive) → PagerDuty
- Rule 2: Performance degradation (5s latency) → Slack
- Rule 3: SSL expiry (30 days) → Email
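Expressed as rule payloads (names, tags, and channel IDs are placeholders; shown together as an array for brevity, though each rule would be created with its own POST), the three rules might look like:
[
  {
    "name": "Critical Failures",
    "tags": ["critical"],
    "conditions": { "consecutiveFailures": 2 },
    "channels": ["pagerduty-oncall-id"],
    "priority": "critical"
  },
  {
    "name": "Performance Degradation",
    "tags": ["production"],
    "conditions": { "latencyThresholdMs": 5000 },
    "channels": ["slack-channel-id"]
  },
  {
    "name": "SSL Expiry Warning",
    "tags": ["production"],
    "conditions": { "sslExpiryDays": 30 },
    "channels": ["email-channel-id"]
  }
]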
Review Alert History
Regularly check your alert history to identify:
- False positives (adjust consecutive failures threshold)
- Missing alerts (lower thresholds or add more rules)
- Noisy monitors (silence or adjust intervals)
API Access
You can manage alert rules programmatically via the API:
curl -X POST "https://api.blacktide.xyz/v1/alert-rules" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production API Down",
    "monitorIds": ["550e8400-e29b-41d4-a716-446655440000"],
    "conditions": {
      "consecutiveFailures": 3,
      "notifyOnRecovery": true
    },
    "channels": ["email-channel-id"]
  }'
See the Alert Rules API Reference for full documentation.
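To review what's already configured, listing the collection is a reasonable sketch (assuming the endpoint also supports GET; see the API reference):
curl "https://api.blacktide.xyz/v1/alert-rules" \
  -H "Authorization: Bearer YOUR_API_TOKEN"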
Next Steps
- Configure Alert Channels - Set up notification destinations
- Alert Silencing - Manage maintenance windows
- Integrations - Detailed setup guides for each channel type
- Best Practices - Advanced alerting strategies