Core Concepts
Understand the fundamental building blocks of BlackTide. This guide explains key concepts that you'll encounter throughout the platform.
Monitor
A Monitor is a configured test that runs on a schedule to verify that a service is operational. Monitors continuously test your endpoints and record the results.
Monitor Types
Traditional Monitors
- HTTP - Web APIs and endpoints
- TCP - Port connectivity
- ICMP - Ping checks
- TLS - SSL certificate validity
- Heartbeat - Cron job verification
Web3 Monitors
- Gas Price - EVM gas tracking
- Whale Wallet - Balance monitoring
- Contract Events - Smart contract logs
- Liquidation - DeFi health factor
- DeFi Protocol - TVL and pool health
- Subgraph - The Graph monitoring
- Bridge - Cross-chain transfers
Multi-Location Checks
Check
A Check is a single execution of a monitor. For example, if your HTTP monitor runs every 60 seconds, it creates 1,440 checks per day.
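The arithmetic behind that figure is the number of seconds in a day divided by the check interval. A minimal sketch follows; the function name `checks_per_day` is illustrative and not part of any BlackTide API.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def checks_per_day(interval_seconds: int) -> int:
    """Return how many checks one monitor produces in 24 hours."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return SECONDS_PER_DAY // interval_seconds


print(checks_per_day(60))  # 1440
print(checks_per_day(30))  # 2880
```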
Check Results
- Success - Service responded as expected
- Failure - Service is down or responded incorrectly
- Timeout - Service didn't respond within the timeout period
Each check records metadata: response time, status code, location, timestamp, and error details (if any).
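One way to picture a check record is as a small immutable data structure. The sketch below is only an illustration: the class names, field names, and the `eu-west` location are assumptions, not BlackTide's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class CheckStatus(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical shape of one check record."""
    status: CheckStatus
    response_time_ms: Optional[float]  # None when no response arrived
    status_code: Optional[int]         # None for non-HTTP checks or timeouts
    location: str
    timestamp: datetime
    error: Optional[str] = None        # populated only when something went wrong


example = CheckResult(
    status=CheckStatus.TIMEOUT,
    response_time_ms=None,
    status_code=None,
    location="eu-west",  # made-up location name
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    error="no response within the timeout period",
)
```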
Incident
An Incident is created automatically when a monitor fails multiple consecutive checks. Incidents track the lifecycle of an outage from detection to resolution.
Incident Lifecycle
- Open - Incident detected, monitor is down
- Acknowledged - Team member is investigating
- Resolved - Service recovered and incident closed
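The lifecycle above can be read as a small state machine. The sketch below assumes that an incident may move from Open straight to Resolved (for example, when the service recovers before anyone acknowledges it) and that Resolved is final; both are assumptions made for illustration, and the names are hypothetical.

```python
from enum import Enum


class IncidentState(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Assumed transitions; the real product may allow others (e.g. reopening).
ALLOWED_TRANSITIONS = {
    IncidentState.OPEN: {IncidentState.ACKNOWLEDGED, IncidentState.RESOLVED},
    IncidentState.ACKNOWLEDGED: {IncidentState.RESOLVED},
    IncidentState.RESOLVED: set(),
}


def transition(current: IncidentState, target: IncidentState) -> IncidentState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = IncidentState.OPEN
state = transition(state, IncidentState.ACKNOWLEDGED)
state = transition(state, IncidentState.RESOLVED)
```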
Manual Incidents
Incident Timeline
Every incident has an immutable timeline that records all events:
- Incident created (monitor down)
- Team member acknowledged
- Notes and updates added
- Monitor recovered
- Incident resolved
This timeline is crucial for post-mortem analysis and understanding Mean Time To Resolution (MTTR).
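As a rough illustration of how MTTR falls out of timeline data, the sketch below averages the time between the "incident created" and "incident resolved" events. The function name and the pair-of-timestamps input format are assumptions for the example, not an export format BlackTide defines.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def mean_time_to_resolution(
    incidents: Sequence[Tuple[datetime, datetime]],
) -> timedelta:
    """Average (resolved_at - created_at) over resolved incidents."""
    if not incidents:
        raise ValueError("at least one resolved incident is required")
    durations = [resolved - created for created, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),  # 30 minutes
    (datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 15, 30)),  # 90 minutes
]
print(mean_time_to_resolution(incidents))  # 1:00:00
```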
Alert Rule
An Alert Rule defines when and how you should be notified about monitor failures. Rules contain conditions and associated alert channels.
Rule Conditions
| Condition | Description |
|---|---|
| consecutiveFailures | Alert after N failed checks in a row (prevents false alarms) |
| notifyOnRecovery | Send notification when service recovers |
| latencyThreshold | Alert if response time exceeds threshold (milliseconds) |
| silenceWindow | Don't alert during maintenance windows |
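To make the first and third conditions concrete, here is a minimal sketch of how a rule could be evaluated against recent check results. It is not BlackTide's implementation: the `AlertRule` class, the snake_case field names (mirroring `consecutiveFailures` and `latencyThreshold`), and the `(ok, response_time_ms)` input format are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# One recent check: (succeeded, response time in ms or None if no response)
Check = Tuple[bool, Optional[float]]


@dataclass
class AlertRule:
    consecutive_failures: int = 3
    latency_threshold_ms: Optional[float] = None


def should_alert(rule: AlertRule, recent: Sequence[Check]) -> bool:
    """Evaluate the rule against checks ordered oldest to newest."""
    n = rule.consecutive_failures
    if n < 1:
        raise ValueError("consecutive_failures must be at least 1")

    # Condition 1: the last N checks all failed.
    tail = list(recent)[-n:]
    if len(tail) == n and all(not ok for ok, _ in tail):
        return True

    # Condition 2: the latest successful check was slower than the threshold.
    if rule.latency_threshold_ms is not None and recent:
        ok, ms = recent[-1]
        if ok and ms is not None and ms > rule.latency_threshold_ms:
            return True

    return False


rule = AlertRule(consecutive_failures=3, latency_threshold_ms=500)
print(should_alert(rule, [(True, 120.0), (False, None), (False, None)]))   # False
print(should_alert(rule, [(False, None), (False, None), (False, None)]))   # True
print(should_alert(rule, [(True, 120.0), (True, 800.0)]))                  # True
```

With this shape, a single failed check between two successes never triggers an alert, which is the false-alarm protection the table describes.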
Consecutive Failures Best Practice
Set consecutiveFailures to 3 or higher to avoid false alarms from transient network issues. With a value of 3, three checks in a row must fail before an alert is sent.
Alert Channel
An Alert Channel is a destination where notifications are sent. You can have multiple channels and assign them to different alert rules.
Available Channels
- Email - Send to individual addresses or distribution lists
- Slack - Post to specific channels with threaded updates
- Discord - Send to community or engineering channels
- Telegram - Instant mobile notifications
- PagerDuty - Trigger incidents with on-call rotation
- Opsgenie - Alert Opsgenie schedules
- Webhooks - Custom integrations with any HTTP endpoint
Status Page
A Status Page is a public-facing dashboard that shows the real-time health of your services to end users. Status pages help reduce support tickets during outages.
Status Page Features
- Component Status - Display monitored services as components (operational, degraded, down, maintenance)
- Incident History - Show recent and ongoing incidents
- Uptime Metrics - 90-day uptime percentage per component
- Email Subscriptions - Users can subscribe to updates
- Custom Branding - Logo, colors, and domain customization
Public & Private Pages
Uptime Percentage
Uptime is calculated as the ratio of successful checks to total checks over a time period:
Uptime % = (Successful Checks / Total Checks) × 100
For example, if 1,430 out of 1,440 checks succeeded in 24 hours, the uptime is 99.31%.
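The same calculation as a short sketch, using the numbers from the example; the function name `uptime_percent` is illustrative only.

```python
def uptime_percent(successful_checks: int, total_checks: int) -> float:
    """Uptime as the share of successful checks, in percent."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    if not 0 <= successful_checks <= total_checks:
        raise ValueError("successful_checks must be between 0 and total_checks")
    return successful_checks / total_checks * 100


print(f"{uptime_percent(1430, 1440):.2f}%")  # 99.31%
```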
Industry Standards
| Uptime % | Downtime/Year | Classification |
|---|---|---|
| 99.9% ("three nines") | 8.76 hours | Good |
| 99.95% | 4.38 hours | Very Good |
| 99.99% ("four nines") | 52.56 minutes | Excellent |
| 99.999% ("five nines") | 5.26 minutes | Enterprise |
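The downtime column follows from the uptime percentage applied to a 365-day year (8,760 hours). The sketch below reproduces two rows of the table; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Allowed downtime per year, in hours, for a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


print(f"{downtime_hours_per_year(99.9):.2f} hours")          # 8.76 hours
print(f"{downtime_hours_per_year(99.99) * 60:.2f} minutes")  # 52.56 minutes
```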
Check Interval
The Check Interval determines how frequently a monitor runs. BlackTide supports intervals from 30 seconds to 24 hours.
Interval Selection
- 30-60s - Critical production services (APIs, databases)
- 5 min - Standard web services
- 15-30 min - Non-critical services or batch jobs
- 1-24 hrs - Daily health checks or scheduled tasks
Next Steps
Now that you understand the core concepts, explore how to use them:
- Quick Start Guide - Create your first monitor
- HTTP Monitor Guide - Configure HTTP checks
- Alert Rules - Set up smart notifications
- Status Pages - Build public status dashboards