Core Concepts

Understand the fundamental building blocks of BlackTide. This guide explains key concepts that you'll encounter throughout the platform.

Monitor

A Monitor is a configured check that runs periodically to verify that a service is operational. Monitors continuously test your endpoints and record the results.

Monitor Types

Traditional Monitors

HTTP - Web APIs and endpoints
TCP - Port connectivity
ICMP - Ping checks
TLS - TLS/SSL certificate validity
Heartbeat - Cron job verification

Web3 Monitors

Gas Price - EVM gas tracking
Whale Wallet - Balance monitoring
Contract Events - Smart contract logs
Liquidation - DeFi health factor
DeFi Protocol - TVL and pool health
Subgraph - The Graph monitoring
Bridge - Cross-chain transfers

Multi-Location Checks

All monitors can run from multiple locations (US East, US West, EU West, EU Central, Asia Pacific, South America). This provides geographic redundancy and accurate global uptime measurement.

Check

A Check is a single execution of a monitor. For example, an HTTP monitor that runs every 60 seconds creates 1,440 checks per day (24 hours × 60 checks per hour).
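The arithmetic behind that figure can be sketched in a few lines of Python. The function name `checks_per_day` and the `locations` multiplier are illustrative assumptions; this guide does not say whether each location's run counts as a separate check.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def checks_per_day(interval_seconds: int, locations: int = 1) -> int:
    """Estimate how many checks one monitor produces in 24 hours.

    `locations` is a hypothetical multiplier: it assumes every location
    runs the monitor on its own schedule and each run is one check.
    """
    if interval_seconds <= 0 or locations <= 0:
        raise ValueError("interval_seconds and locations must be positive")
    return (SECONDS_PER_DAY // interval_seconds) * locations


print(checks_per_day(60))  # 1440, matching the example above
```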

Check Results

Success - Service responded as expected
Failure - Service is down or responded incorrectly
Timeout - Service didn't respond within the timeout period

Each check records metadata: response time, status code, location, timestamp, and error details (if any).
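As a rough mental model, a check record could be represented as below. This is a sketch only: the class and field names are hypothetical and are not BlackTide's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class CheckResult(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class Check:
    """One execution of a monitor (hypothetical field names)."""
    result: CheckResult
    response_time_ms: Optional[float]  # None when no response arrived
    status_code: Optional[int]         # None for non-HTTP checks or timeouts
    location: str
    timestamp: datetime
    error: Optional[str] = None        # populated only when something went wrong


example = Check(
    result=CheckResult.TIMEOUT,
    response_time_ms=None,
    status_code=None,
    location="EU West",
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    error="no response within the timeout period",
)
```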

Incident

An Incident is created automatically when a monitor fails multiple consecutive checks. Incidents track the lifecycle of an outage from detection to resolution.

Incident Lifecycle

Open - Incident detected, monitor is down
Acknowledged - Team member is investigating
Resolved - Service recovered and incident closed
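The lifecycle can be read as a small state machine. The sketch below is illustrative, not BlackTide's implementation; in particular, allowing an incident to move straight from Open to Resolved (the service recovers before anyone acknowledges) is an assumption.

```python
from enum import Enum


class IncidentStatus(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Assumed transition table: Resolved is terminal, and Open may skip
# Acknowledged if the service recovers on its own.
ALLOWED_TRANSITIONS = {
    IncidentStatus.OPEN: {IncidentStatus.ACKNOWLEDGED, IncidentStatus.RESOLVED},
    IncidentStatus.ACKNOWLEDGED: {IncidentStatus.RESOLVED},
    IncidentStatus.RESOLVED: set(),
}


def transition(current: IncidentStatus, target: IncidentStatus) -> IncidentStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


status = transition(IncidentStatus.OPEN, IncidentStatus.ACKNOWLEDGED)
status = transition(status, IncidentStatus.RESOLVED)
```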

Manual Incidents

You can also create manual incidents for scheduled maintenance or issues detected outside of monitoring.

Incident Timeline

Every incident has an immutable timeline that records all events:

Incident created (monitor down)
Team member acknowledged
Notes and updates added
Monitor recovered
Incident resolved

This timeline is crucial for post-mortem analysis and understanding Mean Time To Resolution (MTTR).
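To make MTTR concrete, here is a minimal sketch that averages the time between the "incident created" and "incident resolved" events. The input shape (a list of created/resolved timestamp pairs) is an assumption for illustration, not an export format defined by BlackTide.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average duration from creation to resolution across incidents."""
    if not incidents:
        raise ValueError("at least one resolved incident is required")
    total = sum(
        (resolved - created for created, resolved in incidents),
        timedelta(),  # start value so sum() adds timedeltas, not ints
    )
    return total / len(incidents)


history = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 10)),   # 10 minutes
    (datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 14, 30)),  # 30 minutes
]
print(mean_time_to_resolution(history))  # 0:20:00
```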

Alert Rule

An Alert Rule defines when and how you should be notified about monitor failures. Rules contain conditions and associated alert channels.

Rule Conditions

Condition	Description
`consecutiveFailures`	Alert after N failed checks in a row (prevents false alarms)
`notifyOnRecovery`	Send notification when service recovers
`latencyThreshold`	Alert if response time exceeds threshold (milliseconds)
`silenceWindow`	Don't alert during maintenance windows

Consecutive Failures Best Practice

Set `consecutiveFailures` to 3 or higher to avoid false alarms from transient network issues. With a value of 3, an alert is sent only after three checks in a row have failed.
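The effect of the threshold can be sketched as follows. This is an illustration of the idea rather than BlackTide's actual evaluation logic; the function names are hypothetical, and firing exactly once when the streak reaches the threshold is an assumption.

```python
def trailing_failures(results: list[bool]) -> int:
    """Count failed checks at the end of the history (True means success)."""
    streak = 0
    for succeeded in reversed(results):
        if succeeded:
            break
        streak += 1
    return streak


def should_alert(results: list[bool], consecutive_failures: int = 3) -> bool:
    """True only at the moment the failure streak reaches the threshold."""
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be at least 1")
    return trailing_failures(results) == consecutive_failures


print(should_alert([True, False, False]))         # False: only 2 failures so far
print(should_alert([True, False, False, False]))  # True: third failure in a row
print(should_alert([False, True, False, False]))  # False: the streak was broken
```

A single failed check surrounded by successes never reaches the threshold, which is why a transient network blip does not page anyone.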

Alert Channel

An Alert Channel is a destination where notifications are sent. You can have multiple channels and assign them to different alert rules.

Available Channels

Email - Send to individual addresses or distribution lists
Slack - Post to specific channels with threaded updates
Discord - Send to community or engineering channels
Telegram - Instant mobile notifications
PagerDuty - Trigger incidents with on-call rotation
Opsgenie - Alert Opsgenie schedules
Webhooks - Custom integrations with any HTTP endpoint

Status Page

A Status Page is a public-facing dashboard that shows the real-time health of your services to end users. Status pages help reduce support tickets during outages.

Status Page Features

Component Status - Display monitored services as components (operational, degraded, down, maintenance)
Incident History - Show recent and ongoing incidents
Uptime Metrics - 90-day uptime percentage per component
Email Subscriptions - Users can subscribe to updates
Custom Branding - Logo, colors, and domain customization

Public & Private Pages

Status pages can be public (accessible to anyone) or private (password-protected for internal teams).

Uptime Percentage

Uptime is calculated as the ratio of successful checks to total checks over a time period:

Uptime % = (Successful Checks / Total Checks) × 100

For example, if 1,430 out of 1,440 checks succeeded in 24 hours, the uptime is approximately 99.31% (1,430 / 1,440 × 100).
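The formula translates directly into code. A minimal sketch, with a hypothetical function name and basic input validation added for illustration:

```python
def uptime_percent(successful_checks: int, total_checks: int) -> float:
    """Uptime % = (successful checks / total checks) * 100."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    if not 0 <= successful_checks <= total_checks:
        raise ValueError("successful_checks must be between 0 and total_checks")
    return successful_checks / total_checks * 100


print(f"{uptime_percent(1430, 1440):.2f}%")  # 99.31%
```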

Industry Standards

Uptime %	Downtime/Year	Classification
99.9% ("three nines")	8.76 hours	Good
99.95%	4.38 hours	Very Good
99.99% ("four nines")	52.56 minutes	Excellent
99.999% ("five nines")	5.26 minutes	Enterprise
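The downtime figures in this table follow from the uptime percentage and the length of a year. The sketch below assumes a 365-day year (8,760 hours), which is what the table's values correspond to; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760, assuming a non-leap year


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of allowed downtime per year for a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


for target in (99.9, 99.95, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.2f} minutes)")
```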

Check Interval

The Check Interval determines how frequently a monitor runs. BlackTide supports intervals from 30 seconds to 24 hours.

Interval Selection

30-60 sec - Critical production services (APIs, databases)
5 min - Standard web services
15-30 min - Non-critical services or batch jobs
1-24 hr - Daily health checks or scheduled tasks
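One way to reason about interval choice is to estimate how long an outage can last before an alert fires. The sketch below combines the check interval with a consecutive-failure threshold; it is a rough model under stated assumptions (one location, evenly spaced checks, check duration and timeouts ignored), and the function name is hypothetical.

```python
def detection_delay_bounds(interval_seconds: int, consecutive_failures: int) -> tuple[int, int]:
    """Return (shortest, longest) seconds from outage start to the Nth failed check.

    Shortest: the outage begins just before a check runs.
    Longest: the outage begins just after a check succeeded.
    """
    if interval_seconds <= 0 or consecutive_failures < 1:
        raise ValueError("interval must be positive and threshold at least 1")
    shortest = interval_seconds * (consecutive_failures - 1)
    longest = interval_seconds * consecutive_failures
    return shortest, longest


for label, interval in [("30 sec", 30), ("5 min", 300), ("15 min", 900)]:
    low, high = detection_delay_bounds(interval, consecutive_failures=3)
    print(f"{label}: alert after roughly {low}-{high} seconds of downtime")
```

Under these assumptions, a 5-minute interval with a threshold of 3 can leave an outage undetected for 10 to 15 minutes, which is why shorter intervals suit critical services.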

Next Steps

Now that you understand the core concepts, explore how to use them:

Quick Start Guide - Create your first monitor
HTTP Monitor Guide - Configure HTTP checks
Alert Rules - Set up smart notifications
Status Pages - Build public status dashboards