Intelligent Alerting

Intelligent Alerting That Doesn't Wake You Up at 3 AM

The average on-call engineer receives 500+ alerts per week. 95% are noise. ML-based deduplication groups related alerts from across your infrastructure into single actionable incidents - so when your phone rings, it actually matters.

  • 90% alert noise reduction
  • 95%+ deduplication accuracy
  • 6 supported alert channels
  • <30s P0 response time

Alert fatigue is an engineering retention problem

  • 500+ alerts per week with 95% noise trains engineers to ignore alerts - including the ones that matter
  • On-call burnout from false positives at 2 AM erodes team morale and accelerates attrition. The Google SRE Workbook chapter on being on-call covers the systemic cost of noisy alerting in detail.
  • No severity routing means every alert goes to every channel - P3 gas spikes alongside P0 outages
  • Manual silencing rules take hours to configure and expire at the wrong moment during planned maintenance

Fewer alerts, higher signal, better on-call experience

  • ML deduplication groups correlated alerts from across all monitors into single, contextualized incidents
  • Automatic severity routing sends P0 to phone and Telegram immediately, P3 to email digest at business hours
  • Maintenance windows and smart silencing rules suppress expected alerts during planned downtime automatically
  • P0 alerts auto-create incidents with full monitor context - no manual triage step between alert and response
  • The Google SRE Workbook on alerting defines actionable alerts as those that require immediate human action and have a documented runbook. BlackTide applies this principle at the routing layer - P0 alerts are only those that meet both criteria for your specific blockchain infrastructure.

Capabilities

Alerting that works with your team, not against it

ML deduplication, severity routing, and smart silencing - designed to restore trust in your alert stream.

ML-based deduplication

Related alerts from across monitors are automatically grouped into single incidents using temporal and semantic correlation - drastically reducing notification volume without hiding real problems. For Web3 infrastructure, a single RPC provider failure can generate dozens of correlated alerts across oracle monitors, block height checks, and DeFi health monitors simultaneously. Without deduplication, that becomes 20 pages for what is effectively one incident.
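The grouping idea can be illustrated with a small rule-based sketch. This is not BlackTide's model: it is a plain heuristic stand-in that merges alerts sharing a chain and an upstream dependency when they fire close together in time. The `Alert` fields, the `upstream` attribute, and the 5-minute window are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    monitor: str       # hypothetical monitor name
    chain: str
    upstream: str      # hypothetical shared dependency, e.g. an RPC provider
    fired_at: datetime


@dataclass
class Incident:
    chain: str
    upstream: str
    alerts: list = field(default_factory=list)


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts that share (chain, upstream) and fire within `window`
    of the most recent alert already in the incident."""
    incidents = []
    open_by_key = {}
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        key = (alert.chain, alert.upstream)
        incident = open_by_key.get(key)
        # Start a new incident if none is open or the gap is too large.
        if incident is None or alert.fired_at - incident.alerts[-1].fired_at > window:
            incident = Incident(chain=alert.chain, upstream=alert.upstream)
            incidents.append(incident)
            open_by_key[key] = incident
        incident.alerts.append(alert)
    return incidents


t0 = datetime(2024, 1, 1, 3, 0, 0)
alerts = [
    Alert("oracle-feed", "ethereum", "rpc-a", t0),
    Alert("block-height", "ethereum", "rpc-a", t0 + timedelta(seconds=40)),
    Alert("defi-health", "ethereum", "rpc-a", t0 + timedelta(minutes=2)),
    Alert("block-height", "ethereum", "rpc-a", t0 + timedelta(hours=1)),
]
incidents = group_alerts(alerts)
```

Here the first three alerts collapse into one incident and the alert an hour later opens a second one. A production system would presumably weigh more signals than an exact attribute match.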

Severity-based routing

P0 goes to phone and Telegram in under 30 seconds. P1 goes to Slack. P3 goes to an email digest. Routing rules are configurable per team, per service, and per severity level, so the DeFi team's oracle monitor routes differently from the validator team's sync monitor. Each team defines its own severity thresholds and channel preferences without affecting other teams.
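As a rough sketch of how per-team routing with shared defaults could be modelled, the lookup below checks a team override first and falls back to the defaults listed above. The table layout, the function name `channels_for`, and the DeFi override are hypothetical and do not reflect BlackTide's actual configuration format. P2 is left out because the page does not say where it routes.

```python
# Default routes taken from the description above.
DEFAULT_ROUTES = {
    "P0": ["phone", "telegram"],
    "P1": ["slack"],
    "P3": ["email_digest"],
}

# Hypothetical per-team overrides; only the listed severities differ.
TEAM_ROUTES = {
    "defi": {"P1": ["slack", "telegram"]},
}


def channels_for(team, severity):
    """Return the channels for a team and severity, preferring team overrides."""
    team_routes = TEAM_ROUTES.get(team, {})
    if severity in team_routes:
        return team_routes[severity]
    if severity in DEFAULT_ROUTES:
        return DEFAULT_ROUTES[severity]
    raise KeyError(f"no route configured for severity {severity!r}")
```

Raising on an unconfigured severity, rather than silently dropping the alert, is one way to keep a routing gap visible.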

Smart silencing and maintenance windows

Schedule maintenance windows to suppress expected alerts during planned downtime. Create silence rules based on monitor, chain, or alert type - all without touching config files. Maintenance windows support recurring schedules (for example, every Tuesday from 02:00 to 04:00 UTC), so you configure the window once and the silence applies automatically on every occurrence.
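A recurring weekly window reduces to a weekday plus a UTC time range. The sketch below shows one way to test whether a timestamp falls inside such a window. The `WeeklyWindow` class is hypothetical, and it assumes timezone-aware datetimes and a window that does not cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone


@dataclass(frozen=True)
class WeeklyWindow:
    weekday: int   # Monday is 0, matching datetime.weekday()
    start: time    # inclusive, UTC
    end: time      # exclusive, UTC

    def contains(self, moment: datetime) -> bool:
        """True if the timezone-aware `moment` falls inside the window."""
        utc = moment.astimezone(timezone.utc)
        return utc.weekday() == self.weekday and self.start <= utc.time() < self.end


# Every Tuesday, 02:00 to 04:00 UTC.
tuesday_window = WeeklyWindow(weekday=1, start=time(2, 0), end=time(4, 0))
```

Because the check is computed from the schedule itself, nothing has to be re-created each week.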

Auto-incident creation

P0 alerts automatically create incidents, so the triage step happens before your phone rings. Each auto-created incident includes the monitor's alert timeline (all prior state changes), the triggering metric and threshold, the affected chain and block height, and a direct link to the monitor configuration - so your engineer has full context before opening the first runbook step.

Per-user channel preferences

Each team member configures their own notification preferences: which channels to use, which severities to receive, and quiet hours for non-critical alerts. Quiet hours mute non-critical alerts during off-hours, but P0 alerts always bypass them, so on-call engineers are never unreachable for true emergencies.
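The quiet-hours rule can be sketched in a few lines: P0 is always delivered, and everything else is held while the user's quiet period is active. The function names are hypothetical, and the example assumes quiet hours are given as local wall-clock times that may wrap past midnight.

```python
from datetime import time


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if `now` is inside [start, end), allowing the range to cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_notify(severity: str, now: time, quiet_start: time, quiet_end: time) -> bool:
    """P0 always bypasses quiet hours; other severities are muted during them."""
    if severity == "P0":
        return True
    return not in_quiet_hours(now, quiet_start, quiet_end)
```

With quiet hours set from 22:00 to 07:00, a P3 alert at 03:00 is held back while a P0 alert at the same time still goes out.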

Use Cases

Who benefits most from intelligent alerting

SRE team receiving 20 pages per night from correlated alerts

Each alert was individually valid, but they all traced back to a single upstream RPC failure. ML deduplication collapsed 20 alerts into 1 incident - and the team slept through the night.

Validator operator distinguishing slashing risk from routine restarts

Not every node restart is an emergency. Severity routing and ML context detection classify routine maintenance restarts as P3 while flagging genuine slashing risk as P0.

DeFi protocol needing P0 on oracle failures, P3 on gas spikes

Oracle failures block trades and require immediate response. Gas spikes are informational. Severity-based routing gives each signal the attention it deserves without polluting the P0 channel.

BlackTide vs dedicated alerting platforms

Enterprise alerting tools add complexity. BlackTide adds signal.

Feature | BlackTide | PagerDuty | Opsgenie
ML-based alert deduplicationpartial
Web3 / chain context in alerts
Severity-based multi-channel routing
Auto-incident creation from P0partial
Smart maintenance windows
Quiet hours with P0 bypasspartialpartial
Recurring maintenance window schedulespartial
Pricing for small teams | Affordable | Expensive | Moderate

Frequently asked questions

How does the ML deduplication work?
BlackTide's deduplication engine analyzes temporal proximity, shared monitor attributes (chain, service, region), and alert type correlation to group related alerts into a single incident. The model is continuously refined based on how your team interacts with grouped alerts - acknowledging, merging, or splitting them to improve future accuracy.
Does it integrate with PagerDuty or Opsgenie?
Yes. BlackTide can forward P0 incidents to PagerDuty or Opsgenie via webhook integration if your team uses those platforms for on-call scheduling or compliance requirements. Many teams use BlackTide for deduplication and routing while keeping PagerDuty for on-call paging to legacy systems.
Can I set up on-call rotations?
Yes. On-call rotation schedules, escalation policies, and override windows are all configurable directly within BlackTide. You can define primary and secondary responders per team, set escalation timeouts, and allow engineers to set quiet hours for non-critical severities.
What channels are supported?
BlackTide supports Slack, Telegram, Discord, email, SMS (via Twilio), PagerDuty webhooks, and Opsgenie webhooks. Each user and team can configure independent channel preferences per severity level.
How do maintenance windows work?
You can schedule a maintenance window for any monitor, chain, or service group with a defined start and end time. During the window, alerts matching the scope are suppressed and logged but not delivered. If the window expires while an alert is still active, the alert fires immediately.
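The suppress-and-log behaviour, including what happens when a window expires under an active alert, can be sketched as a small decision function. The `Window` class, the `decide` function, and the string return values are hypothetical, and scope matching is simplified to an exact name comparison.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Window:
    scope: str        # a monitor, chain, or service group name
    start: datetime   # inclusive
    end: datetime     # exclusive


def decide(alert_scope, alert_active, window, now, suppressed_log):
    """Return 'idle', 'suppress', or 'deliver' for an alert evaluated at `now`."""
    if not alert_active:
        return "idle"
    in_window = alert_scope == window.scope and window.start <= now < window.end
    if in_window:
        # Suppressed alerts are recorded but not delivered.
        suppressed_log.append((now, alert_scope))
        return "suppress"
    # Outside the window (including just after expiry) an active alert is delivered.
    return "deliver"


window = Window(
    scope="ethereum",
    start=datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc),
    end=datetime(2024, 1, 2, 4, 0, tzinfo=timezone.utc),
)
log = []
during = decide("ethereum", True, window, datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc), log)
after = decide("ethereum", True, window, datetime(2024, 1, 2, 4, 0, tzinfo=timezone.utc), log)
```

Evaluating the same active alert during and after the window shows the transition from suppressed to delivered.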
What makes an alert "actionable" in BlackTide?
Following the Google SRE definition, an actionable alert is one that requires immediate human intervention and has a documented response path. BlackTide enforces this at configuration time: every alert rule requires you to define a severity level and a notification channel before it can be activated. This prevents the most common alerting antipattern - creating alerts without thinking about who receives them and what they should do. Combined with ML deduplication, this ensures that every alert your on-call engineer receives represents a real decision point.
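A configuration-time check of this kind might look like the sketch below, which refuses to activate a rule until both a severity and a channel are present. The `AlertRule` shape and the `activation_errors` helper are hypothetical. The severity set also assumes a P2 level, which the page implies but never names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed severity levels; only P0, P1, and P3 are named in the text above.
VALID_SEVERITIES = {"P0", "P1", "P2", "P3"}


@dataclass
class AlertRule:
    name: str
    severity: Optional[str] = None
    channel: Optional[str] = None


def activation_errors(rule: AlertRule) -> list:
    """Return the reasons a rule cannot be activated; an empty list means it can."""
    errors = []
    if rule.severity not in VALID_SEVERITIES:
        errors.append("a valid severity level is required")
    if not rule.channel:
        errors.append("a notification channel is required")
    return errors
```

A rule created with only a name returns both errors, while a rule with severity `"P0"` and a channel returns an empty list and can be activated.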
Is it GDPR compliant?
Yes. BlackTide is GDPR compliant and processes alert data within EU infrastructure by default. Data retention periods are configurable, personal data (user contact details) is stored encrypted, and a Data Processing Agreement is available for enterprise customers upon request.

Sleep through the night. Wake up to fewer, better alerts.

ML-powered deduplication that understands your infrastructure.