System Architecture

Learn how BlackTide executes millions of checks per day with low latency, reliable alerting, and real-time data aggregation.

High-Level Overview

BlackTide is built on a modern microservices architecture with 12 specialized services:

Core Services

  • Core API Service (Port 8080): Authentication, CRUD operations, rate limiting, CSRF protection
  • Scheduler Service (Port 8082): Manages check scheduling and publishes to NATS queue
  • Check Runners (Ports 8083-8084): Execute checks (HTTP, TCP, ICMP, TLS, Web3, etc.)
  • Ingestion & Alerts (Port 8085): Store results and evaluate alert rules
  • Incident & Notifications (Port 8086): Manage incidents and send notifications
  • Status Pages (Port 8087): Serve public/private status pages
  • Aggregation Timer: Publishes periodic NATS events that trigger materialized-view refreshes

P0 Differentiation Services (Web3)

  • Transaction Indexer (Port 8090): Whale tracking and smart money analytics
  • MEV Detector (Port 8089): Sandwich attack detection
  • Security Analyzer (Port 8088): Exploit detection and auto-pause circuit breaker

Content Services

  • Blog Service Backend (Port 8091): FastAPI blog API (Python 3.9)
  • Blog Service Frontend (Port 3001): Next.js blog (SSR + SSG)

Data Flow

1. Check Scheduling

Scheduling Flow

```text
User creates monitor via frontend
Core API validates and stores monitor config
Scheduler Service picks up monitor (via polling or NATS)
Scheduler publishes "check.execute" event to NATS JetStream
```
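The "check.execute" event can be pictured as a small JSON payload. This is an illustrative sketch only: the field names (`monitor_id`, `check_type`, `target`, `location`) are assumptions, not BlackTide's actual wire format.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical shape of the "check.execute" event the Scheduler publishes
# to NATS JetStream. All field names here are illustrative assumptions.
@dataclass
class CheckExecuteEvent:
    monitor_id: str
    check_type: str       # e.g. "http", "tcp", "icmp", "tls", "web3"
    target: str           # URL or host:port to check
    location: str         # e.g. "us-east", "eu-west"
    timeout_seconds: int = 30

def encode_event(event: CheckExecuteEvent) -> bytes:
    """Serialize the event for publishing on the 'check.execute' subject."""
    return json.dumps(asdict(event)).encode("utf-8")

payload = encode_event(
    CheckExecuteEvent("mon-123", "http", "https://example.com", "us-east")
)
```

A runner on the other side of the queue would decode this payload and use `timeout_seconds` as its execution deadline.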

2. Check Execution

Execution Flow

```text
Check Runners subscribe to NATS queue
Runner fetches monitor config
Execute check from configured location (US East, EU West, etc.)
Publish "check.completed" event to NATS with result
```
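Before publishing "check.completed", a runner has to turn a raw HTTP response into a status. A minimal sketch of that classification step, with an assumed three-state outcome and an illustrative latency threshold (neither is taken from BlackTide's code):

```python
from typing import Optional

# Sketch of how a runner might classify an HTTP check outcome before
# publishing "check.completed". The "degraded" state and the 2s threshold
# are illustrative assumptions.
def classify_check(status_code: Optional[int], latency_ms: float,
                   degraded_threshold_ms: float = 2000.0) -> str:
    """Return "up", "degraded", or "down" for a completed HTTP check."""
    if status_code is None or status_code >= 500:
        return "down"            # timeout, connection error, or server error
    if latency_ms >= degraded_threshold_ms:
        return "degraded"        # reachable, but slower than acceptable
    return "up"
```

Keeping this logic pure (no I/O) makes it easy to unit-test independently of the network call itself.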

3. Result Ingestion

Ingestion Flow

```text
Ingestion Service receives "check.completed" event
Store result in PostgreSQL (partitioned by time range)
Update Redis cache with latest status
Evaluate alert rules for this monitor
If rule triggers → publish "alert.triggered" event
```
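One common form of alert rule is "trigger after N consecutive failures." The sketch below assumes that rule shape and a newest-first result list; BlackTide's actual rule engine may support richer conditions.

```python
# Sketch of a consecutive-failures alert rule, evaluated by the Ingestion
# service after each result is stored. The rule shape is an assumption.
def should_trigger(recent_results: list, consecutive_failures: int = 3) -> bool:
    """recent_results is newest-first; True if the last N results all failed."""
    if len(recent_results) < consecutive_failures:
        return False
    return all(not r["up"] for r in recent_results[:consecutive_failures])
```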

4. Alert & Notification

Notification Flow

```text
Incident Service receives "alert.triggered" event
Create or update incident
Notification Service sends alerts to configured channels
(Email, Slack, Discord, PagerDuty, etc.)
```
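Channel fan-out in the Notification Service can be sketched as a registry of senders keyed by channel name. The channel names follow the list above, but the sender callables here are stand-ins, not real integrations.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in senders; real ones would call SMTP, Slack webhooks, PagerDuty
# Events API, etc. This registry pattern is an illustrative assumption.
SENDERS: Dict[str, Callable[[str], str]] = {
    "email": lambda msg: f"email sent: {msg}",
    "slack": lambda msg: f"slack posted: {msg}",
    "pagerduty": lambda msg: f"pagerduty paged: {msg}",
}

def notify(channels: List[str], message: str) -> List[Tuple[str, str]]:
    """Fan a single incident message out to each configured channel."""
    results = []
    for ch in channels:
        sender = SENDERS.get(ch)
        if sender is None:
            results.append((ch, "unknown channel, skipped"))
        else:
            results.append((ch, sender(message)))
    return results
```

Unknown channels are skipped rather than raised, so one misconfigured channel cannot block delivery to the rest.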

Database Architecture

PostgreSQL Partitioning

We use PostgreSQL 15+ native partitioning for time-series data to maintain high performance as data grows:

  • check_results - Partitioned by week (PARTITION BY RANGE on created_at)
  • incidents_timeline - Partitioned by month
  • notifications_log - Partitioned by month
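Weekly range partitions for `check_results` have to be created ahead of the data that lands in them. A small helper sketching the DDL a partition-maintenance job might emit (the table and column names follow the list above; the naming scheme `check_results_YYYY_MM_DD` is an assumption):

```python
from datetime import date, timedelta

# Illustrative generator for the weekly PARTITION OF DDL described above.
# The partition naming convention is an assumption, not BlackTide's actual one.
def weekly_partition_ddl(week_start: date) -> str:
    week_end = week_start + timedelta(days=7)
    name = f"check_results_{week_start:%Y_%m_%d}"
    return (
        f"CREATE TABLE {name} PARTITION OF check_results\n"
        f"    FOR VALUES FROM ('{week_start}') TO ('{week_end}');"
    )
```

In PostgreSQL's range partitioning the upper bound is exclusive, so consecutive weekly partitions share their boundary timestamp without overlap.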

Retention Policies

| Data Type | Retention | Aggregation |
| --- | --- | --- |
| Raw check results | 90 days | 1-minute granularity |
| Hourly aggregates | 1 year | Avg/min/max/p95 latency |
| Daily aggregates | Forever | Uptime percentage |
| Incidents | Forever | Full timeline |

Scalability

Horizontal Scaling

All services are stateless and can scale horizontally:

  • Check Runners - Scale to 50+ instances for high check volume
  • Ingestion Service - Multiple instances consume from NATS queue
  • API Service - Load balanced across multiple instances

Load Distribution

Checks are distributed across 6 global locations:

  • US East (Virginia)
  • US West (California)
  • EU West (Ireland)
  • EU Central (Frankfurt)
  • Asia Pacific (Singapore)
  • South America (São Paulo)

High Availability

Redundancy

  • Database - PostgreSQL with streaming replication (primary + 2 replicas)
  • Redis - Sentinel mode with automatic failover
  • NATS - JetStream cluster (3 nodes)
  • Services - Multiple instances behind load balancer

Failure Handling

  • NATS retries - Failed checks are retried with exponential backoff
  • Circuit breaker - Services auto-pause when downstream dependencies fail
  • Health checks - Consul monitors all services and removes unhealthy instances
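The retry delay schedule implied by "exponential backoff" can be sketched as follows. The base delay, cap, and full-jitter strategy are illustrative assumptions, not the values BlackTide actually uses.

```python
import random

# Sketch of exponential backoff with full jitter for NATS redeliveries.
# base=1s and cap=60s are illustrative assumptions.
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry `attempt` (1-indexed), with full jitter."""
    exp = min(cap, base * (2 ** (attempt - 1)))
    return random.uniform(0.0, exp)
```

Jitter spreads retries out in time, so a burst of failed checks does not hammer a recovering dependency in lockstep.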

Security

Multi-Layer Protection

  • Authentication - httpOnly cookies (inaccessible to client-side JavaScript, mitigating XSS token theft) + JWT
  • CSRF Protection - Double Submit Cookie pattern
  • Security Headers - CSP, X-Frame-Options, HSTS, etc.
  • CORS - Configured with credentials for specific origins only
  • Rate Limiting - Redis token bucket per user/IP
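The token-bucket limiter above keeps its buckets in Redis, keyed per user/IP. A minimal in-process sketch of the same algorithm (capacity and refill rate here are illustrative, and a production version would use atomic Redis operations instead of local state):

```python
import time

# In-process sketch of a token bucket; the production limiter stores this
# state in Redis keyed by user/IP so all API instances share one budget.
class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available; refill based on elapsed time."""
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```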

Observability

Monitoring & Metrics

  • Prometheus - Metrics collection from all services
  • Grafana - Real-time dashboards and alerts
  • Consul - Service health and discovery
  • Logs - Structured JSON logging with correlation IDs
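A structured JSON log line with a correlation ID might look like the sketch below. The field names (`ts`, `service`, `msg`, `correlation_id`) are assumptions; the point is that every service logs the same correlation ID for one request, so a check's path through the pipeline can be traced end to end.

```python
import json
import time
import uuid
from typing import Optional

# Sketch of a structured JSON log line; field names are assumptions.
def log_line(service: str, message: str,
             correlation_id: Optional[str] = None) -> str:
    record = {
        "ts": time.time(),
        "service": service,
        "msg": message,
        # Reuse the caller's ID so one request is traceable across services;
        # mint a fresh one at the edge if none exists yet.
        "correlation_id": correlation_id or str(uuid.uuid4()),
    }
    return json.dumps(record)
```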

Key Metrics

| Metric | Target | Purpose |
| --- | --- | --- |
| API Response Time (p95) | <200ms | User experience |
| Check Execution Time | <30s | Timely alerts |
| NATS Queue Depth | <1000 | Processing backlog |
| Database Query Time (p95) | <50ms | Data access speed |

Deployment

Infrastructure

  • Container Orchestration - Docker + Docker Compose (production)
  • CI/CD - GitHub Actions for automated builds and deployments
  • Image Registry - GitHub Container Registry (GHCR)
  • Reverse Proxy - Nginx with SSL/TLS termination

Deployment Process

Deployment Commands

```bash
# Build all services (multi-platform: amd64 + arm64)
make build-and-push

# Deploy via Ansible
ansible-playbook deploy.yml

# Health check all services
make health-check
```

Performance Optimizations

Caching Strategy

  • Monitor configs - Cached in Redis for 5 minutes
  • Latest check results - Cached for real-time dashboard
  • Uptime aggregates - Cached for 1 hour
  • Status page data - Cached for 30 seconds (public)
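The per-item TTLs above all follow one pattern: cache a value with an expiry, serve it until it lapses, then refetch. A minimal in-process sketch of that pattern (the production system uses Redis; this version only illustrates the expiry logic):

```python
import time

# In-process TTL cache sketch mirroring the per-item TTLs listed above.
# Production caching lives in Redis; this just illustrates the pattern.
class TTLCache:
    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds: float):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key, default=None):
        item = self._store.get(key)
        if item is None:
            return default
        value, expires = item
        if time.monotonic() >= expires:
            del self._store[key]   # lazily evict on read
            return default
        return value
```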

Database Optimizations

  • Indexes - B-tree on monitor_id, created_at, status
  • Partial indexes - Index only failed checks for faster incident queries
  • Connection pooling - Max 100 connections per service
  • Query optimization - Use EXPLAIN ANALYZE for slow queries

Future Roadmap

Planned Improvements

  • Kubernetes - Migrate from Docker Compose to K8s for better orchestration
  • Multi-region - Deploy services in multiple AWS regions for global HA
  • GraphQL API - Add GraphQL endpoint alongside REST API
  • Real-time WebSocket - Live dashboard updates without polling
  • Machine Learning - Anomaly detection for unusual patterns

Technical Documentation

For more technical details, see:

  • backend/ARCHITECTURE_MIGRATION.md - TimescaleDB → PostgreSQL migration
  • backend/docker-compose.yml - Service configuration
  • backend/README.md - Backend development guide
  • API Reference - Complete API documentation