Description
Problem
Peekaping can become a single point of failure in a single node deployment. If the instance running the monitoring service goes down — for example due to hardware issues, restarts, updates, or network outages — all monitoring functionality is immediately lost. In production environments this is critical, because the monitoring system must remain reliable even if parts of the infrastructure fail.
Proposed solution
Enable a high-availability setup with ≥3 replicas where exactly one replica performs monitoring checks at any time. Use a leader-election mechanism for scheduler ownership and automatic failover so that a standby replica takes over when the active one fails. Prevent duplicate checks during transitions.
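For illustration, here is a minimal sketch of what the single-active-scheduler election could look like, assuming a shared datastore that all replicas can reach. The `scheduler_lease` table, the intervals, and the function names are hypothetical and not existing Peekaping code:

```go
package leader

import (
	"context"
	"database/sql"
	"time"
)

const (
	heartbeatInterval = 5 * time.Second  // how often the leader renews its lease
	leaseTTL          = 15 * time.Second // standbys may take over once the lease is this stale
)

// tryAcquire attempts to take or renew the single scheduler lease.
// It returns true when this replica is (still) the leader.
// Assumes a hypothetical table scheduler_lease(name PRIMARY KEY, holder, expires_at)
// in a Postgres-compatible datastore with a registered driver.
func tryAcquire(ctx context.Context, db *sql.DB, instanceID string) (bool, error) {
	res, err := db.ExecContext(ctx, `
		INSERT INTO scheduler_lease (name, holder, expires_at)
		VALUES ('scheduler', $1, now() + $2 * interval '1 second')
		ON CONFLICT (name) DO UPDATE
		SET holder = EXCLUDED.holder, expires_at = EXCLUDED.expires_at
		WHERE scheduler_lease.holder = EXCLUDED.holder
		   OR scheduler_lease.expires_at < now()`,
		instanceID, leaseTTL.Seconds())
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	return n == 1, err
}

// Run keeps contending for the lease and starts/stops the scheduler
// as leadership is gained or lost.
func Run(ctx context.Context, db *sql.DB, instanceID string, startScheduler, stopScheduler func()) {
	leading := false
	ticker := time.NewTicker(heartbeatInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			if leading {
				stopScheduler()
			}
			return
		case <-ticker.C:
			ok, err := tryAcquire(ctx, db, instanceID)
			if err != nil {
				ok = false // treat datastore errors as "not leader" to stay safe
			}
			switch {
			case ok && !leading:
				startScheduler()
				leading = true
			case !ok && leading:
				stopScheduler() // lost the lease: stop checks to avoid duplicates
				leading = false
			}
		}
	}
}
```

Only the replica whose lease renewal succeeds runs the check scheduler; all other replicas keep serving the API/UI and keep retrying, which gives automatic failover without duplicate checks.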
Scope & Non-Goals
- In scope: multi-replica operation; single active scheduler; leader election; clean failover behavior
- Out of scope: provisioning/operation of external reverse proxies or load balancers — users handle these
Suggestions for documentation
- Provide reference deployments for Docker Swarm (simple HA entry point) and Kubernetes (for advanced setups)
- Include guidance on running multiple replicas, health/readiness checks, and safe configuration of any shared state (see the probe sketch after this list)
- Suggest Traefik as an example reverse proxy in the docs
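As a rough example of the health/readiness guidance, each replica could expose probes along these lines. The `/healthz` and `/readyz` paths and the `isLeader` flag are assumptions for the sketch, not Peekaping's current API:

```go
package health

import (
	"encoding/json"
	"net/http"
	"sync/atomic"
)

// isLeader is flipped by the leader-election loop (see the sketch above).
var isLeader atomic.Bool

// SetLeader is called by the election loop when leadership changes.
func SetLeader(v bool) { isLeader.Store(v) }

// RegisterProbes exposes liveness and readiness endpoints for the
// orchestrator (Docker Swarm healthcheck or Kubernetes probes).
func RegisterProbes(mux *http.ServeMux) {
	// Liveness: the process is up, regardless of leadership.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: every replica can serve the API/UI, so all replicas
	// report ready; leadership is exposed for observability only.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]any{
			"ready":  true,
			"leader": isLeader.Load(),
		})
	})
}
```

In Kubernetes these endpoints would back the liveness/readiness probes; in Docker Swarm a HEALTHCHECK can hit the same paths. Because every replica reports ready, the reverse proxy (e.g. Traefik) can route requests to any of them.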
Technical notes (informative)
- Centralize scheduler state (e.g., datastore/coordination backend) to avoid split-brain and ensure idempotent job execution
- Define reasonable leader heartbeat and takeover intervals; document expected failover timing
- Clarify rolling-update behavior to avoid unnecessary leadership churn during upgrades
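Building on the hypothetical lease sketch above, one way to keep failover timing predictable and rolling updates smooth is to release the lease on graceful shutdown. Again, `Run`, `scheduler_lease`, and the intervals are illustrative assumptions rather than existing code:

```go
package leader

import (
	"context"
	"database/sql"
	"os/signal"
	"syscall"
)

// With the intervals above, worst-case failover is roughly
// leaseTTL + heartbeatInterval: the lease must expire and a standby
// must reach its next acquisition attempt (~20s with a 15s TTL and 5s heartbeat).

// release drops the lease explicitly so a standby can take over on its next
// heartbeat instead of waiting for expiry. Calling it only on SIGTERM keeps
// rolling updates fast without causing leadership churn while the replica is healthy.
func release(db *sql.DB, instanceID string) error {
	_, err := db.Exec(
		`DELETE FROM scheduler_lease WHERE name = 'scheduler' AND holder = $1`,
		instanceID)
	return err
}

// RunWithGracefulHandover wraps Run so the lease is handed over when the
// orchestrator stops this replica during an upgrade.
func RunWithGracefulHandover(db *sql.DB, instanceID string, start, stop func()) {
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer cancel()
	Run(ctx, db, instanceID, start, stop)
	_ = release(db, instanceID) // best effort: the lease TTL still covers a hard crash
}
```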
Alternatives
To the best of my knowledge, there is no comparable open-source monitoring tool with built-in HA. Practical alternatives are cloud-hosted services (e.g. UptimeRobot) for enterprise use cases, or a different kind of solution altogether, such as an observability platform like New Relic.