K8SPG-859 [POC] Percona Server MySQL Hibernation Feature #1092

hors · 2025-09-23T12:25:03Z

CHANGE DESCRIPTION

Problem:
This PR implements a hibernation feature for Percona Server MySQL clusters that allows automatic pausing and unpausing based on cron schedules. This is particularly useful for development environments, test clusters, or any scenario where you want to automatically stop MySQL clusters during off-hours to save resources.

🎯 Key Features

✅ Core Hibernation Functionality

Automatic Pause/Unpause: Schedule-based hibernation using cron expressions
Manual Override: Manual pause/unpause via spec.pause field
State Synchronization: Hibernation state automatically syncs with cluster state
Health Checks: Only allows hibernation when cluster is in Ready state
Backup/Restore Awareness: Prevents hibernation during active backups or restores
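The health and backup/restore gating described above could look roughly like the following. This is a minimal sketch, not the PR's code: `ClusterState`, its constants, and `CanHibernate` are hypothetical names, and the backup and restore counts are assumed to be gathered by the caller.

```go
package main

import "fmt"

// ClusterState is a hypothetical stand-in for the operator's cluster status type.
type ClusterState string

const (
	StateInitializing ClusterState = "Initializing"
	StateReady        ClusterState = "Ready"
	StateError        ClusterState = "Error"
	StateStopping     ClusterState = "Stopping"
	StatePaused       ClusterState = "Paused"
)

// CanHibernate reports whether a scheduled pause may proceed.
// When it returns false, the second value explains what blocked it.
func CanHibernate(state ClusterState, activeBackups, activeRestores int) (bool, string) {
	switch {
	case state != StateReady:
		return false, fmt.Sprintf("cluster is %s, not Ready", state)
	case activeBackups > 0:
		return false, "backup in progress"
	case activeRestores > 0:
		return false, "restore in progress"
	}
	return true, ""
}

func main() {
	// A Ready cluster with one running backup must not be paused.
	ok, reason := CanHibernate(StateReady, 1, 0)
	fmt.Println(ok, reason)
}
```

Returning a reason string alongside the boolean gives the caller something to surface to users when a pause is skipped.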

✅ Smart Scheduling Logic

Next Window Scheduling: If the cluster is unhealthy at the scheduled time, hibernation is automatically deferred to the next window
Schedule Change Detection: Automatically updates next pause/unpause times when schedules change
First-time Evaluation: Handles initial hibernation setup correctly
Proactive Scheduling: Prevents immediate pausing when cluster becomes ready after being unready
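One way to read the next-window and proactive scheduling rules is sketched below. It is an illustration under assumptions, not the controller's actual logic: `NextFunc`, `dailyAt`, `utc`, and `EvaluatePause` are hypothetical names, and `dailyAt` is a simple once-a-day stand-in for a parsed cron schedule. The idea is that a due pause time is always advanced to the next window, whether or not the pause happens, so a cluster that becomes Ready later is not paused immediately.

```go
package main

import (
	"fmt"
	"time"
)

// NextFunc returns the first scheduled time strictly after t.
// It stands in for whatever cron library the controller uses.
type NextFunc func(t time.Time) time.Time

// dailyAt is a hypothetical stand-in for a parsed cron schedule:
// it fires once a day at hour:minute in t's time zone.
func dailyAt(hour, minute int) NextFunc {
	return func(t time.Time) time.Time {
		c := time.Date(t.Year(), t.Month(), t.Day(), hour, minute, 0, 0, t.Location())
		if !c.After(t) {
			// Today's slot has passed; time.Date normalizes day overflow.
			c = time.Date(t.Year(), t.Month(), t.Day()+1, hour, minute, 0, 0, t.Location())
		}
		return c
	}
}

// utc is a small helper for building UTC timestamps.
func utc(year, month, day, hour, minute int) time.Time {
	return time.Date(year, time.Month(month), day, hour, minute, 0, 0, time.UTC)
}

// EvaluatePause decides what to do with a pending pause time.
// It returns whether to pause now and the pause time to store in status.
func EvaluatePause(now, nextPause time.Time, ready bool, next NextFunc) (bool, time.Time) {
	if now.Before(nextPause) {
		return false, nextPause // not due yet, keep the stored time
	}
	// The window has been reached. Pause only if the cluster is Ready,
	// and move the stored time to the next window in either case.
	return ready, next(now)
}

func main() {
	next := dailyAt(18, 0)
	due := utc(2025, 9, 23, 18, 0)

	// Cluster is unhealthy one minute after the scheduled time.
	pause, stored := EvaluatePause(utc(2025, 9, 23, 18, 1), due, false, next)
	fmt.Println(pause, stored.Format(time.RFC3339))

	// Two hours later the cluster is Ready again; the stored time is in the future.
	pause, stored = EvaluatePause(utc(2025, 9, 23, 20, 0), stored, true, next)
	fmt.Println(pause, stored.Format(time.RFC3339))
}
```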

✅ Robust Error Handling

Invalid Schedule Handling: Gracefully handles invalid cron expressions
Cluster State Management: Proper handling of Initializing, Error, Stopping, Paused, and Ready states
Race Condition Prevention: Prevents state flipping during cluster startup/recovery

🏗️ Architecture

New Controller: `PerconaServerMySQLHibernationReconciler`

Dedicated controller for hibernation logic
Registered in cmd/manager/main.go
RBAC permissions for PS objects and backup/restore resources

Enhanced CRD Fields

spec:
  hibernation:
    enabled: true
    schedule:
      pause: "0 18 * * 1-5"    # 6 PM Mon-Fri
      unpause: "0 8 * * 1-5"   # 8 AM Mon-Fri
  pause: false  # Manual override

Status Fields

status:
  hibernation:
    state: "Active"  # Active, Paused, Scheduled, Blocked, Disabled
    nextPauseTime: "2025-09-24T18:00:00Z"
    nextUnpauseTime: "2025-09-25T08:00:00Z"
    lastPauseTime: "2025-09-23T18:00:00Z"
    lastUnpauseTime: "2025-09-24T08:00:00Z"
    reason: "Cluster not ready during scheduled time"
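For readers who think in Go types, the status block above might map to something like the following. This is a sketch with hypothetical type and function names; the PR's real types are not shown here, and an operator would more likely use `metav1.Time` from the Kubernetes API machinery than `*time.Time`, which is used only to keep the example standard-library-only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// HibernationState mirrors the state values listed in the status example.
type HibernationState string

const (
	HibernationActive    HibernationState = "Active"
	HibernationPaused    HibernationState = "Paused"
	HibernationScheduled HibernationState = "Scheduled"
	HibernationBlocked   HibernationState = "Blocked"
	HibernationDisabled  HibernationState = "Disabled"
)

// HibernationStatus is a hypothetical Go view of status.hibernation.
// Pointer fields stay nil when a timestamp has not been set yet.
type HibernationStatus struct {
	State           HibernationState `json:"state,omitempty"`
	NextPauseTime   *time.Time       `json:"nextPauseTime,omitempty"`
	NextUnpauseTime *time.Time       `json:"nextUnpauseTime,omitempty"`
	LastPauseTime   *time.Time       `json:"lastPauseTime,omitempty"`
	LastUnpauseTime *time.Time       `json:"lastUnpauseTime,omitempty"`
	Reason          string           `json:"reason,omitempty"`
}

// ParseStatus decodes a JSON-encoded status; timestamps must be RFC 3339.
func ParseStatus(raw string) (HibernationStatus, error) {
	var st HibernationStatus
	err := json.Unmarshal([]byte(raw), &st)
	return st, err
}

func main() {
	st, err := ParseStatus(`{"state":"Active","nextPauseTime":"2025-09-24T18:00:00Z"}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(st.State)
	if st.NextPauseTime != nil {
		fmt.Println(st.NextPauseTime.Format(time.RFC3339))
	}
}
```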

CHECKLIST

Jira

Is the Jira ticket created and referenced properly?
Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

Is an E2E test/test case added for the new feature/change?
Are unit tests added where appropriate?

Config/Logging/Testability

Are all needed new/changed options added to default YAML files?
Are all needed new/changed options added to the Helm Chart?
Did we add proper logging messages for operator actions?
Did we ensure compatibility with the previous version or cluster upgrade process?
Does the change support oldest and newest supported PS version?
Does the change support oldest and newest supported Kubernetes version?

JNKPercona · 2025-09-24T17:43:01Z

Test Name	Result	Duration
version-service	passed	00:00:00
async-ignore-annotations	passed	00:00:00
async-global-metadata	passed	00:00:00
auto-config	passed	00:00:00
config	passed	00:00:00
config-router	passed	00:00:00
demand-backup-minio	passed	00:00:00
demand-backup-cloud	passed	00:00:00
async-data-at-rest-encryption	passed	00:00:00
gr-global-metadata	failure	00:15:24
gr-data-at-rest-encryption	failure	00:17:33
gr-demand-backup-minio	failure	00:13:37
gr-demand-backup-cloud	failure	00:13:06
gr-demand-backup-haproxy	passed	00:00:00
gr-finalizer	passed	00:00:00
gr-haproxy	passed	00:00:00
gr-ignore-annotations	passed	00:00:00
gr-init-deploy	passed	00:00:00
gr-one-pod	failure	00:10:08
gr-recreate	failure	00:06:22
gr-scaling	passed	00:00:00
gr-scheduled-backup	passed	00:00:00
gr-security-context	passed	00:00:00
gr-self-healing	passed	00:00:00
gr-tls-cert-manager	passed	00:00:00
gr-users	passed	00:00:00
haproxy	passed	00:00:00
init-deploy	passed	00:00:00
limits	passed	00:00:00
monitoring	passed	00:00:00
one-pod	passed	00:00:00
operator-self-healing	passed	00:00:00
recreate	passed	00:00:00
scaling	passed	00:00:00
scheduled-backup	passed	00:00:00
service-per-pod	passed	00:00:00
sidecars	passed	00:00:00
smart-update	passed	00:00:00
storage	passed	00:00:00
telemetry	passed	00:00:00
tls-cert-manager	passed	00:00:00
users	passed	00:00:00
pvc-resize	passed	00:00:00
We run 43 out of 43		01:16:12

commit: 78dcbca
image: perconalab/percona-server-mysql-operator:PR-1092-78dcbca6

hors added 2 commits September 19, 2025 19:38

init

c736cf7

fix bugs

6bc0160

pull-request-size bot added the size/XXL 1000+ lines label Sep 23, 2025

hors changed the title ~~POC: Percona Server MySQL Hibernation Feature~~ K8SPG-859 POC: Percona Server MySQL Hibernation Feature Sep 23, 2025

hors added 3 commits September 23, 2025 20:34

fix go lint

b4e399e

fix tests

12d917e

Merge branch 'main' into hibernation

5536c88

hors changed the title ~~K8SPG-859 POC: Percona Server MySQL Hibernation Feature~~ K8SPG-859 [POC] Percona Server MySQL Hibernation Feature Sep 23, 2025

hors force-pushed the hibernation branch from 7c897a6 to 62e3c89 Compare September 23, 2025 18:24

fix lint

0258525

hors force-pushed the hibernation branch from 62e3c89 to 0258525 Compare September 23, 2025 18:45

hors mentioned this pull request Sep 23, 2025

[K8SPS-554] Implement automated hibernation feature for Percona Server MySQL clusters percona/roadmap#121

Open

fix cr

78dcbca
