etcd Backup
This guide explains how to safely back up etcd, the core database of your Kubernetes cluster.
etcd serves as the "brain" of your Kubernetes cluster. All information about Pods, Services, ConfigMaps, Secrets, and other resources running in the cluster is stored in etcd. If etcd is corrupted, you may need to rebuild the entire cluster, making regular backups critically important.
Data Stored in etcd
An etcd snapshot can back up all of the following Kubernetes resources.
- Pod/Deployment: Stores workload definitions and runtime state. If lost, applications need redeployment.
- ConfigMap/Secret: Stores environment settings and credentials. If lost, configuration rebuild is required.
- Service/Ingress: Stores network routing settings. If lost, services become inaccessible.
- PV/PVC: Stores storage binding information. If lost, data volume connections are broken.
An etcd backup only saves the definitions of Kubernetes resources. Actual data stored in PersistentVolumes needs to be backed up separately. For complete recovery, use Velero for volume backups alongside etcd backups.
Backup Procedure
You can easily create etcd snapshots from the [Backup Management] page in KIWI.
Step 1: Navigate to the Backup Management Page
- Click [Backup Management] in the left menu
- Click the Create Backup button at the top
Step 2: Select Backup Target
- Cluster: Select the Kubernetes cluster to back up
- Backup type: Select etcd Snapshot
Check that your Kubernetes cluster is properly registered on the [Runtime Environment] page. The cluster connection status must be "Connected" to enable backups.
Step 3: Configure Backup Options
Configure each option according to your situation.
- Compression (Recommended: Enabled)
  - Compresses the snapshot file to save storage space.
  - Compression can reduce backup file size by 50% or more.
- Encryption (Recommended: Enabled)
  - Encrypts the backup file for enhanced security.
  - Recommended because the snapshot includes Secrets and other sensitive information.
- Retention period (Recommended: 30 days)
  - Backups are automatically deleted after this period.
  - Set based on storage capacity and recovery needs.
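The compression and retention options can be illustrated with a short sketch. This is not KIWI's implementation: the guide does not say which compression format KIWI uses, so gzip is an assumption here, and the function names are hypothetical. Encryption is left out because the Python standard library has no file-encryption primitive.

```python
import gzip
import shutil
from datetime import datetime, timedelta
from pathlib import Path


def compress_snapshot(src: Path, dst: Path) -> float:
    """Gzip src into dst and return compressed size / original size.

    gzip is an assumed format; KIWI's actual format is not documented here.
    """
    original_size = src.stat().st_size
    if original_size == 0:
        raise ValueError("snapshot file is empty")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # Measure the output only after the gzip stream has been closed.
    return dst.stat().st_size / original_size


def is_expired(created_at: datetime, now: datetime, retention_days: int = 30) -> bool:
    """True once a backup is older than the retention period."""
    return now - created_at > timedelta(days=retention_days)
```

A ratio below 0.5 corresponds to the "50% or more" reduction mentioned above. The ratio you actually get depends on the snapshot's contents.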
Step 4: Run Backup and Verify
- Review settings and click the Start Backup button.
- Backup progress will be displayed on screen.
- Check the status in the backup list when complete.
  - Success: Snapshot file was created successfully.
  - Failure: Check error logs to identify the cause.
For important backups, always run the Backup File Verification feature after creation to confirm the snapshot is valid. Corrupted backup files cannot be used for recovery.
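If you also keep a copy of the snapshot file outside KIWI, a similar check can be run by hand. The sketch below assumes etcd 3.5 or later, where the `etcdutl` binary provides `snapshot status`; on older releases, `etcdctl snapshot status` plays the same role. How KIWI's Backup File Verification feature works internally is not documented here, and the function names and the example path are hypothetical.

```python
import json
import shutil
import subprocess


def snapshot_status_command(snapshot_path: str) -> list[str]:
    """Build the etcdutl call that reports a snapshot's hash, revision, key count and size."""
    return ["etcdutl", "snapshot", "status", snapshot_path, "--write-out=json"]


def verify_snapshot(snapshot_path: str) -> dict:
    """Run etcdutl against the snapshot and return its parsed JSON report.

    Raises RuntimeError if etcdutl is not installed, and
    subprocess.CalledProcessError if etcdutl exits with an error.
    """
    if shutil.which("etcdutl") is None:
        raise RuntimeError("etcdutl not found on PATH")
    result = subprocess.run(
        snapshot_status_command(snapshot_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Hypothetical path, shown only to illustrate the command that would be run.
print(" ".join(snapshot_status_command("/backups/etcd-snapshot.db")))
```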
Automatic Backup Setup
It's easy to forget to back up manually every day. Set up automatic backup schedules to build a reliable backup system.
Schedule Options Explained
- Frequency: Set the backup execution time using a Cron expression.
  - Example: 0 2 * * * (run daily at 2 AM)
  - Example: 0 3 * * 0 (run every Sunday at 3 AM)
- Retention count: Set the maximum number of backups to keep.
  - Example: 7 (keep only the 7 most recent backups; older ones are deleted automatically)
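To see how the five cron fields (minute, hour, day of month, month, day of week) map to a schedule, here is a deliberately simplified matcher. It accepts only `*` or a single number per field, which is enough for the two examples above. Real cron also supports ranges, lists, and steps, and KIWI's own parser may behave differently. The function name is hypothetical.

```python
from datetime import datetime


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a 5-field cron expression against a datetime.

    Simplified: each field must be "*" or a single integer.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields")
    # cron counts Sunday as 0; datetime.weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    actual = [when.minute, when.hour, when.day, when.month, cron_weekday]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


# 7 January 2024 was a Sunday.
sunday_3am = datetime(2024, 1, 7, 3, 0)
print(cron_matches("0 3 * * 0", sunday_3am))
print(cron_matches("0 2 * * *", sunday_3am))
```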
- Development: Daily backups with retention of 3. Last 3 days are recoverable.
- Staging: Daily backups with retention of 7. Last 1 week is recoverable.
- Production: Daily backups with retention of 14. Last 2 weeks are recoverable.
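The retention count rule (keep the N most recent backups and delete the rest) can be sketched as follows. The function name and the backup names are hypothetical, and this is an illustration of the rule, not KIWI's code.

```python
from datetime import datetime


def split_by_retention(
    backups: dict[str, datetime], keep: int
) -> tuple[list[str], list[str]]:
    """Return (kept, deleted) backup names, keeping the `keep` most recent."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    newest_first = sorted(backups, key=backups.get, reverse=True)
    return newest_first[:keep], newest_first[keep:]


# Five daily backups with a retention count of 3, as in the Development profile.
daily = {f"etcd-2024-01-0{day}": datetime(2024, 1, day, 2, 0) for day in range(1, 6)}
kept, deleted = split_by_retention(daily, keep=3)
print("kept:", kept)
print("deleted:", deleted)
```

With daily backups, the retention count equals the number of days you can roll back, which is why a count of 14 gives Production two weeks of recovery points.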
Troubleshooting
Backup Failure: "etcdctl snapshot save failed"
Why does this happen?
A problem occurred while connecting to the etcd cluster or saving the snapshot.
How to check and resolve
- Check etcd cluster status
  - Verify that etcd is running properly.
  - Check that all etcd members in the cluster are healthy.
- Verify certificate validity
  - Ensure the certificates used for the etcd connection haven't expired.
  - Verify the certificate paths are correct.
- Check disk space
  - Ensure there's enough disk space to save the snapshot.
  - Maintain at least 2x the current etcd data size in free space.
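The 2x free-space rule from the disk space check can be tested with a few lines of Python. The function names are hypothetical. The example database path in the comment is the usual location on kubeadm-provisioned clusters and is an assumption about your setup, so adjust both paths to match your environment.

```python
import os
import shutil


def has_room_for_snapshot(etcd_db_bytes: int, free_bytes: int, factor: float = 2.0) -> bool:
    """Apply the rule of thumb: free space should be at least factor x the etcd data size."""
    return free_bytes >= factor * etcd_db_bytes


def check_backup_dir(etcd_db_path: str, backup_dir: str) -> bool:
    """Compare the etcd database file size with free space on the backup volume."""
    db_size = os.path.getsize(etcd_db_path)
    free = shutil.disk_usage(backup_dir).free
    return has_room_for_snapshot(db_size, free)


# Example call (both paths are assumptions about your environment):
# check_backup_dir("/var/lib/etcd/member/snap/db", "/backups")
```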
Snapshot Corruption: "snapshot file corrupted"
Why does this happen?
A disk error occurred during backup, or the stored backup file was corrupted.
Resolution
- Use a previous backup file: Use a valid backup from a different date.
- Try backup from another etcd member: In high-availability clusters, you can create backups from other members.
- Check storage: Inspect the disk status of the backup storage.
- Set up automatic backups to secure multiple recovery points.
- Enable the auto-verification option after backup completion.
- For critical environments, replicate backups to different locations.
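If you replicate backup files yourself, comparing checksums after each copy helps catch corruption introduced in transit or by the destination disk. The following is a minimal sketch that assumes the snapshot is available as a local file. The function names are hypothetical, and KIWI may offer its own replication mechanism.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large snapshots are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(snapshot: Path, replica_dir: Path) -> Path:
    """Copy the snapshot into replica_dir and confirm that the copy is byte-identical."""
    replica_dir.mkdir(parents=True, exist_ok=True)
    copy = replica_dir / snapshot.name
    shutil.copy2(snapshot, copy)
    if sha256_of(copy) != sha256_of(snapshot):
        raise OSError(f"checksum mismatch after copying to {copy}")
    return copy
```

Storing the checksum next to each backup also lets you re-check a file later, before relying on it for recovery.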