Recovery

This guide explains how to restore your system to a previous state using backup data.

When do you need recovery?
  • System failure: Data was lost due to a server failure, disk corruption, or a similar fault.
  • Accidental deletion: Important resources or data were deleted by mistake.
  • Update rollback: A problematic update needs to be rolled back to a previous state.
  • Environment replication: A backup is used to set up an identical environment elsewhere.

Recovery Types

KIWI provides two recovery methods depending on your environment.

  • etcd Recovery: For Kubernetes, restores full cluster state. Time required: minutes to tens of minutes.
  • Docker Recovery: For Docker, restores containers, volumes, and images. Time required: varies by item.

Before you restore
  • Recovery will overwrite current data. Create a backup of the current state if needed.
  • If possible, perform recovery during low-traffic hours.
  • For production environments, validate recovery in a test environment first.

etcd Recovery

Restore your Kubernetes cluster from an etcd snapshot.

Step 1: Select Backup

  1. Click [Backup Management] in the left menu
  2. Find the etcd backup you want to restore in the backup list
  3. Click to select it

Choosing a restore point

If you have multiple backups, select the one from just before the problem occurred. Selecting a backup that's too old will result in losing all changes made after that point.

Step 2: Verify the Restore Target

Before restoring, verify the following information.

  • Backup date/time: The cluster will be reverted to this point in time
  • Cluster: Confirm this is the correct target cluster.
  • Status: Verify the backup file shows "Normal" status.

What if the backup status shows "Corrupted"?

Corrupted backups cannot be used for restoration. Select another valid backup or re-verify the backup file integrity.

Step 3: Execute Restore

  1. Click the Restore button.
  2. Click Start Restore in the confirmation dialog
  3. Restore progress will be displayed on screen

Important: Cluster Downtime

During etcd restoration, the cluster is temporarily unavailable. Cluster operations, including Pod scheduling and API calls, cannot be performed during this time. Restore time ranges from a few minutes to tens of minutes, depending on data size.

Step 4: Verify Cluster Status

After restoration completes, verify that the cluster is operating normally.

# Check node status - all nodes should be Ready
kubectl get nodes

# Check Pod status in all namespaces
kubectl get pods --all-namespaces

# Check system Pod status
kubectl get pods -n kube-system

Post-restore checklist
  1. Are all nodes in Ready state?
  2. Are all kube-system Pods in Running state?
  3. Are application Pods running normally?
  4. Can you access services normally?
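
The first two checklist items can be checked mechanically by filtering the output of the kubectl commands above. The following is a minimal sketch, not part of KIWI: the helper names not_ready_nodes and unhealthy_pods are hypothetical, and the filters assume the default column layout of `kubectl get ... --no-headers`.

```shell
#!/bin/sh
# Sketch: filter `kubectl get ... --no-headers` output read from stdin.
# Both helper names are hypothetical; they are not KIWI or kubectl features.

# Print the names of nodes whose STATUS column is not exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is also reported.
# Expects lines from: kubectl get nodes --no-headers
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Print "namespace/pod" for Pods that are not fully up.
# Completed Pods (finished Jobs) are skipped; Running Pods are reported
# when their READY column (e.g. 0/1) shows containers that are not ready.
# Expects lines from: kubectl get pods --all-namespaces --no-headers
unhealthy_pods() {
  awk '
    $4 == "Completed" { next }
    {
      split($3, ready, "/")
      if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2
    }
  '
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

Empty output from both filters means every node is Ready and every Pod is either Running with all containers ready or Completed. Items 3 and 4 still need an application-level check.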

Docker Recovery

Restore Docker volumes, images, and containers from backups.

Volume Recovery

Restore volumes containing application data.

Step 1: Select Backup

  1. Click the Docker backup tab on the [Backup Management] page
  2. Select the backup containing the volume you want to restore

Step 2: Select Items to Restore

  1. Volume: Select the volume to restore
  2. Path: Specify the restore location.
    • Original location: Overwrite the existing volume
    • New location: Restore with a different name (preserves existing data)

Safe restoration method

When restoring volumes in production, it's safer to first restore to a new location to verify the data, then copy to the original location after validation.
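
The final copy step in this approach happens outside KIWI. One way it could be done with plain Docker is sketched below; the helper name copy_volume, the DRY_RUN switch, the volume names, and the use of the alpine image are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: copy a verified, restored volume into the original volume.
# copy_volume and DRY_RUN are hypothetical names; "alpine" is only an
# assumed helper image that provides sh and cp.
copy_volume() {
  src=$1   # volume restored to a new location and already verified
  dst=$2   # original volume to copy into
  # The source is mounted read-only so the verified copy cannot change.
  # "cp -a /from/. /to/" copies hidden files too and preserves ownership
  # and permissions; it does not delete files that exist only in /to.
  set -- docker run --rm -v "$src:/from:ro" -v "$dst:/to" alpine \
    sh -c 'cp -a /from/. /to/'
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command. The quotes around the sh -c
    # argument are not shown in this printed form.
    echo "$*"
  else
    "$@"
  fi
}

# Usage (volume names are examples):
#   copy_volume app_data_restored app_data             # print only
#   DRY_RUN=0 copy_volume app_data_restored app_data   # actually copy
```

Stop the containers that use the original volume before copying, so that the application does not write to the data while it is being replaced.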

Step 3: Execute Restore

  1. Click the Restore button.
  2. Monitor the restore progress.

Image Recovery

Restore backed-up Docker images.

Step 1: Select Image Backup

  1. Select the image backup on the [Backup Management] page
  2. Select the image tar file to restore

Step 2: Load the Image

KIWI internally executes the following command to restore the image:

# KIWI internal operation
docker load -i image_backup.tar

Check image tags

Restored images keep the tags they had at the backup point. If a local image already uses the same tag, the tag is reassigned to the restored image and the previous image is left untagged.
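
Because a restore can take over tags that are already in use, it can help to list the tags recorded in an archive before loading it. The sketch below assumes the backup is a standard `docker save` archive, which carries a manifest.json file with a RepoTags list; the helper name image_tar_tags is hypothetical, and the text-based parsing is deliberately simple rather than a full JSON parser.

```shell
#!/bin/sh
# Sketch: print the repo:tag names stored in a `docker save` archive.
# image_tar_tags is a hypothetical helper name. It assumes the archive
# contains a top-level manifest.json, as written by `docker save`.
image_tar_tags() {
  tar -xOf "$1" manifest.json |               # write manifest.json to stdout
    tr -d '\n' |                              # join lines for matching
    grep -o '"RepoTags":[[:space:]]*\[[^]]*]' |   # keep each RepoTags array
    grep -o '"[^"]*"' |                       # split into quoted strings
    grep -v '^"RepoTags"$' |                  # drop the key itself
    tr -d '"'                                 # strip the quotes
}

# Usage:
#   image_tar_tags image_backup.tar
#   docker image ls --format '{{.Repository}}:{{.Tag}}'   # tags in use locally
```

Any tag that appears in both lists will point to the restored image after the load.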


Recovery Verification

After recovery, always verify that the system is operating correctly.

Verification Checklist

  • Service status: Check Pod/container status. Expected state: Running.
  • Data integrity: Verify application data. Expected state: Data matches the state at the backup point.
  • Network: Test service access. Expected state: Normal response.
  • Logs: Check application logs. Expected state: No errors.

Automate recovery verification

For critical systems, prepare scripts that automatically perform health checks after recovery to reduce verification time.
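
Services often need some time to come back after a restore, so a verification script usually retries each check instead of failing on the first attempt. The following is a minimal sketch of such a retry wrapper; the helper name wait_until_ok and the health-check URL in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: retry a check command until it succeeds or attempts run out.
# wait_until_ok is a hypothetical helper, called as:
#   wait_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until_ok() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Output of the check is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      echo "ok after $i attempt(s): $*"
      return 0
    fi
    i=$((i + 1))
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "FAILED after $attempts attempt(s): $*"
  return 1
}

# Usage (the URL is an example, not a KIWI endpoint):
#   wait_until_ok 10 5 curl -fsS http://localhost:8080/healthz
#   wait_until_ok 5 10 kubectl wait --for=condition=Ready nodes --all --timeout=60s
```

The function returns a non-zero status on failure, so a script can stop or raise an alert as soon as one check does not recover.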


Troubleshooting

Restore Failure: "snapshot file corrupted"

restore failed: snapshot file corrupted

Why does this happen?

The backup file is corrupted and cannot be used for restoration. A disk error may have occurred while the backup was being created, or the file may have been corrupted after it was stored.

Resolution

  1. Use a different backup file: Select a valid backup from a different date.
  2. Verify backup file integrity: Check the checksum in the backup details.
  3. Check backup copies: If backups were replicated to another location, use that file.
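
If you have direct access to a backup file or a replicated copy, its checksum can be compared with the value shown in the backup details. The sketch below assumes the recorded checksum is SHA-256, which this guide does not state; the helper name verify_sha256 is hypothetical.

```shell
#!/bin/sh
# Sketch: compare a file's SHA-256 digest with an expected value.
# verify_sha256 is a hypothetical helper, called as:
#   verify_sha256 FILE EXPECTED_HEX
# Assumption: the checksum in the backup details is SHA-256.
# sha256sum is part of GNU coreutils; on macOS, `shasum -a 256` is the
# usual equivalent.
verify_sha256() {
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  if [ "$actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (expected $2, got $actual)"
    return 1
  fi
}

# Usage (file name and digest are placeholders):
#   verify_sha256 etcd-snapshot.db <checksum from the backup details>
```

A mismatch on every available copy suggests the file was already damaged when the backup was created, in which case a backup from a different date is the remaining option.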

Data Mismatch

This occurs when the restored data differs from what you expected, or when applications report errors after the restore.

Why does this happen?

  • The data schema changed between the backup point and the present.
  • Data linked to external systems that are not included in the backup is out of sync with the restored data.

How to check and resolve

  1. Verify backup point: Confirm the restored data matches the backup point state.
  2. Check schema compatibility: Verify the application version is compatible with the data schema.
  3. Sync external systems: Check if databases or other external systems also need restoration.

Caution with partial restoration

Restoring only part of a system can cause data inconsistencies. When possible, restore all related components together.