Configure disaster recovery

This guide explains how to configure disaster recovery for Chef 360 Platform.

Chef 360 Platform’s built-in disaster recovery lets you recover your deployment after a complete cluster failure. It creates full system snapshots that include embedded PostgreSQL data, application configuration, secrets, and Kubernetes cluster state, then restores them to a target environment running the same Chef 360 Platform version.

Disaster recovery uses an active-passive model and doesn’t support active-active replication or continuous synchronization between clusters. Backups are full backups, not incremental, and can be scheduled or initiated manually. During recovery, the restored cluster is brought online and access is resumed through a DNS or load balancer cutover using the same tenant FQDN.

Prerequisites

Before configuring disaster recovery, ensure you have:

A license that supports disaster recovery. If the Disaster Recovery tab isn’t visible in the Admin Console, then your license doesn’t include disaster recovery.
PostgreSQL database backup configuration (optional; required only if you want database backups).
AWS S3 object storage with the following details:
- Bucket name
- Storage path prefix
- AWS region
- Access key ID and secret key
- S3 endpoint URL
Administrative permissions to modify configuration and deploy changes.

Configure PostgreSQL database backup

PostgreSQL backup configuration is optional. Configure this section only if you want database backup and disaster recovery support.

Chef 360 Platform supports two PostgreSQL deployment options:

PostgreSQL: A self-managed database that runs inside your cluster. Chef 360 Platform deploys and runs the database, but you're responsible for configuring its backups using the S3 settings in this section.
PostgreSQL RDS: Amazon’s managed database service. AWS handles backups natively through RDS automated backups and snapshots, so the backup configuration in this section isn’t required.

Note

If you select PostgreSQL RDS, skip this section. Backup is managed by AWS RDS.

To configure PostgreSQL database backup, follow these steps:

In the Admin Console, select Application > Config.
In the Managed Services section, locate PostgreSQL Type.
Select PostgreSQL to enable backup functionality.

Configure backups using PostgreSQL

To enable backups with PostgreSQL, follow these steps:

In the Admin Console, select Application > Config > Backup Object Storage Configuration.
Select the Enable Backup checkbox.
When enabled, all storage configuration fields become required.
Configure storage settings by entering the following:
- Destination Path: The S3 path where database backups are stored, in the format s3://{bucket-name}/{postgres-backup-path}. For example, s3://backup-bucket/chef-360-backups/postgresql.
- Region: The AWS region for your S3 storage. For example, us-east-2.
- Access Key: The access key for your S3 storage service. The key must have write permissions to the destination path.
- Secret Key: The secret key paired with the access key, used for authentication to your storage service.
Select Save Config to persist your backup settings.
Select Deploy to enable backups.
The system validates AWS S3 connectivity and enables continuous database backup operations.
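Before selecting Deploy, you can pre-check the Destination Path format locally. The following is a minimal sketch; `parse_s3_destination` is a hypothetical helper, not part of Chef 360 Platform. It checks only the `s3://{bucket-name}/{postgres-backup-path}` shape, not S3 connectivity or permissions, which the Admin Console validates on Deploy.

```shell
# Hypothetical pre-check (not part of Chef 360 Platform): split a Destination Path
# of the form s3://{bucket-name}/{postgres-backup-path} into bucket and path prefix.
parse_s3_destination() {
  local dest="$1"
  case "$dest" in
    s3://*) ;;
    *) echo "error: destination must start with s3://" >&2; return 1 ;;
  esac
  local rest="${dest#s3://}"   # e.g. backup-bucket/chef-360-backups/postgresql
  local bucket="${rest%%/*}"   # everything before the first slash
  local prefix="${rest#*/}"    # everything after the first slash
  # Reject a missing bucket, a missing path, or an empty path after the slash.
  if [ -z "$bucket" ] || [ "$prefix" = "$rest" ] || [ -z "$prefix" ]; then
    echo "error: expected s3://{bucket-name}/{postgres-backup-path}" >&2
    return 1
  fi
  printf 'bucket=%s\nprefix=%s\n' "$bucket" "$prefix"
}

parse_s3_destination "s3://backup-bucket/chef-360-backups/postgresql"
# bucket=backup-bucket
# prefix=chef-360-backups/postgresql
```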

Configure disaster recovery

To configure disaster recovery settings for the cluster, follow these steps:

In the Admin Console, select the Disaster Recovery tab.
Select Settings & Schedule.
In the Backup settings section, configure backup storage by entering the following:
- Destination: Select AWS S3 Storage.
- Bucket: Your S3 bucket name.
- Prefix: The path prefix under the bucket. For example, chef-360-backups/velero-backup.
  Use a different path prefix for PostgreSQL backups and disaster recovery backups to avoid conflicts.
- Access Key ID: Your S3 access key.
- Access Key Secret: Your S3 secret key.
- Endpoint: The full S3 endpoint URL. For example, https://s3.us-east-2.amazonaws.com.
- Region: The AWS region. For example, us-east-2.
- CA Certificate (optional): A certificate for a custom certificate authority.
Select Update storage settings to save the configuration.
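Because PostgreSQL backups and disaster recovery backups should use different path prefixes, a quick local check can catch a collision before you save. This sketch treats the two locations as conflicting when they're in the same bucket and one prefix equals or sits under the other; that nesting rule is a conservative assumption, and `prefixes_conflict` is a hypothetical helper.

```shell
# Hypothetical pre-check: succeeds (exit 0) if the disaster recovery Bucket/Prefix
# would overlap the PostgreSQL Destination Path, fails (exit 1) if they're distinct.
prefixes_conflict() {
  local dr_bucket="$1" dr_prefix="$2" pg_dest="$3"
  local pg_rest="${pg_dest#s3://}"
  local pg_bucket="${pg_rest%%/*}"
  local pg_prefix="${pg_rest#*/}"
  # Normalize leading/trailing slashes so "a/b/" and "a/b" compare equal.
  dr_prefix="${dr_prefix#/}"
  dr_prefix="${dr_prefix%/}"
  pg_prefix="${pg_prefix%/}"
  [ "$dr_bucket" = "$pg_bucket" ] || return 1   # different buckets never overlap
  [ -n "$dr_prefix" ] || return 0               # bucket root contains everything
  # Appending "/" avoids treating "backups/post" as a parent of "backups/postgresql".
  case "$dr_prefix/" in "$pg_prefix/"*) return 0 ;; esac   # DR path at or under PostgreSQL path
  case "$pg_prefix/" in "$dr_prefix/"*) return 0 ;; esac   # PostgreSQL path at or under DR path
  return 1
}

if prefixes_conflict backup-bucket chef-360-backups/velero-backup \
     s3://backup-bucket/chef-360-backups/postgresql; then
  echo "conflict: choose a different Prefix" >&2
else
  echo "ok: prefixes are distinct"
fi
```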

Schedule automatic backups

To schedule automatic backups:

In the Scheduled backups pane, select Enable scheduled backups.
Choose a schedule: Hourly, Daily, Weekly, or enter a custom cron expression.
Select Update schedule to save.

Scheduled backups run automatically according to the selected schedule. All backups appear in the Backups list with their completion status and timing.
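If you enter a custom cron expression, the examples below assume the standard five-field format (minute, hour, day of month, month, day of week); confirm the accepted syntax in the Admin Console. The variable names and the `cron_field_count_ok` helper are hypothetical, and the check counts fields only, so it won't catch out-of-range values.

```shell
# Illustrative custom schedules in standard five-field cron syntax
# (minute hour day-of-month month day-of-week). Not product defaults.
EVERY_6_HOURS="0 */6 * * *"   # at minute 0 of every sixth hour
DAILY_0230="30 2 * * *"       # every day at 02:30
SUNDAY_0100="0 1 * * 0"       # every Sunday at 01:00

# Hypothetical sanity check: succeeds only if the expression has exactly five fields.
cron_field_count_ok() {
  set -f          # disable globbing so "*" isn't expanded against the current directory
  set -- $1       # split the expression on whitespace into positional parameters
  set +f
  [ "$#" -eq 5 ]
}

for expr in "$EVERY_6_HOURS" "$DAILY_0230" "$SUNDAY_0100" "0 2 * *"; do
  if cron_field_count_ok "$expr"; then
    echo "ok:      $expr"
  else
    echo "invalid: $expr"
  fi
done
```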

Configure retention policy

To set how long backups are retained, follow these steps:

In the Retention policy pane, enter a retention value and select a time unit: Minutes, Hours, Days, Weeks, or Months.
Select Update retention policy to save.
This policy applies to both manual and scheduled backups.
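Because every backup is a full backup, storage use grows roughly with the number of backups the retention policy keeps. The arithmetic below is a rough estimate, assuming evenly spaced backups of similar size that are deleted on time; `estimate_retained` and the 5 GiB backup size are illustrative, not measured values.

```shell
# Rough estimate: how many full backups a schedule + retention policy keeps,
# and how much storage they need. All inputs are whole numbers.
estimate_retained() {
  local interval_hours="$1" retention_hours="$2" backup_size_gib="$3"
  [ "$interval_hours" -gt 0 ] || { echo "error: interval must be > 0" >&2; return 1; }
  local count=$(( retention_hours / interval_hours ))
  echo "backups retained: ~${count}"
  echo "storage needed:   ~$(( count * backup_size_gib )) GiB"
}

# Daily backups (24 h) with a 4-week retention (4 * 7 * 24 h), about 5 GiB each.
estimate_retained 24 $(( 4 * 7 * 24 )) 5
# backups retained: ~28
# storage needed:   ~140 GiB
```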

Configure disaster recovery in air-gapped environments

Air-gapped clusters don’t have public internet access, so disaster recovery requires additional setup:

You must configure your private network to allow access to AWS S3.
You must manually pre-load Velero’s backup and restore plugin images into the air-gapped Kubernetes image store.

Prerequisites

Before setting up disaster recovery in an air-gapped environment, you need:

Jump host: An internet-connected workstation with Docker installed. You’ll download all the required assets on this machine.
Air-gap instance: An air-gapped workstation that allows incoming traffic from the jump host, so you can transfer the required assets to it.

S3 access requirements

S3 must be accessible from within your VPC for disaster recovery to work. Configure access using one of the following methods:

A VPC endpoint for S3 (recommended)
Custom network routing to S3

Key points for S3 configuration:

Public internet access isn’t required.
S3 must be reachable privately from within the VPC.
Proper IAM and network configuration must be in place.
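One way to provide private S3 access is a gateway VPC endpoint created with the AWS CLI. This sketch assumes an existing VPC and route table; the IDs and region are placeholders, and `run` is a hypothetical wrapper that only prints the command unless `DRY_RUN=0`. Endpoint policies and IAM permissions are outside its scope.

```shell
# Placeholders: replace with your own values (or export them before running).
AWS_REGION="${AWS_REGION:-us-east-2}"
VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"
ROUTE_TABLE_ID="${ROUTE_TABLE_ID:-rtb-0123456789abcdef0}"

# Hypothetical dry-run wrapper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Gateway endpoint for S3, attached to the route table used by the cluster subnets.
run aws ec2 create-vpc-endpoint \
  --vpc-id "$VPC_ID" \
  --vpc-endpoint-type Gateway \
  --service-name "com.amazonaws.${AWS_REGION}.s3" \
  --route-table-ids "$ROUTE_TABLE_ID"
```

With a gateway endpoint, traffic to the regional S3 endpoint from the associated route tables typically stays on the AWS network, so the endpoint URL entered in the disaster recovery settings (for example, https://s3.us-east-2.amazonaws.com) can stay the same.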

Pre-load Velero plugin images

Chef 360 Platform uses the following Velero plugins to back up and restore itself:

docker.io/progressofficial/chef360:1.0.3: PostgreSQL backup and restore plugin for Velero.
docker.io/velero/velero-plugin-for-aws:v1.12.1: AWS S3 storage plugin for Velero.

In an air-gapped environment, the embedded Kubernetes cluster can't pull container images from the internet, so you must manually pre-load them before installing Chef 360 Platform. If you skip this step, Velero fails to start because it can't pull the initContainer plugin images.

To pre-load the Velero plugin images, follow these steps:

On an internet-connected machine, pull and save both plugin images:

```
# Switch to root user
sudo su

# Pull and save the PostgreSQL restore plugin
docker pull --platform linux/amd64 docker.io/progressofficial/chef360:1.0.3
docker save docker.io/progressofficial/chef360:1.0.3 -o velero-plugin-cnpg-restore.tar

# Pull and save the AWS plugin
docker pull --platform linux/amd64 docker.io/velero/velero-plugin-for-aws:v1.12.1
docker save docker.io/velero/velero-plugin-for-aws:v1.12.1 -o velero-plugin-for-aws.tar
```

Transfer both .tar files to the target machine using scp, USB, or your organization’s approved transfer method.
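To confirm the image archives weren't corrupted in transit, you can compare checksums on both machines. This is a minimal sketch using GNU coreutils `sha256sum` (on macOS, `shasum -a 256` is the usual equivalent); the helper names and the `velero-plugins.sha256` file name are hypothetical.

```shell
# Hypothetical helpers wrapping GNU coreutils sha256sum.

# On the internet-connected machine, in the directory with the saved archives:
record_checksums() {
  sha256sum velero-plugin-cnpg-restore.tar velero-plugin-for-aws.tar > velero-plugins.sha256
}

# On the air-gapped machine, after transferring the .tar files and velero-plugins.sha256:
verify_checksums() {
  sha256sum -c velero-plugins.sha256   # exits non-zero if any file is missing or changed
}
```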

In the air-gapped environment, copy the plugin images to the Kubernetes image store:

```
# Create the k0s images directory
sudo mkdir -p /var/lib/embedded-cluster/k0s/images/

# Copy the plugin images into the k0s image store
sudo cp velero-plugin-cnpg-restore.tar /var/lib/embedded-cluster/k0s/images/
sudo cp velero-plugin-for-aws.tar /var/lib/embedded-cluster/k0s/images/
```

On an internet-connected machine, download the air-gapped installer bundle:
```
curl "https://<DOWNLOAD_DOMAIN>/embedded/chef-360/<RELEASE_CHANNEL>/<RELEASE_VERSION>?airgap=true" \
  -H "Authorization: <AUTHORIZATION_CODE>" \
  -o chef-360.tgz
```
Replace the following:
- <DOWNLOAD_DOMAIN> with the domain provided by Progress Chef.
- <RELEASE_CHANNEL> with the release channel (for example, stable).
- <RELEASE_VERSION> with the Chef 360 Platform version (optional; omit for latest).
- <AUTHORIZATION_CODE> with your Chef 360 Platform authorization code.
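If you script the download, a small helper can assemble the URL and drop the version segment when `<RELEASE_VERSION>` is omitted. This sketch assumes that omitting the version removes both the segment and its leading slash; `bundle_url` is a hypothetical helper. The `-f` flag in the usage comment makes curl exit with an error on an HTTP failure instead of saving the error response as `chef-360.tgz`.

```shell
# Hypothetical helper: build the air-gapped bundle URL from the placeholder values above.
# Assumes that omitting RELEASE_VERSION drops the "/<RELEASE_VERSION>" segment entirely.
bundle_url() {
  local domain="$1" channel="$2" version="${3:-}"
  local url_path="embedded/chef-360/${channel}"
  if [ -n "$version" ]; then
    url_path="${url_path}/${version}"
  fi
  printf 'https://%s/%s?airgap=true\n' "$domain" "$url_path"
}

# Usage, with your own values in DOWNLOAD_DOMAIN, RELEASE_CHANNEL,
# RELEASE_VERSION (may be empty), and AUTHORIZATION_CODE:
# curl -f "$(bundle_url "$DOWNLOAD_DOMAIN" "$RELEASE_CHANNEL" "$RELEASE_VERSION")" \
#   -H "Authorization: $AUTHORIZATION_CODE" \
#   -o chef-360.tgz
```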
Transfer chef-360.tgz to the air-gapped machine, then extract it:
```
tar -xvzf chef-360.tgz
```
Run the air-gapped installation:
```
sudo ./chef-360 install --license license.yaml --airgap-bundle chef-360.airgap
```
This runs the Replicated Embedded Cluster installer. k0s automatically imports images from /var/lib/embedded-cluster/k0s/images/ during startup.

Operational guidelines

Record the Chef 360 Platform version for each backup. You can restore a backup only to an environment running the same version.
Monitor storage usage, especially when you schedule frequent backups.
Set retention policies based on your compliance requirements.
Keep credentials secure but accessible during disaster scenarios.
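A simple log kept outside the cluster is one way to record which Chef 360 Platform version produced each backup. In this sketch, `record_backup`, `version_for_backup`, and the CSV layout are hypothetical; you supply the backup name and version yourself, because nothing here queries the Admin Console.

```shell
# Hypothetical record-keeping helpers: one CSV line per backup, stored off-cluster.
record_backup() {
  local log="$1" backup_name="$2" platform_version="$3"
  if [ ! -f "$log" ]; then
    echo "recorded_at_utc,backup_name,platform_version" > "$log"   # header on first use
  fi
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),${backup_name},${platform_version}" >> "$log"
}

# Print the version recorded for a backup; exit non-zero if the backup isn't in the log.
version_for_backup() {
  local log="$1" backup_name="$2"
  awk -F, -v name="$backup_name" \
    'NR > 1 && $2 == name { v = $3 } END { if (v != "") print v; else exit 1 }' "$log"
}
```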

Troubleshooting

If configuration fails, check for the following:

Missing s3:// prefix in the PostgreSQL destination path.
Incorrect region for the specified endpoint.
Endpoint missing protocol (must include https://).
Credentials that lack the required permissions for the storage paths.
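The endpoint-related causes above can be checked locally before you save. This sketch assumes AWS regional endpoints of the form `https://s3.<region>.amazonaws.com`; for other endpoint formats it checks only the `https://` scheme. `check_endpoint` is a hypothetical helper.

```shell
# Hypothetical pre-check: the endpoint must include https://, and for AWS regional
# endpoints (https://s3.<region>.amazonaws.com) the region must match the Region field.
check_endpoint() {
  local endpoint="$1" region="$2"
  case "$endpoint" in
    https://*) ;;
    *) echo "error: endpoint must include https://" >&2; return 1 ;;
  esac
  case "$endpoint" in
    https://s3.*.amazonaws.com|https://s3.*.amazonaws.com/)
      local host="${endpoint#https://}"
      host="${host%/}"                                    # drop an optional trailing slash
      local endpoint_region="${host#s3.}"
      endpoint_region="${endpoint_region%.amazonaws.com}" # e.g. us-east-2
      if [ "$endpoint_region" != "$region" ]; then
        echo "error: endpoint region $endpoint_region doesn't match Region $region" >&2
        return 1
      fi
      ;;
  esac
  echo "ok: $endpoint ($region)"
}

check_endpoint https://s3.us-east-2.amazonaws.com us-east-2
# ok: https://s3.us-east-2.amazonaws.com (us-east-2)
```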

If backups fail, check the following:

S3 connectivity and credentials.
Storage quotas and permissions.
Whether the PostgreSQL and DR backup paths are unique.
