AWS Deployment using S3

Note

Chef Automate 4.10.1 released on 6th September 2023 includes improvements to the deployment and installation experience of Automate HA. Please read the blog to learn more about key improvements. Refer to the pre-requisites page (On-Premises, AWS) and plan your usage with your customer success manager or account manager.

Note

If you set backup_config to s3 in config.toml, backup is already configured during deployment and the steps below are not required. If you left backup_config blank, you need to configure backup manually.
Encrypted S3 buckets are supported only with Amazon S3 managed keys (SSE-S3).

Overview

To communicate with Amazon S3, Chef Automate needs an IAM role with the required policy.

Attach the IAM role to all the OpenSearch nodes and frontend nodes.

Note

If you are using AWS managed services, you need to create a snapshot-role for OpenSearch.

Configuration on the Provision Host

Create a .toml file, for example automate.toml.

Use the following content for the automate.toml file:

[global.v1]
  [global.v1.external.opensearch.backup]
    enable = true
    location = "s3"

  [global.v1.external.opensearch.backup.s3]

    # bucket (required): The name of the bucket
    bucket = "bucket-name"

    # base_path (optional):  The path within the bucket where backups should be stored
    # If base_path is not set, backups will be stored at the root of the bucket.
    base_path = "opensearch"

    # name of an s3 client configuration you create in your opensearch.yml
    # see https://www.open.co/guide/en/opensearch/plugins/current/repository-s3-client.html
    # for full documentation on how to configure client settings on your
    # OpenSearch nodes
    client = "default"

  [global.v1.external.opensearch.backup.s3.settings]
    ## The meaning of these settings is documented in the S3 Repository Plugin
    ## documentation. See the following links:
    ## https://www.open.co/guide/en/opensearch/plugins/current/repository-s3-repository.html

    ## Backup repo settings
    # compress = false
    # server_side_encryption = false
    # buffer_size = "100mb"
    # canned_acl = "private"
    # storage_class = "standard"
    ## Snapshot settings
    # max_snapshot_bytes_per_sec = "40mb"
    # max_restore_bytes_per_sec = "40mb"
    # chunk_size = "null"
    ## S3 client settings
    # read_timeout = "50s"
    # max_retries = 3
    # use_throttle_retries = true
    # protocol = "https"

  [global.v1.backups]
    location = "s3"

  [global.v1.backups.s3.bucket]
    # name (required): The name of the bucket
    name = "bucket-name"

    # endpoint (required): The endpoint for the region the bucket lives in for Automate Version 3.x.y
    # endpoint (required): For Automate Version 4.x.y, use this https://s3.amazonaws.com
    endpoint = "https://s3.amazonaws.com"

    # base_path (optional):  The path within the bucket where backups should be stored
    # If base_path is not set, backups will be stored at the root of the bucket.
    base_path = "automate"

  [global.v1.backups.s3.credentials]
    access_key = "<Your Access Key>"
    secret_key = "<Your Secret Key>"

Run the command below to apply the configuration to the frontend nodes.
```
chef-automate config patch --frontend automate.toml
```

Note

IAM Role: Assign the IAM Role to all the OpenSearch instances in the cluster created above.

Backup and Restore

Backup

To create a backup, run the backup command from the bastion host:
```
chef-automate backup create
```

Restore

Pre-Restore Validation

Run the restore command with the --verify-restore-config flag to validate the configuration settings before initiating the restore process. To perform the pre-check, run the following command from the bastion host:

chef-automate backup restore s3://bucket_name/path_to_backups/BACKUP_ID --verify-restore-config

The verification process ensures that the backup and restore configurations are correct and identifies potential issues so they can be addressed in advance.
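The pre-check and the restore can be chained so that the restore only starts when validation passes. The following is a minimal sketch, not an official workflow. It assumes that the verification step exits with a non-zero status on failure, which this page does not state. The function name restore_with_precheck and the CHEF_AUTOMATE_BIN variable are hypothetical helpers introduced for illustration.

```shell
# Sketch: validate the restore configuration first, restore only if it passes.
# Assumption: `--verify-restore-config` exits non-zero when validation fails.
# CHEF_AUTOMATE_BIN is a hypothetical override, handy for dry runs.
CHEF_AUTOMATE_BIN="${CHEF_AUTOMATE_BIN:-chef-automate}"

restore_with_precheck() {
    backup_url="$1"   # for example s3://bucket_name/path_to_backups/BACKUP_ID

    if [ -z "$backup_url" ]; then
        echo "usage: restore_with_precheck s3://bucket_name/path_to_backups/BACKUP_ID" >&2
        return 2
    fi

    # Step 1: pre-restore validation.
    if ! "$CHEF_AUTOMATE_BIN" backup restore "$backup_url" --verify-restore-config; then
        echo "Pre-restore validation failed; restore not started." >&2
        return 1
    fi

    # Step 2: the actual restore, with the flag used elsewhere on this page.
    "$CHEF_AUTOMATE_BIN" backup restore "$backup_url" --skip-preflight
}
```

After sourcing the snippet on the bastion host, call it as restore_with_precheck s3://bucket_name/path_to_backups/BACKUP_ID.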

Run Restore

To restore backed-up data of Chef Automate High Availability (HA) from external AWS S3, follow the steps given below:

Check the status of all Chef Automate and Chef Infra Server frontend nodes by running the chef-automate status command.
Log in to the same Chef Automate frontend node from which the backup was taken.
Run the restore command from the bastion host: chef-automate backup restore s3://bucket_name/path_to_backups/BACKUP_ID --skip-preflight --s3-access-key "Access_Key" --s3-secret-key "Secret_Key"
In an air-gapped environment, run this restore command from the bastion host instead: chef-automate backup restore <object-storage-bucket-path>/backups/BACKUP_ID --airgap-bundle </path/to/bundle> --skip-preflight

Note

If you are restoring a backup from an older version, you need to provide the --airgap-bundle </path/to/current/bundle> flag.
If you did not configure the S3 access and secret keys during deployment, or if the backup was taken to a different bucket, you need to provide the --s3-access-key <Access_Key> and --s3-secret-key <Secret_Key> flags.
Large Compliance Report is not supported in Automate HA.
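The optional flags described in the notes above can be assembled conditionally. The sketch below is only an illustration of that logic. The environment variable names AIRGAP_BUNDLE, S3_ACCESS_KEY, S3_SECRET_KEY and CHEF_AUTOMATE_BIN, and the function name run_restore, are hypothetical and do not come from Chef Automate itself.

```shell
# Sketch: build the restore command line from optional settings.
# All variable names here are hypothetical conventions for this example.
CHEF_AUTOMATE_BIN="${CHEF_AUTOMATE_BIN:-chef-automate}"

run_restore() {
    backup_url="$1"   # for example s3://bucket_name/path_to_backups/BACKUP_ID

    # Start with the arguments every restore on this page uses.
    set -- backup restore "$backup_url" --skip-preflight

    # Add the bundle when restoring an older backup or in an air-gapped setup.
    if [ -n "${AIRGAP_BUNDLE:-}" ]; then
        set -- "$@" --airgap-bundle "$AIRGAP_BUNDLE"
    fi

    # Add S3 keys only when both are supplied, for example when they were
    # not configured at deployment or the backup lives in another bucket.
    if [ -n "${S3_ACCESS_KEY:-}" ] && [ -n "${S3_SECRET_KEY:-}" ]; then
        set -- "$@" --s3-access-key "$S3_ACCESS_KEY" --s3-secret-key "$S3_SECRET_KEY"
    fi

    "$CHEF_AUTOMATE_BIN" "$@"
}
```

Setting CHEF_AUTOMATE_BIN=echo before calling run_restore prints the assembled arguments instead of running the restore, which is a convenient way to review the command first.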

Troubleshooting

Follow the steps below if Chef Automate encounters an error during data restoration.

Check the Chef Automate status.
```
chef-automate status
```
Check the status of your Habitat service on the Automate node.
```
hab svc status
```
If the deployment services are not healthy, reload them.
```
hab svc load chef/deployment-service
```
Check the status of the Automate node, and then attempt to run the restore command from the bastion host.

To change the base_path or path used for backups, follow the steps below.

File System

During deployment, backup_mount is set to /mnt/automate_backups by default.
If you update the backup_mount value in the config.toml file before deployment, the deployment process applies the updated path automatically.
If the backup_mount value is changed after deployment (e.g., to /bkp/backps), you must manually patch the configuration on all frontend and backend nodes.
Update the frontend (FE) nodes using the template below. To apply it, run the command chef-automate config patch fe.toml --fe.

   [global.v1.backups]
      [global.v1.backups.filesystem]
         path = "/bkp/backps"
   [global.v1.external.opensearch.backup]
      [global.v1.external.opensearch.backup.fs]
         path = "/bkp/backps"

Update the OpenSearch nodes using the template below. To apply it, run the command chef-automate config patch os.toml --os.

[path]
   repo = "/bkp/backps"
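Because the same path must appear in both patch files, it can help to generate them from a single value. This is a minimal sketch under that assumption. The function name write_backup_mount_configs and its arguments are hypothetical; the file contents mirror the two templates above.

```shell
# Sketch: write fe.toml and os.toml for one backup mount path so the
# value cannot drift between the two files.
write_backup_mount_configs() {
    backup_mount="$1"     # for example /bkp/backps
    out_dir="${2:-.}"     # directory that receives fe.toml and os.toml

    # Frontend patch: Automate backups and OpenSearch snapshot location.
    cat > "$out_dir/fe.toml" <<EOF
[global.v1.backups]
  [global.v1.backups.filesystem]
    path = "$backup_mount"
[global.v1.external.opensearch.backup]
  [global.v1.external.opensearch.backup.fs]
    path = "$backup_mount"
EOF

    # OpenSearch patch: the snapshot repository path.
    cat > "$out_dir/os.toml" <<EOF
[path]
  repo = "$backup_mount"
EOF
}
```

After running write_backup_mount_configs /bkp/backps, apply the generated files with chef-automate config patch fe.toml --fe and chef-automate config patch os.toml --os.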

Run the curl request against one of the Automate frontend nodes.

curl localhost:10144/_snapshot?pretty

If the response is an empty JSON object {}, no changes are required to the snapshot settings in the OpenSearch cluster.
If you see a JSON response similar to the example below, check that the backup_mount setting is correctly configured. Use the location value in the response to verify. It should start with /bkp/backps.

{
  "chef-automate-es6-event-feed-service" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/automate_backups/opensearch/automate-elasticsearch-data/chef-automate-es6-event-feed-service"
    }
  },
  "chef-automate-es6-compliance-service" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/automate_backups/opensearch/automate-elasticsearch-data/chef-automate-es6-compliance-service"
    }
  },
  "chef-automate-es6-ingest-service" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/automate_backups/opensearch/automate-elasticsearch-data/chef-automate-es6-ingest-service"
    }
  },
  "chef-automate-es6-automate-cs-oc-erchef" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/automate_backups/opensearch/automate-elasticsearch-data/chef-automate-es6-automate-cs-oc-erchef"
    }
  }
}
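The prefix check on the location values can be scripted rather than done by eye. The sketch below assumes jq is available on the node (installing it from the airgap bundle is covered later in this section). The function name mismatched_repos is hypothetical.

```shell
# Sketch: read the /_snapshot response on stdin and print the names of
# file-system repositories whose location does not start with the
# expected backup mount. Requires jq.
mismatched_repos() {
    expected_prefix="$1"   # for example /bkp/backps
    jq -r --arg prefix "$expected_prefix" '
        to_entries[]
        | select(.value.type == "fs")
        | select((.value.settings.location // "") | startswith($prefix) | not)
        | .key'
}
```

A possible invocation is curl -s 'localhost:10144/_snapshot?pretty' | mismatched_repos /bkp/backps. Empty output means every repository already points at the expected path, and an empty {} response also produces no output.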

If the prefix in the location value does not match the backup_mount, the existing snapshots must be deleted. Use the script below to delete the snapshots from one of the Automate frontend nodes.

   # List the snapshot repository names (jq -r prints them without quotes),
   # then delete each one.
   snapshots=$(curl -s -XGET 'http://localhost:10144/_snapshot?pretty' | jq -r 'keys[]')
   for name in $snapshots; do
      curl -XDELETE "localhost:10144/_snapshot/${name}?pretty"
   done

The above script requires jq to be installed. You can install it from the airgap bundle. To locate the jq package, run the command below on one of the Automate frontend nodes.

ls -ltrh /hab/cache/artifacts/ | grep jq

-rw-r--r--. 1 ec2-user ec2-user  730K Dec  8 08:53 core-jq-static-1.6-20220312062012-x86_64-linux.hart
-rw-r--r--. 1 ec2-user ec2-user  730K Dec  8 08:55 core-jq-static-1.6-20190703002933-x86_64-linux.hart

If multiple versions of jq are available, install the latest one. Use the command below to install the jq package on one of the Automate frontend nodes.

hab pkg install /hab/cache/artifacts/core-jq-static-1.6-20220312062012-x86_64-linux.hart -bf
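Picking the newest package by hand is easy to get wrong when several builds are present. The following is a minimal sketch for selecting it automatically. It assumes all candidates share the same version prefix (1.6 in the listing above), so that a plain lexical sort on the release timestamp matches release order. The function name latest_jq_hart is hypothetical.

```shell
# Sketch: print the path of the newest core-jq-static .hart file.
# Assumption: all files share one version (such as 1.6), so sorting by
# name orders them by the release timestamp embedded in the file name.
latest_jq_hart() {
    artifact_dir="${1:-/hab/cache/artifacts}"
    for f in "$artifact_dir"/core-jq-static-*.hart; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort | tail -n 1
}
```

The result can be passed straight to the install command, for example hab pkg install "$(latest_jq_hart)" -bf.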

Object Storage

During deployment, backup_config should be set to object_storage.
To use object_storage, use the following template during deployment.

[object_storage.config]
  google_service_account_file = ""
  location = ""
  bucket_name = ""
  access_key = ""
  secret_key = ""
  endpoint = ""
  region = ""

If you configured it before deployment, then you are all set.
If you want to change the bucket or base_path, use the following template for Frontend nodes.

[global.v1]
  [global.v1.external.opensearch.backup.s3]
    bucket = "<BUCKET_NAME>"
    base_path = "opensearch"
  [global.v1.backups.s3.bucket]
    name = "<BUCKET_NAME>"
    base_path = "automate"

You can assign any value to the base_path variable. The base_path configuration is required only for the Frontend nodes.
Use the command chef-automate config patch frontend.toml --fe to apply the above template and update the configuration.
Use the following curl request to validate the configuration.
```
curl localhost:10144/_snapshot?pretty
```
If the response is an empty JSON object ({}), the configuration is valid.

If the response contains a JSON output similar to the example below, it should have the correct value for the base_path.

{
    "chef-automate-es6-event-feed-service" : {
      "type" : "s3",
      "settings" : {
        "bucket" : "MY-BUCKET",
        "base_path" : "opensearch/automate-elasticsearch-data/chef-automate-es6-event-feed-service",
        "readonly" : "false",
        "compress" : "false"
      }
    },
    "chef-automate-es6-compliance-service" : {
      "type" : "s3",
      "settings" : {
        "bucket" : "MY-BUCKET",
        "base_path" : "opensearch/automate-elasticsearch-data/chef-automate-es6-compliance-service",
        "readonly" : "false",
        "compress" : "false"
      }
    },
    "chef-automate-es6-ingest-service" : {
      "type" : "s3",
      "settings" : {
        "bucket" : "MY-BUCKET",
        "base_path" : "opensearch/automate-elasticsearch-data/chef-automate-es6-ingest-service",
        "readonly" : "false",
        "compress" : "false"
      }
    },
    "chef-automate-es6-automate-cs-oc-erchef" : {
      "type" : "s3",
      "settings" : {
        "bucket" : "MY-BUCKET",
        "base_path" : "opensearch/automate-elasticsearch-data/chef-automate-es6-automate-cs-oc-erchef",
        "readonly" : "false",
        "compress" : "false"
      }
    }
}

If the base_path value does not match, you must delete the existing snapshots. Please take a look at the File System troubleshooting steps for guidance.

For disaster recovery or an AMI upgrade, when running the restore on a secondary cluster that is in a different region, follow the steps given below.

Make a curl request on any OpenSearch node:

curl -XGET https://localhost:9200/_snapshot?pretty --cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem --key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem --cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem -k

Check the response. If the region does not match the region of the primary cluster, follow the steps below:

Modify the region on the frontend nodes by patching the configuration below with the command chef-automate config patch <file-name>.toml --fe
```
[global.v1.external.opensearch.backup.s3.settings]
  region = "<FIRST-CLUSTER-REGION>"
```

Make a PUT request in an OpenSearch node by running this script:

indices=(
chef-automate-es6-automate-cs-oc-erchef
chef-automate-es6-compliance-service
chef-automate-es6-event-feed-service
chef-automate-es6-ingest-service
)
for index in "${indices[@]}"; do
  curl -XPUT -k -H 'Content-Type: application/json' "https://<IP>:9200/_snapshot/$index" --data-binary @- << EOF
{
  "type" : "s3",
    "settings" : {
      "bucket" : "<YOUR-PRIMARY-CLUSTER-BUCKET-NAME>",
      "base_path" : "elasticsearch/automate-elasticsearch-data/$index",
      "region" : "<YOUR-PRIMARY-CLUSTER-REGION>",
      "role_arn" : " ",
      "compress" : "false"
    }
}
EOF
done