With all the advantages that Elastic Kubernetes Service (EKS) provides, the importance of backing up your cluster can’t be overstated. Regular backups of your EKS cluster, including the EKS storage, are an essential part of protecting your deployment of Kubernetes on AWS.
In this article, we delve into the various approaches to backing up an EKS cluster using the open-source tool Velero. We also walk through the steps involved in installing and configuring Velero to back up an EKS cluster.
There are several approaches to backing up an EKS cluster in AWS, and the right choice may come down to your specific AWS EKS architecture and the components that need to be backed up. Compared to the other options, Velero provides a more comprehensive and versatile solution for backing up and restoring EKS clusters, making it the de facto choice for backing up EKS.
Velero is an open-source tool that allows for easy backup and restore of an EKS cluster at the application level. The process begins with installing Velero on the EKS cluster using the Helm package manager and then configuring it to connect to the storage backend where backups are stored. Note that Velero captures the Kubernetes cluster state through the Kubernetes API server, which persists it in the etcd key-value store; Velero does not read etcd directly.
Creating a backup through Velero leverages the Kubernetes API to query the current state of the cluster and creates Custom Resource Definition (CRD) objects to manage the backup process. The restore process likewise relies on the Kubernetes API: Velero retrieves the backup data from its storage location and uses it to recreate the cluster or specific resources such as Deployments, Services, ConfigMaps, and Persistent Volume Claims.
Using Velero as a backup solution over other options has several advantages: it is open source, operates at the application level, supports scheduled backups, and can restore to a different cluster, which also makes it useful for cluster migration.
The following demo outlines the steps to back up and restore an EKS cluster using Velero and an AWS S3 bucket. The walkthrough involves backing up and restoring both the application definitions and the cluster state, including persistent data stored in a PVC.
In this step we will show how to create an S3 bucket and set up an IAM policy for Velero backups on an Elastic Kubernetes Service (EKS) cluster. While the S3 bucket will act as the storage location for Velero backups, the IAM policy will grant Velero the necessary permissions to access and manage the backups in the S3 bucket.
Here we’ll create the S3 bucket that will store our EKS backup copy.
To declare the unique S3 bucket name and appropriate AWS region as environment variables in a Linux or MacOS terminal, you can use the command:
export BUCKET=<YOUR_BUCKET_NAME>
export REGION=<YOUR_AWS_REGION>
For the purpose of this demo, we use the following:
export BUCKET=darwin-eks-velero-backups
export REGION=ap-northeast-1
Create the S3 bucket by using this command:
$ aws s3 mb s3://$BUCKET --region $REGION
Once the S3 bucket is created successfully, the following output is returned:
make_bucket: darwin-eks-velero-backups
To perform snapshots and save backups, Velero needs to make various API calls to the S3 bucket and EC2 instances. In this step we’ll create the IAM policy to grant Velero permission to do that.
Quick note: There are different ways to write an IAM policy to grant Velero the necessary permissions. The exact policy you need may depend on your specific requirements and use case. For example, you can use wildcard characters ("*") to grant permissions on all resources, or you can limit the permissions to specific resource ARNs by listing them directly in the policy document.
You can also use AWS's predefined policies, for example, the AmazonS3FullAccess policy, which grants Velero full access to the S3 bucket and its objects.
Create an IAM policy document named velero_policy.json in JSON format and add the following policy to grant Velero the necessary permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ],
      "Resource": [
        "arn:aws:ec2:REGION:ACCOUNT_ID:volume/*",
        "arn:aws:ec2:REGION:ACCOUNT_ID:snapshot/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::darwin-eks-velero-backups/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::darwin-eks-velero-backups"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME"
      ]
    }
  ]
}
Please ensure you replace REGION and ACCOUNT_ID with your AWS region and account ID, ROLE_NAME with the name of the role you want Velero to assume, and darwin-eks-velero-backups with the actual name of the S3 bucket where you want to store the backups.
This policy grants Velero permission to manage the EC2 volumes and snapshots with the specified ARNs and the objects in the S3 bucket, and allows it to pass the specified IAM role when performing operations.
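Substituting the placeholders can also be scripted. Below is a minimal sed sketch, demonstrated on a single ARN string with example values; run the same substitutions against your velero_policy.json.

```shell
# Example values -- replace with your own region, account ID, and role name.
REGION=ap-northeast-1
ACCOUNT_ID=123456789012
ROLE_NAME=eks-velero-backup

# Demonstrated on one ARN written to a temp file; run the same sed
# expressions against velero_policy.json to fill in its placeholders.
echo '"arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME"' > /tmp/arn.txt
sed -i \
  -e "s/REGION/$REGION/g" \
  -e "s/ACCOUNT_ID/$ACCOUNT_ID/g" \
  -e "s/ROLE_NAME/$ROLE_NAME/g" /tmp/arn.txt
cat /tmp/arn.txt   # prints "arn:aws:iam::123456789012:role/eks-velero-backup"
```

Note that BSD sed on macOS requires `sed -i ''` for in-place edits.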
Once you have the policy file ready, you can create the policy by running the command:
$ aws iam create-policy --policy-name VeleroAccessPolicy --policy-document file://velero_policy.json
Which returns an output similar to:
{
  "Policy": {
    "PolicyName": "VeleroAccessPolicy",
    "PolicyId": "ABCDEFGHIJKLMNOPQRST",
    "Arn": "arn:aws:iam::123456789012:policy/VeleroAccessPolicy",
    "Path": "/",
    "DefaultVersionId": "v1",
    "AttachmentCount": 0,
    "IsAttachable": true,
    "CreateDate": "2023-01-20T12:00:00Z",
    "UpdateDate": "2023-01-20T12:00:00Z"
  }
}
Once the IAM policy is created, copy the policy ARN and attach the policy to the IAM role or user using the following command:
$ aws iam attach-role-policy --role-name <ROLE_NAME> --policy-arn <POLICY_ARN>
Quick note: Attaching the IAM policy to an IAM user or role is an important step, as it grants the user or role the necessary permissions to perform actions on the AWS resources specified in the policy. In the case of Velero, the policy grants permissions for actions such as creating and deleting snapshots and volumes, getting and putting objects in an S3 bucket, and passing a role. Without the policy attached, Velero would lack these permissions and would not function properly.
Velero requires access to both the S3 bucket for storing backups and the EKS cluster for managing Kubernetes resources. In order to access these resources, Velero makes API requests using the IAM permissions that are granted to the Velero server service account.
This makes it necessary to use IAM Roles for Service Accounts (IRSA), which apply AWS policies that grant Velero the necessary permissions to access the S3 bucket and the EKS cluster.
In this section we will show you how to set up these Velero service accounts and install Velero on both the primary and recovery cluster.
In this step we will create the required IAM role and enforce a trust relationship to the Velero service account using the eksctl tool.
Before creating the service account, create environment variables for the cluster names and account identities used in the demo.
PRIMARY_CLUSTER=primary
RECOVERY_CLUSTER=recovery
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
Once the environment variables are defined, create the IAM service account for the Velero server on the primary cluster by running the command below.
$ eksctl create iamserviceaccount --cluster=$PRIMARY_CLUSTER --name=velero-server --namespace=velero --role-name=eks-velero-backup --role-only --attach-policy-arn=arn:aws:iam::$ACCOUNT:policy/VeleroAccessPolicy --approve
In the above command, the --cluster and --namespace flags tie the service account to the primary cluster and the velero namespace, --role-only creates the IAM role without creating the Kubernetes service account itself, --attach-policy-arn attaches the VeleroAccessPolicy created earlier, and --approve applies the changes.
Follow the same step as shown above to create the service account for the recovery cluster.
$ eksctl create iamserviceaccount --cluster=$RECOVERY_CLUSTER --name=velero-server --namespace=velero --role-name=eks-velero-recovery --role-only --attach-policy-arn=arn:aws:iam::$ACCOUNT:policy/VeleroAccessPolicy --approve
Here we will show how to install Velero separately on both clusters to perform the backup.
Add VMWare Tanzu to your repositories using Helm:
$ helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
This will return the following response:
"vmware-tanzu" has been added to your repositories
To configure the installation of Velero on your cluster, create the config.yaml configuration file that contains various settings such as the location of your backup storage, the schedules for taking backups, and the resources that should be included in the backups.
To create the config.yaml file, use the command:
$ touch config.yaml && nano config.yaml
This command will create a new file named config.yaml in the current directory and open it in the nano text editor. In the text editor, add the required configuration as shown below and save the file. Note that Helm does not expand shell variables inside the file, so replace ${ACCOUNT} with your actual AWS account ID before installing.
configuration:
  backupStorageLocation:
    bucket: darwin-eks-velero-backups
  provider: aws
  volumeSnapshotLocation:
    config:
      region: ap-northeast-1
credentials:
  useSecret: true
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.3.0
    volumeMounts:
      - mountPath: /target
        name: plugins
serviceAccount:
  server:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::${ACCOUNT}:role/eks-velero-backup"
Next, configure the config_recovery.yaml file that will contain the necessary information for Velero to access the backup storage and the EKS cluster.
To create the config_recovery.yaml file, use the command:
$ touch config_recovery.yaml && nano config_recovery.yaml
In the text editor, add the required configuration as shown below and save the file, again replacing ${ACCOUNT} with your actual AWS account ID.
configuration:
  backupStorageLocation:
    bucket: darwin-eks-velero-backups
  provider: aws
  volumeSnapshotLocation:
    config:
      region: ap-northeast-1
credentials:
  useSecret: true
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.3.0
    volumeMounts:
      - mountPath: /target
        name: plugins
serviceAccount:
  server:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::${ACCOUNT}:role/eks-velero-recovery"
This section will show how to install Velero on the primary cluster.
In the primary cluster, use the following command to install the Velero server:
$ helm install velero vmware-tanzu/velero --create-namespace --namespace velero -f config.yaml
Which returns a response similar to:
NAME: velero
LAST DEPLOYED: Mon Jan 20 10:42:00 2023
NAMESPACE: velero
STATUS: deployed
REVISION: 1
TEST SUITE: None
Here we’ll show how to install Velero on the recovery cluster.
First, switch context from the primary to the recovery cluster by using this command:
$ kubectl config use-context arn:aws:eks:ap-northeast-1:132053483863:cluster/recovery
This will return a response similar to:
Switched to context "arn:aws:eks:ap-northeast-1:132053483863:cluster/recovery".
Run the following command to install Velero in the recovery cluster.
$ helm install velero vmware-tanzu/velero --create-namespace --namespace velero -f config_recovery.yaml
Which returns a response similar to:
NAME: velero
LAST DEPLOYED: Mon Jan 20 11:08:00 2023
NAMESPACE: velero
STATUS: deployed
REVISION: 1
TEST SUITE: None
Velero provides a native CLI for working with its cluster configuration, which is stored as Custom Resource Definition (CRD) objects. Instructions to install the Velero CLI vary depending on the host system; the full list of installation options can be found in the Velero documentation.
To install the Velero command-line interface (CLI) on your local machine, you can use the following command:
curl -L https://github.com/vmware-tanzu/velero/releases/download/v1.5.4/velero-v1.5.4-linux-amd64.tar.gz -o velero.tar.gz
tar -zxvf velero.tar.gz
sudo mv velero-v1.5.4-linux-amd64/velero /usr/local/bin/
Now we will show how to back up the primary cluster. If you are still on the recovery cluster, switch back to the primary cluster context to install the application.
$ kubectl config use-context arn:aws:eks:ap-northeast-1:132053483863:cluster/primary
The first step is to install the demo application on the primary EKS cluster.
For the purpose of this demo, we deploy a sample application to the cluster, which we will then back up and recover. Clone the manifest repository:
$ git clone https://github.com/ssengupta3/darwin-velero
Navigate to the working directory.
$ cd darwin-velero
Apply the manifests to the cluster by running the command:
$ kubectl apply -f darwin-velero-deployment.yaml
This will return a response similar to:
deployment.apps/darwin-velero-server created
service/darwin-service created
service/darwin-load-balancer created
Check that the pods have been created and are running:
$ kubectl get pods -A
The output of the above command will be a list of pods running in the cluster, with information such as the pod name, namespace, status, and the number of restarts. You should also see pods with names similar to darwin-velero-server-xxx in the velero namespace, which means the Velero server pod is running successfully and ready for use.
Now we will show how to create the backup.
Run the command below to create a backup of the application:
$ velero backup create darwin-backup
This will return a response similar to:
Backup request "darwin-backup" submitted successfully.
Run `velero backup describe darwin-backup` for more details.
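When automating backups, you may want to block until the backup reaches the Completed phase. Below is a minimal sketch that extracts the Phase field from `velero backup describe` output, demonstrated here against a sample of that output rather than a live cluster:

```shell
# Extract the value of the "Phase:" field from describe output on stdin.
backup_phase() {
  awk '/^Phase:/ {print $2}'
}

# Demonstrated on sample output; in a real script you would run:
#   velero backup describe darwin-backup | backup_phase
phase=$(backup_phase <<'EOF'
Name: darwin-backup
Phase: Completed
EOF
)
echo "$phase"   # prints "Completed"
```

A loop around this check, with a sleep between iterations, gives a simple wait-for-completion helper.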
In this section we’ll show how to validate your backup copy creation.
Check that the backup has been submitted successfully by running the command:
$ velero backup describe darwin-backup
This will return a response that looks similar to:
Name: darwin-backup
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: Completed
Started: 2023-01-20T12:46:12Z
Completed: 2023-01-20T13:31:12Z
Expiration: 2023-05-30T23:59:59Z
Snapshot:
  Taken At: 2023-01-20 12:31:12 GMT
  Expires: 2023-04-15 13:30:00 GMT
Storage Location: S3
  Provider: aws
  Bucket: darwin-eks-velero-backups
  Prefix: darwin-backup
  Credentials: <none>
  Region: ap-northeast-1
  AESCBC: <none>
Resources:
  Included:
  - pods/*
  - services/*
  - deployments/*
  Excluded: <none>
Velero-managed Resources:
  Namespaces: <none>
  Resources: darwin-deployment
  Cluster-scoped: <none>
This section shows how to restore your EKS backup that was created with Velero.
Once the backup is saved, test to ensure the restoration is working as intended. First, switch the Kubernetes context to the recovery cluster:
$ kubectl config use-context arn:aws:eks:ap-northeast-1:132053483863:cluster/recovery
To restore the cluster resources, run the command:
$ velero restore create darwin-restore --from-backup darwin-backup
This returns a response similar to:
Restore request "darwin-restore" has been successfully created.
Use "velero restore describe darwin-restore" for more details.
Run the following command to check that the restoration has been performed successfully:
$ velero restore describe darwin-restore
On successful restore, the following output is returned:
Name: darwin-restore
Namespace: velero
Labels: <none>
Annotations: <none>
Status:
  Phase: Completed
  Start Time: 2023-01-20T12:46:12Z
  End Time: 2023-01-20T13:31:12Z
Spec:
  Backup: darwin-backup
  Snapshot Volumes: true
  Include Resources:
  - Deployments
  - Services
  - Pods
  - Persistent Volume Claims
Restore Pods:
  Completed: 1
  Failed: 0
  In Progress: 0
Restored Resources:
  Deployments: 1
  Services: 1
  Pods: 1
  Persistent Volume Claims: 1
Snapshots:
  Completed: 1
  Failed: 0
  In Progress: 0
Hosting applications on Elastic Kubernetes Service (EKS) provides many benefits such as scalability, flexibility, and ease of use. However, as we’ve seen above, backing up and recovering these applications can be a complex task. With its enterprise-grade, Kubernetes-aware backup and recovery, NetApp BlueXP provides advanced data management and protection features that are not offered by AWS out of the box.
BlueXP backup and recovery uses Cloud Backup to create space-efficient backups for EKS clusters hosted using Cloud Volumes ONTAP, leveraging incremental-forever, block-level data replication. By replicating only the changes made since the last backup copy, it drastically reduces the amount of data transferred and the storage space required. Additionally, with its search-and-restore indexed catalog feature, Cloud Backup allows you to quickly search for and restore specific files, folders, or objects from the list of available backups.
To prevent your backups from being vulnerable to attack vectors, the Cloud Backup ransomware protection feature scans your backups and automatically alerts you about any attempt to encrypt or delete your backup data, ensuring your backups remain safe and accessible.