With all the advantages that Elastic Kubernetes Service (EKS) provides, the importance of backing up your cluster can’t be overstated. Regular backups of your EKS cluster, including the EKS storage, are an essential part of protecting your deployment of Kubernetes on AWS.
In this article, we delve into the various approaches to backing up an EKS cluster using the open-source tool Velero. We also walk through the steps involved in installing and configuring Velero to back up an EKS cluster.
There are several approaches to backing up an EKS cluster in AWS, and the right choice may come down to your specific AWS EKS architecture and the components that need to be backed up. Compared to the other options, Velero provides a more comprehensive and versatile solution for backing up and restoring EKS clusters, making it the de facto choice for backing up EKS.
Velero is an open-source tool that allows for easy backup and restore of an EKS cluster at the application level. The process begins with installing Velero on the EKS cluster using the Helm package manager and then configuring it to connect to the storage backend where backups are stored. Note that Velero captures the Kubernetes cluster state through the Kubernetes API server, which persists it in the etcd key-value store; Velero does not read etcd directly.
Creating a backup through Velero leverages the Kubernetes API to query the current state of the cluster and creates Custom Resource Definition (CRD) objects to manage the backup process. The restore process likewise relies on the Kubernetes API: Velero retrieves the backup data from its storage location and uses it to recreate the cluster or specific resources such as Deployments, Services, ConfigMaps, and Persistent Volume Claims.
Using Velero as a backup solution over other options has several advantages: it is open source, operates at the application level, supports scheduled backups, and can restore to a different cluster, which also makes it useful for cluster migration.
The following demo outlines the steps to back up and restore an EKS cluster using Velero and an AWS S3 bucket. The walkthrough involves backing up and restoring both the application definitions and the cluster state, including persistent data stored in a PVC.
In this step we will show how to create an S3 bucket and set up an IAM policy for Velero backups on an Elastic Kubernetes Service (EKS) cluster. While the S3 bucket will act as the storage location for Velero backups, the IAM policy will grant Velero the necessary permissions to access and manage the backups in the S3 bucket.
Here we’ll create the S3 bucket that will store our EKS backup copy.
To declare the unique S3 bucket name and appropriate AWS region as environment variables in a Linux or MacOS terminal, you can use the command:
export BUCKET=<YOUR_BUCKET_NAME>
export REGION=<YOUR_AWS_REGION>
For the purpose of this demo, we use the following:
export BUCKET=darwin-eks-velero-backups
export REGION=ap-northeast-1
Create the S3 bucket by using this command:
$ aws s3 mb s3://$BUCKET --region $REGION
Once the S3 bucket is created successfully, the following output is returned:
make_bucket: darwin-eks-velero-backups
To perform snapshots and save backups, Velero needs to make various API calls to the S3 bucket and EC2 instances. In this step we’ll create the IAM policy to grant Velero permission to do that.
Quick note: There are different ways to write an IAM policy to grant Velero the necessary permissions. The exact policy you need may depend on your specific requirements and use case. For example, you can use wildcard characters ("*") to grant permissions on all resources, or you can limit the permissions to specific resource ARNs by listing them directly in the policy document.
You can also use AWS's predefined policies, for example, the AmazonS3FullAccess policy, which grants Velero full access to the S3 bucket and its objects.
Create an IAM policy document named velero_policy.json in JSON format and add the following policy to grant Velero the necessary permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ],
      "Resource": [
        "arn:aws:ec2:REGION:ACCOUNT_ID:volume/*",
        "arn:aws:ec2:REGION:ACCOUNT_ID:snapshot/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::darwin-eks-velero-backups/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::darwin-eks-velero-backups"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME"
      ]
    }
  ]
}
Please ensure you replace REGION and ACCOUNT_ID with your AWS region and account ID, ROLE_NAME with the name of the role you want Velero to assume, and darwin-eks-velero-backups with the actual name of the S3 bucket where you want to store the backups.
This policy grants Velero permission to manage the EC2 volumes and snapshots with the specified ARNs and the objects in the S3 bucket, and allows it to pass the specified IAM role when performing operations.
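Substituting the placeholders can also be scripted. Below is a minimal sed sketch, demonstrated on a single ARN string with example values; run the same substitutions against your velero_policy.json.

```shell
# Example values -- replace with your own region, account ID, and role name.
REGION=ap-northeast-1
ACCOUNT_ID=123456789012
ROLE_NAME=eks-velero-backup

# Demonstrated on one ARN written to a temp file; run the same sed
# expressions against velero_policy.json to fill in its placeholders.
echo '"arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME"' > /tmp/arn.txt
sed -i \
  -e "s/REGION/$REGION/g" \
  -e "s/ACCOUNT_ID/$ACCOUNT_ID/g" \
  -e "s/ROLE_NAME/$ROLE_NAME/g" /tmp/arn.txt
cat /tmp/arn.txt   # prints "arn:aws:iam::123456789012:role/eks-velero-backup"
```

Note that BSD sed on macOS requires `sed -i ''` for in-place edits.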
Once you have the policy file ready, you can create the policy by running the command:
$ aws iam create-policy --policy-name VeleroAccessPolicy --policy-document file://velero_policy.json
Which returns an output similar to:
{
  "Policy": {
    "PolicyName": "VeleroAccessPolicy",
    "PolicyId": "ABCDEFGHIJKLMNOPQRST",
    "Arn": "arn:aws:iam::123456789012:policy/VeleroAccessPolicy",
    "Path": "/",
    "DefaultVersionId": "v1",
    "AttachmentCount": 0,
    "IsAttachable": true,
    "CreateDate": "2023-01-20T12:00:00Z",
    "UpdateDate": "2023-01-20T12:00:00Z"
  }
}
Once the IAM policy is created, copy the policy ARN and attach the policy to the IAM role or user using the following command:
$ aws iam attach-role-policy --role-name <ROLE_NAME> --policy-arn <POLICY_ARN>
Quick note: Attaching the IAM policy to an IAM user or role is an important step, as it grants the user or role the necessary permissions to perform actions on the AWS resources specified in the policy. In the case of Velero, the policy grants permissions for actions such as creating and deleting snapshots and volumes, getting and putting objects in an S3 bucket, and passing a role. Without the policy attached, Velero would lack these permissions and would not function properly.
Velero requires access to both the S3 bucket for storing backups and the EKS cluster for managing Kubernetes resources. In order to access these resources, Velero makes API requests using the IAM permissions that are granted to the Velero server service account.
This makes it necessary to use IAM Roles for Service Accounts (IRSA), which apply AWS policies that grant Velero the necessary permissions to access the S3 bucket and the EKS cluster.
In this section we will show you how to set up these Velero service accounts and install Velero on both the primary and recovery cluster.
In this step we will create the required IAM role and enforce a trust relationship to the Velero service account using the eksctl tool.
Before creating the service account, create environment variables for the cluster names and account identities used in the demo.
PRIMARY_CLUSTER=primary
RECOVERY_CLUSTER=recovery
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
Once the environment variables are defined, create the IAM service account for the Velero server on the primary cluster by running the command below.
$ eksctl create iamserviceaccount --cluster=$PRIMARY_CLUSTER --name=velero-server --namespace=velero --role-name=eks-velero-backup --role-only --attach-policy-arn=arn:aws:iam::$ACCOUNT:policy/VeleroAccessPolicy --approve
In the above command, the --cluster and --namespace flags tie the service account to the primary cluster and the velero namespace, --role-only creates the IAM role without creating the Kubernetes service account itself, --attach-policy-arn attaches the VeleroAccessPolicy created earlier, and --approve applies the changes.
Follow the same step as shown above to create the service account for the recovery cluster.
$ eksctl create iamserviceaccount --cluster=$RECOVERY_CLUSTER --name=velero-server --namespace=velero --role-name=eks-velero-recovery --role-only --attach-policy-arn=arn:aws:iam::$ACCOUNT:policy/VeleroAccessPolicy --approve
Here we will show how to install Velero separately on both clusters to perform the backup.
Add VMWare Tanzu to your repositories using Helm:
$ helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
This will return the following response:
"vmware-tanzu" has been added to your repositories
To configure the installation of Velero on your cluster, create the config.yaml configuration file that contains various settings such as the location of your backup storage, the schedules for taking backups, and the resources that should be included in the backups.
To create the config.yaml file, use the command:
$ touch config.yaml && nano config.yaml
This command will create a new file named config.yaml in the current directory and open it in the nano text editor. In the text editor, add the required configuration as shown below and save the file. Note that Helm does not expand shell variables inside the file, so replace ${ACCOUNT} with your actual AWS account ID before installing.
configuration:
  backupStorageLocation:
    bucket: darwin-eks-velero-backups
  provider: aws
  volumeSnapshotLocation:
    config:
      region: ap-northeast-1
credentials:
  useSecret: true
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.3.0
    volumeMounts:
      - mountPath: /target
        name: plugins
serviceAccount:
  server:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::${ACCOUNT}:role/eks-velero-backup"
Next, configure the config_recovery.yaml file that will contain the necessary information for Velero to access the backup storage and the EKS cluster.
To create the config_recovery.yaml file, use the command:
$ touch config_recovery.yaml && nano config_recovery.yaml
In the text editor, add the required configuration as shown below and save the file, again replacing ${ACCOUNT} with your actual AWS account ID.
configuration:
  backupStorageLocation:
    bucket: darwin-eks-velero-backups
  provider: aws
  volumeSnapshotLocation:
    config:
      region: ap-northeast-1
credentials:
  useSecret: true
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.3.0
    volumeMounts:
      - mountPath: /target
        name: plugins
serviceAccount:
  server:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::${ACCOUNT}:role/eks-velero-recovery"
This section will show how to install Velero on the primary cluster.
In the primary cluster, use the following command to install the Velero server:
$ helm install velero vmware-tanzu/velero --create-namespace --namespace velero -f config.yaml
Which returns a response similar to:
NAME: velero
LAST DEPLOYED: Mon Jan 20 10:42:00 2023
NAMESPACE: velero
STATUS: deployed
REVISION: 1
TEST SUITE: None
Here we’ll show how to install Velero on the recovery cluster.
First, switch context from the primary to the recovery cluster by using this command:
$ kubectl config use-context arn:aws:eks:ap-northeast-1:132053483863:cluster/recovery
This will return a response similar to:
Switched to context "arn:aws:eks:ap-northeast-1:132053483863:cluster/recovery".
Run the following command to install Velero in the recovery cluster.
$ helm install velero vmware-tanzu/velero --create-namespace --namespace velero -f config_recovery.yaml
Which returns a response similar to:
NAME: velero
LAST DEPLOYED: Mon Jan 20 11:08:00 2023
NAMESPACE: velero
STATUS: deployed
REVISION: 1
TEST SUITE: None
Velero provides a native CLI for working with its cluster configuration, which is stored as Custom Resource Definition (CRD) objects. Instructions to install the Velero CLI vary depending on the host system; the full list of installation options can be found in the Velero documentation.
To install the Velero command-line interface (CLI) on your local machine, you can use the following command:
curl -L https://github.com/vmware-tanzu/velero/releases/download/v1.5.4/velero-v1.5.4-linux-amd64.tar.gz -o velero.tar.gz
tar -zxvf velero.tar.gz
sudo mv velero-v1.5.4-linux-amd64/velero /usr/local/bin/
Now we will show how to back up the primary cluster. If you are still on the recovery cluster, switch back to the primary cluster context to install the application.
$ kubectl config use-context arn:aws:eks:ap-northeast-1:132053483863:cluster/primary
The first step is to install the demo application on the primary EKS cluster.
For the purpose of this demo, we deploy a sample application to the cluster, which we will then back up and recover. Clone the manifest repository:
$ git clone https://github.com/ssengupta3/darwin-velero
Navigate to the working directory.
$ cd darwin-velero
Apply the manifests to the cluster by running the command:
$ kubectl apply -f darwin-velero-deployment.yaml
This will return a response similar to:
deployment.apps/darwin-velero-server created
service/darwin-service created
service/darwin-load-balancer created
Check that the pods have been created and are running:
$ kubectl get pods -A
The output of the above command will be a list of pods running in the cluster, with information such as the pod name, namespace, status, and the number of restarts. You should also see pods with names similar to darwin-velero-server-xxx in the velero namespace, which means the Velero server pod is running successfully and ready for use.
Now we will show how to create the backup.
Run the command below to create a backup of the application:
$ velero backup create darwin-backup
This will return a response similar to:
Backup request "darwin-backup" submitted successfully.
Run `velero backup describe darwin-backup` for more details.
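When automating backups, you may want to block until the backup reaches the Completed phase. Below is a minimal sketch that extracts the Phase field from `velero backup describe` output, demonstrated here against a sample of that output rather than a live cluster:

```shell
# Extract the value of the "Phase:" field from describe output on stdin.
backup_phase() {
  awk '/^Phase:/ {print $2}'
}

# Demonstrated on sample output; in a real script you would run:
#   velero backup describe darwin-backup | backup_phase
phase=$(backup_phase <<'EOF'
Name: darwin-backup
Phase: Completed
EOF
)
echo "$phase"   # prints "Completed"
```

A loop around this check, with a sleep between iterations, gives a simple wait-for-completion helper.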
In this section we’ll show how to validate your backup copy creation.
Check that the backup has been submitted successfully by running the command:
$ velero backup describe darwin-backup
This will return a response that looks similar to:
Name: darwin-backup
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: Completed
Started: 2023-01-20T12:46:12Z
Completed: 2023-01-20T13:31:12Z
Expiration: 2023-05-30T23:59:59Z
Snapshot:
  Taken At: 2023-01-20 12:31:12 GMT
  Expires: 2023-04-15 13:30:00 GMT
Storage Location: S3
  Provider: aws
  Bucket: darwin-eks-velero-backups
  Prefix: darwin-backup
  Credentials: <none>
  Region: ap-northeast-1
  AESCBC: <none>
Resources:
  Included:
  - pods/*
  - services/*
  - deployments/*
  Excluded: <none>
Velero-managed Resources:
  Namespaces: <none>
  Resources: darwin-deployment
  Cluster-scoped: <none>
This section shows how to restore your EKS backup that was created with Velero.
Once the backup is saved, test to ensure the restoration is working as intended. First, switch the Kubernetes context to the recovery cluster:
$ kubectl config use-context arn:aws:eks:ap-northeast-1:132053483863:cluster/recovery
To restore the cluster resources, run the command:
$ velero restore create darwin-restore --from-backup darwin-backup
This returns a response similar to:
Restore request "darwin-restore" has been successfully created.
Use "velero restore describe darwin-restore" for more details.
Run the following command to check that the restoration has been performed successfully:
$ velero restore describe darwin-restore
On successful restore, the following output is returned:
Name: darwin-restore
Namespace: velero
Labels: <none>
Annotations: <none>
Status:
  Phase: Completed
  Start Time: 2023-01-20T12:46:12Z
  End Time: 2023-01-20T13:31:12Z
Spec:
  Backup: darwin-backup
  Snapshot Volumes: true
  Include Resources:
  - Deployments
  - Services
  - Pods
  - Persistent Volume Claims
Restore Pods:
  Completed: 1
  Failed: 0
  In Progress: 0
Restored Resources:
  Deployments: 1
  Services: 1
  Pods: 1
  Persistent Volume Claims: 1
Snapshots:
  Completed: 1
  Failed: 0
  In Progress: 0
Hosting applications on Elastic Kubernetes Service (EKS) provides many benefits such as scalability, flexibility, and ease of use. However, as we’ve seen above, backing up and recovering these applications can be a complex task. With its enterprise-grade, Kubernetes-aware backup and recovery, NetApp BlueXP provides advanced data management and protection features that are not offered by AWS out of the box.
BlueXP backup and recovery uses Cloud Backup to create space-efficient backups for EKS clusters hosted using Cloud Volumes ONTAP, leveraging incremental-forever, block-level data replication. By replicating only the changes made since the last backup copy, it drastically reduces the amount of data transferred and the storage space required. Additionally, with its search-and-restore indexed catalog feature, Cloud Backup allows you to quickly search for and restore specific files, folders, or objects from the list of available backups.
To prevent your backups from being vulnerable to attack vectors, the Cloud Backup ransomware protection feature scans your backups and automatically alerts you about any attempt to encrypt or delete your backup data, ensuring your backups remain safe and accessible.