BlueXP Blog

Top 2 Strategies for AWS S3 Data Replication

Written by Gali Kovacs | Nov 7, 2017 10:48:45 AM

Amazon Simple Storage Service (Amazon S3) is one of the most popular services on AWS: a highly scalable, inexpensive, fast, and reliable storage infrastructure.

Amazon S3 is designed for 99.99% availability over a given year, meaning that your data could be unavailable for a total of 52m 35.7s over the course of a one-year period. Measured against a full year, that is not a very long time.
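The downtime figure above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic. The function name is our own, and the year length of 365.2425 days (the mean Gregorian year) is an assumption chosen because it matches the 52m 35.7s figure.

```python
from datetime import timedelta


def allowed_downtime(availability_percent: float, period: timedelta) -> timedelta:
    """Return the longest total outage that still meets the availability target."""
    unavailable_fraction = (100 - availability_percent) / 100
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


# Assumption: a mean Gregorian year of 365.2425 days.
YEAR = timedelta(days=365.2425)

downtime = allowed_downtime(99.99, YEAR)
print(downtime)  # roughly 52 minutes and 35.7 seconds
```

Swapping in a different target, for example 99.9, shows how quickly the allowed downtime grows as the availability figure drops.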

However, if your application needs the data stored on Amazon S3 and can’t access that data during a usage spike, customers will notice. It is a scenario you never want to happen.

How can you make sure your data is located in more than one location so you can recover if there is downtime at a primary location?

In this article we will show you two strategies for data replication with Amazon S3. One is by using Amazon S3’s Cross-Region replication feature. The other way is NetApp Cloud Sync, which can help you migrate your data with fast and efficient data transfers to and from Amazon S3 and your on-premises storage environment.

1. Cross-Region Data Replication

Knowing the ins and outs of Amazon S3 is important. When data is sent to Amazon S3, your objects are redundantly stored on multiple devices across multiple facilities within a region. That still leaves a single point of failure: the region itself.

All of your data is stored in that one region, so should the region experience a problem, your data might be unavailable for the duration. To avoid this single point of failure and increase the availability of your data, you can use the Amazon S3 Cross-Region replication (CRR) feature.

When you store your first object on Amazon S3 storage, you need to create a bucket for your object. Along with the name of the bucket, you need to select a region for your bucket.

Even though every region consists of multiple independent data center locations, which AWS refers to as Availability Zones, it is possible that a whole region may experience problems, making your data unavailable. Your data won’t disappear, but until the issues are resolved, your users and the applications that use this data will have problems. Using the Amazon S3 CRR feature helps you avoid this scenario.

Cross-Region data replication enables automatic, asynchronous copying of objects across different AWS regions. Availability is not the only reason to use Cross-Region replication: it also helps with meeting compliance standards that require you to keep data stored in different locations around the world, or even on different continents.

Another reason to use Cross-Region data replication is to decrease latency. If you have users in different geographic locations, you can use Cross-Region replication to copy your data to the regions geographically closest to them. In this section we’ll show you step-by-step how to use Cross-Region replication.

How to Use Cross-Region Replication

To enable the CRR feature, first you need to create a bucket. Once you’ve done that, go to your Amazon S3 console.

Next, open up the “Properties” tab and enable versioning. In the section labeled “Advanced Settings,” enable Cross-Region replication.


Image 1. Amazon S3 console: Bucket properties

Versioning means that you keep several versions of your object inside the same bucket. The destination bucket must also have versioning enabled, and you have to turn it on manually.

Once versioning is enabled, you need to set up CRR by selecting the bucket as a source and then the destination region, destination bucket, and IAM role. Keep in mind that the source and destination of Cross-Region replication can’t be in the same region, and data can be replicated to only one destination bucket.

The IAM role grants Amazon S3 the permissions it needs to replicate objects from the source bucket to the destination bucket. If you select the option “Create new role,” AWS will automatically create an IAM role with the necessary permissions.
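If you prefer to create the role yourself, the two policy documents involved might look roughly like the sketch below. The function names and bucket names are hypothetical. The action names are the ones we understand AWS to document for replication, so verify them against the current IAM reference before relying on them.

```python
import json


def replication_trust_policy() -> dict:
    """Trust policy that lets the Amazon S3 service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def replication_permissions_policy(source_bucket: str, destination_bucket: str) -> dict:
    """Permissions to read from the source bucket and write replicas to the destination."""
    source_arn = f"arn:aws:s3:::{source_bucket}"
    destination_arn = f"arn:aws:s3:::{destination_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read the replication settings and list the source bucket.
                "Effect": "Allow",
                "Action": ["s3:GetReplicationConfiguration", "s3:ListBucket"],
                "Resource": source_arn,
            },
            {   # Read object versions (and their ACLs and tags) from the source.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObjectVersionForReplication",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": f"{source_arn}/*",
            },
            {   # Write replicas into the destination bucket.
                "Effect": "Allow",
                "Action": ["s3:ReplicateObject", "s3:ReplicateDelete", "s3:ReplicateTags"],
                "Resource": f"{destination_arn}/*",
            },
        ],
    }


# Hypothetical bucket names, for illustration only.
policy = replication_permissions_policy("example-source-bucket", "example-destination-bucket")
print(json.dumps(policy, indent=2))
```

Letting the console create the role remains the simpler route. Writing the policies out mainly helps when you want to review or version-control exactly what Amazon S3 is allowed to do.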


Image 2. Set up Cross-region replication

Once you have your Cross-Region data replication configured, all newly added objects in the source bucket will be asynchronously copied to the destination bucket located in another region.

If, prior to setting up Cross-Region replication, you had some objects stored in your source bucket, those objects won’t be replicated until they experience some kind of change.
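To see whether a particular object has been replicated, you can inspect the replication status that Amazon S3 reports for it. The sketch below assumes a boto3 S3 client is passed in and uses its `head_object` call. As far as we know, the `ReplicationStatus` field is simply absent for objects that were never picked up by replication, so the `NOT_REPLICATED` label is our own placeholder, not an AWS value.

```python
def replication_state(s3_client, bucket: str, key: str) -> str:
    """Return the replication status S3 reports for an object.

    s3_client is expected to behave like boto3.client("s3").
    S3 reports values such as PENDING, COMPLETED, FAILED, or REPLICA.
    """
    response = s3_client.head_object(Bucket=bucket, Key=key)
    # Objects stored before replication was configured carry no status field.
    return response.get("ReplicationStatus", "NOT_REPLICATED")


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   s3 = boto3.client("s3")
#   print(replication_state(s3, "example-source-bucket", "reports/2017/q4.csv"))
```

Running this over the keys in a source bucket is one way to find the older objects that still need to be copied by other means.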

You don’t need to use the AWS console to set up Cross-Region replication: you can also use the Amazon S3 API or AWS CLI commands. Replication can also be set up between different AWS accounts.
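As a rough illustration of the API route, the sketch below uses the boto3 calls `put_bucket_versioning` and `put_bucket_replication`. The helper names, the rule ID, the bucket names, and the role ARN are all hypothetical, and the rule shown replicates every object in the bucket. Treat it as a starting point under those assumptions, not a complete setup script.

```python
def build_replication_configuration(role_arn: str, destination_bucket: str,
                                    rule_id: str = "replicate-everything") -> dict:
    """Build a replication configuration with one rule covering the whole bucket."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: match all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


def enable_cross_region_replication(source_client, source_bucket: str,
                                    destination_client, destination_bucket: str,
                                    role_arn: str) -> None:
    """Enable versioning on both buckets, then attach the replication rule.

    Each client is expected to behave like boto3.client("s3", region_name=...)
    for the region its bucket lives in.
    """
    for client, bucket in ((source_client, source_bucket),
                           (destination_client, destination_bucket)):
        client.put_bucket_versioning(
            Bucket=bucket,
            VersioningConfiguration={"Status": "Enabled"},
        )
    source_client.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=build_replication_configuration(role_arn, destination_bucket),
    )


# Example usage (requires boto3 and AWS credentials; all names are hypothetical):
#   import boto3
#   enable_cross_region_replication(
#       boto3.client("s3", region_name="us-east-1"), "example-source-bucket",
#       boto3.client("s3", region_name="eu-west-1"), "example-destination-bucket",
#       "arn:aws:iam::123456789012:role/example-replication-role",
#   )
```

Versioning is enabled first because replication requires it on both buckets.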

No matter how you decide to set up Cross-Region replication, once you have it in place, you have taken a huge step towards making sure your data stays available.

2. Migrating Data to and from On-Premises Storage and Amazon S3

What else can be done to make sure that you always have your data on hand in the slight 0.01% chance that Amazon S3 won’t have your data available? One solution is to store a copy of your data on-premises, one that is kept up to date with the data stored in Amazon S3.

For that, NetApp Cloud Sync comes in handy.

Data has become the most valuable resource a company has, and that’s why finding a fast, secure, and reliable way to transfer data to and from the cloud matters so much. It can also be difficult: most data transfer tools require a certain degree of familiarity to use effectively, or they demand complicated configuration that is time-consuming to set up.

Luckily, Cloud Sync isn’t as difficult as all those data transfer tools.

The main difference is that Cloud Sync isn’t a tool; it’s a service. Ready to use without any prior configuration, Cloud Sync enables you to migrate your data to the cloud, and back from the cloud into your on-premises data centers, with just a few clicks. Cloud Sync’s dashboard lets you see the data source and destination, synchronization history, and scheduler controls all in one place.

How Cloud Sync Works

Cloud Sync provides you with an exact copy of your source data on the destination target while keeping the same folder structure intact. It works both ways: migrating on-premises data to Amazon S3, or going the other direction from Amazon S3 to your on-premises systems.

The Cloud Sync engine relies on parallelized streams. It first checks the catalog to determine what is stored in the file system; then it breaks up the data it finds there into multiple transports and transfers them at the same time.

Compare this data transfer method with a traditional tool such as rsync, which works serially and has to go through every directory one after another, and Cloud Sync turns out to be much faster. By the time the transfer process with Cloud Sync is complete, rsync will still be processing data.

Parallelization of operations makes Cloud Sync extremely fast.
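The general idea of cataloging first and then transferring in parallel can be sketched with Python's standard library. This is not Cloud Sync's actual implementation: it copies between two local directories with a thread pool, and every function name here is our own.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def build_catalog(source_root: Path) -> List[Path]:
    """List every file under source_root, as paths relative to it."""
    return sorted(p.relative_to(source_root) for p in source_root.rglob("*") if p.is_file())


def copy_one(source_root: Path, destination_root: Path, relative_path: Path) -> Path:
    """Copy a single file, creating parent folders so the structure is preserved."""
    target = destination_root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source_root / relative_path, target)
    return relative_path


def parallel_copy(source_root: Path, destination_root: Path, workers: int = 8) -> List[Path]:
    """Catalog the source, then copy the files concurrently instead of one by one."""
    catalog = build_catalog(source_root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda rel: copy_one(source_root, destination_root, rel), catalog))
```

A serial tool would effectively run `copy_one` in a plain loop. Here several copies are in flight at once, which tends to help most when each transfer spends its time waiting on the network or disk.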

Cloud Sync doesn’t just work well; it works with any NFS/CIFS share. To perform a data transfer from an on-premises data center to Amazon S3, all you need to do in Cloud Sync is create a relationship between your local storage (NFS/CIFS) and an Amazon S3 bucket.

During the creation of this relationship, a Data Broker will be automatically created. The Data Broker is used for the transfer and synchronization of your data as well as for conversion from file-based NFS/CIFS data sets into an object format that is used by Amazon S3.
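To give a feel for what a file-to-object conversion involves, the sketch below maps a file's relative path to an Amazon S3 object key. Amazon S3 has no real folders, only keys that may contain "/" separators, so the folder structure is carried in the key itself. The function name and the optional prefix are hypothetical, and this is an illustration of the concept, not the Data Broker's actual logic.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def object_key_for(relative_path: PurePath, prefix: str = "") -> str:
    """Turn a relative file path into an S3-style object key using '/' separators."""
    key = "/".join(relative_path.parts)
    cleaned_prefix = prefix.strip("/")
    return f"{cleaned_prefix}/{key}" if cleaned_prefix else key


# An NFS-style path and a CIFS-style path end up with the same key layout.
print(object_key_for(PurePosixPath("reports/2017/q4.csv")))
print(object_key_for(PureWindowsPath(r"reports\2017\q4.csv"), prefix="backups/"))
```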

It also enables you to “continuously synchronize” your data to or from the cloud.

“Continuously synchronize” describes an asynchronous relationship: a scheduler runs the synchronization for you at set intervals, which helps you avoid long upload times. Otherwise, you would need to perform each data transfer on your own, using some other method.

The Cloud Sync scheduler can be configured to synchronize your data as often as you need.
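A scheduled sync is cheapest when each run transfers only what has changed since the last one. The sketch below shows one simple way to detect such files between two local directories by comparing size and modification time. It illustrates the general technique under our own assumptions. It is not a description of how Cloud Sync decides what to transfer.

```python
from pathlib import Path
from typing import List


def files_needing_sync(source_root: Path, destination_root: Path) -> List[Path]:
    """Return relative paths that are missing or out of date on the destination."""
    pending = []
    for source_file in sorted(source_root.rglob("*")):
        if not source_file.is_file():
            continue
        relative_path = source_file.relative_to(source_root)
        destination_file = destination_root / relative_path
        if not destination_file.exists():
            pending.append(relative_path)  # never transferred
            continue
        src_stat, dst_stat = source_file.stat(), destination_file.stat()
        # Heuristic: a different size or a newer source timestamp means "changed".
        if src_stat.st_size != dst_stat.st_size or src_stat.st_mtime > dst_stat.st_mtime:
            pending.append(relative_path)
    return pending
```

A scheduler would call a function like this on each run and hand the resulting list to the transfer step, so unchanged files are skipped.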

Whether it’s for compliance reasons or just peace of mind, if you want to replicate your data to or from Amazon S3 then Cloud Sync is a robust choice for data transfers.

Cloud Sync is also useful if you have data in your on-premises data center that you want to make available to users of your application publicly, without giving them access to your local network.

By using Cloud Sync you can transfer your data to Amazon S3, where users are able to access the data by clicking on the object links you provide.

Conclusion

Even though Amazon S3 offers highly respectable availability and durability, a single point of failure is a problem you always want to avoid.

Increasing data redundancy as much as possible is one solution to this challenge. This article showed you two ways to do that: storing copies of data in multiple regions with Cross-Region replication, and transferring data from on-premises storage to the cloud and back with Cloud Sync.

Data is something that companies thrive on. Finding a way to protect that data can help a company keep a technical glitch from becoming a financial disaster. Cloud Sync’s fast, secure, and reliable data transfer methods and data replication with Amazon S3 can help solve that problem.

Want to get started? Try out Cloud Volumes ONTAP today with a 30-day free trial.