BlueXP Blog

Rsync or Cloud Sync - Putting DIY Data Migration Tools to the Test

Written by Gali Kovacs | Sep 21, 2017 12:00:00 PM

Migrating to the cloud is never an easy process for a company to undertake. Beyond the culture shift, migrations require finding the right type of tools to manage the lengthy process of moving all of your data from on-premises storage over to the cloud.

The planning process for the move is key: solutions that may look good at first glance, such as building your own transfer tool with open-source utilities, can actually wind up costing you down the line.

In this blog, we’ll show you why Cloud Sync, NetApp’s solution for data migration, is the smart choice for enterprise-level data migration to the cloud, especially when compared with DIY solutions such as rsync.

Getting Your Data to the Cloud: DIY Solutions

If you decide that you’re ready to begin a migration to the cloud and want to do it on your own, you’ll have to create a homebrew solution for synchronizing your NFS or CIFS shares to Amazon S3. With DIY synchronizing tools you can send data to the cloud.

By swapping the source and destination parameters, you can also get your data back from Amazon S3. DIY data transfer solutions include rclone, s3cmd, the AWS CLI, and rsync. Since rsync is the most popular in this group, this post looks at it in depth.

Rsync

One of the most popular DIY solutions for synchronizing data to the cloud is rsync. An open-source utility, rsync has many fans among administrators of Linux systems.

It is used mainly to automate backups, and its appeal lies largely in its ability to sync incrementally: after the initial baseline copy, only the data that has changed since the previous sync is transferred.

For network usage, that’s a big optimization. For rsync to copy files to Amazon S3 and back, a Linux machine has to have an Amazon S3 bucket mounted to it.
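To make the setup concrete, here is a minimal sketch of how a homebrew script might assemble the rsync call for both directions. The mount points `/mnt/nfs-share` and `/mnt/s3-bucket` and the helper name `build_rsync_command` are hypothetical, chosen only for illustration; the sketch assumes the Amazon S3 bucket is already mounted on the Linux machine and only builds the command rather than running it.

```python
import shlex


def build_rsync_command(source, destination):
    """Return an rsync argument list that mirrors source into destination."""
    # -a (archive) recurses and preserves metadata where the target supports it.
    # --stats prints a transfer summary at the end of the run.
    # The trailing slash on the source means "copy the directory's contents".
    return [
        "rsync", "-a", "--stats",
        source.rstrip("/") + "/",
        destination.rstrip("/") + "/",
    ]


NFS_SHARE = "/mnt/nfs-share"   # hypothetical local mount of the NFS share
S3_MOUNT = "/mnt/s3-bucket"    # hypothetical mount point of the S3 bucket

upload = build_rsync_command(NFS_SHARE, S3_MOUNT)
download = build_rsync_command(S3_MOUNT, NFS_SHARE)  # same call, arguments swapped

print(shlex.join(upload))
print(shlex.join(download))
```

Passing either list to `subprocess.run` would start the transfer. Everything around that call (scheduling, logging, retries) is still left to you.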

 

The main challenge with using rsync is that it requires a lot of manual work, such as writing scripts, monitoring the transfers, and managing the tool. Rsync also lacks built-in monitoring: out of the box, there is no central way to detect errors or analyze success rates.

In addition, rsync cannot synchronize incrementally against the object storage in Amazon S3. Amazon S3 objects aren’t updated in place; they are always recreated entirely. This has both cost and time impacts.

You have to appreciate just how big of a job creating a viable homebrew solution is going to be: there is much more to consider than just syncing to Amazon S3. 

Designing a homebrew solution means implementing a number of components by yourself, including:

  • error alerts/monitoring
  • validation
  • scheduling
  • logging
  • performance reports
  • access controls
  • systems integration

It is a process that requires a lot of experimentation, and that means a lot of setbacks and failures. Such manual labor creates a risk of human error (e.g., forgetting a file or a folder) that can result in data loss and significant business impact.
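As an illustration of just one item on that list, validation, the sketch below checks for files that exist at the source but never arrived at the destination. It is a minimal example under simple assumptions: both sides are reachable as local directory trees, and only file presence is compared, not content. The function names are hypothetical.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.add(os.path.relpath(full_path, root))
    return found


def missing_at_destination(source_root, destination_root):
    """List files present under source_root but absent under destination_root."""
    missing = relative_files(source_root) - relative_files(destination_root)
    return sorted(missing)
```

A real homebrew tool would also need to compare sizes or checksums, and to do the same for every other component listed above.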


The NetApp Solution: Cloud Sync

For users who prefer a data migration service, NetApp has a data synchronization service called Cloud Sync. Cloud Sync comes with features that make it a full-fledged service rather than a DIY tool. In addition to enabling secure and fast data transfers to Amazon S3, Cloud Sync can transfer to multiple endpoints.

The Cloud Sync service offers users an intuitive interface that securely and reliably synchronizes data to or from Amazon S3 from any NFS or CIFS share, whether in the cloud or on-prem. With Cloud Sync, less goes wrong, and when something does, error reporting lets you know exactly what happened so you can ensure it doesn’t happen again.

Should something happen while you are syncing data with Cloud Sync, you don’t have to start over from scratch with a costly rebuild like you might have to do with DIY tools.

The biggest differences between Cloud Sync and DIY solutions such as rsync are Cloud Sync’s speed and manageability. Data synchronization with Cloud Sync is much faster than the alternatives because it takes advantage of parallel processing, which significantly improves throughput.
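To show the general idea of parallel file processing, here is a minimal sketch that copies a directory tree with a pool of worker threads instead of one file at a time. This is not Cloud Sync’s implementation, which the service does not expose; the function name `copy_tree_parallel` and the default of eight workers are assumptions made for illustration.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(source_root, destination_root, workers=8):
    """Copy every file under source_root to destination_root using a thread pool."""
    source_root = Path(source_root)
    destination_root = Path(destination_root)
    files = [path for path in source_root.rglob("*") if path.is_file()]

    def copy_one(src):
        dst = destination_root / src.relative_to(source_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies content and basic metadata
        return dst

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all copies and re-raises any worker exception.
        return list(pool.map(copy_one, files))
```

With many small files, several copies in flight at once keep the network busy while individual transfers wait on latency, which is where a single sequential stream tends to fall behind.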

Add Cloud Sync’s management features and its ability to sync with multiple endpoints to that speed advantage, and the costs and trial and error of crafting a DIY data migration solution from scratch become hard to justify.

Cloud Sync vs. Rsync

We tried out both solutions on an NFS share holding 1 TB of data, with files of various sizes arranged in directories to best simulate the kind of data typically transferred to Amazon S3. The table below shows what we found:

 

Solution      Average Transfer Speed (MB/s)    Time Spent (min)    Bandwidth Utilization
Cloud Sync    93.04                            191                 74.4%
Rsync         9.98                             1858                8.0%

 

In this test, Cloud Sync clearly outperformed rsync, with more than nine times the average transfer speed and roughly a tenth of the transfer time. The advantage comes from Cloud Sync’s ability to process files in parallel, which gives it a higher average transfer speed and makes more effective use of network bandwidth.
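The ratios behind those results are easy to check. The short calculation below uses only the figures reported in the table. The one assumption, which the test description does not state, is that bandwidth utilization equals average speed divided by link capacity, and that assumption is what lets us back out the implied link speed.

```python
# Figures copied from the results table.
cloud_sync_mb_s, cloud_sync_min, cloud_sync_util = 93.04, 191, 0.744
rsync_mb_s, rsync_min, rsync_util = 9.98, 1858, 0.080

# How much faster was Cloud Sync, by speed and by elapsed time?
speed_ratio = cloud_sync_mb_s / rsync_mb_s
time_ratio = rsync_min / cloud_sync_min

# Assumption: utilization = average speed / link capacity.
implied_link_cloud_sync = cloud_sync_mb_s / cloud_sync_util
implied_link_rsync = rsync_mb_s / rsync_util

print(f"speed ratio: {speed_ratio:.2f}x")
print(f"time ratio:  {time_ratio:.2f}x")
print(f"implied link capacity: {implied_link_cloud_sync:.1f} / {implied_link_rsync:.1f} MB/s")
```

Both rows imply a link of about 125 MB/s, which would be consistent with a 1 Gbit/s connection, although the test setup above does not say so explicitly.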

Conclusion

Enterprises moving to the cloud need to find a way to migrate enormous amounts of data efficiently and securely. The DIY route comes with its own headaches: besides being error-prone, DIY tools also lack support features.

An enterprise-grade migration service should be robust and come with user management, scheduling, and reporting features, making sure your data gets to and from where it needs to be safely and surely. Cloud Sync delivers all of the above.

With Cloud Sync, users get a data migration solution that takes full advantage of what AWS has to offer when it comes to Amazon S3 buckets. Its ease of use and simple setup process make it attractive to admins. Its ability to avoid the hidden costs associated with creating and running DIY solutions, such as maintenance and app downtime, together with a pricing plan tied to sync relationships rather than to the amount of data, also makes it appealing to budget-conscious department heads.

If your company is ready to make the move to the cloud, find out more about Cloud Sync and how it can make your migration goals a reality.