
Guarding Your Trove: What's So Hard About Protecting Petabyte-Scale Data?

Enterprises have more data on hand than ever before. And that data is driving some of the most important work in many organizations. But with petabytes (PBs) of data to deal with, your IT teams can be overwhelmed, especially when it comes to figuring out how to protect all that data. Can cloud backup be the answer?

In this blog we’ll explore the main challenges when it comes to backing up petabytes of data and what that means for you when you’re choosing your backup solution.


Where Does PB-Scale Data Come From?

Why do today’s organizations have so much data? Swedish physician and statistician Hans Rosling wrote that “The world cannot be understood without numbers. But the world cannot be understood with numbers alone.”

Today’s businesses aren’t just collecting “numbers”—the petabytes of data they’re accumulating matter to their operations, and they rely largely on NAS-based storage systems to accommodate all of that data.

Here are a few reasons why:

  • Digital transformation. More processes are being handled digitally, which by definition demands more data. For example, we’re digitizing paper records, moving to cloud-based services, and creating digital archives of historical data. We’re also storing images and videos, which require more space than simple text and numbers.
  • Data-intensive applications. Data is how organizations derive business intelligence, which involves finding meaning, stories, and patterns in those numbers. And that’s hard work!
    More companies are taking advantage of machine learning, artificial intelligence, and big data analytics to derive meaningful insights and drive business outcomes, but this demands the ability to store and process more data than ever. For example, data analytics solutions can help mine call center data to analyze and optimize customer-service and support experiences.
  • Smart devices and facilities. Beyond traditional systems and applications, data is coming in from all directions: IoT, industrial sensors, and more data-generating applications. For example, data generated by smart factories can be used to monitor performance, detect anomalies, and optimize operations.

But relying on such a large amount of data means finding ways to protect it. And as more and more organizations enter the petabyte realm, that has become harder than ever.

Top Challenges of Protecting PB-scale Data

As AWS CTO Werner Vogels once said, “Everything fails… all the time.” Planning and preparing for failure is one of the key tasks of every IT department. And that becomes even more relevant when it comes to data storage and access on a PB scale: losing even a small fraction of petabytes of data can be catastrophic.

Backups are your ultimate defense against cyberattacks, ransomware, sabotage, and worse. Often, a complete backup is the last line of defense standing between your business and disaster. If a backup is incomplete, corrupted, or missing, it could set you back days, weeks, or even longer.

But it’s much harder to ensure you have effective backups when dealing with PB-scale data sets. Many solutions, even if they claim to be able to handle large-scale data, are actually designed for the old scale of data. They may work well with megabytes and even terabytes, but they can’t gracefully handle the petabytes we’re dealing with today.

The most common storage models used on-prem are still network attached storage (NAS), along with storage area networks (SAN). NAS systems use clustering to scale up into the PB range; a good example of a technology that does this on NetApp systems is FlexGroup volumes. The problem is that NAS systems of any size generally rely on NDMP-based backup solutions.

NDMP-based backup solutions face a number of challenges:

  • Slow backups
    Realistically, when you’re using a conventional NDMP-based backup system for petabyte-scale data, it can take days or even weeks to create a full backup due to network congestion and increasingly large volumes. NDMP is slow in general; at PB scale, it’s simply not sufficient.
  • Slow scanning
    Another challenge is scanning data prior to backup to discover which changes have been made. Traditionally, incremental backups have been an important optimization strategy because you’re only backing up what has changed since the previous backup. But at PB scale, the task of identifying which files have been modified can become unbearably time-consuming and monopolize system resources that are needed elsewhere. A simplified sketch of this kind of metadata scan appears after this list.
  • Backup and restore testing
    Once a backup has been created, most businesses’ standards (along with some regulatory requirements) demand that it be tested, which means additional days of work.
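
To see why that scan is so expensive, here’s a minimal sketch (in Python, with a hypothetical NAS mount path) of the kind of metadata walk a traditional incremental backup has to perform. The point isn’t the specific code; it’s that every file’s metadata must be visited, so scan time grows with the total file count even when almost nothing has changed.

```python
import os
import time

def find_changed_files(root, last_backup_time):
    """Naive incremental-backup scan: walk every file and compare its
    modification time against the previous backup's timestamp. Even if
    almost nothing changed, every file's metadata is still visited."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > last_backup_time:
                    changed.append(path)
            except OSError:
                # File vanished or is unreadable mid-scan; skip it.
                continue
    return changed

# Hypothetical usage: on a NAS share holding billions of files, this walk
# alone can take hours or days before a single byte of data is transferred.
changed = find_changed_files("/mnt/nas_share", last_backup_time=time.time() - 86400)
print(f"{len(changed)} files changed since the last backup")
```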

With all these challenges, many IT pros have discovered that they simply can’t finish backing up PB-scale data within the allotted backup window; as a result, backups end up getting skipped.

A solution that isn’t built for today’s massive amounts of data will ultimately fail. And when older solutions struggle, they quickly become unreliable, forcing your teams to pick up the slack.

That’s why when it comes to protecting PB-scale data, you need to take a careful look at how well any potential solution handles backups.

What You Need to Protect PB-Scale Data

A PB-scale treasure trove of data is going to require all the defenses that a realm can muster. At this scale, you need tools that can process and analyze large volumes of data quickly and accurately, along with a storage infrastructure that’s secure and highly available, with robust security controls in place to protect the data from external and internal threats. Your solution should also offer simplified methods to achieve compliance goals, providing full insight into where data is stored, who has access to it, and how it is being used.

Many backup solutions lack the most essential features to handle petabyte-scale data. At a minimum, you need to look for a platform that can provide:

  • Block-level backups for faster updates and restores
  • Direct backups with no need for a media gateway, making the process faster and more secure
  • Incremental-forever backups, so there’s no more wasting time on unnecessary full backup copies
  • Object storage as the backup destination, providing unlimited scale and huge potential cost savings (see the sketch after this list)
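
As a simple illustration of that last point, the sketch below uses boto3 to push a backup artifact to S3-compatible object storage. The endpoint, bucket, and file path are hypothetical, and this isn’t how any particular backup product implements it; it just shows why object storage makes an attractive destination: you write objects into a flat namespace, with no volume to size or resize up front.

```python
import boto3

# Hypothetical endpoint and bucket; any S3-compatible object store
# (public cloud or an on-prem appliance) exposes the same API.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.com",
)

# Each backup artifact becomes an object in a flat namespace, so the
# destination never needs to be resized the way a volume would.
s3.upload_file(
    Filename="/backups/vol1/baseline-0001.chunk",
    Bucket="pb-scale-backups",
    Key="vol1/baseline-0001.chunk",
)
```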

Today’s Petabyte-Scale Protection Comes from NetApp

NetApp BlueXP backup and recovery, powered by Cloud Backup, is a backup-as-a-service offering delivered through the single-pane BlueXP management console and fully integrated with FlexGroup volumes to meet their PB-scale storage demands.

Designed specifically to work with ONTAP deployments, BlueXP backup and recovery leverages object storage to cost-effectively house backups: either in the cloud on AWS, Azure, and Google Cloud, or on-prem with NetApp StorageGRID® appliances. This provides the unlimited scale that PB-scale backup requires.

But there’s more than just scale to how BlueXP backup and recovery achieves this:

  • Backups are created directly, with no media gateway involved: that means all of ONTAP’s storage efficiencies are preserved, keeping the backup copy’s footprint optimized. 
  • These backups are updated incrementally forever: once the initial baseline copy is created, your entire data set never needs to be copied over again.
  • When updating copies, BlueXP backup and recovery operates at the block level, meaning it transfers only the changed 4KB blocks rather than entire files (see the conceptual sketch below).
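
To make the block-level idea concrete, here’s a minimal conceptual sketch (not NetApp’s actual implementation) of comparing a file’s 4KB blocks against hashes recorded at the previous backup and collecting only the blocks that changed. The file path and the stored-hash catalog are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB blocks, matching the granularity described above

def changed_blocks(path, previous_hashes):
    """Compare each 4KB block of a file against the hash recorded at the
    last backup and return only the blocks that differ (a conceptual
    sketch of block-level incremental transfer, not a product API)."""
    changed = {}
    with open(path, "rb") as f:
        index = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).hexdigest()
            if previous_hashes.get(index) != digest:
                changed[index] = block  # only this block needs to be sent
            index += 1
    return changed

# Hypothetical usage: previous_hashes would come from the last backup's catalog.
# delta = changed_blocks("/mnt/nas_share/bigfile.db", previous_hashes={})
```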

All of this comes together to make for a backup solution that’s simple, fast, and capable of protecting even PB-scale treasure troves of data with ease.


Semion Mazor, Product Evangelist
