Data is the new currency: organizations are focused on mining valuable business insights from information collated across multiple sources. The key enabler is data analytics services, fueled by the scale of the cloud.
This blog will explore the storage challenges associated with big data analytics and how Cloud Volumes ONTAP can help address them.
In an all-connected world, the data estate is growing exponentially. Big data analytics seeks to gain business insights from that crucial resource. Big data analytics contributes to business decision making by revealing information on customer usage patterns, purchase correlations, customer preferences, market trends, and more.
Big data takes into account all the data an organization can leverage to extract meaningful insights, whether structured, semi-structured, or unstructured. Big data analytics helps here by processing and transforming that data into structured information that organizations can act on.
The major steps in the analytics process can be summarized as follows: collecting data from multiple sources, storing the raw data, processing and transforming it into structured form, and analyzing the results to surface insights that can be visualized and reported.
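As a rough illustration of these steps, here is a minimal PySpark sketch of the collect-process-analyze flow. The paths, event fields, and column names are illustrative placeholders, not taken from any particular deployment:

```python
# Minimal PySpark sketch of the collect -> process -> analyze flow.
# Paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("purchase-insights").getOrCreate()

# Collect: ingest semi-structured clickstream events (e.g., JSON landed on shared storage).
raw = spark.read.json("/mnt/analytics/raw/clickstream/")

# Process: keep well-formed purchase events and cast them to structured columns.
purchases = (
    raw.filter(F.col("event_type") == "purchase")
       .select(
           F.col("user_id"),
           F.col("product_id"),
           F.col("amount").cast("double"),
           F.to_date("timestamp").alias("day"),
       )
)

# Analyze: revenue and purchase counts per product per day.
insights = (
    purchases.groupBy("day", "product_id")
             .agg(F.sum("amount").alias("revenue"),
                  F.count("*").alias("purchases"))
)

# Persist structured results for reporting and visualization.
insights.write.mode("overwrite").parquet("/mnt/analytics/curated/product_daily/")
```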
Whether it’s machine-generated data, application usage patterns, or data coming in from IoT devices, analytics services handle petabytes of data every day. This growth in the data estate brings its own set of challenges, both in storing and in processing that data. The scalability of cloud storage helps address capacity requirements, but there are other challenges associated with the analytics process in terms of performance, reliability, management overhead, and more.
The cloud has become a preferred destination for big data analytics due to the range of analytics tools, services, and technologies available out of the box. The storage component is a large part of the appeal: cloud storage can accommodate petabytes of big data while providing the agility to scale on demand. The pay-per-use nature of the cloud helps convert CAPEX to OPEX, meaning there’s no more upfront investment in costly storage devices that will eventually need to be upgraded as data grows.
The cloud also gives customers the flexibility to deploy analytics services in a do-it-yourself model using IaaS or by subscribing to any of the popular big data analytics services available in the PaaS model. Irrespective of the approach, the big data storage layer becomes the key differentiating factor for successful analytics. If the storage layer is not designed to keep up with the demands of the analytics compute layer, the outcomes will be less than optimal.
Big data analytics needs the storage layer to handle both performance demands and growing capacity demands, all while ensuring the reliability and security of the stored data. Moving data to the cloud is a first step, but choosing the right storage can still be a challenge.
Let’s explore some of the key considerations to take into account.
Consistent high performance: Big data analytics applications are usually compute intensive, and their results are time sensitive. This calls for consistently high performance to serve high-IOPS storage calls. The cloud offers multiple storage options to meet varying performance demands; identifying the best-fit option for your big data analytics application can make or break the desired outcomes.
Uptime and reliability: Keeping the lights on for the storage layer is critical for all analytics applications. Storage should be highly resilient and designed for high availability. When hosted in the cloud, storage should be protected across availability zones (AZs) to withstand possible service provider outages.
Ease of management: Organizations running big data analytics workloads often have more than one analytics application in their portfolio. Managing storage for all of these applications carries operational overhead, often involving hopping between multiple management panes. As the data estate grows, an already complex process becomes even more complicated.
Lower cost of ownership: Leveraging cloud storage for data analytics applications helps convert CAPEX to OPEX. Though you no longer have to invest upfront in storage devices, you might still end up paying substantial monthly cloud storage charges as data grows incrementally, which is usually the case with analytics applications. Hence it is important to use a data management solution that can optimize the storage being used.
Data protection: For big data analytics, the integrity and protection of data is of utmost importance. Corrupted data can compromise the usability of the insights delivered by the analytics application, while data loss can delay the entire process. In the event of data corruption or a malware infection, data should be easily recoverable to avoid loss of business value.
Hybrid and multicloud architecture: Rather than sticking to a single cloud service provider, many organizations are leaning towards hybrid and multicloud architectures that make it possible to leverage the best services available on every platform. However, this also means that data will be spread across multiple heterogeneous environments, making it difficult to gain full visibility into the data estate or manage it seamlessly, when it comes to provisioning, scaling, syncing, or periodic maintenance.
Data mobility and automation: When there are multiple environments for development, staging, and production, the storage layer should enable data mobility between them. Data mobility also becomes important during cloud adoption, when bulk data must be moved from on-premises systems to the cloud. How easily the process can be automated is a key differentiator when selecting a storage service for analytics applications.
Cloud Volumes ONTAP is a data management platform that delivers trusted NetApp ONTAP capabilities on Azure, GCP, and AWS. It is built on top of the native cloud storage layer, providing an enhanced experience through NetApp proprietary features, especially for specialized workloads like big data analytics. Cloud Volumes ONTAP provides file storage over NFS and SMB/CIFS as well as block storage over iSCSI, which can be used to store the large volumes of data consumed by analytics applications. It helps address the storage challenges organizations face with big data analytics in the cloud.
Assured performance: The Cloud Volumes ONTAP Flash Cache feature can be used to keep recently used data in local NVMe storage. This helps serve read requests faster without accessing the underlying storage layer, thereby enhancing the performance of analytics applications.
The process is transparently managed by Cloud Volumes ONTAP with no additional overhead for the customer. Configuring multiple LIFs (logical interfaces) on Cloud Volumes ONTAP helps leverage multipath I/O for the large-data-set sequential reads that analytics applications require, improving overall throughput. Cloud Volumes ONTAP also provides a high write speed setting that allows data to be written directly to cache to increase sequential write performance; the data is later committed to the underlying block storage.
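As a rough, platform-agnostic illustration, an analytics client can keep several sequential read streams in flight at once so that multipath I/O across LIFs has parallel streams to balance. The file paths and chunk size below are placeholders:

```python
# Illustrative sketch: reading several large files concurrently so that
# multipath I/O across LIFs has parallel sequential streams to balance.
# Paths and chunk size are placeholders, not Cloud Volumes ONTAP specifics.
from concurrent.futures import ThreadPoolExecutor

CHUNK = 8 * 1024 * 1024  # 8 MiB sequential reads

def sequential_read(path: str) -> int:
    """Stream a file sequentially and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total

files = [f"/mnt/analytics/raw/part-{i:04d}.parquet" for i in range(8)]

# One thread per file keeps multiple sequential streams in flight.
with ThreadPoolExecutor(max_workers=4) as pool:
    for path, size in zip(files, pool.map(sequential_read, files)):
        print(f"{path}: {size} bytes")
```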
High availability: Cloud Volumes ONTAP can be deployed in HA pairs across availability zones in the cloud to avoid a single point of failure and achieve RPO=0 and RTO < 60 seconds. Even if one of the Cloud Volumes ONTAP nodes goes down, your access to the storage layer is never interrupted. Data stays synchronized between the two nodes, ensuring the data availability and integrity crucial for big data analytics in the cloud.
Single management pane and automation: NetApp Cloud Manager provides a single management pane for Cloud Volumes ONTAP, allowing you to manage volumes across on-premises and multicloud deployments. Irrespective of where the data resides, the same processes and interfaces can be used for managing the data volumes. And for developers, Cloud Manager APIs can be used to automate the storage lifecycle management and integrate it with your existing DevOps practices.
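As a hedged sketch of what such automation can look like, the snippet below lists the volumes in a Cloud Volumes ONTAP working environment over the Cloud Manager REST API using Python's requests library. The endpoint path, parameters, and identifiers are illustrative assumptions; consult the Cloud Manager API documentation for the exact request shapes:

```python
# Hedged sketch of automating volume operations through the Cloud Manager
# REST API. The endpoint path, parameters, and token handling below are
# illustrative assumptions, not verified API details.
import requests

API_BASE = "https://cloudmanager.cloud.netapp.com"   # assumed base URL
TOKEN = "eyJ..."                                     # OAuth bearer token, obtained out of band
WORKING_ENV_ID = "VsaWorkingEnvironment-xxxxxxxx"    # placeholder ID

headers = {"Authorization": f"Bearer {TOKEN}"}

# List the volumes in a Cloud Volumes ONTAP working environment
# (illustrative endpoint path).
resp = requests.get(
    f"{API_BASE}/occm/api/vsa/volumes",
    params={"workingEnvironmentId": WORKING_ENV_ID},
    headers=headers,
    timeout=30,
)
resp.raise_for_status()

for vol in resp.json():
    print(vol.get("name"), vol.get("size"))
```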
Storage efficiency: If big data is anything, it's big. Cloud Volumes ONTAP's storage efficiency features, such as thin provisioning, data deduplication, and compression, can reduce the overall storage footprint and the associated monthly charges by up to 70%.
With Cloud Volumes ONTAP data tiering, inactive data is automatically tiered to less-expensive cloud object storage, further reducing the overall costs for your data when you aren't running analytics jobs. The FlexClone® feature can be used to create writable clones of Cloud Volumes ONTAP volumes that consume storage only for changes made to the clone. This enables agile deployment of new environments for analytics without burning a hole in your cloud storage budget.
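To illustrate, here is a hedged sketch of creating a FlexClone volume through the ONTAP REST API that a Cloud Volumes ONTAP system exposes. The hostname, credentials, SVM, and volume names are placeholders, and the request body should be verified against the ONTAP REST API reference for your version:

```python
# Hedged sketch: creating a FlexClone through the ONTAP REST API.
# Hostname, credentials, SVM, and volume names are placeholders.
import requests

ONTAP = "https://cvo-cluster-mgmt.example.com"  # placeholder management LIF
AUTH = ("admin", "password")                    # placeholder credentials

body = {
    "name": "analytics_data_clone",
    "svm": {"name": "svm_analytics"},
    "clone": {
        "is_flex_clone": True,
        "parent_volume": {"name": "analytics_data"},
    },
}

resp = requests.post(
    f"{ONTAP}/api/storage/volumes",
    json=body,
    auth=AUTH,
    verify=False,   # lab-only; use CA-signed certificates in production
    timeout=30,
)
resp.raise_for_status()
print("Clone creation accepted:", resp.json())
```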
Data protection: NetApp Snapshot™ technology creates rapid, point-in-time copies of Cloud Volumes ONTAP volumes that can be used to quickly recover data in the event of data loss or corruption. Snapshot copies are application consistent and highly storage efficient. NetApp SnapMirror® technology can be used to replicate data to multiple environments for DR purposes. This block-level, incremental replication is highly storage efficient and helps you recover your analytics applications at an alternate DR site, should the primary site become unavailable. Cloud Volumes ONTAP can also protect your on-premises data with the help of Cloud Backup Service.
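As a brief illustration, the following hedged sketch creates a Snapshot copy of a volume via the ONTAP REST API, for example just before an ETL run. Hostname, credentials, and names are placeholders; confirm the endpoints against the ONTAP REST API reference:

```python
# Hedged sketch: creating a Snapshot copy via the ONTAP REST API.
# Hostname, credentials, and volume name are placeholders.
import requests

ONTAP = "https://cvo-cluster-mgmt.example.com"  # placeholder management LIF
AUTH = ("admin", "password")                    # placeholder credentials

# Look up the volume UUID by name.
vols = requests.get(
    f"{ONTAP}/api/storage/volumes",
    params={"name": "analytics_data"},
    auth=AUTH, verify=False, timeout=30,
).json()["records"]
uuid = vols[0]["uuid"]

# Create a point-in-time Snapshot copy before the next analytics run.
resp = requests.post(
    f"{ONTAP}/api/storage/volumes/{uuid}/snapshots",
    json={"name": "pre_etl_run"},
    auth=AUTH, verify=False, timeout=30,
)
resp.raise_for_status()
print("Snapshot request accepted:", resp.status_code)
```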
The valuable insights derived from big data and analytics can become key enablers of success for many organizations. But gaining those insights depends on a performant, cost-efficient storage layer.
Leveraging Cloud Volumes ONTAP for big data storage in the cloud helps streamline this process. Data analytics in the cloud is the future, and Cloud Volumes ONTAP, with its advanced data management capabilities, high availability, and unparalleled storage efficiency, can make an investment in the cloud even more valuable.