NoSQL databases are revolutionizing database deployment, moving many organizations away from more traditional database models. However, differences in the way NoSQL systems operate can make it challenging to build NoSQL cloud databases at the enterprise level.
In this entry of our databases in the cloud series, we will discuss the NoSQL architecture and then consider how NetApp’s Cloud Volumes ONTAP can help NoSQL in cloud deployments. We will also explore the way in which Cloud Volumes ONTAP supports big data analytics, a typical use case for NoSQL databases in the cloud.
NoSQL database systems represent a paradigm shift from traditional, relational databases, which manifests itself in two overarching areas. Firstly, NoSQL databases primarily make use of non-relational data structures, such as graphs, key-value maps, and semi-structured documents in formats like JSON and XML.
Secondly, whereas an RDBMS normally scales vertically, NoSQL systems adopt a horizontal, scale-out strategy, which allows them to work with larger volumes of data and improves the availability and performance of the database. Building a NoSQL database cluster in the cloud therefore involves a significant amount of compute and storage management.
With NoSQL databases, each node in a cluster requires access to its own block-level storage allocations, with new compute and storage allocated as the cluster expands. Most relational database systems simply scale to larger compute hosts, possibly with the addition of read-only database replicas.
This disparity in cloud resource usage between NoSQL and relational databases requires DBAs and cloud architects to re-evaluate how database systems are deployed.
MongoDB, Apache Cassandra, Hadoop, and Couchbase are some of the prominent platforms in the NoSQL and big data space. Each of these systems is deployed as a cluster of nodes that work together to provide high availability and performance at scale. Data is distributed across the database nodes automatically through a process called sharding, which is usually based on a hashed value of a set of fields in each data record. To remove single points of failure, each node replicates its own data to a number of other nodes in the cluster, as determined by the replication factor. Automatic data distribution and built-in redundancy are among the main advantages of NoSQL.
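To make sharding and the replication factor concrete, here is a minimal Python sketch. It is not the placement algorithm of any particular database: the function names `shard_for` and `replicas_for`, the use of MD5 as the hash, and the "next nodes in order" replica placement are all assumptions chosen for illustration.

```python
import hashlib
from typing import List


def shard_for(key: str, num_nodes: int) -> int:
    """Map a record key to a primary node index using a stable hash.

    MD5 is used here only because it is deterministic across runs;
    real systems choose their own partitioning hash.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def replicas_for(key: str, num_nodes: int, replication_factor: int) -> List[int]:
    """Return the primary node followed by the nodes holding the extra copies.

    Assumption: replicas are placed on the nodes that follow the primary
    in index order, wrapping around at the end of the cluster.
    """
    if not 1 <= replication_factor <= num_nodes:
        raise ValueError("replication_factor must be between 1 and num_nodes")
    primary = shard_for(key, num_nodes)
    return [(primary + i) % num_nodes for i in range(replication_factor)]


if __name__ == "__main__":
    # A hypothetical 5-node cluster with a replication factor of 3.
    for record_key in ("user:42", "order:1001", "sensor:7"):
        print(record_key, "->", replicas_for(record_key, num_nodes=5, replication_factor=3))
```

With a replication factor of 3, every record lives on three distinct nodes, so the loss of any single node leaves at least two copies available.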
Migrating data from on-premises NoSQL database clusters to the cloud requires the use of tools and processes specific to each database platform. Most mature NoSQL database systems support a form of cross-datacenter replication that is used to create a second cluster and keep it incrementally synchronized with the primary site. A failover to the second cluster is used to switch database operations over to the new location.
Large NoSQL database clusters are able to utilize the aggregate processor, memory, and storage resources of all participating nodes. When a node fails, the database system remains operational, and a new replacement node can be added back into the cluster. As the new node does not contain any data, the database cluster will rebalance data onto the node from the rest of the cluster. This operation, however, can take time to complete and system performance may degrade while it is taking place. Rebalancing also occurs when new nodes are added to grow a cluster.
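The cost of rebalancing can be illustrated with a generic consistent-hashing ring, a technique commonly used to limit how much data moves when cluster membership changes. The sketch below is an assumption-laden illustration, not the rebalancing logic of any specific product: `HashRing`, `keys_moved_by_adding`, the node names, and the choice of 64 virtual nodes per host are all hypothetical.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _token(value: str) -> int:
    """Stable 64-bit position on the ring for a node label or record key."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._tokens: List[int] = []        # sorted ring positions
        self._owners: Dict[int, str] = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `vnodes` positions for the node on the ring."""
        for i in range(self._vnodes):
            token = _token(f"{node}#{i}")
            self._owners[token] = node
            bisect.insort(self._tokens, token)

    def owner(self, key: str) -> str:
        """A key belongs to the first ring position after its own token."""
        idx = bisect.bisect_right(self._tokens, _token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


def keys_moved_by_adding(nodes: List[str], new_node: str, keys: List[str]) -> List[str]:
    """Return the keys whose owner changes when `new_node` joins the cluster."""
    ring = HashRing(nodes)
    before = {key: ring.owner(key) for key in keys}
    ring.add_node(new_node)
    return [key for key in keys if ring.owner(key) != before[key]]


if __name__ == "__main__":
    keys = [f"record-{i}" for i in range(10_000)]
    moved = keys_moved_by_adding(["node-a", "node-b", "node-c"], "node-d", keys)
    print(f"{len(moved) / len(keys):.1%} of keys move to the new node")
```

In this scheme only the keys that land on the new node are relocated; the rest stay where they are. Even so, copying that share of a large dataset across the network is what makes rebalancing slow and can degrade performance while it runs.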
Working with large volumes of data dispersed around a sizable cluster of database nodes makes administrative operations, such as backup and restore, creating database test environments, and storage management, all more complicated than with a traditional database system.
Managing storage with Cloud Volumes ONTAP provides a wide range of benefits for NoSQL deployments in the cloud using AWS, Azure, or Google Cloud storage.
NoSQL in cloud deployments is frequently used for big data management and analytics projects. Organizations analyze huge datasets in order to uncover hidden patterns, gain insights, and improve business decisions.
Cloud Volumes ONTAP serves out block-level storage, as is used by NoSQL database systems, and also NFS and SMB file shares, which can be used to store the large datasets consumed by cloud-based analytics services. Apache Hadoop can connect through to NFS storage using NetApp In-place Analytics. This allows data files to be stored in a single, central repository and accessed uniformly by all users and services.
Each compute node in an Apache Hadoop cluster normally stores a part of the full dataset to be processed, in a similar way to NoSQL database systems. Separating out the data storage, however, by using Cloud Volumes ONTAP to create a data lake brings benefits of its own.
NoSQL database clusters benefit from storage systems that support advanced features for managing block-level storage. Cloud Volumes ONTAP provides unparalleled levels of storage management in the cloud, catering for block-level storage for NoSQL database systems, as well as NFS shares that can be accessed directly from cloud big data analytics clusters.
The built-in storage efficiency features have a direct impact on costs for NoSQL in cloud deployments. The data protection and flexibility provided by features such as snapshots and FlexClone® give NoSQL database administrators and big data engineers the power to manage large volumes of data effectively.
For more in our cloud databases series, check out the previous entries on database challenges, SQL, and Oracle, as well as the next part, which will focus on database storage tiering in the cloud.