What is Azure NoSQL?
NoSQL databases are databases based on data models other than relational tables. Types of NoSQL databases include key-value, document, graph, and wide-column. These databases are becoming more popular as organizations create larger volumes and a greater variety of unstructured data.
In Microsoft Azure, there are multiple options for NoSQL databases and a variety of ways to host or deploy them. Azure's big data offerings for NoSQL include Cosmos DB APIs compatible with MongoDB, Cassandra, and Gremlin (a graph query language), as well as Azure Table Storage.
In this article, you will learn:
- Types of Azure NoSQL Databases
- Azure NoSQL Managed Database with Cosmos DB
- Azure Cosmos DB Tips and Tricks
- Azure NoSQL with Cloud Volumes ONTAP
Types of Azure NoSQL Databases
Azure provides options for four types of NoSQL databases: key-value, document, columnar, and graph. Below is an explanation of how these types differ and which Azure services support each one.
Key-Value Databases
Key-value databases store pairs of keys and values, typically in hash tables. You assign each value to a unique key and later retrieve the value by that key. A key-value database can store any number of values and treats each value as opaque: any schema is defined and interpreted by the applications that use the database.
Key-value databases suit applications that run simple lookups. However, they are not designed for querying by value, so they are less suitable for applications that need to query data by its contents. Their main advantage is scalability, which usually comes from how easily data can be distributed across multiple nodes on separate machines.
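The lookup-versus-query trade-off above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an Azure API: the `PartitionedKeyValueStore` class and its methods are hypothetical, each in-memory dict stands in for a separate node, and a stable SHA-256 hash of the key decides which node owns it. A lookup by key touches exactly one node, while a query on the values has no index to use and must scan every node.

```python
import hashlib
from typing import Any, Callable, Optional


class PartitionedKeyValueStore:
    """Hypothetical toy store that spreads keys across 'nodes' by hashing each key."""

    def __init__(self, node_count: int) -> None:
        # Each dict stands in for a separate machine holding one partition.
        self.nodes: list[dict[str, Any]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> dict[str, Any]:
        # Stable hash (unlike Python's per-process randomized hash()) so placement is repeatable.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value: Any) -> None:
        # The value is opaque to the store; the application decides its shape.
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookup by key goes straight to the one node that owns the key.
        return self._node_for(key).get(key)

    def find_by_value(self, predicate: Callable[[Any], bool]) -> list[str]:
        # Querying by value has no index to help: every node must be scanned.
        return [k for node in self.nodes for k, v in node.items() if predicate(v)]


store = PartitionedKeyValueStore(node_count=3)
store.put("session:42", {"user": "alice", "cart": ["sku-1"]})
store.put("session:43", {"user": "bob", "cart": []})
print(store.get("session:42"))                             # {'user': 'alice', 'cart': ['sku-1']}
print(store.find_by_value(lambda v: v["user"] == "bob"))  # ['session:43']
```

Simple modulo hashing is used here for clarity; many distributed stores use schemes such as consistent hashing instead, so that changing the number of nodes moves fewer keys.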
Here are key use cases for key-value databases:
- Session management
- Data caching
- Product recommendation
- User preferences
- Serving ads
- Profile management
Key-value databases offered on Azure: Azure Table Storage (also available through the Cosmos DB Table API)
Document Databases
Document databases work like key-value databases, except that instead of individual values they store whole documents, arranged in groups or collections. Each document contains fields (key-value pairs) that you can query to access information. Documents can be stored in a variety of formats, including JSON, YAML, or XML, and each document can have a different structure.
A document can hold different data types, and documents can be organized using different structures. However, each document should typically contain the data for a single entity, such as an order or a customer. A single document can hold information that, in an RDBMS, would be spread across multiple relational tables.
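To illustrate, the following Python sketch models a small collection of order documents as plain dictionaries standing in for JSON. It is not the MongoDB or Cosmos DB API; `get_path` and `find` are hypothetical helpers that match a possibly nested field, loosely in the spirit of a document query. Each document holds one order, the two documents have different shapes, and the embedded customer and line items would typically occupy separate tables in a relational schema.

```python
from typing import Any

# A "collection" of JSON-like documents. Each document holds one entity (an order),
# and the documents do not all share the same structure.
orders: list[dict[str, Any]] = [
    {
        "id": "order-1",
        "customer": {"name": "Alice", "email": "alice@example.com"},
        # Embedded line items; a relational schema would usually put these in their own table.
        "items": [{"sku": "sku-1", "qty": 2}, {"sku": "sku-7", "qty": 1}],
        "status": "shipped",
    },
    {
        "id": "order-2",
        "customer": {"name": "Bob"},  # no "email" field in this document
        "items": [{"sku": "sku-7", "qty": 5}],
        "status": "pending",
        "gift_message": "Happy birthday!",  # a field the first document lacks
    },
]

_MISSING = object()  # sentinel so a missing field never matches a queried value


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Follow a dot-separated path such as "customer.name" into a nested document."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(collection: list[dict[str, Any]], path: str, value: Any) -> list[dict[str, Any]]:
    """Return every document whose field at `path` equals `value` (a full scan, no index)."""
    return [doc for doc in collection if get_path(doc, path) == value]


print([d["id"] for d in find(orders, "customer.name", "Bob")])  # ['order-2']
print([d["id"] for d in find(orders, "status", "shipped")])     # ['order-1']
```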
Here are key use cases for document databases:
- Content management
- Product catalog
- Inventory management
Document database offered on Azure: MongoDB (via the Cosmos DB managed service)
Columnar Databases
Columnar databases, also known as column-family or wide-column databases, store data in tables of rows and columns. Unlike a relational database, however, they take a denormalized approach that suits sparse data. This lets you store multiple tables within a database without requiring every table or row to contain the same columns.
While key-value and document databases typically use hashing to decide where data is stored, most column-family databases store data in key order. Columnar implementations typically let you create indexes on specific columns in a column family, so you can retrieve data by column value instead of by row key.
In columnar databases, read and write operations on a row are typically atomic only within a single column family. Some databases provide atomicity across the entire row, spanning multiple column families, but this is not the typical case.
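Three of the properties described above (sparse rows, storage in row-key order, and optional indexes on column values) can be sketched with a toy in-memory column family in Python. The `ColumnFamilyTable` class, its methods, and the sensor-style row keys are hypothetical and do not correspond to Cassandra's or Azure's APIs. Row keys are kept sorted with `bisect`, so a range scan over a key prefix is cheap, while a secondary index maps each indexed value back to row keys.

```python
import bisect
from collections import defaultdict


class ColumnFamilyTable:
    """Toy column family: rows stay in row-key order, and each row stores only its own columns."""

    def __init__(self) -> None:
        self._keys: list[str] = []                          # row keys, kept sorted
        self._rows: dict[str, dict[str, str]] = {}          # row key -> {column: value}
        self._indexes: dict[str, dict[str, set[str]]] = {}  # column -> value -> row keys

    def create_index(self, column: str) -> None:
        """Build a secondary index so rows can be found by this column's value."""
        index: dict[str, set[str]] = defaultdict(set)
        for key, row in self._rows.items():
            if column in row:
                index[row[column]].add(key)
        self._indexes[column] = index

    def put(self, row_key: str, columns: dict[str, str]) -> None:
        """Insert or update some columns of one row, keeping any indexes in sync."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # preserve key order on insert
            self._rows[row_key] = {}
        row = self._rows[row_key]
        for column, value in columns.items():
            index = self._indexes.get(column)
            if index is not None and column in row:
                index[row[column]].discard(row_key)  # drop the stale index entry
            row[column] = value
            if index is not None:
                index[value].add(row_key)

    def scan(self, start: str, end: str) -> list[tuple[str, dict[str, str]]]:
        """Return rows with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]

    def lookup(self, column: str, value: str) -> list[str]:
        """Find row keys by column value through the secondary index."""
        return sorted(self._indexes[column].get(value, set()))


table = ColumnFamilyTable()
table.create_index("status")
# Rows are sparse: each reading stores only the columns it actually has.
table.put("sensorA#0001", {"temp": "21.5", "status": "ok"})
table.put("sensorA#0002", {"temp": "22.1", "humidity": "40", "status": "ok"})
table.put("sensorB#0001", {"status": "offline"})

print([key for key, _ in table.scan("sensorA#", "sensorA#~")])  # ['sensorA#0001', 'sensorA#0002']
print(table.lookup("status", "ok"))                               # ['sensorA#0001', 'sensorA#0002']
```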
Here are key use cases for columnar databases:
- Activity monitoring
- Social media analytics
- Sensor data
- Weather and other time-series data
- Web analytics
Columnar database offered on Azure: Cassandra (via the Cosmos DB managed service)
Graph Databases
Graph databases map the relationships between data using nodes and edges. Nodes represent data entities, and edges represent the relationships between them; edges can be directional. These databases are designed to represent intricately related or hierarchical data structures.
Graph databases run queries over the network of nodes and edges, which lets them analyze interconnected relationships directly. Even on large graphs with many entities and relationships, they can perform highly complex analyses quickly. Graph databases also often provide a query language designed for efficiently traversing networks of relationships.
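A relationship traversal of the kind graph query languages express can be sketched in plain Python. In this hypothetical example (the people and the `follows` relationship are made up, and this is not Gremlin or any Cosmos DB API), directed edges are stored as an adjacency list, and a breadth-first search finds everyone reachable within a given number of hops: the "friends of friends" question a social graph or recommendation engine asks.

```python
from collections import deque

# Directed edges: (from_node, relationship, to_node). All names are made up.
edges = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "dave"),
    ("alice", "follows", "erin"),
    ("erin", "follows", "carol"),
]

# Adjacency list: node -> list of (relationship, neighbor).
graph: dict[str, list[tuple[str, str]]] = {}
for src, rel, dst in edges:
    graph.setdefault(src, []).append((rel, dst))


def reachable_within(start: str, max_hops: int, relationship: str) -> dict[str, int]:
    """Breadth-first traversal along edges of one relationship type.

    Returns each reachable node (excluding start) mapped to its minimum hop count.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, neighbor in graph.get(node, []):
            if rel == relationship and neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


print(reachable_within("alice", 2, "follows"))  # {'bob': 1, 'erin': 1, 'carol': 2}
```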
Here are key use cases for graph databases:
- Fraud detection
- Social graphs
- Recommendation engines
- Organization charts
Graph database offered on Azure: Gremlin API (via the Cosmos DB managed service)
Azure NoSQL Managed Database with Cosmos DB
Azure's most comprehensive NoSQL service is Cosmos DB, a managed service that includes engines and APIs for a variety of database models. Cosmos DB includes features for transparent multi-master replication, turnkey global distribution, automatic scaling, and multiple consistency levels. Its API options include the following:
Gremlin API in Azure Cosmos DB
The Gremlin API provides graph database capabilities based on Apache TinkerPop, a graph computing framework. It lets you scale your database elastically with automatic graph partitioning, and it indexes data automatically. The API uses the Gremlin query syntax, so you can run real-time queries without specifying views, secondary indexes, or schema hints.
MongoDB API in Azure Cosmos DB
The MongoDB API implements MongoDB's wire protocol on top of Cosmos DB, so you can use native MongoDB tools, drivers, and client SDKs. With this API you can migrate existing applications with minimal modifications and keep your applications vendor agnostic.
Table API in Azure Cosmos DB
The Table API lets applications written for Azure Table Storage use Cosmos DB features, such as dedicated throughput, global distribution, and single-digit millisecond latencies. With this API, you can migrate applications without code changes. The API includes client SDKs for Node.js, Python, Java, and .NET.
Cassandra API in Azure Cosmos DB
The Cassandra API lets existing Cassandra applications connect to Cosmos DB and query data using the Cassandra Query Language (CQL). The API also provides persistent change logging, compliance certifications, and multiple consistency levels.
Azure Cosmos DB Tips and Tricks
Use Request Units (RUs) to Define Throughput
To configure the performance of containers and databases in Azure Cosmos DB, you define their throughput in Request Units (RUs). An RU is a normalized measure of the cost of a database operation (for example, reading a 1 KB item by its ID and partition key costs 1 RU), and provisioned throughput is expressed in RUs per second (RU/s).
RUs let you size throughput for even highly demanding workloads, whether the throughput is provisioned on a database or on an individual container (collection). The cloud vendor manages the hardware; your responsibility is to assess your performance needs and decide how many RUs to provision for each container.
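As a rough illustration of sizing, the Python sketch below turns an expected request rate into an RU/s figure. Only the 1 RU cost of a 1 KB point read comes from Microsoft's documentation; the write cost, the 20% headroom, and the 400 RU/s minimum with 100 RU/s steps are assumptions for illustration, and `estimate_rus` is a hypothetical helper. Check the current Cosmos DB documentation and measure the request charge your own operations actually report.

```python
import math


def estimate_rus(
    reads_per_sec: float,
    writes_per_sec: float,
    ru_per_read: float = 1.0,     # ~1 RU for a point read of a 1 KB item
    ru_per_write: float = 5.0,    # assumption: writes usually cost several RUs
    headroom_percent: int = 20,   # assumption: spare capacity for traffic spikes
    minimum: int = 400,           # assumption: minimum manual RU/s per container
    step: int = 100,              # assumption: RU/s are provisioned in steps of 100
) -> int:
    """Rough RU/s to provision for one container (a sizing sketch, not a billing tool)."""
    steady_state = reads_per_sec * ru_per_read + writes_per_sec * ru_per_write
    with_headroom = steady_state * (100 + headroom_percent) / 100
    rounded_up = math.ceil(with_headroom / step) * step  # round up to the next step
    return max(minimum, rounded_up)


print(estimate_rus(reads_per_sec=500, writes_per_sec=100))  # 1200
print(estimate_rus(reads_per_sec=10, writes_per_sec=2))     # 400 (the assumed minimum)
```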
RUs and Billing
RUs can help you control and optimize your billing: because you pay for the throughput you provision, the minimum throughput you define for each Cosmos DB container essentially defines your bill. The official Azure Cosmos DB pricing page provides a Request Unit Calculator that you can use to estimate how many RUs you might need and how much they will cost.
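The billing relationship can be expressed as simple arithmetic. In this Python sketch, `price_per_100_rus_per_hour` is a hypothetical input (take the actual rate for your region and offer from the official pricing page), 730 is a common approximation of hours per month, and the assumption that provisioned throughput is billed once for every region the account replicates to should be confirmed against the pricing documentation. Storage and other charges are ignored, and `monthly_throughput_cost` is a hypothetical helper.

```python
def monthly_throughput_cost(
    provisioned_rus: int,
    price_per_100_rus_per_hour: float,  # hypothetical input: use the published rate
    regions: int = 1,                   # assumption: RU/s are billed in each region
    hours_per_month: float = 730.0,     # approximate number of hours in a month
) -> float:
    """Approximate monthly cost of provisioned throughput only (storage excluded)."""
    billable_units = provisioned_rus / 100  # throughput is priced per 100 RU/s
    return billable_units * price_per_100_rus_per_hour * hours_per_month * regions


# 1,200 RU/s at a made-up rate of 0.01 per 100 RU/s per hour, replicated to two regions.
print(round(monthly_throughput_cost(1200, 0.01, regions=2), 2))
```

With these made-up inputs the estimate is 12 × 0.01 × 730 × 2 ≈ 175.2 in the currency of the rate; doubling the regions doubles the throughput portion of the bill under the stated assumption.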
Use Geo-Replication
Azure Cosmos DB can manage data at massive scale, distributed globally. Here are key benefits of using geo-replication:
- Lower latency and higher availability: within a single Azure region, regional replicas ensure the RUs provisioned on a container are reserved. Geo-replication places data closer to your applications, lowering latency and increasing availability.
- Writes and reads in every region: Azure Cosmos DB uses a multi-master replication protocol, so every region supports both writes and reads. Applications can send each request to the nearest region.
- Availability: when you geo-replicate data, it remains available even if a region becomes unavailable. In that case, other regions automatically take over the requests that the unavailable region would have served.
- Simple controls: in Azure Cosmos DB, adding and removing regions takes a few clicks. Select the regions you want, then either trigger a manual failover or configure automatic failover priorities.
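In practice, request routing and failover are handled by the Cosmos DB service and its client SDKs; the Python sketch below only models the two decisions described in the list above, under stated assumptions. The `Region` class and both functions are hypothetical, and the latencies are made-up values rather than measurements. `nearest_available` picks the closest healthy region (as when every region accepts writes), and `failover_target` walks an ordered priority list (as with automatic failover priorities).

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    latency_ms: float       # hypothetical round-trip time from the client
    available: bool = True


def nearest_available(regions: list[Region]) -> str:
    """With writes and reads allowed in every region, use the closest healthy one."""
    healthy = [r for r in regions if r.available]
    if not healthy:
        raise RuntimeError("no region is available")
    return min(healthy, key=lambda r: r.latency_ms).name


def failover_target(priority: list[str], regions: list[Region]) -> str:
    """Pick the highest-priority region that is still available."""
    status = {r.name: r.available for r in regions}
    for name in priority:
        if status.get(name, False):
            return name
    raise RuntimeError("no region is available")


regions = [
    Region("West Europe", latency_ms=12.0),
    Region("East US", latency_ms=85.0),
    Region("Southeast Asia", latency_ms=160.0),
]
priority = ["West Europe", "Southeast Asia", "East US"]

print(nearest_available(regions))          # West Europe
regions[0].available = False               # simulate a regional outage
print(nearest_available(regions))          # East US
print(failover_target(priority, regions))  # Southeast Asia
```

The two functions can disagree on purpose: latency-based routing favors the closest region, while a failover priority list lets you choose the order explicitly, for example to keep traffic within a preferred geography.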
Azure NoSQL with Cloud Volumes ONTAP
NetApp Cloud Volumes ONTAP, the leading enterprise-grade storage management solution, delivers secure, proven storage management services on AWS, Azure and Google Cloud. Cloud Volumes ONTAP supports up to 368 TB of capacity and a variety of use cases, such as file services, databases, DevOps, or any other enterprise workload, with a strong set of features including high availability, data protection, storage efficiencies, Kubernetes integration, and more.
In particular, Cloud Volumes ONTAP helps address the challenges of running database workloads in the cloud, filling the gap between your cloud-based database's capabilities and the public cloud resources it runs on.
Cloud Volumes ONTAP supports advanced features for managing SAN storage in the cloud, catering to NoSQL database systems, as well as NFS shares that can be accessed directly from cloud big data analytics clusters.
In addition, its built-in storage efficiency features directly reduce the cost of NoSQL deployments in the cloud. The data protection and flexibility provided by features such as snapshots and data cloning give NoSQL database administrators and big data engineers the power to manage large volumes of data effectively.