December 1, 2020
Topics: Cloud Volumes ONTAP, Azure, Database, Elementary, 7 minute read, Analytics
What are Azure Analytics Services?
The Microsoft Azure cloud provides a range of managed services that can help your organization ingest, process, and analyze big data using a variety of technologies and approaches, including machine learning, Hadoop and Apache Spark, stream processing, and business intelligence (BI).
Azure analytics services are offered in several deployment models, including Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). The services integrate with other Microsoft services as well as with third-party tools.
Related content: read our guide to Azure big data.
In this article, you will learn:
- Azure Big Data Architecture
- Azure Analytics Services
- Azure Analytics with NetApp Cloud Volumes ONTAP
- Learn More About Azure Database
Azure Big Data Architecture
Big data architectures are complex and vary according to each organization's needs and design. However, there are certain logical components that should be built into the architecture.
The following diagram shows how the logical components work together in a big data architecture. Note that not all solutions use all components.
Image Source: Azure
- Data storage: Azure offers a dedicated service, Azure Data Lake Store, that provides unlimited, low-cost data storage. Blob containers in Azure Storage are a good alternative.
- Batch processing: big data solutions often use long-running batch processing jobs to aggregate, filter, and prepare data for analysis. To do this in Azure, you can run U-SQL jobs in Azure Data Lake Analytics. Alternatively, you can use Pig, Hive, or MapReduce in an HDInsight Hadoop cluster, or use HDInsight Spark.
- Real-time message ingestion: solutions with real-time sources usually need a specialized mechanism for message ingestion, known as stream buffering. This mechanism buffers incoming messages, which supports reliable delivery and lets message consumers scale out. You can do this using Azure Event Hubs, Azure IoT Hub, or Kafka.
- Stream processing: real-time messages need to be filtered, aggregated, and prepared for analysis, then written to an output sink. Azure Stream Analytics offers managed stream processing based on SQL queries. Another option is Storm or Spark Streaming in an HDInsight cluster.
- Analysis and reporting: your big data architecture should include a data modeling layer, such as a tabular data model or a multidimensional OLAP cube. Azure Analysis Services provides this layer. For full BI analysis, you can use Microsoft Power BI. Data scientists on your team can use Jupyter notebooks, Python, or R with Microsoft R Server.
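To make the stream buffering idea concrete, here is a minimal sketch in plain Python. It does not use Azure Event Hubs, IoT Hub, or Kafka. A bounded in-memory queue stands in for the message buffer, worker threads stand in for scaled-out consumers, and a list stands in for the output sink. The function name, message shape, and parameters are assumptions made for illustration.

```python
import queue
import threading


def run_buffered_ingestion(messages, num_workers=2, buffer_size=100):
    """Push messages through a bounded buffer to a pool of worker threads."""
    buffer = queue.Queue(maxsize=buffer_size)  # bounded: the producer blocks when full
    sink = []                                  # stand-in for an output sink
    sink_lock = threading.Lock()
    stop = object()                            # sentinel telling a worker to exit

    def worker():
        while True:
            msg = buffer.get()
            if msg is stop:
                break
            # Filter and prepare: keep only messages with a numeric value.
            value = msg.get("value")
            if isinstance(value, (int, float)):
                with sink_lock:
                    sink.append({"device": msg["device"], "value": float(value)})

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for msg in messages:
        buffer.put(msg)        # ingestion side
    for _ in workers:
        buffer.put(stop)       # one sentinel per worker
    for w in workers:
        w.join()
    return sink


sample = [
    {"device": "a", "value": 1},
    {"device": "b", "value": "bad"},   # dropped by the filter
    {"device": "c", "value": 2.5},
]
prepared = run_buffered_ingestion(sample)
```

Because several workers drain the buffer concurrently, the order of records in the sink is not guaranteed. A managed service handles partitioning, ordering, and durability for you, which this sketch deliberately leaves out.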
Azure Analytics Services
Azure Synapse Analytics
Azure Synapse combines enterprise data warehousing with big data analytics. This analytics service lets organizations query data on their own terms, at scale, using either serverless on-demand resources or provisioned resources. Azure Synapse also provides a centralized interface for data ingestion, preparation, and management.
Azure Databricks
This is an analytics platform based on Apache Spark and built to run on Azure. Databricks provides a one-click setup, streamlined workflows, and an interactive workspace. The workspace is especially useful for promoting collaboration between data roles, including data scientists, data engineers, and business analysts.
Azure HDInsight
Hadoop lets you run complex, distributed analysis jobs on any volume of data. HDInsight simplifies the process of creating big data clusters in Hadoop, letting you quickly create and scale clusters based on your needs.
HDInsight provides popular tools from the Hadoop ecosystem, including Apache Kafka, Apache Spark, Hive, Storm, and HBase. Additionally, the service provides enterprise-scale infrastructure for monitoring, compliance, security, and high availability.
Azure Data Factory
This service was designed for extract, transform, load (ETL) operations on structured data that requires processing at massive scale. The ETL process is applied to data from structured databases. Data is first collected, then cleaned, and then converted into a format suitable for analysis.
Data Factory provides a codeless, visual process for building both ETL and extract, load, transform (ELT) pipelines. Data Factory also comes with built-in connectors for more than 90 data sources.
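The collect, clean, and convert steps of an ETL process can be illustrated with a short Python sketch. This is not Data Factory code, since Data Factory pipelines are built visually. It is a standalone example that uses only the standard library, and the sample data, column names, and function names are assumptions.

```python
import csv
import io
import json

# Hypothetical source data with typical quality problems:
# stray whitespace and a missing amount.
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,
3,Carol,7
"""


def extract(text):
    """Collect rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean rows: trim whitespace, drop rows with no amount, convert types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            {"id": int(row["id"]), "name": row["name"].strip(), "amount": float(amount)}
        )
    return cleaned


def load(rows):
    """Convert cleaned rows to JSON Lines, one record per line."""
    return "\n".join(json.dumps(r) for r in rows)


def run_etl(text):
    return load(transform(extract(text)))


output = run_etl(RAW_CSV)
```

In an ELT process the order of the last two steps is swapped: raw rows are loaded into the target store first, and the cleaning logic runs there.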
Azure Machine Learning
Azure Machine Learning, commonly referred to as Azure ML, is a cloud service that provides pre-packaged and pre-trained machine learning algorithms. In addition to algorithms, Azure ML provides a UI for building machine learning pipelines that cover training, evaluation, and testing.
Azure ML also provides capabilities for interpretable AI, including visualization and data for a wide range of purposes. These features can help you better understand model behavior, implement fairness metrics, and compare algorithms to discover which variant is best for your purposes.
Azure Stream Analytics
This service provides real-time analytics and a complex event-processing engine. You can use Azure Stream Analytics to identify patterns and relationships in information extracted from various sources, including sensors, devices, clickstreams, applications, and social media feeds. You can then use the patterns to trigger actions such as raising alerts, storing data for future use, and sending data to reporting tools.
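One common pattern in stream processing is to average readings over fixed, non-overlapping time windows (often called tumbling windows) and raise an alert when the average crosses a threshold. The sketch below shows that logic in plain Python rather than in the Stream Analytics query language. The event shape, window size, and threshold are assumptions chosen for illustration.

```python
from collections import defaultdict


def tumbling_window_alerts(events, window_seconds=60, threshold=30.0):
    """Average readings per device per fixed window and flag high averages.

    events: iterable of (timestamp_seconds, device_id, value) tuples.
    Returns a sorted list of (window_start, device_id, average) alerts.
    """
    # (window_start, device) -> [running total, count]
    sums = defaultdict(lambda: [0.0, 0])
    for ts, device, value in events:
        # Each event falls into exactly one window.
        window_start = int(ts // window_seconds) * window_seconds
        bucket = sums[(window_start, device)]
        bucket[0] += value
        bucket[1] += 1

    alerts = []
    for (window_start, device), (total, count) in sums.items():
        average = total / count
        if average > threshold:
            alerts.append((window_start, device, average))
    return sorted(alerts)


# Hypothetical sensor readings: (seconds since start, device, temperature).
readings = [
    (0, "s1", 20.0),
    (30, "s1", 50.0),   # same window as the first reading
    (61, "s1", 10.0),   # next window
    (10, "s2", 5.0),
]
alerts = tumbling_window_alerts(readings)
```

This version processes a finished batch of events. A real streaming engine emits each window's result as soon as the window closes and also has to handle late or out-of-order events.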
Azure Data Lake Analytics
You can use Azure Data Lake Analytics to build data transformation programs in a range of languages, such as Python, R, .NET, and U-SQL. Data Lake Analytics is well suited to processing petabytes of data. However, unlike Azure Synapse Analytics, the service does not pool data in a data lake when processing. Instead, Data Lake Analytics connects to Azure-based data sources, like Azure Data Lake Storage, and then runs on-demand analytics jobs based on the specifications in your code.
Azure Analysis Services
This is a fully managed platform as a service (PaaS) offering for data modeling, used for enterprise-grade, cloud-based data models. Azure Analysis Services offers features for advanced modeling and mashup, which enable you to combine data from various sources, set up metrics, and secure all your data in one tabular semantic data model. This lets you perform ad hoc data analysis more easily and quickly with various tools, including Excel and Power BI.
Azure Data Explorer
This service enables fast and scalable exploration of log and telemetry data. You can use it to handle the massive data streams generated by various systems, with features for collecting, storing, and analyzing data. A major advantage of Azure Data Explorer is that it lets you run complex ad hoc queries in seconds.
Azure Data Share
Azure Data Share enables simple and secure data sharing with multiple collaborators, including external users like customers and third-party partners. The service can help you provision new data sharing accounts in a few clicks, add datasets, and invite users to the account. A major advantage of Azure Data Share is that it makes it easy to combine data from third-party sources.
Azure Time Series Insights
Azure Time Series Insights Gen2 provides end-to-end Internet of Things (IoT) analytics capabilities that can be scaled according to changing needs and demands. The platform provides a user-friendly interface and APIs for integration with existing tooling.
Azure Analytics with NetApp Cloud Volumes ONTAP
NetApp Cloud Volumes ONTAP, the leading enterprise-grade storage management solution, delivers secure, proven storage management services on AWS, Azure, and Google Cloud. Cloud Volumes ONTAP supports a capacity of up to 368TB and a variety of use cases, such as file services, databases, DevOps, or any other enterprise workload, with a strong set of features including high availability, data protection, storage efficiencies, Kubernetes integration, and more.
Cloud Volumes ONTAP supports advanced features for managing SAN storage in the cloud, catering to NoSQL database systems, as well as NFS shares that can be accessed directly from cloud big data analytics clusters.
In addition, the built-in storage efficiency features, including thin provisioning, data compression, deduplication, and data tiering, reduce storage footprint and costs by up to 70%.
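To get a rough sense of what a reduction of up to 70% means for capacity planning, the arithmetic can be sketched as follows. The function name, the 100 TB dataset, and the 40% comparison figure are hypothetical. Actual savings depend on the data and on which efficiency features apply.

```python
def effective_footprint_tb(logical_tb, reduction_ratio):
    """Return the physical capacity needed after storage efficiencies.

    reduction_ratio is the fraction saved, e.g. 0.70 for a 70% reduction.
    """
    if not 0.0 <= reduction_ratio < 1.0:
        raise ValueError("reduction_ratio must be in the range [0, 1)")
    return logical_tb * (1.0 - reduction_ratio)


# Hypothetical example: 100 TB of logical data at the best-case 70% reduction.
best_case_tb = effective_footprint_tb(100, 0.70)

# An assumed, more conservative 40% reduction for comparison.
conservative_tb = effective_footprint_tb(100, 0.40)
```

With these inputs, the best case needs roughly 30 TB of physical capacity and the conservative case roughly 60 TB, so it is worth sizing against a range rather than the headline figure alone.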