
Azure Purview: Hybrid and Multicloud Data Governance from Azure

When your applications are deployed across multiple environments, it can be difficult to track down data sets in use and ensure governance. For effective data governance, you need to know the origin of the data, the expanse of the data estate, and the value that can be mined from the data. Azure Purview was designed to help you do that.

This blog will introduce Azure Purview and its features that can be used by data consumers for end-to-end visibility and management of data.

Multicloud Data Estate Governance Challenges

Data is everywhere in an enterprise IT landscape. In a multicloud deployment, that data can reside in multiple copies in more than one cloud. A CISO (chief information security officer) or CRO (chief risk officer) needs to have this information at hand because they are accountable for data security and compliance, but that’s easier said than done. There are a number of challenges to data estate governance in a multicloud deployment:

  • Deployment Sprawl: Discovering data sources can be a major challenge for organizations that have data spread across multiple applications and environments. Most organizations don’t have centralized registries for their data sources. That prevents users from consuming data sources, as they are either unaware of the existence of the data or don’t have access to the data to derive insights from it.
  • Scale: The traditional approach for data discovery is to reach out to data experts in the team and then engage with them to gain access and use the data. This is not a scalable approach when the data estate is large—as it is at the enterprise level—because experts will simply not have the bandwidth to handle the number of requests.
  • Metadata Management: Additional efforts are required to add metadata or create documentation. The metadata and documentation have to be updated to maintain credibility. They should also be made readily available to data consumers who would want to use the information.
  • Security: The challenge for security teams is to ensure that data is coming from trusted sources and that only people with the right permissions can access the data. If a data breach exposes sensitive data such as PII and biographical information, a company is likely to face business losses and legal action for violating data privacy regulations. Data access control has to be implemented for all data sources and managed with unified security policies.

In response to these challenges, Microsoft developed Azure Purview.

What Is Azure Purview?

Azure Purview aims to address the data governance challenges by providing a unified interface to manage and govern your data, irrespective of where it is stored. You can bring in data from on-premises, other clouds, or even your SaaS applications under the fold of Azure Purview.

Azure Purview Components

The following components of Azure Purview provide discovery, cataloging, and management capabilities for your data:

  • Purview Data Map: This service captures the metadata information from data sources on-premises and in the cloud. The data is kept updated through out-of-the-box data scanning of sources and classification. The service is based on Apache Atlas 2.0 and can be accessed programmatically through its open-source APIs or configured through the Azure portal UI.
  • Purview Data Catalog: This service helps data consumers, whether business or technical users, search for and find the data that is relevant to them. It provides features such as data classification and labeling, business glossary management, automated data tagging with glossary terms, and more. Data lineage tracking is another useful capability: the lineage of data can be visualized from the original source through its transformation and movement across various processing and analytics tools.
  • Purview Data Insights: Once the data is discovered and classified, the Purview Data Insights feature provides a bird's-eye view of data usage and movement across your digital landscape. The insights provided by this service are especially useful for security officers who need to detect anomalies or suspicious usage patterns in the data estate.
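
Because the Data Map is based on Apache Atlas, programmatic access can be sketched in terms of the Atlas v2 REST routes. The snippet below only builds the request URL and JSON body for a basic search and for a lineage lookup; it does not call any service. The base URL is a placeholder, and whether a given Purview account exposes these exact routes is an assumption you should check against the account's own endpoint documentation.

```python
import json
from urllib.parse import urlencode

# Assumption: placeholder base URL; substitute your own account's Atlas endpoint.
ATLAS_BASE = "https://example-account.invalid/api/atlas/v2"


def basic_search_request(keywords, type_name=None, limit=25):
    """Return (url, json_body) for an Atlas v2 basic search, which is sent as a POST."""
    body = {"query": keywords, "limit": limit, "excludeDeletedEntities": True}
    if type_name is not None:
        body["typeName"] = type_name  # restrict results to one entity type
    return f"{ATLAS_BASE}/search/basic", json.dumps(body, sort_keys=True)


def lineage_url(entity_guid, direction="BOTH", depth=3):
    """Return the GET URL for the Atlas v2 lineage graph of one entity."""
    query = urlencode({"direction": direction, "depth": depth})
    return f"{ATLAS_BASE}/lineage/{entity_guid}?{query}"
```

A real client would send these with an authenticated HTTP library; authentication is deliberately left out of this sketch.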

Azure Purview is currently in preview and supports onboarding of resources from data sources such as Azure Blob storage, Azure Data Lake storage (Gen1/Gen2), Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics. Data sources can be authenticated for onboarding using credentials created for Azure Purview or using Managed Identity. While the onboarded data continues to reside at its source, its metadata and data source reference are added to Azure Purview for visibility and governance purposes.

What Azure Data Governance Features Does Purview Have?

Azure Purview helps organizations get the maximum value from their data that could be spread across multiple environments. Some of the features and benefits of the service include:

  • Discovery and management of data from trusted sources: Azure Purview helps you find the right data for data science use cases, application development, or data analytics.
  • Track and protect sensitive data: Azure Purview extends the data classification and taxonomy available in Microsoft Information Protection to all connected data sources, helping you identify, track, and protect sensitive data.
  • Unified data governance: Data from multiple sources is governed in one place through automated discovery and metadata management. The use of open-source Apache Atlas APIs simplifies the data integration process.
  • Deliver business insights: In addition to showing the lineage of the data, Azure Purview automatically creates the files required for visualizing data in Power BI, which makes deriving insights an out-of-the-box experience.
  • SQL Server data scanning and classification: Easily discover and govern data in SQL Server through automated scanning and data classification. For example, if you want to look for references to a specific user attribute in all your SQL databases, you can use the search option in Purview and drill down to the specific level of the hierarchy where the attribute exists (e.g., table y in database x).
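
The drill-down described in the last bullet can be pictured as grouping search hits by database and then by table. The following is a minimal sketch of that idea only: the `(database, table, column)` tuples are hypothetical sample records, not the shape of any actual Purview search response.

```python
from collections import defaultdict


def group_by_hierarchy(hits):
    """Group (database, table, column) hits into {database: {table: [columns]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for database, table, column in hits:
        tree[database][table].append(column)
    # Convert nested defaultdicts to plain dicts for easier printing and comparison.
    return {db: dict(tables) for db, tables in tree.items()}


# Hypothetical hits for a search on an "email" attribute.
sample_hits = [
    ("x", "y", "email"),
    ("x", "z", "email"),
    ("w", "y", "user_email"),
]
hierarchy = group_by_hierarchy(sample_hits)
```

With this structure, `hierarchy["x"]["y"]` lists the matching columns found in table y of database x.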

How Much Does Purview Cost?

Azure Purview Data Map publishes metadata that can then be consumed through the open Apache Atlas APIs. The service is charged by capacity unit, where one capacity unit corresponds to 1 API call per second. Each capacity unit costs $0.342 per hour, and up to four capacity units are free during the preview period, which runs until May 31, 2021. Scanning and classification of data costs $0.63 per vCore hour for data sources other than Power BI online and on-premises SQL Server. During the preview period, other services such as metadata storage and the data catalog are provided free of charge.
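
To make the preview pricing concrete, here is a small worked estimate using only the rates quoted above. The function names and the 730-hour month are assumptions for illustration; actual billing rules may differ.

```python
from decimal import Decimal

CAPACITY_UNIT_HOUR = Decimal("0.342")  # USD per capacity unit per hour (from the text)
SCAN_VCORE_HOUR = Decimal("0.63")      # USD per vCore hour of scanning (from the text)
FREE_PREVIEW_UNITS = 4                 # capacity units free during the preview (from the text)


def data_map_cost(capacity_units, hours, free_units=FREE_PREVIEW_UNITS):
    """Cost of Data Map capacity after subtracting the free preview units."""
    billable_units = max(0, capacity_units - free_units)
    return billable_units * hours * CAPACITY_UNIT_HOUR


def scan_cost(vcore_hours):
    """Cost of scanning and classification for billable data sources."""
    return Decimal(str(vcore_hours)) * SCAN_VCORE_HOUR


# Assumption: 6 capacity units for a 730-hour month, plus 10 vCore hours of scanning.
monthly_total = data_map_cost(6, 730) + scan_cost(10)
```

In this scenario only two of the six capacity units are billable, so the Data Map portion is 2 × 730 × $0.342 = $499.32, and scanning adds 10 × $0.63 = $6.30.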

What Is Azure Data Share?

Organizations often share data with their customers and partners through options like FTP, APIs, or email, which do not scale well as the data size increases. Azure Data Share provides an easier way to share data: data providers simply create a data share, add data sets, and invite data consumers to access the data. In addition to providing a 360-degree view of your shared data sets, Azure Data Share also helps with analyzing upstream data issues or the impact on partners and customers who consume the data.

Azure Purview can be integrated with the Azure Data Share service to govern incoming and outgoing data sets and track their lineage.

How to Get Enhanced Azure Data Management with Cloud Volumes ONTAP

Another solution that can help with data management and governance across multi- and hybrid cloud deployments is Cloud Volumes ONTAP.

Cloud Volumes ONTAP is the enterprise-scale data management solution from NetApp that delivers storage efficiency, flexibility, high availability, and data protection features for your cloud storage over and above what is provided by the native cloud storage for your data assets:

  • Data mobility between environments is fast-tracked through SnapMirror® technology, giving you the flexibility to move the data to where your application needs it.
  • Data protection and DR are enabled through NetApp Snapshot™ technology, which can take application-consistent, point-in-time backups of data used by applications like Exchange, SQL Server, Oracle, etc.
  • Data security is assured through features like encryption, access controls through share/export permissions, ransomware protection, and Vscan antivirus protection.
  • Cloud Manager provides a unified management interface for enabling governance of data, irrespective of where the applications are deployed. For example:
    • You can move your SQL DB volumes between multiple cloud platforms and access them from your platform of choice.
    • Cloud Manager helps you to maintain standard data synchronization policies across multicloud and hybrid architecture.
    • Cloud Manager helps you create integrated standard, repeatable processes through automation and orchestration.
  • Cloud Volumes ONTAP can be integrated with NetApp Cloud Data Sense to provide visibility into your entire data estate and into the kind of data you store, in particular sensitive personal data and possible privacy breaches. Cloud Data Sense also helps enforce restrictions on moving sensitive data across environments, which makes Cloud Volumes ONTAP a holistic solution for your data management and governance requirements.

For multicloud deployments these benefits are essential. Cloud Volumes ONTAP is a more powerful and capable way to keep your data in your control no matter which cloud you want to use.

Yifat Perry, Product Marketing Lead