Data discovery is an aspect of data management that involves collecting and evaluating data from a variety of sources to understand trends and patterns in the data. It is often performed in close relation to business intelligence (BI), combining multiple isolated data sources to help an organization derive insights and make business decisions.
The data discovery process includes connecting multiple data sources, cleaning and preparing data, sharing data across the organization, and performing analytics to gain insight into business processes. Data discovery is closely related to data classification, in which data is classified according to its usefulness, sensitivity or security requirements.
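To make the first steps of this process concrete, here is a minimal Python sketch that connects two isolated sources, cleans the values, and runs a simple aggregate. The CRM and billing exports, their column names, and the helper functions `read_rows` and `combine` are all hypothetical and exist only for illustration; a real platform would read from live databases, file shares, or APIs.

```python
import csv
import io
from collections import defaultdict

# Two hypothetical, isolated sources: a CRM export and a billing export.
CRM_CSV = """customer_id,name,region
1, Alice ,EMEA
2,Bob,emea
3,Carol,APAC
"""

BILLING_CSV = """customer_id,amount
1,120.50
2,80.00
2,19.50
4,42.00
"""

def read_rows(text):
    """Parse CSV text into a list of dicts with whitespace-trimmed values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: value.strip() for key, value in row.items()} for row in reader]

def combine(crm_rows, billing_rows):
    """Join billing onto CRM by customer_id and total the revenue per region."""
    # Cleaning step: normalize region casing so "emea" and "EMEA" match.
    region_by_id = {r["customer_id"]: r["region"].upper() for r in crm_rows}
    revenue = defaultdict(float)
    unmatched = []
    for row in billing_rows:
        region = region_by_id.get(row["customer_id"])
        if region is None:
            unmatched.append(row["customer_id"])  # present in billing only
            continue
        revenue[region] += float(row["amount"])
    return dict(revenue), unmatched

revenue_by_region, unmatched_ids = combine(read_rows(CRM_CSV), read_rows(BILLING_CSV))
print(revenue_by_region)
print(unmatched_ids)
```

Returning the unmatched IDs alongside the totals is a deliberate choice in this sketch: records that appear in only one source are often the first sign that the sources disagree, and they are worth surfacing rather than silently dropping.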
Enterprise data is stored in a variety of data sources and storage devices, and may be accessed by employees, partners, and customers. Identifying and classifying data to protect and gain insight from it is critical to any business.
Data discovery gives organizations the ability to:
Data discovery helps prevent loss or exposure of sensitive data and enables the organization to implement appropriate security measures. At the same time, it allows teams to look deeper into the data, reveal insights, and share them with the rest of the organization.
A data discovery platform is a complete set of tools for detecting patterns in your data, identifying outliers that deviate from those patterns, and deriving business insights.
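As a rough illustration of what identifying outliers can mean in practice, the sketch below flags values that sit far from the median, using the modified z-score based on the median absolute deviation. This is one common technique chosen for the example, not necessarily what any particular platform uses. The function name `find_outliers`, the 3.5 threshold, and the sample order counts are assumptions.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: the typical distance from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Simplification: with no measurable spread, flag nothing.
        return []
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical daily order counts with one unusual day.
daily_orders = [102, 98, 105, 97, 101, 99, 310, 103, 100, 96]
print(find_outliers(daily_orders))
```

The median is used instead of the mean because a single extreme value can drag the mean and standard deviation far enough to hide itself, while the median stays close to the typical value.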
Data discovery platforms can:
Most organizations hold massive quantities of data and a large variety of datasets. Because of this complexity, data discovery is almost always performed with the aid of automated tools. Here are some important factors to consider when selecting data discovery tools.
Check if and how the data discovery platform provides:
Beyond basic features, the first thing to consider in a data discovery platform is data governance. These platforms can monitor and manage data created by thousands of users, including task and rights management, but you must make sure they support automated, centralized governance.
Data security and privacy are key aspects of governance. Check what kind of user authentication and access control the platform provides. Data discovery platforms retrieve and store strategic business data, and must themselves be secure. They must also enable definition and enforcement of data protection policies for existing data sources.
Version control is also a critical function: the platform must make it possible to manage different versions of documents. This contributes to data integrity and can safeguard against accidental data loss.
A key role of data discovery tools is to help organizations comply with government and industry regulations such as the GDPR, HIPAA, and CCPA. Many companies use data discovery to find sensitive business data, personally identifiable information (PII), or protected health information (PHI), which may be hidden in emails, documents, and other data silos.
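The kind of scan described here can be sketched in a few lines of Python. The two regular expressions, the `scan_for_pii` function, and the sample documents are illustrative assumptions only; production tools rely on much broader rule sets, validation, and context analysis to limit false positives and missed matches.

```python
import re

# Illustrative patterns only; real PII detection needs broader rules and validation.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_pii(documents):
    """Map each document name to the PII categories found and their match counts."""
    findings = {}
    for name, text in documents.items():
        counts = {label: len(pattern.findall(text)) for label, pattern in PII_PATTERNS.items()}
        # Keep only the categories that actually matched.
        counts = {label: count for label, count in counts.items() if count}
        if counts:
            findings[name] = counts
    return findings

# Hypothetical documents standing in for emails and files in a data silo.
documents = {
    "support_email.txt": "Please contact jane.doe@example.com about claim 123-45-6789.",
    "release_notes.txt": "Version 2.4 improves startup time.",
}
print(scan_for_pii(documents))
```

The sketch reports match counts per category rather than the matched values, so the scan report does not itself become another copy of the sensitive data.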
Here are some best practices you can use to successfully implement data discovery in your organization:
NetApp® Cloud Data Sense is the data privacy and governance service for data stored in the cloud and on premises. Cloud Data Sense leverages cognitive computing to deliver always-on privacy and data governance controls across your hybrid data sources.
By discovering, mapping, and identifying personal and sensitive information, Cloud Data Sense automates the most challenging data privacy and governance tasks introduced by modern data regulations such as the GDPR, the CCPA, PCI, and many others.