What is Data Discovery?
Data discovery is an aspect of data management that involves collecting and evaluating data from a variety of sources to help organizations understand trends and patterns in the data. It is closely tied to business intelligence (BI): by combining multiple isolated data sources, data discovery helps an organization derive insights from its data and make business decisions.
The data discovery process includes connecting multiple data sources, cleaning and preparing data, sharing data across the organization, and performing analytics to gain insight into business processes. Data discovery is closely related to data classification, in which data is classified according to its usefulness, sensitivity or security requirements.
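As a rough illustration of the process described above, the following Python sketch connects two sources, cleans a join key, and classifies the combined records by sensitivity. The two in-memory sources, their field names, and the `SENSITIVE_FIELDS` rule are hypothetical assumptions. Real platforms use connectors and much richer classification logic.

```python
from dataclasses import dataclass

# Hypothetical rows from two isolated sources (e.g., a CRM export and a billing system).
CRM_ROWS = [
    {"Email": " Alice@Example.com ", "name": "Alice"},
    {"Email": "bob@example.com", "name": "Bob"},
]
BILLING_ROWS = [
    {"email": "alice@example.com", "card_last4": "4242"},
    {"email": "carol@example.com", "card_last4": "1111"},
]

# Assumed rule for this sketch: any of these fields marks a record as sensitive.
SENSITIVE_FIELDS = {"card_last4"}


@dataclass
class Record:
    email: str
    fields: dict
    sensitivity: str = "internal"


def normalize_email(raw: str) -> str:
    """Cleaning step: trim whitespace and lower-case so keys match across sources."""
    return raw.strip().lower()


def combine(crm, billing):
    """Connecting step: merge both sources into one view keyed by normalized email."""
    merged = {}
    for row in crm:
        key = normalize_email(row["Email"])
        merged.setdefault(key, {}).update({k: v for k, v in row.items() if k != "Email"})
    for row in billing:
        key = normalize_email(row["email"])
        merged.setdefault(key, {}).update({k: v for k, v in row.items() if k != "email"})
    return [Record(email=k, fields=v) for k, v in sorted(merged.items())]


def classify(records):
    """Classification step: tag each record that holds any sensitive field."""
    for rec in records:
        if any(field in rec.fields for field in SENSITIVE_FIELDS):
            rec.sensitivity = "sensitive"
    return records


records = classify(combine(CRM_ROWS, BILLING_ROWS))
for rec in records:
    print(rec.email, rec.sensitivity)
```

Normalizing the email before merging is what lets the two "Alice" rows from different systems land in a single record.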
In this article, you will learn:
- Data Discovery Benefits
- Choosing a Data Discovery Platform
- Data Discovery Best Practices
- Data Discovery with NetApp Cloud Data Sense
Data Discovery Benefits
Enterprise data is stored in a variety of data sources and storage devices, and may be accessed by employees, partners, and customers. Identifying and classifying data to protect and gain insight from it is critical to any business.
Data discovery gives organizations the ability to:
- Understand where corporate data is located and who can access it
- Know which data is transmitted, how, and over which channels
- Perform manual or automated data classification
- Identify, classify and track sensitive data
- Perform risk management and compliance assessment
- Visualize datasets and their uses
- Apply policies to control and protect data based on context-specific factors
- Reduce the risk of data migrations
Data discovery helps prevent loss or exposure of sensitive data and enables the organization to implement appropriate security measures. At the same time, it allows teams to look deeper into data, reveal insights, and share them with the rest of the organization.
What is a Data Discovery Platform?
A data discovery platform is a complete set of tools for detecting patterns in your data, identifying outliers that deviate from those patterns, and deriving business insights.
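One common way to flag outliers that deviate from a pattern is the modified z-score, which is based on the median absolute deviation (MAD). The sketch below shows that technique under stated assumptions. It does not describe how any particular platform works. The function name `mad_outliers`, the 3.5 cutoff (a commonly cited rule of thumb), and the sample data are illustrative.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation of the values from their median.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values equal the median; flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical daily order counts; index 6 is an unusual spike.
daily_orders = [120, 118, 125, 122, 119, 121, 410, 117, 123, 120]
print(mad_outliers(daily_orders))  # [6]
```

A robust statistic is used here because a single large spike inflates the mean and standard deviation. With only ten points, an ordinary z-score computed with the sample standard deviation cannot exceed (n-1)/√n ≈ 2.85, so a cutoff of 3 would never fire.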
Data discovery platforms can:
- Enable visualization, integration, and migration of multiple data sources
- Automatically classify large volumes of sensitive structured and unstructured data
- Make more types of data available for analysis, making it possible to add more variables to models.
- Help companies identify and quickly analyze real-time data, in order to make timely business decisions.
- Promote security and governance. Data discovery platforms can classify data and identify its context in the organization, laying the foundations for governance and security policies.
Choosing a Data Discovery Platform
Most organizations hold massive quantities of data and a large variety of datasets. Because of this complexity, data discovery is almost always performed with the aid of automated tools. Here are some important factors to consider when selecting data discovery tools.
General Data Discovery Features
Check if and how the data discovery platform provides:
- Data analysis and visualizations (charts, maps, tables and other representations)
- Machine learning capabilities, including predictive analytics
- Evaluation of data quality and properties at the application, team and user level
- In-memory analytics, enabling faster query response times
- Data identification, classification, monitoring, and tracking
- Full text and metadata search
- Data preparation and tools to improve data quality
- Metadata management, which can be important to meet many compliance requirements
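Full-text and metadata search is commonly backed by an inverted index, which maps each term to the items that contain it. The following toy Python sketch shows the idea. The catalog, its `owner` and `type` metadata fields, and the `field:value` query syntax are hypothetical assumptions, not any vendor's search engine.

```python
import re
from collections import defaultdict

# Hypothetical catalog: each entry has free text plus a few metadata fields.
CATALOG = {
    "doc-1": {"text": "Quarterly revenue report for EMEA", "owner": "finance", "type": "xlsx"},
    "doc-2": {"text": "Patient intake form template", "owner": "clinic", "type": "docx"},
    "doc-3": {"text": "EMEA customer contact list", "owner": "sales", "type": "csv"},
}


def tokenize(text: str):
    """Lower-case and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(catalog):
    """Map each search term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, entry in catalog.items():
        for token in tokenize(entry["text"]):
            index[token].add(doc_id)
        # Metadata is indexed as "field:value" terms so it can be queried explicitly.
        for field, value in entry.items():
            if field != "text":
                index[f"{field}:{value.lower()}"].add(doc_id)
    return index


def search(index, query: str):
    """Return IDs of documents matching every whitespace-separated query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


INDEX = build_index(CATALOG)
print(sorted(search(INDEX, "emea")))              # ['doc-1', 'doc-3']
print(sorted(search(INDEX, "emea owner:sales")))  # ['doc-3']
```

Intersecting the posting sets gives AND semantics. A production engine would add ranking, stemming, and incremental updates.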
IT Management Factors
Beyond basic features, the first thing to consider in a data discovery platform is data governance. These platforms can monitor and manage data created by thousands of users, and can also handle task and rights management. Make sure the platform supports automated, centralized governance.
Data security and privacy are key aspects of governance. Check what kind of user authentication and access control the platform provides. Data discovery platforms retrieve and store strategic business data, and must themselves be secure. They must also enable definition and enforcement of data protection policies for existing data sources.
Version control is also a critical function—the platform must make it possible to manage different versions of documents. This contributes to data integrity and can safeguard against accidental data loss.
Features for Regulatory Compliance
A key role of data discovery tools is to help organizations comply with government and industry regulations such as the GDPR, HIPAA, and CCPA. Many companies use data discovery to find sensitive business data, personally identifiable information (PII), or protected health information (PHI), which may be hidden in emails, documents, and other data silos.
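A minimal way to find PII hidden in emails and documents is pattern matching followed by validation. The following Python sketch uses regular expressions for email addresses, US Social Security number formatting, and payment card numbers. It then applies a Luhn checksum to discard digit runs that cannot be valid card numbers. The patterns and labels are simplified assumptions that will miss some formats and produce some false positives, so treat this as an illustration of the technique rather than a complete detector.

```python
import re

# Illustrative detectors only; they are deliberately simple.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13-19 digits, optionally separated by single spaces or hyphens.
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str):
    """Return (label, matched_text) pairs for every detector hit."""
    findings = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group()
            # Drop card-like digit runs that fail the checksum.
            if label == "card_number" and not luhn_ok(re.sub(r"\D", "", value)):
                continue
            findings.append((label, value))
    return findings


sample = (
    "Contact jane.doe@example.com, SSN 123-45-6789, "
    "card 4111 1111 1111 1111, order 4111111111111112."
)
for label, value in scan_text(sample):
    print(label, value)
```

The order number in the sample has the length of a card number but fails the Luhn check, so only the first digit run is reported.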
Data Discovery Best Practices
Here are some best practices you can use to successfully implement data discovery in your organization:
- Develop a data discovery model - this is a standardized model that ensures data usage remains consistent across the organization. Data discovery tasks typically include data collection and analysis, as well as curation and data-driven actions. When developing this model, also decide which tool you will use to generate reports.
- Identify pain points - each organization and each data program will have unique pain points. Identifying as many as possible in advance can help you fix issues before they escalate and keep your data secure. Common issues include massive amounts of data arriving from multiple sources and complex architectures. All of these should be addressed and monitored continuously.
- Use diverse data sources - gathering data from multiple sources can give you deeper insights. However, you need to set these sources up properly and ensure data integrity and quality. You should also make sure you are collecting relevant data. When configured well, diverse sources can provide a wealth of actionable information.
- Tell stories with your data - to ensure all stakeholders can understand the data, present it in an accessible manner. Stories help people make sense of information and find actionable insights. You can provide textual data stories, but consider also providing visualizations, which many people find easier to understand.
Data Discovery with NetApp Cloud Data Sense
NetApp® Cloud Data Sense is the data privacy and governance service for data stored in the cloud and on premises. Cloud Data Sense leverages cognitive computing to deliver always-on privacy and data governance controls across your hybrid data sources.
By discovering, mapping and identifying personal and sensitive information, Cloud Data Sense automates the most challenging data privacy and governance tasks introduced by modern-day data regulations such as the GDPR, the CCPA, PCI, and many others.