Data classification is the process of organizing data into categories and labeling it accordingly. This lets an organization apply appropriate protection to each category and search, retrieve, and use its data efficiently. Data classification is an important part of data management at large organizations, and it is particularly important for risk management, compliance, and data security. By identifying redundant data that can be eliminated, it can also reduce an organization’s storage and backup costs.
Data classification tasks include classifying information according to its sensitivity, labeling data for easy retrieval, and eliminating redundant data. The classification process may sound technical, but it is a topic that any organization’s leaders need to understand and participate in.
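A common way to find redundant data is to compare content hashes: items with identical bytes produce identical digests. The minimal Python sketch below illustrates the idea. The function name `find_duplicates` is hypothetical and not taken from any product. The sketch assumes the contents fit in memory; a real scan would stream large files in chunks.

```python
import hashlib
from collections import defaultdict


def find_duplicates(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group item names whose contents are byte-for-byte identical.

    `blobs` maps a name (for example, a file path) to its raw content.
    Only groups with more than one member (redundant copies) are returned.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in blobs.items():
        # Identical content always yields an identical SHA-256 digest.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


items = {
    "reports/q1.csv": b"id,amount\n1,100\n",
    "backup/q1_copy.csv": b"id,amount\n1,100\n",
    "reports/q2.csv": b"id,amount\n2,250\n",
}
print(find_duplicates(items))  # [['backup/q1_copy.csv', 'reports/q1.csv']]
```

Hashing only detects exact copies. Near-duplicates, such as the same document saved in a different format, need other techniques.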
A primary goal of classification is to identify properties of organizational data including:
By evaluating these and other properties, a data classification process can divide organizational data into several classification levels. Here is a commonly used four-level classification system:
Data classification involves applying tags and labels to data that specify its type, its classification level (how confidential the data is; see the previous section), its integrity, and its usefulness.
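To make the idea of a label concrete, the sketch below models one as a Python record carrying the four properties named above. The level names (Public, Internal, Confidential, Restricted) and the encryption rule are hypothetical placeholders, not a standard. Substitute your organization's own scheme and policies.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical four-level scheme; higher values mean more sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass(frozen=True)
class ClassificationLabel:
    data_type: str   # e.g. "customer record", "source code"
    level: Level     # how confidential the data is
    integrity: str   # e.g. "verified", "unverified"
    usefulness: str  # e.g. "active", "archival", "obsolete"


def requires_encryption(label: ClassificationLabel) -> bool:
    # Illustrative policy only: encrypt anything CONFIDENTIAL or above.
    return label.level >= Level.CONFIDENTIAL


label = ClassificationLabel("customer record", Level.RESTRICTED, "verified", "active")
print(label.level.name, requires_encryption(label))  # RESTRICTED True
```

Using an `IntEnum` makes the levels ordered. Handling rules can then be written as thresholds ("this level or above") instead of listing every level.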
The following are three ways to perform data classification:
Many compliance standards and regulations have requirements for data classification. Below we list some of the common standards that touch on classification:
| Compliance Standard | Applies To | Data Classification Requirements |
|---|---|---|
| SOC 2 | Service organizations | When the Confidentiality category is in scope of the audit, service organizations must demonstrate that confidential information is identified and maintained to meet the objectives of related entities (most commonly, the service provider’s customers). |
| HIPAA | U.S. covered entities (such as healthcare providers) and their business associates | Considers protected health information (PHI) high-risk data. Requires covered entities and their business associates to establish procedures for classifying PHI and for controlling its collection, use, storage, and transmission. |
| PCI DSS | Organizations storing or processing cardholder data | Requirement 9.6.1 (PCI DSS v3.2.1) states that organizations must "classify media so the sensitivity of the data can be determined." |
| GDPR | Organizations processing personal data of individuals in the European Union | Does not prescribe specific classification levels. However, organizations need to identify the personal data they process in order to protect it appropriately, and many classify it, for example, as public, proprietary, or confidential. The GDPR designates certain data, including racial or ethnic origin, sexual orientation, political opinions, and health data, as "special categories" of personal data that require additional protection. |
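Classifying cardholder data first requires finding it. A common heuristic in content-based discovery is to look for 13 to 19 digit sequences and keep only those that pass the Luhn checksum used by payment card numbers. The Python sketch below illustrates that heuristic. It is an assumption for illustration, not a procedure defined by PCI DSS or used by any particular product. The names `luhn_valid` and `contains_card_number` are hypothetical.

```python
import re

# Candidate: 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """Flag text that contains a Luhn-valid 13-19 digit sequence."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False


print(contains_card_number("Card: 4111 1111 1111 1111"))      # True
print(contains_card_number("Order ref 4111 1111 1111 1112"))  # False
```

The Luhn check filters out most random digit strings, but it still produces false positives and misses unusual formats. Flagged content should therefore be reviewed before it is labeled as cardholder data.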
When creating your own data classification standards and process, consider the following six steps:
NetApp® Cloud Data Sense is the data privacy and governance service for data stored in the cloud and on premises. Cloud Data Sense leverages cognitive computing to deliver always-on privacy controls across your hybrid data sources.
By discovering, mapping and identifying personal and sensitive information, Cloud Data Sense automates the most challenging data privacy and governance tasks introduced by GDPR, CCPA, and other data privacy regulations.