Do You Know What’s In Your Metadata?

May 13, 2022

Topics: Cloud Data Sense, Data Governance

Any data that’s in use today contains more information than is immediately visible. This auxiliary data is known as metadata. But what is metadata?

Metadata is information that exists outside a data point itself: it describes that data point and gives it context.

But while metadata is extremely useful, it also poses an additional challenge for data governance. In this post we'll take a closer look at metadata, how it is used, and what storage admins can do to govern it better.

Read on as we cover:

What Is Metadata?
Why Is Metadata Important to Data Governance?
How Cloud Data Sense Supports Metadata Governance
Conclusion

What Is Metadata?

Metadata is supplementary data about a piece of data: any information attached to a data point that describes its contents.

This covers a wide range of details, such as the hardware used to create the data, the user who created it, the creation date, the last access date, the number of changes made to the data, its current state, its permissions, and more. The possibilities also vary with the type of data. For example, metadata for video or image files can include full descriptions of what the files contain.

Metadata has many uses in IT. It is at the core of how object storage works, for example: object stores use metadata to find and retrieve objects on demand. File storage solutions also rely on metadata in one way or another to manage the file lifecycle; the file permissions table is just one example of metadata used by a typical file system or file share.
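To make this concrete for ordinary files, the Python sketch below reads a handful of the metadata fields a POSIX file system keeps alongside each file, using only the standard library's os.stat(). The function name describe_file and the selection of fields are illustrative assumptions, not part of any NetApp product. Creation time and the hardware used to create a file are not exposed portably, so they are left out.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def describe_file(path: str) -> dict:
    """Return a few common file-system metadata fields for `path`.

    The file's contents are never opened; everything comes from os.stat().
    """
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        # POSIX timestamps converted to timezone-aware UTC datetimes.
        "last_modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "last_accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
        # Numeric owner ID (always 0 on Windows).
        "owner_uid": st.st_uid,
        # Permission bits rendered the way `ls -l` shows them, e.g. "-rw-r-----".
        "permissions": stat.filemode(st.st_mode),
    }


if __name__ == "__main__":
    # Demo: create a small file, restrict its permissions, and inspect it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello metadata")
    os.chmod(tmp.name, 0o640)
    print(describe_file(tmp.name))
    os.remove(tmp.name)
```

None of these values are visible in the file's contents, yet each one says something about who owns the file, who may read it, and how recently it was touched.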

But what kinds of information can metadata hold? Let's use the example of a digital book in an online library to illustrate the different types of metadata that can be applied to a data asset:

Descriptive: Basic attributes such as title, author, genre, number of pages
Structural: How the book is organized internally such as sections, chapters, table of contents
Preservation: The book’s intended digital lifecycle such as last date of public availability, storage media, archiving guidelines (when, where)
Provenance: The book’s history such as publication date, by which publisher, in which language
Use: How many times the book has been accessed or checked out, when, by whom (user) or by what (external system)
Administrative: Attributes that define how the book is to be managed such as the number of digital copies available, privilege levels per user type, copyright rules to be upheld, and so on.
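One way to see how these categories fit together is to model them as fields of a single record. The Python dataclass below is a hypothetical schema: the class name BookMetadata and every field name are illustrative choices, not a published standard such as Dublin Core or MARC.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class BookMetadata:
    # Descriptive: basic attributes of the book itself.
    title: str
    author: str
    genre: str
    pages: int
    # Structural: how the book is organized internally.
    chapters: List[str] = field(default_factory=list)
    # Preservation: the intended digital lifecycle.
    public_until: Optional[date] = None
    archive_location: str = ""
    # Provenance: the book's history.
    published: Optional[date] = None
    publisher: str = ""
    language: str = "en"
    # Use: each access as (date, user or external system that accessed it).
    accesses: List[Tuple[date, str]] = field(default_factory=list)
    # Administrative: how the book is to be managed.
    digital_copies: int = 1
    copyright_rules: List[str] = field(default_factory=list)

    def record_access(self, when: date, accessor: str) -> None:
        """Log a use event; only the metadata changes, never the book's text."""
        self.accesses.append((when, accessor))

    def accessed_by(self, accessor: str) -> int:
        """How many times a given user or external system accessed the book."""
        return sum(1 for _when, who in self.accesses if who == accessor)
```

Note that a checkout only appends to the use metadata; the descriptive and structural fields, like the book itself, stay unchanged.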

Any user looking into the history of a specific book would be able to refer to the metadata and get detailed information that could then be acted upon.

Why Is Metadata Important to Data Governance?

Metadata plays a critical role in data governance, at all levels:

Applying data protection policies: Metadata makes it easy for security posture management frameworks to understand which datasets require which security guardrails in order to meet internal and external regulatory requirements.
Managing the data lifecycle: Metadata tags provide invaluable information on data ownership and usage over time and can be used to automatically trigger data lifecycle processes related to change management, deletion, backup, retention, and more.
Enhancing big data analytics: High-quality, trustworthy, and relevant data pipelines are prerequisites for building robust ML and AI models that deliver reliable results. Automated data discovery and metadata tagging simplify the formidable task of making sure that all the right data is accessible to data analytics platforms.
Reducing costs: Metadata can be used to reduce data storage footprints and costs by, for example, giving users a way to identify redundant or stale datasets that can be deleted or moved to a less expensive storage tier.
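As a rough illustration of the cost-reduction idea, the Python sketch below walks a directory tree, flags files whose last-access time is older than a threshold, and finds redundant copies. File size (metadata) is used as a cheap pre-filter so that only same-sized files have their contents hashed. The function names and the 365-day default are assumptions for illustration. Also note that volumes mounted with noatime or relatime may not keep access times current, in which case last-modified time is a common substitute.

```python
import hashlib
import os
import time
from collections import defaultdict


def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_stale_and_duplicates(root: str, stale_days: int = 365):
    """Return (stale_paths, duplicate_groups) for all files under `root`.

    A file is stale if its last-access time is older than `stale_days`.
    Duplicates are files with the same size and the same SHA-256 digest.
    """
    cutoff = time.time() - stale_days * 86400
    stale = []
    by_size = defaultdict(list)
    # Pass 1: metadata only. Access times are checked before any file is read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_atime < cutoff:
                stale.append(path)
            by_size[st.st_size].append(path)
    # Pass 2: only files that share a size can be duplicates, so hash just those.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_digest[content_hash(path)].append(path)
    duplicates = [sorted(group) for group in by_digest.values() if len(group) > 1]
    return sorted(stale), duplicates
```

Stale files are candidates for a cheaper tier or deletion; each duplicate group needs only one retained copy.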

While data stewards can gain huge benefits from accurate metadata, it also adds complexity to the data estate. Given the wide range of information metadata can include, it poses significant data governance challenges, especially at the enterprise level.

For instance, metadata can include information that raises privacy concerns, such as personally identifiable information (PII) or other sensitive data. Data created within specific regions may also be subject to territorial restrictions on its use, and the information that determines those restrictions is found in the metadata.
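To show why this calls for tooling, the Python sketch below checks the values of a file's metadata fields (for example, the author and comments properties that many document formats carry) against two hypothetical regular expressions: one for e-mail addresses and one for U.S. Social Security numbers. The pattern set and the function name flag_pii are assumptions for illustration only. This is not how Cloud Data Sense detects sensitive data, and regular expressions alone miss most real-world PII.

```python
import re
from typing import Dict, List

# Deliberately simple, illustrative patterns; real PII detection needs far more.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def flag_pii(metadata: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each metadata field name to the PII pattern names found in its value."""
    findings: Dict[str, List[str]] = {}
    for key, value in metadata.items():
        hits = [name for name, rx in PII_PATTERNS.items() if rx.search(str(value))]
        if hits:
            findings[key] = hits
    return findings
```

For example, flag_pii({"Author": "jane.doe@example.com", "Title": "Q3 report"}) flags only the Author field. Doing this by hand across millions of files and dozens of metadata fields per file is exactly the manual effort described next.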

Data stewards need to be able to identify such information and act on it accordingly. But doing so manually is so tedious and time-consuming that it simply isn't effective; it requires dedicated metadata governance tools. Fortunately, there is a way to avoid this level of manual oversight: Cloud Data Sense.

How Cloud Data Sense Supports Metadata Governance

NetApp Cloud Data Sense works across on-prem and cloud-based environments to optimize, monitor, secure, and research the organization's entire data estate. Driven by powerful AI algorithms, Data Sense automatically categorizes and labels data across all of the organization's storage repositories, giving you much more control over metadata governance.

Data Sense uses the metadata to centrally manage and visualize:

Data usage metrics: Including data changes, creation date, last used, and last accessed. This information is crucial for finding stale data.
File size: Locating this data helps identify where storage resources are being consumed most, giving you insight into how to optimize them.
File type: This can be crucial in identifying non-business data that may not be high priority.
Sensitive data: Metadata that could be subject to privacy concerns
File permissions: Metadata that indicates who has access to files
Data location and accessibility: Gives insight into security (role-based access control) and regulatory requirements (quick responses to data subject access requests)
Data lifecycle policies: Including deduplication, backup, deletion, retention, and archiving

A typical Data Sense data investigation workflow would start with running a scan to automatically sort and filter the organization’s file stores based on a wide variety of data characteristics and parameters, including the information stored in the file metadata.
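The scan-then-filter pattern can be sketched for a local directory tree in a few lines of Python. This is a generic illustration, not Data Sense's API (Data Sense is a managed service). The names FileRecord, scan, bytes_by_type, and largest are hypothetical, and the file extension stands in for file type.

```python
import os
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class FileRecord:
    path: str
    extension: str
    size_bytes: int
    last_accessed: float  # POSIX timestamp


def scan(root: str) -> List[FileRecord]:
    """Collect one metadata record per file under `root` without reading contents."""
    records = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            records.append(FileRecord(path, ext, st.st_size, st.st_atime))
    return records


def bytes_by_type(records: List[FileRecord]) -> Counter:
    """Total bytes per extension, showing where storage is being consumed."""
    totals: Counter = Counter()
    for r in records:
        totals[r.extension] += r.size_bytes
    return totals


def largest(records: List[FileRecord], n: int = 10,
            extensions: Optional[Set[str]] = None) -> List[FileRecord]:
    """The n largest files, optionally restricted to a set of extensions."""
    pool = [r for r in records if extensions is None or r.extension in extensions]
    return sorted(pool, key=lambda r: r.size_bytes, reverse=True)[:n]
```

Because the scan collects metadata once, any number of sorts and filters (by size, type, or last access) can then run against the records without touching the files again.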

For storage admins and other data owners looking to gain insight into their files' metadata, every category that Data Sense sorts by also takes the file metadata into account. This makes it easy to pinpoint potential privacy concerns and relevant geolocation information. These searches are performed by Data Sense's context-aware AI, which parses the metadata to determine how it is being used, producing the most relevant search results possible.

For admins looking to leverage metadata for better governance, one, many, or all of the search results can then be labeled with Data Sense's internal free-form tags or with Azure Information Protection (AIP) labels. Tagging can be automatic, based on Data Sense policies, or manual. The assigned tags are then stored in the files' metadata, without affecting the file itself.
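The split between automatic, policy-driven tagging and manual tagging can be sketched in Python. In this hypothetical model, the Policy and TagCatalog names, the record fields, and the tag strings are all illustrative assumptions. A policy is a predicate over a file's metadata record, and the in-memory catalog stands in for wherever tags are actually persisted; it does not show how Data Sense or AIP store their labels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Policy:
    name: str
    tag: str
    matches: Callable[[dict], bool]  # predicate over one file's metadata record


@dataclass
class TagCatalog:
    """Tags per file path, kept apart from file contents (files are never opened)."""
    tags: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, path: str, tag: str) -> None:
        """Manual tagging: attach one tag to one file."""
        self.tags.setdefault(path, set()).add(tag)

    def apply_policies(self, records: List[dict], policies: List[Policy]) -> None:
        """Automatic tagging: every record that matches a policy gets that policy's tag."""
        for record in records:
            for policy in policies:
                if policy.matches(record):
                    self.add(record["path"], policy.tag)
```

For example, a policy such as Policy("stale-data", "Archive-candidate", lambda r: r["days_since_access"] > 365) tags every long-unused file automatically, while an admin can still call catalog.add(path, "Reviewed") on individual results.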

Conclusion

Metadata provides more insight into data, but getting more insight into your metadata requires an additional solution. NetApp provides that level of metadata governance with Cloud Data Sense.

Add Data Sense to your metadata governance framework and sign up now to try Cloud Data Sense free for up to 1 TB of data.