April 5, 2021
Topics: Cloud Data Sense | Advanced | 6 minute read
Digital transformation, analytics, and artificial intelligence are top of mind in organizations across the world. We live in a data-driven economy where information is valuable and a key enabler of business growth. With data being generated and stored at an incredibly fast pace, it’s more important than ever for organizations to have strong data governance competencies.
How are you governing data in your organization? In this post we’ll take a closer look at data governance and how it can be essential to protecting the data you control.
What Is Data Governance?
Data governance is a set of policies and practices that an organization can establish to support its data management. It plays an essential role in fulfilling the organization’s strategy and growth by providing visibility into what data is available and how it can be used. By applying data governance principles, a company can gain an understanding of its data inventory, define necessary guardrails, and enable new business streams.
Why Is Data Governance Important?
In a world where organizations are exponentially increasing data collection, and using a multitude of cloud services to host that information, data governance is crucial to effective operations.
Taking a critical look at what data the organization has and how it is stored helps avoid unnecessary costs from processing and storing data that has no business value or impact.
Some organizations opt to create specific groups within their organization to oversee data governance. A data governance group, involving both business and engineering experts, creates a positive organization-wide impact by enabling new data-driven business models, decision making and value creation.
Engineering aspects such as the data architecture, access controls and tooling are naturally important in data governance, but it’s crucial to look at them holistically. Governing data is much more than a simple technical problem—it requires an entire shift in the organization’s culture and its practices associated with data.
Despite the inherent complexity of the data governance topic, there are a few key points that we will explore in this article that can make it simpler to understand and implement.
Compliance? Privacy? A People-Centric Approach
The first instinct organizations have, especially big enterprises, is to look at data governance policy purely as a means to fulfill compliance and privacy requirements. This approach is not entirely wrong: data governance principles do make compliance standards easier to audit and help you find ways to comply with privacy regulations in multiple jurisdictions. But that shouldn’t be your primary focus.
A more effective approach to data governance is to focus on people. Having worked with multiple organizations on data management challenges, I have found that both the people within your organization and your customers should take the spotlight.
In an organization, data governance solutions shouldn’t be created and managed by a single person. A written document or slide deck is only effective when co-created and understood by all stakeholders. Moreover, with data breaches and misuse of personal information in the news almost every day, your customers, especially end consumers, expect your organization to have good data governance practices and to take responsibility for both protecting their information and respecting their privacy.
Data Discovery and Cataloging
You can only protect data that you are aware of. One of the most crucial goals for a data governance group to accomplish is to find and catalog the data within the organization.
Organizations have complex and often cluttered data environments. The process of continuously discovering new data sources and maintaining a living data catalog is an incredible challenge to solve.
A great way to start is by focusing on a few concrete business cases and identifying the relevant sources of data that support them. Another is to start during a cloud transformation and enablement phase, since cloud migrations are ideal moments to gain data visibility. If that is the case in your organization, make sure your data governance work can seize that opportunity.
In essence, the discovery and cataloging of data sources boil down to a few key questions that you need to cover: What is this data? Where is it located? Who can access it? How long are we keeping it?
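As a rough illustration, those four questions can be captured in a small catalog record. The sketch below is not taken from any particular catalog product. The CatalogEntry class, its field names, and the unanswered_questions helper are all hypothetical and only meant to show the shape such a record might take.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Set


@dataclass
class CatalogEntry:
    """One data source in a hypothetical data catalog."""
    name: str
    description: str          # what is this data?
    location: str             # where is it located?
    allowed_roles: Set[str]   # who can access it?
    retention_days: int       # how long are we keeping it?
    created_on: date

    def expires_on(self) -> date:
        """Date on which the retention period ends."""
        return self.created_on + timedelta(days=self.retention_days)

    def is_expired(self, today: date) -> bool:
        """True once the retention period has fully elapsed."""
        return today >= self.expires_on()


def unanswered_questions(entry: CatalogEntry) -> List[str]:
    """Return which of the four key questions the entry leaves open."""
    gaps = []
    if not entry.description.strip():
        gaps.append("what")
    if not entry.location.strip():
        gaps.append("where")
    if not entry.allowed_roles:
        gaps.append("who")
    if entry.retention_days <= 0:
        gaps.append("how long")
    return gaps
```

A governance group could run a check like unanswered_questions over every entry to spot data sources that were registered but never properly described.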
Data Lake or Swamp? Retention and Business Uses
The lack of technology governance solutions manifests in poor data capabilities and disengaged employees. Often infrastructure data lakes resemble a swamp where data is continuously pushed and stored without any consideration or positive business impact.
While addressing the retention of data, it’s important for organizations to ask themselves “what is this data good for?” There needs to be a constant assessment of both the present and future importance of the data source. Today a dataset might be used to support existing business cases or comply with regulatory requirements, but there might be a future potential for growth initiatives and new revenue streams that should be considered.
Storing and retaining data is a balancing act that needs active engagement and collaboration between business and engineering teams to find the optimal, most cost-efficient approach.
Storage Location and Tiering
From a technical perspective, there are different ways to handle big data and the management of storage volumes.
Modern solutions, especially cloud-based ones, make it easy to automatically transition data across different geographical locations and tiers. This enables us to get the most out of our storage investment by defining rules, based on business needs, for data to flow from a hot tier that is more performant, highly available, and costly to a cold tier that is slower to access and very inexpensive. The cold tier can be ideal for persisting data for archival or future business needs.
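To make the idea of a tiering rule concrete, here is a minimal sketch of an age-based rule. It does not use any cloud provider’s API. The choose_tier function and the 30-day threshold are assumptions for illustration, and in practice the threshold would come from your business needs and your provider’s pricing.

```python
from datetime import date

# Hypothetical threshold: data untouched for this many days moves to the cold tier.
COLD_AFTER_DAYS = 30


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from the number of days since the last access."""
    idle_days = (today - last_accessed).days
    if idle_days >= COLD_AFTER_DAYS:
        return "cold"
    return "hot"
```

Cloud storage services typically let you declare rules like this as lifecycle policies, so you rarely need to write the logic yourself. The sketch only shows the decision such a policy encodes.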
The location of data plays a key role in data governance. With cloud storage becoming ubiquitous, there is no shortage of technical opportunities to place data in virtually any jurisdiction you need. Defined business expectations and compliance requirements are key to success, but when your organization’s data is spread across multiple locations and mediums, you need to have the proper tools to ensure those requirements are actually met. In addition, when addressing privacy constraints, make sure to leverage tools that can help you automatically identify sensitive information across your datasets. For compliance purposes, some data will need to remain within specific geographical zones, and that will present an additional governance challenge.
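Both checks described above can be sketched in a few lines: one verifies that each dataset sits in a permitted region, and the other flags text that appears to contain personal data. Everything here is an assumption made for illustration, including the dataset names, the region names, and the ALLOWED_REGIONS policy table. The email pattern is deliberately naive, and dedicated tools detect many more kinds of sensitive information far more reliably.

```python
import re
from typing import Dict, List, Set

# Hypothetical policy: the regions in which each dataset may be stored.
ALLOWED_REGIONS: Dict[str, Set[str]] = {
    "customer_profiles": {"eu-west", "eu-central"},
    "web_logs": {"eu-west", "us-east"},
}

# Deliberately simple pattern that only looks for email-like strings.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def residency_violations(placements: Dict[str, str]) -> List[str]:
    """Return datasets stored outside their allowed regions.

    Datasets with no policy entry are reported too, since their
    placement cannot be verified.
    """
    return sorted(
        name
        for name, region in placements.items()
        if region not in ALLOWED_REGIONS.get(name, set())
    )


def contains_email(text: str) -> bool:
    """Return True if the text appears to contain an email address."""
    return EMAIL_PATTERN.search(text) is not None
```

Treating a dataset without a policy entry as a violation is a design choice: it surfaces uncataloged data instead of silently passing it.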
Analytics and AI Require Business Understanding
To non-technical experts, the most visible part of a data strategy and governance is usually analytics and artificial intelligence, and of course the positive impact they create across the organization and business.
This is misleading, since from a technical perspective analytics is just part of the equation and often the tip of the iceberg. Underneath, data engineering plays a major role in making sure data is ingested, stored, and prepared with high quality. A good data infrastructure is crucial in providing a reliable and scalable foundation for analytics and artificial intelligence.
Both the data science and engineering areas require very active engagement and close involvement from business owners, and vice versa. Big data experts usually lack specific industry knowledge and business understanding. Likewise, business owners need experts to guide them through the different technical possibilities and constraints.
Data governance is a mix of culture, practices, and engineering. There are several technical tools that can be used to establish a data governance solution. Selecting the most appropriate ones, while understanding that tools are just part of the equation, is important because they take on the heavy lifting of laborious tasks.
The capability to govern data in complex data environments is precious. You can’t really protect data if you don’t have the proper governance tools to find and monitor it. Cloud Data Sense by NetApp is a great tool for the job: it uses artificial intelligence to give you control over the cluttered data spread across your systems, with multiple features such as built-in support for MySQL, MongoDB, and PostgreSQL.
Sign up for a Cloud Data Sense trial, free for up to 1 TB of data.