The modern enterprise accumulates vast amounts of data—sourced from server logs, website tracking software, customer call records, social media, video surveillance systems, and networks of connected devices and sensors. But what happens to all that information?
According to research by the operational intelligence platform Splunk, much of this information is dark data, which lurks in the shadows of enterprise systems and never ends up being used.
The study of more than 1,300 business leaders across the world revealed that an average of 55% of all information captured by companies was dark data. However, many of those surveyed believed they were collecting far more—with a third saying that at least 75% of all their data was dark.
In this post, we explore what dark data is, why companies aren't using it and the implications it could have for your own organization. Plus, we’ll show you how NetApp Cloud Data Sense can help you gain insight into what might be stored in the dark data you have in your storage systems.
Read on as we cover:
- What Is Dark Data?
- The Biggest Dark Data Challenges
- Dark Data Discovery: Better Data and Better Insights
What Is Dark Data?
Dark data is all the unknown or unused information that an organization captures as part of its day-to-day operational activities. It's an untapped resource that offers huge potential for analytics and driving business revenue. But, without visibility and control, it can become a significant burden.
Dark Data Examples
Dark data can come in any form, but in most companies it’s likely to be from a number of different sources, such as:
- Employee records
- Internal processes
- Video and sound recording data
- Log files
- Geolocation information
Organizations find it difficult to leverage this data for several reasons. For example, an overwhelming majority (85%) of respondents to the Splunk survey cited a lack of suitable tools to access and analyze dark data. Many of the respondents (66%) also believed much of the information they collect is unusable because of data gaps, such as missing dates, location details and operational metrics.
Another issue is the sheer volume and complexity of data, a lot of which is unstructured and distributed across a wide range of data collection and storage systems.
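To make the idea of data gaps more concrete, here is a minimal sketch of how you might measure them in a batch of records. The function name `gap_report`, the field names and the sample records are all hypothetical and chosen only for illustration; they do not come from the Splunk report or from any particular product.

```python
def gap_report(records, required_fields):
    """Return {field: fraction of records where the field is missing or empty}.

    `records` is a list of dicts. A field counts as a gap when the key is
    absent, or when its value is None or an empty string.
    """
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in required_fields}
    report = {}
    for field in required_fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        report[field] = missing / total
    return report


# Hypothetical sample records with missing dates and locations.
sample = [
    {"date": "2021-01-01", "location": "NYC"},
    {"date": "", "location": "LA"},
    {"location": None},
    {"date": "2021-02-01"},
]
report = gap_report(sample, ["date", "location"])
```

A report like this gives a rough sense of which fields would need to be backfilled before a data set becomes usable for analytics.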
The Biggest Dark Data Challenges
Virtually every organization should see its dark data as a business opportunity. However, it's also important to be aware of the challenges it presents, which we’ll take a look at below.
The lower cost of storage has been one of the key drivers behind the growth of dark data. But data storage still represents a significant cost to any large-scale business. This is all the more true in the pay-as-you-go model of the public cloud, where monthly bills can gradually creep up unnoticed before eventually spiraling out of control.
To help keep storage costs down, you should take measures to root out duplicate, stale and other redundant data through the use of deduplication technology, data retention policies and tools that give you insights into data usage patterns.
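As a rough illustration of what rooting out duplicate and stale data can look like at the file level, the sketch below walks a directory tree, groups files with identical content by SHA-256 hash, and flags files that have not been modified for a given number of days. The function names and the 365-day default are assumptions made for this example. Enterprise deduplication technology typically works at the block or storage layer rather than on whole files, so treat this as a simplified stand-in.

```python
import hashlib
import os
import time
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates_and_stale(root, stale_days=365, now=None):
    """Walk `root` and return (duplicates, stale).

    duplicates: list of sorted path lists whose contents are byte-identical.
    stale: sorted list of paths not modified within the last `stale_days`.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_days * 86400
    by_digest = defaultdict(list)
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_digest[file_digest(path)].append(path)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    duplicates = sorted(
        sorted(paths) for paths in by_digest.values() if len(paths) > 1
    )
    return duplicates, sorted(stale)
```

Calling `find_duplicates_and_stale("/path/to/share")` returns candidates for review only. Whether a file can actually be deleted depends on your retention requirements.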
Dark data may contain personal information, which may subject it to applicable privacy laws. You'll need a way to discover and classify all personal data in your possession in order to give it appropriate treatment in line with data protection requirements.
If your dark data does contain personal information, you should take measures to protect it and update your privacy program accordingly. Alternatively, if you're not actually using it, you should consider promptly removing it.
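To give a feel for what discovering and classifying personal data involves, here is a deliberately simple sketch that counts email addresses and US Social Security number patterns in a piece of text. The two regular expressions and the `classify_text` helper are assumptions made for this example. Real classification tools use far richer techniques than pattern matching, and this is not a description of how any specific product works.

```python
import re

# Illustrative patterns only: they will miss many valid formats and can
# produce false positives on strings that merely look similar.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_text(text):
    """Return {category: match count} for each pattern found in `text`."""
    counts = {}
    for label, pattern in PATTERNS.items():
        n = len(pattern.findall(text))
        if n:
            counts[label] = n
    return counts


found = classify_text("Contact jane.doe@example.com, SSN 123-45-6789")
```

Running a check like this across files or log lines gives a first, coarse map of where personal data may be hiding, which you can then review against your data protection requirements.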
Data governance will be pivotal to ensuring effective use and control of your dark data.
Policies and procedures that govern data management will help you eliminate inconsistencies from your dark data and improve data quality. They will also help reduce data management costs.
Data governance, data compliance, and cost-efficient storage wouldn't be possible without clear visibility into your data. In an enterprise IT environment, dark data will come in a variety of different formats, distributed across a range of storage services on both cloud-based and on-premises infrastructure.
To overcome this complexity, you'll need tools that can merge information from all your different data sources to provide consolidated insights from a single pane of glass.
Dark Data Discovery: Better Data and Better Insights
As large-scale enterprises generate more and more data, the demand for solutions to address the dark data problem will continue to grow.
According to findings in the Splunk report, companies see staff training and new types of software as the most promising solutions to these challenges, with the use of artificial intelligence (AI) following close behind.
Data classification tools with AI capabilities therefore offer much potential, especially those that are easy for anyone to use, at any level of technical expertise. NetApp offers just such a tool with its new data governance utility, NetApp Cloud Data Sense.
Data Sense is an AI-driven data mapping and governance technology that can help you better understand your dark data. By scanning your storage repositories in the cloud or on-prem, Data Sense can discover unknown data in pre-identified business areas that are likely to hold personal or other dark data, helping you identify that data and build a baseline data map.
Data Sense dashboard view
Data Sense helps shed light on your dark data. The Data Sense dashboard gives you full details on all of the data in your storage volumes across your entire data estate, categorizing your data by type and by its sensitivity risk. Plus, you can get key data governance insights into savings opportunities, open permissions, and more with the governance tab:
The governance tab
Data Sense reports provide context-specific personal information, which can help you address your organization’s privacy obligations. And by giving you a better understanding of your data, they will also help you improve its quality, so you can make the best choices for your business.