What is Cold Data Storage?
Cold storage enables you to effectively retain inactive data. Typical use cases of cold storage include:
- Media files: images, video, and similar assets
- Replicated data: copies kept for backup, disaster recovery, and archival purposes
- Compliance data: information that must be retained to maintain compliance
Since cold data remains inactive most of the time, keeping it in cold storage is typically more cost-effective than keeping it in the high-performance stores that serve as primary repositories. Here are some criteria to use when designing or choosing options for cold storage:
- High capacity and data durability
- Relatively slow data retrieval and response time
- Media types like linear tape-open (LTO) tape and hard disk drives (HDDs)
The majority of cloud vendors offer options for cold storage. Popular services that trade lower data availability for lower cost include Microsoft Azure Cool Blob Storage, Amazon Glacier, and Google Cloud Storage Nearline.
In this article, you will learn:
- Cold vs Hot Storage
- The Demand for Cold Data Storage
- How to Effectively Manage Cold Storage
- Use Data/Storage Automation
Cold vs Hot Storage
What is Hot Storage?
Any data that is frequently and immediately accessed should be kept in hot storage.
Typical use cases for hot storage include dynamically modified data, data queried by users, and data required by ongoing workflows.
Hot storage provides reliable immediate access. Data transferred from hot storage repositories is often called “data streams”.
The speed of a data transfer depends mainly on the number of network hops the data must pass through to reach its destination.
For example, data that is processed close to its source moves quickly, while data that crosses several different networks arrives later.
What is Cold Storage?
Any data that is rarely used should be kept in cold storage.
Typical use cases for cold storage include archival data, data that can’t be used due to legal complications, and compliance data.
Cold storage provides a safe location for data that is not in frequent use, like old databases. This data is often called “dormant data”.
Retrieving cold data typically takes longer than retrieving data from hot storage. Retrieval time varies, and can range from minutes to hours.
In some cases, retrieving data from cold storage might require you to manually search through physical hard drives and connect the disks to a computer.
The Demand for Cold Data Storage
The popularity of cold data storage has been increasing in recent years. This is due to several reasons: an exponential increase in data, changes in storage consumption, established and emerging compliance regulations, and the low prices of cold storage.
Exponential Data Growth
According to IDC, the total volume of data was expected to exceed 44 zettabytes by the end of 2020 and to continue increasing. The majority of this data is either inactive or infrequently accessed. More than 80% is attributed to unstructured and machine-generated data.
Primary Storage Consumption
Storage continuously consumes resources. In most cases, data remains on the storage target it was first written to for the duration of its lifecycle. Even though primary storage is often refreshed, cold data left there continues to consume expensive hot storage resources.
Insights from heat maps indicate that data remains hot during the first 72 hours after it is created. After 30 days, data cools down. After 90 days, data becomes cold. NAND flash SSD media is highly effective for active data, but wasted on cold data.
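The thresholds above can be expressed as a small age-based classifier. This is a minimal sketch, not a vendor's method: the function name `classify_temperature` and the "warm" label for the band between 72 hours and 30 days are assumptions made for illustration, since the text does not name that band.

```python
from datetime import datetime, timedelta

# Thresholds taken from the text: hot for 72 hours, cooling after 30 days,
# cold after 90 days.
HOT_WINDOW = timedelta(hours=72)
COOL_AFTER = timedelta(days=30)
COLD_AFTER = timedelta(days=90)


def classify_temperature(created_at: datetime, now: datetime) -> str:
    """Label data by its age since creation (hypothetical helper)."""
    age = now - created_at
    if age < timedelta(0):
        raise ValueError("created_at is later than now")
    if age <= HOT_WINDOW:
        return "hot"
    if age <= COOL_AFTER:
        return "warm"  # assumed label: the text does not name this band
    if age <= COLD_AFTER:
        return "cool"
    return "cold"


now = datetime(2024, 1, 1)
for age in (timedelta(hours=5), timedelta(days=45), timedelta(days=200)):
    print(age, classify_temperature(now - age, now))
```

Age since creation is used here because that is how the text frames it. A real system might prefer time since last access, which is a different signal.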
There is no need to use high-performance media for cold data. Doing so leads to overhead and expensive storage bills. Yet cold data consumes vast amounts of primary storage, estimated at between 75% and 90%.
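To get a rough feel for what the 75% to 90% estimate means in practice, the arithmetic can be sketched as follows. The function names and the per-terabyte prices in the usage lines are hypothetical placeholders, not real vendor pricing. Substitute your own figures.

```python
def reclaimable_primary_tb(primary_tb, cold_share_low=0.75, cold_share_high=0.90):
    """Return the (low, high) range of primary capacity occupied by cold data.

    The default shares come from the 75%-90% estimate in the text.
    """
    if primary_tb < 0:
        raise ValueError("primary_tb must be non-negative")
    return primary_tb * cold_share_low, primary_tb * cold_share_high


def monthly_savings_range(primary_tb, hot_price_per_tb, cold_price_per_tb):
    """Estimate the monthly savings from moving cold data off primary storage.

    Prices are caller-supplied; this ignores retrieval and migration fees.
    """
    low_tb, high_tb = reclaimable_primary_tb(primary_tb)
    price_gap = hot_price_per_tb - cold_price_per_tb
    return low_tb * price_gap, high_tb * price_gap


# Hypothetical inputs: 200 TB of primary storage, made-up prices per TB-month.
low, high = monthly_savings_range(200, hot_price_per_tb=20.0, cold_price_per_tb=4.0)
print(f"Estimated monthly savings: {low:.0f} to {high:.0f}")
```

The estimate is deliberately coarse. It leaves out retrieval charges and minimum storage durations, which some cold storage services apply.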
Compliance Regulations
Recent years have seen growing awareness of how private data is used. More and more regulatory bodies are being created to standardize the usage of data and ensure the privacy of citizens.
The General Data Protection Regulation (GDPR), for example, requires companies to meet certain standards when handling the data of citizens and organizations from the European Union (EU).
While the GDPR is known globally, many more regulations are enforced for the protection of data, including:
- New York State’s banking and cybersecurity regulations for financial institutions
- Health Insurance Portability and Accountability Act
- HITECH Act
- Basel I, II and III
To comply with some of these regulations, organizations are required to store specific types of data for tens and even hundreds of years.
Low Prices of Cold Storage
Cold storage provides a far lower-cost alternative to hot storage options. There are many types of cold data storage options, from storage systems and media options to various cloud services. While each cold storage option provides unique features, most remain cost-effective, enabling businesses to reduce storage costs.
How to Effectively Manage Cold Storage
There are many ways to effectively manage cold storage. Common practices include using inexpensive storage, leveraging cloud cold storage, evaluating cold data consumption annually, and implementing data storage automation.
Use Inexpensive but Dependable Cold Storage
Slow hard drives and tapes are often considered good storage media types for cold data.
However, you still need to regularly test all disks and tapes, to ensure the media works properly. You should also note the lifespan of drives and tapes and retire old resources before failure. Otherwise, you might put your data at risk. While it’s cold data, it still needs to be preserved.
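The testing and retirement routine described above can be sketched as a simple inventory check. Everything here is illustrative: the `Medium` record, the 180-day test interval, and the rule of retiring media at 80% of rated lifespan are assumptions, not recommendations from any vendor.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Medium:
    """One disk or tape in a hypothetical cold storage inventory."""
    label: str
    in_service_since: date
    rated_lifespan_days: int
    last_tested: date


def review_media(media, today, test_interval_days=180, retire_margin=0.8):
    """Return (to_test, to_retire) lists of media labels.

    A medium is flagged for retirement once its age reaches retire_margin of
    its rated lifespan; otherwise it is flagged for testing when its last
    test is older than test_interval_days. Both thresholds are assumptions.
    """
    to_test, to_retire = [], []
    for m in media:
        age_days = (today - m.in_service_since).days
        if age_days >= m.rated_lifespan_days * retire_margin:
            to_retire.append(m.label)
        elif (today - m.last_tested).days >= test_interval_days:
            to_test.append(m.label)
    return to_test, to_retire


inventory = [
    Medium("tape-A", date(2015, 1, 1), 3650, date(2023, 12, 1)),
    Medium("disk-B", date(2022, 1, 1), 3650, date(2023, 1, 1)),
    Medium("disk-C", date(2023, 1, 1), 3650, date(2023, 11, 1)),
]
to_test, to_retire = review_media(inventory, today=date(2024, 1, 1))
print("test:", to_test, "retire:", to_retire)
```

Retiring before the rated lifespan is reached leaves time to copy the data onto fresh media.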
Consider Cloud-Based Cold Storage
There is a wide range of cold storage options in the cloud, which might suit your purposes better than on-premises options. In some cases, cloud storage can reduce costs and provide you with controls for security and compliance. In other cases, you might be required to use on-premises resources to meet compliance requirements. Assess each cloud option carefully before moving to the cloud.
Perform Annual Evaluations of Cold Storage Data
Not all cold data should be stored permanently. To ensure you are putting your resources to good use, evaluate your data annually. End users and legal departments can join the assessment to determine which data to retain and which to delete. An annual evaluation can help ensure resources are allocated according to true needs.
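One way to prepare for such an evaluation is to split archived datasets into those that must stay and those that are candidates for deletion. This is a minimal sketch under stated assumptions: the `ArchivedDataset` fields (`retain_until`, `legal_hold`) are hypothetical metadata that your own catalog may or may not track.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedDataset:
    """Hypothetical catalog entry for a dataset held in cold storage."""
    name: str
    retain_until: date        # assumed: end of the required retention period
    legal_hold: bool = False  # assumed: set by the legal department


def annual_review(datasets, review_date):
    """Split datasets into (keep, deletion_candidates) by name.

    Anything under legal hold or still inside its retention period is kept.
    The rest is only a candidate list for people to approve.
    """
    keep, deletion_candidates = [], []
    for ds in datasets:
        if ds.legal_hold or ds.retain_until >= review_date:
            keep.append(ds.name)
        else:
            deletion_candidates.append(ds.name)
    return keep, deletion_candidates


catalog = [
    ArchivedDataset("tax-records-2012", date(2022, 12, 31)),
    ArchivedDataset("contracts-2015", date(2030, 12, 31)),
    ArchivedDataset("email-export-2010", date(2020, 12, 31), legal_hold=True),
]
keep, candidates = annual_review(catalog, review_date=date(2024, 1, 1))
print("keep:", keep)
print("review for deletion:", candidates)
```

The function never deletes anything. It produces a list for end users and legal staff to sign off on, which matches the shared assessment described above.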
Use Data/Storage Automation
The majority of storage vendors facilitate tiered data storage using artificial intelligence (AI). Typically, the consumer can define rules and triggers, which the AI software uses to distribute data in storage.
Here is how a tier strategy typically works:
- In-memory storage and solid-state drives are often used as the primary tier for storing frequently used data.
- A secondary tier is used for storing intermittently used data on less expensive drives.
- Another tier is dedicated to storing cold data on cost-effective slow disk drives and tapes.
Automating data tiers keeps data continuously redistributed across storage, delivering optimal results at the lowest possible cost.
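The three-tier strategy above can be illustrated with a plain rule table. This sketch is not a vendor's AI software: it uses fixed, user-defined rules, and the tier names, the 7-day and 90-day thresholds, and the helper functions are all assumptions chosen for illustration.

```python
from datetime import date

# Ordered rules: (maximum days since last access, tier name).
# The thresholds are illustrative, not taken from any product.
TIER_RULES = [
    (7, "primary"),     # frequently used data: in-memory storage or SSD
    (90, "secondary"),  # intermittently used data: less expensive drives
]
DEFAULT_TIER = "cold"   # everything else: slow disk drives and tapes


def pick_tier(last_accessed, today, rules=TIER_RULES, default=DEFAULT_TIER):
    """Return the tier whose rule first matches the object's idle time."""
    idle_days = (today - last_accessed).days
    for max_idle_days, tier in rules:
        if idle_days <= max_idle_days:
            return tier
    return default


def plan_moves(objects, today):
    """Return (name, current_tier, target_tier) for objects in the wrong tier.

    objects maps a name to a (current_tier, last_accessed) pair.
    """
    moves = []
    for name, (current_tier, last_accessed) in objects.items():
        target = pick_tier(last_accessed, today)
        if target != current_tier:
            moves.append((name, current_tier, target))
    return moves


objects = {
    "dashboard.db": ("primary", date(2024, 6, 29)),
    "q1-report.csv": ("primary", date(2024, 5, 1)),
    "2019-logs.tar": ("secondary", date(2023, 1, 1)),
}
for move in plan_moves(objects, today=date(2024, 6, 30)):
    print(move)
```

Separating the planning step from the actual data movement makes it possible to review or rate-limit moves before anything is transferred.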