Relational databases are at the heart of many line-of-business applications, which is why their storage and performance needs require special attention. On the performance side, quickly serving random read and write requests at low latency is essential.
When it comes to storage, on the other hand, making efficient use of the underlying storage layer is essential from both a management and a cost perspective. Storing log files, flat files, and database backups is a normal part of how an RDBMS works; these storage demands determine the quantity, location, type, and cost of the storage the database uses.
In this article we’ll introduce you to the Cloud Tiering service and show how it can be applied to RDBMS use cases on NetApp AFF and SSD-backed FAS systems to help database and IT admins manage their database storage efficiently and cost-effectively.
Cost and Performance in Relational Databases
Every type of database has its own architecture for executing its operations. Although relational databases such as MySQL, Oracle, and DB2 each have their own logical structure for carrying out an incoming SQL transaction from a client application, they do share common features, such as query optimization plans, data and query caches, and thread handling that reduces memory allocation. These are all techniques the engines use to fine-tune performance.
There are other features unique to each of these relational databases. MySQL has pluggable storage engines targeted at particular workloads, such as transactional or non-transactional processing, making it a versatile engine. DB2 has scan sharing, which allows one scan to access the pages buffered by another scan, increasing performance. Oracle has database instances, which control many of the processes mentioned above.
Some common denominator performance-improving aspects for relational database systems include:
- Query speeds
- Application design
- Caching of completed SQL queries
- Database configuration and data distribution
- Storage layer configuration
It’s at this last point, the storage layer configuration, that we want to mention a few considerations related to performance and cost that can influence your database workloads:
- Distribution of indexes, data and tablespaces among the available storage
Optimization of this distribution can improve data retrieval speed.
- Distribution of disk I/O
Data striping in RAID configurations and load balancing between storage controllers optimize throughput and data prefetching. Prefetching mechanisms can read data faster when it’s spread across a RAID disk configuration. Plus, RAID offers additional recovery options.
- Max limit throughput and number of IOPS supported by underlying storage
These are the physical limits of the underlying storage layer. Throughput is measured in MB/s or GB/s, and the transaction rate is measured in IOPS, the number of I/O operations per second.
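As a rough illustration of how these two limits relate, throughput is simply IOPS multiplied by the I/O size. The workload figures below are hypothetical examples, not specifications of any particular storage system:

```python
# Relationship between throughput and IOPS: throughput = IOPS * I/O size.
# The workload figures are illustrative only, not vendor specifications.

def throughput_mb_s(iops: int, io_size_kb: int) -> float:
    """Throughput in MB/s for a given IOPS rate and I/O size in KB."""
    return iops * io_size_kb / 1024

# An OLTP workload issuing 8 KB random I/Os at 100,000 IOPS:
oltp = throughput_mb_s(100_000, 8)    # ~781 MB/s
# A backup stream issuing 256 KB sequential I/Os at 4,000 IOPS:
backup = throughput_mb_s(4_000, 256)  # 1000 MB/s
print(f"OLTP: {oltp:.0f} MB/s, backup: {backup:.0f} MB/s")
```

Note that the small-block OLTP workload saturates the IOPS limit long before the throughput limit, while the large-block backup stream does the opposite; both ceilings matter when sizing database storage.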
Performance isn’t the only concern when managing database environments; costs are also an important factor. A large part of these costs is generated at the storage layer. Databases often have a high percentage of data that becomes inactive yet consumes top-tier storage that could be put to better use.
That kind of inefficiency is also how backups impact storage usage. Since recovery is of the highest importance for databases, they need up-to-date and consistent backups. And because these backups are essentially cold data, they wind up consuming huge amounts of primary storage system space, or else they require entire secondary locations to house them.
Log files are another storage cost factor. Both roll-back and roll-forward recovery methods need logs to return the database to a consistent state. Active and archived logs might consume even more space than the data in the database itself.
For relational databases running on NetApp AFF and SSD-backed FAS storage systems, there is now a solution for these major storage-level concerns. The new Cloud Tiering service provides an option to reduce storage TCO without sacrificing performance by allowing cold data to be tiered to object storage in the cloud.
How Cloud Tiering Service Helps Databases
Let’s take a look at some of the use cases the Cloud Tiering service has for database environments. Based on the selected tiering policy and a user-defined cooling period, Cloud Tiering places cold data into a low-cost object storage service in the cloud (known as the cloud tier) without sacrificing performance, resulting in reduced storage costs. When a cold block is requested by the database manager, it is automatically tiered back to the performance tier on AFF or SSD-backed FAS for further retrievals. This is done seamlessly, without any changes to the database configuration.
Here are some cases in which you can apply Cloud Tiering service to your database sets:
1. Log Archiving
In recovery scenarios, databases restore from existing backups and then replay archived transaction logs until the desired state is reached. Many companies require extensive retention periods for archived logs, especially when under regulatory requirements, even though most of that data is probably not going to be used after a few days. The capacity required can easily be many times the size of the database itself.
Applying Cloud Tiering’s Auto tiering policy to a volume storing archived logs can save a great deal of space. For example, if there is a 60-day retention policy for backups, enabling the Auto policy on these volumes will tier all the cold log blocks after 31 days of inactivity, or another user-defined period. You still have the archived log data available to meet the 60-day retention policy, but at the same time you save space on the performance storage tier. There’s no need to create an additional storage silo for archived logs, or to deal with other inefficient backup storage methods, such as tape.
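As a sketch of how this could look at the volume level, ONTAP exposes per-volume tiering settings through its CLI. The SVM and volume names below are hypothetical, and option availability can vary by ONTAP version, so check the documentation for your release:

```shell
# Hypothetical SVM and volume names; verify the options against your
# ONTAP version. Tier cold blocks of the archived-log volume after
# 31 days of inactivity.
volume modify -vserver svm1 -volume db_archlog \
    -tiering-policy auto -tiering-minimum-cooling-days 31
```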
2. Full Datafile Tiering
Typically, datafiles are organized by date; as datafiles age, the data they contain is less and less likely to be accessed. In this case, you can set Cloud Tiering’s Auto tiering policy to relocate the older datafiles to the cloud tier, according either to their retention requirements or to the datafiles’ access patterns. If, for example, the requirement for a database volume is to remain in the performance tier for 60 days, then setting a minimum cooling period of 60 days would be the way to go.
Some databases use an Information Lifecycle Management (ILM) assistant to move datafiles between file systems based on defined criteria. If you have a set of datafiles on an active data volume in which all of the datafiles are accessed over 90-day periods, a good option with Cloud Tiering is to set a minimum cooling period of two days. Another choice would be to change the tiering policy of the archival volumes to All, which relocates the entire volumes to the low-cost tier as soon as the policy is enabled.
In the first case above, an Auto policy would be the correct one to apply to the volume because that is the policy that applies to all the blocks from the active file system. However, in the second case where the ILM assistant moves the datafiles to a different volume designated as an archive, either Auto or All can be used.
3. Local Storage Snapshot-based Backups
In this case you could have a volume whose data needs to stay in the performance tier, with a defined NetApp ONTAP Snapshot policy. By applying Cloud Tiering’s Snapshot-only tiering policy to this volume, you can save space by tiering just the cold Snapshot blocks from the point-in-time backups. This way you keep Snapshot copies available in case you need to perform an instant local restore, while gaining storage savings of around 10%-15% weekly, depending on the case.
Combined with native database tools, such as Oracle Information Lifecycle Management (ILM), total space savings can increase significantly. ILM takes charge of moving aged datafiles from the active volume to a volume designated as an archive, where the Cloud Tiering policy is set to Auto or All. Cloud Tiering then takes care of tiering both the local point-in-time NetApp Snapshot copies located in the active volume and the aged datafiles that were moved by ILM into the archive volume.
4. Tiering Data Protection Volumes and External Database Backups
Replicating volumes for disaster recovery or backup and archive purposes is another important use case for Cloud Tiering in a database environment. In this scenario you can apply either the Auto or the All tiering policy to a NetApp Data Protection volume (SnapMirror® or SnapVault® destination volume).
With the All tiering policy, all incoming data blocks are immediately tiered to object storage, allowing over 90% capacity savings on the performance tier. In the event of a disaster recovery or a restore, since the metadata remains on the performance tier, all the data is retrieved from the capacity tier through sequential I/O, which shouldn’t significantly increase the recovery time.
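As a back-of-the-envelope sketch of those savings, suppose only volume metadata remains on the performance tier once all data blocks are tiered. The 5% metadata fraction used here is an illustrative assumption, not a measured ONTAP figure:

```python
# Rough estimate of performance-tier savings with the All tiering policy
# on a data protection volume. The metadata fraction is an illustrative
# assumption, not a measured figure.

def performance_tier_usage_gb(dataset_gb: float,
                              metadata_fraction: float = 0.05) -> float:
    """Space still consumed on the performance tier when all data
    blocks sit in object storage and only metadata remains."""
    return dataset_gb * metadata_fraction

dataset = 10_000  # a hypothetical 10 TB replicated dataset
remaining = performance_tier_usage_gb(dataset)
savings_pct = 100 * (dataset - remaining) / dataset
print(f"{remaining:.0f} GB left on the performance tier, "
      f"{savings_pct:.0f}% saved")
```

Under this assumption a 10 TB replicated dataset would keep only about 500 GB on the performance tier, in line with the over-90% savings figure above.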
When the replicated datasets are also intended to be used for cloning, setting the Auto tiering policy on data protection volumes is more appropriate. With the correct minimum cooling period, you ensure the replicated data stays in the performance tier long enough to serve cloning requests.
Another use case where the Auto tiering policy can be applied, with a minimum cooling period of just a few days, is volumes that contain flat-file backups, such as those created by Oracle Recovery Manager. Such backup files need to be kept online for a period of time in case a rapid recovery needs to be performed, but once that time frame expires, they are unlikely to be accessed again, making them perfect targets for Cloud Tiering to move to the cloud-based capacity tier.
Databases are at the heart of most enterprise IT environments, and they need attention on key aspects such as performance and cost.
We’ve covered some key phases in a database’s inherent workflow where the NetApp Cloud Tiering service can be applied to save storage costs without impacting performance.