
Database Tiering with Amazon EBS Storage Pools

Written by Aviv Degani, Cloud Solutions Architecture Manager, NetApp | Jan 24, 2019 1:30:29 PM

Databases come in all shapes and sizes, each with its own I/O performance requirements. For example, an OLTP (OnLine Transaction Processing) database system typically requires fast random access above all else.
From a cloud storage perspective, each type of database workload is best served by a different Amazon EBS disk type. However, managing performance and capacity in this way has an obvious impact on deployment complexity. Cloud Volumes ONTAP offers a solution by allowing you to optimize and scale individual storage pools built on any Amazon EBS disk type.

In this article, we will demonstrate how Cloud Volumes ONTAP simplifies the management of Amazon EBS storage and how this can be applied in database environments to support a multitude of concurrent workloads.

Database Storage Tiering


Applications that generate a high frequency of short database transactions exist in nearly all business domains, such as media, e-commerce, finance, air travel, and many more. A single database transaction performed by these applications typically updates information for a single customer at a time, which requires random-access I/O operations. Ensuring that these operations complete with minimal latency is essential for providing end users with responsive applications and services.

Database I/O Requirements


When data changes need to be made, most database servers first write the changes to a transaction log, a technique known as write-ahead logging. This is faster than writing the updated data blocks directly to disk, because the same changes can be captured in the transaction log using sequential I/O. The database server then periodically flushes dirty blocks back to disk, which allows it to group the random-access I/O operations together and perform them more efficiently.
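
To make the write-ahead pattern more concrete, here is a minimal, hypothetical Python sketch, not taken from any particular database engine, that appends each change to a sequential log before the corresponding data pages are rewritten in place. The file names, page size, and sample records are assumptions used only for illustration.

```python
import json
import os

# All names and sizes here are assumptions for illustration only.
PAGE_SIZE = 4096        # assumed fixed page size
LOG_PATH = "wal.log"    # hypothetical sequential transaction log
DATA_PATH = "data.db"   # hypothetical data file made up of fixed-size pages

dirty_pages = {}        # page number -> new page contents, waiting to be flushed


def write_change(page_no, page_bytes):
    """Make a change durable by appending it to the log (fast, sequential I/O)."""
    record = {"page": page_no, "data": page_bytes.hex()}
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())         # the sequential log write provides durability
    dirty_pages[page_no] = page_bytes  # the in-place page update is deferred


def checkpoint():
    """Periodically flush dirty pages to the data file (grouped random-access I/O)."""
    if not os.path.exists(DATA_PATH):
        open(DATA_PATH, "wb").close()
    with open(DATA_PATH, "r+b") as data:
        for page_no, page_bytes in sorted(dirty_pages.items()):
            data.seek(page_no * PAGE_SIZE)
            data.write(page_bytes.ljust(PAGE_SIZE, b"\x00"))
        data.flush()
        os.fsync(data.fileno())
    dirty_pages.clear()


write_change(7, b"customer 1001 balance=250.00")
write_change(2, b"customer 1002 balance=975.50")
checkpoint()
```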

Another scenario that calls for strong sequential I/O performance is a reporting database, such as a data warehouse. In this case, most user queries scan large quantities of data and so depend on high levels of I/O throughput. The overall amount of data to be stored is also much larger than for the transactional databases discussed previously, which are usually archived regularly to keep them small and fast.
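
To get a rough feel for why throughput matters here, the back-of-the-envelope calculation below estimates how long a full scan of a large table would take at different sustained throughput levels. The data size and throughput figures are illustrative assumptions, not measurements.

```python
# Rough scan-time estimate: time = data scanned / sustained throughput.
# All figures below are illustrative assumptions.
scan_size_gib = 1024                      # assume a 1 TiB fact table
for throughput_mib_s in (40, 250, 500):   # assumed sustained throughput in MiB/s
    seconds = (scan_size_gib * 1024) / throughput_mib_s
    print(f"{throughput_mib_s:>4} MiB/s -> {seconds / 60:6.1f} minutes per full scan")
```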

Data archiving presents yet another situation with its own storage requirements. Since an archive may be accessed very infrequently, high-performance storage is not necessary, although it may still be useful on the occasions when the data must be read. Because archives grow steadily over time, they also tend to consume a large amount of storage space. For these reasons, it is best to place archives on a low-cost, capacity-oriented storage solution.
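
Pulling the three workload profiles above together, one possible mapping to Amazon EBS disk types might look like the sketch below. This is a simplified illustration based on general EBS characteristics (SSD types for random I/O, HDD types for sequential throughput and cold data), not a sizing recommendation for any particular database.

```python
# Illustrative only: match each workload profile to an EBS disk type family.
workload_to_ebs = {
    "oltp":            {"disk_type": "io1 (Provisioned IOPS SSD)",
                        "reason": "low-latency random reads and writes"},
    "transaction_log": {"disk_type": "gp2 (General Purpose SSD)",
                        "reason": "durable sequential log writes"},
    "data_warehouse":  {"disk_type": "st1 (Throughput Optimized HDD)",
                        "reason": "large sequential scans"},
    "archive":         {"disk_type": "sc1 (Cold HDD) or tiering to Amazon S3",
                        "reason": "infrequent access, lowest cost per GB"},
}

for workload, choice in workload_to_ebs.items():
    print(f"{workload:>16}: {choice['disk_type']:<40} ({choice['reason']})")
```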

What Is a Storage Pool?


A storage pool is a body of storage, consisting of one or more disks of the same type, that is used to provision the data volumes accessed by end users. The type of disk used determines the I/O profile of the storage pool, and disks can be added to the pool at any time to improve performance.

Amazon EBS Storage Pools with Cloud Volumes ONTAP


As we have seen, storage optimization is vitally important for ensuring database performance and managing cloud storage costs. Cloud Volumes ONTAP for AWS facilitates this through the deployment of independent storage pools known as aggregates. Each aggregate is created from up to six disks of any single Amazon EBS disk type. The more disks used in an aggregate, the higher its overall I/O performance. You can also add more disks to an existing aggregate after it has been created.
Creating Amazon EBS storage pools in Cloud Volumes ONTAP via the Cloud Manager.
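
Cloud Manager provisions and attaches the underlying Amazon EBS disks for you when you create an aggregate, so no scripting is required. Purely to illustrate what that involves, the boto3 sketch below creates a set of same-type EBS disks such as might back a single aggregate; the region, Availability Zone, disk type, size, count, and tags are hypothetical placeholders, not values Cloud Manager actually uses.

```python
import boto3

# All parameter values below are hypothetical placeholders.
REGION = "us-east-1"
AVAILABILITY_ZONE = "us-east-1a"
DISK_TYPE = "gp2"      # one EBS disk type per storage pool/aggregate
DISK_SIZE_GIB = 500
DISK_COUNT = 3         # Cloud Volumes ONTAP aggregates use up to six disks

ec2 = boto3.client("ec2", region_name=REGION)

volume_ids = []
for i in range(DISK_COUNT):
    volume = ec2.create_volume(
        AvailabilityZone=AVAILABILITY_ZONE,
        Size=DISK_SIZE_GIB,
        VolumeType=DISK_TYPE,
        Encrypted=True,
        TagSpecifications=[{
            "ResourceType": "volume",
            "Tags": [{"Key": "Name", "Value": f"example-aggregate-disk-{i}"}],
        }],
    )
    volume_ids.append(volume["VolumeId"])

# Wait until all disks are ready before they would be attached and pooled.
ec2.get_waiter("volume_available").wait(VolumeIds=volume_ids)
print("Created EBS disks:", volume_ids)
```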
Aggregates are used to create volumes, which represent allocations of storage made available to client hosts and applications. Volumes created within the same aggregate share the total available performance of that storage pool; to drive I/O performance higher, you can instead distribute your volumes across more than one aggregate.

Dividing volumes across multiple storage pools.
The aggregates dashboard lets users choose the storage pool for new volumes, which can be created by hovering over the target aggregate and selecting Create volume. Volumes sharing the same aggregate use thin provisioning, which prevents storage from being pre-allocated before it is actually needed. This helps to ensure that storage space is not wasted, but is instead distributed appropriately between all the volumes using an aggregate. Cloud Volumes ONTAP is an intelligent storage system that also provides cost-saving storage efficiency features, such as data deduplication and data compression, which can reduce a volume’s footprint in the aggregate by up to 70%. These features also allow you to turn off database-level compression, saving compute resources on your database servers.
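
To put the “up to 70%” figure in perspective, the short calculation below shows how thin provisioning and storage efficiency affect the physical space a volume consumes in its aggregate. The data sizes and savings ratio are assumptions used for illustration; actual savings depend on how compressible and duplicated your data is.

```python
# Illustrative space-efficiency math; all inputs are assumptions.
provisioned_size_gib = 2000   # logical size the volume presents to clients
data_written_gib = 800        # thin provisioning: only written data consumes space
efficiency_savings = 0.70     # assumed combined dedupe + compression savings ("up to 70%")

physical_used_gib = data_written_gib * (1 - efficiency_savings)

print(f"Provisioned (logical) size : {provisioned_size_gib} GiB")
print(f"Data actually written      : {data_written_gib} GiB")
print(f"Space used in the aggregate: {physical_used_gib:.0f} GiB")
```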

For long-term storage provisioning, aggregates can be configured to automatically tier the data they store to Amazon S3. This provides capacity storage at a very low cost, with the advantage that Cloud Volumes ONTAP automatically brings the data back into Amazon EBS storage whenever it is accessed. This provides a form of database automation for keeping cold data on the most cost-effective storage while still providing rapid access to it when necessary. This feature can also be used to help work around Amazon RDS size limitations.
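
The potential savings from tiering come from the price gap between block and object storage. The sketch below compares the monthly cost of keeping cold data on EBS versus letting it tier to Amazon S3; the per-GB prices are approximate figures used only as assumptions, so check current AWS pricing before drawing real conclusions.

```python
# Illustrative cost comparison; prices are assumed approximations, not quotes.
cold_data_gib = 5000
ebs_gp2_price_per_gib_month = 0.10   # assumed approximate EBS gp2 price
s3_price_per_gib_month = 0.023       # assumed approximate S3 Standard price

cost_all_on_ebs = cold_data_gib * ebs_gp2_price_per_gib_month
cost_tiered_to_s3 = cold_data_gib * s3_price_per_gib_month

print(f"Cold data kept on EBS : ${cost_all_on_ebs:8.2f} / month")
print(f"Cold data tiered to S3: ${cost_tiered_to_s3:8.2f} / month")
print(f"Approximate saving    : ${cost_all_on_ebs - cost_tiered_to_s3:8.2f} / month")
```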

Conclusion


Enterprise database deployments require simultaneous access to many forms of disk storage, depending on the types of workloads they process. Cloud Volumes ONTAP helps to manage and scale data storage in these circumstances by allowing you to create multiple storage pools, each using any of the different Amazon EBS disk types. Volumes can then be created in the pool that best matches the requirements of each database, and can even be moved between storage pools without requiring any client-side changes.