The -as-a-Service (*aaS) business model has created an environment where “good enough” is never enough; customers clamor for additional capabilities and added value. SaaS companies, such as Acme Corp., are listening. And they’ve responded in the knowledge that customer loyalty must not only be earned, but also maintained. Managing their databases in the cloud is one key element.
The backbone of any SaaS or subscription-based company is recurring revenue. Unlike the product sales revenue model, wherein a customer is sold a software and support package once and the deal is done, recurring revenue is intrinsic to *aaS. As such, customer satisfaction is critical, and downtime is intolerable. Through added features and capabilities, companies that employ the SaaS model generate revenue through net new customers and existing customer expansion. Reducing downtime through architectural hardening similarly prevents customer loss, but it does so at the engineering cost of a more resilient backend that customers will never see. In short, SaaS companies must strike a balance between feature development and architectural hardening.
Acme Corp.’s business philosophy is simple: features produce revenue. The decision to deploy in Amazon Web Services (AWS) has freed up staffing resources to work on SaaS differentiation. Those engineers focus on rapidly developing features, with the goal of granting customers early access. Early customer access fuels innovation. Unpopular new features are quickly identified and abandoned. Hot features rise to the forefront. Since features, in and of themselves, are of limited use in a vacuum, each customer’s data, which is stored within the Oracle Database, must be safely accessible from within the preview environment.
Business Problem One thus comes down to Technology Problem One: how do you generate a rapidly refreshable, performant clone of the production database that enables feature preview while protecting production data?
Acme Corp., a SaaS provider for enterprise workflow and collaboration, has built its cloud-based company around the concept that customers, once gained, must not be lost due to architectural design decisions. That can be reduced to two underlying beliefs: services must be available when businesses need them and data must be protected from loss.
With both a domestic and international customer base, neither scheduled nor unplanned outages are tolerable, and data loss is unthinkable. At the core of every Acme deployment is an Oracle Database, which must always remain online and available in order for the service to remain up and running.
Technology Problem Two is, therefore, how to achieve high database availability (and data durability) in the cloud—in this instance, in AWS.
Technology is a business enabler. Companies, such as Acme Corp., rely on proven solutions delivered by NetApp and Oracle to meet their essential business needs in the cloud.
NetApp Cloud Volumes Service for AWS is the one-stop solution for businesses running Oracle databases in the cloud. Cloud Volumes Service comes equipped with an SLA for filesystem performance, data durability, and service availability. For Acme Corp., fast copies are the primary incentive for using Cloud Volumes Service because they deliver point-in-time, near instant mountable replicas of the production database in the cloud, and it’s the power behind rapid feature delivery.
Oracle RAC is a shared-everything database configuration option that provides high availability (HA) with the added advantage of linear scaling. In a RAC configuration, the database server is no longer a single point of failure. Acme Corp. runs RAC in two-node clusters, protecting its operations from both unplanned and planned outages, as well as enabling many types of rolling software upgrades.
Oracle Data Guard is a high availability and disaster recovery solution that provides very fast automatic failover (referred to as fast-start failover) in case of database failures, node failures, corruption, and media failures. Data Guard performs log shipping in both a synchronous (zero data loss) and an asynchronous manner, providing multiple copies (a primary and one or more standbys) of Acme Corp.’s most valuable resource: its customer data. Because zero downtime is the goal, Data Guard is the plan B.
In general, Oracle does not support RAC configurations on third-party clouds, such as AWS. That said, Acme Corp. maintains that database high availability in the cloud is critical to the success of the business. The service must not go down, which means that the database must remain online, and that makes RAC an absolute requirement. To that end, Acme Corp. looks to VMware Cloud on AWS for availability.
The lack of multicast, in tandem with the complexities of shared storage environments, is among the primary reasons that Oracle does not support RAC in the cloud.
Oracle RAC uses network multicasting to maintain communication between cluster members; on VMware Cloud on AWS, multicast support comes through the virtual network layer (NSX).
Regarding shared storage, the Oracle support policy states that:
“All storage products must be supported by the server (host) and storage vendors.” Specifically, this requires the server and storage vendors who provide and support the hardware to directly support all components of the storage stack.
VMware vSAN is a supported RAC storage solution because it underpins datastores that are accessible by all nodes in the ESXi cluster. Managed by Oracle ASM, the Virtual Machine Disks (VMDKs) are attached to and accessed by all members of a given RAC cluster. Every host provisioned to the VMware Cloud SDDC (Software Defined Data Center) comes equipped with 10.5TB (10,731GB) of raw capacity. Use this capacity first because it’s free.
But there are caveats:
1) Usable vs. raw capacity: The 10.5TB of raw capacity works out to just over 5.5TB of usable capacity, and that is with RAID5 Erasure Coding, which has the lowest capacity overhead of the available options.
2) Premature scale out: If more space is required than provided by the nodes in the ESXi cluster, at least one additional node is automatically allocated, which leads to wasted resources and overspend.
This is the situation at Acme Corp., where the storage capacity needs of the various RAC databases far exceed the storage capacity of the ESXi cluster. When storage is coupled with compute, Acme Corp. is left with either too little storage or too much compute.
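The premature scale-out caveat can be sketched with a little arithmetic. The example below is a minimal illustration, not a sizing tool: it assumes usable capacity grows linearly with host count at the 5,731GB-per-host figure this document uses in its cost calculation, and the 50,000GB requirement and four-host compute footprint are hypothetical values chosen only for the example.

```python
import math

# Per-host figures quoted in this document.
RAW_GB_PER_HOST = 10_731     # raw vSAN capacity per host
USABLE_GB_PER_HOST = 5_731   # usable capacity per host with RAID 5 erasure coding


def hosts_for_storage(required_gb: float,
                      usable_gb_per_host: float = USABLE_GB_PER_HOST) -> int:
    """Hosts needed to hold required_gb, assuming capacity scales linearly per host."""
    return math.ceil(required_gb / usable_gb_per_host)


def stranded_hosts(required_gb: float, compute_hosts: int) -> int:
    """Hosts added only for their disks, beyond what compute alone requires."""
    return max(0, hosts_for_storage(required_gb) - compute_hosts)


if __name__ == "__main__":
    required_gb = 50_000   # hypothetical database footprint
    compute_hosts = 4      # hypothetical host count needed for CPU and memory
    print("hosts needed for storage:", hosts_for_storage(required_gb))
    print("hosts bought only for storage:", stranded_hosts(required_gb, compute_hosts))
```

Under these assumptions, nine hosts are needed to hold the data even though four would cover compute, which is the "too much compute" outcome described above.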
Cloud Volumes Service
Although Oracle does not require certification of specific NFSv3 file servers, the vendor itself must be supported. NetApp is not just another run-of-the-mill storage vendor listed on the RAC technology matrix for Linux platforms; it has more than a decade of experience running the most demanding Oracle workloads on those platforms.
Then there are the value-added features:
Storage cost optimization: At $7.00 per hour per host, a host costs $5,110 per month, so vSAN capacity costs $0.89 per usable gigabyte per month ($5,110 / 5,731GB = $0.89).
By comparison, with Cloud Volumes Service, users pay $0.10, $0.20, or $0.30 per gigabyte, depending on the service level, and scale up or down as the application requires.
From 1TB to 100TB, users can provision capacity dynamically with Cloud Volumes Service, avoiding premature compute scale out and keeping control over cloud spend.
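The per-gigabyte comparison above can be reproduced in a few lines. This is a rough sketch under stated assumptions: the 730-hour month is inferred from the $5,110 figure ($5,110 / $7.00 per hour), the Cloud Volumes Service rates are treated as monthly rates so the comparison is like for like, and the 20,000GB capacity is a hypothetical value used only for illustration.

```python
# Figures quoted in this document.
HOST_HOURLY_USD = 7.00
USABLE_GB_PER_HOST = 5_731
CVS_RATES_USD_PER_GB = (0.10, 0.20, 0.30)

# Assumption: 730 hours per month, inferred from $5,110 / $7.00 per hour.
HOURS_PER_MONTH = 730


def vsan_cost_per_usable_gb(hourly_usd: float = HOST_HOURLY_USD,
                            usable_gb: float = USABLE_GB_PER_HOST,
                            hours: float = HOURS_PER_MONTH) -> float:
    """Monthly host price divided by the usable capacity that host provides."""
    return hourly_usd * hours / usable_gb


def monthly_storage_cost(capacity_gb: float, rate_per_gb: float) -> float:
    """Monthly cost of capacity_gb at a flat per-gigabyte rate."""
    return capacity_gb * rate_per_gb


if __name__ == "__main__":
    capacity_gb = 20_000  # hypothetical capacity requirement
    vsan_rate = vsan_cost_per_usable_gb()
    print(f"vSAN: ${vsan_rate:.2f}/GB, "
          f"${monthly_storage_cost(capacity_gb, vsan_rate):,.0f}/month")
    for rate in CVS_RATES_USD_PER_GB:
        print(f"Cloud Volumes Service: ${rate:.2f}/GB, "
              f"${monthly_storage_cost(capacity_gb, rate):,.0f}/month")
```

The vSAN line is an effective rate only; in practice capacity is bought in whole hosts, so the real bill moves in host-sized steps rather than per gigabyte.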
Fast Copy: On vSAN, an RMAN duplicate takes up to 15 hours. Cloud Volumes Service read/writable volume copies, by contrast, are accessible seconds after instantiation, orders of magnitude faster than an RMAN duplicate from backup. Fast copies may be created on demand, opening the doors to innovation for companies like Acme Corp., which rely upon clones for feature development, as well as customer preview. With Fast Copy, clones may be spun up or torn down as quickly as desired.
The above represents a two-region database configuration in the cloud with prod and DR operating out of alternate coasts. Region A and region B both contain production and disaster recovery environments. The following workflow is equally true for A-B as it is for B-A:
1. Oracle Application Servers placed across availability zones natively communicate with VMware Cloud on AWS by way of VMware Cloud ENI route table entries.
2. VMware Compute Edge Gateway maintains the route table for all NSX Logical Networks by way of a Cross Account Identity and Access Management Role.
3. NetApp Cloud Volumes Service provides virtual interfaces for connectivity to Cloud Volumes Service. These virtual interfaces must be accepted at the SDDC before they can be used.
4. Two-node production RAC databases store datafiles, logfiles, and configuration files within shared VMDK files placed within datastores atop vSAN storage devices. Anti-affinity rules prevent the members of a RAC cluster from running on the same physical host.
5. Oracle Data Guard maintains four standby RAC databases: two local databases and two placed in a remote region for the purpose of DR, reached by way of the AWS Transit Gateway.
6. On a weekly basis, two clones are created from the Cloud Volumes Service-backed standby database. One clone is for the preview environment, while the second clone serves the needs of QA and engineering (bug fixes and performance tuning).
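The anti-affinity requirement in step 4 comes down to a simple invariant: no physical host may run more than one member of the same RAC cluster. The sketch below checks that invariant against a placement map. It is illustrative only and calls no VMware API; the function name, the VM names, and the host names are all hypothetical.

```python
from collections import defaultdict


def anti_affinity_violations(placement, clusters):
    """Return (cluster, host, vms) tuples where one host runs 2+ members of a cluster.

    placement: dict mapping VM name -> physical host name
    clusters:  dict mapping cluster name -> list of member VM names
    """
    violations = []
    for cluster, members in sorted(clusters.items()):
        vms_by_host = defaultdict(list)
        for vm in members:
            vms_by_host[placement[vm]].append(vm)
        for host, vms in sorted(vms_by_host.items()):
            if len(vms) > 1:
                violations.append((cluster, host, sorted(vms)))
    return violations


if __name__ == "__main__":
    # Hypothetical two-node RAC cluster and host names.
    clusters = {"rac-prod": ["rac-prod-n1", "rac-prod-n2"]}
    good = {"rac-prod-n1": "esx-a", "rac-prod-n2": "esx-b"}
    bad = {"rac-prod-n1": "esx-a", "rac-prod-n2": "esx-a"}
    print(anti_affinity_violations(good, clusters))
    print(anti_affinity_violations(bad, clusters))
```

An empty result means that a single host failure can take down at most one member of any cluster, which is the property that keeps a two-node RAC database online.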
Virtual machines running inside a traditional SDDC are protected against host failure by vSphere HA, and virtual machines deployed in a stretched cluster are protected against availability zone failure. In a future reference architecture, expect to see designs based on this still-higher level of availability, which is made possible by VMware Cloud on AWS and Cloud Volumes Service.
Since recurring revenue is the pursuit of all *aaS companies, catching and keeping customers must be the goal of every executive. To that end, both differentiation through value-added features and a predictable always-on service are a must.
With Cloud Volumes Service for AWS, coupled with VMware Cloud and Oracle RAC, you can be a cloud hero at your organization and meet the needs of the business. With Oracle RAC, made possible by VMware Cloud on AWS, you can achieve an always-on database in the cloud and an always-on service. Oracle Data Guard is plan B in the unlikely event of failure.
Fast Copy on Cloud Volumes Service enables rapid innovation, allowing users to get the best features to market quickly. Why wait 15 hours for database clones when you can wait seconds? The all-too-frequent coupling of storage and compute leads to unnecessary expenditures, since companies often acquire compute power for the sake of storage. NetApp undoes the damage. By decoupling storage and compute with Cloud Volumes Service, users free up capital to focus on product differentiation.
Request a demo from a NetApp cloud specialist today.