BlueXP Blog

SRE vs DevOps: Using Both with NetApp Cloud Volumes ONTAP

Written by Pavel Klushin, Cloud Evangelist | May 31, 2021 6:11:05 AM

Enterprise IT is accelerating its adoption of DevOps culture and, with it, the more agile, API-driven, fully automated deployment of cloud-based IT infrastructure components. As a result, practices like site reliability engineering (SRE) are increasingly being used to better harmonize development and operations activities.

This article takes a closer look at these concepts, and how NetApp’s Cloud Volumes ONTAP solution enables enterprise IT to achieve better DevOps and SRE results in practice.

Read on to find out more about SRE vs. DevOps.

What Are DevOps and SRE?

DevOps practices combine software development (Dev) and IT operations (Ops) with the aim of shortening the systems development lifecycle and providing continuous delivery of high-quality software that is well integrated with IT operations.

DevOps was born out of the need to bridge the gap between the teams leading software development activities and the IT operations teams tasked with implementing that software, often on a complex, shared IT infrastructure. Since then, DevOps has grown into something of a culture in many organizations, helped along by the increased adoption of cloud technologies.

Here are some of the key points about DevOps culture.

  • DevOps relies heavily on processes and methodologies (such as agile software development) and on unified tool sets that continuously cycle application development and deployment through Plan -> Code -> Build -> Test -> Deploy -> Operate -> Monitor -> Plan.
  • DevOps allows development teams to become more agile by implementing continuous integration and continuous delivery (CI/CD) of applications, reducing the time to market for new product launches.
  • Automation and monitoring are used extensively at each of these stages to eliminate the delays of manual intervention and to ensure consistent, repeatable, standardized processes. Infrastructure as Code (IaC) also lets enterprises keep their cloud infrastructure configuration consistent and reliable by defining it in code. This makes operations more repeatable and, because it removes manual steps and the human errors that come with them, also tends to improve site reliability.
  • Popular DevOps tools for achieving automation and agility include:
    • GitLab / GitHub: Used for storing version-controlled code repositories
    • Docker / Kubernetes / OpenShift: Packaging and deploying microservices-based apps as containers, and orchestrating and managing those containers
    • AWS / Azure / Google Cloud / Terraform: Infrastructure provisioning automation through support for Infrastructure as Code
    • Puppet / Chef / Ansible: Configuration management
    • Jenkins: Continuous integration and continuous testing
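The continuous Plan -> Code -> Build -> Test -> Deploy -> Operate -> Monitor -> Plan loop, with automated gates instead of manual hand-offs, can be pictured with the toy model below. It is a sketch, not any real CI/CD tool's API: the stage names come from the list above, while `next_stage`, `run_cycle`, and the idea of one pass/fail check per stage are illustrative assumptions.

```python
from typing import Callable, Dict, List

# Stage names come from the article; everything else is a toy model.
STAGES: List[str] = ["Plan", "Code", "Build", "Test", "Deploy", "Operate", "Monitor"]


def next_stage(stage: str) -> str:
    """Return the stage after `stage`; Monitor feeds back into Plan."""
    i = STAGES.index(stage)  # raises ValueError for an unknown stage
    return STAGES[(i + 1) % len(STAGES)]


def run_cycle(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run one pass of the loop, stopping at the first failed automated check.

    Stages without a registered check are treated as passing. Returns the
    stages that completed successfully.
    """
    completed: List[str] = []
    for stage in STAGES:
        check = checks.get(stage, lambda: True)
        if not check():
            break  # a failed gate stops the cycle without manual intervention
        completed.append(stage)
    return completed


if __name__ == "__main__":
    # Hypothetical example: the Test stage fails, so nothing is deployed.
    print(run_cycle({"Test": lambda: False}))  # ['Plan', 'Code', 'Build']
```

The point of the design is that a failing test stops the release automatically, which is how automation removes both the delay and the inconsistency of manual sign-offs.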

Site Reliability Engineering vs DevOps

As a concept, site reliability engineering (SRE) predates DevOps. As the name suggests, SRE is about ensuring the stability of the production environment. It was first implemented at Google, which created the site reliability engineer position by combining the skill sets of system administrators and software engineers into one role. Site reliability engineers are tasked with ensuring the stability and reliability of the production environment (which hosts Google’s public services such as Google Search, Google Ads, Gmail, and YouTube).

Which Is Better: SRE or DevOps?

Comparing SRE vs. DevOps isn’t very useful, since it’s not an either-or situation. Both SRE and DevOps aim for the same outcome: bridging the gap between development and operations teams to deliver services faster. DevOps can be considered the broader concept, defining the processes and tools used for this purpose, while SRE focuses on achieving it through dedicated site reliability engineers within an organization.

Can SRE and DevOps Coexist?

Organizations can combine SRE and DevOps ideologies, and in many cases doing so produces better results for the organization as a whole. Site reliability engineers can act as the glue between Development and Operations teams, ensuring that both sides adhere to the core DevOps processes. They also balance the delivery of new application features against site reliability and the stability of production deployments.
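One widely used way to strike this balance, taken from Google’s published SRE practice rather than from this article, is an error budget: an availability target (SLO) implies a fixed amount of tolerable downtime, and feature releases continue only while that budget is not used up. The sketch below assumes a 30-day window and a simple "stop shipping when the budget is spent" rule; both the function names and the rule are illustrative, and real policies vary.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO tolerates over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def can_ship_features(slo: float, downtime_minutes: float, window_days: int = 30) -> bool:
    """Hypothetical release gate: keep shipping features while budget remains."""
    return downtime_minutes < error_budget_minutes(slo, window_days)


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)  # 99.9% availability over 30 days
    print(f"Error budget: {budget:.1f} minutes")
    print("Ship features?", can_ship_features(0.999, downtime_minutes=12.0))
```

With a 99.9% SLO over 30 days, the budget works out to 43.2 minutes (0.1% of 43,200 minutes); once incidents consume it, the emphasis shifts from new features to reliability work.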

Optimizing Cloud DevOps and SRE with NetApp Cloud Volumes ONTAP

NetApp Cloud Volumes ONTAP is a cloud-native, DevOps-empowering, enterprise-class storage management solution that is available on all major public cloud platforms. Cloud Volumes ONTAP provides a highly performant, highly reliable, highly secure storage and data management solution that meets the needs of various enterprise applications in the cloud.

Cloud Volumes ONTAP capabilities such as native API access enable organizations to embrace DevOps culture in the cloud, while its high reliability, availability, and single-pane-of-glass monitoring enable site reliability engineers to meet their day-to-day goal of keeping their cloud infrastructure and the application stack running on it reliable.

Let's take a closer look at some of these capabilities to see how Cloud Volumes ONTAP can help, whether you use SRE, DevOps, or both.

Reliable DevOps Automation

Cloud Volumes ONTAP and the associated NetApp BlueXP Console platform are cloud-native solutions with a complete RESTful API that can be used for end-to-end automation of provisioning, management, and monitoring tasks, which are critical requirements for any organization that wants to truly embrace cloud DevOps.

  • Provisioning and Management: BlueXP Console’s RESTful API allows users such as site reliability engineers to fully automate the provisioning of new Cloud Volumes ONTAP instances and persistent storage volumes, either through Infrastructure as Code (IaC) tools or through other orchestration engines already in use in the customer’s environment.
  • Terraform Integration: With the BlueXP Console Terraform plugin, DevOps users can easily define these storage provisioning steps in IaC for repeatable deployments at scale, as part of a fully automated DevOps deployment workflow.
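What a declarative IaC tool such as Terraform contributes is a plan step: compare the desired state with the actual state and compute only the changes needed, so re-running the same definition is safe. The toy below illustrates that idea only; it is not the BlueXP API or the NetApp Terraform plugin, the volume names and sizes are hypothetical, and a real tool would issue API calls instead of editing a dictionary.

```python
from typing import Dict, List, Tuple

# Desired or actual state: volume name -> size in GiB (hypothetical example data).
VolumeMap = Dict[str, int]
Action = Tuple[str, str, int]


def plan(desired: VolumeMap, actual: VolumeMap) -> List[Action]:
    """Compute the actions that would make `actual` match `desired`.

    Names are processed in sorted order so the plan is deterministic.
    """
    actions: List[Action] = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name, desired[name]))
        elif name not in desired:
            actions.append(("delete", name, actual[name]))
        elif desired[name] != actual[name]:
            actions.append(("resize", name, desired[name]))
    return actions


def apply(actions: List[Action], actual: VolumeMap) -> VolumeMap:
    """Return the new state after applying `actions` (a stand-in for real API calls)."""
    state = dict(actual)
    for action, name, size in actions:
        if action == "delete":
            del state[name]
        else:  # "create" and "resize" both set the size
            state[name] = size
    return state


if __name__ == "__main__":
    desired = {"app_data": 500, "app_logs": 100}
    actual = {"app_data": 200, "old_scratch": 50}
    steps = plan(desired, actual)
    print(steps)
    new_state = apply(steps, actual)
    print(plan(desired, new_state))  # prints []: a second run changes nothing
```

The empty second plan is the property that makes IaC deployments repeatable at scale: the definition can be applied on every pipeline run without side effects.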

Zero-Capacity Data Cloning

One of the fundamental requirements of every application development cycle is the ability to test new code in various test environments. These test copies need to replicate, as exactly as possible, the production environment the code will be deployed to. This is a particularly important focus of site reliability engineering.

In many organizations, there is more than one such environment to reproduce, including a development environment, a QA environment (functional testing), integration test environments (end-to-end testing), and user acceptance test (UAT) environments. All of these environments need to be regularly refreshed with clones of production.

Creating and maintaining these multiple development and test environments as separate writable copies is a lengthy, complex, and resource-intensive task when performed manually, and it can directly delay application release timelines. Many organizations run hundreds of tests simultaneously and automatically, and each one needs a new, clean, writable data copy that is disposed of and refreshed after the test completes.
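The "fresh copy per test, disposed of afterwards" lifecycle maps naturally onto a context manager that guarantees cleanup. In this sketch, `ephemeral_clone` is a hypothetical helper, and the `create` and `delete` callables are placeholders for whatever API call or IaC step actually provisions and removes a clone in your environment; an in-memory set stands in for the storage system.

```python
import uuid
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_clone(create: Callable[[str], None],
                    delete: Callable[[str], None],
                    prefix: str = "test") -> Iterator[str]:
    """Provide a uniquely named clone for one test run, then always remove it.

    `create` and `delete` are hypothetical hooks for the real provisioning
    automation.
    """
    name = f"{prefix}-{uuid.uuid4().hex[:8]}"
    create(name)
    try:
        yield name
    finally:
        delete(name)  # runs even if the test raises, so no stale copies pile up


if __name__ == "__main__":
    live: set = set()  # in-memory stand-in for the storage system
    with ephemeral_clone(live.add, live.discard, prefix="qa") as clone:
        print("running tests against", clone, "| live clones:", len(live))
    print("live clones after cleanup:", len(live))
```

Putting the delete in a `finally` block is the key design choice: hundreds of parallel test runs can fail in arbitrary ways without leaving orphaned copies behind.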

Most organizations also struggle to automate such infrastructure cloning operations because of the duplicate data storage these copies require. That can be quite expensive, especially in the public cloud, where duplicate resources significantly increase cloud storage bills.

NetApp Cloud Volumes ONTAP avoids the cost and time of duplicating data by providing API-based, programmatic access to FlexClone® data cloning technology. NetApp FlexClone allows DevOps engineers or site reliability engineers to instantly create writable clones of data stored on a Cloud Volumes ONTAP volume without consuming additional storage space when the clone is created.
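The reason a writable clone can start at zero extra capacity is copy-on-write: the clone shares the parent’s existing blocks and only allocates new space for blocks it overwrites. The toy model below shows that idea only; the `Volume` and `Clone` classes are hypothetical and say nothing about how ONTAP actually lays out data, which is far more involved.

```python
from typing import Dict


class Volume:
    """Toy volume: a map of block number -> block contents."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks = blocks


class Clone:
    """Writable clone that shares its parent's blocks until they are overwritten.

    This models the copy-on-write idea only; it is not how ONTAP stores data.
    """

    def __init__(self, parent: Volume) -> None:
        # Freeze the parent's block map at clone time, like a snapshot. The dict
        # copy holds references to the same immutable bytes objects, so no block
        # data is duplicated.
        self._base: Dict[int, bytes] = dict(parent.blocks)
        self._own: Dict[int, bytes] = {}  # blocks written to this clone only

    def read(self, block: int) -> bytes:
        """Return the clone's own copy of a block if written, else the shared one."""
        return self._own.get(block, self._base.get(block, b""))

    def write(self, block: int, data: bytes) -> None:
        """New data goes to clone-private space; the parent is never modified."""
        self._own[block] = data

    def extra_blocks(self) -> int:
        """Additional blocks of space this clone consumes beyond the shared data."""
        return len(self._own)


if __name__ == "__main__":
    prod = Volume({0: b"header", 1: b"rows-a", 2: b"rows-b"})
    dev = Clone(prod)
    print("after clone:", dev.extra_blocks(), "extra blocks")
    dev.write(1, b"rows-a-v2")
    print("after one write:", dev.extra_blocks(), "extra block")
    print("prod unchanged:", prod.blocks[1] == b"rows-a")
```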

There are three key benefits to this technology:

  • Zero-capacity Costs: Each FlexClone volume can provide a copy of the same data for use as a development or testing environment, and only the new data written to each clone during development or testing consumes actual storage space in the cloud. That keeps the cost of dev/test environments to a minimum.
  • Instant Creation Process: FlexClone copies are based on NetApp Snapshot copies, which makes cloning instantaneous and can, in some cases, cut development time from months to just weeks.
  • Fully Automatable: The clone creation process is fully available through the RESTful API, so DevOps and site reliability engineers can build data cloning, via infrastructure as code, into an existing application release workflow as part of a DevOps CI/CD pipeline.
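The capacity difference between full copies and clones is simple arithmetic. The sketch below uses hypothetical inputs (a 1 TiB production dataset, the four environments named earlier, and 5% of the data rewritten per environment) and ignores metadata overhead and the Snapshot copy a clone is based on.

```python
def full_copy_gib(prod_gib: float, environments: int) -> float:
    """Extra capacity if every environment gets its own full physical copy."""
    return prod_gib * environments


def clone_gib(prod_gib: float, environments: int, change_rate: float) -> float:
    """Extra capacity if every environment is a space-efficient clone.

    Only data written to a clone consumes new space; `change_rate` is the
    fraction of the dataset each environment rewrites during its lifetime.
    """
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")
    return prod_gib * change_rate * environments


if __name__ == "__main__":
    # Hypothetical: 1 TiB of production data; dev, QA, integration, UAT; 5% churn.
    prod, envs, rate = 1024.0, 4, 0.05
    full = full_copy_gib(prod, envs)
    clones = clone_gib(prod, envs, rate)
    print(f"full copies: {full:.0f} GiB, clones: {clones:.1f} GiB, "
          f"saved: {1 - clones / full:.0%}")
```

With these assumed numbers, clones need about 205 GiB of extra capacity instead of 4,096 GiB; the real ratio depends entirely on how much each environment writes.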

Kubernetes deployments can also use the same cloning technology to clone persistent volumes for stateful workloads such as MySQL or Apache Cassandra databases.
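In Kubernetes, a clone is requested declaratively with a PersistentVolumeClaim whose `dataSource` points at an existing claim (CSI volume cloning). The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. It assumes a CSI driver for Cloud Volumes ONTAP that supports cloning is installed; the storage class `ontap-nas` and the claim names are hypothetical. Kubernetes expects the clone to be in the same namespace and storage class as the source and to request at least the source’s size.

```python
import json


def pvc_clone_manifest(clone_name: str, source_pvc: str, storage_class: str,
                       size: str, namespace: str = "default") -> dict:
    """Build a PersistentVolumeClaim that asks the CSI driver to clone `source_pvc`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": clone_name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,  # must match the source's class
            "dataSource": {"kind": "PersistentVolumeClaim", "name": source_pvc},
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},  # >= source size
        },
    }


if __name__ == "__main__":
    # Hypothetical names: clone a MySQL data claim for a test run.
    manifest = pvc_clone_manifest("mysql-data-test", "mysql-data", "ontap-nas", "100Gi")
    print(json.dumps(manifest, indent=2))  # e.g. pipe into `kubectl apply -f -`
```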

Read more about how FlexClone works in these customer success stories.

Data Persistence

Contrary to popular belief, data persistence is still a key requirement for containerized applications such as those running on Kubernetes. Many stateful workloads deployed in containers, such as databases, need persistent Kubernetes volumes backed by persistent storage volumes on the underlying storage system.

Cloud Volumes ONTAP enables DevOps and site reliability engineers to address this data persistence and other accessibility and data management needs easily for both classic and containerized applications within a single platform.

  • Portability and Durability: Kubernetes persistent volumes can access Cloud Volumes ONTAP data volumes over both file (NFS or SMB) and block (iSCSI) protocols. Because these volumes reside outside the pod itself, the data remains portable and durable across Kubernetes pods.
  • Efficient Backup: The built-in ONTAP Snapshot capability enables DevOps engineers to protect this data, such as databases and file shares, and even back it up to meet enterprise compliance requirements.
  • Multi-Zone High Availability: These persistent data volumes can also be made highly available across multiple availability zones within a cloud region using the Cloud Volumes ONTAP high availability (HA) configuration. This helps enterprise organizations meet their data availability and business continuity requirements. This level of availability isn’t natively offered by the cloud providers.
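Protecting data with Snapshot copies also means deciding how many to keep. In ONTAP this is configured through Snapshot policies rather than written by hand; the sketch below only illustrates the underlying "keep the newest N" retention logic, with a hypothetical hourly schedule and a hypothetical retention count of 24.

```python
from datetime import datetime, timedelta
from typing import List


def snapshots_to_delete(snapshots: List[datetime], keep: int) -> List[datetime]:
    """Return the snapshots that a 'keep the newest N' rule would prune, oldest first."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(snapshots, reverse=True)
    return sorted(newest_first[keep:])


if __name__ == "__main__":
    start = datetime(2021, 5, 1)
    hourly = [start + timedelta(hours=h) for h in range(30)]  # 30 hourly snapshots
    doomed = snapshots_to_delete(hourly, keep=24)
    print(len(doomed), "to delete; oldest kept:", min(set(hourly) - set(doomed)))
```

Compliance requirements usually translate into the retention count (or several counts for hourly, daily, and weekly copies), which is why it is worth making the rule explicit and testable.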

Find out more about Managing Stateful Applications in Kubernetes and Cloud Volumes ONTAP.

Storage Efficiency

Many cloud customers struggle to keep cloud resource costs from spiralling out of control. As a result, DevOps engineers now need to understand the various storage technologies available in the cloud and their consumption-based cost models.

Powered by ONTAP, Cloud Volumes ONTAP is designed with a number of storage efficiencies that reduce the total cost of ownership for DevOps workloads in the cloud, including thin provisioning, data compression and deduplication, and data tiering, which automatically moves infrequently used data from block storage to lower-cost object storage and back as needed.
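Data tiering can be pictured as a policy over block access times: blocks idle longer than a cooling period move to object storage, and a read brings a block back to the performance tier. The toy below is not ONTAP’s tiering engine; the `Block` class, the helper names, and the 30-day threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    block_id: int
    days_since_access: int = 0
    tier: str = "block"  # "block" = performance tier, "object" = capacity tier


def tier_cold_blocks(blocks: List[Block], cooling_days: int) -> int:
    """Move blocks idle for at least `cooling_days` to object storage; return the count."""
    moved = 0
    for b in blocks:
        if b.tier == "block" and b.days_since_access >= cooling_days:
            b.tier = "object"
            moved += 1
    return moved


def read(block: Block) -> bool:
    """Read a block; return True if it had to be brought back from object storage."""
    block.days_since_access = 0
    if block.tier == "object":
        block.tier = "block"
        return True
    return False


if __name__ == "__main__":
    blocks = [Block(0, days_since_access=2), Block(1, 45), Block(2, 90)]
    print("moved to object storage:", tier_cold_blocks(blocks, cooling_days=30))
    print("block 2 recalled on read:", read(blocks[2]))
```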

These storage efficiencies translate into up to 70% cloud storage cost savings for Cloud Volumes ONTAP customers. Read more on Cloud Volumes ONTAP’s storage efficiencies here.
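How much these efficiencies save depends on how compressible and how cold the data is. The toy cost model below combines a compression/deduplication ratio with a tiered fraction; every input, including the per-GiB prices, is a hypothetical placeholder rather than any provider’s actual rate, and thin provisioning is left out.

```python
def monthly_cost(logical_gib: float, efficiency_ratio: float, cold_fraction: float,
                 block_price: float, object_price: float) -> float:
    """Toy monthly cost after compression/deduplication and tiering.

    efficiency_ratio: logical / physical size after compression and dedup (>= 1).
    cold_fraction: share of physical data tiered to object storage (0..1).
    Prices are per GiB-month.
    """
    if efficiency_ratio < 1.0 or not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("need efficiency_ratio >= 1 and 0 <= cold_fraction <= 1")
    physical = logical_gib / efficiency_ratio
    hot = physical * (1.0 - cold_fraction)
    cold = physical * cold_fraction
    return hot * block_price + cold * object_price


def savings(logical_gib: float, efficiency_ratio: float, cold_fraction: float,
            block_price: float, object_price: float) -> float:
    """Fraction saved versus storing all logical data on block storage."""
    baseline = logical_gib * block_price
    cost = monthly_cost(logical_gib, efficiency_ratio, cold_fraction,
                        block_price, object_price)
    return 1.0 - cost / baseline


if __name__ == "__main__":
    # Hypothetical inputs: 10 TiB logical, 1.5:1 efficiency, half the data cold,
    # illustrative prices that are not any provider's actual rates.
    args = dict(logical_gib=10240, efficiency_ratio=1.5, cold_fraction=0.5,
                block_price=0.10, object_price=0.02)
    print(f"cost: ${monthly_cost(**args):.2f}/month, savings: {savings(**args):.0%}")
```

With these made-up inputs the model reports 60% savings; changing the efficiency ratio or the cold fraction moves the result substantially in either direction.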

Summary

Both DevOps and site reliability engineering enable organizations to shorten the application development and deployment lifecycle while improving quality and reliability. Whether you’re using SRE, DevOps, or a combination of both, Cloud Volumes ONTAP can help.

Cloud Volumes ONTAP empowers DevOps culture while also increasing the reliability of an organization’s applications and the underlying cloud infrastructure through its built-in data availability, data security, and data efficiency capabilities.