Personalizing Healthcare Through Faster Genome-Sequence Analysis

Written by Jeff Whitaker, Cloud Data Services | Sep 13, 2018 8:21:23 AM

With the advent of human-genome sequencing, healthcare providers are looking for more ways to personalize the care they administer to patients. If providers can get patient sequence results quickly, analyze the data, and then provide a custom care plan—or even custom medication—that’s targeted to each individual, it increases the chances of successful treatment.

Until recently, analyzing massive amounts of data and getting the results quickly was an extreme challenge for companies. I’m talking about running analyses on demand and expecting a back end with enough horsepower to keep up with ever-growing compute requirements. That takes massive processing power at your fingertips, whenever you need it.

Large-scale cloud providers, with their vast compute capacity, have opened the door to these big data analyses. But genome-sequence analysis has an added challenge: the analysis engines, that is, the algorithms themselves, are inherently proprietary. A sequence analysis company’s intellectual property lives in these algorithms, and each on-demand run needs massive compute, effectively requiring the scale of the cloud. The remaining challenge is feeding the algorithms the massive volumes of file data that the genome sequencers produce.

As with most workloads, we want them to run faster, right? Sequence analysis is no different, but because of the nature of these workloads, data access becomes the bottleneck to getting results. In some extreme cases, the algorithms are so complex and the data pipe so slow that the runs can’t complete at all.

Often, to get that data to the cloud, you have to build a file server to act as the cloud target that the algorithm operates on. So now you need to build an environment to run your workload, and you also have to build a file server (likely several file servers) to support the compute environments. Is all that effort what you signed up for?

Enter NetApp. We have a long history of expertise in managing file datasets, including very large ones. We recently rolled out a new cloud-native file service called NetApp® Cloud Volumes, which meets your file storage needs when you run your applications and workloads in the cloud. The service gives you multiprotocol file access, with both SMB and NFS access to your datasets. And we designed it from the ground up to deliver the low-latency, high-throughput performance that your demanding applications, or algorithms, require.

Let me support these statements with a real-world example. One of our customers, a leading genome-sequence analysis organization, has started moving its datasets from a hand-built environment of multiple file servers running on cloud compute and storage to NetApp Cloud Volumes. This customer’s analysis capacity has grown from 48,000 parallel sequences to more than 1 million. That’s more than 20 times the parallel capacity! With its new parallel sequence capabilities, this customer has also been able to run a highly complex algorithm in the cloud that was previously impossible to run.
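For reference, here is the arithmetic behind the "more than 20 times" figure, using only the two parallel-sequence counts quoted above (the new count is a lower bound, so the ratio is too):

$$\frac{\text{new parallel sequences}}{\text{previous parallel sequences}} \geq \frac{1{,}000{,}000}{48{,}000} \approx 20.8$$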

Choose Your Performance Level

Cloud Volumes offers multiple performance levels, so you can adjust dynamically as your needs change. You can go all the way up to an extreme tier with very high throughput and very low latency, delivering performance on par with dedicated on-premises file systems. For your demanding analysis algorithms, you can switch to a higher performance level for the duration of a run, then move back to a lower level afterward.

Quickly and Securely Synchronize Data

When it comes to using cloud compute for your runs, you need to get your data from the sequencers to a cloud storage location. Cloud Volumes is tightly integrated with NetApp Cloud Sync, a cloud-native service that syncs data between on-premises and cloud environments, or even between clouds. With Cloud Sync, you can rapidly and securely migrate data (a one-time synchronization) or continuously sync data into your cloud volume (and back).

Share Data Access to Reduce Costs and Errors

If you run multiple algorithms, or multiple runs on the same data while it’s in the cloud, Cloud Volumes gives them shared access to a single dataset. Shared access accelerates your workflow by eliminating time-consuming data copy processes. And because you don’t need multiple copies of your data, you also lower your costs and reduce the risk of errors from inconsistent or incorrect copies.

See for Yourself How Much Performance You Can Gain

Cloud Volumes is also a simple-to-use cloud service. So if you’re interested in finding out whether your environment can achieve the same performance gains as other Cloud Volumes customers, give it a try.

NetApp Cloud Volumes is based on industry-leading NetApp ONTAP® technology. You get both NFS and SMB support that’s enhanced with extreme performance and advanced data management capabilities. Cloud Volumes is currently available for Amazon Web Services (AWS) and in private preview for Google Cloud Platform.

Are you ready to try it out and see what kind of performance gains you can achieve? Register for access to NetApp Cloud Volumes Service for AWS or for preview access to NetApp Cloud Volumes Service for Google Cloud Platform.
