Serverless deployment on AWS is powered by AWS Lambda. Ephemeral storage is the default working storage that AWS provides for Lambda, and because it is local to the execution environment it is typically faster than the other storage options available to a function. AWS recently raised the maximum amount of ephemeral storage from a fixed 512 MB to a configurable 10 GB.
In this blog we’ll look at various storage options available for Lambda, why there was a need to increase the maximum Lambda ephemeral storage capacity, and some typical use cases where ephemeral storage should be used (and some where it should not).
AWS Lambda is the central piece of the serverless architecture solutions offered by AWS. It is an event-driven compute service that lets you run almost any type of code without provisioning servers or execution environments.
AWS offers Lambda as a fully managed compute service that takes care of all the infrastructure for you. It was originally designed for short-lived workflows such as handling object uploads (images, videos, files, etc.) to Amazon S3, processing sensor data from IoT devices, handling website clicks, updating DynamoDB tables, etc. Lambda functions are invoked in response to events and can be triggered by over 200 AWS services. Lambda is offered on a pay-per-use basis, meaning you only pay for the time Lambda spends executing your code. This makes it a great choice for applications that do not have consistent traffic and for performing infrequent or one-off tasks.
AWS Lambda provides different storage options that are able to meet different developers’ requirements. These options are:
Ephemeral storage
AWS Lambda provides a temporary file system accessible at /tmp in its execution environment. The maximum size for this temporary or ephemeral storage was capped in the past at 512 MB but has recently expanded to 10 GB.
Ephemeral storage is mostly used to support operations performed by your code and should not be used for any workload that requires data to persist. Although a single Lambda execution environment can sometimes be reused by multiple invocations to optimize performance, there is no guarantee that it will be. Also, the contents of /tmp are lost when an execution environment is shut down, and every new execution environment starts with an empty /tmp.
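As a minimal sketch of using /tmp purely as working space, the handler below does its intermediate work in a private directory that is removed before the invocation returns. The `scratch_root` parameter, the `payload` event field, and the file name are hypothetical and exist only for illustration.

```python
import os
import tempfile


def handler(event, context, scratch_root="/tmp"):
    """Do per-invocation work in a private directory under scratch_root."""
    # TemporaryDirectory deletes the directory and its contents on exit,
    # so a reused (warm) environment does not accumulate leftover files.
    with tempfile.TemporaryDirectory(dir=scratch_root) as workdir:
        path = os.path.join(workdir, "intermediate.txt")
        with open(path, "w", encoding="utf-8") as scratch:
            scratch.write(str(event.get("payload", "")))
        size = os.path.getsize(path)
    return {"bytes_written": size, "scratch_removed": not os.path.exists(workdir)}
```

Cleaning up explicitly matters because a warm environment keeps whatever earlier invocations left in /tmp, and that space counts against the configured limit.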
Amazon S3
Amazon S3 storage is an elastically scalable object storage service. It is very durable and offers high levels of reliability. This service is a perfect solution for storing unstructured data.
S3 provides native integration with AWS Lambda, allowing developers to invoke Lambda functions in response to S3 events, such as an object being created in or deleted from a bucket. S3 is also used heavily as intermediary storage in data lake operations supported by Lambda.
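To make the S3 trigger concrete, here is a small sketch of a handler that reads the bucket name and object key out of an S3 event notification. The function names are hypothetical. Object keys arrive URL-encoded in the notification, so the sketch decodes them before use.

```python
from urllib.parse import unquote_plus


def s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every S3 record in the event."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # not an S3 record; skip it
        bucket = s3["bucket"]["name"]
        # Keys are URL-encoded in notifications (spaces arrive as "+").
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    # A real function would fetch and process each object here.
    return [f"s3://{bucket}/{key}" for bucket, key in s3_objects(event)]
```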
Amazon EFS
Amazon EFS is a fully managed, shared, and elastic file system that can be integrated seamlessly with other AWS services. It is a durable, highly available storage option.
EFS file systems can be mounted in Lambda functions, making it simpler for developers to share data across invocations efficiently. Because EFS is elastic, it grows and shrinks as you add or delete data, meaning developers don’t need to manage storage limits themselves.
Lambda Layers
Very often the code in Lambda functions uses additional helper libraries. AWS allows you to bundle these libraries in the deployment archive or package them separately as a Lambda layer. Multiple Lambda functions can share the same layers. A single Lambda function can use up to five layers, and the layers count toward the function's deployment package size limits (50 MB zipped for direct upload, 250 MB unzipped).
Ephemeral storage in itself is not a new feature, but its expanded 10 GB capacity is. As discussed above, Lambda users previously had a maximum limit of 512 MB of ephemeral storage in Lambda. Let’s take a look at why AWS decided to expand this feature.
Anyone who has ever worked on AWS Lambda for tasks such as processing big files (images, videos, PDFs, etc.), ETL (extract, transform and load) jobs using large data sets, or creating and using machine learning models will know the pain of small runtime storage and performance issues on Lambda.
Tasks like these are data-intensive and need large amounts of temporary data. That data could be specific to a particular invocation (for example, large files created dynamically) or cached data (for example, machine learning models). Cached data can be reused by every invocation that runs in the same execution environment, which avoids fetching it again.
With the 512 MB limit of ephemeral storage, developers had to use Amazon S3 as intermediary storage or mount an Amazon EFS file share to Lambda as a workaround. This could fix the storage problem but degraded the performance of the application, as the data could not be cached locally and every function invocation had to read (often the same) data in parallel. It also made scaling harder and increased the complexity of the application, since there were more services to manage. And don’t forget that all of that would come at an added cost as well.
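The caching pattern described here can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed implementation: `cached_file` and its `fetch` callback are hypothetical names, and `fetch` stands in for whatever downloads the data (for example, an S3 download).

```python
import os
import tempfile
from typing import Callable


def cached_file(name: str, fetch: Callable[[str], None], cache_dir: str = "/tmp") -> str:
    """Return the path of `name` in cache_dir, calling fetch(path) only on a miss."""
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        return path  # warm environment: an earlier invocation already fetched it
    # Fetch into a temporary name first so a failed download never leaves
    # a partial file behind under the final name.
    fd, partial = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    try:
        fetch(partial)
        os.replace(partial, path)
    finally:
        if os.path.exists(partial):
            os.remove(partial)
    return path
```

Because a new execution environment always starts with an empty /tmp, the first invocation in each environment still pays the full download cost. The cache only helps invocations that land on an environment that is already warm.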
Taken together, this meant Lambda did not have sufficient local storage for data-intensive applications. This drove AWS to provide a larger amount of configurable ephemeral storage for Lambda users.
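Since the size is configurable per function, it can be set from code as well as from the console. The sketch below builds the arguments for boto3's `update_function_configuration` call, which accepts an `EphemeralStorage` size in MB between 512 and 10,240. The helper names are hypothetical, and the client is passed in so the example does not assume any particular credentials setup.

```python
# Lambda accepts ephemeral storage sizes from 512 MB to 10,240 MB (10 GB).
MIN_EPHEMERAL_MB = 512
MAX_EPHEMERAL_MB = 10240


def ephemeral_storage_params(function_name: str, size_mb: int) -> dict:
    """Build keyword arguments for update_function_configuration."""
    if not MIN_EPHEMERAL_MB <= size_mb <= MAX_EPHEMERAL_MB:
        raise ValueError(
            f"size_mb must be between {MIN_EPHEMERAL_MB} and {MAX_EPHEMERAL_MB}"
        )
    return {"FunctionName": function_name, "EphemeralStorage": {"Size": size_mb}}


def apply_ephemeral_storage(lambda_client, function_name: str, size_mb: int):
    """Apply the setting; lambda_client is expected to be boto3.client("lambda")."""
    params = ephemeral_storage_params(function_name, size_mb)
    return lambda_client.update_function_configuration(**params)
```

With boto3 installed and credentials configured, a call might look like `apply_ephemeral_storage(boto3.client("lambda"), "my-etl-function", 10240)`, where the function name is a placeholder.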
The large amount of configurable ephemeral storage for AWS Lambda is a great solution for the execution of applications that require large volumes of local storage. This can be critical when performing tasks such as downloading large data sets, unzipping files, creating and processing large files, and more. Some typical use cases that will benefit to a great extent from this are:
Extract-transform-load (ETL) jobs:
The much larger ephemeral storage now available in Lambda enables many complex and data-heavy ETL jobs. It’s now possible to perform more intermediate computations, download more resources, and complete complex processing locally, without repeated round trips to external storage.
Machine learning (ML) tasks:
Whether for creation or usage, many machine learning models depend on large reference data files such as models and libraries. Having more ephemeral storage in Lambda enables you to download, use, and create more and larger machine learning models.
Heavy data processing:
Many data-intensive workloads, such as those which download S3 objects to temporary storage on certain events, can now handle larger objects. They can also do that in a highly performant manner.
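A rough sketch of this download-then-process pattern is shown below. It assumes the client passed in is a boto3 S3 client, whose `download_file` method writes the object to a local file. The helper names are hypothetical, and line counting stands in for the real processing step.

```python
import os


def download_to_scratch(s3_client, bucket: str, key: str, scratch_dir: str = "/tmp") -> str:
    """Download s3://bucket/key into scratch_dir and return the local path."""
    local_path = os.path.join(scratch_dir, os.path.basename(key))
    # download_file writes the object to disk rather than returning it,
    # so the object does not have to fit in the function's memory.
    s3_client.download_file(bucket, key, local_path)
    return local_path


def count_lines(path: str) -> int:
    """Process the file as a stream, one line at a time."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)
```

Working from a file on disk means the size of the object is bounded by the configured ephemeral storage, not by the memory allocated to the function.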
Zip and unzip huge files:
Dealing with large zipped files is now easier and more efficient, as in the case of workflows that use large zip files to initialize various managed databases. Ephemeral storage allows those files to be unzipped to disk and used without having to hold their contents in memory.
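One way to unzip safely into ephemeral storage is to compare the archive's uncompressed size with the free space before extracting. The sketch below uses only the Python standard library. The function name is hypothetical.

```python
import os
import shutil
import zipfile


def extract_if_it_fits(zip_path: str, dest_dir: str = "/tmp") -> list[str]:
    """Extract zip_path into dest_dir only if the contents fit in the free space."""
    with zipfile.ZipFile(zip_path) as archive:
        # file_size is the uncompressed size recorded for each member.
        needed = sum(info.file_size for info in archive.infolist())
        free = shutil.disk_usage(dest_dir).free
        if needed > free:
            raise OSError(
                f"archive needs {needed} bytes but only {free} are free in {dest_dir}"
            )
        archive.extractall(dest_dir)
        return [os.path.join(dest_dir, name) for name in archive.namelist()]
```

The zip file itself also occupies space in /tmp if it was downloaded there, so the configured storage has to cover both the archive and its extracted contents.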
Processing images and video:
Image and video processing is a common and popular use case of AWS Lambda. Many applications perform image-based tasks that require huge amounts of data and often external libraries as well. Ephemeral storage provides an efficient place to hold that data while the tasks run.
While we’ve been praising the configurable 10 GB ephemeral storage, there are still scenarios where this is not an ideal solution, even though it provides cost-effective, high-performance storage.
The important thing to keep in mind here is that the ephemeral storage provided by AWS for Lambda is still a temporary file system that belongs to a single execution environment and cannot be shared across execution environments or concurrent invocations. Each execution environment can use up to the configured amount of storage (a maximum of 10 GB), but once that environment is shut down the storage disappears and its contents cannot be recovered. Since there is no guarantee that the next invocation will run in the same environment, data left in /tmp should be treated as a cache at best.
So, if your use case requires data to persist after execution (such as processed raw data, analytical reports, application-specific logs, etc.), it is better to use a durable storage solution, such as Amazon S3, Amazon EFS, or NetApp Cloud Volumes ONTAP.
AWS Lambda is a great tool to help developers run their code without getting tied down with managing servers and other execution dependencies. Until now Lambda was not an optimal choice for data-intensive applications and workflows, but the introduction of configurable ephemeral storage of up to 10 GB will significantly help to change that.
While this solves some challenges, you still need an efficient storage solution to share data between invocations and to keep persistent data. Cloud Volumes ONTAP provides an efficient and cost-effective solution for your enterprise-grade storage needs and also helps in optimizing your storage costs.
Cloud Volumes ONTAP leverages Amazon EBS block storage to provide highly efficient persistent storage with additional enterprise features that you won’t find natively on AWS. That includes shared file services, storage optimization features that can reduce costs by 70%, zero-RPO high availability, and multicloud usability. Lambda users can easily set up Lambda functions to take advantage of persistent storage with Cloud Volumes ONTAP.
Read more about how AWS users are taking advantage of Cloud Volumes ONTAP.
Ephemeral storage is the term used to refer to any storage that does not persist after the resources using it, such as a container, cease to exist. On AWS, ephemeral storage is used by several services, one of which is AWS Lambda.
AWS Lambda provides several options for storage. These options include object storage with Amazon Simple Storage Service (Amazon S3), file storage with Amazon Elastic File System (Amazon EFS), Lambda Layers, and Lambda ephemeral storage.
The ephemeral storage provided by AWS Lambda is a temporary file system that can now be as large as 10 GB. This is a huge increase over the previous storage limit, which was just 512 MB.
This non-persistent file system is useful mainly to support Lambda functions while they run. None of the data stored in ephemeral storage will persist once the function's execution environment is shut down. That makes it unsuitable for any use case that requires persistent storage.