Amazon Web Services (AWS) just announced a new integration between AWS Lambda and Amazon Elastic File System (Amazon EFS). I had a chance to explore some of the features ahead of the release date, and in this post, I’ll explain what this means for the future of AWS Lambda.
A definition of terms
AWS Lambda is serverless computation on AWS. The term “serverless” is often used synonymously with AWS Lambda, but vendors other than AWS also offer serverless functions, and Lambda is only the compute part of serverless on AWS. I won’t say much more about Lambda or serverless as a concept; suffice it to say it’s kind of a big deal.
EFS is distributed file storage. Because of its distributed design, Amazon EFS avoids the bottlenecks and constraints inherent to traditional file servers. Distributed data storage allows multithreaded applications, and applications that concurrently access data from multiple Amazon EC2 instances, to drive substantial levels of aggregate throughput and input/output operations per second (IOPS).
This doesn’t mean your Lambda has a memory now
So, if you can now add persistent storage to your Lambda functions, does that make them stateful? With a place to save files, you can have each run of a function affect subsequent runs, allowing iterators and accumulators across functions.
That’s a not-great idea.
AWS Lambda is designed as a stateless service for event-driven architectures, and you’re going to run into multiple issues trying to create a state machine or other stateful services:
- Lambda functions run “at least once” in response to an event, so you aren’t guaranteed to get steady accumulation from repeatedly calling a function.
- If your functions must “chain” to work correctly, a single failure can break the whole sequence, and suddenly your robust service is fragile.
- For complex state machines, other tools like AWS Step Functions, EC2 instances, or containers make more sense.
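To make the “at least once” point concrete, here is a minimal sketch of one way a function could tolerate duplicate deliveries instead of accumulating state blindly. It assumes the file system is mounted at the hypothetical path /mnt/data, that each event carries a unique, filename-safe "id" field (also hypothetical), and that exclusive file creation behaves atomically on the mount. It is an illustration, not a recommended design for every workload.

```python
import os

# Hypothetical mount path; the real path depends on your function configuration.
MARKER_DIR = "/mnt/data/markers"


def claim_event(marker_dir, event_id):
    """Return True only for the first caller that claims this event ID."""
    os.makedirs(marker_dir, exist_ok=True)
    marker_path = os.path.join(marker_dir, f"{event_id}.done")
    try:
        # O_CREAT | O_EXCL fails if the marker already exists, so only
        # one invocation can create it.
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def handler(event, context=None, marker_dir=MARKER_DIR):
    """Process an event once, even if Lambda delivers it more than once."""
    event_id = event["id"]  # assumed unique and safe to use as a file name
    if not claim_event(marker_dir, event_id):
        return {"status": "duplicate"}
    # ... the real work for this event would go here ...
    return {"status": "processed"}
```

Note the trade-off in this sketch: the marker is written before the work runs, so a failure after the claim means a retry is skipped. Writing the marker after the work inverts that trade-off and allows occasional duplicate processing.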
But there’s a lot you can do now
The use of serverless compute for machine learning is an obvious use case for this new feature: using EFS as a datastore, you can have Lambda functions asynchronously train to build a model.
Further, you can use EFS as a source for reference files as needed when running Lambda functions, e.g., for more detailed recognition tasks.
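As a rough illustration of the reference-file pattern, the sketch below reads a JSON lookup table from the mounted file system and keeps it in memory between invocations of a warm function. The path /mnt/reference/labels.json, the "key" event field, and the JSON format are all assumptions made for the example.

```python
import json
import os

# Hypothetical location of a reference file on the mounted file system.
REFERENCE_PATH = "/mnt/reference/labels.json"

# Module-level cache: survives across invocations while the function stays warm.
_CACHE = {}


def load_reference(path):
    """Load a JSON reference file, re-reading it only when its mtime changes."""
    mtime = os.path.getmtime(path)
    cached = _CACHE.get(path)
    if cached is None or cached[0] != mtime:
        with open(path, "r", encoding="utf-8") as f:
            _CACHE[path] = (mtime, json.load(f))
    return _CACHE[path][1]


def handler(event, context=None, reference_path=REFERENCE_PATH):
    """Look up the event's key in the shared reference data."""
    labels = load_reference(reference_path)
    return {"label": labels.get(event["key"], "unknown")}
```

Caching on the modification time keeps repeated invocations from re-reading a large file, while still picking up a new version when another process replaces it.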
For tasks like virus scanning of .zip files, Lambda developers will finally have a place to put a large number of files, all of which need to be scanned individually.
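A sketch of how the .zip case might look: unpack the archive into a scratch directory on the mounted file system, then hand each extracted file to a scanner. The scan_file callable is a placeholder for whatever scanning tool you use, and the directory layout is an assumption.

```python
import os
import zipfile


def extract_for_scanning(zip_path, scratch_dir):
    """Unpack an archive into scratch_dir and return the extracted file paths."""
    os.makedirs(scratch_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(scratch_dir)
    extracted = []
    for root, _dirs, files in os.walk(scratch_dir):
        for name in files:
            extracted.append(os.path.join(root, name))
    return sorted(extracted)


def scan_archive(zip_path, scratch_dir, scan_file):
    """Run scan_file (a hypothetical scanner callable) on every extracted file."""
    return {
        path: scan_file(path)
        for path in extract_for_scanning(zip_path, scratch_dir)
    }
```

In a real function, scratch_dir would live under the EFS mount path so that archives larger than the function's local temporary storage can still be unpacked.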
You must configure Amazon EFS to run in a virtual private cloud (VPC)
EFS supports 25,000 simultaneous connections
This limit counts both Lambda functions and EC2 instances connected to the same file system. The simple way to prevent problems is to limit the maximum concurrency of functions that have access to the file system, although the ceiling is high enough that most workloads will probably never reach it.
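One way to apply such a cap is Lambda's reserved concurrency setting. The sketch below wraps the boto3 put_function_concurrency call; the function name and the limit are placeholders, and the client is passed in so the helper can be exercised without AWS credentials.

```python
def cap_concurrency(lambda_client, function_name, max_concurrent):
    """Reserve (and thereby cap) concurrent executions for one function.

    lambda_client is expected to be a boto3 Lambda client, for example
    boto3.client("lambda").
    """
    if max_concurrent < 0:
        raise ValueError("max_concurrent must be zero or greater")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_concurrent,
    )
```

With boto3 installed and credentials configured, a call might look like cap_concurrency(boto3.client("lambda"), "my-efs-function", 100), where "my-efs-function" is a made-up name. Keep in mind that reserved concurrency also sets aside that capacity from the account's shared pool.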
Understand security controls
With EFS, security is paramount, with multiple checks in place to make sure data access is authorized. You can find more info on using IAM authorization and access points with EFS in this post.
To connect a Lambda function to an EFS file system, you need:
- IAM permissions for the Lambda function to access the Virtual Private Cloud (VPC) and mount the EFS file system
- Network visibility, including VPC routing/peering and security groups
Configuration can further limit access:
- An EFS access point can limit access to a specific file path
- File system security (user ID, group ID, permissions) can limit read, write, or execute access for each file or directory mounted by a Lambda function
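Once the permissions and networking above are in place, attaching the file system is a function configuration change. The sketch below uses the boto3 update_function_configuration call with a FileSystemConfigs entry that points at an EFS access point. The ARN and function name are placeholders, and the path check reflects my understanding that the local mount path must sit directly under /mnt/.

```python
import re


def build_file_system_config(access_point_arn, local_mount_path):
    """Build one FileSystemConfigs entry for a Lambda function."""
    # Assumption: Lambda expects the mount path to be of the form /mnt/<name>.
    if not re.fullmatch(r"/mnt/[A-Za-z0-9._-]+", local_mount_path):
        raise ValueError("local_mount_path must look like /mnt/<name>")
    return {"Arn": access_point_arn, "LocalMountPath": local_mount_path}


def attach_file_system(lambda_client, function_name, access_point_arn,
                       local_mount_path="/mnt/data"):
    """Mount an EFS access point into a function.

    lambda_client is expected to be a boto3 Lambda client.
    """
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        FileSystemConfigs=[
            build_file_system_config(access_point_arn, local_mount_path)
        ],
    )
```

Mounting through an access point rather than the file system root is what lets the access point's path and user settings limit what the function can see.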
How New Relic lets you monitor this cool new thing
At launch, Amazon CloudWatch will include some key metrics about how your EFS-Lambda connection is performing, including:
- Burst rate (if you have burst credits)
- Burst credit balance
- % IO limit (the amount of IOPS consumed on your file system relative to the IOPS limit)
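If you want to pull these numbers yourself, the sketch below reads the most recent average of an EFS metric through the boto3 CloudWatch client. It assumes the metrics live in the AWS/EFS namespace under names such as BurstCreditBalance and PercentIOLimit, keyed by a FileSystemId dimension; the file system ID in any call is a placeholder.

```python
from datetime import datetime, timedelta, timezone


def latest_efs_metric(cloudwatch_client, file_system_id, metric_name, minutes=15):
    """Return the newest one-minute average for an EFS metric, or None.

    cloudwatch_client is expected to be a boto3 CloudWatch client.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/EFS",
        MetricName=metric_name,  # e.g. "BurstCreditBalance" or "PercentIOLimit"
        Dimensions=[{"Name": "FileSystemId", "Value": file_system_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to arrive in time order, so sort first.
    datapoints = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return datapoints[-1]["Average"] if datapoints else None
```

A return value of None simply means no datapoints were reported in the requested window.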
New Relic’s Infrastructure monitoring tool gathers CloudWatch metrics, and we’ll soon add these stats to Lambda functions using EFS.
Sign up for the New Relic users Slack channel to get updates and discuss the cutting edge of serverless infrastructure.