The Occasional Chaos of AWS Lambda Runtime Performance
From the documented worst case to unpredictable speed demons
In non-Serverless systems, we can use the baseline performance of a system to make predictions about how that system will behave in the future. Absent changes to the underlying infrastructure and external dependencies, that baseline performance doesn't change quickly, or drastically. We can take a few measurements over a relatively short period of time, and use that information to start predicting the future.
Surely we can do the same thing in a Serverless world, right? In this article, we'll discover that AWS Lambda performance can be very difficult to predict, especially for lower-memory Lambdas. Establishing performance baselines using just a few invocations of a Lambda (as we've seen in many articles about Serverless performance) is simply not sufficient to predict the behavior of the Lambda over a longer period of time.
Before we dive in too far, let's review the basics of AWS Lambda configuration. Control over the runtime performance of a Lambda boils down to a single dial: memory. Ranging from 128MB to 1.5GB, that setting affects not only the memory available to the Lambda, but also scales the CPU, network, and disk performance proportionally. Therefore, we expect that a Lambda configured for 256MB of memory would be twice as powerful as a 128MB Lambda, and a 1GB Lambda should be twice as powerful as a 512MB Lambda.
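To make that expectation concrete, here's a small sketch (not part of the benchmark itself) that computes the durations we'd predict at each memory setting if performance really does scale linearly. The baseline duration is an assumed placeholder, not a measured value.

```java
// Hypothetical illustration of linear scaling: if CPU share is proportional
// to memory, a task that takes `baselineMs` at 128MB should take
// 128/m of that time at m MB.
public class ScalingExpectation {

    static double expectedDurationMs(double baselineMs, int memoryMb) {
        return baselineMs * 128.0 / memoryMb;
    }

    public static void main(String[] args) {
        double baselineMs = 180_000; // assumed 128MB duration (~3 minutes)
        for (int mb : new int[] {128, 256, 512, 768, 1024, 1280, 1536}) {
            System.out.printf("%4d MB -> expected %.0f ms%n",
                    mb, expectedDurationMs(baselineMs, mb));
        }
    }
}
```

Under this model the 1536MB Lambda should finish in one twelfth the time of the 128MB Lambda, every time. Whether reality matches is exactly what we're about to test.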
However, is that actually true? Does Lambda runtime performance scale proportionally to the memory setting? If not, how does the real-world performance differ from the expectations set out in the documentation?
Let's find out.
To answer our questions about Lambda CPU performance scaling, we'll use a simple methodology. We've built a basic AWS Lambda function using Java 8, which executes a recursive Fibonacci algorithm. The Fibonacci parameters and the number of iterations have been chosen to ensure that the code can execute within the lifetime (aka, the configured timeout) of a Lambda, with any of the available memory settings.
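The workload looks roughly like the following sketch. This is not Symphonia's actual FibonacciLambda code (that's linked below); it's a minimal stand-in showing the shape of a deliberately CPU-bound benchmark, with illustrative parameters. In the deployed version, `run` would be invoked from an AWS Lambda handler (e.g. a class implementing `RequestHandler` from aws-lambda-java-core).

```java
// A minimal sketch of a CPU-bound Fibonacci workload, not the article's
// deployed code. Naive recursion is exponential-time, which is the point:
// it burns CPU in a way that's easy to reproduce across memory settings.
public class FibonacciWorkload {

    static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    // Hypothetical handler body: repeat the computation `iterations` times
    // so the total duration comfortably exceeds measurement noise.
    public static long run(int n, int iterations) {
        long result = 0;
        for (int i = 0; i < iterations; i++) {
            result = fib(n);
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = run(30, 10); // parameters are illustrative only
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("fib(30) = " + result + " in " + elapsedMs + " ms");
    }
}
```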
The code for our Lambda function can be found on Symphonia's GitHub page. The same code, FibonacciLambda, is deployed as seven separate Lambda functions (in AWS’ us-west-2 region), each configured with a different memory setting. The seven memory settings we're using are 128MB, 256MB, 512MB, 768MB, 1024MB, 1280MB, and 1536MB — there are more available, but these represent the minimum, maximum, and a range of intermediate values. If performance scales according to the memory setting, the chosen values should show a nice spread of performance characteristics.
To invoke these Lambda functions, we've set up CloudWatch Events triggers — Lambda's equivalent of cron. Those triggers invoke each Lambda function once every four minutes. Four minutes is long enough that even the 128MB Lambda should have plenty of time to completely execute the Fibonacci algorithm, so we should only have a single execution of a Lambda running at a given time. It is also frequent enough that most invocations of the Lambda functions will be “warm”, meaning that they will take place on an already-started container instead of waiting for one to be started from scratch (a “cold start”).
At a high level, the only real performance measure that we need to keep track of is the duration of each Lambda execution. Because the algorithm is the same for each Lambda function, we should see the higher-memory Lambdas executing in less time. Function duration is one of the CloudWatch metrics enabled by default for all Lambdas, so we can easily monitor it without any additional configuration.
Show me the numbers!
Without further ado, here are the Lambda execution durations over the course of a 48-hour period (720 invocations per Lambda).
This is not the graph we expected to see! Instead of a neat set of seven distinct, non-overlapping lines spaced out linearly, we see something much messier. It's not exactly chaos, but it is unexpected. At times, the 128MB Lambda executes the Fibonacci algorithm just as quickly as the 1536MB Lambda. At other times, it takes about 12 times as long, which is exactly proportional to the difference in memory settings. The other lower-memory Lambdas exhibit the same inconsistent behavior.
What we find after some analysis is that the documented performance scaling generally represents the worst-case scenario for a given memory setting. So, while a 128MB Lambda might often execute in the same amount of time as its 1536MB cousin, its worst-case execution time will be about 12 times slower, proportional to the difference in memory settings.
These dramatic performance changes aren't necessarily consistent between Lambda memory settings. For example, at a time when a 128MB Lambda might be exhibiting worst case performance, a 512MB Lambda might be performing better than expected. In some cases lower-memory Lambdas may even execute faster than the 1536MB Lambda.
With so much inconsistency, can we set any bounds on Lambda performance? It's difficult, and affected by the instantaneous state of a platform that we can't introspect, but we'll give it a try anyway. Let's look at the data in a slightly different way, sorted by execution durations:
From this chart we can see that while the shorter execution times (on the left-hand side of the graph) are somewhat muddled, the highest-memory Lambda is usually the fastest. And, across the longer executions, the rest of the Lambdas rank according to their memory settings, as expected.
So, “does Lambda performance scale proportionally to the memory setting?” Yes. In fact, AWS Lambda performance scales at least proportionally to the memory setting. When the platform has the resources available, our Lambdas might perform better than expected. However, that inconsistency can also throw any attempts at benchmarking or performance tuning into chaos.
Consistency and Predictability
With those previous bounds in mind, we see that what we're paying for isn't just pure performance, it's also consistency and predictability. Our 1536MB Lambda executes quickly, with little performance variation between invocations. At the other end of the spectrum, our 128MB Lambda could take anywhere from 16 seconds to 3 minutes to execute the same algorithm.
Why is there such a wide range of performance for lower-memory Lambdas? It's clear from our measurements that the AWS Lambda platform will make more CPU power available to lower-memory Lambdas, but only when those CPU resources are available. When they're not, the platform falls back to the documented behavior. It's important to understand that AWS can change the behavior of the platform at any time, and we should never count on better performance than what is documented.
Take it away…
Armed with this knowledge, we can make better decisions about how to configure Lambdas. If our code has modest resource requirements, and can tolerate large changes in performance, then it makes sense to start with the least amount of memory necessary. On the other hand, if consistency is important, the best way to achieve that is by cranking the memory setting all the way up to 1536MB.
It's also worth noting here that CPU-bound Lambdas may be cheaper to run over time with a higher memory setting, as Jim Conning describes in his article, “AWS Lambda: Faster is Cheaper”. In our tests, we haven't seen conclusive evidence of that behavior, but much more data is required to draw any strong conclusions.
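The intuition behind "faster is cheaper" can be sketched with Lambda's billing model: cost is proportional to memory multiplied by billed duration (rounded up to 100ms increments). The per-GB-second price below is an assumption based on AWS's published rate at the time of writing, and the durations are illustrative, not measured.

```java
// Back-of-the-envelope Lambda cost comparison, assuming a per-GB-second
// price of $0.00001667 and billing rounded up to 100ms increments.
// All numbers here are illustrative assumptions, not measured data.
public class CostSketch {

    static final double PRICE_PER_GB_SECOND = 0.00001667; // assumed rate

    static double invocationCost(int memoryMb, long durationMs) {
        long billedMs = ((durationMs + 99) / 100) * 100; // round up to 100ms
        return (memoryMb / 1024.0) * (billedMs / 1000.0) * PRICE_PER_GB_SECOND;
    }

    public static void main(String[] args) {
        // If a 1536MB Lambda really were 12x faster than a 128MB one,
        // the per-invocation cost would be identical -- it just finishes sooner.
        System.out.printf("128MB, 180s: $%.6f%n", invocationCost(128, 180_000));
        System.out.printf("1536MB, 15s: $%.6f%n", invocationCost(1536, 15_000));
    }
}
```

The interesting case is when higher memory speeds things up *more* than proportionally, or when rounding to 100ms increments dominates short invocations; that's where the higher setting can come out cheaper.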
The other lesson learned is that Lambda benchmarks should be gathered over the course of days, not hours or minutes, in order to provide actionable information. Otherwise, it's possible to see very impressive performance from a Lambda that might later dramatically change for the worse, and any decisions made based on that information will be rendered useless.