Analyzing Cold Start latency of AWS Lambda
Very often when John and I start talking to people about AWS Lambda, especially in the context of Java, the first question they ask is “but what about cold starts?” The answer to this question is complicated. In this article I share some quantitative research I have performed on cold start latency with Lambda, mostly with JVM runtimes. I also uncover some unexpected behavior, including “cool starts” that occur when a Lambda function has gone uninvoked for a few minutes.
Cold starts occur when AWS creates new instances of a Lambda function to process events. They are most easily observed as increased latency when processing an event, compared with “warm” invocations of a Lambda function. In most production systems cold starts are infrequent.
For more background on cold starts in general, and how to mitigate them, especially with Java / the JVM, please see our O'Reilly book, Programming AWS Lambda with Java.
The first thing you need to know, however, is that if you're using Lambda for a production application then you probably don't need to care about cold starts. Cold starts noticeably impact applications that either (a) have very low throughput - e.g. a handful of invocations per day or fewer - and/or (b) have very little tolerance for occasional slower responses.
As an example - let's say your application is processing on average 10 requests per second, each request takes less than 100ms to process, and events occur at fairly regular intervals. In this case it's reasonable to expect that around 99.999% of your invocations will be warm, and unaffected by cold starts. Or, in other words, about 1 invocation in 100,000 will incur a cold start.
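As a rough sketch of that arithmetic - note that the number of cold starts per day here is a hypothetical assumption on my part, not a measured figure:

```python
# Back-of-envelope estimate of the fraction of invocations hitting a cold start.
requests_per_second = 10
seconds_per_day = 24 * 60 * 60
invocations_per_day = requests_per_second * seconds_per_day  # 864,000

# At 10 req/s and <100ms per request, roughly one warm instance can absorb the
# load, so assume on the order of ~10 cold starts per day as instances are
# recycled or the pool briefly scales. This is an assumption, not AWS-documented.
assumed_cold_starts_per_day = 10

cold_fraction = assumed_cold_starts_per_day / invocations_per_day
print(f"~{cold_fraction:.5%} of invocations are cold")  # on the order of 1 in 100,000
```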
But let's assume that cold starts are a concern for you. How slow are they? Will you need to modify your application to improve them? And what is the impact of such mitigations?
Over the course of May and June 2020 I ran a set of latency benchmarks. The code for this experiment is available on GitHub. In order to get some interesting data I configured my benchmarks across the following variables:
- Invocation type - cold start, “warm” invocation, and “cool” invocation (which I explain later)
- Runtime - Java 8, Java 11, and Python 3.8
- Memory Size (which also impacts CPU performance) - 512MB, 1.5GB and 3GB
- Size and type of function artifact - Small zip (1KB), large zip (48MB) and large uberjar (53MB)
- AWS Region (15 of them - see here for which ones)
Each latency test consists of one Lambda function querying another Lambda function in the same AWS region. The latency captured is the time from when the “querying” function calls the Lambda invoke SDK to when it gets a response. The “target” function simply logs a line to standard output, and then completes. While this latency includes aspects like network delays, our “warm” invocation effectively acts as a control. I'll clarify that in a moment.
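A minimal sketch of the timing side of the querying function might look like the following - the function names here are my own illustration, not the actual code from the repo, and in the real harness the callable would be an AWS SDK Invoke call against the target function:

```python
import time

def measure_latency_ms(invoke):
    """Time a single round trip of a zero-argument invoke callable.

    In the real harness, `invoke` would call the Lambda Invoke API via the
    AWS SDK; here it is any callable, so the timing logic can be shown in
    isolation.
    """
    start = time.perf_counter()
    invoke()  # synchronous request/response invocation of the target function
    return (time.perf_counter() - start) * 1000.0

# Stand-in for the SDK call: the target function just logs a line and returns.
def fake_target():
    time.sleep(0.01)  # simulate ~10ms of network delay plus execution time

latency = measure_latency_ms(fake_target)
print(f"round trip: {latency:.1f}ms")
```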
I run all the tests every 6 hours. This is frequent enough to smooth out some intraday differences, but infrequent enough to guarantee that our target Lambda functions undergo a cold start when invoked. For each combination I also have 3 different target functions, again to smooth out quirks.
Warm vs Cold
Let's look at a very simple view of the results. Here we compare cold start with warm start, averaged across all other configurations.
This data isn't super useful, for reasons that will become clearer through this article, but immediately we can see that cold starts, as expected, are significantly slower than warm invocations. Throughout my data I see that the “warm” invocation round trip latency is typically around 19ms - we can use this as our benchmark for one Lambda to call another in the same region.
In the example above we can see that the warm latency is < 5% of the cold latency, so it's reasonable to approximate the additional latency due to a cold start as the actual total “cold” number I report. In other words, subtracting the overhead of the test harness doesn't change the data significantly.
To clarify some statistics for this graph, and subsequent ones, I am aggregating data as follows:
- For each date, invocation type (cold vs warm), region, runtime, artifact size / type, and memory size combination I capture the approximate median latency (approx_percentile(latency, 0.5), in SQL terms)
- I then average this median latency across all the daily results for the combined data. In this particular graph that means I'm averaging across all regions, all memory size values, all artifact sizes, and the Java 8 and Java 11 runtimes.
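In code, that two-step aggregation looks roughly like the following sketch - using Python's exact median in place of SQL's approx_percentile, and made-up latency samples:

```python
from collections import defaultdict
from statistics import median, mean

# Hypothetical raw samples: (date, combination) -> latencies in ms. A
# "combination" bundles invocation type, region, runtime, artifact, and memory.
samples = {
    ("2020-06-01", "cold/us-east-1/java11/small/3GB"): [410, 395, 430, 900],
    ("2020-06-02", "cold/us-east-1/java11/small/3GB"): [405, 420, 415, 398],
}

# Step 1: per date + combination, take the median latency.
daily_medians = {key: median(vals) for key, vals in samples.items()}

# Step 2: average those daily medians across all dates for each combination.
by_combination = defaultdict(list)
for (date, combo), med in daily_medians.items():
    by_combination[combo].append(med)

averaged = {combo: mean(meds) for combo, meds in by_combination.items()}
print(averaged)
```

Note how the per-day median in step 1 absorbs the occasional outlier (the 900ms sample) before the daily values are averaged.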
The raw data for daily medians by combination for this period is available on GitHub.
I am most definitely not a statistician or data scientist, but I hope you find this method at least somewhat reasonable!
Java cold starts by various aspects
Let's break out the cold start data a little to see what's going on for some of these different combinations. First up, let's separate out by JVM type, memory size, and artifact size, while still averaging across regions.
Well, this is interesting - we're seeing quite the range. On the fast end is Java 11, 3GB memory, and a small artifact, at around 400ms. At the opposite extreme we see a few combinations all taking around 1 second - about two and a half times as slow.
The biggest contributor to cold start time is function artifact size. The slowest small artifact test (with Java 8 + 512MB) is around 680ms, and the fastest large artifact (with Java 11 + 3GB) is around 900ms: about 30% slower.
In other words if you're using Lambda and Java, and want to reduce the duration of your cold start then one of the most effective things you can do is reduce the size of your function's artifact. Your main methods to do so are:
- Use a different package for each Lambda function and only include the code necessary for that function
- Only include the libraries necessary for each Lambda function in its artifact.
In our book we spend significant time covering how you achieve this for Java Lambda functions by using Maven modules and dependency management.
Java 8 vs Java 11
In the previous example I showed that the fastest cold start was with Java 11. To clarify that let's compare Java 8 and Java 11, for 3GB memory and a small artifact:
Java 11 comes in at around 410ms, and Java 8 at 560ms - about 35% slower.
So should we always use Java 11 rather than Java 8 if we want to minimize cold start latency? Unfortunately the answer isn't that simple. Let's compare Java 8 and 11 again, but now looking at data for 512MB memory and a large artifact.
Oh dear - it looks like in this case Java 11 is sometimes slower than Java 8. That's awkward.
So what does this mean? If you have a smaller artifact it's probably worth using Java 11 if cold start is your overriding concern. However there are almost certainly other drivers to whether you use Java 8 or Java 11, and they will likely be more important for you than cold start.
Zip vs Uberjar
One of the things I expected to see was that an “uberjar” packaging of an AWS Lambda function (one where library dependency JARs are unpacked at build time and re-packed into a single JAR file) would have a slower cold start time than a “zip” packaging (one where library dependency JARs are embedded in the “lib” subdirectory within a zip file). I expected this since the official AWS documentation recommends zip packaging over an uberjar (see “Reduce the time it takes Lambda to unpack deployment packages” here), even though the official Maven example uses an uberjar.
It turns out that the opposite is true, at least for the one combination of variables I used (Java 8, 512MB) - uberjars have faster cold starts than zip files:
The difference between the two packaging types isn't large - typically a little less than 10% - but it's not what I expected. It would be interesting to compare the two packaging types for different runtimes and memory sizes too - I'm only using a very specific configuration here.
Even though it looks like zip files have slower cold starts than uberjars, there are other reasons why the zip file packaging format is typically a better choice than uberjar, and it remains our recommendation. See chapter 4 of our book if you're interested in this subject.
Cold Starts around the world
It turns out that runtime, artifact size, and memory size aren't the only factors to impact cold start latency. The region that you run your Lambda function in also impacts cold start.
Here are cold start latencies averaged out over runtime, memory, and artifact, but separated out to all the 15 regions in the test:
To add some clarity here let's look just at eu-central-1 (Frankfurt) and ap-south-1 (Mumbai).
The difference here is significant. The precise gap is volatile, but overall in my tests Frankfurt's cold starts are about 25% slower than Mumbai's. I can only guess why this is, but I wouldn't be surprised if it's to do with the size and popularity of the region, since the US-based regions were also slower as a rule.
One thing you may have noticed is that for certain configurations our results can be very volatile - and remember I'm already taking the daily median across 12 samples so there's always some amount of smoothing out occurring. We see volatility occur more significantly for large artifact sizes.
Let's look at a pretty common use case - the us-east-1 region, Java 11 and 1.5GB RAM - for both small and large artifacts:
For the small artifact, daily-median cold start latency varies between about 500ms and 550ms - about 10% of the minimum. For the large artifact, however, latency typically varies between 1 second and 1.35 seconds - 35% of the minimum.
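Those percentages are just the spread of the daily medians relative to their minimum - roughly like this, with illustrative values standing in for the real daily medians:

```python
def volatility(latencies_ms):
    """Spread of daily-median latencies as a fraction of the minimum."""
    lo, hi = min(latencies_ms), max(latencies_ms)
    return (hi - lo) / lo

# Illustrative daily medians for the us-east-1 / Java 11 / 1.5GB test:
small_artifact = [500, 520, 550, 505]     # ms
large_artifact = [1000, 1200, 1350, 1050] # ms

print(f"small: {volatility(small_artifact):.0%}")  # 10%
print(f"large: {volatility(large_artifact):.0%}")  # 35%
```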
While this particular view compares volatility for different artifact sizes, the keen-eyed among you will have seen in the previous section that region choice also plays a big part in cold start latency.
Is Java actually worse for Cold Starts?
Some people argue that you shouldn't use Java at all with Lambda because of its slow cold start times. But is it really worse than other languages?
Yep, it is. 😀
Here I compare the two Java runtimes with Python 3.8, using 1.5GB memory and small artifacts. Python's cold start is about 250ms, and Java 11's is a sliver under twice that.
So yes, if cold start time really is important to you, and you don't want to or can't use Provisioned Concurrency, then switch language.
But for many production applications the occasional one-second cold start is fine, and for warm invocations the difference between the languages is too close to call. We can see that in the following view of warm invocations across runtimes:
My suspicion is that Java is better than many other languages (like Python) for latency at load for non-trivial use cases, but that's another test for another time.
Cool starts
And finally, just when you thought all Lambda invocations were either “cold” or “warm”, it turns out there are “cool starts” too! This is something I haven't seen discussed before.
My “warm” invocations were separated by 10 seconds from the previous invocation, and my “cold” invocations by six hours. I also added a “cool” invocation, at 4 minutes after the previous invocation. Here's what we see for warm vs cool:
We see that “cool” starts average out at about 34ms latency - about 80% slower than a warm invocation. This figure includes the overhead of the test harness, so the actual proportional difference will be bigger than this. I don't know precisely how long the gap between invocations needs to be before a “cool” start occurs.
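To make that concrete, here's the arithmetic, treating the warm round trip as pure harness-plus-network overhead (a simplification, since a warm invocation also does a little real work):

```python
warm_ms = 19.0  # typical warm round trip (mostly harness + network overhead)
cool_ms = 34.0  # typical "cool" round trip

# Comparing raw round trips: cool is roughly 80% slower than warm.
raw_slowdown = (cool_ms - warm_ms) / warm_ms
print(f"raw slowdown: {raw_slowdown:.0%}")  # 79%

# If the warm round trip is taken as the harness/network baseline, the
# platform-side "cool start" penalty itself is roughly:
cool_penalty_ms = cool_ms - warm_ms
print(f"estimated cool-start penalty: ~{cool_penalty_ms:.0f}ms")
```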
If we break out these numbers by runtime we see that there is no difference for “cool” starts between Java and Python:
However (just looking at cool starts now for clarity) we do see a difference across regions for “cool” start latency:
us-east-2 (Ohio) is the fastest, ap-northeast-1 (Tokyo) about 75% slower.
This is a very specific use case, but if your Lambda function is invoked every few minutes then this is something you should be aware of.
What I haven't covered
There is a lot more analysis that could be performed of cold and “cool” starts, e.g.:
- Percentile analysis (removing my daily medians) to look more in depth at volatility
- Alternative JVM custom runtimes - e.g. GraalVM
- User-application impact on cold starts, e.g. complex application instantiation using frameworks like Spring (my current test has a trivially simple Lambda function under test and so only captures the latency of the platform and runtime)
- Further analysis of uberjar vs zip packaging
- “Medium size” artifacts - e.g. using a couple of fairly standard libraries, such as the AWS SDK clients for DynamoDB and S3.
- Further analysis of “cool” starts - e.g. how long a gap between invocations is necessary before a cool start occurs?
And this is only cold start assessment! As I've mentioned multiple times I actually don't think cold starts are typically a key driver to design, so I'd also be interested in looking at throughput / latency-under-load across similar types of configuration axes.
Summary
- Cold starts in Lambda have improved over the years, but they are still significant - typically several hundred milliseconds, or more.
- Java has slower cold starts than some other language runtimes - about 250ms slower for Java 11 vs Python in certain configurations.
- Cold start latencies vary significantly depending on many configurable factors - including region
- The biggest impact to cold start latency in Java is artifact size. This is one of the reasons we recommend having separate code artifacts per Lambda function
- Cold start latencies can be very volatile, especially for larger artifact sizes
- The Lambda platform also exhibits “cool start” behavior for function instances that have not been invoked for several minutes.
If you're interested in having the raw data for this investigation, would like to help with some of the areas I'm yet to cover, or have questions or comments in general, then please feel free to drop me a line at firstname.lastname@example.org, on Twitter at @mikebroberts, or via the GitHub repo for this work.