Learning Lambda — Part 8

Cold Starts

Symphonia
Mike Roberts
Nov 14, 2017

This is Part 8 of Learning Lambda, a tutorial series about engineering using AWS Lambda. To see the other articles in this series please visit the series home page. To be alerted about future installments subscribe to our newsletter, follow Symphonia on Twitter @symphoniacloud, and read our blog at The Symphonium.


In this installment of Learning Lambda I discuss Cold Starts. Depending on who you talk to, Cold Starts may be a minor footnote in the life of a Lambda developer, or may be a complete blocker to Lambda even being considered a valid computation platform. We find the truth is somewhere between these two extremes — Cold Starts are worth understanding and treating with rigor, but they're not a deal-breaker in most situations.

But what are Cold Starts, when do they happen, what impact do they have, and how can we mitigate them? There's a lot of FUD (Fear, Uncertainty, and Doubt) surrounding Cold Starts, and I hope to remove some of that FUD for you with this article. Let's dive in.

What is a Cold Start?

Way back in Part 3 I talked about the lifecycle of a Lambda function. I wrote the following:

[Lambda creates] a container for our function, on demand, whenever it needs to execute the function. So when we hit Test for the first time Lambda spun-up an actual Linux-hosted container.

The important point here, fundamental to Functions-as-a-Service platforms like Lambda, is that we don't have a server constantly running our code, sitting idle with a port open until a request is made. Rather our code is only instantiated when it is actually required.

When our code is instantiated the Lambda Platform and Runtime will go through a sequence of steps, which collectively are known as a Cold Start. These steps are as follows:

  1. The Platform allocates an underlying Virtual Machine (VM) resource to host our function
  2. The Platform instantiates a Linux container that it will run our code within, on the allocated VM
  3. The Platform copies our code to the container, and expands the compressed artifact we supplied at deployment time
  4. The Platform starts the language Runtime we specified, within the container
  5. The Runtime loads our code
  6. The Runtime instantiates our function

After these steps are performed the Platform can call the Runtime with the event that occurred in the first place, and the Runtime will in turn call our function with the event.

Why are Cold Starts special occurrences? Because typically Lambda won't perform one for every event that triggers our function. This is because once our function has finished executing, Lambda can freeze the combination of container, runtime and function instance and keep it around a little while in case another event happens soon. If an event does happen soon then Lambda will simply thaw the container, and call it with the event. For many Lambda functions Cold Starts in fact occur less than 1% of the time. But when do Cold Starts occur? Let's look at that next.

When does a Cold Start occur?

A Cold Start is necessary whenever there is no existing container available to process an event. This situation happens at the following times:

  1. When a Lambda function's code or configuration changes (including when the first version of a function is deployed)
  2. When all previous containers have been expired due to inactivity
  3. When all previous containers have been ‘reaped’ due to age
  4. When Lambda needs to scale out because all current containers for the required function are already processing events.

Let's look at these 4 types of occurrence in a little more detail.

(1) When we deploy our function for the first time Lambda will create an instance of our code, as we've already seen. However Lambda will also create a new container + runtime + code combination whenever we deploy a new version of the function code, or when we change the Lambda configuration of our functions. Such configuration doesn't just cover Environment Variables — it also covers runtime aspects like timeouts, memory settings, DLQ configuration, etc.

A corollary of this is that one instance of a Lambda function is guaranteed to have the same code and configuration no matter how many times it is called.

(2) Lambda will keep container instances around for a little while in case another event happens ‘soon’. The precise definition of ‘soon’ is not documented, but it's typically about 5–10 minutes. In other words if your function processes an event, and then a minute later another event occurs, there's a good chance the second event will be processed using the same instance of your function that was used to process the first event. However if there's an hour or more between events, your function will likely experience a Cold Start for every event. This case can be mitigated with a ‘work-around’ which I describe later in this article.

(3) Even if your Lambda function is fairly active, Amazon doesn't keep containers around forever — even ones being used every few seconds. How long AWS will keep containers around is, again, undocumented, but at the time of writing we see containers lasting 5–6 hours before they're torn down and put out to pasture.

(4) Finally, a Cold Start will occur if all current instances of a function's container are already busy processing events, and Lambda ‘scales out’. I'll talk about this more in the next part of this series.

Identifying Cold Starts

How do we know when a Cold Start has occurred? Here are 3 ways.

1 — Logging

First of all we can identify Cold Starts, and container instances, through CloudWatch Logs. To experiment with this let's first create a Lambda Function (or you can use a previous one if you like):
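Something as simple as the following will do — this is a sketch, with illustrative class and method names (your function's handler configuration must point at whatever names you choose):

```java
public class MyLambda {
    // The simplest possible handler: ignores the event payload
    // and returns a fixed String.
    public String handler(String input) {
        return "Hello Lambda";
    }
}
```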

We'll build and deploy this as usual. Then we can invoke it:

$ aws lambda invoke --function-name MyLambda out --payload '""' && cat out && echo
{
    "StatusCode": 200
}
"Hello Lambda"

Now let's visit the AWS CloudWatch Logs console. We can do that directly, or via the Lambda Console's monitoring tab. Once there you should see the Lambda function's Log Group:

[screenshot: the CloudWatch Logs console, showing the function's Log Group]

If you've had a Lambda function before named ‘MyLambda’ then you may see some earlier lines. Click on the most recent item, and you should see something like this:

[screenshot: log lines for the first invocation]

Now invoke the function a couple more times, and switch back to the log view. Wait a minute or so, refresh, and you should see log lines for the recent invocations:

[screenshot: log lines for the recent invocations]

Note that so far all log statements have occurred within this one Log Stream, as it's named in CloudWatch.

Now change the function's code, just changing the returned String is fine, then rebuild, and re-deploy. Then invoke the function again. Go back to the Logs console, and navigate up to the function's Log Group again. You should now see two items, each representing a different Log Stream:

[screenshot: the function's Log Group, now containing two Log Streams]

If you click on the older log stream, where we were before, you'll see just the logging for the invocations we did before changing the code. Clicking on the newer log stream will show the log lines for the new code. This is because Lambda creates one Log Stream per Container instance.

Since uploading a new version of our code triggers a Cold Start we see the new logs in a new Log Stream, and we can identify when precisely Cold Starts occur by looking at the oldest event in each Log Stream.

2 — New function instance

We can also validate that a whole new instance of our function is created on a Cold Start. Change the code of the Lambda function to the following:
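A sketch along these lines fits the output shown below (generating the containerId with a random UUID is an assumption — any semi-unique per-instance value would work):

```java
import java.util.UUID;

public class MyLambda {
    // Initialized once, when this class is loaded into a new container.
    // The value survives warm invocations, but a Cold Start — a new
    // container, runtime, and class load — generates a new one.
    private static final String containerId = UUID.randomUUID().toString();

    public String handler(String input) {
        return "This is function instance " + containerId;
    }
}
```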

Here we store a semi-unique containerId for each instance of our code. Note that we specifically store the value outside of the handler, meaning that it will last across invocations of the handler function, but won't be kept across multiple instances of the container. When we create a new instance of our code we'll get a new containerId.

If I deploy and invoke this code I see the following:

$ aws lambda invoke --function-name MyLambda out --payload '""' && cat out && echo
{
    "StatusCode": 200
}
"This is function instance ace06a79-976e-4ec9-afdc-fcd6f46ffd48"

If I invoke it again, the function instance ID stays the same, implying that it's the same instance of our code that's processing the invocation.

However, say I now update the code, or configuration. This time let's go with a configuration update, and change the timeout:

$ aws lambda update-function-configuration --function-name MyLambda --timeout 25

Now if I invoke the function I'll see the instance ID has changed:

$ aws lambda invoke --function-name MyLambda out --payload '""' && cat out && echo
{
    "StatusCode": 200
}
"This is function instance 7dbfffab-98a0-4908-b341-c458295dce0a"

This is because when we change the configuration for a Lambda function all previous instances of the function are discarded, and we Cold Start with a new instance.

3 — Latency Spike

Finally, we will typically be able to observe a latency spike when a Cold Start occurs. One way to demonstrate this is to make our function package larger, e.g. by including part of the AWS SDK as a dependency in the pom.xml file:
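A fragment along these lines will do it — the specific SDK module and version here are illustrative, and any sizable library has the same effect:

```xml
<!-- Illustrative: pulling in one module of the AWS Java SDK, plus its
     transitive dependencies, noticeably inflates the artifact. -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-dynamodb</artifactId>
    <version>1.11.200</version>
</dependency>
```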

If I package, deploy, and invoke this function a few times I see the following in my logs:

[screenshot: REPORT log lines showing the invocation durations]

Note that the first invocation of this function took about 180ms, and the other two took less than a millisecond. (This is just how long event processing took within the Runtime, by the way — it's not end-to-end latency within AWS.)

You'll also typically see these latency spikes within your performance monitoring. I'll describe monitoring in more detail in a later installment.

Impact of Cold Starts

So far I've described what Cold Starts are, when they happen, and how we can identify them. But why should you care about Cold Starts?

As I just mentioned in the previous section, one way to identify a Cold Start is that you'll typically see a latency spike in your event processing when one occurs, and this is most often why people are concerned about them. While the end-to-end latency of a small Lambda function might be 50ms in the usual case, a Cold Start could add at least 200ms to this, and, depending on various factors, may add seconds, or even tens of seconds.

Cold Starts add latency because of all the steps I described in the first section: instantiating a container, unpacking your code, starting the runtime, and instantiating your code all take time that is not incurred during ‘warm’ invocations.

Does this mean that we always need to care about Cold Starts? That depends a lot on what your Lambda function is doing.

For instance, say your function is asynchronously processing objects created in S3, and you don't mind if it takes minutes to process such objects. Do you care about Cold Starts in this situation? Probably not.

As another example of where you likely won't care too much about Cold Starts — say that you have a function that is processing messages from Kinesis, that each event takes about 100ms to process, and that there's typically always enough data to keep your Lambda functions busy. In this case one instance of your Lambda function may process 200,000 events before it gets ‘reaped’, or in other words cold starts might only affect 0.0005% of Lambda invocations. Even if a Cold Start added 10 seconds to your start-up latency it's highly likely that you'll be ok with such an impact in this scenario, when you consider amortizing that time over the lifetime of a container.
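The arithmetic behind those numbers can be sketched as follows (the 5–6 hour container lifetime and 100ms processing time are the rough figures from above):

```java
public class ColdStartAmortization {
    // A busy function processing back-to-back events only Cold Starts
    // once per container lifetime, so the Cold Start rate is tiny.
    static long eventsPerContainer(double lifetimeHours, long perEventMs) {
        return (long) (lifetimeHours * 60 * 60 * 1000 / perEventMs);
    }

    public static void main(String[] args) {
        long events = eventsPerContainer(5.5, 100); // 198,000 — roughly 200,000
        double coldStartPercent = 100.0 / events;   // ~0.0005%
        System.out.println(events + " events per container, "
                + coldStartPercent + "% of invocations are Cold Starts");
    }
}
```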

On the other hand, say you're building a web application with a particular element that calls a Lambda function, but that function only gets called once per hour. This means you're getting a Cold Start every time the function is invoked. And further, let's say that for this particular function the Cold Start overhead is 5 seconds. Is this acceptable? Quite likely not. And if not, can this overhead be reduced? Perhaps, and we'll talk about that in the next section.

Although the concern with Cold Starts is almost always about latency overhead it's also important to note that if your function loads data from a downstream resource at startup it will be doing that every time a Cold Start occurs. You may want to consider this when you're thinking about the impact your Lambda functions have on downstream resources. This is a topic we'll discuss more when we get on to matters of scaling in Part 9 of this series.

Mitigating Cold Starts

Cold Starts will always occur with Lambda as it's currently implemented by AWS — there's no way to avoid them entirely. Moreover there's nothing we can do to reduce the container-instantiation portion of Cold Start time (fortunately AWS have made improvements in this area themselves since Lambda was first launched). It's possible that AWS may change their strategy around container instantiation in the future, allowing us to have pre-warmed containers for an extra fee (Microsoft's Azure Functions has something a little like this, with App Service Plans), but there's nothing we can do about such things with AWS today.

However we can mitigate most other aspects of Cold Start latency — let's look at how.

Reduce artifact size

Often the most effective tool in reducing Cold Start impact is to reduce the size of our code artifact. We can do that in two main ways — (1) reduce the amount of our own code in the artifact to just that needed by the Lambda function and (2) prune dependencies so that only libraries that our Lambda function needs are stored in the artifact.

There are a couple of follow-on techniques here. First, create a different artifact for each of your Lambda functions, and apply the tasks above to each artifact. Second, if you want to optimize library dependencies further, consider breaking dependent libraries apart to just the code you need — perhaps even re-implementing library functionality in your own code. Obviously there's some work necessary to do this correctly and safely, but it might be a useful technique for you.

These techniques reduce Cold Start time in two ways. First, there's simply a smaller artifact to copy and unpack before the runtime starts up. Second, there's less code for your runtime to load and initialize.

All of these techniques are somewhat unusual in modern server-side software development. We've got used to being able to add dependencies willy-nilly to our projects, creating multi-hundred-megabyte deployment artifacts while Maven or NPM ‘download the internet.’ This is typically acceptable in traditional server-side development since disk space is cheap, networks are fast and, most importantly, we don't care too much about startup time for our servers, at least not on the order of a few seconds here and there.

But with Functions-as-a-Service, and Lambda in particular, we care about startup time to a much more significant extent, and so we need to be more judicious with how we build and package our software.

A particular area for JVM developers to be aware of here is Amazon's own Java SDK libraries. For example, the full AWS SDK from Maven is so large (with its dependent libraries) that it exceeds the maximum Lambda artifact size, before we even consider the impact it might have on Cold Start time. So it's crucially important to only include precisely the libraries you need.

Remember also that the Lambda runtimes of all flavors already include certain libraries on your behalf, so you don't need to include extra copies in your artifact.

To prune dependencies in JVM projects you may want to consider using the Apache Maven Dependency Plugin, or a similar tool.

Startup Logic

Once our code has been unpacked and loaded we still have opportunities to reduce our Cold Start time by deliberately optimizing our code for such a concern. This means being very aware of any processing we do at startup, e.g. in constructors or static initializers.

One area to be wary of here is using traditional application frameworks and code containers within Lambda functions. Spring, Express, and the like all have their place in server-side software development, but we find they often add code-size and performance overhead that is unnecessary for Lambda development.

Language choice

Another area that can impact Cold Start time is the choice of language runtime. Javascript and Python simply take less time to start up than the JVM or .NET Runtime. Therefore if you're writing a small function that isn't called often, and you care about reducing Cold Start impact as much as possible, you may want to use either Javascript or Python, all other development aspects being equal.

Because of this difference in startup time we often hear people dismiss the JVM and .NET Runtime as Lambda runtimes in general, but this is a short-sighted opinion. For instance in the situation I described earlier with the Kinesis processing function — what if, on average, the JVM function took 80ms to process an event, but a Javascript equivalent took 120ms? In this case you would literally be paying twice as much for the Javascript version of your code to run (since billable Lambda time is rounded up to the next 100ms). In this situation Javascript may be the wrong choice of runtime.
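To make that billing arithmetic concrete, here's a sketch of the rounding rule — 100ms increments, with a minimum charge of one increment:

```java
public class BilledDuration {
    // Lambda rounds each invocation's duration up to the next 100ms
    // increment, and charges a minimum of one increment.
    static long billedMillis(long actualMillis) {
        long rounded = ((actualMillis + 99) / 100) * 100;
        return Math.max(rounded, 100);
    }

    public static void main(String[] args) {
        System.out.println(billedMillis(80));  // 100 — the JVM version
        System.out.println(billedMillis(120)); // 200 — the Javascript version
    }
}
```

An 80ms invocation bills as one 100ms unit while a 120ms invocation bills as two, hence the "paying twice as much" above.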

It's perfectly possible to use alternative (non-Java) JVM languages within Lambda. One important aspect to remember though is that typically these languages come with their own runtimes and libraries, and both of these will increase Cold Start time.

Finally, on the topic of language choice, it's worth keeping some perspective when it comes to the impact of language on Cold Start, or event-processing, performance. Oftentimes the most important factor in language choice is how quickly you can build and maintain your code — the human element. The runtime performance differences between Lambda language runtimes may pale in comparison.

Configuration — Memory, VPC

Certain aspects of your function's configuration can also affect Cold Start time. For instance, a higher memory setting gives your function more CPU resource, and may therefore speed up the time it takes your JVM code to JIT compile.

Another configuration setting that impacts Cold Starts is whether you are using a VPC or not. If you are, then AWS needs to allocate an ENI (Elastic Network Interface) at container instantiation time. For lower-latency-requirement Lambdas where this impact is a problem you may need to re-architect your system so that your Lambda function doesn't need VPC access.

The ‘Ping’ work-around

All of the mitigations above are about reducing the time Cold Starts take when they occur. There's one final mitigation I want to describe which is useful in certain contexts. This technique is simply to reduce the number of Cold Starts that happen at all.

I mentioned earlier in this article that if there has been no event activity for 5–10 minutes then the next event will incur a Cold Start. One way to work around this is to force a ‘dummy’ or ‘ping’ event to be processed by the Lambda function. This can be pushed through from outside the system (e.g. if you have a Lambda that processes web requests you can fire a dummy web request at it), or alternatively you can use CloudWatch Scheduled Events to call your Lambda function directly.
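As a sketch, a handler can distinguish pings by inspecting the incoming event — here assuming the ping arrives as a CloudWatch Scheduled Event, whose JSON payload carries a source field of aws.events (check the actual event shape in your own system):

```java
import java.util.Map;

public class MyLambda {
    public String handler(Map<String, Object> event) {
        // CloudWatch Scheduled Events carry "source": "aws.events";
        // treat those as keep-warm pings and return immediately.
        if ("aws.events".equals(event.get("source"))) {
            return "pong";
        }
        // ... normal event processing would go here ...
        return "processed";
    }
}
```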

There are several drawbacks to this approach:

  • Your Lambda code likely needs to be programmed to understand the difference between a ‘real’ event and a ‘ping’ event.
  • In the case of Lambdas implemented in Java you might not be able to use the built-in POJO deserialization if you implement the ‘ping’ as a CloudWatch Scheduled Event, since the event structure is likely different to that of your ‘real’ events.
  • Since Lambda is charged by the 100ms, you are paying for your Lambdas to handle the ping events. This shouldn't be much of a cost, but if you have many Lambdas it may be. Also remember the possible cost of running the actual ‘pinger’.
  • If you're using CloudWatch Scheduled Events you need to make sure your scheduled event targets precisely the same version of the function as your actual production event source, otherwise it's having no useful impact.

There's no question that this is a hack, but it's a hack that organizations have used successfully, and is even implemented as a default setting in Zappa, a popular Python Lambda framework.

Summary, and next time

That brings us to the end of Part 8 of Learning Lambda. In this article we've explored the subject of Cold Starts in depth. Cold Starts might be nothing you ever need to spend much effort on, depending on what you use Lambda for, but it's certainly a topic you should be aware of, since mitigating Cold Starts often runs contrary to how we typically build and package systems.

I mentioned FUD around Cold Starts at the beginning, and Cold Starts are also often blamed for latency problems that actually have nothing to do with them. Remember to perform proper latency analysis if you have latency concerns — make sure your actual problem isn't, for example, how your code interacts with a downstream system.

So far in this series we've only been talking about processing a small number of events, one after the other. However one of the joys of Lambda, and Serverless in general, is the automatic and vast scaling capabilities it gives us. In the next part I'm going to talk about how Lambda scales, and what impact that has on how you develop and operate your Lambda functions.


Need help with Lambda, or other Serverless technologies? We're the experts! Contact us at Symphonia for expert advice, architectural review, training and on-team development.