Defining Serverless — Part 3
Pairing auto-provisioning and auto-scaling with efficient costs
This is Part 3 of a series defining Serverless services in terms of five common traits, which John Chapin and I introduced in our free O'Reilly ebook, What is Serverless?. If this is the first part you've come to, I advise taking a look at Part 1 first for context. In this part we discuss the third trait common to Serverless services.
Trait #3 — Costs based on precise usage
In Part 2 of this series we described how Serverless services self auto-provision and self auto-scale. It's great that we can hand off so much work to a vendor, but what cost does this come with? For example, we saw that AWS Lambda can scale up without any effort on our part. As I write this, the default account-wide concurrent execution limit for Lambda is 1000 per region, so Lambda will happily create 1000 concurrent instances of your Lambda function. Great! But how much does that cost?
The good news here is that to be Serverless, a service must bill based on precise usage. With Lambda we pay per 100 milliseconds of execution, and moreover only for the 100-millisecond increments in which our function is actually processing an event; we're not paying for the idle time between requests, as we would be with a server or container.
Since we're paying per 100 ms of active usage, we also pay no more for 1000 Lambda functions running in parallel than for the same 1000 invocations running serially. This, in my mind, is extraordinary.
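To make this concrete, here's a minimal sketch of that billing model. The per-unit price below is an assumption for illustration only (roughly the 128 MB tier as I write this; check current AWS pricing for real numbers), and the function name is hypothetical:

```python
import math

# Assumed illustrative rate for a 128 MB function; not an authoritative price.
PRICE_PER_100MS = 0.000000208

def lambda_compute_cost(invocations: int, duration_ms: float) -> float:
    """Estimate compute cost: each invocation's duration is rounded up
    to the next 100 ms increment, then multiplied by the unit price."""
    billed_units = math.ceil(duration_ms / 100)
    return invocations * billed_units * PRICE_PER_100MS

# 1000 invocations of a 250 ms function cost the same whether they run
# one after another or all at once, because only total active compute
# time is billed, not concurrency:
serial_cost = sum(lambda_compute_cost(1, 250) for _ in range(1000))
parallel_cost = lambda_compute_cost(1000, 250)
print(math.isclose(serial_cost, parallel_cost))  # → True
```

The key point the sketch captures is that concurrency doesn't appear anywhere in the cost formula; only invocation count and rounded-up duration do.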
We see this charging for actual, precise usage in other services too. AWS API Gateway charges $3.50 per million API calls, plus fine-grained charges for data transfer. AWS DynamoDB charges in terms of provisioned read and write capacity per second (one capacity unit covers two eventually consistent reads, or one write, per second). AWS S3 is partly charged per GB stored. AWS Athena, Amazon's Serverless SQL analysis service, charges $5 per TB of data scanned, metered at 1 MB granularity. That represents a precision increment of 0.05% of 1 cent. That's what I call precise.
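The Athena precision claim is easy to verify with exact arithmetic (using decimal megabytes per terabyte here for simplicity; AWS may meter in binary units, which changes the result only slightly):

```python
from fractions import Fraction

PRICE_PER_TB_DOLLARS = Fraction(5)  # $5 per TB scanned
MB_PER_TB = 1_000_000               # decimal units, for simplicity

# Cost of scanning a single 1 MB increment, expressed in cents.
cost_per_mb_cents = PRICE_PER_TB_DOLLARS / MB_PER_TB * 100

# That cost as a percentage of one cent.
percent_of_one_cent = cost_per_mb_cents * 100

print(float(percent_of_one_cent))  # → 0.05
```

So one billable increment really does come to five hundredths of a percent of a single cent.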
While Serverless services charge based on actual usage as far as possible, they will often have some overhead charges too, although these will be a minor part of the total cost in typical use. For example, AWS Lambda charges for storage of your Lambda functions' code, but this will usually be a tiny fraction of your overall costs.
An important part of this discussion is that the precision of costs doesn't just apply as we scale up; it also applies as we scale down, and in fact costs should fall to, or very close to, zero in many cases. If you don't use your Lambda functions, you aren't charged for them. Not only is this useful for variable load through a day, it's also great as a service's overall usage winds down: there are no legacy equipment costs to be concerned about. It's also great for disaster recovery scenarios, since it costs effectively zero dollars to have your API Gateway and Lambda functions fully deployed and ready to execute in a failover region.
Note that not all Serverless services will wind all the way down to zero; many charge a small baseline fee. But many services do wind down to zero, and those that don't shouldn't charge a minimum fee that prohibits use.
In summary, the self auto-provisioning and auto-scaling trait from Part 2, combined with the precise usage-based costs we've described here, leads to Serverless systems being billed more efficiently and more reactively than any other deployment architecture I'm aware of. Large systems with fairly constant load 24 hours a day, over months and years, will almost certainly be cheaper in terms of infrastructure cost when deployed to more traditional architectures. But most systems don't fit that profile, and those that don't should look to see what infrastructure cost benefits they might gain from Serverless.
2020-04-06: Updates related to EC2 now being per-second billing.