Serverless, Testing, and two Thinking Hats
I've recently started a new video series - Cloud Coffee Break - and a subject I start covering in episode two is how I test serverless applications. My recommendations here tend to buck the trend of what a lot of people consider “best practice” in the serverless community, and I realized this subject is probably worth its own article too. I hope you find the following interesting.
A serverless testing strategy
Serverless is a step-change in how we architect software. Rather than building components that are meant to run for hours, days, or weeks, we are instead building functions that may be instantiated for just one event. Further, we actively look to avoid writing and running some of our own components when we can instead lean on vendor-provided services. Given this significant change, surely testing practices need to change a lot too?
Actually, no, and in fact we find the model of the test pyramid works very well when testing serverless applications, just as it has done for the last two decades when testing traditionally-built software.
- Focus most of your testing effort on unit tests and functional tests, where the tests and the code-under-test run in the same process.
- Mock and stub programmatic interfaces if necessary for your in-process functional tests, but don’t rely on “local” running versions of external dependencies.
- For your end-to-end tests (a.k.a. “integration tests”) deploy your application-under-test to the cloud, preferably to an ephemeral environment for each test suite run.
It is also important to remember that the goal of unit and functional tests is to validate that your code works as you expect, while the goal of end-to-end tests is to validate the assumptions that you've made about how your application will interact with its larger environment.
For a thorough implementation of this approach, see the chapter 6 source code for our book.
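As a much smaller illustration of the in-process style described above, here is a minimal sketch in Python. Everything in it is hypothetical and not taken from the book's code: the `InMemoryStore` stub, the `make_handler` factory, and the event shape are assumptions chosen only to show a programmatic interface being stubbed so that the test and the code-under-test run in the same process.

```python
import json
import unittest


class InMemoryStore:
    """Hypothetical stub standing in for a cloud data store client."""

    def __init__(self):
        self.items = {}

    def put(self, key, value):
        self.items[key] = value


def make_handler(store):
    """Build a Lambda-style handler with its storage dependency injected."""

    def handler(event, context):
        body = json.loads(event["body"])
        if "name" not in body:
            return {"statusCode": 400, "body": json.dumps({"error": "name is required"})}
        store.put(body["name"], body)
        return {"statusCode": 201, "body": json.dumps({"saved": body["name"]})}

    return handler


class HandlerTest(unittest.TestCase):
    def test_saves_valid_item(self):
        store = InMemoryStore()
        event = {"body": json.dumps({"name": "espresso"})}
        response = make_handler(store)(event, None)
        self.assertEqual(response["statusCode"], 201)
        self.assertEqual(store.items["espresso"], {"name": "espresso"})

    def test_rejects_item_without_name(self):
        store = InMemoryStore()
        event = {"body": json.dumps({"size": "large"})}
        response = make_handler(store)(event, None)
        self.assertEqual(response["statusCode"], 400)
        self.assertEqual(store.items, {})
```

Saved as a test module, this can be run with `python -m unittest`. Because nothing leaves the process, tests of this kind stay fast even when there are thousands of them; whether the real store behaves the way the stub assumes is a question for the end-to-end tests.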
Why not local simulators?
This guidance is different to what some other folks in the serverless world may suggest. Quite often I've seen the predominant testing on a project be a lot of integration-functional-combo tests, running against a version of the app “deployed” locally with “sam local” or LocalStack. There are some big problems with this approach, however.
First, “sam local” and LocalStack do not provide a high-fidelity replica of the cloud. They mostly work, but they are not complete, and it is unlikely that they ever will be. Moreover, it is often hard to know when they are going to be an accurate simulation and when they aren't.
Local simulators have their place (which I'll describe later) but they don't provide confidence in the behavior of an application deployed in production to a cloud environment. Again, the value of end-to-end tests is in validating the assumptions about our environment - local simulators aren't able to answer this question with authority. We have seen multiple occasions where people have used LocalStack or “sam local” for testing, only to be bitten by differences in behavior.
Second, if you are relying heavily on out-of-process code as part of your primary test suite then your test suite will be slow for a decent-size app with decent test coverage. Even ten years ago I was writing unit-test suites with thousands of tests that ran in single-digit seconds. Thousands of functional tests that use local simulators will be slow to run, and such a suite is also a lot of work to maintain as a code base.
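By contrast, an end-to-end test in the style recommended earlier makes real calls to the real deployment. The following is a minimal sketch under stated assumptions: the `API_BASE_URL` environment variable, the `/hello` route, and the `message` field in the response are all hypothetical, standing in for whatever your deployment step exports and your API actually returns.

```python
import json
import os
import unittest
import urllib.request


def endpoint_url(base_url, path):
    """Join the deployed API's base URL and a resource path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical variable that the deployment step would export, for example
# from an output of the stack created for this test run.
API_BASE_URL = os.environ.get("API_BASE_URL")


@unittest.skipUnless(API_BASE_URL, "API_BASE_URL not set; deploy a stack first")
class DeployedApiTest(unittest.TestCase):
    def test_hello_endpoint_responds(self):
        # A real HTTP call to the cloud deployment; no simulator is involved.
        url = endpoint_url(API_BASE_URL, "/hello")  # assumed route
        with urllib.request.urlopen(url, timeout=10) as response:
            self.assertEqual(response.status, 200)
            payload = json.loads(response.read())
        self.assertIn("message", payload)  # assumed response shape
```

The suite skips itself when no deployment is available, so it can live alongside the fast in-process tests without slowing them down or failing on a machine with no stack deployed.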
Why do people use local simulators?
So why do people use local simulators for testing? I have a few theories:
First, it fits with the way many engineers are used to working - “If I can't build and test while on a plane with no internet then it's not a Real Project”. I'd counter this with the fact that even before 2020 I was spending most of my time on a reliable internet connection hundreds or thousands of times faster than when this sentiment was most useful. Optimizing for the 1% of time I'm offline at the expense of a more effective process the other 99% of the time just doesn't make sense any more.
Next, our end-to-end cloud testing strategy relies on a good unit test suite, and not everyone has one of those. I'll admit it - if you're running your end-to-end tests against the cloud then running your complete end-to-end test suite is likely going to take more than single digit seconds. But that's ok if you have a solid unit test suite that does run quickly so that you don't need to run your end-to-end tests so often. The problem is that writing unit tests, and designing code to work well with unit tests, is not something that many companies encourage or support, and it's not something that comes “naturally”. It requires investment and learning, but once you've worked with a project that has good unit tests you'll forever be disappointed by projects that don't.
Third, end-to-end cloud testing requires excellent deployment automation and architectural isolation, in order to be able to deploy and tear down new stacks at will, quickly. Most folks are using some form of infrastructure-as-code these days, but there's a difference between that and true environment isolation. For example, if you're using a deployment tool that is only set up to deploy the three explicitly named stages “dev”, “test”, and “prod” then you're not ready for ephemeral end-to-end testing. You have to be able to deploy any reasonable number of isolated, independent, non-production environments at any one time for this to work. Again, this is all perfectly possible, but it requires technical discipline.
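One way to picture that kind of isolation is a deployment script that accepts any stack name instead of a fixed list of stages. The sketch below is an illustration only: the helper names, the `-e2e-` naming scheme, and the `template.yaml` path are hypothetical, and it assumes a CloudFormation template that has already been packaged and is deployed with the AWS CLI.

```python
import re
import uuid


def ephemeral_stack_name(app, run_id=None):
    """Return a unique stack name for one test-suite run.

    CloudFormation stack names may contain only letters, digits, and hyphens,
    and are limited to 128 characters. They must also start with a letter,
    so `app` is assumed to start with one.
    """
    run_id = run_id or uuid.uuid4().hex[:8]
    safe = re.sub(r"[^A-Za-z0-9-]", "-", f"{app}-e2e-{run_id}")
    return safe[:128]


def deploy_command(stack_name, template="template.yaml"):
    """Argument list that deploys the given stack with the AWS CLI."""
    return [
        "aws", "cloudformation", "deploy",
        "--template-file", template,
        "--stack-name", stack_name,
        "--capabilities", "CAPABILITY_IAM",
    ]


def teardown_command(stack_name):
    """Argument list that deletes the stack once the test run has finished."""
    return ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name]
```

A test runner could pass these lists to `subprocess.run(..., check=True)`, deploying before the end-to-end suite and tearing down in a `finally` block. The important property is that nothing in the script is tied to “dev”, “test”, or “prod”, so any number of runs can coexist.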
Finally, I think that people use local simulators because of the one thing I do recommend them for in limited scope - experimentation - but that they don't sufficiently separate the activities of experimentation and building. I'll explain this more now.
The two “thinking hats” of Serverless
When we are building applications there are many things going through our minds, but two of them are:
- What does the system I'm building need to do to satisfy its functional requirements?
- How will the environment I'm building in provide, and react to, stimuli?
In traditional software development, where we write everything on standard platforms with a few basic libraries, we can mostly focus on just the “building” part. For this activity we can focus on writing tests and writing implementation code.
In serverless systems, however, the surface area of our environment might be a lot larger, and it will almost certainly be more volatile. These days whenever I build an AWS-hosted system I always end up using an API call, feature, or perhaps a full-on service that I've never used before. Because of this I'm going to be spending a lot more time investigating my environment.
When I'm learning about this environment I will be trying all sorts of techniques that I don't use for writing production code - such as writing code in the web console, manually tweaking service configuration without going through the infrastructure-as-code loop, etc. And yes, for certain scenarios I'll use local simulators here too. A great example is when writing a web application with both a front-end and back-end component and I want to see how precisely the two are communicating via API Gateway and Lambda. Here being able to start a local version of those two services, and very rapidly iterate to see how my two components are connected, is very useful.
But just like I don't deploy production changes via the web console, I don't test with local simulators.
A way of thinking about this is to consider “wearing different hats” for different types of thinking (this relates to the idea of Edward de Bono's Six Thinking Hats). When I'm learning about the AWS environment I'm wearing my experimentation hat, and I'll use one set of techniques. But once I've done that learning and I'm ready to write some production code I'll need my building hat - and this uses different techniques.
Just as wearing two hats at the same time makes you look silly, trying to work with two thinking hats on at the same time leads to an inefficient process.
Effective serverless testing is fundamentally not that different to the techniques many people have been espousing for many years. Where it does require some new work is rigor around deploying ephemeral environments to the cloud, but that's a refinement of existing continuous delivery ideas, rather than a revolution.
A lot of people suggest using local cloud simulators for serverless testing, but we strongly encourage not doing this. Instead separate out your tests into in-process unit and functional tests, and end-to-end tests that run against a version of your application deployed to the cloud.
Local simulators are useful for experimentation and learning, but we recommend that you use different techniques and modes of thinking for experimentation versus testing.
If you're interested in help bringing the ideas here to your own organization, please drop us an email at email@example.com.