AWS CDK - The Good, The Bad and the Scary

Mike Roberts
Jun 7, 2022

AWS Cloud Development Kit (CDK) has become, in its short history, a very popular infrastructure-as-code tool. It’s not too surprising why - it allows engineers to use richer programming languages to define infrastructure, rather than having to use JSON or YAML. With richer languages come better tooling, better abstractions, and more. My old friend and colleague Gregor Hohpe has used the phrase “Infrastructure-as-actual-Code” to distinguish tools like CDK (and Pulumi) from CloudFormation (and Terraform).

I’ve spent time over the last year working on a couple of projects using CDK. I’ve now appreciated first hand the power that comes from using TypeScript to define a complicated AWS deployment. Overall I think if you have a fairly small team that owns the application, its infrastructure code, and its production deployment, then CDK is worth strongly considering (I’d use it myself).

However I also still have major concerns with CDK. Some of those are immediately apparent, e.g. the lack of cloud-side support. But many of my concerns relate to longer term operability. I think choosing CDK requires making a trade-off - better efficiency for developers in the short term vs headaches for operations engineers in years to come.

In this article I share my thoughts - both positive and negative - about CDK, for technical decision makers. I hope these ideas will help as you make your own choices about what infrastructure tooling to use.

The Good

Programming Language Superpowers

Besides CDK, the standard way of deploying AWS infrastructure, with only AWS tools, has been to use CloudFormation (perhaps with SAM helping out a little). The problem with first-party use of CloudFormation is that it requires using YAML or JSON as a source language, and neither of these languages is particularly friendly when it comes to building anything of any complexity.

CDK, on the other hand, allows engineers to use JavaScript, TypeScript, Python, Java, or C# to define their infrastructure (with more languages on the way). Because of this, writing CDK code is much more effective than writing CloudFormation, for the following reasons:

Modern editors guide engineers using auto-complete, and catch many bugs early using syntax checking.
Standard language structures, like iteration and function calls, allow abstractions to be made within the definition of a “stack”.
Library code - shared among multiple “stacks” - is also simple, using standard language techniques, allowing further abstractions to be made.
Infrastructure-definition code can be unit tested.
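
To make the points about iteration, function calls, and unit testing concrete, here is a minimal sketch in plain TypeScript. It deliberately does not use the CDK library: the `Resource` and `Template` types and the `standardQueue` and `buildTemplate` helpers are hypothetical stand-ins, invented for this example, that emit a CloudFormation-shaped object. The idea carries over to a real CDK stack, though: a function captures a convention once, and a loop applies it.

```typescript
// A stand-in for a CloudFormation resource; real CDK constructs are far richer.
interface Resource {
  Type: string;
  Properties: Record<string, unknown>;
}

type Template = { Resources: Record<string, Resource> };

// Hypothetical helper: one function call captures a team's queue conventions.
function standardQueue(name: string, retentionDays: number): Resource {
  return {
    Type: "AWS::SQS::Queue",
    Properties: {
      QueueName: name,
      MessageRetentionPeriod: retentionDays * 24 * 60 * 60, // seconds
    },
  };
}

// Iteration replaces copy-pasted YAML: one queue per workload.
function buildTemplate(workloads: string[]): Template {
  const template: Template = { Resources: {} };
  for (const workload of workloads) {
    template.Resources[workload + "Queue"] = standardQueue(workload + "-queue", 4);
  }
  return template;
}

const template = buildTemplate(["Orders", "Billing", "Shipping"]);
console.log(JSON.stringify(template, null, 2));
```

Because the result is ordinary data produced by ordinary functions, a regular unit test can assert on it, for example by checking that every generated queue carries the expected retention period.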

Engineer enthusiasm

Unsurprisingly, engineers tend to prefer writing CDK to writing YAML! And so with the choice of CDK comes goodwill and a recruiting benefit - both useful for management.

Furthermore this enthusiasm is apparent in the larger community. It’s hard for folk to get particularly excited about CloudFormation, but with CDK we’re seeing dedicated conferences, open source contributions, and more - all of which support your own team’s work.

The Bad

Runtime Debugging Headaches

CDK isn’t an AWS service in the normal sense - you can’t go to the AWS Web Console and pull up the CDK view. This is because CDK is predominantly a client-side abstraction over CloudFormation.

That means while writing CDK is more effective, deploying and debugging CDK at runtime is a different story.

This is immediately obvious when you go to look at a CDK app in the only place it exists in the AWS Web Console - the CloudFormation console. All of your nicely abstracted resources are suddenly flattened into a vast number of more fundamental AWS types, all with hard-to-read generated names. And when things go wrong (things will go wrong) you still need someone on the team that understands CloudFormation.
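
The generated names come from flattening a tree of nested abstractions into CloudFormation’s single, flat namespace of resource identifiers. The sketch below is a simplified illustration of that idea and not CDK’s actual algorithm: `logicalId` and `shortHash` are hypothetical helpers, and the hash is a basic polynomial rolling hash chosen only to keep the example dependency-free.

```typescript
// Simplified illustration only. CDK's real identifier generation differs in
// detail; every name in this block is hypothetical.

// A small polynomial rolling hash, kept within 32 bits, rendered as 8 hex digits.
function shortHash(text: string): string {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) % 4294967296;
  }
  return ("00000000" + hash.toString(16).toUpperCase()).slice(-8);
}

// Flatten a path of nested abstractions into one alphanumeric identifier:
// strip other characters from each segment, join the segments, and append
// a hash of the full path.
function logicalId(constructPath: string[]): string {
  const readable = constructPath
    .map((segment) => segment.replace(/[^A-Za-z0-9]/g, ""))
    .join("");
  return readable + shortHash(constructPath.join("/"));
}

// The developer wrote a "Handler" inside an "Api"; the role was created
// on their behalf, yet it shows up as its own top-level resource.
const roleId = logicalId(["Api", "Handler", "ServiceRole"]);
console.log(roleId); // "ApiHandlerServiceRole" followed by 8 hex characters
```

Even in this toy version, the person reading the CloudFormation console has to reverse the flattening in their head to work out which line of application code produced a given resource.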

Ongoing meta-responsibilities

The problems with CDK not having a cloud-side component don’t stop at debugging though. Because it’s a client-side tool it puts onto a team many responsibilities that AWS would usually handle, for example:

The deployment environment must be maintained - for development and production. E.g. the correct versions of CDK and programming language environment must be maintained for the lifetime of the application. With a CloudFormation / SAM approach AWS are responsible for providing such an environment.
CDK requires a “bootstrap” environment to be set up and maintained, for each AWS sub-account. This is a concern that cuts across applications and requires careful thought.
If using JavaScript / TypeScript for CDK you will need to upgrade your Node versions fairly frequently due to Node’s (comparatively) rapid support cycle (Node LTS versions are only supported for 2.5 years). This might be particularly concerning if your application code is written using a language that has longer support (e.g. Java), but you’re using TypeScript as an organizational standard for CDK.
Unlike a regular AWS service, the CDK team is already choosing to make major version changes to CDK. This can lead to operations headaches when upgrades need to be applied.
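
One way a team might stay on top of these version responsibilities is a small pre-deployment check that flags toolchain components past their support window. This is a minimal sketch under stated assumptions: `PinnedTool`, `outOfSupport`, and the versions and dates in `toolchain` are all hypothetical placeholders, not real release data.

```typescript
// Hypothetical record of one pinned piece of the deployment toolchain.
interface PinnedTool {
  name: string;
  version: string;
  endOfSupport: Date; // filled in by the team from the vendor's published schedule
}

// Return the tools whose support window has closed as of the given date.
function outOfSupport(tools: PinnedTool[], asOf: Date): PinnedTool[] {
  return tools.filter((tool) => tool.endOfSupport.getTime() <= asOf.getTime());
}

// Placeholder data: these versions and dates are made up for the example.
const toolchain: PinnedTool[] = [
  { name: "node", version: "x.y.z", endOfSupport: new Date("2023-01-01T00:00:00Z") },
  { name: "cdk-cli", version: "a.b.c", endOfSupport: new Date("2026-01-01T00:00:00Z") },
];

const stale = outOfSupport(toolchain, new Date("2024-06-01T00:00:00Z"));
for (const tool of stale) {
  console.log(tool.name + " " + tool.version + " is past end of support");
}
```

The check itself is trivial. The ongoing cost is that somebody has to own the list and keep it current for the lifetime of the application.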

The Scary

As a developer I mostly love CDK. I am way faster with it than I am with CloudFormation.

But if I put my CTO hat on, and think about CDK as a strategic choice for an organization, then things get a lot murkier.

Longer-term Operations Requirements

While the exciting time of a software project is when there’s a lot of activity happening, the truth is that most industrial software projects that have any amount of success tend to enter a “maintenance mode” longer-term. Deployments are less frequent, and the set of individuals looking after the app changes over time.

What happens to a CDK application in this context? I have several concerns, beyond those I mentioned in “meta-responsibilities” above.

How maintainable is the app’s CDK code? Because CDK uses a “regular” programming language it opens up the possibility for heavily engineered solutions - especially in the case of org-wide abstractions. What happens to an application-developer’s finely tuned CDK library when the organization ops team changes fundamental requirements about how a company is using AWS, e.g. a change of VPC architecture? Is that developer still employed by the company? And if not which ops engineer is going to need to figure out how the code works?
Similarly, how many programming languages are maintenance operations engineers going to be expected to know? Traditionally even if a company has used multiple programming languages for app-dev, from an operations perspective an engineer can typically get by with knowing bash / Python, or PowerShell. But if app-dev teams choose the programming-language-du-jour when building an app, will operations engineers now need to know Python, JavaScript, Java, Go, and whatever other languages CDK enables, in order to support an organization’s portfolio?
What happens to applications whose CDK environments aren’t regularly updated, e.g. those that go 3 years between deployments? Will they be non-deployable without significant work because of changes to the CDK bootstrap in their account? Will the version of Node they use even still be available? What happens in the case of an urgent need to deploy due to a security issue?

AWS support of CDK over the long term

As I’ve already described, CDK isn’t an “AWS service” in the normal sense - there’s no CDK API, there’s no CDK section of your AWS bill. While this may change over time, if it doesn’t that raises the question - to what extent will AWS continue to support CDK? Sure, CDK does have an official marketing page, but the Serverless Application Repository (SAR) has one too, and it’s been given the cold shoulder over the last couple of years (and SAR at least has a cloud component).

This concern breaks down into a few questions.

How stable will CDK be over time? We’ve already seen a major upgrade from V1 to V2, and V1 will no longer receive security updates from 2023. Such a rapid sunsetting is unusual in the AWS ecosystem.
Will CDK continue to receive significant investment from AWS? Since there’s no cloud-side element to CDK we can imagine that it might more easily get mothballed if it loses favor. That’s a problem for something that will need continued updates to maintain support for its underpinnings (e.g. Node LTS updates, again).
On a related matter, will CDK continue to support new AWS services and features in a timely manner?
Will the existing problems with CDK (e.g. the lack of runtime support in-cloud) be addressed?

Should you use CDK or not?

Over the last year I’ve heard of a lot of companies switching to CDK, largely I suspect because senior developers want to use it, and are making their voices heard. And since it’s an “officially supported” AWS product there’s a certain expectation that it will be maintained over time to the same extent that EC2 and S3 are.

But because of the fundamental nature of CDK being a client-side tool, and not a cloud-side tool, I think this expectation is false, or at least is yet to be proven. This brings up all the longer-term concerns I’ve mentioned here.

Whether you should use CDK or not, therefore, depends on how the concerns here could impact you over time. For example if you are using CDK on a small number of applications, where the applications themselves are deployed frequently, and the people on the team handling deployment are the same as those who are building the app itself, then I think CDK is a good idea. Again, I would use CDK myself for this kind of organizational setup.

However I would likely not use CDK in the following scenarios:

Where an application is deployed infrequently / is already “in maintenance mode”
Where an application-development team is a separate team from the one responsible for production deployment

I do hope that over time AWS makes CDK the stable, supported, standard tool for infrastructure deployment. But for now the choice to use CDK is one that favors developers in the short term while, I fear, giving Ops teams a sizable headache in the long term unless there is significant ongoing maintenance.
