Using Infrastructure-as-Code for AWS Organizations
Infrastructure-as-Code (IaC) is a technique for automating the deployment of software resources. It’s an alternative to deploying resources manually through a graphical UI (sometimes described as “ClickOps”), or via commands in a terminal. IaC’s benefits are that it increases speed, reduces effort, and reduces risk.
In its early days IaC was mostly used for low-level infrastructure, such as virtual machines, but these days it’s used for pretty much anything that can be deployed in the cloud, from low-level resources to highly abstract concepts.
When you build an AWS Organization, what you’re actually doing is deploying and configuring resources in the cloud. This means that you can apply IaC to your Organization, and this article shows you how.
Welcome to Org Ops Part 4
This is the fourth article in my series on AWS Organization Operations - or Org Ops for short. This series describes how I recommend small and medium-sized companies approach managing the most fundamental resources in AWS.
In Part 1 I explained why it makes sense for most companies using AWS to adopt a multi-Account AWS Organization, and what the shape of such an Organization might look like. Part 2 covered the activities involved with building the Accounts in such an Organization. Part 3 described using AWS Identity Center for implementing human user management in your Organization.
None of the previous articles used IaC techniques in their recommendations. At the end of Part 3 I explained why: at the very beginning of creating an Organization you might not be able to use IaC, and even if you can, it might make more sense to use the Web Console.
However, once you have the basics deployed, it’s usually time to consider IaC.
This article covers the following areas:
- How to decide whether to use IaC for different types of Organization resource, and if so which tool to use
- How to use IaC for Organization-scope resources
- How to use IaC for Organization-managed, Account-scope, resources
This article assumes that you have prior experience using and building IaC processes - e.g. for deploying application or platform resources. If you don’t have such experience you’ll probably still find this article useful, but you may want to read up on IaC in general before moving ahead with using it for your Organization.
Tooling
Any discussion of Infrastructure-as-Code inevitably involves the language and tooling of the code itself. You may already have your own preferred IaC tooling.
When dealing with IaC and AWS Org Ops it’s useful to consider three different tools / types of tool:
- Scripts (Shell / PowerShell / Python scripts, invoking the AWS CLI)
- AWS CloudFormation
- Everything else (including AWS CDK, Terraform / OpenTofu, Pulumi)
CloudFormation is AWS’ general-purpose IaC tool. It’s a language and service that AWS has supported for many years, and I expect AWS will continue to support it for many more to come. It has an excellent backwards-compatibility history: I can take a template file that was used to deploy resources several years ago and expect it to still work today.
CloudFormation has good, but not universal, coverage of AWS - i.e. while most resource types can be deployed with CloudFormation, not all can.
The main drawback to CloudFormation is that it has a somewhat tricky template syntax, requiring coding in YAML or JSON files. For that reason I mostly prefer to use AWS CDK since it uses the CloudFormation service under the covers, but allows me to define resource deployment in a higher level programming language.
You might find this list surprising. After all, if your company has standardized on CDK or Terraform, why bother with the less expressive tools? My reasoning is two-fold: the bootstrapping problem, and questions of longevity.
Bootstrapping the chicken and egg
Many aspects of Org Ops involve bootstrapping: setting up an environment for other tasks. The catch is that you also need to bootstrap the tools you use for Org Ops itself.
Using AWS CDK as an example: CDK is AWS’ IaC tool for expressing resource deployment with high-level languages, like TypeScript and Python. To run CDK in an Account you first need to deploy CDK Bootstrap resources (including an S3 bucket). You can’t deploy those resources with CDK itself because you haven’t bootstrapped CDK yet. And so we have a “chicken and egg” problem.
There are two ways to work around this issue, both of which I use myself at different times:
- Don’t use higher-level IaC tools for explicit or implicit resources that the IaC tools themselves rely on. Instead, for example, use plain CloudFormation for managing CDK’s environment.
- Alternatively, bootstrap dependencies manually, and bring them under higher-level IaC management later. E.g. for setting up users that can run CDK: create your Identity Center resources manually at first, and then manage most of Identity Center using CDK once it’s bootstrapped.
The main reason I called out CloudFormation explicitly in the earlier list is because it has no environment dependencies beyond having a user with appropriate permissions - which you need for running anything in the web console or CLI anyway.
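As a minimal sketch of the first workaround, here is a small Python script (in the spirit of the “scripts invoking the AWS CLI” option above) that exports CDK’s own bootstrap template with cdk bootstrap --show-template and deploys it with plain CloudFormation, so that CDK never manages its own environment. It assumes the cdk and aws CLIs are installed; the file name is hypothetical, CDKToolkit is CDK’s default bootstrap stack name, and the template’s parameters are left at their defaults, which you may need to override. It only prints the commands unless you pass dry_run=False.

```python
"""Sketch: manage CDK's bootstrap resources with plain CloudFormation, not CDK."""
from __future__ import annotations

import subprocess

TEMPLATE_FILE = "bootstrap-template.yaml"  # hypothetical file name
STACK_NAME = "CDKToolkit"  # CDK's default bootstrap stack name


def export_template_cmd() -> list[str]:
    # Prints CDK's bootstrap template to stdout instead of bootstrapping
    return ["cdk", "bootstrap", "--show-template"]


def deploy_cmd(profile: str, region: str) -> list[str]:
    # The bootstrap template creates named IAM roles, hence CAPABILITY_NAMED_IAM
    return [
        "aws", "cloudformation", "deploy",
        "--template-file", TEMPLATE_FILE,
        "--stack-name", STACK_NAME,
        "--capabilities", "CAPABILITY_NAMED_IAM",
        "--profile", profile,
        "--region", region,
    ]


def bootstrap(profile: str, region: str, dry_run: bool = True) -> None:
    """Export the template, then deploy it with CloudFormation."""
    if dry_run:
        # Only show what would run
        print(" ".join(export_template_cmd()), ">", TEMPLATE_FILE)
        print(" ".join(deploy_cmd(profile, region)))
        return
    with open(TEMPLATE_FILE, "w") as f:
        subprocess.run(export_template_cmd(), stdout=f, check=True)
    subprocess.run(deploy_cmd(profile, region), check=True)
```

For example, bootstrap("superiorwidgets-orgops", "us-east-1") prints the two commands for that profile and region. Because the result is an ordinary CloudFormation stack, later updates don’t depend on CDK being bootstrapped.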
Long-lived resources need a long-lived tool
Some of your Organization resources are going to last a long time. For example, your management Account is likely to exist for as long as your company does! And certain resources, like your database of human users and your centralized auditing tools, may well live as long.
There’s a tension here between multi-year-lifetime resources and what can be, frankly, the “IaC tool of the week”. In other words, don’t manage crucial org-wide resources that could live for a decade using a tool that might not make it until next year. I’ve seen many companies use either home-grown tools, or new open-source / commercial tools, that become unsupported dead ends long before the resources they manage are retired.
Picking the tool for the job
It’s these two concerns that lead me to recommend that you pick an IaC tool on a case-by-case basis when dealing with Org Ops, rather than necessarily sticking to “the company standard”. There are also other concerns that call into question whether using IaC makes sense at all for particular tasks. Here’s a flowchart to help you decide:
Picking an IaC tool for an Organization resource
- Do you need to use the web console, or would you be better off just using it? For example, the Organization’s initial setup must be performed through the web console, and Identity Center provides a perfectly adequate UI for creating users.
- If you’re not using the web console, would it still make more sense for now just to perform a manual command via the CLI? While I’m a fan of IaC, I’m also a pragmatist. If there’s a task that’s only going to be run once, or very rarely, I’m fine with just performing it manually, and documenting it.
- If you’re going to automate deployment then use your preferred higher level IaC tooling unless the resource is impacted by the two concerns I described in the earlier section…
- … and if it is impacted, then default to CloudFormation unless CloudFormation isn’t possible, or is more annoying than just using a script.
This process isn’t exact. What’s more, it may make sense to revisit some decisions over time, especially as your company grows, or (as I explained in the bootstrapping section) once you have initial resources available and are moving to a position of long-term maintenance.
There’s one exception to this process: if you’re using AWS CloudFormation StackSets you need CloudFormation, since StackSets are built on it. I cover StackSets later in this article.
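One way to make the flowchart concrete is to encode it as a function. This is only an illustrative sketch: the parameter names are hypothetical labels for the yes/no questions in the list, and real decisions involve more judgment than a handful of booleans.

```python
"""Sketch: the tool-choice flowchart expressed as a function."""
from enum import Enum


class Approach(Enum):
    WEB_CONSOLE = "web console"
    MANUAL_CLI = "manual CLI command (documented)"
    PREFERRED_IAC = "preferred higher-level IaC tool"
    CLOUDFORMATION = "CloudFormation"
    SCRIPT = "script invoking the AWS CLI"


def pick_approach(
    console_required_or_better: bool,
    one_off_or_rare: bool,
    bootstrapping_or_long_lived: bool,
    cloudformation_possible: bool = True,
    script_simpler_than_cloudformation: bool = False,
) -> Approach:
    if console_required_or_better:
        return Approach.WEB_CONSOLE
    if one_off_or_rare:
        # Pragmatism: run it by hand and document it
        return Approach.MANUAL_CLI
    if not bootstrapping_or_long_lived:
        # Neither the bootstrapping nor the longevity concern applies
        return Approach.PREFERRED_IAC
    if cloudformation_possible and not script_simpler_than_cloudformation:
        return Approach.CLOUDFORMATION
    return Approach.SCRIPT
```

For instance, CDK’s own bootstrap resources are neither console-only nor one-off, but they are affected by the bootstrapping concern, so pick_approach(False, False, True) returns Approach.CLOUDFORMATION.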
Prerequisite - AWS terminal environment
Any tooling you use will likely need you to run commands from a terminal with AWS credentials available, and probably with the AWS CLI installed too. I’m not going to go into detail on how to set this up, but here are a few quick points.
If you don’t already have a terminal environment configured, a quick way to get started, perform small tasks, and get an idea of how things work is to use AWS CloudShell, which is built into the AWS Web Console. CloudShell comes preconfigured with the same permissions as the console session it was launched from, with various pre-installed tools, and with a few other things.
Longer-term you’re probably going to want to use your regular desktop terminal. To install and configure the AWS CLI see the documentation. Note that if you’re using Identity Center for human user access to AWS then you’ll want to read the page on Identity Center authentication.
Here’s what I do, on a Mac.
- I install AWS CLI V2 using AWS’ own installer (docs)
- I use direnv, installed via Homebrew, and configured for my shell.
- Under my development tree I have multiple sub-directories, each corresponding to a different Organization and Account (e.g. ~/src/superior-widgets/org-ops/). Each directory has a .envrc file containing a line that names the AWS profile to use in that directory. Here I print one such file to my terminal:
```
> cat ~/src/superior-widgets/org-ops/.envrc
export AWS_PROFILE=superiorwidgets-orgops
```
- Each profile is represented in my ~/.aws/config file, e.g. as follows for an Account where I have admin permissions:
```ini
[profile superiorwidgets-orgops]
sso_start_url = https://d-1111111111.awsapps.com/start
sso_region = us-east-1
sso_account_id = 123456789012
sso_role_name = AdministratorAccess
region = us-east-1
output = json
```
- When I start a session in a terminal I run aws sso login. This requires me to authenticate with Identity Center in a browser.
- I can now run AWS CLI commands (and other tools that call the AWS API), and they use the profile named in the .envrc file.
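If you manage several Organizations or Accounts this way, a few lines of Python can generate the matching .envrc line and ~/.aws/config section for each directory. This is a sketch under assumptions: the helper names are hypothetical, and the values are the placeholders from the example above. Note that the AWS CLI expects non-default profiles to be named profile <name> in ~/.aws/config.

```python
"""Sketch: generate an ~/.aws/config profile section and matching .envrc line."""
import configparser
import io


def profile_config(name: str, start_url: str, sso_region: str,
                   account_id: str, role: str, region: str) -> str:
    config = configparser.ConfigParser()
    # Non-default profiles are named "profile <name>" in ~/.aws/config
    config[f"profile {name}"] = {
        "sso_start_url": start_url,
        "sso_region": sso_region,
        "sso_account_id": account_id,
        "sso_role_name": role,
        "region": region,
        "output": "json",
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


def envrc(profile: str) -> str:
    # direnv exports this variable whenever you cd into the directory
    return f"export AWS_PROFILE={profile}\n"


print(profile_config("superiorwidgets-orgops",
                     "https://d-1111111111.awsapps.com/start",
                     "us-east-1", "123456789012", "AdministratorAccess", "us-east-1"))
print(envrc("superiorwidgets-orgops"), end="")
```

Appending the generated section to ~/.aws/config and writing the .envrc into the corresponding directory is left to you, so that nothing overwrites existing configuration by accident.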
That’s enough talk about tools, let’s get on to how we can apply IaC to Organization-scope and Organization-managed resources.
Organization-scope resources
Organization-scope resources are like those I’ve described so far in previous articles in this series. Organization-scope resources are deployed once per Organization but have access to, or are of use to, multiple Accounts within the Organization.
An example is Identity Center and the users and permissions configured within it - we deploy an (Organization-scope) Identity Center instance precisely once per Organization.
There are many possible Organization-scope resource types - take a look at AWS’ own list of services that integrate with Organizations for an idea how many, and those are just the ones that have explicit integration. There are even more resource types (like top level DNS, which I’ll address in a future article) that can affect multiple Accounts in the Organization without having any particular ties to the Organizations service.
Using IaC with Organization-scope resources is a lot like using IaC for application resources - we define the resources in source code, and then run the IaC tool to perform the deployment.
Picking the Account and Region
For these types of resource, the main difference from “regular” application or platform IaC is choosing which Account and Region to deploy them to.
Account
There are three possible scenarios here.
The first is that you’re deploying an Organization-scope resource that has no required association to your Organization’s management Account. For example if you’re deploying a shared VPC (which I’ll cover in a future article), you can deploy the VPC in whichever Account you choose. I typically use the same Org Ops Account that has my other cross-Organization resources, but it’s up to you.
The next scenario is that you’re deploying a resource that can only be deployed in the Organization management Account. There are very few of those, but there are some. This is the least preferable option since, as a rule, both AWS and I recommend doing as little work as possible in the management Account.
The final option is that you’re deploying a resource which would be deployed to the management Account by default, but for which you have delegated administration. For example, when I covered Identity Center in the previous article I described the process by which you could delegate administration for Identity Center resources to your Org Ops Account. If you do this then you can run your IaC tooling within the context of the Org Ops Account instead of the management Account. Many AWS services that have Organization features can use a delegated administrator Account. For the precise list, see the AWS Organizations documentation.
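To check which Account a service is actually delegated to, you can run aws organizations list-delegated-administrators --service-principal sso.amazonaws.com (sso.amazonaws.com is Identity Center’s service principal) from the management Account, then feed its JSON output into a sketch like the one below, which encodes the three scenarios above. The Account IDs and function names are hypothetical placeholders.

```python
"""Sketch: choose the Account to run IaC in for an Organization-scope resource."""
from __future__ import annotations

import json

MANAGEMENT_ACCOUNT = "111111111111"  # hypothetical placeholder
ORG_OPS_ACCOUNT = "222222222222"  # hypothetical placeholder


def delegated_admin_ids(cli_output: str) -> set[str]:
    """Account IDs in `aws organizations list-delegated-administrators` output."""
    data = json.loads(cli_output)
    return {admin["Id"] for admin in data.get("DelegatedAdministrators", [])}


def target_account(management_only: bool, supports_delegation: bool,
                   delegated_ids: set[str]) -> str:
    if management_only:
        # Scenario 2: unavoidable, but keep such work to a minimum
        return MANAGEMENT_ACCOUNT
    if supports_delegation:
        # Scenario 3: use Org Ops only if delegation is actually registered
        return ORG_OPS_ACCOUNT if ORG_OPS_ACCOUNT in delegated_ids else MANAGEMENT_ACCOUNT
    # Scenario 1: no tie to the management Account
    return ORG_OPS_ACCOUNT
```

Falling back to the management Account in scenario 3 reflects the default behavior when no delegation exists; in practice you’d probably want to register the delegation rather than accept that fallback.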
Region
Whenever you deploy a resource in AWS you have to specify not just the Account, but the region too. There are a few things to consider when defining the region for Organization resources:
- Are you deploying changes or components to a service you’ve previously deployed? In which case you probably want to use the existing service’s deployed region. For example any updates or additions to Identity Center resources (like permissions and assignments) should be performed in the same region that you originally used when you initialized the Identity Center instance.
- Are you deploying resources that AWS require or recommend you use in a specific region? If so, choose that one.
- Are you deploying to multiple regions? E.g. if you have workloads in multiple regions, and you want to use a shared VPC, you will likely need to deploy a separate shared VPC per region.
- Otherwise, default to your Organization’s “primary” region, e.g. the one where you’ve deployed Identity Center.
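If you’re not sure which region you originally initialized Identity Center in, aws sso-admin list-instances returns an empty Instances list in regions without an instance. The sketch below assumes a hypothetical list of candidate regions to probe; the same output also includes the IdentityStoreId, which is the kind of value the Organization-scope example later in this article takes as a parameter.

```python
"""Sketch: find the region that holds your Identity Center instance."""
from __future__ import annotations

import json
import subprocess

CANDIDATE_REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]  # hypothetical


def list_instances(region: str) -> str:
    """Run `aws sso-admin list-instances` in one region and return its JSON."""
    result = subprocess.run(
        ["aws", "sso-admin", "list-instances", "--region", region, "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def region_with_instance(outputs: dict[str, str]) -> str | None:
    """Given {region: list-instances JSON}, return the first region with an instance."""
    for region, output in outputs.items():
        if json.loads(output).get("Instances"):
            return region
    return None


def identity_store_id(output: str) -> str:
    """The IdentityStoreId of the first instance in a list-instances result."""
    return json.loads(output)["Instances"][0]["IdentityStoreId"]
```

Usage might look like outputs = {r: list_instances(r) for r in CANDIDATE_REGIONS} followed by region_with_instance(outputs). Keeping the parsing separate from the CLI call makes it easy to test without AWS credentials.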
Organization-scope example
As I proceed through more of these articles I’ll introduce some better examples for this, but here’s the basic idea of using CloudFormation to deploy some Organization-scope resources. In this case I want to create a user group in Identity Center with a couple of users in it.
NB: I don’t especially recommend using CloudFormation for this task - CloudFormation makes it fiddly, and the web console UI works well. But it makes for a fairly self-contained example that fits within the general rules of tool choice I described earlier.
First of all I create a CloudFormation template, which looks as follows (with the actual IDs scrubbed):
```yaml
AWSTemplateFormatVersion: 2010-09-09
# Required for Fn::ForEach
Transform: 'AWS::LanguageExtensions'

Parameters:
  IdentityStoreId:
    Type: String
    Default: 'd-1234567890'
  MikeUserId:
    Type: String
    Default: '12345678-1234-1234-1234-123456789012'
  JaneUserId:
    Type: String
    Default: '23456789-1234-1234-1234-123456789012'

Resources:
  WidgetDeveloperGroup:
    Type: AWS::IdentityStore::Group
    Properties:
      IdentityStoreId: !Ref IdentityStoreId
      DisplayName: Widget Developers

  # Define a GroupMembership resource - one for each user in the group
  Fn::ForEach::WidgetDevelopers:
    - Username
    - [ 'Mike', 'Jane' ]
    - ${Username}InWidgetDevelopers:
        Type: AWS::IdentityStore::GroupMembership
        Properties:
          IdentityStoreId: !Ref IdentityStoreId
          GroupId: !GetAtt WidgetDeveloperGroup.GroupId
          MemberId:
            # Assign the *value* of '${Username}UserId',
            # from the Parameters section, to UserId
            UserId: !Ref
              # Fn::Sub required here (rather than !Sub) because within a !Ref
              Fn::Sub: ${Username}UserId
```
I save this to a file named template.yaml - but it can be anything you want.
A few things to note:
- CloudFormation doesn’t specifically have the idea of “global constants”, but we can use Parameters with Default values as a workaround. IdentityStoreId, MikeUserId, and JaneUserId are the IDs of the Identity Store and users that I’ve previously created: the Identity Store must be initialized through the web console, and Identity Store users can’t currently be defined in CloudFormation. To find these values I look in the web console, then save them to my template file.
- We create the group itself (WidgetDeveloperGroup) first, and then the assignment of users to the group (Fn::ForEach::WidgetDevelopers) must be performed separately, for each user. CloudFormation at least makes this possible without duplication using the ForEach function, but it’s some moderately tricky syntax, and it requires the AWS::LanguageExtensions transform declared at the top of the template.
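To make the Fn::ForEach syntax less opaque, the following Python models what the AWS::LanguageExtensions transform conceptually produces from the loop in the template above. It is a simplified illustration, not CloudFormation’s implementation: one GroupMembership resource per name, with ${Username} substituted into both the logical ID and the name of the parameter being referenced.

```python
"""Illustration only: a simplified model of how Fn::ForEach::WidgetDevelopers
expands. CloudFormation performs the real expansion server-side."""
from __future__ import annotations

import json


def expand_group_memberships(usernames: list[str]) -> dict:
    resources = {}
    for username in usernames:
        # ${Username} is substituted into the logical ID...
        logical_id = f"{username}InWidgetDevelopers"
        resources[logical_id] = {
            "Type": "AWS::IdentityStore::GroupMembership",
            "Properties": {
                "IdentityStoreId": {"Ref": "IdentityStoreId"},
                "GroupId": {"Fn::GetAtt": ["WidgetDeveloperGroup", "GroupId"]},
                # ...and into the name of the parameter holding the user's ID
                "MemberId": {"UserId": {"Ref": f"{username}UserId"}},
            },
        }
    return resources


print(json.dumps(expand_group_memberships(["Mike", "Jane"]), indent=2))
```

Seen this way, adding a user to the group means adding a name to the list and a matching <Name>UserId parameter.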
To deploy this template I can run the following command. I do so in my Org Ops Account because I already delegated administration (see previous article) and in region us-east-1 because that’s where I initialized Identity Center. I don’t need to explicitly specify the Account or region when I run the command since they are part of my AWS Profile configuration, as I described earlier.
```
> aws cloudformation deploy --template-file template.yaml --stack-name organization
```
This creates a stack within CloudFormation, and the new group (with users in it) is now in Identity Center.
Deployed Groups in Identity Center
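If you’d rather not copy user IDs out of the web console into parameter defaults, one alternative is to look them up with aws identitystore list-users --identity-store-id d-1234567890 and pass them to aws cloudformation deploy with --parameter-overrides. Here’s a minimal sketch, assuming hypothetical Identity Center user names mike and jane and a hypothetical mapping from user name to template parameter:

```python
"""Sketch: turn `aws identitystore list-users` output into parameter overrides."""
from __future__ import annotations

import json

# Hypothetical mapping from Identity Center user name to template parameter
PARAMETER_FOR_USER = {"mike": "MikeUserId", "jane": "JaneUserId"}


def parameter_overrides(list_users_output: str) -> list[str]:
    users = json.loads(list_users_output).get("Users", [])
    ids = {user["UserName"]: user["UserId"] for user in users}
    missing = sorted(set(PARAMETER_FOR_USER) - set(ids))
    if missing:
        # Fail loudly rather than deploying with a stale default ID
        raise ValueError(f"users not found in Identity Store: {missing}")
    return [f"{param}={ids[name]}" for name, param in PARAMETER_FOR_USER.items()]


def deploy_cmd(overrides: list[str]) -> list[str]:
    # Same deploy command as above, plus the looked-up IDs
    return [
        "aws", "cloudformation", "deploy",
        "--template-file", "template.yaml",
        "--stack-name", "organization",
        "--parameter-overrides", *overrides,
    ]
```

The trade-off is an extra lookup step at deploy time in exchange for never hand-copying IDs; the template’s parameter defaults can stay as a fallback.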
Organization-managed, Account-scope, resources
So far I’ve discussed resources that are each scoped to the Organization. Other resources are deployed to, and scoped to, an individual Account, but we’d ideally like to manage them at an Organization level, perhaps for the following reasons:
- To reduce duplication. We might want to deploy resources in almost identical ways across multiple Accounts or Regions.
- To reflect ownership. The people that care the most about the resource may be the Organization administrators themselves, rather than the administrators of member Accounts.
- To specialize skills. Some resources rarely change, but are deployed similarly across Accounts. And sometimes such resources come with a high risk if managed incorrectly, e.g. opening an insecure hole into the company’s systems. One good way to mitigate such concerns is to have a common group of people manage such components and centralize the definition of them.
An example where we want to reduce duplication is environment management for higher level IaC tools. AWS CDK needs to be bootstrapped in every Account and region in which you use it, but the definition of such bootstrapping is often identical, or very similar, across an entire Organization. Similarly, Terraform needs somewhere to store its state, and you might choose to use Account resources (e.g. S3 buckets) for this.
An example where we want to specialize skills is when we define roles that allow access from external systems, such as GitHub Actions for deployment automation. The roles need to be deployed within the Accounts whose resources the external system controls. However, such roles are security sensitive, and are likely very similar across the company.
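Because these roles are security sensitive and near-identical across Accounts, it helps to generate their trust policies from one place. As a sketch, assuming each target Account already has GitHub’s OIDC identity provider (token.actions.githubusercontent.com) configured, and using hypothetical organization and repository names, the following builds the trust policy for a role that only workflows in one repository can assume:

```python
"""Sketch: one central definition of a GitHub Actions deployment role's trust policy."""
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_actions_trust_policy(account_id: str, github_org: str, repo: str) -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                # GitHub's OIDC provider, registered in the target Account
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}",
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                # Tokens must be issued for AWS STS...
                "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                # ...and only to workflows in this one repository
                "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{github_org}/{repo}:*"},
            },
        }],
    }


print(json.dumps(github_actions_trust_policy("123456789012", "superior-widgets", "org-ops"), indent=2))
```

Centralizing the definition means that tightening the sub condition (for example, to a single branch) is one change that then rolls out to every Account.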
Whatever the reason, for this type of resource we’re usually deploying multiple resources across Accounts and regions, but from one infrastructure definition. And this requires a little more assistance from our tooling than vanilla IaC, because with most IaC tools resources can only be deployed in one Account and region at a time.
There are at least a couple of ways to implement this:
- Use CloudFormation StackSets to express the Organization-wide scope within the deployment process itself
- Use deployment automation (CI/CD) to fan out the deployment of one version-controlled project across multiple Accounts / regions
If you’re using a non-AWS-developed IaC tool it may also have its own support for multi-Account targets, but that’s out of scope here (and beyond my knowledge at this time!)
CloudFormation StackSets
With CloudFormation StackSets you first write the definition of the resources to be deployed in each target Account (and target region). Then you deploy a StackSet resource to an administrator Account - this contains the target Stack definition and a description of which Accounts to target. The CloudFormation Service then handles deploying to each Account on your behalf.
A particularly useful aspect of StackSets is that they can be configured using your AWS Organization, and specifically Organizational Units (OUs) - the hierarchical groups within your Organization’s Account tree. If you define a StackSet to be deployed to an OU, the following occurs:
- When you deploy using StackSets then CloudFormation will look up all the Accounts in the OU, and use those as its target Accounts
- When a new Account is added to the OU, CloudFormation will notice the change and automatically deploy the Stack to that Account, without you needing to explicitly deploy the StackSet again. Optionally, CloudFormation can also remove the Stack when an Account leaves the OU.
As an example, say that you have all your application Accounts within an OU named “Applications”, and you create a StackSet that defines your CDK bootstrap environment, configured to deploy to all Accounts within the Applications OU. When you deploy the StackSet, the target Stack will be deployed to all the current Accounts in the Applications OU. And if you move a new Account into the Applications OU in the future, the CloudFormation service will automatically deploy your CDK bootstrap stack to that Account without any further effort on your part.
Deploying with StackSets
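As a sketch of this OU-targeted setup, here are the two AWS CLI calls involved, built as argument lists in Python. Assumptions: the StackSet name and OU ID are hypothetical, CDK’s bootstrap template has already been exported to bootstrap-template.yaml (e.g. with cdk bootstrap --show-template), trusted access for StackSets is enabled in your Organization, and, if you run this outside the management Account, that Account is registered as a StackSets delegated administrator.

```python
"""Sketch: AWS CLI calls for an OU-targeted, auto-deploying StackSet."""
from __future__ import annotations

STACK_SET = "cdk-bootstrap"  # hypothetical StackSet name
APPLICATIONS_OU = "ou-abcd-12345678"  # hypothetical ID of the "Applications" OU


def call_as(delegated_admin: bool) -> str:
    # DELEGATED_ADMIN from a registered delegated admin Account,
    # SELF from the management Account
    return "DELEGATED_ADMIN" if delegated_admin else "SELF"


def create_stack_set_cmd(delegated_admin: bool = True) -> list[str]:
    return [
        "aws", "cloudformation", "create-stack-set",
        "--stack-set-name", STACK_SET,
        "--template-body", "file://bootstrap-template.yaml",
        "--capabilities", "CAPABILITY_NAMED_IAM",
        # SERVICE_MANAGED lets the StackSet target Organization OUs
        "--permission-model", "SERVICE_MANAGED",
        # Deploy to Accounts that join the OU; delete stacks from Accounts that leave
        "--auto-deployment", "Enabled=true,RetainStacksOnAccountRemoval=false",
        "--call-as", call_as(delegated_admin),
    ]


def create_stack_instances_cmd(regions: list[str], delegated_admin: bool = True) -> list[str]:
    return [
        "aws", "cloudformation", "create-stack-instances",
        "--stack-set-name", STACK_SET,
        # Target every Account currently in the OU
        "--deployment-targets", f"OrganizationalUnitIds={APPLICATIONS_OU}",
        "--regions", *regions,
        "--call-as", call_as(delegated_admin),
    ]
```

After running both commands once, Accounts later moved into the OU are picked up automatically because of the --auto-deployment setting, with no further CLI calls.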
Fan out with deployment automation (CI/CD)
Another option for Organization-managed, Account-scope, resources is to build the multi-Account nature on top of the IaC process, and invoke your IaC tooling once per Account / region, but using the same definition each time.
Using the same example as above, we can still define the CDK bootstrap resources as a CloudFormation template, but run CloudFormation separately for each Account and Region. If we’re running CloudFormation from our deployment automation (CI/CD) tooling, we can configure the set of Accounts and Regions within the automation configuration, and let the automation tool fan out the deployment across all the targets.
If you use GitHub Actions as your automation tooling you might, for example, repeat definitions manually within your workflow file, use a matrix strategy, or use a workflow per target.
Deploying with GitHub Actions
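Here’s a minimal sketch of the fan-out approach, assuming a hypothetical list of Account + region targets: the same template is deployed once per target, and the same target list can be emitted as JSON for a GitHub Actions matrix, which a workflow can load with fromJSON. It assumes the CI job has already obtained credentials for each target’s Account, e.g. by assuming a deployment role.

```python
"""Sketch: fan one CloudFormation template out to several Account + region targets."""
from __future__ import annotations

import json

# Hypothetical targets: one entry per Account + region combination
TARGETS = [
    {"account": "111111111111", "region": "us-east-1"},
    {"account": "111111111111", "region": "eu-west-1"},
    {"account": "222222222222", "region": "us-east-1"},
]


def deploy_command(template: str, stack_name: str, region: str) -> list[str]:
    return [
        "aws", "cloudformation", "deploy",
        "--template-file", template,
        "--stack-name", stack_name,
        # Needed if the template creates named IAM roles, as CDK's bootstrap does
        "--capabilities", "CAPABILITY_NAMED_IAM",
        "--region", region,
    ]


def fan_out(template: str, stack_name: str) -> list[tuple[str, list[str]]]:
    # One (account, command) pair per target, all from the same definition.
    # The CI job must have credentials for that account active when it runs.
    return [
        (t["account"], deploy_command(template, stack_name, t["region"]))
        for t in TARGETS
    ]


def github_matrix() -> str:
    # JSON for a GitHub Actions matrix; a workflow can load it with fromJSON()
    return json.dumps({"include": TARGETS})
```

Note that the target list duplicates information your Organization already holds, which is one of the drawbacks of this approach discussed below.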
Which technique to use?
While I love the idea of StackSets, they have a couple of issues which mean I don’t use them very often:
- They require writing a CloudFormation template for at least the target Account resources (CDK has a construct for StackSets, but it’s still in “experimental” status), and you need to be at least somewhat knowledgeable about using CloudFormation in general.
- Sometimes the state management for StackSets can get funky. The problem is that if you want to reset a StackSet, that state is shared across multiple targets.
StackSets really come into their own when you have 5 target Accounts or more - at that point the automatic integration with Organization OUs can save a lot of time (and CI/CD execution costs). On the other hand the more Accounts you have, the more state management issues can bite you down the road.
There are also drawbacks to the fan out approach:
- It requires an extra tool
- It duplicates explicit configuration of Accounts and regions which already exists within your AWS Organization [1]
- It requires running the deployment for all targets when a new target is added (this will usually result in a safe “no change” deployment for each existing target, but it’s wasted time and deployment resource cost)
My rule of thumb is - don’t use StackSets for something where deleting the underlying resource and starting again would cause a customer-facing outage, and don’t use StackSets where you expect fewer than 3 target Account + Region combinations in the long term. But if you are happy with using CloudFormation, and want to save some CI/CD effort, then give them a try.
If you’re interested in seeing an example of using StackSets for something simple then AWS have an example of using them for CDK Bootstrapping. This example uses far too much “ClickOps” for my liking, but hopefully you should get the gist.
I’ll also give an example of using both StackSets and the fan out approach in the next article in this series.
I’m considering writing more on this area, with more examples, in future - so keep your eyes peeled on my social / RSS feeds, mentioned at the end of this article.
Next Steps
This article has covered how we automate the deployment of Organization resources via Infrastructure-as-Code. But we can also automate the deployment process itself, rather than requiring an engineer to run it manually. To do this we can use deployment automation (otherwise known as CI/CD) techniques to provide a fully hands-off environment, driven by validated updates to a source control tool.
I’ll cover how to think about CI/CD for Organization resources in the next article in this series.
Summary
In this article I explained how to apply the technique of Infrastructure-as-Code (IaC) to resources related to your AWS Organization. I recommended you consider three areas:
- Decide whether IaC is, in fact, the best technique to use on a resource-by-resource basis - and if it is then similarly consider which IaC tooling works best for that resource at this point in the evolution of your Organization.
- Consider how to apply IaC techniques for Organization-scope resources - those that are deployed once but used across your Organization. For those resources think carefully about which Account and which region they are best deployed to.
- Consider how to apply IaC techniques for resources that are scoped to several Accounts and/or regions, but managed as one across the Organization. I showed two different techniques for collectively automating these types of resource - by using CloudFormation StackSets, and by using your deployment automation (CI/CD) tools to fan out a regular IaC process to multiple targets.
IaC has many speed, effort, and safety benefits. It can bring those benefits to your overall Org Ops strategy when applied with a few specific nuances.
Feedback + Questions
If you have any feedback or questions then feel free to email me at mike@symphonia.io, or contact me at @mikebroberts@hachyderm.io on Mastodon, or at @mikebroberts.com on BlueSky.
[1] In theory the CI/CD workflow could actually call the AWS Organizations API to dynamically load targets, but I haven’t done that or seen it in practice.