How to prepare for the coming CPU confusion
The software industry is about to have a big shake-up.
For nearly 15 years most server-side software has been both developed and deployed on x86 CPUs. From laptops to serverless, desktops to VMs, the microprocessor family used across these environments has been the same.
But the times, they are a changin’, and this period of CPU architecture stability is coming to a close. Why? Because of the coming of ARM.
In this article I explain what this change is, why people are excited about it, why it's going to be tricky, and most importantly how you can start preparing to embrace it. Here's a quick sneak peek - the trick is to use ARM-based build servers for pre-merge checks. But I'm getting ahead of myself, so let's start at the beginning.
Laptops and Servers - no longer ‘armless
ARM-based CPUs are all around us because they power most smartphones and tablets. ARM's main benefit over x86 has always been its significantly lower power consumption, but that has historically come at the cost of lower performance. However, ARM is no longer a slouch when it comes to speed, to the point where it is now comparable with x86 for many use cases. This is causing the following two changes to occur:
- ARM is replacing x86 in laptops and workstations. For example, Apple is moving to ARM-based “Apple Silicon” CPUs for its computers. The first ARM Macs are due to be available before the end of 2020.
- ARM is also replacing x86 CPUs in the cloud. Amazon Web Services’ (AWS) second generation of ARM-based “Graviton” processors are proving to be extremely capable. Companies like Honeycomb are seeing approximately 50% cost reduction for at least equivalent performance by switching their AWS hosts from x86 to ARM.
I expect that in 5 to 10 years’ time most workloads, both local and remote, will be running on ARM. But until then we're going to be living in a mixed-up world.
The future will be here very soon, it just won't be evenly distributed
Before the end of 2020 some software developers will start working locally on ARM-based Macs, but the software they write will likely be running remotely on x86 platforms.
On the other hand, most developers will continue running locally on x86, but even now some companies are deploying their software to run remotely on ARM servers.
This heterogeneity of environments is going to be with us for a while. And without some planning it's going to hurt.
Cats and dogs living together, mass hysteria!
The difference between running on ARM and running on x86 isn't as simple as just using two different libraries, or even two different operating systems. ARM and x86 are fundamentally different CPU types, with different capabilities, restrictions and behaviors. This introduces a problem - just because something works locally in a development environment doesn't mean it will work at all, or work with the same performance behavior, in a deployment environment with a different CPU.
We've been here before. Before 2006 it was pretty common to have mixed-up environments - coding on x86 and deploying to Sun SPARC; coding on PowerPC and deploying to x86. Everything was mostly fine … apart from when it wasn't.
The good news is that most server-side developers these days aren't writing “native” code. In other words they are coding without any particular assumptions about the underlying CPU. This higher-level coding world means that if my language and libraries are available for both ARM and x86, then in theory I can work locally in one, and deploy remotely to the other, with the minimum of fuss.
But, of course, anything that can go wrong will go wrong. And there are plenty of opportunities for subtle, occasional problems to creep in. So what should we do about this?
Ain't nothing like the real thing
One response to this problem is to use simulators and virtual environments. For example, Apple has announced that it is launching Rosetta 2 to allow people to run software compiled for x86 on an ARM-based Mac. In a development world, simulators will catch most problems. But not all of them, especially not ones related to performance.
The best solution is to test our code on the same CPU architecture that we deploy to. This is made easier by using cloud-based testing environments.
Cloud-based testing comes in two basic forms:
- Integration testing, where we deploy our software to production-like environments
- Unit and functional tests, which we run on build servers
By using cloud-based environments that use the production-target CPU architecture, rather than our development workstation CPU architecture, we can more accurately test how our applications will act, and perform, on their target CPU architecture.
In other words, say your organization uses AWS, and you want to switch to ARM for some of your workloads today. Likely none, or very few, of your developers have ARM-based laptops or desktops, so how do you catch problems? The key is the automated build environments you likely already use.
Bring on the Build Servers
Build servers, otherwise known as “Continuous Integration” servers, are used at various times in a software development lifecycle:
- “Pre-merge”, e.g. for validating individual branches / pull requests
- “Post-merge”, e.g. for checking the stability of a source code mainline
- As part of a continuous delivery pipeline.
AWS has a build server product named CodeBuild, which I'm a fan of, but there are many other similar cloud-based services, like GitHub Actions, CircleCI, etc.
There's no rule that these build servers need to have the same CPU architecture as your developers are using in their laptops and workstations. In fact they can instead use the same CPU architecture as your production environment. Already today, AWS CodeBuild supports both x86 and ARM environments.
By switching your cloud-based build and test servers to use your production-target CPU architecture you will be able to validate your software on your target CPU before you deploy to production.
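One cheap safeguard is to have the build itself confirm that it is running on the CPU family you think it is. Here's a minimal sketch in Python, using the standard library's platform.machine(). The alias table and the function names are my own illustration, not part of any build service, and the table only covers the common x86 and ARM spellings.

```python
import platform
from typing import Optional

# Spellings that platform.machine() commonly reports on different operating
# systems. This table is an assumption of the sketch, not an exhaustive list.
_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def normalize_arch(name: str) -> str:
    """Map an architecture name to a canonical form.

    Unknown names are passed through lowercased rather than rejected.
    """
    key = name.strip().lower()
    return _ALIASES.get(key, key)


def check_build_arch(target: str, actual: Optional[str] = None) -> bool:
    """Return True when the build machine matches the production target.

    `actual` defaults to the machine this code is running on; passing it
    explicitly makes the function easy to test.
    """
    if actual is None:
        actual = platform.machine()
    return normalize_arch(actual) == normalize_arch(target)


# Report what this machine looks like after normalization.
print("This machine:", normalize_arch(platform.machine()))
```

A build script could call check_build_arch("arm64") as its first step and stop early on a mismatch, so that a misconfigured x86 build server doesn't quietly report green for an ARM deployment.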
The delight is in the details
There's a key point in the previous section that is a huge help in this heterogeneous world, and that's being able to use the production CPU type when performing a pre-merge test.
Consider the following scenario. A developer using Python, working locally on an x86 laptop, adds a new library dependency to a project that is deployed to ARM in production. They build and test locally, and everything works fine. They push this change to a branch, and open a pull request (PR). The “pre-merge” PR build server is configured to run on ARM, just like the production environment. The build server attempts to build and test this change, but the new library dependency has a native code component for which there is no ARM version available yet. Therefore the PR build fails, and the developer is informed.
Because we've used ARM for the pre-merge build server we've gained the following benefits:
- We've identified that there's a CPU architecture problem before deploying to production
- Furthermore, we've noticed the problem before merging the change to the source code mainline, and therefore no other developers have been impacted by this change
- The developer was able to get fast feedback of a CPU architecture-related problem, because they are working on their own automatically verified branch, without needing an ARM-based local machine
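To make the Python scenario above a little more concrete: prebuilt Python packages ("wheels") carry their target platform in the filename, so you can tell from the list of files published for a release whether an ARM build exists. The sketch below is deliberately simplified and the package names in the usage example are made up. It only looks at the final, platform field of the filename, and it ignores details such as macOS universal2 wheels. Note too that a missing wheel doesn't always mean a failed build, since the installer may fall back to compiling from source.

```python
from __future__ import annotations


def wheel_platform_tags(filename: str) -> list[str]:
    """Extract the platform tags from a wheel filename.

    Wheel names end with "-{python tag}-{abi tag}-{platform tag}.whl", and a
    platform field may hold several tags joined by dots.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    stem = filename[: -len(".whl")]
    # Split off the last dash-separated field first, so dots in the version
    # number are never mistaken for tag separators.
    return stem.rsplit("-", 1)[-1].split(".")


def wheel_supports_arch(filename: str, arch: str) -> bool:
    """True for pure-Python wheels ("any") or wheels built for `arch`."""
    return any(
        tag == "any" or tag.endswith("_" + arch)
        for tag in wheel_platform_tags(filename)
    )


def has_wheel_for_arch(filenames: list[str], arch: str) -> bool:
    """True if at least one wheel published for a release suits `arch`."""
    return any(wheel_supports_arch(name, arch) for name in filenames)
```

For example, a hypothetical release that only ships "fastlib-1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" would make has_wheel_for_arch(files, "aarch64") return False, which is exactly the situation the ARM pre-merge build catches.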
Get your house in order
OK, so when we switch our production servers to ARM, we change our build and test servers to ARM, great. But what about today, how can we start to prepare for this move?
Most importantly you’ll want to consider making your workflow ready for when you switch to ARM. That means:
- Use cloud-based build servers in your delivery pipeline if you don't already
- Set up automated PR pre-merge checks, also using cloud-based build servers
However, there's a valuable further step you can take: set up parallel, non-blocking “shadow” tests that use ARM, even while you still use x86 in production.
In other words, when you trigger your existing x86-based “Continuous Integration” build server, also trigger an ARM-based server with the same build and test process. This ARM server shouldn't block work, publish production artifacts, or perhaps even impact developers at all at the moment. Instead it acts as a canary for “ARM readiness” of your projects. Using the results of these checks will help you prioritize as you migrate your workloads to ARM.
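The important property of a shadow build is that its result is recorded but never decides whether a change can proceed. Here's a small Python sketch of that rule. The BuildResult type and both function names are hypothetical, and in practice the results would come from whatever your build service reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BuildResult:
    arch: str       # e.g. "x86_64" or "arm64"
    passed: bool    # did the build and test process succeed?
    blocking: bool  # False for non-blocking "shadow" builds


def pipeline_passes(results: List[BuildResult]) -> bool:
    """Only blocking builds decide whether a change may proceed."""
    return all(r.passed for r in results if r.blocking)


def shadow_failures(results: List[BuildResult]) -> List[str]:
    """Architectures whose shadow builds failed.

    This is the "canary" signal: it feeds a readiness report instead of
    stopping anyone's work.
    """
    return [r.arch for r in results if not r.blocking and not r.passed]


# A change that is green on x86 but not yet ARM-ready.
results = [
    BuildResult(arch="x86_64", passed=True, blocking=True),
    BuildResult(arch="arm64", passed=False, blocking=False),
]
print("Pipeline passes:", pipeline_passes(results))
print("Shadow failures:", shadow_failures(results))
```

When you do move a project's production workload to ARM, flipping blocking to True for the ARM build turns the canary into a real gate.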
Summary
- The software industry is moving from x86 to ARM CPUs because of huge cost improvements, with comparable or better performance
- During this transition period we need to adapt our processes for heterogeneous environments where development and production occur on different CPU architectures
- To reduce risk we should be able to switch our pre-merge, “Continuous Integration”, and delivery pipeline environments to different CPU architectures, to match production
- Build services like AWS CodeBuild already support ARM environments today
- It's especially useful to run pre-merge automated tests on production-CPU build servers since this will allow developers to get feedback about possible x86 vs ARM problems before they merge their code with mainline, even without having to replace their own development machines
- You can get ahead of this shift by making sure your cloud-based automation is ready, and by starting to run parallel “shadow” tests with ARM to get early feedback about which of your projects are ARM-ready
Need help deciding whether your AWS-based delivery pipelines are ready for these changes? Then drop me a line at firstname.lastname@example.org .