Infrastructure as Code Tool Selection

At Digital Diagnostics, we operate a DevOps guild where all the great minds in infrastructure converge to decide the direction of the company’s computing foundations. One of the guild’s very first orders of business was to discuss and decide upon a universal framework for defining our infrastructure as code (IaC).

Digital Diagnostics - Technology Options

For those not familiar with the concept of IaC, think of it like this: in much the same way you write software to automate manual or repetitive tasks, IaC captures all of your infrastructure and its accompanying configuration as code, allowing automated and repeatable environment creation. Wikipedia is a great starting point if you would like to delve deeper into the subject.¹

Of the IaC tools proposed, we collectively decided to investigate four potential solutions:

  • Terraform (standard)
  • Terraform CDK
  • Pulumi
  • AWS CDK

Terraform (standard) 

Let’s begin by examining the advantages of standard Terraform. First and foremost, it is a well-known open source tool in the industry. Terraform also provides standalone binaries for the most popular operating systems. Computing resources are defined using a standardized configuration language known as HashiCorp Configuration Language (HCL). Finally, a built-in state locking mechanism prevents multiple users from altering the same infrastructure at once, saving us from corrupted state.²
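The locking rule is easy to see in a minimal sketch. The `StateLockTable` class below is hypothetical and is not how Terraform implements locking (Terraform delegates locking to whichever state backend is configured); it only models the rule that one holder at a time may modify a given state file.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class LockInfo:
    """Who holds a state lock and for which operation (hypothetical fields)."""

    holder: str
    operation: str


class StateLockTable:
    """Hypothetical lock table: at most one holder per state file at a time."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the dictionary below
        self._locks: dict[str, LockInfo] = {}

    def acquire(self, state_path: str, holder: str, operation: str) -> bool:
        """Take the lock if it is free; return False if someone else holds it."""
        with self._guard:
            if state_path in self._locks:
                return False
            self._locks[state_path] = LockInfo(holder, operation)
            return True

    def release(self, state_path: str, holder: str) -> None:
        """Release the lock, refusing if the caller is not the current holder."""
        with self._guard:
            info = self._locks.get(state_path)
            if info is None or info.holder != holder:
                raise RuntimeError(f"{holder} does not hold the lock on {state_path}")
            del self._locks[state_path]

    def holder_of(self, state_path: str) -> LockInfo | None:
        """Report the current lock holder, or None if the state is unlocked."""
        with self._guard:
            return self._locks.get(state_path)


locks = StateLockTable()
assert locks.acquire("prod/network.tfstate", "alice", "apply")
# A second run against the same state is refused while the first holds the lock.
assert not locks.acquire("prod/network.tfstate", "bob", "apply")
locks.release("prod/network.tfstate", "alice")
assert locks.acquire("prod/network.tfstate", "bob", "apply")
```

Refusing the second caller outright, rather than queueing it, keeps the sketch simple and mirrors the intent: a concurrent change should fail loudly instead of silently interleaving with another one.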

Terraform supports a wide range of providers, from the major cloud platforms to many SaaS products, along with a few local resources as well.³

There is also a fair bit of flexibility in how you manage and store state data, whether on local disk or in a remote backend. State data is what Terraform uses to reconcile the resources defined in code with the resources that already exist.

Now let’s perform a quick popularity check. At the time of writing, Google Trends shows more search interest in Terraform than in either Pulumi or CloudFormation.⁴,⁵
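Returning briefly to state data: its reconciliation role can be sketched as a diff between what the code declares and what state records as already existing. This is a simplified model under stated assumptions, not Terraform’s actual planning algorithm, and the resource addresses and attributes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Plan:
    """Actions needed to bring recorded state in line with the configuration."""

    create: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not (self.create or self.update or self.delete)


def compute_plan(config: dict[str, dict], state: dict[str, dict]) -> Plan:
    """Diff desired resources (from code) against known resources (from state).

    Keys are resource addresses; values are the resources' attributes.
    Unlike real tools, this sketch neither refreshes state from the provider
    nor distinguishes in-place updates from replacements.
    """
    plan = Plan()
    for address in sorted(config):
        if address not in state:
            plan.create.append(address)  # declared in code, not yet created
        elif config[address] != state[address]:
            plan.update.append(address)  # exists, but attributes differ from code
    for address in sorted(state):
        if address not in config:
            plan.delete.append(address)  # tracked in state, removed from code
    return plan


# Hypothetical resource addresses and attributes.
config = {
    "bucket.logs": {"versioning": True},
    "bucket.assets": {"versioning": True},
}
state = {
    "bucket.assets": {"versioning": False},
    "bucket.old": {"versioning": False},
}
print(compute_plan(config, state))
# Plan(create=['bucket.logs'], update=['bucket.assets'], delete=['bucket.old'])
```

Once an apply succeeds and state is updated to match the configuration, running the same diff again yields an empty plan, which is what makes environment creation repeatable.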

So, with so many benefits, what worked against Terraform for us? The most obvious drawback is that Terraform is not, in and of itself, a fully featured programming language. The other main consideration is that all of our current infrastructure is already implemented in Pulumi, so adopting Terraform would require a refactoring effort for some of our services.

Terraform CDK 

The next tool is an additional offering from HashiCorp, built on top of standard Terraform. Terraform CDK is HashiCorp’s answer to Pulumi with regard to support for higher-level programming languages. Unfortunately for this tool, the product’s own GitHub page states: “This experimental repository contains software which is still being developed and in the alpha testing stage. It is not ready for production use.”⁶
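The appeal of higher-level languages is easiest to see by example. CDK-style tools run ordinary code and synthesize Terraform configuration in its JSON syntax, which Terraform then plans and applies. The sketch below imitates that idea using only the Python standard library; it does not use the Terraform CDK API, and the environment names and `example-*` bucket names are hypothetical.

```python
from __future__ import annotations

import json


def bucket_config(environments: list[str], with_logs: bool) -> dict:
    """Build Terraform JSON-syntax configuration using an ordinary loop and conditional."""
    buckets: dict[str, dict] = {}
    for env in environments:
        buckets[f"assets_{env}"] = {"bucket": f"example-assets-{env}"}
        if with_logs:
            buckets[f"logs_{env}"] = {"bucket": f"example-logs-{env}"}
    # Terraform's JSON syntax nests blocks as resource -> type -> name -> arguments.
    return {"resource": {"aws_s3_bucket": buckets}}


config = bucket_config(["dev", "staging", "prod"], with_logs=True)
# Saved under a name ending in .tf.json, this is loaded by Terraform alongside .tf files.
print(json.dumps(config, indent=2))
```

HCL itself offers `count`, `for_each`, and conditional expressions, so the difference is one of degree: a general-purpose language adds functions, classes, test frameworks, and package managers on top.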

Pulumi 

Pulumi is the tool that was used to provision our current infrastructure. Many of its providers are adapted from Terraform providers, and it supports various general-purpose programming languages. Pulumi has native support for secrets management, something standard Terraform lacks, and it even allows policies to be defined as code.⁷

Unfortunately, Pulumi has a few important disadvantages too. Pulumi requires its users to be proficient in fully featured programming languages. Typically this skill set falls within a software developer’s realm, as opposed to that of infrastructure engineers, who must be proficient in operating system, security, and networking concepts. Some infrastructure engineers do in fact have programming experience, but that combination of skills tends to be harder to hire for. Another significant drawback is simply a numbers game: the capacity of the company behind the product to support it. The entity officially supporting Pulumi is a much smaller company (< 50 employees) than Amazon (~ 1,000,000) or HashiCorp (> 1,000 employees).⁸,⁹,¹⁰

AWS CDK 

Amazon’s own CDK tool has some powerful advantages, given the company’s size and the tool’s tight integration with AWS in general. Like Pulumi and Terraform CDK, it supports constructs written in fully featured programming languages. However, that tight coupling to AWS is also its greatest weakness. AWS CDK only integrates with AWS, so vendor lock-in is an issue, along with a mismatch in use cases: here at Digital Diagnostics, our infrastructure is distributed across multiple providers, including locally managed resources on premises. For this reason alone, AWS CDK was effectively disqualified upon initial assessment.

Additional considerations in tool selection 

During our assessment, it was not sufficient to select a tool based solely on popularity or on a user experience that we personally liked as an organization. As a technology company in the medical device space, we are bound by a set of requirements imposed by the FDA. Will the tool we select adapt well to our workflows as defined by regulation? On top of that, do we have enough in-house experience with the tool we select? Does the tool align with our longer-term strategies and projected hiring needs? What are the risks in adopting the tool, and are they acceptable to us?

Democratic decision making 

Making collaborative decisions that stick in a fast-paced team is harder than you might think. The livelihoods and career paths of software engineers depend on which technologies they gain experience and proficiency in. As you can imagine, the stakes are high for everyone! When approaching a subject that is technologically complex and, at times, emotionally charged, the decision-making process should be highly structured, because teams tend to make decisions based on what they know or have experience with, rather than on the technology choices that are right for the business. Balancing these competing needs often makes this kind of decision-making more of an art than a straightforward science.

Even with clearly defined requirements, we need to stay aware of bias and work carefully to eliminate it. Somewhat ironically, disagreements should be sought out and embraced during this process, because it is in this phase of discovery that great ideas can be presented and scrutinized. Under ideal conditions, the best ideas rise to the top and get chosen.

The engineering department here at Digital Diagnostics has chosen the following decision-making rubric, applied on a per-topic basis:

  • HARD NO – You will not budge from that position, but you must specify the reason.
  • SOFT NO – You don’t support it, but you could be convinced otherwise.
  • SUPPORT – This is not your first choice (or you don’t have direct experience with it), but you will support the team’s decision.
  • YES – You strongly believe this is an excellent choice for our organization, and it can fulfill all our current and future needs.
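Because every position maps to a fixed category, the rubric lends itself to a simple tally. The sketch below is hypothetical: it assumes one position per member per option, enforces the rule that a HARD NO must state its reason, and picks a leader with an assumed rule (most YES plus SUPPORT positions, ties broken by fewer HARD NOs). It is not our actual scoring method, and the members, options, and votes are made up.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Position(Enum):
    HARD_NO = "HARD NO"
    SOFT_NO = "SOFT NO"
    SUPPORT = "SUPPORT"
    YES = "YES"


@dataclass(frozen=True)
class Vote:
    member: str
    option: str
    position: Position
    reason: str = ""


def tally(votes: list[Vote]) -> dict[str, Counter]:
    """Count positions per option; every HARD NO must state its reason."""
    counts: dict[str, Counter] = {}
    for vote in votes:
        if vote.position is Position.HARD_NO and not vote.reason.strip():
            raise ValueError(f"{vote.member}: a HARD NO on {vote.option} needs a reason")
        counts.setdefault(vote.option, Counter())[vote.position] += 1
    return counts


def leading_option(counts: dict[str, Counter]) -> str:
    """Assumed rule: most YES + SUPPORT positions wins; ties favor fewer HARD NOs."""
    return max(
        counts,
        key=lambda option: (
            counts[option][Position.YES] + counts[option][Position.SUPPORT],
            -counts[option][Position.HARD_NO],
        ),
    )


# Hypothetical members, options, and positions.
votes = [
    Vote("member_1", "Option A", Position.YES),
    Vote("member_2", "Option A", Position.SUPPORT),
    Vote("member_1", "Option B", Position.SOFT_NO),
    Vote("member_2", "Option B", Position.YES),
    Vote("member_3", "Option C", Position.HARD_NO, reason="does not fit our providers"),
]
print(leading_option(tally(votes)))  # Option A
```

A tally like this only shows where the team stands; the discussion around each HARD NO and SOFT NO is what actually resolves the disagreements.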

Using these categories, the team collaborated in an online document in preparation for the larger discussion. Here’s a peek at what that looked like for our team:

Software Options

Asking pointed questions and requiring team members to take a position quickly reveals where team members are aligned with one another and where they are not on the same page. To get a more visceral sense of the experience, imagine asking and answering the following questions among your peers:

  • Why Terraform? 
  • Why not Terraform? 
  • Why Terraform CDK? 
  • Why not Terraform CDK? 
  • Why Pulumi? 
  • Why not Pulumi? 
  • Why AWS CDK? 
  • Why not AWS CDK? 
  • Are there any other needs we haven’t considered yet? 

As you can see in the table above, and as you might imagine from a conversation revolving around that list of questions, certain team members took absolute positions. For situations of conflict like this one, Digital Diagnostics has adopted “Disagree, and commit” as one of its core behaviors. The idea is to clearly communicate your objection, but ultimately to support the decision arrived at by consensus. The previous sections dedicated to each tool, and the final decision below, are an amalgamation of every team member’s input, both in writing and during discussion in meetings.

Final Decision 

Ultimately, the DevOps guild decided to adopt standard Terraform as our de facto infrastructure-as-code tool. Given the mindshare around the various tools, supported use cases, in-house experience, and size of the supporting entities, Terraform won out by majority vote.

References:

  1. https://en.wikipedia.org/wiki/Infrastructure_as_code
  2. https://www.terraform.io/docs/language/state/locking.html
  3. https://registry.terraform.io/browse/providers
  4. https://trends.google.com/trends/explore?date=all&geo=US&q=pulumi,terraform
  5. https://trends.google.com/trends/explore?date=all&geo=US&q=terraform,cloudformation
  6. https://github.com/hashicorp/terraform-cdk
  7. https://www.pulumi.com/docs/get-started/crossguard/
  8. https://www.linkedin.com/company/pulumi
  9. https://www.washingtonpost.com/technology/2020/10/29/amazon-hiring-pandemic-holidays/
  10. https://www.hashicorp.com/blog/our-newest-milestone-from-two-employees-to-1-000-and-still-growing

Disclaimer of Review:

Information contained in this post regarding any specific person, commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement, recommendation, or favoring by Digital Diagnostics Inc. (“Digital Diagnostics”), its directors or employees.

It is your responsibility to verify and investigate providers, products and services. Please consult your own professional advisor for all advice concerning medical, legal or financial matters in connection with the services needed. Digital Diagnostics assumes no liability of any kind for the content of any information transmitted to or received by any person in connection with the person’s use of this post.

Please refer to our privacy policy regarding all content herein.