This article takes a deeper look at CloudFormation stacks. Stacks let developers provision AWS resources in a structured, repeatable way, and they make updates and teardowns far more reliable than calling the individual resource APIs directly. Using stacks for the majority of workloads is the recommended approach for teams, and stacks integrate easily into CI/CD pipelines.
Because CloudFormation is so powerful, templates can quickly become overwhelmingly complex, and shortcuts or mistakes creep in. One common solution is to define a set of rules against which the stack's resources are evaluated. These rules can be generic or specific to a particular team's needs. For example, a common issue that teams face is S3 buckets being incorrectly exposed. A rule may be defined that prevents this from occurring or notifies security teams.
Pre-Deploy vs. Post-Deploy Analysis
There are two distinct approaches to CloudFormation analysis: pre-deploy and post-deploy. Pre-deploy analysis reviews the content of templates before stacks are created or updated, whereas post-deploy analysis looks at the resulting state of the resources created or updated by the CloudFormation action.
Pre-deploy analysis catches problems before they have a chance to manifest in the environment. It is the more security-conscious approach, but it has the drawback that the result of a deployment is significantly more difficult to predict or simulate.
Post-deploy analysis has a much clearer picture of the state of resources and the result of the stack's actions; however, some damage may already have been done the moment the resources are placed in this state. Amazon GuardDuty is a service that alerts on an Amazon-managed, pre-defined set of rules across all resources within your account, and it is an example of a post-deploy analysis and alerting tool.
Validating templates before deployment
Let's discuss how a pre-deploy tool might work. The following example is written in Python 3:
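A minimal sketch of such an evaluator, assuming the template has already been parsed into nested Python dictionaries and lists. The function names and the example rule `no_public_acl` are hypothetical and chosen only for illustration:

```python
from typing import Any, Callable, Iterator, List, Tuple

PRIMITIVES = (str, int, float, bool)
# A rule receives the path and the value, and returns True on a violation.
Rule = Callable[[str, Any], bool]


def walk_primitives(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every primitive value nested in the template."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk_primitives(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk_primitives(value, f"{path}/{index}")
    elif isinstance(node, PRIMITIVES):
        yield path, node


def evaluate(template: dict, rules: List[Rule]) -> List[str]:
    """Return the paths of all primitive values that violate at least one rule."""
    return [
        path
        for path, value in walk_primitives(template)
        if any(rule(path, value) for rule in rules)
    ]


def no_public_acl(path: str, value: Any) -> bool:
    """Hypothetical rule: flag any string that names a public canned ACL."""
    return isinstance(value, str) and value in ("PublicRead", "PublicReadWrite")
```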
This is a very rudimentary evaluator that looks for all primitive values (strings, integers, booleans) within the template and evaluates each against the ruleset.
Consider the following template:
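As a stand-in, here is a hypothetical template of this kind, parsed from JSON into a Python dictionary, together with a direct check on the bucket's AccessControl property. The logical ID `ExampleBucket` and the helper `public_buckets` are assumptions made for illustration:

```python
import json

# Hypothetical template: one bucket with a literal public canned ACL.
TEMPLATE = json.loads("""
{
  "Resources": {
    "ExampleBucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": {"AccessControl": "PublicRead"}
    }
  }
}
""")

PUBLIC_ACLS = {"PublicRead", "PublicReadWrite"}


def public_buckets(template: dict) -> list:
    """Return logical IDs of S3 buckets whose AccessControl is a literal public ACL."""
    found = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::S3::Bucket":
            continue
        acl = resource.get("Properties", {}).get("AccessControl")
        # Only literal strings are compared; intrinsic functions are ignored here.
        if isinstance(acl, str) and acl in PUBLIC_ACLS:
            found.append(logical_id)
    return found


print(public_buckets(TEMPLATE))  # ['ExampleBucket']
```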
A rule that prevents S3 buckets from being publicly exposed may interrogate the AccessControl property of any AWS::S3::Bucket resource for a public ACL, and alert or deny on that basis. This is how the majority of pre-deployment analysis pipelines work. Things get tricky, though, once CloudFormation intrinsic functions such as Ref are involved. Now consider the following template:
You'll quickly notice that even if a tool were to iterate through all properties in every Map and List, it would never find the "PublicRead" keyword intact. Joining strings and referencing mappings are very common in templates, so a string-matching approach would be fairly ineffective.
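A hypothetical template of this kind, written as a Python dictionary, might build the ACL with Fn::Join. The sketch below also runs a simple string search over every primitive value; the template contents and the helper `primitives` are illustrative assumptions:

```python
# Hypothetical template: the ACL value is assembled from two fragments.
TEMPLATE = {
    "Resources": {
        "ExampleBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "AccessControl": {"Fn::Join": ["", ["Public", "Read"]]}
            },
        }
    }
}


def primitives(node):
    """Yield every non-container value nested anywhere in the template."""
    if isinstance(node, dict):
        for value in node.values():
            yield from primitives(value)
    elif isinstance(node, list):
        for value in node:
            yield from primitives(value)
    else:
        yield node


# Search every string in the template for the public ACL keyword.
matches = [v for v in primitives(TEMPLATE) if isinstance(v, str) and "PublicRead" in v]
print(matches)  # []
```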
CloudFormation resource specification
AWS produces a JSON-formatted file called the AWS CloudFormation Resource Specification. This file is a formal definition of all the possible resource types that CloudFormation can process. It includes all resources, their properties and information about those fields such as whether or not they need recreation when they are modified.
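To make the shape of this file concrete, the sketch below queries a small hand-written excerpt that follows the specification's layout (a top-level `ResourceTypes` map whose entries carry a `Properties` map with fields such as `PrimitiveType`, `Required` and `UpdateType`). The excerpt is heavily abbreviated and should be treated as illustrative; the helper name is hypothetical:

```python
# Abbreviated, hand-written excerpt in the layout of the resource specification.
SPEC_EXCERPT = {
    "ResourceTypes": {
        "AWS::S3::Bucket": {
            "Properties": {
                "AccessControl": {
                    "PrimitiveType": "String",
                    "Required": False,
                    "UpdateType": "Mutable",
                },
                "BucketName": {
                    "PrimitiveType": "String",
                    "Required": False,
                    "UpdateType": "Immutable",
                },
            }
        }
    }
}


def replacement_properties(spec: dict, resource_type: str) -> list:
    """List properties whose modification requires the resource to be recreated."""
    properties = spec["ResourceTypes"][resource_type]["Properties"]
    return sorted(
        name
        for name, meta in properties.items()
        if meta.get("UpdateType") == "Immutable"
    )


print(replacement_properties(SPEC_EXCERPT, "AWS::S3::Bucket"))  # ['BucketName']
```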
We can use this file to evaluate the properties of each resource and apply rulesets directly to individual properties, rather than to the template as a whole. By adding logic to process the intrinsic functions, we have created an open-source CloudFormation template simulator that is easily deployable in any environment.
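The general idea can be sketched as follows. This is not the simulator's actual code; it is a simplified illustration that resolves only Fn::Join and Fn::FindInMap and then applies rules keyed by resource type and property name. All function names and the `RULES` structure are assumptions:

```python
def resolve(node, mappings):
    """Recursively replace supported intrinsic functions with their values."""
    if isinstance(node, list):
        return [resolve(item, mappings) for item in node]
    if not isinstance(node, dict):
        return node
    if len(node) == 1:
        (name, args), = node.items()
        if name == "Fn::Join":
            delimiter, parts = args
            return delimiter.join(str(resolve(p, mappings)) for p in parts)
        if name == "Fn::FindInMap":
            map_name, top_key, second_key = (resolve(a, mappings) for a in args)
            return mappings[map_name][top_key][second_key]
    # Not a supported intrinsic: resolve the nested values instead.
    return {key: resolve(value, mappings) for key, value in node.items()}


def violations(template, rules):
    """Apply per-property rules to the resolved properties of each resource."""
    mappings = template.get("Mappings", {})
    found = []
    for logical_id, resource in template.get("Resources", {}).items():
        props = resolve(resource.get("Properties", {}), mappings)
        for (resource_type, prop), is_violation in rules.items():
            if (
                resource.get("Type") == resource_type
                and prop in props
                and is_violation(props[prop])
            ):
                found.append((logical_id, prop, props[prop]))
    return found


# Hypothetical ruleset: (resource type, property) -> predicate.
RULES = {
    ("AWS::S3::Bucket", "AccessControl"):
        lambda value: value in ("PublicRead", "PublicReadWrite"),
}
```

Because the rule sees the resolved value rather than the raw template text, an ACL assembled with Fn::Join or looked up in a mapping is evaluated the same way as a literal string.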
The template simulator can be found at https://github.com/kablamo/cfn-simulator.
Now consider the following template:
The above template will actually evaluate to produce a public S3 bucket. This is because the S3 bucket's "AccessControl" property uses characters from the "TopicName" attribute of the SNS topics. The format of these attributes is not formally documented in the CloudFormation resource specification, nor anywhere else. This means that there is currently no effective way to verify that resources using the intrinsic functions "Ref" or "Fn::GetAtt" are valid against the defined ruleset.
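One conservative way to handle this limitation, offered only as a sketch, is to treat anything derived from Ref or Fn::GetAtt as unknown and to report the property as unverifiable rather than as passing. The sentinel `UNKNOWN`, the function names and the three result strings are assumptions; a fuller resolver could still resolve a Ref to a template parameter with a known value:

```python
UNKNOWN = object()  # sentinel for values that cannot be determined before deployment


def resolve(node):
    """Resolve Fn::Join where possible; Ref and Fn::GetAtt yield UNKNOWN."""
    if isinstance(node, dict) and len(node) == 1:
        (name, args), = node.items()
        if name in ("Ref", "Fn::GetAtt"):
            return UNKNOWN
        if name == "Fn::Join":
            delimiter, parts = args
            resolved = [resolve(part) for part in parts]
            # A single unknown fragment makes the whole joined value unknown.
            if any(part is UNKNOWN for part in resolved):
                return UNKNOWN
            return delimiter.join(str(part) for part in resolved)
    return node


def check_acl(value):
    """Classify an AccessControl value as 'ok', 'violation' or 'unverifiable'."""
    resolved = resolve(value)
    if resolved is UNKNOWN:
        return "unverifiable"
    if resolved in ("PublicRead", "PublicReadWrite"):
        return "violation"
    return "ok"


# Hypothetical ACL assembled from the TopicName attributes of two SNS topics.
acl = {
    "Fn::Join": [
        "",
        [
            {"Fn::GetAtt": ["TopicA", "TopicName"]},
            {"Fn::GetAtt": ["TopicB", "TopicName"]},
        ],
    ]
}
print(check_acl(acl))  # unverifiable
```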