Provisioning AWS Instances Using Terraform Modules

- By Devashish Meena on October 02, 2018

In the previous post, we walked through the steps to provision an AWS Network using Terraform Modules. If you missed the post, here's the link to it: Provisioning AWS Network using Terraform Modules and the full code is available here: https://github.com/ric03uec/prov_aws_vpc_terraform.

In this post, we'll provision a few more components using Terraform modules. Additionally, we'll use the generated state file from the previous post as an input data source in this workflow. So it is highly recommended that you run that code first.

Since Terraform makes it super easy to create and destroy infrastructure, you should be able to spin up all components for testing and destroy them when done. I've probably done this at least ten times to test the code for this blog!!!

Scenario

For this post, we'll do the following:

- Use the network state file from the previous post as a source to get VPC, Subnet, NAT and other information

- Provision one instance in the public subnet and associate a publicly accessible Security Group with it

- Provision one instance in each private subnet and associate a Security Group with internal-only access with it

- Test SSH access to each instance with correct keys

- TFD (terraform destroy) the whole thing and get a coffee

The instances will use the existing network topology. Visually, our setup will look like this:

[Diagram: aws_instances]

[Diagram: aws_vpc]

For the impatient, the full source is available here: https://github.com/ric03uec/prov_aws_ec2_terraform. For the project to work out of the box, it needs to be cloned alongside the prov_aws_vpc_terraform project, i.e. both projects should share the same parent directory.

Data Sources

We'll use Terraform Data Sources to import information from a different state file and use it in the current workflow. Data sources make it really easy to consume the output of a separate Terraform setup. The output can be stored locally or on one of the supported Backends. This helps break the infrastructure down into components with clearly defined boundaries.

The first project, prov_aws_vpc_terraform, is responsible for ensuring correct provisioning of network components. Irrespective of the way the code is structured in that project, it needs to expose the variables defined in its output.tf file. Any dependent projects can then import the state file to consume the exposed variables and further enhance the infrastructure.
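For illustration, the output.tf of the network project would expose values along these lines. This is a rough sketch: the output names mirror what we consume below, but the resource references (aws_subnet.public, aws_security_group.public) are assumptions about how that project names its resources.

###############################################################################
# output.tf in prov_aws_vpc_terraform (illustrative sketch)
###############################################################################
output "public_subnet_id" {
  # ID of the public subnet created by the network project
  value = "${aws_subnet.public.id}"
}

output "public_security_group_id" {
  # ID of the security group attached to public instances
  value = "${aws_security_group.public.id}"
}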

Let's walk through the main.tf file in the project root. We'll first import the state file, since all the network-related variables will be fetched from it. The following snippet reads the file and loads it into the terraform_remote_state.network data source.

###############################################################################
# Load Network data from remote statefile
###############################################################################
data "terraform_remote_state" "network" {
  backend = "local"

  config = {
    path = "${path.module}/../prov_aws_vpc_terraform/terraform.tfstate"
  }
}

 

Public Instances

Next, we'll get to the module that provisions instances in the public subnet. We abstract out the variables the module needs to execute. This will come in handy if we want to swap the current data source for something else, like S3, etcd, or Consul, without changing the implementation of the module itself. The following snippet initializes the module by passing in the required variables.

module "public_instances" {
  source = "./instances-public"

  pub_sn        = "${data.terraform_remote_state.network.public_subnet}"
  pub_sn_az     = "${data.terraform_remote_state.network.public_subnet_az}"
  pub_sn_access = "${data.terraform_remote_state.network.public_subnet_access}"
  pub_sn_id     = "${data.terraform_remote_state.network.public_subnet_id}"
  pub_sg_id     = "${data.terraform_remote_state.network.public_security_group_id}"
}
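Each of these inputs has a matching declaration inside the module. A minimal sketch of its variables.tf could look like the following; the descriptions are illustrative, not taken from the actual code.

###############################################################################
# instances-public/variables.tf (illustrative sketch)
###############################################################################
variable "pub_sn" {
  description = "Name of the public subnet"
}

variable "pub_sn_az" {
  description = "Availability zone of the public subnet"
}

variable "pub_sn_access" {
  description = "Access (CIDR) settings of the public subnet"
}

variable "pub_sn_id" {
  description = "ID of the public subnet"
}

variable "pub_sg_id" {
  description = "ID of the public security group"
}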

Inside the module, we're just provisioning an EC2 Instance in the public subnet and associating a public IP address with it. The AMI ID and instance type are defined in the module's variables.tf file, but these can be moved either to the project root or to a separate project that holds the VPC-wide variables.
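The resource itself boils down to something like the sketch below. The attribute wiring is what matters here; the exact resource and variable names, the Name tag, and the key_name wiring are assumptions, not the literal module contents.

###############################################################################
# instances-public/main.tf (illustrative sketch)
###############################################################################
resource "aws_instance" "public" {
  ami                         = "${var.ami_id}"
  instance_type               = "${var.instance_type}"
  subnet_id                   = "${var.pub_sn_id}"
  availability_zone           = "${var.pub_sn_az}"
  vpc_security_group_ids      = ["${var.pub_sg_id}"]
  associate_public_ip_address = true

  # Assumed variable holding the public subnet keypair name
  key_name = "${var.pub_key_name}"

  tags = {
    Name = "${var.pub_sn}_0"
  }
}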

 

Private Instances

For the sake of simplicity, we've used a single module to provision instances in both private subnets. For these too, the AMI ID and instance type are part of the module's variables.tf file but can be moved around based on requirements.

Private instances should only be accessible from the bastion host using their respective private key files. The access restrictions are enforced in two ways: first, only SSH traffic (TCP, port 22) is allowed into the two private subnets from the bastion host, and second, each subnet has a different private key. This makes it easier for the admin(s) to revoke or rotate keys in one subnet without affecting the other. It also enables them to set different rules for each of these subnets. A typical use case is one subnet serving as the data tier with components like the DB, cache (Redis/Memcached), Vault, and file storage, while the other subnet houses internal services like report processing, image compression, analytics, log aggregation, etc.
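To make the first restriction concrete, the private subnets' security group (created in the network project) would carry an ingress rule roughly like the sketch below; the security group resource names are assumptions.

###############################################################################
# SSH-only ingress from the bastion host (illustrative sketch)
###############################################################################
resource "aws_security_group_rule" "private_ssh_from_bastion" {
  type                     = "ingress"
  from_port                = 22
  to_port                  = 22
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.private.id}"
  source_security_group_id = "${aws_security_group.bastion.id}"
}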

 

Access Control

As mentioned in the previous post, only the NAT instance (bastion host) will be accessible over the Internet for SSH. To log into machines in the public or private subnets, we'll need to make an additional hop via this NAT instance. To do this, we're generating the following SSH keypairs (a sketch of one way to create such a keypair appears after the list):

- Bastion Host access key: Once the VPC provisioning is complete, a .pem keyfile will be placed at id_rsa_bastion.pem on the local machine. This will be used to SSH into the bastion host.

- Public Subnet access key: This is automatically copied to /home/ec2-user/.ssh/id_rsa_<public_subnet_name> on the bastion host and will be used to log into instances in the public subnet.

- Private Subnet access key: The private keys for SSH access to both private subnets are also automatically copied to /home/ec2-user/.ssh/id_rsa_<private_subnet_name> on the bastion host.
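One way to generate such a keypair with Terraform is the tls_private_key resource. The sketch below covers the bastion key only; copying the subnet keys onto the bastion host is left out, and the resource names and key size are illustrative rather than the actual project code.

###############################################################################
# Bastion SSH keypair (illustrative sketch)
###############################################################################
resource "tls_private_key" "bastion" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "aws_key_pair" "bastion" {
  key_name   = "bastion"
  public_key = "${tls_private_key.bastion.public_key_openssh}"
}

# Write the private key to the project root as id_rsa_bastion.pem
resource "local_file" "bastion_pem" {
  content  = "${tls_private_key.bastion.private_key_pem}"
  filename = "${path.module}/id_rsa_bastion.pem"
}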

 

Verify

Let's verify that our setup is working as designed.

- I've run all the commands mentioned here to bring up the three instances. The previous project already created the NAT Instance, subnets, VPC and security groups.


- On my local machine, there's a file called id_rsa_bastion.pem in the project root. Running ssh -i id_rsa_bastion.pem ec2-user@<bastion-host-public-ip> successfully drops me into the bastion host. So far so good. The /home/ec2-user/.ssh folder on the host should contain all three private keys that we'll use to SSH into the respective subnets.

- Next, I'll try to log into the instance pub_peru_0, which is in the public subnet. Running ssh -i .ssh/id_rsa_pub_peru ubuntu@<pub_peru_0_IP> logs me into the instance correctly. Sweet!

- Let's log out of the public instance and try to log into one of the private instances now. Running ssh -i .ssh/id_rsa_pri_cali ubuntu@<pri_elnino_0_IP> gives, well, a Permission denied (publickey) error. That's expected, since we used the wrong private key here. Using the correct key works and we're able to SSH into the private instance successfully.

Conclusion

We've used the constructs provided by Terraform to incrementally enhance our infrastructure and add some instances to it. We've also tried to make the components as secure as possible from a permissions and external-access point of view. But the questions we asked in the previous post still remain.

- Where is the state file stored?

- Whose Access/Secret keys are being used to run terraform commands?

- How do we make sure everyone runs the same Terraform version?

- What's the guarantee that Terraform commands run on the same platform (OS, kernel, bash, etc.) every single time?

- Who ran this, when and why?

We keep asking these questions because the answers will dictate how the same codebase scales from a one-person effort to a team of any size. If you've already answered them for your organization or team(s), we'd love to hear about it.

 

Edit: This post sparked insightful conversations on the DevOps subreddit. You should definitely take some time to go through that thread to understand the various abstraction strategies used by different orgs: https://www.reddit.com/r/devops/comments/9l346b/how_do_you_use_terraform_modules_for_instance/

Edit 2: For the sake of simplicity, we'll keep this post and the accompanying code as-is. A lot of comments questioned the usage of modules in this fashion. Although we agree with most of the comments, the main purpose of this post was to demonstrate the use of modules and Terraform statefile sharing, which it fulfills.

Topics: terraform, Automation, devops