Amazon EKS Workers
Overview
This service contains Terraform and Packer code to deploy a production-grade EC2 server cluster as workers for Elastic Kubernetes Service (EKS) on AWS.
EKS architecture
Features
Deploy self-managed worker nodes in an Auto Scaling Group
Deploy managed worker nodes in a Managed Node Group
Zero-downtime, rolling deployment for updating worker nodes
Auto scaling and auto healing
For Nodes:
- Server-hardening with fail2ban, ip-lockdown, auto-update, and more
- Manage SSH access with IAM groups via ssh-grunt
- CloudWatch log aggregation
- CloudWatch metrics and alerts
Learn
Note: This repo is part of the Gruntwork Service Catalog, a collection of reusable, battle-tested, production-ready infrastructure code. If you’ve never used the Service Catalog before, make sure to read How to use the Gruntwork Service Catalog!
Under the hood, this is all implemented using Terraform modules from the Gruntwork terraform-aws-eks repo. If you are a subscriber and don’t have access to this repo, email support@gruntwork.io.
Core concepts
To understand core concepts like what is Kubernetes, the different worker types, how to authenticate to Kubernetes, and more, see the documentation in the terraform-aws-eks repo.
Repo organization
- modules: the main implementation code for this repo, broken down into multiple standalone, orthogonal submodules.
- examples: This folder contains working examples of how to use the submodules.
- test: Automated tests for the modules and examples.
Deploy
Non-production deployment (quick start for learning)
If you just want to try this repo out for experimenting and learning, check out the following resources:
- examples/for-learning-and-testing folder: The examples/for-learning-and-testing folder contains standalone sample code optimized for learning, experimenting, and testing (but not direct production usage).
Production deployment
If you want to deploy this repo in production, check out the following resources:
- examples/for-production folder: The examples/for-production folder contains sample code optimized for direct usage in production. This is code from the Gruntwork Reference Architecture, and it shows you how we build an end-to-end, integrated tech stack on top of the Gruntwork Service Catalog.
- How to deploy a production-grade Kubernetes cluster on AWS: A step-by-step guide for deploying a production-grade EKS cluster on AWS using the code in this repo.
Manage
For information on registering the worker IAM role to the EKS control plane, refer to the IAM Roles and Kubernetes API Access section of the documentation.
For information on how to perform a blue-green deployment of the worker pools, refer to the How do I perform a blue green release to roll out new versions of the module section of the documentation.
For information on how to manage your EKS cluster, including how to deploy Pods on Fargate, how to associate IAM roles with Pods, how to upgrade your EKS cluster, and more, see the documentation in the terraform-aws-eks repo.
Reference
- Inputs
- Outputs
Required
autoscaling_group_configurations
any(required)Configure one or more self-managed Auto Scaling Groups (ASGs) to manage the EC2 instances in this cluster. Set to empty object ({}) if you do not wish to configure self-managed ASGs.
cluster_instance_ami
string(required)The AMI to run on each instance in the EKS cluster. You can build the AMI using the Packer template eks-node-al2.json. One of cluster_instance_ami or cluster_instance_ami_filters is required. Only used if cluster_instance_ami_filters is null. Set to null if cluster_instance_ami_filters is set.
cluster_instance_ami_filters
object(required)Properties on the AMI that can be used to look up a prebuilt AMI for use with self-managed workers. You can build the AMI using the Packer template eks-node-al2.json. One of cluster_instance_ami or cluster_instance_ami_filters is required. If both are defined, cluster_instance_ami_filters will be used. Set to null if cluster_instance_ami is set.
object({
  # List of owners to limit the search. Set to null if you do not wish to limit the search by AMI owners.
  owners = list(string)
  # Name/Value pairs to filter the AMI off of. There are several valid keys, for a full reference, check out the
  # documentation for describe-images in the AWS CLI reference
  # (https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-images.html).
  filters = list(object({
    name   = string
    values = list(string)
  }))
})
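As a rough illustration of the object type above, a variable value that looks up a prebuilt AMI might be written as follows. This is a sketch, not a recommended configuration: the name pattern `eks-workers-*` is a hypothetical placeholder for whatever name your Packer build produces, while `name` and `state` are standard describe-images filter keys.

```hcl
# terraform.tfvars (sketch): look up a prebuilt worker AMI instead of pinning an ID.

# Must be null when cluster_instance_ami_filters is set.
cluster_instance_ami = null

cluster_instance_ami_filters = {
  # "self" limits the search to AMIs owned by the current AWS account.
  owners = ["self"]

  filters = [
    {
      name = "name"
      # Hypothetical naming pattern; adjust to match your own AMI names.
      values = ["eks-workers-*"]
    },
    {
      name   = "state"
      values = ["available"]
    },
  ]
}
```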
eks_cluster_name
string(required)The name of the EKS cluster. The cluster must exist/already be deployed.
managed_node_group_configurations
any(required)Configure one or more Node Groups to manage the EC2 instances in this cluster. Set to empty object ({}) if you do not wish to configure managed node groups.
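Putting the required inputs together, a minimal module call could look like the sketch below. The module source path, the `vX.Y.Z` version placeholder, the cluster name, the AMI ID, and the subnet IDs are all assumptions for illustration; only the input names and the node group fields (`subnet_ids`, `min_size`, `max_size`, `desired_size`, `instance_types`) come from this reference.

```hcl
module "eks_workers" {
  # Assumed source path and placeholder version; substitute the release you actually use.
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/services/eks-workers?ref=vX.Y.Z"

  # The cluster must already be deployed.
  eks_cluster_name = "example-cluster"

  # Pin a specific AMI and disable the filter-based lookup.
  cluster_instance_ami         = "ami-0123456789abcdef0" # hypothetical AMI ID
  cluster_instance_ami_filters = null

  # No self-managed ASGs in this sketch.
  autoscaling_group_configurations = {}

  # One managed node group named "general".
  managed_node_group_configurations = {
    general = {
      subnet_ids     = ["subnet-0123456789abcdef0"] # hypothetical subnet ID
      min_size       = 1
      max_size       = 3
      desired_size   = 1
      instance_types = ["t3.medium"]
    }
  }
}
```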
Optional
additional_security_groups_for_workers
list(optional)A list of additional security group IDs to be attached on worker groups.
list(string)
[]
alarms_sns_topic_arn
list(optional)The ARNs of SNS topics where CloudWatch alarms (e.g., for CPU, memory, and disk space usage) should send notifications.
list(string)
[]
allow_inbound_ssh_from_cidr_blocks
list(optional)The list of CIDR blocks to allow inbound SSH access to the worker groups.
list(string)
[]
allow_inbound_ssh_from_security_groups
list(optional)The list of security group IDs to allow inbound SSH access to the worker groups.
list(string)
[]
asg_custom_iam_role_name
string(optional)Custom name for the IAM role for the Self-managed workers. When null, a default name based on worker_name_prefix will be used. One of asg_custom_iam_role_name and asg_iam_role_arn is required (must be non-null) if asg_iam_role_already_exists is true.
null
asg_default_enable_detailed_monitoring
bool(optional)Default value for enable_detailed_monitoring field of autoscaling_group_configurations.
true
asg_default_instance_root_volume_encryption
bool(optional)Default value for the asg_instance_root_volume_encryption field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_encryption will use this value.
true
asg_default_instance_root_volume_iops
number(optional)Default value for the asg_instance_root_volume_iops field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_iops will use this value.
null
asg_default_instance_root_volume_size
number(optional)Default value for the asg_instance_root_volume_size field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_size will use this value.
40
asg_default_instance_root_volume_throughput
number(optional)Default value for the asg_instance_root_volume_throughput field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_throughput will use this value.
null
asg_default_instance_root_volume_type
string(optional)Default value for the asg_instance_root_volume_type field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_type will use this value.
standard
asg_default_instance_type
string(optional)Default value for the asg_instance_type field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_type will use this value.
t3.medium
asg_default_max_pods_allowed
number(optional)Default value for the max_pods_allowed field of autoscaling_group_configurations. Any map entry that does not specify max_pods_allowed will use this value.
null
asg_default_max_size
number(optional)Default value for the max_size field of autoscaling_group_configurations. Any map entry that does not specify max_size will use this value.
2
asg_default_min_size
number(optional)Default value for the min_size field of autoscaling_group_configurations. Any map entry that does not specify min_size will use this value.
1
asg_default_multi_instance_overrides
any(optional)Default value for the multi_instance_overrides field of autoscaling_group_configurations. Any map entry that does not specify multi_instance_overrides will use this value.
[]
asg_default_on_demand_allocation_strategy
string(optional)Default value for the on_demand_allocation_strategy field of autoscaling_group_configurations. Any map entry that does not specify on_demand_allocation_strategy will use this value.
null
asg_default_on_demand_base_capacity
number(optional)Default value for the on_demand_base_capacity field of autoscaling_group_configurations. Any map entry that does not specify on_demand_base_capacity will use this value.
null
asg_default_on_demand_percentage_above_base_capacity
number(optional)Default value for the on_demand_percentage_above_base_capacity field of autoscaling_group_configurations. Any map entry that does not specify on_demand_percentage_above_base_capacity will use this value.
null
asg_default_spot_allocation_strategy
string(optional)Default value for the spot_allocation_strategy field of autoscaling_group_configurations. Any map entry that does not specify spot_allocation_strategy will use this value.
null
asg_default_spot_instance_pools
number(optional)Default value for the spot_instance_pools field of autoscaling_group_configurations. Any map entry that does not specify spot_instance_pools will use this value.
null
asg_default_spot_max_price
string(optional)Default value for the spot_max_price field of autoscaling_group_configurations. Any map entry that does not specify spot_max_price will use this value. Set to empty string to mean on-demand price.
null
asg_default_tags
list(optional)Default value for the tags field of autoscaling_group_configurations. Any map entry that does not specify tags will use this value.
list(object({
  key                 = string
  value               = string
  propagate_at_launch = bool
}))
[]
asg_default_use_multi_instances_policy
bool(optional)Default value for the use_multi_instances_policy field of autoscaling_group_configurations. Any map entry that does not specify use_multi_instances_policy will use this value.
false
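The asg_default_* inputs above act as fallbacks for entries in autoscaling_group_configurations. The sketch below shows one way this might be used: the `general` group inherits every default, while `batch` overrides two of them. The group names and subnet IDs are hypothetical, and the `subnet_ids` field is an assumption, since this reference does not list the per-ASG fields in full.

```hcl
# terraform.tfvars (sketch): shared defaults plus per-group overrides.

asg_default_instance_type = "t3.large"
asg_default_min_size      = 1
asg_default_max_size      = 3

autoscaling_group_configurations = {
  # Inherits asg_default_instance_type, asg_default_min_size, and asg_default_max_size.
  general = {
    subnet_ids = ["subnet-0123456789abcdef0"] # assumed field, hypothetical ID
  }

  # Overrides the instance type and max size; min_size still comes from the default.
  batch = {
    subnet_ids        = ["subnet-0123456789abcdef0"] # assumed field, hypothetical ID
    asg_instance_type = "m5.xlarge"
    max_size          = 10
  }
}
```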
asg_iam_instance_profile_name
string(optional)Custom name for the IAM instance profile for the Self-managed workers. When null, the IAM role name will be used. If asg_use_resource_name_prefix is true, this will be used as a name prefix.
null
asg_iam_role_already_exists
bool(optional)Whether or not the IAM role used for the Self-managed workers already exists. When false, this module will create a new IAM role.
false
asg_iam_role_arn
string(optional)ARN of the IAM role to use if asg_iam_role_already_exists = true. When null, uses asg_custom_iam_role_name to look up the ARN. One of asg_custom_iam_role_name and asg_iam_role_arn is required (must be non-null) if asg_iam_role_already_exists is true.
null
asg_security_group_tags
map(optional)A map of tags to apply to the Security Group of the ASG for the self managed worker pool. The key is the tag name and the value is the tag value.
map(string)
{}
asg_use_resource_name_prefix
bool(optional)When true, all the relevant resources for self managed workers will be set to use the name_prefix attribute so that unique names are generated for them. This allows those resources to support recreation through create_before_destroy lifecycle rules. Set to false if you were using any version before 0.65.0 and wish to avoid recreating the entire worker pool on your cluster.
true
autoscaling_group_include_autoscaler_discovery_tags
bool(optional)Adds additional tags to each ASG that allow a cluster autoscaler to auto-discover them. Only used for self-managed workers.
true
aws_auth_merger_namespace
string(optional)Namespace where the AWS Auth Merger is deployed. If configured, the worker IAM role will be mapped to the Kubernetes RBAC group for Nodes using a ConfigMap in the auth merger namespace.
null
cloud_init_parts
map(optional)Cloud init scripts to run on the EKS worker nodes when they boot. See the part blocks in https://www.terraform.io/docs/providers/template/d/cloudinit_config.html for syntax. To override the default boot script installed as part of the module, use the key default.
map(object({
  # A filename to report in the header for the part. Should be unique across all cloud-init parts.
  filename = string
  # A MIME-style content type to report in the header for the part. For example, use "text/x-shellscript" for a shell
  # script.
  content_type = string
  # The contents of the boot script to be called. This should be the full text of the script as a raw string.
  content = string
}))
{}
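For example, an extra shell script could be added alongside the default boot script with a value like the one below. This is a minimal sketch: the map key `install-tools`, the filename, and the script body are hypothetical. It deliberately avoids the key `default`, which would replace the module's own boot script.

```hcl
# terraform.tfvars (sketch): run one additional shell script at boot.

cloud_init_parts = {
  # Any key other than "default" adds a part instead of overriding the default script.
  install-tools = {
    filename     = "install-tools.sh"
    content_type = "text/x-shellscript"

    # The full script text, passed as a raw string via an indented heredoc.
    content = <<-EOF
      #!/bin/bash
      set -euo pipefail
      echo "extra boot step ran" >> /var/log/extra-boot-step.log
    EOF
  }
}
```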
cluster_instance_associate_public_ip_address
bool(optional)Whether or not to associate a public IP address to the instances of the self managed ASGs. Will only work if the instances are launched in a public subnet.
false
cluster_instance_keypair_name
string(optional)The name of the Key Pair that can be used to SSH to each instance in the EKS cluster.
null
custom_egress_security_group_rules
map(optional)A map of unique identifiers to egress security group rules to attach to the worker groups.
map(object({
  # The network ports and protocol (tcp, udp, all) to which the security group rule applies.
  from_port = number
  to_port   = number
  protocol  = string
  # The target of the traffic. Only one of the following can be defined; the others must be configured to null.
  target_security_group_id = string       # The ID of the security group to which the traffic goes.
  cidr_blocks              = list(string) # The list of IP CIDR blocks to which the traffic goes.
}))
{}
custom_ingress_security_group_rules
map(optional)A map of unique identifiers to ingress security group rules to attach to the worker groups.
map(object({
  # The network ports and protocol (tcp, udp, all) to which the security group rule applies.
  from_port = number
  to_port   = number
  protocol  = string
  # The source of the traffic. Only one of the following can be defined; the others must be configured to null.
  source_security_group_id = string       # The ID of the security group from which the traffic originates.
  cidr_blocks              = list(string) # The list of IP CIDR blocks from which the traffic originates.
}))
{}
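To make the two rule maps concrete, the sketch below allows inbound TCP traffic on one port from a VPC CIDR range and outbound HTTPS to anywhere. The rule keys, port numbers, and CIDR blocks are hypothetical. Each rule sets exactly one traffic target or source and leaves the other as null, as the type comments require.

```hcl
# terraform.tfvars (sketch): one custom ingress rule and one custom egress rule.

custom_ingress_security_group_rules = {
  # Hypothetical rule: allow metrics scraping on port 9100 from inside the VPC.
  metrics-from-vpc = {
    from_port                = 9100
    to_port                  = 9100
    protocol                 = "tcp"
    source_security_group_id = null
    cidr_blocks              = ["10.0.0.0/16"]
  }
}

custom_egress_security_group_rules = {
  # Hypothetical rule: allow outbound HTTPS to any destination.
  https-out = {
    from_port                = 443
    to_port                  = 443
    protocol                 = "tcp"
    target_security_group_id = null
    cidr_blocks              = ["0.0.0.0/0"]
  }
}
```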
dashboard_cpu_usage_widget_parameters
object(optional)Parameters for the worker cpu usage widget to output for use in a CloudWatch dashboard.
object({
  # The period in seconds for metrics to sample across.
  period = number
  # The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
  # space.
  width  = number
  height = number
})
{
  height = 6,
  period = 60,
  width = 8
}
dashboard_disk_usage_widget_parameters
object(optional)Parameters for the worker disk usage widget to output for use in a CloudWatch dashboard.
object({
  # The period in seconds for metrics to sample across.
  period = number
  # The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
  # space.
  width  = number
  height = number
})
{
  height = 6,
  period = 60,
  width = 8
}
dashboard_memory_usage_widget_parameters
object(optional)Parameters for the worker memory usage widget to output for use in a CloudWatch dashboard.
object({
  # The period in seconds for metrics to sample across.
  period = number
  # The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
  # space.
  width  = number
  height = number
})
{
  height = 6,
  period = 60,
  width = 8
}
enable_cloudwatch_alarms
bool(optional)Set to true to enable several basic CloudWatch alarms around CPU usage, memory usage, and disk space usage. If set to true, make sure to specify SNS topics to send notifications to using alarms_sns_topic_arn.
true
enable_cloudwatch_metrics
bool(optional)Set to true to add IAM permissions to send custom metrics to CloudWatch. This is useful in combination with https://github.com/gruntwork-io/terraform-aws-monitoring/tree/master/modules/agents/cloudwatch-agent to get memory and disk metrics in CloudWatch for your EKS workers.
true
enable_fail2ban
bool(optional)Enable fail2ban to block brute force login attempts. Defaults to true.
true
external_account_ssh_grunt_role_arn
string(optional)If you are using ssh-grunt and your IAM users / groups are defined in a separate AWS account, you can use this variable to specify the ARN of an IAM role that ssh-grunt can assume to retrieve IAM group and public SSH key info from that account. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).
""
managed_node_group_custom_iam_role_name
string(optional)Custom name for the IAM role for the Managed Node Groups. When null, a default name based on worker_name_prefix will be used. One of managed_node_group_custom_iam_role_name and managed_node_group_iam_role_arn is required (must be non-null) if managed_node_group_iam_role_already_exists is true.
null
managed_node_group_iam_role_already_exists
bool(optional)Whether or not the IAM role used for the Managed Node Group workers already exists. When false, this module will create a new IAM role.
false
managed_node_group_iam_role_arn
string(optional)ARN of the IAM role to use if managed_node_group_iam_role_already_exists = true. When null, uses managed_node_group_custom_iam_role_name to look up the ARN. One of managed_node_group_custom_iam_role_name and managed_node_group_iam_role_arn is required (must be non-null) if managed_node_group_iam_role_already_exists is true.
null
node_group_default_capacity_type
string(optional)Default value for capacity_type field of managed_node_group_configurations.
ON_DEMAND
node_group_default_desired_size
number(optional)Default value for desired_size field of managed_node_group_configurations.
1
node_group_default_enable_detailed_monitoring
bool(optional)Default value for enable_detailed_monitoring field of managed_node_group_configurations.
true
node_group_default_instance_root_volume_encryption
bool(optional)Default value for the instance_root_volume_encryption field of managed_node_group_configurations.
true
node_group_default_instance_root_volume_size
number(optional)Default value for the instance_root_volume_size field of managed_node_group_configurations.
40
node_group_default_instance_root_volume_type
string(optional)Default value for the instance_root_volume_type field of managed_node_group_configurations.
gp3
node_group_default_instance_types
list(optional)Default value for instance_types field of managed_node_group_configurations.
list(string)
null
node_group_default_labels
map(optional)Default value for labels field of managed_node_group_configurations. Unlike common_labels which will always be merged in, these labels are only used if the labels field is omitted from the configuration.
map(string)
{}
node_group_default_max_pods_allowed
number(optional)Default value for the max_pods_allowed field of managed_node_group_configurations. Any map entry that does not specify max_pods_allowed will use this value.
null
node_group_default_max_size
number(optional)Default value for max_size field of managed_node_group_configurations.
1
node_group_default_min_size
number(optional)Default value for min_size field of managed_node_group_configurations.
1
node_group_default_subnet_ids
list(optional)Default value for subnet_ids field of managed_node_group_configurations.
list(string)
null
node_group_default_tags
map(optional)Default value for tags field of managed_node_group_configurations. Unlike common_tags which will always be merged in, these tags are only used if the tags field is omitted from the configuration.
map(string)
{}
node_group_launch_template_instance_type
string(optional)The instance type to configure in the launch template. This value will be used when the instance_types field is set to null (NOT omitted, in which case node_group_default_instance_types will be used).
null
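The difference between omitting instance_types and setting it to null is easy to miss, so here is a sketch of both cases side by side. The group names, subnet IDs, and instance types are hypothetical; the behavior described in the comments restates the descriptions of node_group_default_instance_types and node_group_launch_template_instance_type above.

```hcl
# terraform.tfvars (sketch): omitted vs. explicitly null instance_types.

node_group_default_instance_types        = ["t3.medium"]
node_group_launch_template_instance_type = "m5.large"

managed_node_group_configurations = {
  # instance_types is omitted, so node_group_default_instance_types applies.
  uses-default-types = {
    subnet_ids = ["subnet-0123456789abcdef0"] # hypothetical ID
  }

  # instance_types is explicitly null, so the launch template instance type applies.
  uses-launch-template-type = {
    subnet_ids     = ["subnet-0123456789abcdef0"] # hypothetical ID
    instance_types = null
  }
}
```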
node_group_names
list(optional)The names of the node groups. When null, this value is automatically calculated from the managed_node_group_configurations map. This variable must be set if any of the values of the managed_node_group_configurations map depends on a resource that is not available at plan time to work around terraform limitations with for_each.
list(string)
null
node_group_security_group_tags
map(optional)A map of tags to apply to the Security Group of the ASG for the managed node group pool. The key is the tag name and the value is the tag value.
map(string)
{}
ssh_grunt_iam_group
string(optional)If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the EKS workers. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).
ssh-grunt-users
ssh_grunt_iam_group_sudo
string(optional)If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the EKS workers with sudo permissions. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).
ssh-grunt-sudo-users
tenancy
string(optional)The tenancy of the servers in the self-managed worker ASG. Must be one of: default, dedicated, or host.
default
use_exec_plugin_for_auth
bool(optional)If this variable is set to true, then use an exec-based plugin to authenticate and fetch tokens for EKS. This is useful because EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy', and since the native Kubernetes provider in Terraform doesn't have a way to fetch up-to-date tokens, we recommend using an exec-based provider as a workaround. Use the use_kubergrunt_to_fetch_token input variable to control whether kubergrunt or aws is used to fetch tokens.
true
use_kubergrunt_to_fetch_token
bool(optional)EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy'. To avoid this issue, we use an exec-based plugin to fetch an up-to-date token. If this variable is set to true, we'll use kubergrunt to fetch the token (in which case, kubergrunt must be installed and on PATH); if this variable is set to false, we'll use the aws CLI to fetch the token (in which case, aws must be installed and on PATH). Note this functionality is only enabled if use_exec_plugin_for_auth is set to true.
true
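For readers unfamiliar with exec-based authentication, the sketch below shows the general shape of a Kubernetes provider block that fetches a fresh token with the aws CLI on each run. It illustrates the technique only and is not the module's internal configuration; the cluster name is a hypothetical placeholder. A kubergrunt-based variant would swap the command and arguments (see the kubergrunt documentation for the exact invocation).

```hcl
# Look up connection details for an existing cluster (hypothetical name).
data "aws_eks_cluster" "cluster" {
  name = "example-cluster"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  # Instead of a static token, run a command that prints a fresh credential.
  # The aws CLI must be installed and on PATH for this to work.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.name]
  }
}
```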
use_managed_iam_policies
bool(optional)When true, all IAM policies will be managed as dedicated policies rather than inline policies attached to the IAM roles. Dedicated managed policies are friendlier to automated policy checkers, which may scan a single resource for findings. As such, it is important to avoid inline policies when targeting compliance with various security standards.
true
use_prefix_mode_to_calculate_max_pods
bool(optional)When true, assumes prefix delegation mode is in use for the AWS VPC CNI component of the EKS cluster when computing max pods allowed on the node. In prefix delegation mode, each IP address slot on an ENI is assigned a /28 prefix (16 IP addresses) instead of a single IP address, allowing you to pack more Pods per node.
false
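To give a feel for the difference prefix delegation makes, the sketch below works through the max pods arithmetic that AWS publishes for the VPC CNI, using a t3.medium (3 ENIs, 6 IPv4 addresses per ENI). Whether this module applies exactly this formula, including the cap of 110 that AWS recommends for smaller instance types, is an assumption; the local names are illustrative only.

```hcl
locals {
  # t3.medium limits: 3 network interfaces, 6 IPv4 addresses per interface.
  eni_count   = 3
  ips_per_eni = 6

  # Secondary IP mode: one address per ENI is the ENI's primary IP, and 2 is
  # added for Pods that use host networking.
  max_pods_secondary_ip = local.eni_count * (local.ips_per_eni - 1) + 2

  # Prefix delegation mode: each remaining slot holds a /28 prefix (16 addresses).
  max_pods_prefix_raw = local.eni_count * (local.ips_per_eni - 1) * 16 + 2

  # AWS recommends a ceiling of 110 Pods on instance types with fewer than 30 vCPUs.
  max_pods_prefix = min(local.max_pods_prefix_raw, 110)
}

output "max_pods" {
  value = {
    secondary_ip = local.max_pods_secondary_ip # 3 * 5 + 2 = 17
    prefix       = local.max_pods_prefix       # min(242, 110) = 110
  }
}
```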
worker_k8s_role_mapping_name
string(optional)Name of the IAM role to Kubernetes RBAC group mapping ConfigMap. Only used if aws_auth_merger_namespace is not null.
eks-cluster-worker-iam-mapping
worker_name_prefix
string(optional)Prefix EKS worker resource names with this string. When you have multiple worker groups for the cluster, you can use this to namespace the resources. Defaults to empty string so that resource names are not excessively long by default.
""
Outputs
Map of Node Group names to ARNs of the created EKS Node Groups.
The ARN of the IAM role associated with the Managed Node Group EKS workers.
The name of the IAM role associated with the Managed Node Group EKS workers.
Map of Node Group names to Auto Scaling Group security group IDs. Empty if cluster_instance_keypair_name is not set.
The ID of the common AWS Security Group associated with all the managed EKS workers.
A CloudWatch Dashboard widget that graphs CPU usage (percentage) of the Managed Node Group EKS workers.
A CloudWatch Dashboard widget that graphs disk usage (percentage) of the Managed Node Group EKS workers.
A CloudWatch Dashboard widget that graphs memory usage (percentage) of the Managed Node Group EKS workers.
A CloudWatch Dashboard widget that graphs CPU usage (percentage) of the self-managed EKS workers.
A CloudWatch Dashboard widget that graphs disk usage (percentage) of the self-managed EKS workers.
A CloudWatch Dashboard widget that graphs memory usage (percentage) of the self-managed EKS workers.
The ARN of the IAM role associated with the self-managed EKS workers.
The name of the IAM role associated with the self-managed EKS workers.
The ID of the AWS Security Group associated with the self-managed EKS workers.
The list of names of the ASGs that were deployed to act as EKS workers.