aws_emr_cluster
Provides an Elastic MapReduce Cluster, a web service that makes it easy to process large amounts of data efficiently. See Amazon Elastic MapReduce Documentation for more information.
Example Usage
resource "aws_emr_cluster" "emr-test-cluster" {
name = "emr-test-arn"
release_label = "emr-4.6.0"
applications = ["Spark"]
ec2_attributes {
subnet_id = "${aws_subnet.main.id}"
emr_managed_master_security_group = "${aws_security_group.sg.id}"
emr_managed_slave_security_group = "${aws_security_group.sg.id}"
instance_profile = "${aws_iam_instance_profile.emr_profile.arn}"
}
master_instance_type = "m3.xlarge"
core_instance_type = "m3.xlarge"
core_instance_count = 1
tags {
role = "rolename"
env = "env"
}
bootstrap_action {
path = "s3://elasticmapreduce/bootstrap-actions/run-if"
name = "runif"
args = ["instance.isMaster=true", "echo running on master node"]
}
configurations = "test-fixtures/emr_configurations.json"
service_role = "${aws_iam_role.iam_emr_service_role.arn}"
}
The aws_emr_cluster resource typically requires two IAM roles, one for the EMR Cluster
to use as a service, and another to place on your Cluster Instances to interact
with AWS from those instances. The suggested role policy template for the EMR service is AmazonElasticMapReduceRole,
and AmazonElasticMapReduceforEC2Role for the EC2 profile. See the Getting
Started
guide for more information on these IAM roles. There is also a fully-bootable
example Terraform configuration at the bottom of this page.
Argument Reference
The following arguments are supported:
name- (Required) The name of the job flowrelease_label- (Required) The release label for the Amazon EMR releasemaster_instance_type- (Required) The EC2 instance type of the master nodecore_instance_type- (Optional) The EC2 instance type of the slave nodescore_instance_count- (Optional) number of Amazon EC2 instances used to execute the job flow. Default0log_uri- (Optional) S3 bucket to write the log files of the job flow. If a value is not provided, logs are not createdapplications- (Optional) A list of applications for the cluster. Valid values are:Hadoop,Hive,Mahout,Pig, andSpark.Case insensitiveec2_attributes- (Optional) attributes for the EC2 instances running the job flow. Defined belowbootstrap_action- (Optional) list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. Defined belowconfigurations- (Optional) list of configurations supplied for the EMR cluster you are creatingservice_role- (Optional) IAM role that will be assumed by the Amazon EMR service to access AWS resourcesvisible_to_all_users- (Optional) Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. Defaulttruetags- (Optional) list of tags to apply to the EMR Cluster
ec2_attributes
Attributes for the Amazon EC2 instances running the job flow
key_name- (Optional) Amazon EC2 key pair that can be used to ssh to the master node as the user calledhadoopsubnet_id- (Optional) VPC subnet id where you want the job flow to launch. Cannot specify thecc1.4xlargeinstance type for nodes of a job flow launched in a Amazon VPCadditional_master_security_groups- (Optional) list of additional Amazon EC2 security group IDs for the master nodeadditional_slave_security_groups- (Optional) list of additional Amazon EC2 security group IDs for the slave nodesemr_managed_master_security_group- (Optional) identifier of the Amazon EC2 security group for the master nodeemr_managed_slave_security_group- (Optional) identifier of the Amazon EC2 security group for the slave nodesinstance_profile- (Optional) Instance Profile for EC2 instances of the cluster assume this role
bootstrap_action
name- (Required) name of the bootstrap actionpath- (Required) location of the script to run during a bootstrap action. Can be either a location in Amazon S3 or on a local file systemargs- (Optional) list of command line arguments to pass to the bootstrap action script
Attributes Reference
The following attributes are exported:
id- The ID of the EMR Clusternamerelease_labelmaster_instance_typecore_instance_typecore_instance_countlog_uriapplicationsec2_attributesbootstrap_actionconfigurationsservice_rolevisible_to_all_userstags
Example bootable config
NOTE: This configuration demonstrates a minimal configuration needed to boot an example EMR Cluster. It is not meant to display best practices. Please use at your own risk.
provider "aws" {
region = "us-west-2"
}
resource "aws_emr_cluster" "tf-test-cluster" {
name = "emr-test-arn"
release_label = "emr-4.6.0"
applications = ["Spark"]
ec2_attributes {
subnet_id = "${aws_subnet.main.id}"
emr_managed_master_security_group = "${aws_security_group.allow_all.id}"
emr_managed_slave_security_group = "${aws_security_group.allow_all.id}"
instance_profile = "${aws_iam_instance_profile.emr_profile.arn}"
}
master_instance_type = "m3.xlarge"
core_instance_type = "m3.xlarge"
core_instance_count = 1
tags {
role = "rolename"
dns_zone = "env_zone"
env = "env"
name = "name-env"
}
bootstrap_action {
path = "s3://elasticmapreduce/bootstrap-actions/run-if"
name = "runif"
args = ["instance.isMaster=true", "echo running on master node"]
}
configurations = "test-fixtures/emr_configurations.json"
service_role = "${aws_iam_role.iam_emr_service_role.arn}"
}
resource "aws_security_group" "allow_all" {
name = "allow_all"
description = "Allow all inbound traffic"
vpc_id = "${aws_vpc.main.id}"
ingress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
depends_on = ["aws_subnet.main"]
lifecycle {
ignore_changes = ["ingress", "egress"]
}
tags {
name = "emr_test"
}
}
resource "aws_vpc" "main" {
cidr_block = "168.31.0.0/16"
enable_dns_hostnames = true
tags {
name = "emr_test"
}
}
resource "aws_subnet" "main" {
vpc_id = "${aws_vpc.main.id}"
cidr_block = "168.31.0.0/20"
tags {
name = "emr_test"
}
}
resource "aws_internet_gateway" "gw" {
vpc_id = "${aws_vpc.main.id}"
}
resource "aws_route_table" "r" {
vpc_id = "${aws_vpc.main.id}"
route {
cidr_block = "0.0.0.0/0"
gateway_id = "${aws_internet_gateway.gw.id}"
}
}
resource "aws_main_route_table_association" "a" {
vpc_id = "${aws_vpc.main.id}"
route_table_id = "${aws_route_table.r.id}"
}
###
# IAM Role setups
###
# IAM role for EMR Service
resource "aws_iam_role" "iam_emr_service_role" {
name = "iam_emr_service_role"
assume_role_policy = <<EOF
{
"Version": "2008-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "elasticmapreduce.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
}
resource "aws_iam_role_policy" "iam_emr_service_policy" {
name = "iam_emr_service_policy"
role = "${aws_iam_role.iam_emr_service_role.id}"
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Resource": "*",
"Action": [
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CancelSpotInstanceRequests",
"ec2:CreateNetworkInterface",
"ec2:CreateSecurityGroup",
"ec2:CreateTags",
"ec2:DeleteNetworkInterface",
"ec2:DeleteSecurityGroup",
"ec2:DeleteTags",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeAccountAttributes",
"ec2:DescribeDhcpOptions",
"ec2:DescribeInstanceStatus",
"ec2:DescribeInstances",
"ec2:DescribeKeyPairs",
"ec2:DescribeNetworkAcls",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribePrefixLists",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSpotInstanceRequests",
"ec2:DescribeSpotPriceHistory",
"ec2:DescribeSubnets",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVpcEndpoints",
"ec2:DescribeVpcEndpointServices",
"ec2:DescribeVpcs",
"ec2:DetachNetworkInterface",
"ec2:ModifyImageAttribute",
"ec2:ModifyInstanceAttribute",
"ec2:RequestSpotInstances",
"ec2:RevokeSecurityGroupEgress",
"ec2:RunInstances",
"ec2:TerminateInstances",
"ec2:DeleteVolume",
"ec2:DescribeVolumeStatus",
"ec2:DescribeVolumes",
"ec2:DetachVolume",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:ListInstanceProfiles",
"iam:ListRolePolicies",
"iam:PassRole",
"s3:CreateBucket",
"s3:Get*",
"s3:List*",
"sdb:BatchPutAttributes",
"sdb:Select",
"sqs:CreateQueue",
"sqs:Delete*",
"sqs:GetQueue*",
"sqs:PurgeQueue",
"sqs:ReceiveMessage"
]
}]
}
EOF
}
# IAM Role for EC2 Instance Profile
resource "aws_iam_role" "iam_emr_profile_role" {
name = "iam_emr_profile_role"
assume_role_policy = <<EOF
{
"Version": "2008-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
}
resource "aws_iam_instance_profile" "emr_profile" {
name = "emr_profile"
roles = ["${aws_iam_role.iam_emr_profile_role.name}"]
}
resource "aws_iam_role_policy" "iam_emr_profile_policy" {
name = "iam_emr_profile_policy"
role = "${aws_iam_role.iam_emr_profile_role.id}"
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Resource": "*",
"Action": [
"cloudwatch:*",
"dynamodb:*",
"ec2:Describe*",
"elasticmapreduce:Describe*",
"elasticmapreduce:ListBootstrapActions",
"elasticmapreduce:ListClusters",
"elasticmapreduce:ListInstanceGroups",
"elasticmapreduce:ListInstances",
"elasticmapreduce:ListSteps",
"kinesis:CreateStream",
"kinesis:DeleteStream",
"kinesis:DescribeStream",
"kinesis:GetRecords",
"kinesis:GetShardIterator",
"kinesis:MergeShards",
"kinesis:PutRecord",
"kinesis:SplitShard",
"rds:Describe*",
"s3:*",
"sdb:*",
"sns:*",
"sqs:*"
]
}]
}
EOF
}
See the source of this document at Terraform.io