google_dataproc_cluster resource

Syntax

A google_dataproc_cluster is used to test a Google Cloud Dataproc cluster resource.

Beta Resource

This resource has beta fields available. To retrieve these fields, include beta: true in the constructor for the resource.
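
As a minimal sketch of the note above, the control below passes beta: true alongside the usual constructor arguments. The project, region, and cluster name are reused from the examples on this page and are placeholders; the snippet is an InSpec control, so it is meant to run inside a profile with inspec exec rather than as a standalone Ruby script.

```ruby
# Sketch only: requests the beta fields by adding beta: true to the constructor.
# Project, region, and cluster name are placeholder values from this page.
describe google_dataproc_cluster(project: 'chef-gcp-inspec', region: 'europe-west2', cluster_name: 'inspec-dataproc-cluster', beta: true) do
  it { should exist }
end
```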

Examples

describe google_dataproc_cluster(project: 'chef-gcp-inspec', region: 'europe-west2', cluster_name: 'inspec-dataproc-cluster') do
  it { should exist }
  its('labels') { should include('label' => 'value') }
  its('config.master_config.num_instances') { should cmp '1' }
  its('config.worker_config.num_instances') { should cmp '2' }
  its('config.master_config.machine_type_uri') { should match 'n1-standard-1' }
  its('config.worker_config.machine_type_uri') { should match 'n1-standard-1' }
  its('config.software_config.properties') { should include('dataproc:dataproc.allow.zero.workers' => 'true') }
end

describe google_dataproc_cluster(project: 'chef-gcp-inspec', region: 'europe-west2', cluster_name: 'nonexistent') do
  it { should_not exist }
end

Properties

Properties that can be accessed from the google_dataproc_cluster resource:

  • cluster_name: The name of the cluster, unique within the project and region.

  • labels: Labels to apply to this cluster. A list of key->value pairs.

  • config: Configuration for the cluster

    • config_bucket: The Cloud Storage staging bucket used to stage files, such as Hadoop jars, between client machines and the cluster.

    • gce_cluster_config: Common config settings for resources of Google Compute Engine cluster instances, applicable to all instances in the cluster.

    • master_config: The config settings for Compute Engine resources in an instance group, such as a master or worker group.

      • num_instances: The number of VM instances in the instance group. For master instance groups, must be set to 1.

      • instance_names: The list of instance names.

      • image_uri: The Compute Engine image resource used for cluster instances.

      • machine_type_uri: The Compute Engine machine type used for cluster instances

      • disk_config: Disk option config settings

        • boot_disk_type: Type of the boot disk. Valid values are “pd-ssd” or “pd-standard”

        • boot_disk_size_gb: Size in GB of the boot disk.

        • num_local_ssds: Number of attached SSDs, from 0 to 4.

      • is_preemptible: Specifies if this instance group contains preemptible instances.

      • managed_group_config: The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.

        • instance_template_name: The name of the Instance Template used for the Managed Instance Group.

        • instance_group_manager_name: The name of the Instance Group Manager for this group

    • worker_config: The config settings for Compute Engine resources in an instance group, such as a master or worker group.

      • num_instances: The number of VM instances in the instance group. For master instance groups, must be set to 1.

      • instance_names: The list of instance names.

      • image_uri: The Compute Engine image resource used for cluster instances.

      • machine_type_uri: The Compute Engine machine type used for cluster instances

      • disk_config: Disk option config settings

        • boot_disk_type: Type of the boot disk. Valid values are “pd-ssd” or “pd-standard”

        • boot_disk_size_gb: Size in GB of the boot disk.

        • num_local_ssds: Number of attached SSDs, from 0 to 4.

      • is_preemptible: Specifies if this instance group contains preemptible instances.

      • managed_group_config: The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.

        • instance_template_name: The name of the Instance Template used for the Managed Instance Group.

        • instance_group_manager_name: The name of the Instance Group Manager for this group

    • secondary_worker_config: The config settings for Compute Engine resources in an instance group, such as a master or worker group.

      • num_instances: The number of VM instances in the instance group. For master instance groups, must be set to 1.

      • instance_names: The list of instance names.

      • image_uri: The Compute Engine image resource used for cluster instances.

      • machine_type_uri: The Compute Engine machine type used for cluster instances

      • disk_config: Disk option config settings

        • boot_disk_type: Type of the boot disk. Valid values are “pd-ssd” or “pd-standard”

        • boot_disk_size_gb: Size in GB of the boot disk.

        • num_local_ssds: Number of attached SSDs, from 0 to 4.

      • is_preemptible: Specifies if this instance group contains preemptible instances.

      • managed_group_config: The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.

        • instance_template_name: The name of the Instance Template used for the Managed Instance Group.

        • instance_group_manager_name: The name of the Instance Group Manager for this group

    • software_config: Specifies the selection and config of software inside the cluster

      • image_version: The version of software inside the cluster. It must be one of the supported Cloud Dataproc Versions, such as “1.2” (including a subminor version, such as “1.2.29”), or the “preview” version.

      • properties: The properties to set on daemon config files. Property keys are specified in the prefix:property format, for example core:hadoop.tmp.dir

      • optional_components: The set of optional components to activate on the cluster. Possible values:

        • COMPONENT_UNSPECIFIED
        • ANACONDA
        • HBASE
        • RANGER
        • SOLR
        • HIVE_WEBHCAT
        • JUPYTER
        • ZEPPELIN
    • initialization_actions: Specifies an executable to run on a fully configured node and a timeout period for executable completion.

      • executable_file: Cloud Storage URI of the executable file

      • execution_timeout: Amount of time executable has to complete

    • encryption_config: Encryption settings for the cluster.

      • gce_pd_kms_key_name: The Cloud KMS key name to use for PD disk encryption for all instances in the cluster.
    • security_config: Kerberos config holder.

      • kerberos_config: Kerberos related configuration.

        • enable_kerberos: Flag to indicate whether to Kerberize the cluster.

        • rootprincipal_password_uri: The Cloud Storage URI of a KMS encrypted file containing the root principal password.

        • kms_key_uri: The URI of the KMS key used to encrypt various sensitive files.

        • keystore_uri: The Cloud Storage URI of the keystore file used for SSL encryption.

        • truststore_uri: The Cloud Storage URI of the truststore file used for SSL encryption.

        • key_password_uri: The Cloud Storage URI of a KMS encrypted file containing the password to the user provided key.

        • truststore_password_uri: The Cloud Storage URI of a KMS encrypted file containing the password to the user provided truststore.

        • cross_realm_trust_realm: The remote realm the Dataproc on-cluster KDC will trust, should the user enable cross realm trust.

        • cross_realm_trust_admin_server: The admin server (IP or hostname) for the remote trusted realm in a cross realm trust relationship.

        • cross_realm_trust_shared_password_uri: The Cloud Storage URI of a KMS encrypted file containing the shared password between the on-cluster Kerberos realm and the remote trusted realm, in a cross realm trust relationship.

        • kdc_db_key_uri: The Cloud Storage URI of a KMS encrypted file containing the master key of the KDC database.

        • tgt_lifetime_hours: The lifetime of the ticket granting ticket, in hours.

        • realm: The name of the on-cluster Kerberos realm.

  • region: The region in which the cluster and associated nodes will be created.

  • project_id: The Google Cloud Platform project ID that the cluster belongs to.

  • virtual_cluster_config: Optional. The virtual cluster config is used when creating a Dataproc cluster that does not directly control the underlying compute resources, for example, when creating a Dataproc-on-GKE cluster (https://cloud.google.com/dataproc/docs/guides/dpgke/dataproc-gke-overview). Dataproc may set default values, and values may change when clusters are updated. Exactly one of config or virtual_cluster_config must be specified.

  • status: Output only. Cluster status.

  • status_history: Output only. The previous cluster status.

  • cluster_uuid: Output only. A cluster UUID (Unique Universal Identifier). Dataproc generates this value when it creates the cluster.

  • metrics: Output only. Contains cluster daemon metrics such as HDFS and YARN stats. Beta feature: this report is available for testing purposes only and may change before final release.
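
The prefix:property key format described for config.software_config.properties can be illustrated with a small plain-Ruby sketch. The helper names split_property_key and group_by_prefix are hypothetical and are not part of InSpec or the Dataproc API, and the value given for core:hadoop.tmp.dir is an assumed placeholder.

```ruby
# Hypothetical helper: split a key such as "core:hadoop.tmp.dir" into its
# prefix ("core") and property name ("hadoop.tmp.dir").
def split_property_key(key)
  # Limit the split to 2 parts so only the first colon acts as the separator.
  prefix, name = key.split(':', 2)
  if prefix.to_s.empty? || name.to_s.empty?
    raise ArgumentError, "expected prefix:property, got #{key.inspect}"
  end
  { prefix: prefix, property: name }
end

# Hypothetical helper: regroup a flat properties hash by prefix.
def group_by_prefix(properties)
  properties.each_with_object({}) do |(key, value), grouped|
    parts = split_property_key(key)
    (grouped[parts[:prefix]] ||= {})[parts[:property]] = value
  end
end

properties = {
  'dataproc:dataproc.allow.zero.workers' => 'true',
  'core:hadoop.tmp.dir' => '/tmp/hadoop' # assumed value, for illustration only
}

grouped = group_by_prefix(properties)
puts grouped.keys.inspect
```

Grouping by prefix like this can make it easier to see which properties apply to which configuration scope before writing assertions against them.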

GCP permissions

Ensure the Cloud Dataproc API is enabled for the current project.
