Labels on Cloud Dataflow Instances for Cloud Bigtable Exports

Google Cloud Platform has the notion of resource labels, which are often useful for tracking ownership and for cost reporting.

Google Cloud Bigtable supports data exports as sequence files. The export runs as a Cloud Dataflow job, which spins up Compute Engine VMs for the processing, up to the limit set by --maxNumWorkers.

I wanted to see what this costs to run on a regular basis, but the VMs are ephemeral and unlabelled. There’s an option, not mentioned in the Google documentation, for attaching labels to the job!

$ java -jar bigtable-beam-import-1.4.0-shaded.jar export --help=org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
org.apache.beam.runners.dataflow.options.DataflowPipelineOptions:
 Options that configure the Dataflow pipeline.

--labels=<Map>
 Labels that will be applied to the billing records for this job.

Marvelous!
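
For a scheduled export, the label map can be built in a small wrapper script. This is only a sketch under a few assumptions: that the `<Map>` value is passed as a JSON object (which is how Beam's Java option parsing normally treats map-typed options, though the help text above does not say so), that the label names and values shown are made up, and that the remaining export flags are whatever you already pass. The helper names `labels_flag` and `export_command` are hypothetical. The script prints the command rather than launching it.

```python
import json
import re
import shlex

# Conservative subset of the GCP label rules: lowercase letters, digits,
# underscores and dashes, at most 63 characters; keys start with a letter.
_KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")
_VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")


def labels_flag(labels):
    """Render a dict as a --labels argument in JSON-map form (assumed format)."""
    for key, value in labels.items():
        if not _KEY_RE.fullmatch(key):
            raise ValueError(f"invalid label key: {key!r}")
        if not _VALUE_RE.fullmatch(value):
            raise ValueError(f"invalid label value: {value!r}")
    # Compact separators keep the argument free of spaces; sorted keys keep it stable.
    return "--labels=" + json.dumps(labels, separators=(",", ":"), sort_keys=True)


def export_command(jar, labels, other_args=()):
    """Build the argv list for the export job; other_args are passed through untouched."""
    return ["java", "-jar", jar, "export", *other_args, labels_flag(labels)]


if __name__ == "__main__":
    cmd = export_command(
        "bigtable-beam-import-1.4.0-shaded.jar",
        {"owner": "data-team", "purpose": "bigtable-export"},  # hypothetical labels
        ["--maxNumWorkers=10"],  # placeholder; add your usual export flags here
    )
    # Print a copy-pasteable shell line instead of launching the job.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the argument as a list and quoting it only for display avoids shell-escaping problems with the braces and double quotes in the JSON. If the job rejects the value, check the option's expected format against your Beam version before relying on it.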