A Quick Tour of Google Cloud Build


TL;DR

Google Cloud Build is a workflow manager and execution engine built on container image primitives. Once you get past some product-specific syntax and jargon, the tool will feel familiar to anyone with Docker experience, and it is a powerful, simple way to create complex build or code execution workflows.

Intro

I recently gave a workshop where we used Google Cloud Build to run unit tests on Apache Airflow tasks, and I was reminded that Cloud Build is one of the most underappreciated tools in the Google Cloud ecosystem. While Cloud Build is an alternative to CI tools like Jenkins and AWS CodeBuild, it also has characteristics in common with AWS Lambda and Google App Engine. Cloud Build can expose significant resources (up to 28.8 GB of memory and 32 CPU cores) and run multiple containers on the same machine, either in sequence or in parallel. Unlike AWS Lambda or Google Cloud Functions, Cloud Build supports long-running tasks. In this post, I’ll walk through a few simple examples to get you started running Cloud Build jobs.

Running the Examples

Create a GCP project and open the Google Cloud Shell. For each example, create the specified cloudbuild.yaml file and other required files in the same directory. Wherever the project ID ternary-sandbox shows up, replace it with your project ID. Execute the cloudbuild.yaml by running
gcloud builds submit
from the Cloud Shell.
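
By default, gcloud builds submit looks for a file named cloudbuild.yaml and uses the current directory as the build source, but you can be explicit. A quick sketch using standard gcloud flags:

# Submit a build with an explicit config file and the current directory as source.
gcloud builds submit --config cloudbuild.yaml .
# List recent builds to check on status.
gcloud builds list --limit 5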

Getting Past Obscured Functionality

I can usually understand what an AWS Lambda function does after staring at the code for 30 seconds, but what does this cloudbuild.yaml file do? (Note: this is not one of our runnable examples.)

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/my-project/my-image', '.']
  timeout: 500s
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/my-project/my-image']
- name: 'gcr.io/cloud-builders/kubectl'
  args: ['set', 'image', 'deployment/my-deployment', 'my-container=gcr.io/my-project/my-image']
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=us-east4-b'
  - 'CLOUDSDK_CONTAINER_CLUSTER=my-cluster'

In fact, all functionality in Cloud Build stems from the very simple concept of running actions in containers. Cloud Build can use a different container image in each step and integrates with Docker Hub, giving us off-the-shelf access to a vast ecosystem of functionality. In addition, Google provides a collection of Cloud Builders, containers with pre-installed build tools. These are available in Google- and community-supported flavors. Google also provides a container registry in each GCP project that is deeply integrated with Cloud Build, so I can build custom containers and use them in any subsequent step or build.
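
As a quick sketch of that last point: steps in a single build share a local Docker daemon, so an image built in one step can be run directly by a later step. Here the image name my-tool is hypothetical, and I’m assuming its entrypoint is the tool itself.

steps:
# Build a hypothetical tool image.
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/my-project/my-tool', '.']
# The image built above is available locally to this later step.
- name: 'gcr.io/my-project/my-tool'
  args: ['--help']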

Let’s take another look at the deployment example at the top of this section. Somewhat confusingly, the name field of each step in the YAML is not the name or description of the step, but a container URI. One of my complaints with Cloud Build, at least as it is currently documented, is that Google doesn’t encourage inline documentation of jobs. Fortunately, this is easy to fix using standard YAML comments.

steps:
# This step uses Docker to build my image.
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/my-project/my-image', '.']
  timeout: 500s
# This step pushes my image to the container registry for `my-project`.
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/my-project/my-image']
# Deploy `my-image` to Kubernetes.
- name: 'gcr.io/cloud-builders/kubectl'
  args: ['set', 'image', 'deployment/my-deployment', 'my-container=gcr.io/my-project/my-image']
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=us-east4-b'
  - 'CLOUDSDK_CONTAINER_CLUSTER=my-cluster'

Google could improve Cloud Build by specifying standard documentation fields in the cloudbuild.yaml spec, for example a description field in each step. This would create documentation that is both machine-readable and programmatically convertible to formatted documentation, much like Python docstrings.
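
For illustration, a documented step might look something like the sketch below. To be clear, the description field here is my invention; Cloud Build would reject it today.

steps:
- name: 'gcr.io/cloud-builders/docker'
  # Hypothetical field, not part of the current cloudbuild.yaml spec.
  description: 'Build my-image and tag it for the project container registry.'
  args: ['build', '-t', 'gcr.io/my-project/my-image', '.']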

Naming Things is Hard

The Cloud Build documentation commits the age-old sin of using product-specific jargon for standard practices of the IT trade. Let’s talk about cloud builders and custom build steps.

Cloud Builders

A cloud builder is simply a container image with some standardized tooling built in. These come in two basic flavors: Google supported and community supported. For example, there is a Google-supported Gcloud Builder, which includes the gcloud, bq (BigQuery), and gsutil (Cloud Storage) command-line tools; the Ansible builder is community supported.
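
Note that community builders aren’t hosted in a public registry; the cloud-builders-community repository expects you to build them and push them to your own project first. Assuming you’ve done that for the Ansible builder, a step might look like this (the entrypoint field is explained below):

steps:
# Assumes the community Ansible builder was built and pushed to this registry.
- name: 'gcr.io/ternary-sandbox/ansible'
  entrypoint: 'ansible'
  args: ['--version']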

Entrypoints

Each Cloud Builder has a standard entrypoint. If I don’t specify one for the Gcloud Builder, then the step runs gcloud with the arguments specified. The cloudbuild.yaml below runs gcloud help.

steps:
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['help'] 

In practice, these builder images bundle several useful tools, and the entrypoint field selects which one a step runs. The cloudbuild.yaml below runs gsutil ls.

steps:
# List the buckets in my current project.
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'gsutil'
  args: ['ls']
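
The Gcloud Builder also bundles bq, so the same trick works for BigQuery. A minimal sketch:

steps:
# Print the version of the bq (BigQuery) CLI.
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bq'
  args: ['version']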

Custom Build Steps

A custom build step is just an execution step that doesn’t use a Cloud Builder. In other words, we can run any image we’ve built, any image on Docker Hub, and so on. If I want to run Python code, I can just reference the standard Python images on Docker Hub.

steps:
# Show the version of Python in the container.
- name: 'python:3.6'
  entrypoint: 'python'
  args: ['--version']

Integration with Container Registry

Every GCP project includes a container registry. I’m working in a project with ID ternary-sandbox. The URI for an Apache Airflow image I’ve created is gcr.io/ternary-sandbox/airflow. I can get the version of Airflow installed in the container as follows.

steps:
- name: 'gcr.io/ternary-sandbox/airflow'
  entrypoint: 'airflow'
  args: ['version'] 

Building a Custom Container

We can easily build a custom container for use in future builds and push it to our container registry. For this, we use the Docker Cloud Builder. Let’s build an Airflow container; we’ll need both a Dockerfile and a cloudbuild.yaml for this example.

Dockerfile:

FROM python:3.7
 
RUN pip install 'apache-airflow[gcp]'

cloudbuild.yaml:

steps:
- name: 'gcr.io/cloud-builders/docker'
  entrypoint: 'docker'
  args: [ 'build', '-t', 'gcr.io/ternary-sandbox/airflow', '.' ]
images: ['gcr.io/ternary-sandbox/airflow']

The Docker Cloud Builder includes standard Docker tooling. We’re simply running
docker build -t gcr.io/ternary-sandbox/airflow .
The images key tells Cloud Build where to push the image: in this case, our project container registry. Note that the tag we set in the build command must match the target image URL.
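
One refinement worth knowing about: Cloud Build supplies a built-in $PROJECT_ID substitution, so you can avoid hardcoding the project ID. This variant should behave identically in any project:

steps:
- name: 'gcr.io/cloud-builders/docker'
  # $PROJECT_ID is replaced with the current project's ID at build time.
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/airflow', '.']
images: ['gcr.io/$PROJECT_ID/airflow']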

We Can Run Almost Any Workflow

If you’re familiar with Docker, then you’ve probably realized by now that we can run any code we like; hence my comparison of Cloud Build with general-purpose execution engines like AWS Lambda and Google App Engine. (There is a significant caveat: Cloud Build is great for running code in a workflow, but it is of no use for hosting services that need to respond to incoming requests.) In this example, we’ll pull data from the AWS IP ranges API, which lists current IP ranges for AWS services, and write the data to Google Cloud Storage. We’ll pull and write from a shell script.

pull-data.sh:

#!/usr/bin/env bash
# Exit immediately if any command fails.
set -e

# Download the current AWS IP ranges; -f makes curl fail on HTTP errors.
curl -fO https://ip-ranges.amazonaws.com/ip-ranges.json
# Copy the file to our Cloud Storage bucket.
gsutil cp ip-ranges.json gs://ternary-sandbox

cloudbuild.yaml:

steps:
# Read data from an API and write it to Cloud Storage.
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bash'
  args: ['pull-data.sh']
 

The cloudbuild.yaml specifies the Gcloud Builder so we have access to the gsutil CLI for pushing to Cloud Storage.
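
To close the loop on the intro’s claims about resources and parallelism: the options block selects a larger worker, and a step with waitFor: ['-'] starts as soon as the build does rather than waiting for prior steps, so the two steps in this sketch run in parallel. The echo payloads are placeholders:

steps:
# Both steps declare waitFor: ['-'], so they start at the same time.
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bash'
  args: ['-c', 'echo "task one"']
  waitFor: ['-']
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bash'
  args: ['-c', 'echo "task two"']
  waitFor: ['-']
options:
  # A 32-core worker with 28.8 GB of memory, per the intro.
  machineType: 'N1_HIGHCPU_32'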


Matt Housley