Running Google Cloud Vertex AI through Python

Google Cloud’s Vertex AI is pretty awesome for machine learning.

But sometimes it gets pretty tedious running everything through the site’s user interface. It’s especially sigh-worthy when a workflow involves multiple iterations of different datasets and models.

Fortunately, it’s pretty straightforward to script the whole thing: generating a dataset in BigQuery, instantiating it as a Vertex AI dataset, and finally training the Vertex AI model, all from a Python script or a Jupyter notebook.

Set up the environment #

To interact with Google Cloud and its various components, you’ll first need to load up the appropriate packages.

Note: I’m assuming that your data is already stashed somewhere in BigQuery.

from google.cloud import aiplatform, bigquery, storage

If you don’t already have the packages installed, run the following commands first (and if you’re doing this from within a Jupyter notebook, replace pip3 with %pip):

pip3 install google-cloud-aiplatform
pip3 install google-cloud-bigquery
pip3 install google-cloud-storage

Set up the connection #

Before you can do anything, you’ll have to authenticate to Google Cloud via the command line.
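
If you have the gcloud CLI installed, the usual route is Application Default Credentials:

gcloud auth application-default login

Once you’re authenticated, set up the client: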

project_id = '<your Google Cloud project>'
region = '<your Google Cloud region>'
client = bigquery.Client(project=project_id) # this actually sets up the connection
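
A quick way to confirm the connection works is to list the datasets the client can see:

# sanity check: print the ID of every dataset visible to this client
for dataset in client.list_datasets():
  print(dataset.dataset_id)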

Set up the data #

A common workflow involves creating a table in BigQuery that will eventually be used for model training.

It might look something like this:

# define your SQL query
sql = f"""
  CREATE OR REPLACE TABLE <project_id.database.table_name>
  ... bunch of SQL stuff
"""

client.query(sql).result() # .result() blocks until the query job finishes
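
For illustration, a filled-in version might look like the sketch below. The dataset, table, and column names here are all hypothetical; substitute your own:

# hypothetical names throughout -- swap in your own dataset, table, and columns
sql = f"""
  CREATE OR REPLACE TABLE `{project_id}.my_dataset.training_data` AS
  SELECT
    label,
    feature_1,
    feature_2
  FROM `{project_id}.my_dataset.raw_events`
  WHERE label IS NOT NULL
"""

client.query(sql).result()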

If you want to make sure the fields you specified made it through, you can run:

sql_check = f"""
  SELECT column_name
  FROM <project_id.database>.INFORMATION_SCHEMA.COLUMNS
  WHERE table_name = '<table_name>'
"""

client.query(sql_check).to_dataframe()
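
Along the same lines, a quick row count will confirm the table actually contains data:

sql_count = "SELECT COUNT(*) AS row_count FROM `<project_id.database.table_name>`"
print(client.query(sql_count).to_dataframe())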

Kick off Vertex AI #

Initialize #

First, initialize the Vertex AI platform.

aiplatform.init(project = project_id, location = region)

Dataset #

Once that’s all done, define the dataset to be used in the Vertex AI job (assuming the data resides in BigQuery):

vai_dataset = aiplatform.TabularDataset.create(
  display_name = <whatever you want to name the dataset>,
  bq_source = "bq://<path to a bigquery table>"
)
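
The bq_source value is just the fully qualified table path (project.dataset.table) with a bq:// prefix; for the hypothetical table from earlier, that would be bq://<your project>.my_dataset.training_data.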

Train model #

Now it’s finally time to actually train the model. In this case, we’re going to assume we’re working with tabular data for a classification task.

(For more details, check out the official documentation).

Be forewarned: this could take a while (and cost a pretty penny). Note that the dataset and target column get passed to the job’s run() method rather than to its constructor:

job = aiplatform.AutoMLTabularTrainingJob(
  display_name = <whatever you want to name the training job>,
  optimization_prediction_type = "classification",
  ... # there's a whole bunch of parameters. Definitely check the official documentation
)

model = job.run(
  dataset = vai_dataset,
  target_column = <whatever your target column is>,
)

Once this runs to completion, you’ll have a nicely trained model that can be used for further evaluation and deployment.
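
As a teaser of that deployment step, a minimal sketch (the machine type here is just an example; see the docs for sizing options):

# minimal sketch: deploy the trained model to an endpoint for online predictions
endpoint = model.deploy(machine_type = "n1-standard-4")

From there, endpoint.predict() will serve online predictions against the deployed model.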