I often find it useful to stash .csv files in a Google Cloud Storage (GCS) bucket so they can be accessed from a Vertex AI JupyterLab notebook.
Once the files are accessible from Vertex, I can do all sorts of things with them.
One of the most common operations I perform in this scenario is loading the .csv data into a Pandas DataFrame, which I can then use to enrich other data I have stored in BigQuery.
import pandas as pd
from google.cloud import storage
from io import BytesIO

def generate_df(bucket_name_input: str, files: list) -> pd.DataFrame:
    client = storage.Client()
    bucket = client.get_bucket(bucket_name_input)
    frames = []
    for file in files:
        # this construct assumes the files are similarly structured
        blob = bucket.get_blob(file)
        # download_as_bytes() replaces the deprecated download_as_string()
        content = blob.download_as_bytes()
        frames.append(pd.read_csv(BytesIO(content)))
    # DataFrame.append() was removed in pandas 2.0; pd.concat() is the
    # idiomatic way to combine the per-file frames
    return pd.concat(frames, ignore_index=True)
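As a usage sketch, here is one way the resulting DataFrame might feed the BigQuery enrichment step mentioned above. The bucket name, file names, dataset, table, and user_id join key are all hypothetical, and the enrichment itself is just a standard merge:

from google.cloud import bigquery

# Hypothetical bucket and file names for illustration
csv_df = generate_df("my-staging-bucket", ["users_2023.csv", "users_2024.csv"])

# Pull the BigQuery side into a DataFrame (hypothetical dataset/table)
bq_client = bigquery.Client()
bq_df = bq_client.query("SELECT user_id, region FROM my_dataset.users").to_dataframe()

# Enrich the BigQuery rows with the .csv data via a shared key
enriched = bq_df.merge(csv_df, on="user_id", how="left")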