Python Pandas + Other Transformations

Most of the commands here come from Pandas; a few are common data-manipulation commands from outside Pandas.

Package

import pandas as pd

Read a CSV file into a DataFrame:

data = pd.read_csv("<path to file>")
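A minimal sketch that runs without a file on disk: `io.StringIO` wraps a small CSV string, since `read_csv` accepts file-like objects as well as paths. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "<path to file>".
csv_text = """id,name,score
1,alice,90
2,bob,85
"""

# read_csv takes a path or any file-like object; the first line becomes the header.
data = pd.read_csv(io.StringIO(csv_text))
print(data)
```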

Or build a DataFrame from a dictionary:

data = {...} # some dictionary

df = pd.DataFrame.from_dict(data)

By default, the dictionary keys become column labels. Passing orient='index' makes the keys row labels instead.
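A short sketch comparing the two orientations on the same hypothetical dictionary. With orient='index', the list values are spread across columns labeled 0, 1, and so on.

```python
import pandas as pd

# Hypothetical example data: each key maps to a list of values.
data = {"a": [1, 2], "b": [3, 4]}

# Default (orient="columns"): keys "a" and "b" become column labels.
by_columns = pd.DataFrame.from_dict(data)

# orient="index": keys become row labels; columns are numbered 0, 1, ...
by_index = pd.DataFrame.from_dict(data, orient="index")

print(by_columns)
print(by_index)
```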

Select the columns to keep:

data.loc[:, ["<column name>", "<column name>", ...]]
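A small sketch of column selection with `.loc`, using a made-up DataFrame. The `:` keeps every row, and the list picks the columns in the order given.

```python
import pandas as pd

# Hypothetical example frame.
data = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"], "score": [90, 85, 70]})

# All rows, only the listed columns (in this order).
kept = data.loc[:, ["name", "score"]]
print(kept)
```

For whole-column selection, `data[["name", "score"]]` gives the same result.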

Select the columns to drop (axis=1 means the labels refer to columns, not rows):

data_sub = data.drop(labels=["<column name>", "<column name>", ...], axis=1)
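A minimal sketch on a hypothetical DataFrame. `drop` returns a new frame and leaves the original untouched; `columns=` is an equivalent, more explicit spelling of `labels=..., axis=1`.

```python
import pandas as pd

# Hypothetical example frame.
data = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [90, 85]})

# axis=1: the labels are column names.
data_sub = data.drop(labels=["id"], axis=1)

# Same result using the columns= keyword.
data_sub2 = data.drop(columns=["id"])

print(data_sub)
```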

Sometimes, it’s necessary to treat a field full of numbers as a categorical variable.

data['<field>'] = pd.Categorical(data.<field>)

# or

data['<field>'] = data.<field>.astype('category')
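A sketch of both conversions, assuming a hypothetical `zip` column whose numbers are really labels rather than quantities. After conversion the column's dtype is `category` and its distinct values are listed in `.cat.categories`.

```python
import pandas as pd

# Hypothetical numeric codes that should be treated as labels.
data = pd.DataFrame({"zip": [10001, 94103, 10001]})
print(data.dtypes)  # zip starts out as an integer dtype

# Option 1: wrap the column in pd.Categorical.
data["zip"] = pd.Categorical(data["zip"])

# Option 2 (equivalent): data["zip"] = data["zip"].astype("category")

print(data.dtypes)                  # zip is now category
print(data["zip"].cat.categories)   # the distinct codes
```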

Replace a value in a field wherever the field matches a given string:

data.loc[data.<field> == '<some string to match>', '<field>'] = '<new value>'
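A minimal sketch with a hypothetical `status` column. Doing the row selection and the assignment in a single `.loc` call writes to `data` itself; a chained form such as `data[mask]["status"] = ...` may modify a temporary copy instead.

```python
import pandas as pd

# Hypothetical example frame.
data = pd.DataFrame({"status": ["ok", "err", "ok"], "value": [1, 2, 3]})

# Boolean mask picks the rows; "status" picks the column to overwrite.
data.loc[data["status"] == "err", "status"] = "error"
print(data)
```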

Left join:

pd.merge(df1, df2, how = "left", left_on = "<df1 column>", right_on = "<df2 column>")
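A small sketch of a left join on differently named key columns; the `orders` and `customers` frames are hypothetical. Every row of the left frame is kept, and a row with no match gets NaN in the right-hand columns.

```python
import pandas as pd

# Hypothetical tables: the join keys have different names in each frame.
orders = pd.DataFrame({"order_id": [1, 2, 3], "cust": [10, 20, 30]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Ann", "Ben"]})

# Keep all orders; customer 30 has no match, so its right-side columns are NaN.
merged = pd.merge(orders, customers, how="left", left_on="cust", right_on="customer_id")
print(merged)
```

Because the key names differ, both `cust` and `customer_id` appear in the result; drop one afterwards if it is redundant.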


Check the data type of each column in a DataFrame:

data.dtypes
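A short sketch on a hypothetical frame: `dtypes` is a Series indexed by column name, so a single column's type can be looked up by name. The exact dtype shown for text columns varies between pandas versions, so only the numeric columns are checked here.

```python
import pandas as pd

# Hypothetical example frame with integer, float, and text columns.
data = pd.DataFrame({"id": [1, 2], "price": [9.5, 3.0], "name": ["a", "b"]})

types = data.dtypes
print(types)
print(types["price"])  # look up one column's dtype by name
```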

Quick ways to see what the data looks like (the first or last five rows, by default):

data.head()

data.tail()
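Both methods take an optional row count `n`. A small sketch on a hypothetical ten-row frame:

```python
import pandas as pd

# Hypothetical frame with ten rows (x = 0..9).
data = pd.DataFrame({"x": range(10)})

first = data.head()   # first 5 rows (default n=5)
last3 = data.tail(3)  # last 3 rows
print(first)
print(last3)
```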

Get column names:

data.columns

Or restructure as a list:

list(data.columns)
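A short sketch on a hypothetical frame showing the difference: `data.columns` is a pandas `Index`, while `list(...)` (or the equivalent `.tolist()`) gives a plain Python list.

```python
import pandas as pd

# Hypothetical example frame.
data = pd.DataFrame({"id": [1], "name": ["a"], "score": [90]})

cols = data.columns            # pandas Index object
col_list = list(data.columns)  # plain Python list
print(cols)
print(col_list)
```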