Python Pandas + Other Transformations

Most of the commands here come from Pandas. A few are common data manipulation commands from outside Pandas.

Package #

import pandas as pd

Data #

Read CSV (or other delimited format) #

data = pd.read_csv("<path to file>")
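
For other delimited formats, a minimal sketch, assuming a semicolon-delimited source. The sample text and the column names are made up for illustration, and an in-memory buffer stands in for a real file path:

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data, held in memory instead of a file on disk.
raw = io.StringIO("id;city;sales\n1;Oslo;10.5\n2;Lima;7.0\n")

# sep tells read_csv which delimiter to split on (the default is a comma).
data = pd.read_csv(raw, sep=";")
print(data)
```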

Create dataframe from dictionary #

data = {...} # some dictionary

df = pd.DataFrame.from_dict(data)

By default, the dictionary keys become columns. Passing orient='index' makes the keys row labels instead.
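
A small sketch of both orientations, using a made-up dictionary:

```python
import pandas as pd

data = {"a": [1, 2, 3], "b": [4, 5, 6]}  # hypothetical sample dictionary

# Default orientation: each key becomes a column.
df_cols = pd.DataFrame.from_dict(data)

# orient="index": each key becomes a row label instead.
df_rows = pd.DataFrame.from_dict(data, orient="index")

print(df_cols.shape)  # 3 rows, 2 columns
print(df_rows.shape)  # 2 rows, 3 columns
```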

Transformation #

Subset columns #

Specify what to keep:

data.loc[:, ["<column name>", "<column name>", ...]]

Specify what to drop:

data_sub = data.drop(labels=<list of column names>, axis=1)
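
Both approaches on a made-up dataframe (the column names are illustrative only). Neither one modifies the original:

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"id": [1, 2], "city": ["Oslo", "Lima"], "sales": [10.5, 7.0]})

# Keep: list the columns you want.
kept = data.loc[:, ["id", "sales"]]

# Drop: list the columns you do not want; axis=1 means "columns".
data_sub = data.drop(labels=["city"], axis=1)

print(list(kept.columns))
print(list(data_sub.columns))
```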

Make categorical #

Sometimes, it’s necessary to treat a field full of numbers as a categorical variable.

data['<field>'] = pd.Categorical(data.<field>)

# or

data['<field>'] = data.<field>.astype('category')  # or pass a pd.CategoricalDtype
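
A quick sketch with a made-up numeric field (zip_code is a hypothetical column name), showing that the values are kept but the dtype changes:

```python
import pandas as pd

# Hypothetical data: zip codes are numbers, but arithmetic on them is meaningless.
data = pd.DataFrame({"zip_code": [10001, 94105, 10001]})

data["zip_code"] = data["zip_code"].astype("category")

print(data["zip_code"].dtype)
# The distinct values are stored once as the categories.
print(list(data["zip_code"].cat.categories))
```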

Conditionally replace cell value #

data.loc[data.<field> == '<some string to match>', '<field>'] = '<new value>'
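
The same pattern on made-up data (the status column and its values are hypothetical). The first argument to .loc selects the matching rows and the second names the column to overwrite:

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"status": ["open", "n/a", "closed", "n/a"]})

# Rows where status equals "n/a" get the new value; other rows are untouched.
data.loc[data["status"] == "n/a", "status"] = "unknown"

print(data)
```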

Join dataframes #

Left join:

pd.merge(df1, df2, how = "left", left_on = "<df1 column>", right_on = "<df2 column>")
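
A sketch of the left join on two made-up dataframes (all names and values are hypothetical). Every row of df1 is kept, and rows with no match in df2 get missing values in the df2 columns:

```python
import pandas as pd

# Hypothetical sample data; the key columns have different names in each frame.
df1 = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
df2 = pd.DataFrame({"cust_id": [1, 3], "total": [20.0, 5.0]})

merged = pd.merge(df1, df2, how="left", left_on="customer_id", right_on="cust_id")

# Customer 2 has no match in df2, so its cust_id and total are NaN.
print(merged)
```

Because the key columns are named differently, both of them appear in the result.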

Analysis #

Data types #

Check the data type of each column in a dataframe:

data.dtypes
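
For instance, on a made-up dataframe, data.dtypes returns a Series indexed by column name with one dtype per column:

```python
import pandas as pd

# Hypothetical sample data with an integer, a text, and a float column.
data = pd.DataFrame({"id": [1, 2], "city": ["Oslo", "Lima"], "sales": [10.5, 7.0]})

types = data.dtypes
print(types)

# Individual entries can be looked up by column name.
print(types["sales"])
```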

Preview #

Quick ways to get a sense of what the data looks like.

data.head()
data.tail()
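
Both methods return five rows by default and accept a row count. A small sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data: ten rows numbered 0 to 9.
data = pd.DataFrame({"n": range(10)})

first_three = data.head(3)  # first 3 rows
last_two = data.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```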

Column names #

Get column names:

data.columns

Or restructure as a list:

list(data.columns)
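
The difference between the two forms, sketched on a made-up dataframe: data.columns is a Pandas Index object, while list() gives a plain Python list.

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"id": [1], "city": ["Oslo"]})

cols = data.columns           # a pandas Index
col_list = list(data.columns)  # a plain Python list of names

print(type(cols).__name__)
print(col_list)
```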