Python CSV module

Module #

import csv

Reading csv #

with open('<file>.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    # print each line
    for line in csv_reader:
        print(line)

Reading specific column, specify based on column index:

with open('<file>.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    # print each line, by index
    for line in csv_reader:
        print(line[2])

To skip first line (usually the field name), add a next():

with open('<file>.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    next(csv_reader)

    # print each line, by index
    for line in csv_reader:
        print(line[2])

Specify a delimiter other than a comma:

with open('<file>.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='\t')
    ...

Writing csv #

Let’s say we’re reading a file, and replacing delimiter with tabs, and writing to another csv.

with open('<file>.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    with open('<target>.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file, delimiter='\t')

        # each line in source, write it out
        for line in csv_reader:
            csv_writer.writerow(line)

Once run, <target>.csv should exist.

The cool thing about delimiter – if the specified delimiter appears as a value in the source file, then it gets wrapped in double quotes to avoid confusion.

Read csv with dictionary reader #

Creates a dictionary per record. Makes it easier to parse information by being able to specify by key.

with open('<file>.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    # print specific field
    for line in csv_reader:
        print(line['<key>'])

Write csv with dictionary writer #

With dictionary writer, need to specify field names.

If there are extra fields, specify extrasaction='ignore' to ignore them.

with open('<file>.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    with open('<target>.csv', 'w') as new_file:
        fieldnames = ['<key1>', '<key2>']
        csv_writer = csv.DictWriter(new_file, fieldnames=fieldnames, delimiter='\t', extrasaction='ignore')

        csv_writer.writeheader() # explicitly specify to write headers
        
        for line in csv_reader:
            csv_writer.writerow(line)

Modifying content #

If there’s anything that needs to be changed record by record, it should happen before the .writerow(line) call.

For instance, if we want to fill a column with a set of 0’s to achieve a desired character length for a string:

with open('<file>.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    with open('<target>.csv', 'w') as new_file:
        fieldnames = ['<key1>', '<key2>']
        csv_writer = csv.DictWriter(new_file, fieldnames=fieldnames, delimiter='\t', extrasaction='ignore')

        csv_writer.writeheader() # explicitly specify to write headers
        
        for line in csv_reader:
            line[fieldnames[0]] = line[fieldnames[0]].zfill(10) 
            csv_writer.writerow(line)

Pandas write to csv #

Alternatively, Pandas can be used to write a dataframe to csv.

import pandas as pd

some_dataframe("<path to file>.csv", index=False)

The index=False piece removes the index.

Resources #