Python file objects

Open file

f = open('<path to file>')

Optionally pass a second argument to specify whether the file is for reading ('r', the default), writing ('w'), appending ('a'), or reading and writing ('r+').

Open for reading:

f = open('<path to file>', 'r')

File name:

print(f.name)

Mode the file was opened with:

print(f.mode)

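When reading the file in increments, f.tell() returns the current position, i.e. how far the file has been read. In binary mode this is a byte offset; in text mode it is an opaque number that is only meaningful when passed back to f.seek():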

print(f.tell())
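A minimal sketch tying name, mode, and tell() together. It assumes a throwaway file created with the standard tempfile module in place of '<path to file>'; the helper describe_file is hypothetical.

```python
import os
import tempfile

def describe_file(path):
    """Hypothetical helper: report name, mode, and position before/after a partial read."""
    with open(path, 'r', encoding='utf-8') as f:
        before = f.tell()   # 0: nothing has been read yet
        f.read(5)           # read the first 5 characters
        after = f.tell()    # text-mode tell() is opaque, but for plain ASCII it matches the byte offset
        return f.name, f.mode, before, after

# Create a small demo file to inspect (stands in for '<path to file>')
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as tmp:
    tmp.write('hello world\n')

name, mode, before, after = describe_file(demo_path)
print(name == demo_path, mode, before, after)  # True r 0 5
os.remove(demo_path)
```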

Close file

After a file is opened, it needs to be closed.

An unclosed file keeps holding an operating-system file handle, and buffered writes may not be flushed to disk, which can lead to resource leaks and errors.

f.close()
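When closing by hand, a try/finally block makes sure close() runs even if the work in between raises. A minimal sketch, assuming an empty temp file stands in for '<path to file>':

```python
import os
import tempfile

# Empty demo file so the example runs anywhere
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, 'r')
try:
    contents = f.read()   # work with the file
finally:
    f.close()             # runs even if read() raises

print(f.closed)  # True
os.remove(path)
```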

Context manager

Usually, working with files through a context manager (a with block) is preferable.

The benefit: the file is available inside the block, and once the block finishes, the file is closed automatically, even if an exception was raised inside it. Super sanitary.

with open('<path to file>', 'r') as f:
    pass
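A quick check of the "closes even on an exception" behaviour, assuming an empty temp file as the target; the ValueError is raised deliberately just for the demonstration:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path, 'r') as f:
        raise ValueError('something went wrong inside the block')
except ValueError:
    pass

print(f.closed)  # True: the with statement closed the file despite the exception
os.remove(path)
```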

Reading small files

Reading the whole file with .read() is OK if the file is small, since the entire contents are loaded into memory at once:

with open('<path to file>', 'r') as f:
    f_contents = f.read()
    print(f_contents) # prints contents of file

.readlines() returns a list of the file's lines, with each line as a distinct element (including its trailing newline):

with open('<path to file>', 'r') as f:
    f_contents = f.readlines()
    print(f_contents) # prints a list of lines, each keeping its trailing '\n'
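Because .readlines() keeps the newline characters, a common alternative when you want bare lines is .read().splitlines(). A small comparison, assuming a temp file with two lines in place of '<path to file>':

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('first line\nsecond line\n')

with open(path, 'r') as f:
    with_newlines = f.readlines()             # ['first line\n', 'second line\n']

with open(path, 'r') as f:
    without_newlines = f.read().splitlines()  # ['first line', 'second line']

os.remove(path)
```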

Each call to .readline() returns one line and moves on to the next:

with open('<path to file>', 'r') as f:
    f_contents = f.readline()
    print(f_contents, end='') # prints the first line; end='' stops print() adding a second newline (end defaults to '\n')

    f_contents = f.readline() # this reads the next line
    print(f_contents, end='')
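At the end of the file, .readline() returns an empty string. A blank line in the file comes back as '\n', which is truthy, so a loop can safely stop on ''. A sketch, assuming a two-line temp file:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('one\ntwo\n')

lines = []
with open(path, 'r') as f:
    line = f.readline()
    while line:               # readline() returns '' only at end of file
        lines.append(line)
        line = f.readline()
    at_eof = f.readline()     # further calls keep returning ''

os.remove(path)
print(lines, repr(at_eof))   # ['one\n', 'two\n'] ''
```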

Reading large files

with open('<path to file>', 'r') as f:
    for line in f:
        print(line, end='')

This iterates over the file object one line at a time, so the whole file is never loaded into memory at once.
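As an illustration of processing a file lazily, the hypothetical helper count_lines below counts lines while holding only one line in memory at a time (a temp file stands in for a large one):

```python
import os
import tempfile

def count_lines(path):
    """Hypothetical helper: count lines without loading the whole file into memory."""
    with open(path, 'r') as f:
        return sum(1 for _ in f)   # the file object yields one line per iteration

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('a\nb\nc\n')

n = count_lines(path)
print(n)  # 3
os.remove(path)
```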

Alternatively, pass .read() the number of characters to read.

Note that each call to .read() advances the position in the file.

with open('<path to file>', 'r') as f:
    f_contents = f.read(100)
    print(f_contents, end='')

    # this picks up where the previous chunk left off
    # if there's nothing left to read, it returns an empty string
    f_contents = f.read(100)
    print(f_contents, end='')

If we don’t know how large the file is, read chunks in a loop until .read() returns an empty string.

with open('<path to file>', 'r') as f:
    size_to_read = 10  # number of characters per chunk

    f_contents = f.read(size_to_read)

    # an empty string means the end of the file was reached
    while len(f_contents) > 0:
        print(f_contents, end='')
        f_contents = f.read(size_to_read)
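The same loop can be written more compactly with an assignment expression (Python 3.8+), since '' is falsy. This sketch wraps it in a hypothetical read_in_chunks function and uses io.StringIO as a stand-in for an open text file:

```python
import io

def read_in_chunks(f, size_to_read=10):
    """Collect chunks of at most size_to_read characters until read() returns ''."""
    chunks = []
    while chunk := f.read(size_to_read):   # '' (falsy) means end of file
        chunks.append(chunk)
    return chunks

# io.StringIO stands in for an open text file here
chunks = read_in_chunks(io.StringIO('x' * 25))
print([len(c) for c in chunks])  # [10, 10, 5]
```

The last chunk is simply shorter than size_to_read; the loop ends on the following call, which returns ''.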

Change the position in the file

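Pass f.seek() an offset to move the position to exactly that point; 0 moves it back to the start. In text mode, only 0 or a value previously returned by f.tell() is a safe offset.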

f.seek(0)
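A short demonstration of rewinding with seek(0), using io.StringIO as an in-memory stand-in for a real text file:

```python
import io

f = io.StringIO('abcdef')   # in-memory text file standing in for a real one
first = f.read(3)           # 'abc'; position is now 3
f.seek(0)                   # rewind to the start
again = f.read()            # 'abcdef': reading starts over from the beginning
print(first, again, f.tell())  # abc abcdef 6
```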

Writing to files

The proper way to write is inside a with block, with the file opened in 'w' mode:

with open('<path to file>', 'w') as f:
    f.write('<some text>') 

'w': if the file does not exist, it is created; if it already exists, it is overwritten (emptied before writing). Use 'a' instead to append to an existing file.
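A sketch contrasting 'w' and 'a', assuming a temp file in place of '<path to file>'. Note that f.write() does not add a newline, so each string includes its own '\n':

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, 'w') as f:
    f.write('first\n')      # 'w' truncates: the file now holds only this line

with open(path, 'w') as f:
    f.write('second\n')     # opening with 'w' again wipes 'first'

with open(path, 'a') as f:
    f.write('third\n')      # 'a' appends to the end instead of truncating

with open(path, 'r') as f:
    result = f.read()

os.remove(path)
print(repr(result))  # 'second\nthird\n'
```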

Writing to a file that was opened in read mode raises an error:

with open('<path to file>', 'r') as f:
    f.write('<some text>') 
io.UnsupportedOperation: not writable
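The error can be caught as io.UnsupportedOperation. A minimal sketch, assuming an empty temp file as the target:

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

try:
    with open(path, 'r') as f:
        f.write('<some text>')           # not allowed: opened with 'r'
except io.UnsupportedOperation as exc:
    message = str(exc)
    print(type(exc).__name__, message)   # UnsupportedOperation not writable
finally:
    os.remove(path)
```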

Read from one file, write to another

with open('<path to read file>', 'r') as rf: # rf == "read file"
    with open('<path to write file>', 'w') as wf: # wf == "write file"
        for line in rf:
            wf.write(line)

This pattern is useful for copying, transforming, and serializing data.
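As an example of transforming while copying, the hypothetical copy_upper function below upper-cases every line. It also shows that both files can be opened in a single with statement instead of nesting two. Temp files stand in for the read and write paths:

```python
import os
import tempfile

def copy_upper(src_path, dst_path):
    """Hypothetical example transform: copy src to dst, upper-casing every line."""
    # Both files opened in one with statement; both are closed when it exits
    with open(src_path, 'r') as rf, open(dst_path, 'w') as wf:
        for line in rf:
            wf.write(line.upper())

src_fd, src = tempfile.mkstemp(suffix='.txt')
dst_fd, dst = tempfile.mkstemp(suffix='.txt')
os.close(dst_fd)
with os.fdopen(src_fd, 'w') as tmp:
    tmp.write('hello\nworld\n')

copy_upper(src, dst)
with open(dst, 'r') as f:
    copied = f.read()
print(repr(copied))  # 'HELLO\nWORLD\n'
os.remove(src)
os.remove(dst)
```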

Interacting with images

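When working with images (or any other non-text file), you need to open them in binary mode ('rb', 'wb'), not text mode. Binary mode reads and writes bytes objects and does no decoding or newline translation.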

with open('<path to file>.jpg', 'rb') as rf: 
    with open('<path to target>.jpg', 'wb') as wf:
        for line in rf:  # in binary mode, each "line" is just a run of bytes ending in b'\n'
            wf.write(line)

Image data has no real lines, so instead of going line by line, copy it in fixed-size chunks:

with open('<path to file>.jpg', 'rb') as rf: 
    with open('<path to target>.jpg', 'wb') as wf:
        chunk_size = 4096
        rf_chunk = rf.read(chunk_size)
        
        # keep reading until nothing left to read
        while len(rf_chunk) > 0:
            wf.write(rf_chunk)
            rf_chunk = rf.read(chunk_size)
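The chunked copy can be packaged as a function and checked by comparing bytes. This sketch assumes random bytes from os.urandom stand in for real image data, and uses an assignment expression (Python 3.8+) in place of the explicit read-before-loop; copy_binary is a hypothetical name:

```python
import os
import tempfile

def copy_binary(src_path, dst_path, chunk_size=4096):
    """Copy any file (e.g. a .jpg) in fixed-size byte chunks."""
    with open(src_path, 'rb') as rf, open(dst_path, 'wb') as wf:
        while chunk := rf.read(chunk_size):   # b'' means end of file
            wf.write(chunk)

# Random bytes stand in for image data
data = os.urandom(10_000)
src_fd, src = tempfile.mkstemp(suffix='.jpg')
dst_fd, dst = tempfile.mkstemp(suffix='.jpg')
os.close(dst_fd)
with os.fdopen(src_fd, 'wb') as tmp:
    tmp.write(data)

copy_binary(src, dst)
with open(dst, 'rb') as f:
    same = f.read() == data
print(same)  # True
os.remove(src)
os.remove(dst)
```

For plain copying, the standard library's shutil.copyfileobj(rf, wf) runs an equivalent buffered chunk loop.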