Python JSON

Just some simple pointers on how to work with JSON, originating either from APIs or JSON files.

Modules #

First up, load some tooling to interact with JSON. I most frequently utilize json or pandas.

import json
import pandas as pd

Reading JSON from a single valid file #

A .json file might represent a single complete JSON object (as opposed to a single file with multiple complete JSON objects).

Assume there’s an example.json file.

with open('example.json') as f:
    data = json.load(f)

Reading JSON file with multiple lines #

Individual .json files might contain multiple JSON objects, one per line. Each line is valid JSON, but the file as a whole is not.

Each line has to be read in separately, one at a time, and converted to a dict with json.loads. These dict objects can then be added to a single list.

some_list = []

with open('<file>') as f:
    for json_object in f:
        some_dict = json.loads(json_object)
        some_list.append(some_dict)

# extract elements
for thing in some_list:
    print(thing["<key1>"], thing["<key2>"])

Reference: PYnative: Python Parse multiple JSON objects from file
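
As a self-contained sketch of the loop above, the snippet below writes a small line-delimited file to a temporary directory and reads it back. The file name, the sample records, and the blank-line check are illustrative assumptions, not part of the original example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample: one JSON object per line, plus a stray blank line.
sample = '{"id": 1, "name": "a"}\n\n{"id": 2, "name": "b"}\n'

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.jsonl"  # assumed file name
    path.write_text(sample, encoding="utf-8")

    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # json.loads raises on an empty string, so skip blank lines
                continue
            records.append(json.loads(line))

# extract elements
for record in records:
    print(record["id"], record["name"])
```

Skipping blank lines is a defensive choice; files exported by other tools sometimes end with an empty line.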

Reading JSON data from an API #

import json
from urllib.request import urlopen

with urlopen("https://<API path>?format=json") as response:
    source = response.read() # raw response body; read() returns bytes, which json.loads accepts directly

data = json.loads(source)

source in its raw form might not be very readable. To make it more readable, add some formatting:

print(json.dumps(data, indent=2))
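
The request and parsing steps above can be wrapped in a small helper. This is only a sketch: the name fetch_json and the 10-second default timeout are assumptions made for illustration, and it does no error handling for failed requests or invalid JSON.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Fetch a URL and parse the response body as JSON.

    Hypothetical helper; the timeout default is an arbitrary choice.
    """
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes
    # json.loads accepts bytes (UTF-8, UTF-16 or UTF-32 encoded) as well as str
    return json.loads(raw)
```

It would be called the same way as before, for example fetch_json("https://<API path>?format=json"), and returns the parsed Python object.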

Extracting data #

data['people'] # this returns just what's in the 'people' key, which is a list of dictionaries

Looping through data #

for person in data['people']:
    print(person)

If we want to loop and extract a specific value:

for person in data['people']:
    print(person['name'])

If we want to grab multiple values:

for person in data['people']:
    print(person['name'], person['emails'])

Structuring data pulled from JSON #

Let’s say the goal is to take some JSON, restructure it, and put it into a dictionary object.


pricing = dict()

for thing in data['details']:
    name = thing['details']['name']
    price = thing['details']['price']
    pricing[name] = price

print(pricing['<some key in the dictionary object>'])
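
To make the shape of the input concrete, here is the same restructuring run against a made-up payload. The nesting (a 'details' list whose items each carry their own 'details' dict) mirrors the loop above, but the sample values are purely hypothetical; a real API may be structured differently.

```python
# Hypothetical payload; the real structure depends on the API.
data = {
    "details": [
        {"details": {"name": "widget", "price": 2.5}},
        {"details": {"name": "gadget", "price": 10.0}},
    ]
}

# Build a name -> price lookup, one entry per item.
pricing = {}
for thing in data["details"]:
    inner = thing["details"]
    pricing[inner["name"]] = inner["price"]

# The same mapping written as a dict comprehension.
pricing_alt = {
    thing["details"]["name"]: thing["details"]["price"]
    for thing in data["details"]
}

print(pricing["widget"])  # 2.5
```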

Dump data into JSON object #

For instance, delete the phone key from each person, then serialize the result:

for person in data['people']:
    del person['phone'] # modifies data in place

new_example_json = json.dumps(data)

Note that dumps is “dump S” for “dump (JSON) string”.

Add some formatting to dump by adding indents to each level.

new_example_json = json.dumps(data, indent=2)

Optionally, sort keys.

new_example_json = json.dumps(data, sort_keys=True)
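
The two options can also be combined. A small sketch with a made-up record (the names and values are placeholders):

```python
import json

# Hypothetical record, keys deliberately out of alphabetical order.
data = {"name": "Ada", "emails": ["ada@example.com"], "age": 36}

compact = json.dumps(data)                           # single line, insertion order
pretty = json.dumps(data, indent=2, sort_keys=True)  # indented, keys sorted

print(pretty)
```

Both strings parse back to the same Python object; only the text layout differs.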

Write to json file #

Assuming data is a JSON-serializable Python object, such as a dict or a list:

with open('target.json', 'w') as f:
    json.dump(data, f)

For readability, add some formatting, such as indents:

with open('target.json', 'w') as f:
    json.dump(data, f, indent=2)
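
A quick way to check the written file is to read it back and compare. The sketch below uses a temporary directory and a made-up data value so that it runs without touching any real files.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical data; any JSON-serializable object would do.
data = {"people": [{"name": "Ada", "emails": ["ada@example.com"]}]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "target.json"

    # Write with indentation for readability.
    with open(target, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)

    # Read the file back into a Python object.
    with open(target, encoding="utf-8") as f:
        restored = json.load(f)

print(restored == data)  # True
```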

json list to Pandas dataframe #

Sometimes the contents of a JSON object include a list of dictionaries. It’s often useful to convert that list to a flat dataframe.

# json_list is a list of dictionaries; each dictionary becomes one row
df = pd.DataFrame.from_records(json_list)

Reference: Stack Overflow

If a column in the dataframe holds dictionaries, it can be flattened into distinct columns with pd.json_normalize.
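
A minimal sketch of that flattening step, assuming a hypothetical json_list in which every record has a nested 'contact' dictionary (the field names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical records with one nested dictionary each.
json_list = [
    {"name": "Ada", "contact": {"email": "ada@example.com", "city": "London"}},
    {"name": "Grace", "contact": {"email": "grace@example.com", "city": "Arlington"}},
]

# One row per record; the 'contact' column holds dict objects.
df = pd.DataFrame.from_records(json_list)

# Expand the dict column into its own columns ('email', 'city').
expanded = pd.json_normalize(df["contact"].tolist())

# Swap the dict column for the expanded columns.
flat = df.drop(columns="contact").join(expanded)

print(flat)
```

Passing json_list straight to pd.json_normalize is another option; in that case the nested keys come out as dotted column names such as contact.email.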

Resources #