Python JSON #
Just some simple pointers on how to work with JSON, originating either from APIs or JSON files.
Modules #
First up, load some tooling to interact with JSON. I most frequently utilize json or pandas.
import json
import pandas as pd
Reading JSON from a single valid file #
A .json file might represent a single complete JSON object (as opposed to a single file with multiple complete JSON objects).
Assume there’s an example.json file.
with open('example.json') as f:
    data = json.load(f)
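To try this without an existing file, the sketch below first writes a small sample file and then reads it back with json.load. The sample content and the temporary directory are assumptions for illustration only; they do not come from a real example.json.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample content standing in for example.json
sample_text = '{"people": [{"name": "Ada", "phone": "555-0100"}]}'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.json"
    path.write_text(sample_text, encoding="utf-8")

    # json.load reads from an open file object and returns Python objects
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

# The top-level JSON object becomes a dict
print(type(data).__name__)
print(data["people"][0]["name"])
```

Note the difference between json.load, which takes a file object, and json.loads, which takes a string.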
Reading JSON file with multiple lines #
A single .json file might contain multiple JSON objects, one per line. Each individual line is valid JSON, but the file as a whole is not.
Each JSON object has to be read in separately, one line at a time. Convert each line to a dict with json.loads. These dict objects can then be added to a single list.
some_list = []
with open('<file>') as f:
    for json_object in f:
        some_dict = json.loads(json_object)
        some_list.append(some_dict)
# extract elements
for thing in some_list:
    print(thing["<key1>"], thing["<key2>"])
Reference: PYnative: Python Parse multiple JSON objects from file
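Here is a runnable sketch of the same line-by-line approach. The file name records.jsonl, the sample records, and the blank-line check are my own assumptions; the blank-line check is there because json.loads raises an error on an empty string, which can happen with a trailing newline at the end of a file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file content: one JSON object per line, plus a trailing blank line
lines_text = '{"id": 1, "name": "Ada"}\n{"id": 2, "name": "Grace"}\n\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "records.jsonl"
    path.write_text(lines_text, encoding="utf-8")

    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # Skip blank lines, which json.loads would reject
                continue
            records.append(json.loads(line))

for record in records:
    print(record["id"], record["name"])
```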
Reading JSON data from an API #
import json
from urllib.request import urlopen
with urlopen("https://<API path>?format=json") as response:
    source = response.read()  # raw response body from the website, returned as bytes
    data = json.loads(source)  # json.loads accepts bytes as well as str
source in its raw form might not be very readable. To make it more readable, add some formatting:
print(json.dumps(data, indent=2))
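Since a live API call can't be reproduced here, this sketch uses a hard-coded bytes value as a stand-in for what response.read() returns. The payload is made up for illustration. It shows that json.loads handles bytes directly, and that decoding to a string first gives the same result.

```python
import json

# Hypothetical response body; urlopen(...).read() returns bytes like this
source = b'{"people": [{"name": "Ada", "emails": ["ada@example.com"]}]}'

# json.loads accepts str, bytes, or bytearray (Python 3.6+)
data = json.loads(source)

# Decoding explicitly before parsing gives the same result
same_data = json.loads(source.decode("utf-8"))

# Indented output is easier to read than the raw bytes
pretty = json.dumps(data, indent=2)
print(pretty)
```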
Extracting data #
data['people'] # this returns just what's in the 'people' key, which is a list of dictionaries
Looping through data #
for person in data['people']:
    print(person)
If we want to loop and extract a specific value:
for person in data['people']:
    print(person['name'])
If we want to grab multiple values:
for person in data['people']:
    print(person['name'], person['emails'])
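Real API data often has records with missing keys, and person['emails'] raises a KeyError in that case. A small sketch of one way to handle it, using dict.get with a default. The sample data here is invented for illustration.

```python
# Hypothetical data shaped like the examples above
data = {
    "people": [
        {"name": "Ada", "emails": ["ada@example.com"]},
        {"name": "Grace"},  # this record has no "emails" key
    ]
}

names_and_emails = []
for person in data["people"]:
    # dict.get returns the default instead of raising KeyError
    emails = person.get("emails", [])
    names_and_emails.append((person["name"], emails))

print(names_and_emails)
```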
Structuring data pulled from JSON #
Let’s say the goal is to take some JSON, restructure it, and put it into a dictionary object.
pricing = dict()
for thing in data['details']:
    name = thing['details']['name']
    price = thing['details']['price']
    pricing[name] = price
print(pricing['<some key in the dictionary object>'])
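The same restructuring can also be written as a dict comprehension. The sample data below is an assumption about the shape implied by the loop above (a 'details' list whose items each hold a nested 'details' dict); adjust the keys to match the actual JSON.

```python
# Hypothetical data matching the nesting used in the loop above
data = {
    "details": [
        {"details": {"name": "widget", "price": 2.5}},
        {"details": {"name": "gadget", "price": 10.0}},
    ]
}

# Build a name -> price lookup in a single expression
pricing = {
    thing["details"]["name"]: thing["details"]["price"]
    for thing in data["details"]
}

print(pricing["widget"])
```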
Dump data into JSON object #
For instance, delete phone and re-assign:
for person in data['people']:
    del person['phone']  # modifies data in place
new_example_json = json.dumps(data)
Note that dumps is “dump S” for “dump (JSON) string”.
Add some formatting to dump by adding indents to each level.
new_example_json = json.dumps(data, indent=2)
Optionally, sort keys.
new_example_json = json.dumps(data, sort_keys=True)
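The indent and sort_keys options can be combined in one call. A quick sketch with invented sample data; it uses pop with a default rather than del, so a record without a phone key doesn't raise an error.

```python
import json

# Hypothetical sample data
data = {"people": [{"name": "Ada", "phone": "555-0100"}], "count": 1}

for person in data["people"]:
    # Like del, but does not raise KeyError if "phone" is missing
    person.pop("phone", None)

# Compact single-line string
compact = json.dumps(data)

# Indented string with keys in alphabetical order
pretty_sorted = json.dumps(data, indent=2, sort_keys=True)

print(compact)
print(pretty_sorted)
```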
Write to json file #
Assuming data is a JSON-serializable Python object (for example, a dict):
with open('target.json', 'w') as f:
    json.dump(data, f)
For readability, add some formatting, such as indents:
with open('target.json', 'w') as f:
    json.dump(data, f, indent=2)
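A quick way to check that a write worked is to read the file back and compare. This sketch writes to a temporary directory rather than the working directory; the sample data and the temporary path are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data
data = {"people": [{"name": "Ada"}], "count": 1}

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "target.json"

    # Write the object out as indented JSON
    with open(target, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)

    # Read it back in to confirm the round trip
    with open(target, encoding="utf-8") as f:
        restored = json.load(f)

print(restored == data)
```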
json list to Pandas dataframe #
Sometimes the contents of a JSON object include a list of dictionaries. It’s often useful to convert it to a flat dataframe.
for thing in json_list:
    df_thing = pd.DataFrame.from_records(thing)
Reference: Stack Overflow
If there’s a column that appears in the dataframe that’s structured as a dictionary, this can be flattened out and extracted into distinct columns in the dataframe with pd.json_normalize.
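To see the difference, this sketch builds a dataframe from a list of dictionaries both ways. The records and the 'contact' key are made up for illustration. With from_records, the nested dictionary stays in a single column; with pd.json_normalize, each nested key becomes its own dot-separated column.

```python
import pandas as pd

# Hypothetical list of dictionaries with one nested dictionary per record
json_list = [
    {"name": "Ada", "contact": {"email": "ada@example.com", "city": "London"}},
    {"name": "Grace", "contact": {"email": "grace@example.com", "city": "New York"}},
]

# The "contact" column holds whole dictionaries
df_nested = pd.DataFrame.from_records(json_list)

# Nested keys are flattened into "contact.email" and "contact.city"
df_flat = pd.json_normalize(json_list)

print(df_nested.columns.tolist())
print(df_flat.columns.tolist())
```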