We've already learned about pickle, so why do we need another way to serialize Python objects to (and deserialize them from) disk or a network connection? There are three major reasons to prefer JSON over pickle:
- When you unpickle data, you essentially allow your data source to execute arbitrary Python commands. If the data is trustworthy (say, stored in a sufficiently protected directory), that may not be a problem, but it is easy to accidentally leave a file unprotected (or to read something from the network). In those cases, you want to load data, not execute potentially malicious Python code! See the sketch after this list.
- Pickled data is not easy to read, and virtually impossible for humans to write. For example, the pickled version of {"answer": [42]} looks like this:

  (dp0
  S'answer'
  p1
  (lp2
  I42
  as.

  In contrast, the JSON representation of {"answer": [42]} is simply {"answer": [42]}. If you can read Python, you can read JSON; nearly all JSON is also valid Python syntax.
- Pickle is Python-specific. In fact, by default, the bytes generated by Python 3's pickle cannot be read by a Python 2.x application! JSON can be read by virtually any programming language - just scroll down on the official homepage (json.org) to see implementations in all major and some minor languages.
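To see why that first point matters, here is a minimal sketch, not part of the original examples and assuming Python 3, of how unpickling can run code chosen by whoever produced the data:

import pickle

class Malicious(object):
    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild the object on load; here it
        # instructs the unpickler to call print, but it could just as easily
        # call os.system or any other function.
        return (print, ("arbitrary code ran during unpickling!",))

payload = pickle.dumps(Malicious())
pickle.loads(payload)  # the call happens here, at load time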
So how do you get the JSON representation of an object? It's simple: just call json.dumps:

import json

obj = {u"answer": [42.2], u"abs": 42}
print(json.dumps(obj))
# output: {"answer": [42.2], "abs": 42}
Often, you want to write to a file or a network stream. In both Python 2.x and 3.x you can call dump to do that, but in 3.x the output must be a character stream, whereas 2.x expects a byte stream.
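As a minimal sketch (the file name data.json is just for illustration):

import json

obj = {u"answer": [42.2], u"abs": 42}

# json.dump writes directly to a file-like object; plain open(..., 'w')
# gives a character stream in Python 3 and a byte stream in Python 2.
with open('data.json', 'w') as fh:
    json.dump(obj, fh)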
Let's look at how to load what we wrote. Fittingly, the function that loads from a string is called loads, and the one that loads from a stream is called load:

import json

obj_json = u'{"answer": [42.2], "abs": 42}'
obj = json.loads(obj_json)
print(repr(obj))
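Loading from a stream works the same way; a small sketch, assuming the data.json file written above:

import json

# json.load is the stream counterpart of json.loads.
with open('data.json') as fh:
    obj = json.load(fh)
print(repr(obj))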
When the objects we load and store grow larger, we puny humans often need some hints on where a new sub-object starts. To get these, simply pass an indent size, like this:
import json

obj = {u"answer": [42.2], u"abs": 42}
print(json.dumps(obj, indent=4))
Now, the output will be a beautiful
{ "abs": 42, "answer": [ 42.2 ] }
I often use this indentation feature to debug complex data structures.
The price of JSON's interoperability is that we cannot store arbitrary Python objects. In fact, JSON can only store the following objects:
- character strings
- numbers
- booleans (True/False)
- None
- lists
- dictionaries with character string keys
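As a quick illustration, a value built only from these types survives a dumps/loads round trip unchanged (the values below are merely an example):

import json

data = {
    u"name": u"Alice",        # character string
    u"age": 30,               # number
    u"active": True,          # boolean -> true
    u"nickname": None,        # None -> null
    u"scores": [1, 2.5, 3],   # list -> array
}

# Every value above maps directly to a JSON type, so nothing is lost.
assert json.loads(json.dumps(data)) == data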
Every object that's not one of these must be converted - that includes every object of a custom class. Say we have an object alice as follows:

class User(object):
    def __init__(self, name, password):
        self.name = name
        self.password = password

alice = User('Alice A. Adams', 'secret')
then converting this object to JSON will fail:
>>> import json
>>> json.dumps(alice)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/json/__init__.py", line 236, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.3/json/encoder.py", line 191, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.3/json/encoder.py", line 249, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.3/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <__main__.User object at 0x7f2eccc88150> is not JSON serializable
Fortunately, there is a simple hook for conversion: simply pass a default function to json.dumps:

def jdefault(o):
    return o.__dict__

print(json.dumps(alice, default=jdefault))
# outputs: {"password": "secret", "name": "Alice A. Adams"}
o.__dict__
is a simple catch-all for user-defined objects, but we can
also add support for other objects. For example, let's add support for
sets by treating them like lists:
def jdefault(o):
    if isinstance(o, set):
        return list(o)
    return o.__dict__

pets = set([u'Tiger', u'Panther', u'Toad'])
print(json.dumps(pets, default=jdefault))
# outputs: ["Tiger", "Panther", "Toad"]
For more options and details (ensure_ascii and sort_keys may be interesting options to set), have a look at the official documentation of the json module. JSON support is available by default in Python 2.6 and newer; before that, you can use simplejson as a fallback.
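For instance, a small sketch of those two options, with purely illustrative values:

import json

obj = {u"name": u"Müller", u"answer": 42}

# sort_keys=True makes the key order deterministic; ensure_ascii=False
# keeps non-ASCII characters instead of escaping them as \uXXXX.
print(json.dumps(obj, sort_keys=True, ensure_ascii=False))
# output: {"answer": 42, "name": "Müller"}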