Saturday, January 13, 2018

When json.dumps(json.loads(x)) != x

I hit a minor bug in some code where it seemed like .readlines() was stripping the newlines and .writelines() wasn't putting them back in. I started adding them back manually, but it irked me.

Following up on a Twitter conversation about it, I saw that it wasn't actually readlines and writelines that were the problem. When I read the data, I was doing a json.loads() to turn the data from a string into a dictionary, and then later using json.dumps() to turn that (now modified) dictionary back into a string. See the problem yet?

[Image: code demonstrating that loading and then dumping a JSON string with newlines removes them]

json.loads turns the string, however messy, into a nice, neat Python object. But json.dumps writes a fresh, compact JSON string from that object, with no memory of how the original was formatted. So any extra whitespace around the data, including those newlines I expected, gets dropped in the process.
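Here is a minimal sketch of that round trip. The input string is a made-up example, not the data from my actual code, and it assumes json.dumps is called with its default arguments.

```python
import json

# Hypothetical input: an indented JSON object with a trailing newline.
original = '{\n  "name": "example",\n  "count": 1\n}\n'

# Whitespace around and between the JSON tokens is ignored while parsing.
data = json.loads(original)

# With default arguments, dumps writes a single line and no trailing newline.
round_tripped = json.dumps(data)

print(repr(original))
print(repr(round_tripped))
print(round_tripped == original)          # the strings differ
print(json.loads(round_tripped) == data)  # the data is the same
```

The data survives the round trip; only the formatting is lost. If the newline matters, it has to be added back after json.dumps.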

I'm currently still of the opinion that readlines should strip out those newlines, and writelines should put them back in. If you're counting the newline character as the way you know to go on to a new thing, then how can it also be part of that first thing? Shouldn't it behave more like .split('\n')? But I'm open to convincing on this.
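For comparison, here is a small sketch of how these calls behave today, using io.StringIO as a stand-in for a real file. The sample text is made up for illustration.

```python
import io

# Hypothetical two-line text, standing in for a file's contents.
text = "first\nsecond\n"

# readlines keeps the newline at the end of each line.
lines = io.StringIO(text).readlines()

# split('\n') drops the newlines but leaves an empty string at the end.
split_parts = text.split("\n")

# splitlines drops the newlines without the trailing empty string.
bare_lines = text.splitlines()

# writelines adds nothing, so lines from readlines round-trip exactly...
kept = io.StringIO()
kept.writelines(lines)

# ...while lines without newlines run together.
joined = io.StringIO()
joined.writelines(bare_lines)

print(lines)
print(split_parts)
print(bare_lines)
print(repr(kept.getvalue()))
print(repr(joined.getvalue()))
```

So readlines and writelines are at least consistent with each other: what one reads, the other writes back unchanged. The behavior I was asking for is closer to splitlines on the way in and "\n".join on the way out.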