A Cautionary Tale of Open Data

Disclaimer: some of the ideas in this blog post are not mine. I had lunch today with a good friend and colleague and he said some stuff that made real sense, so I thought I would write it up and share it. I'm including this disclaimer because it gives me two things: it covers my butt if he finds out I've discussed his ideas and wants to be credited (which he will be), and it lets me pass them off as someone else's ideas if you disagree. A win-win situation, if you're into such phrases. Edit: Jim Morton (aka @premierkissov) is the man behind the chat behind the blog.

So, a cautionary tale….

Neil* likes to dabble with creating websites. He's not bad at it, but he's certainly no genius. His local council, Southfordsley District Council*, have just started publishing some open data. It's not a bad site, with some interesting Excel spreadsheets, the odd RSS feed and a few shapefiles. In a flash of inspiration, Neil decides to build a website using some of this open data (because that's what all the cool geeks are talking about) to provide a useful service to the residents of Southfordsley. How inspiring and fabulous, I hear you say. What a lovely chap with a heart of gold, devoting his time and skills to do something for me, my hero.

So Neil wants to build an application that pulls together information about the council's leisure facilities: a map of their locations, usage statistics and a schedule of classes and swim sessions. That way, the lucky residents of Southfordsley can, with a flick of the wrist and a click of the mouse, decide whether or not to go swimming today at 1pm, because the trend shows this is a quiet time of day at their nearest pool and it's before the school lessons take place. Hopefully the changing room floor will be relatively free of pubic hair. Heaven!

So Neil sets about building this modern technological marvel. The swimming pool usage statistics are on the website in Excel format, so that is nice and easy, and so are the locations of all the swimming pools. The opening times, however, are not on the open data site, so Neil flexes his scraper muscles and collects this data with some funky snake-related programming language called Python (is that related to ASP?). The swimming sessions are not available as open data either, so Neil scrapes all of this as well. This guy really knows how to party!

Lovely. The site is pulled together with a bit of code here and a bit of code there, and before you can say “Public Data Corporation” the new “Swimming in Southfordsley” is ready for launch. Word spreads quickly and very soon he is receiving well over 10 hits per day. Yes, you heard me, 10 hits! He is a local hero; there is even a little piece about him in the local paper, with a photo of him wearing Speedos and a swimming hat and holding his laptop.

But oh, those salad days didn't last long. Within a couple of months things had changed and poor Neil was no longer “resident of the month”. The District Council hadn't taken data scrapers into account and merrily changed the tables on its website willy-nilly, without a second thought for “Swimming in Southfordsley”. Very quickly Neil's website became out of date, and that's where the real trouble started. John* turned up thinking he was going to Swim Fit but ended up in Water Babies, 25-year-old Marilyn* wanted to do bums, tums and thighs and found herself in the 50+ men-only session (they didn't mind too much), and when Steven* turned up, the pool was shut. The town was in uproar, and with his head hanging in shame, Neil shut down his website, never to utter the words “open data” again.

*All names have been changed to protect identities.

So, what is the moral of this story? Open data is not straightforward and it will only be done correctly if it is done in a sustainable manner. Data needs to be taken from source, not rehashed into a spreadsheet that is out of date before you have even published it. Consideration needs to be given to what people might do with the data, it needs to be easy for them to do something with it, and we need to ensure the data stream is durable and reliable. Would you use a website if you knew it was displaying incorrect information? No, you would go somewhere else.

I don’t have the answers, but I do know that open data won’t survive if councils believe throwing a few spreadsheets at us is the way forward.

We also need to know what people are doing with the data. Someone has already suggested in a blog post somewhere (sorry, I've forgotten where now) creating a GetTheApp site where you can post what you have built with the data. A great idea that could show what can be done.

The work Tim Davies is doing with his http://opendatacookbook.net/ site is fabulous and a much-needed resource. But councils, please don’t make the fruits of his labour rotten because the data is a bit off.


3 Responses

  1. Councils and independent developers need to be talking to each other and making each other aware of their various aims, aspirations and constraints.

    If councils use their own datastores/websites as the definitive source for data for internal projects then they’re far more likely to be publishing that data in a useful format and keeping it up to date.

    If data can be pulled live from a stable URL then developers can code their apps to automatically keep the data up to date by refreshing the file as often as necessary. Once a day wouldn’t be unreasonable.

    While this is (hopefully) a contrived example, you can list the opening times of your council facilities on http://opening-times.co.uk/ which has a very useful API for developers.

  2. Can you say “brittle”?

    Perhaps Neil would have been better off putting in a formal FOI request.

    This might have:

    - Got the council to sit up and listen
    - Identified the source of the data, i.e. the person responsible for updating their pool session data

    and perhaps led to a meaningful relationship between them, and perhaps the publication of the data as CSV or similar.

    Except of course that people leave and move around in councils.

  3. “Perhaps Neil would have been better off putting in a formal FOI request.”

    Please god, no!

    As Adrian says, councils need to talk to the consumers/developers and vice versa. I'm now trying to figure out how we encourage local people (not the usual suspects) to engage.

    From the council point of view, it is all a bit suck-it-and-see at the moment. We’ve got some idea of who’s out there, but not a complete picture.

    And, we’ve only got limited resources – that is, a small bit of me – to think about presentation, structure, linked data, formats and all that – y’know – trivial stuff.

    Great post, btw!
