One of the key aspects of OpenCorporates has always been that word ‘open’. Open here means something very specific. It means freedom (as in free speech), not just free beer.
The open source movement has travelled this road before, not without some struggle, and so one of the key moments in the early days of the open data community was to quickly agree on what it meant by ‘open’ – the result was the Open Definition:
A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.
However, when open data becomes a sexy meme, seized upon by governments that don’t even have a Freedom of Information law, still less actively publish open data, and by companies whose idea of open data is ‘all rights reserved’, it’s important that we emphasise why this is so critical, and why it sets OpenCorporates apart.
In a world of Big Data, when power comes from the ability to combine data, unless you have the power to use, reuse and redistribute, you are on the powerless side – whether you are a citizen, a small company, an NGO, or a government department.
And openness is about another thing too – provenance, or where you’ve got the data from and when (it’s why Wikipedia keeps a record of who edited what). With a complex and important dataset such as companies, which requires a degree of crowd-sourcing, it’s even more important that you show the provenance, and make that available as open data too.
These then have been the guiding principles behind OpenCorporates from the beginning, and why openness as in free speech (although we like free beer too) is so important to us, and why we’ve concentrated on this and on working with the community to add more countries, rather than sexy visualisations.
But despite this, it matters how easy it is to reuse the data. Our Google Refine reconciliation service was (and remains) groundbreaking, but to be honest, there was a fair bit of scope for improvement in other areas. So over the past few weeks, we’ve been working on making it easier to get the information out.
The first step is exposing everything you see on the web page as data, and while we have always done this for companies, we’re now rolling it out sitewide. What does this mean in real terms? It means that if you want the latest filings for a company as data, you can have them; if you want to know (as data) the source of the official journal notice about a company, you’ve got it. You can see this information at the bottom right of each page, together with the ways we make it available (also as RSS feeds and RDF in places).
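To give a rough idea of what ‘the page as data’ can look like from a script, here is a minimal Python sketch. It assumes the data version of a page lives at the same address with a format extension such as `.json` appended; that convention, the helper names `data_url` and `fetch_data`, and the example address are all illustrative assumptions, so check the links at the bottom right of the page you are interested in for the formats actually on offer.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def data_url(page_url: str, fmt: str = "json") -> str:
    """Build the address of the data version of a web page.

    Assumption (not confirmed by the post): the data version sits at the
    same path as the page, with a format extension appended.
    """
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")  # drop a trailing slash before adding the extension
    return urlunsplit(
        (parts.scheme, parts.netloc, f"{path}.{fmt}", parts.query, parts.fragment)
    )


def fetch_data(page_url: str) -> dict:
    """Download and decode the JSON version of a page (hypothetical helper)."""
    with urlopen(data_url(page_url, "json")) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative address only; substitute the page you are looking at.
    print(data_url("http://opencorporates.com/companies/gb/00102498"))
```

Keeping the address-building separate from the download means the same helper could be pointed at other formats (for example `fmt="rdf"`) where a page offers them.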
We’re now working on ways of making the faceted search function available as data, and would welcome feedback on that or any other aspects of the API. And of course, with every API call, and every contribution you make to OpenCorporates, every tweet about a company on OpenCorporates, you can be sure you’re contributing to a database that’s not only free as in free beer but free as in free speech too.
Oh, and that milestone? Less than two months after hitting 20 million companies, we’ve broken the 25 million mark, and are fast heading for 30 million.