The past few months have been pretty busy at OpenCorporates, with literally hundreds of commits, improvements, bugfixes and tweaks. The underlying search has been improved (with more improvements planned), we’re now pulling in more data about companies, and we’ve added more countries too.
…Or rather, we’ve added more jurisdictions, as it’s not just countries that register companies. In the US, for example, companies are registered by the States rather than the Federal government, and last month we added the first of the US States, somewhat arbitrarily choosing Michigan, with just shy of a million companies.
This meant we needed to choose the short name structure we were going to use for the states/jurisdictions, and in the end, following suggestions from users, we went for ISO 3166-2, or rather a lower-case underscore version of it. So a Michigan company with company number 090657 (LA-Z-BOY INCORPORATED, since you ask) would have the URL of http://opencorporates.com/companies/us_mi/090657 (and in the linked data world has the resource URI of http://opencorporates.com/id/companies/us_mi/090657). Simple, predictable, guessable and meaningful.
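To make the scheme concrete, here is a minimal sketch in Python of how such a URL could be guessed from an ISO 3166-2 code and a company number. The function names (jurisdiction_code, company_url) are made up for illustration and are not part of any OpenCorporates library; only the URL patterns come from the examples above.

```python
BASE = "http://opencorporates.com"


def jurisdiction_code(iso_code: str) -> str:
    """Turn an ISO 3166-1 or 3166-2 code such as 'US-MI' into 'us_mi'."""
    return iso_code.strip().lower().replace("-", "_")


def company_url(iso_code: str, company_number: str, linked_data: bool = False) -> str:
    """Build the page URL, or the linked data resource URI, for a company.

    Hypothetical helper: it simply follows the two URL patterns quoted above.
    """
    prefix = "id/companies" if linked_data else "companies"
    return f"{BASE}/{prefix}/{jurisdiction_code(iso_code)}/{company_number}"


if __name__ == "__main__":
    print(company_url("US-MI", "090657"))
    print(company_url("US-MI", "090657", linked_data=True))
```

Note that the company number is kept as a string, so leading zeros such as those in 090657 survive.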
Of course, this meant we had to reconsider our use of ‘uk’ to represent, er, UK companies, as the ISO 3166 code for the UK is actually ‘gb’ (and, no, United Kingdom and Great Britain are not the same). After much discussion, we decided to follow the ISO standard and go for ‘gb’, and redirect all existing ‘uk’-based URLs to ‘gb’ ones. As we write, the Solr search index is rebuilding all the uk companies to appear as gb companies. It’s a big index, so for the next few hours there may be a little strangeness in the filtering, though this doesn’t affect the reconciliation service.
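For anyone holding old links, the rewrite is mechanical. The sketch below is not our actual redirect code, just an illustration of the mapping in Python; the function name redirect_target is hypothetical, the company number in the example is invented, and treating linked data URIs under /id/ the same way is an assumption.

```python
import re
from typing import Optional

# Match a legacy 'uk' jurisdiction segment in a company path, optionally
# under the linked data prefix /id/ (assumed to be redirected as well).
LEGACY_UK = re.compile(r"^(/(?:id/)?companies/)uk(/|$)")


def redirect_target(path: str) -> Optional[str]:
    """Return the 'gb' path a legacy 'uk' path should redirect to, else None."""
    new_path, count = LEGACY_UK.subn(r"\1gb\2", path)
    return new_path if count else None


if __name__ == "__main__":
    # 01234567 is a made-up company number used only for illustration.
    print(redirect_target("/companies/uk/01234567"))
    print(redirect_target("/companies/us_mi/090657"))
```

Paths for other jurisdictions return None, meaning no redirect is needed.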
The advantage of this is clear: if you use ISO 3166 or 3166-2 to identify countries and areas (and it seems to be the most widespread standard on the web), you can seamlessly guess OpenCorporates URLs. And if you come across OpenCorporates URLs (or linked data URIs) that have been used to identify companies, you will automatically know the country or area we’re referring to.
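Going the other way is just as simple. As a rough sketch (the function name jurisdiction_from_url is hypothetical, and the path layout is assumed from the example URLs in this post), this Python snippet recovers the ISO code and company number from a company URL or linked data URI:

```python
from typing import Tuple
from urllib.parse import urlparse


def jurisdiction_from_url(url: str) -> Tuple[str, str]:
    """Return (ISO code, company number) for a company URL or resource URI.

    Assumes paths of the form /companies/<code>/<number>, optionally
    prefixed with /id/ for linked data URIs.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and parts[0] == "id":
        parts = parts[1:]
    if len(parts) < 3 or parts[0] != "companies":
        raise ValueError(f"not a company URL: {url}")
    # 'us_mi' becomes 'US-MI'; a plain country code such as 'gb' becomes 'GB'.
    iso_code = parts[1].upper().replace("_", "-")
    return iso_code, parts[2]


if __name__ == "__main__":
    print(jurisdiction_from_url("http://opencorporates.com/companies/us_mi/090657"))
```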
Nice though it is to come up with good URLs and add millions more companies, we’ve also been adding more data. In an epic bit of coding, Rob wrote an importer for the WIPO trademarks (FTP, huge XML files, etc.), and as a consequence we’ve matched trademarks to 9154 companies across all the jurisdictions we’ve got companies for. And as we add more jurisdictions we’ll also be matching up more trademarks to the companies that own them.
Finally, we’ve also been hearing of OpenCorporates being used in some interesting situations to quickly search for companies, including in some offshore tax offices to help identify companies in other offshore territories :-). Do let us know of other similar uses.
p.s. We’ll be at the Open Data Campaigning Camp on March 24 in Oxford, to show how the reconciliation service can be used by campaigning groups.