Towards a global lookup service for corporate ids

We’ve often stated that the goal of OpenCorporates is simple (but huge): to create an openly licensed database containing every corporate legal entity in the world, and provide open URI ids for them. We base this URI on the ID issued by the company register, as this brings multiple benefits:

  • No monopoly ID system: Because we don’t produce the IDs, we can’t claim any IP rights in the URI, which allows it to be used freely, without restriction and without even any reference to us.
  • Knowable and useful: If you know the URI, you know the company number, and vice versa. This makes it suitable as an ID even if we haven’t yet imported the data for that jurisdiction.
  • A route to more data: Critically, any database of companies shouldn’t be a dead end but a route to other sources of information, and the entry on the company register is one of the most important of these, usually having statutory filings, current status, and occasionally even director information.
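The “knowable” property amounts to a reversible mapping between the register’s own company number and the URI. The minimal Python sketch below assumes OpenCorporates company URIs take the form http://opencorporates.com/companies/{jurisdiction_code}/{company_number}, a pattern this post doesn’t spell out; the helper names and the company number are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: company URIs follow /companies/{jurisdiction_code}/{company_number}.
BASE = "http://opencorporates.com"


def company_uri(jurisdiction_code: str, company_number: str) -> str:
    """Build a company URI directly from the ID issued by the company register."""
    return f"{BASE}/companies/{jurisdiction_code}/{company_number}"


def parse_company_uri(uri: str) -> tuple[str, str]:
    """Recover (jurisdiction_code, company_number) from a company URI."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "companies":
        raise ValueError(f"not a company URI: {uri!r}")
    return parts[1], parts[2]


# Hypothetical company number, used only to show the round trip.
uri = company_uri("gb", "01234567")
print(uri)                     # http://opencorporates.com/companies/gb/01234567
print(parse_company_uri(uri))  # ('gb', '01234567')
```

Nothing in the mapping needs a call to OpenCorporates, which is what makes the URI usable even before a jurisdiction’s data has been imported.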

However, one of the questions we often get asked is about other identifier systems. Couldn’t, shouldn’t, mightn’t we use these instead, and if you are using them, how can OpenCorporates help tie them together?

And it’s true, there are a multitude of them – tax numbers, for example, ticker codes, charity registration numbers, etc – some of which at first, particularly from a US perspective, appear to do the job of acting as a corporate identifier. We covered in detail the problems with this approach in this blog post for the Sunlight Foundation, so we won’t go through them again here, other than to stress that what they identify isn’t the corporate entity but typically a record of that corporate entity’s activity or registration in a specific field (e.g. being given tax-exempt status or a banking licence, or the issuance of a particular type of security).

That’s not to say, however, that these identifiers aren’t useful. In fact they are a vital link between the corporate entity and some of those other activities: e.g. it’s really useful to know if a company has a banking licence, and if so what the ID of that licence is. We’ve been importing and matching some IDs for a while, as this screengrab of Bermuda company Signet Jewelers Limited shows. As you can see, we’ve matched it to an entry on the US Securities and Exchange Commission’s EDGAR register.
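The distinction above, between an entry on some other register and the legal entity it may be matched to, can be captured in a small data model. This is a hypothetical Python sketch, not OpenCorporates’ actual schema: each entry records the identifying system’s short code, the ID, and an optional matched company URI, and a small index answers lookups in both directions. The “vat” short code and all example values are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RegisterEntry:
    """An entry on another register (e.g. a securities filer or a charity).

    It identifies a registration or activity, not the legal entity itself.
    """
    system: str                                # short code for the identifying system
    identifier: str                            # ID issued by that system
    name: str                                  # name as recorded on that register
    matched_company_uri: Optional[str] = None  # None until matched to a company


class IdentifierIndex:
    """Hypothetical in-memory index tying register entries to companies.

    Assumes each (system, identifier) pair is added only once.
    """

    def __init__(self) -> None:
        self._by_id: Dict[Tuple[str, str], RegisterEntry] = {}
        self._by_company: Dict[str, List[RegisterEntry]] = {}

    def add(self, entry: RegisterEntry) -> None:
        self._by_id[(entry.system, entry.identifier)] = entry
        if entry.matched_company_uri is not None:
            self._by_company.setdefault(entry.matched_company_uri, []).append(entry)

    def company_for(self, system: str, identifier: str) -> Optional[str]:
        """Return the matched company URI, or None if unknown or unmatched."""
        entry = self._by_id.get((system, identifier))
        return entry.matched_company_uri if entry else None

    def identifiers_for(self, company_uri: str) -> List[RegisterEntry]:
        """List every register entry matched to a company."""
        return list(self._by_company.get(company_uri, []))


index = IdentifierIndex()
company = "http://opencorporates.com/companies/xx/HYPOTHETICAL-1"
index.add(RegisterEntry("cik", "1111111", "EXAMPLE HOLDINGS LTD", company))
index.add(RegisterEntry("vat", "XX999999999", "Example Trading"))  # not yet matched
print(index.company_for("cik", "1111111"))      # http://opencorporates.com/companies/xx/HYPOTHETICAL-1
print(index.company_for("vat", "XX999999999"))  # None
```

Keeping the match optional lets unmatched IDs sit in the index without pretending they identify a company.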

Now, we’ve gone a tiny step further and started to explicitly list the alternative identifiers for companies where we’ve got that data, as you can see below:

But we’ve not stopped there. We’ve also created web pages for each of those IDs (and, for those in the linked data community, dereferenceable URIs, although we’re still working on the RDF representation):

A couple of important points:

  • Notice the URL: http://opencorporates.com/identifiers/cik/832988. You can see that it’s based only on the ID issued by the identifying system (in this case 832988) and a ‘short code’ for that system, in this case ‘cik’ for the US Securities and Exchange Commission’s Central Index Key. That means that, just like the URLs and URIs for the companies, you can use these identifiers without reference to OpenCorporates, use them to go directly to the register, and avoid the problems of monopoly IDs. Another example is http://opencorporates.com/identifiers/ccew/292326 (292326 is the identifier on the Charity Commission for England & Wales register, aka the Charity Number).
  • You’ll notice that we’re also showing the associated piece of data we have for that ID and, through that, if we’ve matched it, the company we believe it relates to. That’s important, both for clarifying the difference between an entry on a register and the legal entity, and for providing a link to other data.
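Because an identifier URL is built only from the system’s short code and the ID it issued, it can be taken apart without asking OpenCorporates for anything. A minimal Python sketch using the two example URLs from the list; the function name is hypothetical, and accepting https as well as http, and treating the host case-insensitively, are assumptions rather than documented behaviour.

```python
from urllib.parse import urlparse


def parse_identifier_url(url: str) -> tuple[str, str]:
    """Split an identifier URL into (system short code, identifier)."""
    parsed = urlparse(url)
    # urlparse lower-cases .hostname, so the host comparison is case-insensitive.
    if parsed.scheme not in ("http", "https") or parsed.hostname != "opencorporates.com":
        raise ValueError(f"not an OpenCorporates URL: {url!r}")
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "identifiers":
        raise ValueError(f"not an identifier URL: {url!r}")
    return parts[1], parts[2]


for url in (
    "http://opencorporates.com/identifiers/cik/832988",
    "http://opencorporates.com/identifiers/ccew/292326",
):
    print(parse_identifier_url(url))
# ('cik', '832988')
# ('ccew', '292326')
```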

There’s a list of the identifying systems for which we’ve imported at least some data at http://OpenCorporates.com/identifiers (and, just like the individual identifiers, there are dereferenceable URIs for each of these systems for the linked data community), but please do let us know of others we should be including (we’re looking at the US FDIC bank register at the moment).

We’ll also be making this available through our open API in the next versioned release, but in the meantime you can already access the information as JSON and XML, simply by adding ‘.json’ or ‘.xml’ to the URL, for example, http://opencorporates.com/identifiers/cik/832988.json and http://opencorporates.com/identifiers/cik/832988.xml.
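Fetching those representations needs nothing beyond the Python standard library. The sketch below builds the URL by appending the suffix, as described above, and decodes the JSON; it makes no assumptions about the fields in the response, whose structure isn’t described here, and it assumes the URLs still behave as this post describes. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE = "http://opencorporates.com/identifiers"


def identifier_data_url(system: str, identifier: str, fmt: str = "json") -> str:
    """Append '.json' or '.xml' to an identifier URL."""
    if fmt not in ("json", "xml"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE}/{system}/{identifier}.{fmt}"


def fetch_identifier_json(system: str, identifier: str, timeout: float = 10.0):
    """Download and decode the JSON view (performs a live HTTP request)."""
    with urlopen(identifier_data_url(system, identifier), timeout=timeout) as resp:
        return json.load(resp)


print(identifier_data_url("cik", "832988"))         # http://opencorporates.com/identifiers/cik/832988.json
print(identifier_data_url("cik", "832988", "xml"))  # http://opencorporates.com/identifiers/cik/832988.xml
# data = fetch_identifier_json("cik", "832988")  # uncomment to make a network call
```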

It’s at an early stage and very much an experimental feature, yet it is already showing great promise. Of course, we’ve imported only a fraction of the ID systems out there, and many of the IDs we have imported we haven’t yet matched to corporate entities (e.g. virtually all the EU VAT numbers), usually because the information needed to match them isn’t made available in a consistent and open way. There are also bound to be useful comments about how we could improve things, or about where there are bugs (that’s one of the benefits of doing open data).

But, critically, just by adding these simple features and creating open URIs, we make possible the development of that holy grail of corporate data: a global and open identifier lookup system.