This weekend, the EU published its public draft of its Business Vocabulary (along with Person and Location), to help make it easier for organisations within Europe, including governments themselves, to exchange information relating to companies.
Now if I haven’t already lost your attention, I think this is pretty important, not just to those who handle corporate data, but also to all those interested in openness in general, for three reasons:
- The vocabulary democratises the ability to share this important information, removing the need for restrictive central registers, which are inevitably tied up with process, governance and access issues;
- The results are free of IP restrictions;
- The fast, lightweight process was an example of a huge organisation (the European Commission) for once being focused on solving immediate problems rather than on grandiose undeliverables: in short, the EC went for a Minimum Viable Product.
I think it’s worth tackling these in a little more detail, but feel free to skip the first or second and jump straight to the process bit, if that’s what floats your boat.
[I should also state that OpenCorporates was a member of the Working Group that put the vocabulary together and was fairly heavily involved in the Business Vocabulary discussions, having arguably done more to open company information than any other organisation in recent history. It's also worth saying that this process wouldn't have achieved anything had it not been for the excellent work of the W3C's Phil Archer and the EC's Piotr Madziar and Vassilios Peristeras.]
1. The Business Vocabulary
Like including mathematical equations in a book, the phrase ‘business vocabulary’ is an excellent way of putting off any ‘normal’ people who might otherwise be interested. Yet like the protocols that underlie the internet, getting this sort of thing right matters, and all we really mean by business vocabulary is not some heavyweight XML schema, but a lightweight set of agreed terms and principles that remove the barriers to communication.
In this case it’s very lightweight, as there’s only really one critical part: what is a company, and how do we identify it? Despite the simplicity of the question, this is an area in which it’s easy to tie yourself into knots, should you want to. Countries have very different ways of thinking of companies (are partnerships or sole traders companies, for example?), and different ways of creating them too, sometimes handling them centrally, sometimes giving the job to regional registers or courts.
The Business Vocabulary neatly sidesteps this, instead focusing on two core elements: a Legal Entity and a Formal Identifier. If you’re going to exchange information about a company it needs to be a legal entity, and it needs to have a (single) formal identifier. And that’s pretty much it. The only thing to add is that the formal identifier is made up of two parts: the identifier (e.g. a ‘company number’) and an issuing authority (e.g. a company register), which would ideally be identified by a URI.
See, that wasn’t so bad, was it? ;-)
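For the more code-minded, here is a minimal sketch of those two core elements in Python. The class names, field names and the example.org register URI are my own illustrative assumptions, not the terms used in the vocabulary itself; the only things taken from the description above are that a legal entity carries a single formal identifier, and that the identifier is the pair of a number and an issuing authority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormalIdentifier:
    # Hypothetical field names; the vocabulary defines its own terms.
    identifier: str         # e.g. a 'company number'
    issuing_authority: str  # ideally a URI identifying the company register


@dataclass(frozen=True)
class LegalEntity:
    legal_name: str
    formal_identifier: FormalIdentifier  # exactly one per legal entity


def same_entity(a: LegalEntity, b: LegalEntity) -> bool:
    """Two records describe the same company if both parts of the identifier match."""
    return a.formal_identifier == b.formal_identifier


# Placeholder register URI, purely for illustration.
register = "https://register.example.org/uk"
a = LegalEntity("Example Widgets Ltd", FormalIdentifier("01234567", register))
b = LegalEntity("EXAMPLE WIDGETS LIMITED", FormalIdentifier("01234567", register))
print(same_entity(a, b))  # True: the names differ, but the formal identifier is the same
```

The point of the sketch is that matching happens on the identifier pair rather than on the name, which is exactly what lets two registers talk about the same company without a central clearing house.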
There are a handful of other properties that are listed, including the date of issue of the identifier, the registered address, company type, some of which are more clearly defined than others, but really that’s it. But already, it allows company registers around Europe to start publishing their data, and consuming other company registers’ data (for example, to understand the status of home companies for foreign branches they have registered) without the need for a highly centralised clearing house with its own closed system of data exchange.
It’s also worth stressing that this solution is not tied to any particular representation. It could be turned into a string identifier, linked data, or XML of some sort. Whichever is used, transformation from one to the other should be easy.
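To make that concrete, here is a rough sketch that takes one identifier and writes it out both as a single string and as a scrap of XML, then reads the XML back. The element names, the string layout and the example.org URI are invented for illustration and are not part of the vocabulary; the sketch only shows that moving between representations is a few lines of work.

```python
import xml.etree.ElementTree as ET


def to_string_id(authority_uri: str, identifier: str) -> str:
    """Join the two parts into one string (an assumed layout, not a standard one)."""
    return f"{authority_uri.rstrip('/')}/{identifier}"


def to_xml(authority_uri: str, identifier: str) -> str:
    """Serialise the two parts as XML, using made-up element names."""
    root = ET.Element("formalIdentifier")
    ET.SubElement(root, "identifier").text = identifier
    ET.SubElement(root, "issuingAuthority").text = authority_uri
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text: str) -> tuple:
    """Recover (authority_uri, identifier) from the XML produced by to_xml."""
    root = ET.fromstring(xml_text)
    return root.findtext("issuingAuthority"), root.findtext("identifier")


authority = "https://register.example.org/uk"  # placeholder register URI
number = "01234567"

print(to_string_id(authority, number))
print(to_xml(authority, number))
print(from_xml(to_xml(authority, number)) == (authority, number))
```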
2. Open data and open standards
One of the best outcomes of the process is that the resultant Business Vocabulary is genuinely open, unencumbered by IP restrictions. More than this, however, the whole process was focused on this outcome, with all agreeing to this from the start. (In fact all participants were required to explicitly agree that their contributions would be free of IP restrictions, meaning the contributed use cases and the discussions on them can also be openly published.)
This means, for example, that the list of identifying authorities also needs to be free of IP restrictions, and it’s this sort of detail that really matters when we’re talking about open standards. One solution, particularly if the vocabulary is to be used outside the EU (which it certainly could be), would be for the W3C to maintain and publish this list, given its interest in the semantic web and IP-free solutions.
3. Minimum Viable Product process
Although it was never called this, the concept of a Minimum Viable Product seemed embedded in the process from the start, and it took a very different route to other governmental ones I’ve been involved with. We were encouraged to make contributions on a wiki, conference calls were held weekly with a strict one-hour limit, and we were given a target deadline of the end of January (we started in November). No long meetings. No backroom deals. Admittedly by technology startup standards this may be slow, but by government bureaucracy standards this was definitely agile, and the tight timings really focused people’s minds on results.
So congratulations Phil & co, and let’s hope that we not only get some useful feedback on the vocabulary, but that other governmental organisations can learn from the process too.