This blog post is a somewhat wonkish explanation of how we model that information (in layman’s terms, how we understand the information and store it as data), and in particular, how we do that in a way that works not just for the UK data, but for similar data from other sources. In this post, we’ll list the problems we’ve faced, the choices we made, and where we’re going next with this work. We believe that it’s important to be transparent about this, and like all our modelling, the schema for this is in the public domain under an open licence.
What is in the data from Companies House?
First, it’s worth saying that the data from the UK’s PSC register reports not just who the beneficial owners are, i.e. the individuals who ultimately control companies, but also:
- companies that control other companies;
- controlling entities that are neither individuals nor companies, what they call legal_persons, for example government departments, some partnerships, and positions (e.g. the Pope);
- the fact that there is no beneficial owner publicly reported, and the reason for that – this is what we’re internally describing as ‘null statements’;
- the mechanisms of control (each entity may control the company through multiple mechanisms).
This is presented by Companies House in a number of different ways – as a list of PSCs, as PSC Statements (for example why the company has not reported a PSC, or whether a particular PSC is redacted), and as exemptions (why it is exempt from reporting a PSC). The PSC objects might be an individual PSC, a parent company, or a legal person. This gives a lot of rich information, but does present challenges for modelling the data.
Inside the PSC Object the nature of control is represented using a code list (i.e. an enumerated list of allowed values), in which the types and values of the control are embedded, for example, ‘voting-rights-50-to-75-percent’, which means “The person holds, directly or indirectly, more than 50% but not more than 75% of the voting rights in the company.”
As an aside, Companies House has been pretty good about responding to questions about the PSC register on their developer forums, but we’d love to have seen them do what GDS often does, and blog about the thinking behind their decisions.
How to model this data
For OpenCorporates, understanding the data from Companies House is just the first step. A much trickier question is how we should model it. The decisions Companies House took are informed by a UK-specific context (specifically the UK legislation) and by their existing data and representations, whereas we at OpenCorporates needed to come up with a model which works not just for this data but for similar data from other jurisdictions, and for related data such as other corporate relationships and other beneficial ownership data.
In the context of beneficial ownership, there’s precious little genuine beneficial ownership data out there, and what there is tends to be of fairly low quality and granularity. So, pretty much the only certainty is that we’ll have to change the model and the data, either because the underlying data/concepts have changed, or because we’ve made errors in interpreting/implementing… or most likely both (which is why it’s so important to do this sort of work in public).
We have an internal methodology for dealing with this, and we base the decisions on:
- a coherent understanding of the underlying data, and the concepts underpinning it;
- the applicability of the model across multiple jurisdictions and domains;
- what’s going to work best for our users.
We were helped in this by the groundbreaking work OpenCorporates had previously done on modelling corporate relationships, which meant that not only had we faced many of these issues before, but that we knew (sometimes from painful experience) where the quick wins were… and where the issues were too.
The first big decision we took – after much debate and whiteboarding – was not to separate the PSC objects from the other statements, as Companies House had done. We did this for two reasons:
- All are types of statement made about the control of a company, even if the statement was saying there is no control, or that the controlling person is redacted for security reasons; the only difference is that one contains embedded relationship data.
- When you look at the PSC objects, they are sometimes saying rather different things – in the case of a company controlling another company, it is part of a corporate structure, with potentially many layers above it, and with maybe a person at the top… or maybe not. That’s quite different from a person who directly controls the company.
The next decision was not to go down the Companies House route of listing control mechanisms as dumb codes, e.g. ‘voting-rights-50-to-75-percent’, but to convert them to structured data. This, in truth, was a no-brainer for us, as it was not only the most practical way of having a model that was applicable across multiple jurisdictions and domains, but was also essential for giving our users (particularly our API users) the data in a form that they needed.
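To make the idea concrete, here is a minimal sketch in Python of turning one of these codes into structured data. It is an illustration only: the class name, field names and the mechanical hyphen-to-underscore naming are assumptions made for this example, not the actual OpenCorporates schema, and it handles only the simple percentage-range form of code quoted above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches codes of the form '<kind>-<low>-to-<high>-percent',
# e.g. 'voting-rights-50-to-75-percent'.
_RANGE_CODE = re.compile(
    r"^(?P<kind>[a-z-]+?)-(?P<low>\d+)-to-(?P<high>\d+)-percent$"
)


@dataclass(frozen=True)
class ControlMechanism:
    """Hypothetical structured form of a nature-of-control code."""
    mechanism_type: str                 # e.g. 'voting_rights'
    min_percent: Optional[int] = None   # "more than" this value
    max_percent: Optional[int] = None   # "not more than" this value


def parse_control_code(code: str) -> ControlMechanism:
    """Split a code into a mechanism type and, if present, a percentage range."""
    match = _RANGE_CODE.match(code)
    if match is None:
        # No percentage range embedded in the code: keep only the type.
        return ControlMechanism(mechanism_type=code.replace("-", "_"))
    return ControlMechanism(
        mechanism_type=match["kind"].replace("-", "_"),
        min_percent=int(match["low"]),
        max_percent=int(match["high"]),
    )


mechanism = parse_control_code("voting-rights-50-to-75-percent")
```

Once the type and the range are separate fields, a consumer can filter or compare on them directly rather than pattern-matching on strings.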
Finally, we decided to model the controlling companies, legal entities and individual PSCs all as controlling entities, but with a boolean flag of ‘ultimate_beneficial_owner’ for the individual persons to signify that there was a declaration that they were a beneficial owner, based on the FATF definition (‘the natural person(s) who… exercise ultimate effective control over a legal person or arrangement’).
With all this we ended up with the idea of a ControlStatement:
A Control Statement is a statement by a company or entity about the control of a company.
Pretty straightforward, but this actually includes a number of subtleties that are worth spelling out:
- A Control Statement could more verbosely be called a ControlRelationshipStatement, as it is a statement about relationships of control.
- However, unlike a traditionally modelled Control Relationship, it can handle Null Relationships, i.e. where there is no controlling entity. This is important, as there’s a significant difference between there being no controlling entities and the company not reporting on its controlling entities – as is the case for most companies in the world, and indeed for some UK companies, since it will be another year before all UK companies are reporting on their controlling entities. In a global context, capturing this difference is essential.
- In general a company can have many ControlStatements, and statements/relationships should be stored at the most granular level possible. So in the case of the UK data, a company can have 0-n controlling entities (PSCs in the UK Companies House representation), and 0-n statements about why it isn’t publishing information about a PSC (or PSCs). Each of these should be stored as an individual ControlStatement, as they may change independently of the others.
- The mechanisms by which control is exercised are stored as structured control mechanism objects. This allows not just the mechanism (e.g. share_ownership or voting_rights) to be captured, but also the details of that control, e.g. the percentage of the shares owned, or in the case of Companies House, the range the shareholding falls within.
- Despite the statement about granularity of ControlStatements, we believe a ControlStatement can have more than one controlling entity, as for example a share or shares can be held jointly by parties (as opposed to being held individually by them). Companies House’s model doesn’t seem to allow for this, and we’re not sure how they will cope with the situation where, for example, two individuals jointly hold 90% in a company – would they consider each party to have 45%, or that both sides have 90%? So, for the moment, ControlStatements in effect have 0..1 controlling parties, but the model allows 0..n.
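The points above can be sketched as a small data structure. This is a hedged illustration in Python, not the published schema: every class name, field name and the `null_reason` value are hypothetical, chosen only to mirror the concepts described in the list (0..n controlling entities, structured mechanisms, and null statements).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ControllingEntity:
    name: str
    entity_type: str  # 'person', 'company' or 'legal_person'
    ultimate_beneficial_owner: bool = False  # only meaningful for persons


@dataclass
class Mechanism:
    mechanism_type: str                # e.g. 'share_ownership', 'voting_rights'
    min_percent: Optional[int] = None  # exclusive lower bound, if a range is given
    max_percent: Optional[int] = None  # inclusive upper bound, if a range is given


@dataclass
class ControlStatement:
    controlled_company: str
    # 0..n controlling entities: empty for a null statement,
    # more than one where control is held jointly.
    controlling_entities: List[ControllingEntity] = field(default_factory=list)
    mechanisms: List[Mechanism] = field(default_factory=list)
    null_reason: Optional[str] = None  # why no controlling entity is given

    def is_null_statement(self) -> bool:
        return not self.controlling_entities


# A statement with one controlling person and one mechanism.
ace = ControlStatement(
    controlled_company="1 ACE TRAININGS LIMITED",
    controlling_entities=[
        ControllingEntity(
            "Mr. Rizwan Ahmad Majeed", "person", ultimate_beneficial_owner=True
        )
    ],
    mechanisms=[Mechanism("share_ownership", 75, 100)],
)

# A null statement for a made-up company: something was reported,
# but it names no controlling entity.
no_psc = ControlStatement(
    controlled_company="EXAMPLE LIMITED",
    null_reason="no_controlling_entity_reported",
)

# A company that has not reported at all simply has no statements.
not_reporting: List[ControlStatement] = []
```

Keeping the null statement as a record in its own right is what lets the model tell "reported that there is no controlling entity" apart from "has not reported", which is the distinction the second bullet describes.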
Finally, the Control Statements approach fits the way we derive corporate relationships from the underlying statements: OpenCorporates’ internal system converts the control statements to control relationship links in our underlying graph database, allowing us to compute the corporate structures from them.
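As a rough sketch of that last step, the Python below uses an in-memory dictionary as a stand-in for a graph database and walks up the control links to find the natural persons at the top of a structure. The function names, the tuple layout and the company and person names are all invented for this example; it is not a description of the internal system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (controlling entity, controlled company, controller is a natural person)
Statement = Tuple[str, str, bool]
Links = Dict[str, List[Tuple[str, bool]]]


def build_links(statements: Iterable[Statement]) -> Links:
    """Index control links by the company being controlled."""
    links: Links = defaultdict(list)
    for controller, controlled, is_person in statements:
        links[controlled].append((controller, is_person))
    return links


def ultimate_owners(company: str, links: Links) -> Set[str]:
    """Follow control links upwards and collect the persons reached."""
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [company]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against circular ownership
        seen.add(current)
        for controller, is_person in links.get(current, []):
            if is_person:
                found.add(controller)
            else:
                stack.append(controller)  # keep climbing through companies
    return found


links = build_links([
    ("PARENT LTD", "CHILD LTD", False),    # a company controls the child
    ("Jane Example", "PARENT LTD", True),  # a person controls the parent
])
owners = ultimate_owners("CHILD LTD", links)
```

Here `owners` contains only the person at the top of the chain, even though the statement about CHILD LTD itself names a company. A structure with no person at the top yields an empty set.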
What this all looks like
To finish off, we thought it would be useful to walk through a few simple examples.
First, let’s take a simple company (say, 1 ACE TRAININGS LIMITED, picked pretty much at random), with a person (Mr. Rizwan Ahmad Majeed) who owns the vast majority of shares. In this case, the Companies House data records this person’s control as “ownership-of-shares-75-to-100-percent”. In fact, reading the Companies House documentation, it’s clear that ownership of the shares is >75% and <= 100% (such nuances are important when combining the data).
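The boundary nuance can be made explicit in code. The following Python is a small illustrative sketch (the class and constant names are made up for this example) of a band that excludes its lower bound and includes its upper bound, matching the "more than … but not more than …" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PercentageBand:
    lower_exclusive: float
    upper_inclusive: float

    def contains(self, percent: float) -> bool:
        # "more than lower, but not more than upper"
        return self.lower_exclusive < percent <= self.upper_inclusive


BAND_50_TO_75 = PercentageBand(50, 75)
BAND_75_TO_100 = PercentageBand(75, 100)

# A holding of exactly 75% belongs to the lower band only.
exactly_75_in_lower = BAND_50_TO_75.contains(75)
exactly_75_in_upper = BAND_75_TO_100.contains(75)
```

Getting the boundary wrong would place a holding of exactly 75% in both bands or in neither, which matters when this data is combined with sources that report exact percentages.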
Given Mr Majeed is an individual we can say that he is an Ultimate Beneficial Owner:
You can also see more details about the beneficial ownership information for this company. In this case, it’s very straightforward…
… but in others it’s rather more complex.
…but because that company has an Ultimate Beneficial Owner, we’re able to calculate the UBO for the child company:
and show both controlling entities and controlled entities on the parent company page:
Clicking through to the ‘details’ link to the right of mechanisms block shows the underlying structured data we’ve stored in the ControlStatement, including the detail about those percentages. Here’s the one about control of 1 ACE TRAININGS LIMITED:
All of this is of course available through the API, and we’ll be blogging further about that in the coming weeks. Finally, one thing we’re certain of: this isn’t the last word on modelling Beneficial Ownership data. As more and more such data is released, our understanding of it will evolve, and with it the data model. For now, if you’ve got any questions, ping us in our Slack channel, and if you’re a skilled Ruby coder who loves dealing with hard data issues, do get in touch. We’re hiring!