Crunching the numbers on Open XBRL Accounts Data

Icon designed by Scott Lewis

It’s been 4 months since we started importing UK company accounts data as XBRL (a worldwide standard format that allows company accounts to be easily readable by computers), thanks to the groundbreaking work by UK Companies House, who followed on from previous open data releases and published them as open data.

Now that we have over a year’s worth of accounts statements for many UK companies, we can take the opportunity to look at the numbers behind electronic filings.

What’s the state of XBRL in the UK?

While Companies House and HMRC are committed to the XBRL format for future accounts submissions, it is not yet universal – accounting software vendors are taking longer than expected to create software that produces the correct output, and unfortunately filing in this way is not yet mandatory.

So just how many UK companies are filing accounts electronically?

For January 2014 (1st Jan – 1st Feb)

65.0% of annual accounts filed electronically

Source: Companies House

This tallies with the 137,845 annual company accounts that are available as XBRL data through OpenCorporates for January 2014. For well over half the companies in the UK to be publishing accounts as open data is a great step forward for company transparency, particularly as it is not yet mandatory, and most of the companies are SMEs. However, when you dig a little deeper there’s a worrying wrinkle to this picture.

PLCs – public companies that are not so public

So, what about the big boys: Public Limited Companies? They have far more resources available than those SMEs that are doing such a great job of filing accounts as open data. Well, here the picture is not so rosy; in fact, it’s pretty darned embarrassing, as only a minority are filing XBRL. Just how bad this is can be seen by a quick filter of UK companies on OpenCorporates:

Screen Shot 2014-03-17 at 00.14.30

That’s right: just 364 ‘Active’ PLCs have filed accounts as XBRL (you can also get the result of this query as open data using the new OpenCorporates API). What proportion of the total number of PLCs is this? Shockingly it’s just 5% (there are about 6,700 ‘Active’ PLCs), and even worse, the vast majority (over 90% of these) are not the pillars of industry and commerce you might expect, but dormant companies. To be fair, there is a problem in that Group Accounts cannot yet be filed electronically, but Companies House tells us these only make up just over 30% of PLC accounts.

So that’s something like 65% of PLCs that could be filing as XBRL, but aren’t – the opposite of the SME situation.
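The arithmetic behind these figures can be checked in a few lines. This is only a rough sketch using the rounded numbers quoted above (364 filers, about 6,700 active PLCs, and roughly 30% of PLC accounts being Group Accounts); the variable names are ours and purely illustrative.

```python
# Rounded figures quoted in the post; variable names are illustrative only.
active_plcs = 6700            # approximate number of 'Active' PLCs
xbrl_filers = 364             # 'Active' PLCs that have filed accounts as XBRL
group_accounts_share = 0.30   # share of PLC accounts that are Group Accounts

# Share of active PLCs that have filed XBRL accounts
filing_share = xbrl_filers / active_plcs

# Share that could file electronically (everything except Group Accounts)
could_file_share = 1 - group_accounts_share

# Share that could have filed as XBRL but did not
could_but_did_not = could_file_share - filing_share

print(f"Filing as XBRL: {filing_share:.1%}")
print(f"Able to file electronically: {could_file_share:.0%}")
print(f"Could file but did not: {could_but_did_not:.1%}")
```

Because the 30% figure is itself described as "just over 30%", the final number should be read as an approximation rather than a precise count.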

This is crazy, and we’re sure it was not what was intended by the Department for Business, Innovation & Skills when the rules were drawn up. It also has particular relevance given the government is going to be building a new Beneficial Ownership register, and one proposal is that listed companies would not have to register on it (though not all PLCs are listed companies).

Many PLCs do publish glossy annual reports on their own websites, but the key difference here is that the facts and figures are not easily available as data (getting at them involves either re-keying the data from the reports, or buying the data from someone who has done just this). This not only reduces transparency and worsens corporate governance, but is also a barrier to data reuse and innovation, and we would call on the government to make XBRL filing mandatory for all PLCs (and for Companies House to quickly support Group Accounts electronic filing).

One for the armchair auditors

The other advantage of publishing free accounts data in this way is that it breaks down a cost barrier in examining a wide range of accounts for trends and anomalies. Under the paid accounts model there may not have been much interest in a fairly nondescript company run from the suburbs of Croydon, but when you can see they’ve issued £2.5 billion in share capital it might warrant a closer look. We’ll be making this sort of analysis and filtering accessible through the website (and API) in the future, but do contact us if you’d like to do something cool with the dataset, or if you are interested in doing academic research on the data.

Digging deeper with the new API

Finally, with the release of OpenCorporates’ new API you can now get the core financials for companies that have filed as XBRL. Not only that, but you can also see where we got this from, and even download the original XBRL to analyse it yourself. You can find more details about the new API here:

Xaviexavierr is a developer working for OpenCorporates who likes to talk about anything to do with Ruby, Clojure or jazz guitar. Sometimes he likes to combine all three to make unlistenable music.  

Posted in api, open data, opencorporates

New API version released: corporate networks, accounts, and more

We’re proud to announce that version 0.3 of the OpenCorporates API is now live, with many, many new features, improvements (and even a few bug fixes). Along the way, we’ve also given it a visual refresh:

OpenCorporates API front page

The major improvements are:

  • Access to corporate networks – our open corporate network work is truly ground-breaking, and providing access via the API for others to use and interpret is a major step forward in the world of open corporate networks.
  • Access to the underlying data behind corporate network data. As described in the blog post on the underlying modelling of the corporate network data, we actually have very, very granular data on which the networks are built, and we’re making this available in the interests of full transparency, so that cool new interpretations can be made from it, and because it allows the data to be incorporated at a much more granular level than has previously been available (important in some industries such as credit metric scores).
  • Improved provenance/granularity. Internally we’re in the process of migrating from our existing ‘datum’ model to ‘statements’, which provide much more detail, more granularity (particularly around changes in data) and more detailed provenances. This is the power behind the corporate network data, and will ultimately become the default. We’ll be writing about this further in the future, with tutorials on how to use it, but we think this is a major step forward in business information modelling.
  • Access to financial accounts data – thanks to UK Companies House releasing their XBRL data as open data, we have over 1.2 million UK companies with accounts data, and will be extending this with other sources in the future.
  • Rich filtering and ordering of company searches, allowing users to access companies that have been added or incorporated by status, jurisdiction and many other facets. This is an incredibly powerful feature, and has obvious immediate uses, such as finding newly incorporated companies with a given industry code, or newly dissolved ones with Goldman in the name.
  • Secure access via https – this will likely be enforced for all calls in the future.
  • API usage data – if you have an API key, it’s important that you know how many API calls you’ve made, and how many you’ve got left. This information is now available via an API call and on your OpenCorporates account page:

Screen Shot 2014-02-27 at 12.02.16

Full documentation for v0.3 is available on the API website and this version will be available as beta for the next week or so (i.e. you’ll need to specify it in the API request), after which it will become the default and version 0.1 will be retired. You can keep using version 0.2 (the default at time of writing this post) by specifying it in the API request, and this will be supported until v0.4 comes out.
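As a rough illustration of what specifying the version in an API request might look like, the sketch below builds a versioned company-search URL. The base URL, the `q` parameter, the filter names (`jurisdiction_code`, `current_status`) and the helper `company_search_url` are all assumptions made for illustration; check the API documentation for the parameters actually supported.

```python
from urllib.parse import urlencode

# Assumed base URL for the API; confirm against the official documentation.
API_ROOT = "https://api.opencorporates.com"


def company_search_url(query, version="v0.3", **filters):
    """Build a company-search URL with the API version in the path.

    The filter names passed in are illustrative assumptions, not a
    confirmed list of supported parameters.
    """
    params = {"q": query}
    params.update(filters)
    return f"{API_ROOT}/{version}/companies/search?{urlencode(params)}"


# Example: dissolved UK companies with 'goldman' in the name (hypothetical filters)
url = company_search_url("goldman", jurisdiction_code="gb", current_status="Dissolved")
print(url)

# Pinning an older version only changes the path segment
print(company_search_url("goldman", version="v0.2"))
```

Keeping the version in one argument makes it easy to pin a client to a specific release while a newer one is still in beta.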

We’ve also made major improvements to the process of getting hold of API keys, and non-open-data users can now purchase API plans online and get immediate access from as little as £99/year (previously it was a manual and somewhat opaque process). Just go to (you’ll need to register for OpenCorporates if you haven’t already done so). Roll on open corporate data!

Posted in api, milestone, networks, open data, opencorporates

Understanding corporate networks. Part 4: how we record the data

In parts 1, 2 and 3 of this series, we explored the complex world of corporate control, and how it is described in various regulations. We found that a company may control other companies in many different ways, from majority and minority share holdings to contractual relationships.

At OpenCorporates we believe corporate control data is the cornerstone of a useful Open Corporate Data platform. Without it, there’s no way to link apparently unconnected regulated events that are in fact related. For example, in 2009, the City of Prineville, Oregon granted a 15-year property tax exemption to a company called Vitesse LLC. This, in fact, is a tax exemption that ultimately benefits Facebook Inc:


Therefore, one of our goals at OpenCorporates is to capture corporate control information as Open Data. There are already sources of corporate hierarchy data online, but these are not open, and we consider this a problem.

Why? Not because you have to pay to access this kind of information (though this shuts out many potential users, and thus has negative effects on data quality). Rather, it’s because these datasets are a proprietary asset: their value relies on the fact they are hard to reproduce. Such vendors, therefore, have an interest in not telling you where their data came from. By contrast, at OpenCorporates we ensure that every statement we make has a full provenance, so you can check our facts – which both gives you confidence, and allows you to let us know if we’ve made any mistakes.

As we’ve seen, the notion of “control” is hard to define, and depends on regulations that both vary between jurisdictions, and often rely on subjective judgement to interpret. This is the main reason why it’s important to know the sources for data: it allows you to understand those judgements, or even create your own models of control based on the underlying data, if you have the time! If you don’t have the time, of course, the control models we come up with will hopefully be good enough.

The main purpose of this article is to describe how we model our data in detail. We want our users to understand what the data on OpenCorporates means, and to decide whether the control models work for them. We also think this granular, provenanced approach to facts is innovative, and rather than attempt to patent it, we’d rather share it with the whole community. 

How we model the data

In the core OpenCorporates database, every bit of data can be described as a statement about companies or placeholders, backed by a provenance.


What does each of these terms mean? I’ll describe them in reverse order:


A “provenance” is everything you need to decide if you want to trust a statement made by OpenCorporates. Here’s one:

A provenance on OpenCorporates

The most important part is the source: the primary reference where you can find the information yourself. This should always be something you can check yourself – normally a link to a website, or a copy of an original document.

The provenance also tells you who found the information (in this case some automated software known as a “bot”), when it was found, and the confidence that our interpretation of the source is correct. Crucially, the only sources which we consider trustworthy enough to include by default are regulatory sources, by which we mean official records like government registries, and notices which are backed by law.
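To make the parts of a provenance concrete, here is a minimal sketch of how such a record could be represented. It is not the actual OpenCorporates schema: the class name, field names, the 0-100 confidence scale and the example URLs are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record; field names are illustrative."""
    source_url: str    # primary reference a reader can check for themselves
    found_by: str      # who found it, e.g. a bot name or a site editor
    found_on: date     # when the information was found
    confidence: int    # confidence in our interpretation (assumed 0-100 scale)
    regulatory: bool   # True for official registries and legally backed notices


def included_by_default(provenance: Provenance) -> bool:
    """Only regulatory sources are trusted enough to include by default."""
    return provenance.regulatory


# Placeholder example records (URLs and names are made up)
sec_filing = Provenance("https://example.com/10-k", "example_bot", date(2013, 1, 1), 80, True)
press_release = Provenance("https://example.com/press", "site_editor", date(2013, 1, 1), 50, False)

print(included_by_default(sec_filing))
print(included_by_default(press_release))
```

Making the record immutable (`frozen=True`) reflects the idea that a provenance describes a fixed observation: if the source is re-checked later, that would be a new record rather than an edit to the old one.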

Companies or Placeholders

A “placeholder” is the term OpenCorporates uses to describe something we believe is probably a company. For example, this is how Facebook’s subsidiaries are listed in a recent regulatory filing:

Subsidiaries of Facebook in their 10-K SEC filing, 2012

This seems to be telling us that a company called Vitesse, LLC, registered in Delaware, is a subsidiary of Facebook, Inc. But there are various reasons this might not be the case, such as:

  • The person inputting the text may have made a typing error, or written the wrong place of incorporation
  • The name of Vitesse, LLC may have changed since the filing…
  • …and there may now even be another, different company called Vitesse, LLC.

Thus, being pedantic, this document is telling us that these are probably companies. At OpenCorporates, we take pride in pedantry, so we refuse to call Vitesse, LLC a company until we can prove it exists, with reference to an entry in the official corporate register for Delaware (and of course there are some company registers that do not make this information freely available). Until then, we call it a placeholder.

When we feel we can reliably say that a placeholder is, in fact, a company, we create a new record in our system which we call a “company reconciliation link”, and record the provenance for that link as a separate data point.


A “statement” is a fact or assertion that we’ve derived from a primary source. There are various types of statement in OpenCorporates, such as “Licences” (a permission for a company to engage in a regulated activity) and “Subsidiary Relationships”:

Example statement from OpenCorporates

Technically, a statement is composed of several bits of information:

  1. The data point. In the example above, this is “There is a subsidiary relationship that existed on December 31, 2012”. The data point is derived directly from information in the primary source.
  2. The subject company or placeholder. In the example above, this is Facebook, Inc.
  3. The object company or placeholder. In this case, Vitesse, LLC.
  4. The verbs linking the respective companies to the data point. In this case, Facebook, Inc “has a subsidiary”, and Vitesse, LLC “is a subsidiary”. Internally, we call these “placeholder data links”.

Optionally, there may also be company reconciliation links linking the placeholders to companies (as described above).

Here’s how we represent the structure of a statement internally:


The schema for an OpenCorporates statement

And here’s the same structure, with the data from the Vitesse, LLC example we’re using:

How a statement about Facebook is recorded

Crucially, every component in the diagram above also comes with a provenance. This means that we know:

  • where the data came from (in this case, the SEC);
  • where the placeholders and data links came from (in this case, they were inferred by software, but could be inferred by a site editor);
  • where the reconciliation links came from (either a person matching placeholders to companies, or software again); and
  • where the companies came from (corporate registers, invariably).
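The components of a statement described above can be sketched as a small set of linked records. This is an illustrative model only, not the internal OpenCorporates schema: the class and field names are assumptions, and a plain `source` string stands in for the full provenance that each component carries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Placeholder:
    """Something we believe is probably a company."""
    name: str
    jurisdiction: Optional[str] = None
    # Filled in by a company reconciliation link once matched to a register entry
    reconciled_company_number: Optional[str] = None

    @property
    def is_reconciled(self) -> bool:
        return self.reconciled_company_number is not None


@dataclass
class PlaceholderDataLink:
    """The verb connecting a placeholder to a data point."""
    placeholder: Placeholder
    verb: str
    source: str  # stands in for a full provenance record


@dataclass
class Statement:
    data_point: str
    subject_link: PlaceholderDataLink
    object_link: PlaceholderDataLink
    source: str  # stands in for a full provenance record


def describe(statement: Statement) -> str:
    subject = statement.subject_link
    return f"{subject.placeholder.name} {subject.verb}: {statement.object_link.placeholder.name}"


facebook = Placeholder("Facebook, Inc")
vitesse = Placeholder("Vitesse, LLC", jurisdiction="Delaware")

statement = Statement(
    data_point="There is a subsidiary relationship that existed on December 31, 2012",
    subject_link=PlaceholderDataLink(facebook, "has a subsidiary", source="inferred by software"),
    object_link=PlaceholderDataLink(vitesse, "is a subsidiary", source="inferred by software"),
    source="SEC filing",
)

print(describe(statement))
print(vitesse.is_reconciled)
```

Keeping the reconciliation as an optional field on the placeholder mirrors the distinction in the text: the statement is valid as soon as it is derived from the source, whether or not the placeholder has yet been matched to a registered company.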

How we model corporate control networks

So those are the bits of information that lie behind any assertion we make on OpenCorporates. When it comes to corporate relationships of control, such as the nice tree diagrams excerpted above, we are primarily interested in the following types of statement:

  • Subsidiaries: statements that X is a subsidiary of Y
  • Share Holdings: statements that X holds shares in Y
  • Acquisitions: statements that X acquired Y. Often these statements are derived from press releases, so are not considered as reliable as other kinds of statements.
  • Branches: entities permitted to operate in a jurisdiction but with a legal personality registered elsewhere.

As we’ve seen, to make statements about control relationships, you have to make a number of assumptions – for example, what percentage of share ownership constitutes control. You also have to think about the confidence level you’re willing to accept; is a press release a sufficiently reliable source for you? Or are you only interested in regulated information of the kind available in corporate registers?

This kind of decision involves a considerable amount of judgement, analysis and time, which is why we’ve come up with a way of combining these statements into corporate networks on our network pages. We currently combine subsidiaries, shareholdings and acquisitions to form networks, and are planning to add branches in 2014. Our network pages allow you to set the confidence level you’re willing to accept, and the shareholding level you want to consider as implying control.

Options for controlling which companies to appear in a network view

Additionally, if you click on a company name, you can view the provenances for the statements behind its presence in the network, and check them for yourself.
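The two options described above (a minimum confidence level, and the shareholding level treated as implying control) amount to a filter over statements. The sketch below shows one way this could work; the class, the 0-100 confidence scale, the default thresholds and the company names are all assumptions for illustration rather than how the network pages are actually implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlStatement:
    parent: str
    child: str
    kind: str                           # "subsidiary", "shareholding" or "acquisition"
    confidence: int                     # assumed 0-100 scale
    percentage: Optional[float] = None  # only meaningful for shareholdings


def network_edges(statements, min_confidence=50, min_shareholding=50.0):
    """Return (parent, child) pairs that pass both user-chosen thresholds."""
    edges = []
    for s in statements:
        if s.confidence < min_confidence:
            continue  # source not reliable enough for this user
        if s.kind == "shareholding" and (
            s.percentage is None or s.percentage < min_shareholding
        ):
            continue  # holding too small to be treated as control
        edges.append((s.parent, s.child))
    return edges


# Hypothetical statements about made-up companies
statements = [
    ControlStatement("A Corp", "B Ltd", "subsidiary", 90),
    ControlStatement("A Corp", "C Ltd", "shareholding", 90, 30.0),
    ControlStatement("A Corp", "D Ltd", "acquisition", 40),
]

strict = network_edges(statements)
relaxed = network_edges(statements, min_confidence=30, min_shareholding=25.0)
print(strict)
print(relaxed)
```

With the stricter defaults only the subsidiary statement survives; relaxing both thresholds brings in the minority shareholding and the lower-confidence acquisition, which is exactly the kind of judgement the options leave to the user.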

Given the complexity of defining corporate control, it’s impossible to guarantee that any corporate graphs are accurate, but this is precisely the strength of our Open Data approach: we have no incentives to hide the sources of our information, and you have every incentive to help improve our data for the public good.

Our way of modelling networks of corporate control is just one suggested way; we encourage you to use our modelling as inspiration, and build on OpenCorporates’ open data platform to produce your own models. Let us know what you manage to find out, and we’ll help you share this information with the community!

Posted in howto, networks, open data, opencorporates

Announcing Open LEIs: a user-friendly interface to the Legal Entity Identifier system

Open LEIs: Making sense of the Global LEI System

Today, OpenCorporates announces a new sister website, Open LEIs, a user-friendly interface on the emerging Global Legal Entity Identifier System.

At this point many, possibly most, of you will be wondering: what on earth is the Global Legal Entity Identifier System? And that’s one of the reasons why we built Open LEIs.

The Global Legal Entity Identifier System (aka the LEI system, or GLEIS) is a G20/Financial Stability Board-driven initiative to solve the issues of identifiers in the financial markets. As we’ve explained in the past, there are a number of identifiers out there, nearly all of them proprietary, and all of them with quality issues (specifically not mapping one-to-one with legal entities). Sometimes just company names are used, which are particularly bad identifiers, as not only can they be represented in many ways, they frequently change, and are even reused between different entities.

This problem is particularly acute in the financial markets, meaning that regulators, banks and market participants often don’t know who they are dealing with, affecting everything from the ability to process trades automatically to performing credit calculations to understanding systemic risk.

The LEI system aims to solve this problem, by providing permanent, IP-free, unique identifiers for all entities participating in the financial markets (not just companies but also municipalities who issue bonds, for example, and mutual funds whose legal status is a little greyer than companies).

Part of OpenCorporates’ mission has always been to surface and connect all publicly available information about companies to the companies (specifically the legal entities) to which it refers. As part of that, we’ve been proud to be involved with the LEI system since the beginning of last year, serving on the initial advisory panel, contributing at events, and most recently playing an active role on the Private Sector Preparatory Group.

Setting up something as significant and important as the LEI system is not trivial and it’s a testament to some of the individuals involved that it’s achieved so much in such a short space of time. Many people, even in financial markets, still haven’t heard of the LEI system, however, and it’s vital not just that the audience is increased, but that end-users – financial markets, regulators, the wider society – start to use and provide feedback on the data and system.

However, this is not currently easy for users, as the system as a whole hasn’t been fully set up (the Foundation, which will run it, has yet to be founded, for example), even though LEIs are already being used in some markets (notably derivatives). Because of this, there’s an understandable lack of information about the system, and the data. In particular there’s nowhere to browse or search the data, and get a feel for what it’s really like, and where the problems are (though our friends at make a useful data dump of all the LEIs available).

This is where Open LEIs comes in: A browseable, searchable user-focused interface on the LEI system, with five key features.

First, it provides the ability to search the name of every entry, and it’s even smart enough to return those entries with slight misspellings (Mordan Stanley, for example):

Entities called Morgan Stanley in LEI system

You can also choose to search addresses, to find all those with, for example, “rue Gabriel Lippmann” in them:

Search LEI system for addresses
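The misspelling-tolerant name search described above could be approximated as follows. This is not how Open LEIs itself implements search (that isn't described here); it is a stand-in using Python's standard `difflib` module, with a hypothetical helper `fuzzy_name_search`, a small made-up list of entity names and an arbitrarily chosen similarity cutoff.

```python
import difflib


def fuzzy_name_search(query, names, cutoff=0.8):
    """Return names that closely match the query, ignoring case.

    Uses difflib's similarity ratio as a simple stand-in for a real
    search engine's fuzzy matching.
    """
    # Map lower-cased names back to their original spelling
    by_lowercase = {name.lower(): name for name in names}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=5, cutoff=cutoff
    )
    return [by_lowercase[m] for m in matches]


# Illustrative entity names only
entity_names = [
    "Morgan Stanley",
    "Goldman Sachs International",
    "3M Asset Management S.à r.l.",
]

results = fuzzy_name_search("Mordan Stanley", entity_names)
print(results)
```

A production search would more likely rely on a dedicated search index, but the idea is the same: score candidates by similarity rather than requiring an exact match.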

Second, you can also browse the entire database of 100,000-plus LEIs, drilling down and filtering by country, legal form, or the registering body (the ‘Local Operating Unit’ or LOU in LEI parlance).

All LEIs/Legal Entities in the Global LEI System

This is incredibly powerful, and allows complex queries, particularly when combined with search, such as: Show me all the entities named ‘Goldman’ registered in the Cayman Islands with a legal form of ‘SOCIETE A RESPONSABILITE LIMITEE’.

Third, there is an entry for every LEI, with a permanent URL for each (the URL is of course solely based on the LEI). This is an entry for a Luxembourg company, for example:

LEI Record for 3M Asset Management S.à r.l.

Fourth, it links to OpenCorporates, where applicable, allowing you to get additional data on the company. This is one of the areas where dealing with real-world data has exposed a number of data issues, from missing company numbers (in the above entry the business registry identifier is missing, though if you click through the OpenCorporates link we show the right company) to incorrect or incomplete identifiers, missing jurisdictions, or inconsistent naming of company registers themselves. We’ll be reporting these problems back to the LEI system so that the issues can be resolved as the systems are firmed up (remember, it’s still a system being built).

Fifth, it provides all of this as data – XML and JSON – allowing easy access to the underlying data (note: we expect the underlying schemas to change as the LEI system’s data model evolves – if you need a versioned API, please contact us).

Finally, we hope that the website will both act as a prototype for future work, including by the LEI system itself, and will allow end users to start using the data with a low entry barrier. If there are future features you’d like to see, or have comments on Open LEIs, please email us at

Posted in homepage, open data, openleis, standards

Understanding corporate networks. Part 3: where’s the data?

What is a subsidiary? What do we mean by control? In Part 2 of this series on understanding corporate networks, I explored these questions and how they are answered by regulatory regimes around the world.

It’s one thing to define “control” in the regulations and accounting standards; however, without disclosure, we can’t begin to map corporate networks, which here at OpenCorporates we’re pretty focused on. So where can we find the data?

The obligation to list consolidated entities

In the UK, the situation is surprisingly good… in theory. The UK regulations require every company to disclose its subsidiaries in its annual accounts, which must be submitted to the UK corporate registry each year.

The regulations allow an exemption on grounds of “excessive length”, which companies with large networks invariably adopt; however, they must report a full list of subsidiaries at a later date. This list is typically provided as an annexe to the annual return (rather than the accounts); for example, a full list of Tesco PLC’s subsidiaries can be seen pasted at the end of this document (spoiler alert: it’s a poor-quality scanned image of a table turned on its side!). This complicated picture is made much worse, however, by basic non-compliance: according to some research, 30% of all UK Companies are not asked to submit returns by the regulators.

In the rest of Europe, the situation varies. The good news is that under EU regulations, every member state must collect (and make available) annual accounts from companies. The bad news is that national accounting practices vary considerably. (It’s also a problem for open data that most countries in the EU opt to charge for access to this information – you can see how they compare in our Open Company Data Index.)

The light at the end of the tunnel is the International Financial Reporting Standards (IFRS). All listed companies in the EU must use IFRS, and countries around the world are increasingly converging on this standard. A recent update to IFRS, due to come into effect in 2014, requires a parent to disclose information about all its subsidiaries and related entities.

In the US, however, the situation is not so good: under US GAAP there is no requirement to provide a list of subsidiaries, and the signs that the authorities are moving with the rest of the world towards IFRS are patchy. So how can we find out anything about corporate networks in the US?

The SEC to the rescue… or not?

In the US, the Securities and Exchange Commission exists to protect investors and ensure an efficient, fair market. It requires large, listed companies to disclose meaningful financial information to the public, including annual accounts (not so useful to us), and tables that list “significant subsidiaries” (which are potentially very useful).

Here’s one such list of significant subsidiaries, as an example:

A list of subsidiaries in an SEC filing

Because annual accounts in the US don’t list subsidiaries, this makes SEC filings one of the most important sources of information about the corporate structure of large companies. However, this crucial source of subsidiary data seems to be drying up.

As we investigated last year with the Wall Street Journal, large companies like FedEx, Microsoft, and Raytheon are quietly dropping the number of subsidiaries they report, even while their corporate structures remain the same, or get more complicated. Google Inc, for example, reported more than 100 subsidiaries in 2009. In 2013, they reported two.

How is this possible? It’s not clear, but when asked, a Google spokesperson said the company is in compliance with SEC rules regarding the disclosure of subsidiaries.

When a company makes a filing, it is asked to list “significant subsidiaries”. There are five ways the regulations that define “significant subsidiaries” could be open to interpretation.

  • The definition of “subsidiary” is an affiliate controlled … directly, or indirectly through one or more intermediaries. As we’ve seen previously, the notion of control is slippery and open to interpretation.
  • The definition of “significant subsidiary” refers to a subsidiary, including its subsidiaries. This could be interpreted as saying that if Google Inc owns Google UK via Google Ireland, then only Google Ireland needs to be reported.
  • The regulations define “significant” in terms of financial flows between the subsidiary and the parent, with anything lower than a 10% threshold not being significant. It is possible to structure a company so that there are enough companies to split financial flows into small chunks, and in fact the larger the number of operations, the more likely it is that no single one reaches that 10% threshold.
  • These financial flows can exclude amounts attributable to any noncontrolling interests; again, open to a range of interpretations regarding control.
  • The guidance states that “information required by any item or other requirement of this form with respect to any foreign subsidiary may be omitted to the extent that the required disclosure would be detrimental to the registrant.”
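The effect of the 10% threshold can be seen with a toy calculation. This is a deliberately simplified sketch: it treats "significance" as each subsidiary's share of a single total, whereas the actual regulatory test is more involved, and the function name and figures are invented for illustration.

```python
SIGNIFICANCE_THRESHOLD = 0.10  # the 10% threshold described above


def significant(flows):
    """Return the subsidiaries whose share of total flows reaches the threshold.

    Simplified model: 'flows' maps a subsidiary name to an amount, and
    significance is judged against the sum of all amounts.
    """
    total = sum(flows.values())
    return [
        name
        for name, amount in flows.items()
        if amount / total >= SIGNIFICANCE_THRESHOLD
    ]


# The same total of 100, structured in two different ways (made-up numbers)
concentrated = {"Sub 1": 60, "Sub 2": 32, "Sub 3": 8}
split = {f"Sub {i}": 5 for i in range(1, 21)}  # 20 entities at 5% each

print(significant(concentrated))
print(significant(split))
```

In the concentrated structure two subsidiaries cross the threshold and would need to be listed; once the same flows are spread evenly across twenty entities, none of them does.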

So, while the UK and the EU have adopted clear regulations regarding the disclosure of subsidiaries, the situation in the US is not great, and appears to be getting worse: neither the US accounting standards, nor the SEC, requires a full list of subsidiaries (and of course, the filings are in the form of free text, often difficult to parse).

However, there is one area where we can still shine a light on US company networks: the financial sector.


Banks have been more strongly regulated than ordinary companies for quite a long time, and the regulatory regime in the US is particularly strong. It is overseen by the Federal Reserve, who are empowered by a raft of legislation (such as Regulation Y, the Bank Holding Company Act, and the Sarbanes-Oxley Act) to gather data whenever there are any structural changes to a banking group.

Much (but not all) of this data is made public on their website, and this is the source for the amazing corporate hierarchy visualisations we released a few months ago.

So how does this Federal Reserve data define “control”?

In Regulation Y, control is defined at a 25% voting equity threshold. It is also defined as “the power to exercise, directly or indirectly, a controlling influence … as determined by the board”. A similar definition is contained in the Bank Holding Company Act, and at least one court has stated that this can mean “the mere potential for manipulation of a bank”.

The Federal Reserve banking data is, therefore, the single best source of structured data that exists for companies in the US. The information OpenCorporates is able to compile about bank hierarchies shows what could be possible if the data were opened up, especially when compared with the dwindling corporate structure data available from the SEC.

Shopping for the most secret jurisdiction

It should come as no surprise that the reporting requirements regarding subsidiaries vary across the world. As we’ve seen, all companies in the UK must disclose their subsidiaries; all listed companies in the EU should report them from 2014; and large listed companies in the US should report “significant” ones.

What about the well-known tax havens? To take one example, the Cayman Islands don’t make any of the information they gather publicly available. They don’t require accounts from most companies, and where they do, they don’t mandate any particular accounting standards.

From the point of view of a company that doesn’t want to disclose its corporate structure, therefore, this is a great opportunity. From the point of view of an investor, consumer or regulator who needs to understand risk, this is a big problem.

There is one more option in our toolbox for reconstructing corporate networks: instead of looking at which entities a company controls, we can ask who controls a particular company.

Walking up the ancestors

The UK legislation, for example, requires the “ultimate parent company” of a company to be identified in the accounts. There can be several parents between the subsidiary and its ultimate parent that are not listed, but this can still give us valuable information.

Other legislation around the world requires companies to disclose who controls them. The Hong Kong Stock Exchange requires all the respective holding companies of a stock to be listed, right up to the ultimate parent company. For example, this filing shows that Tesco Investments Limited (in the Bahamas) controls a company called China ITS (Holdings) Co., Ltd. in Hong Kong, via four intermediary companies in the British Virgin Islands and the Cayman Islands. The BSE exchange in Mumbai requires similar disclosures.
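Reconstructing a chain like this from "bottom-up" disclosures is essentially a walk up a set of child-to-parent links. The sketch below illustrates the idea with generic placeholder names (four intermediaries between a listed company and its ultimate parent); the function name and data are hypothetical and not taken from any filing.

```python
def chain_to_ultimate_parent(company, parent_of):
    """Follow parent links upward from a company to its ultimate parent.

    'parent_of' maps each company to its disclosed immediate parent.
    The walk stops early if it meets a company it has already visited,
    which guards against circular ownership in the data.
    """
    chain = [company]
    seen = {company}
    while company in parent_of:
        company = parent_of[company]
        if company in seen:
            break
        chain.append(company)
        seen.add(company)
    return chain


# Placeholder names only: a listed company, four intermediaries, an ultimate parent
parent_of = {
    "Listed Co": "Intermediary 1",
    "Intermediary 1": "Intermediary 2",
    "Intermediary 2": "Intermediary 3",
    "Intermediary 3": "Intermediary 4",
    "Intermediary 4": "Ultimate Parent",
}

chain = chain_to_ultimate_parent("Listed Co", parent_of)
print(" -> ".join(chain))
```

Where only the ultimate parent is disclosed (as in UK accounts), the mapping would jump straight from the company to the top, so the intermediate steps remain unknown until another source fills them in.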

There are likely to be new sources of “bottom-up” control data over the coming years. At the G8 summit in June 2013, the subject of “beneficial ownership” was high on the agenda (a “beneficial owner” is a person who ultimately controls a company). Consultation recently closed on plans to introduce a new register of beneficial owners for the UK, and the proposed definition of control covers both the narrow and the broad senses discussed in Part 2.

We’re still at a very early stage of gathering and aggregating this data, so watch this space. However, this piecemeal approach can only ever give a partial picture, which is why we’re working with organisations like the World Bank Institute to open up corporate registers and change the way information like this is recorded.

In Part 4, I’ll recap and summarise the terminology and concepts explored so far, and show how they are represented in the OpenCorporates database.

Posted in homepage, networks, open data, opencorporates

Another UK company data milestone: Accounts as open data

Update: In record time we’ve started importing Companies House XBRL data, and extracting the key financial figures from it (you can also click through to the underlying accounts themselves). It will take a few days for this to appear on all filing companies (about 60% of companies have filed XBRL accounts):

Companies House XBRL Accounts Data as open data


This morning at 10am, the UK made its second significant company-related announcement of the Open Government Partnership annual meeting in London (following on from yesterday’s news that the Beneficial Ownership register will be public).

From today, company accounts data will be available to all, as open data. Previously you had to download accounts filings at £1 each from Companies House, and even then what you got were images that you had to manually turn into data, or buy the data from one of the proprietary providers who would send off the filings to countries such as the Philippines or India for rekeying as data (adding costs and errors).


This is great news in itself, bringing all sorts of potential uses, from just giving a greater understanding of individual companies to understanding growth, identifying fraud and even potentially being the first step on the way to open credit ratings.

How is Companies House doing this? For the past few years it’s been encouraging (though not mandating) companies to file their accounts as data, specifically XBRL, an international data standard for accounts data. Now XBRL is not trivial to understand, and neither are company accounts, but by working with accounts package vendors, Companies House has made this largely pain-free for companies filing the data, so many might not even be aware that they are doing it – they also need to file their accounts with HMRC (the UK tax office) as XBRL.
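To give a feel for why accounts filed as data are so much easier to reuse than scanned images, here is a minimal sketch of pulling figures out of an XBRL-style document. It is an illustration under stated assumptions, not the OpenCorporates importer: the `ex` namespace, the `CashAtBank` and `NetAssets` element names and all the figures are made up, and the sample leaves out the context and unit definitions that a real filing carries.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up instance. The "ex" namespace, the element names and
# the figures are hypothetical, not the real UK accounts taxonomy.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ex="http://example.com/hypothetical-taxonomy">
  <ex:CashAtBank contextRef="y2013" unitRef="GBP" decimals="0">12500</ex:CashAtBank>
  <ex:NetAssets contextRef="y2013" unitRef="GBP" decimals="0">48210</ex:NetAssets>
  <ex:NetAssets contextRef="y2012" unitRef="GBP" decimals="0">39875</ex:NetAssets>
</xbrl>"""


def extract_facts(xml_text):
    """Map (concept, context) pairs to the integer values reported."""
    root = ET.fromstring(xml_text)
    facts = {}
    for elem in root:
        context = elem.get("contextRef")
        if context is None:
            # Not a reported figure (e.g. a context or unit definition).
            continue
        # Drop the "{namespace}" prefix ElementTree puts on tag names.
        concept = elem.tag.split("}", 1)[-1]
        facts[(concept, context)] = int(elem.text)
    return facts


facts = extract_facts(SAMPLE)
for (concept, context), value in sorted(facts.items()):
    print(concept, context, value)
```

Each figure arrives already labelled with what it is and which period it belongs to, which is exactly the information that rekeying from images has to reconstruct by hand.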

It’s worth stressing that in publishing this as open data, Companies House is breaking new ground, as this data has never been a paid-for product, and so is a concrete example of the ‘open by default’ approach the UK committed to under the G8 Open Data Charter.

It also shows just how much Companies House (and the UK government) has achieved in the past year and a bit since it started publishing open company data. Of course we’re looking forward to the day when the whole of Companies House is open, and the key metric is reuse of this data, not whether it can make money by restricting access. That decision, of course, is down to the UK government, specifically the Department for Business, Innovation & Skills. But this move brings us closer to that day, and we’re hopeful that this will happen sooner rather than later.

Meanwhile we’re now working hard on importing this data into OpenCorporates, and hope to have this information on the site by the end of today.

Posted in homepage, open data, opencorporates, Uncategorized

British PM gets it: good business requires good, open data

Today, at the Open Government Partnership meeting in London, David Cameron, the UK Prime Minister, has announced that the UK government is going to make the new register of company beneficial ownership public.

This is great news, because it shows the very top of the UK government understands what OpenCorporates has been saying for some time: that public open data on companies is not just about tackling corruption, money laundering and organised crime, but also about creating a straightforward, clear environment for business. And making it public is not just about making the information available to all, it’s the only way it will be high enough quality to be useful.

As the PM says:

There are so many wider benefits to making this information available to everyone. It’s better for businesses here – who will be able to better identify who really owns the companies they’re trading with. It’s better for developing countries – who will have easy access to all this data without submitting endless requests for each line of enquiry. And it’s better for us all to have an open system which everyone has access to – the more eyes that look at this information, the more accurate it will be.

Now comes the real work – making sure the register actually works. We’ve done a huge amount of work in this area, as part of building the world’s first open data corporate network database, and as you can read in our blog posts on corporate control (we’ve just published the second one), God really is in the detail.

However, we think this is solvable, without putting a burden on the millions of straightforward SMEs for whom beneficial ownership means just the shareholding information (which is already public, though not available as contemporaneous data).

Done right, this will improve competition, too. We think there are 5 key elements to making this work, and particularly in fostering innovation, allowing those ‘many eyes’ and allowing business “to better identify who really owns the companies they’re trading with”:

  • Built on firm foundations – beneficial ownership is, after all, in most cases just the shareholder data, or, where that gives the wrong impression, the person who benefits from or controls the company. So first the shareholder data held by Companies House (currently up to 3 years out of date, and low quality) needs to be fixed.
  • Contemporaneous – this information can change quickly for the most tricky companies (those not set up as straightforward businesses), such as those set up for criminal purposes, Special Purpose Vehicles, and those that make up some large corporate networks.
  • Granular – bald statements of just a name are not enough to identify someone (imagine a transliteration of a translation of Vlad the Destroyer in Arabic)
  • Open data – if the information isn’t a matter of public record, and available as machine-readable, openly licensed data, we won’t have many eyes checking the data, and we won’t allow the innovation that this can foster
  • Complete – the register needs to include all companies, including listed companies. Listed companies can be controlled by individuals (and many AIM companies are) and it’s important that subsidiaries of such corporations are linked back to their parents as data. It would be perverse in the extreme if the largest companies (e.g. G4S and Serco) and their subsidiaries did not have to live by the same rules as smaller companies.

But let’s not be pessimistic. This is a huge step forward in corporate transparency and accountability, and the UK, and the Prime Minister personally, should be strongly congratulated for making the UK the world leader in this. Rest assured, OpenCorporates is ready to help, and will incorporate the data as soon as it’s published.

Posted in homepage, networks, open data, opencorporates, standards