OpenCorporates Impact Report 2014

Welcome to our first Impact Report. Unlike a formal Impact Report, this is a frank overview of what we achieved in 2014, and, crucially, where we struggled.

Where we started the year

By the end of 2013, OpenCorporates had, in the open data community, already become one of the most significant open data projects in the world.

We had:

  • Over 50 million companies in the database
  • Made significant progress in understanding and modelling corporate structures and networks (thanks to a grant from the Alfred P Sloan Foundation)
  • Just launched OpenLEIs, the first user-friendly interface to the Global Legal Entity Identifier System (which at that time few people had heard of, and whose data even fewer had looked at)
  • Worked with the World Bank to show how problematic access was to statutory company data in the EU and in Open Government Partnership countries
  • Worked behind the scenes with Global Witness, ONE, Christian Aid and many other civil society groups to persuade the UK Government to create a public beneficial ownership register

And more than that, we’d done this with effectively 3-4 people, most visibly our co-founder and CEO, Chris Taggart (to the extent that many thought that Chris was OpenCorporates).

2014

So from the outside, 2013 looked pretty good, and judged against many targets we were very successful. However, we knew that it wasn’t good enough – our goal was (and is) not just to make headway in the open data community, but to make a real difference to the corporate world, specifically to bring transparency, clarity and openness to companies using the power of open public data, and in doing so remove some of the incentives companies have for doing bad, and increase the incentives for doing good.

That meant we needed to broaden our reach outside the open data world, build a viable and vibrant community, and use the strengths of OpenCorporates (and open data) to grow bigger and better. It also meant that we needed organic and sustainable revenue growth to ensure that OpenCorporates would not just still be around in the future but thriving, and driving business transparency.

To do that we focused on three primary areas:

  • The data – get more of it, not just from company registers, but also from other public sources, and focus on quality too
  • The community – broaden our outreach to NGOs, to governments, to journalists, to coders, and create tools and platforms to allow the wider community to contribute open data to OpenCorporates
  • The business world – get business users and proprietary data companies buying data (without the share-alike restriction), and so creating a firm foundation for sustainability and growth. Start to build bridges to encourage companies to be positive actors in transparency and openness

Together, we felt these would make OpenCorporates not just an important open data project, but an important global project (that has openness and open data at its heart). Period. Only by doing that will we have the impact that the world needs, so that everyone who wants to can properly understand who they work for, buy from, trade with, and how those companies interact with our societies and our lives.

How did we do?

In a nutshell: pretty good, but there were several things that were even harder than we thought they would be, and a couple of areas in which we want to do better this year.

The Data: Core company data

First off, we grew the core company register data by another 20 million or so, and another 30 or so jurisdictions. That’s not bad, but we deliberately held off tackling some countries in the hope that they would publish their company registers as open data.

Specifically we had hoped that, given they signed the G8 Open Data Charter back in 2013, France, Germany and Italy would open their company registers, and while France and Italy have made moves towards carrying out the charter, Germany shows absolutely no sign of doing so – which is pretty astonishing when you think about it.

However, thankfully an increasing number of company registers did move towards open data. Among the jurisdictions to open up company data in 2014 were Belgium, Latvia, Romania, New York and Australia.

In addition, in a major breakthrough the UK announced that the whole of the data held in the register would be open by the end of the second quarter this year, and we understand other countries are looking at this with interest.

But there have been challenges along the way – we regularly get requests and demands to remove companies from the database, and we refuse them all.

“Your activity placing my full name on the web to allow people to search under my name is damaging me and my business activities. This means I will now have to go about my business under a fake name”.

We also get threats, usually legal ones, but occasionally threatening violence, or other consequences:

“The principal of our firm is a 22% shareholder in Google. I will give you an additional 24 hours to remove this link before we delist the ENTIRE website from search results”

Fortunately, with our advisory board, we long ago established a set of principles to guide us. In fact we consider it a valuable part of the impact we have (this is an impact report after all) that we do not remove such data, and that data on companies and their owners or controllers is not hidden from view. Companies are, after all, artificial legal constructs, created by societies for the benefit of those societies, but owned by a small section of the population.

As well as collecting the data, we’ve also been making it more usable. As part of the work building the data pipeline, we made a major investment in our infrastructure. This came partly in the form of buying some fairly significant hardware thanks to some sterling work from our SysAdmin, Ben (we obviously looked at cloud options for this but ultimately rejected them for cost, control and performance reasons).

It also meant a move of our underlying search engine to ElasticSearch. Neither of these moves was something we made a fuss about, but they have led to major improvements in the speed of the website (reducing page load times by 50%), stability (we are aiming for 99.99% uptime) and functionality (for example the new API functionality just released). Making the move to new hardware was relatively painless, but changing our search infrastructure was pretty tough, and with the benefit of hindsight we would have done it much sooner, and budgeted more time to do it.
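To put the 99.99% uptime target in concrete terms, the short sketch below works out how much downtime such a target leaves room for. The function name and the 365-day year are illustrative assumptions, not a description of any monitoring we actually run.

```python
def allowed_downtime_minutes(uptime_percent, days=365):
    """Minutes of downtime permitted over `days` at a given uptime target."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    per_year = allowed_downtime_minutes(99.99)
    per_month = allowed_downtime_minutes(99.99, days=30)
    print(f"Per year:  about {per_year:.1f} minutes")
    print(f"Per month: about {per_month:.1f} minutes")
```

In other words, a 99.99% target leaves a budget of under an hour of downtime across a whole year.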

In other data-related news:

  • Our API now serves up to 5 million requests a day
  • We released what we’d learnt about corporate networks and subsidiary relationships in a series of incredibly popular blog posts. We also released the underlying data model for our relationship work so others can benefit from this
  • We spent a reasonably significant amount of time understanding global industry codes and identifier systems – not sexy, but important. The results of this work are now starting to appear in OpenCorporates
  • We are developing quality metrics for our corporate data, and these should be released later this year

The Community

What makes OpenCorporates different is not just the data but the support of the wider open data and transparency communities – NGOs, open government supporters, businesses, lawyers, developers, activists and so many more.

Having our roots in the open data community, it’s not surprising that we have support and friends there. However, making connections outside tended to be… er, organic. It just sort of happened by chance with the goodwill and initiative of people involved.

To achieve the growth we want in the future, however, we knew we would need to build on this and reach out to a much broader community – to more NGOs, academics, governments and the community at large, and enable that wider community to collaborate and contribute to OpenCorporates.

With the help of a second grant from the Alfred P Sloan Foundation we set about building not just the community but the tools to support it too. In particular we wanted to build an end-to-end solution – from finding company-related datasets to enabling others to write bots to convert those datasets into open data.

This proved to be a complex problem: what would the bots look like, who would contribute, how would we manage the process, how would we ensure quality, and what are the barriers, both technical and cognitive?

Over the first half of 2014 we tried several different routes, testing them internally and with a few ‘alpha users’, and discarding several approaches along the way. For example, one approach made it ‘easy’ for contributors to write bots, and much of its functionality seemed appealing, but it made it difficult for them to understand what was going on – ‘too much magic’. It also became clear we needed a largely language-independent solution, so that we could in the short term support at least Python as well as Ruby (the language we use internally). We also needed bots to run independently and safely from each other, without fighting for resources.

The result of all these iterations is the Turbot platform. It’s perhaps one of the most important tools we will ever create. Turbot fetches data from bots written by volunteer community members, allows them to be reviewed, normalises them into a format that can be imported straight into OpenCorporates and runs the bot on a schedule so there is no need to run it manually. If something breaks, someone from OpenCorporates can work with the bot writer to fix it.
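To give a flavour of the fetch, normalise and review steps described above, here is a minimal Python sketch. It is not Turbot's actual interface: the field names, the `REQUIRED_FIELDS` schema, the placeholder source URL and every function name are hypothetical, chosen only to illustrate the idea of turning raw scraped rows into uniform records and holding back incomplete ones.

```python
import json
from datetime import date

# Hypothetical target schema; not Turbot's real format.
REQUIRED_FIELDS = ("company_name", "jurisdiction_code", "source_url", "sample_date")


def fetch_raw_rows():
    """Stand-in for a bot's scraping step; a real bot would read a public source."""
    return [
        {"Name": " Example Holdings Ltd ", "Country": "GB"},
        {"Name": "", "Country": "GB"},  # incomplete row, should be held back
    ]


def normalise(row, source_url, sample_date):
    """Map one raw row onto the hypothetical common schema."""
    return {
        "company_name": row["Name"].strip(),
        "jurisdiction_code": row["Country"].strip().lower(),
        "source_url": source_url,
        "sample_date": sample_date,
    }


def is_valid(record):
    """A record is importable only if every required field is non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def run_bot(source_url="https://example.org/register", today=None):
    """Fetch, normalise and split records into accepted and held-back lists."""
    sample_date = (today or date.today()).isoformat()
    accepted, rejected = [], []
    for row in fetch_raw_rows():
        record = normalise(row, source_url, sample_date)
        (accepted if is_valid(record) else rejected).append(record)
    return accepted, rejected


if __name__ == "__main__":
    accepted, rejected = run_bot()
    for record in accepted:
        print(json.dumps(record, sort_keys=True))
    print(f"{len(rejected)} record(s) held back for review")
```

Emitting one self-describing record per line is one way to keep a pipeline largely language-independent, since any language that can print JSON could produce the same output.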

We’re not finished yet – we are improving it every week – but it’s already working well enough that we are gradually migrating our existing bots over to it. Well enough even that when the Open Data Institute needed a bot-writing/management platform to support their OpenAddresses project and heard about Turbot, they pretty much immediately decided to use it. Later in 2015 we plan to work with the ODI to open source the platform.

To complement the sourcing of missions and give our community outreach a home, we developed Missions (thanks Xavier!) – a platform to show and allow users to sign up for open data missions. There are two kinds of missions at the moment: (1) scraping datasets and (2) finding data sources.

Missions has more than 771 registered users, of whom 115 have claimed a mission. Our main focus for 2015 will be to add more features for community members to enable editing and code review.

Bot writing, though fun, can be quite isolating as people do it in their own time, and when you hit the inevitable snags it can be demoralising on your own. Enter #FlashHacks – a mini hack event that brings together developers and people who care to collaborate on opening up corporate data. We’ve been running these events since July 2014 and it’s been hugely rewarding. Not only is it great to put a face to people who write bots for us, but also to get new people involved and demonstrate the impact of open company data. Our first one saw 60 bot writers gather in Berlin at OKFest and since then, we’ve done monthly #FlashHacks events in London. We’ve now also got events coming in many other countries! So far, we have over 200 bot writers & storytellers in London.

All we have achieved with the community is a good start but we know there is a long way to go before this community is strong and big enough to take on the world of proprietary company data.

What we’ve found to be difficult is articulating the use of open data and stories of how people are using OpenCorporates. We’ll often meet people or see a tweet which shows us people are using OpenCorporates in innovative ways to fight legal cases, hold companies to account, do due diligence, integrate it into their workflows and investigate corporate fraud. We haven’t collected these stories yet and this is something we must do in 2015. It’s important for our community to understand the impact of their contributions.

Collaboration

We collaborated with OpenOil, an NGO based in Berlin that works on transparency in the extractives industry, to map the corporate network of BP. This was an exciting campaign. Just by examining BP’s public filings, we found 12 layers of approximately 1,200 companies across 84 jurisdictions.

What does this mean? First, that though the information on the corporate structures of the largest companies is technically available, it is not functionally available – i.e. rather than being made publicly available as open data, it is spread across multiple filings, which are almost always PDFs or image documents that require work to convert them into open data.

Second, if companies don’t release information about their corporate structures, then civilians and open data/transparency activists can do it. This project made enough of an impact on BP that they got in touch with Johnny West, the founder of OpenOil, to discuss the results. We will be building on this approach in the future with other large corporations.
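For readers wondering what counting "layers" in a corporate network involves once the filings have been converted into data, here is a minimal sketch. It assumes the ownership links are already available as a simple parent-to-subsidiaries mapping; the company names and the `network_depth` function are made up for illustration and do not reflect the data model or tooling used in the BP project.

```python
from collections import deque


def network_depth(ownership, root):
    """Return (number of layers below root, number of companies reached).

    `ownership` maps a parent company to a list of its direct subsidiaries.
    A breadth-first walk assigns each company the shallowest layer at which
    it appears, and the `depth` dict doubles as a visited set so circular
    ownership does not cause an infinite loop.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in ownership.get(parent, []):
            if child not in depth:
                depth[child] = depth[parent] + 1
                queue.append(child)
    return max(depth.values()), len(depth)


if __name__ == "__main__":
    # Entirely fictional group structure.
    example = {
        "Parent plc": ["Holdco A", "Holdco B"],
        "Holdco A": ["Opco 1"],
        "Opco 1": ["Opco 2"],
    }
    layers, companies = network_depth(example, "Parent plc")
    print(f"{companies} companies across {layers} layers below the parent")
```

The hard part in practice is not this walk but getting the links out of PDFs and image filings in the first place, which is exactly the gap described above.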

Another thing we’d been talking about for a while, and started in 2014, was a cross-linking integration with Wikipedia – or more accurately, with Wikidata. OpenCorporates was already being used by many Wikipedia editors as an evidence base for information in articles on specific companies. We wanted to make this more automatic and easier to do, so working with the Wikipedia and Wikidata community we proposed (and got accepted) the OpenCorporates URI as a property, and the excellent folk at Wikidata have been experimenting with automatically matching companies in OpenCorporates to the companies in their database.

As part of our focus to collect as much data on companies as possible, we launched a campaign called Map the Banks – scraping all the financial licences in the world. In the wake of a global financial crisis which has been estimated to have cost society up to 10 trillion dollars, it seems crucial to have an open data map of the financial industry. The campaign was divided into three stages: (1) Scraping the data (2) Analyzing the data with journalists, analysts and NGOs (3) Publishing results and insights from the analysis.

For the campaign, we are partnering with Civio and OpenNorth, with investigative reporting projects such as OCCRP and Tax Justice Network, and with transparency NGOs (such as ONE and Transparency International) to analyze the scraped data.

The Map the Banks campaign has been viewed over 30,000 times with a big referral following a posting about the campaign on Hacker News. We’ve had many signups from people who are interested in helping out as coders, analysts, and people who are simply interested in the results of the campaign! We have 267 sources to scrape and we want to finish this campaign by the end of this year.

Corporate Transparency and Beneficial Ownership

In 2014 we continued to work to help increase corporate transparency, extending the Open Company Data Index, which measures access to statutory company data, to all 180+ countries in the World Bank’s Doing Business survey.

We also continued our work on Beneficial Ownership. We strongly believe that who controls and benefits from companies should be open data as a matter of routine. This is not just about tackling the pervasive use of corporate structures for money-laundering, organised crime, fraud, corruption, stolen assets, aggressive tax avoidance, important though that is. It’s also basic good business – to know who you’re doing business with.

In 2013 we worked hard with civil society (particularly Global Witness, ONE and other UK NGOs) to campaign for public beneficial ownership registers, and at the end of that year were delighted that the UK government announced it would go ahead with a public beneficial ownership register. In 2014 the focus moved to how that register was going to be implemented, and we worked with those same partners to explain to the government why the information needed to be granular, accurate, timely and most of all open data from the beginning. We had significant successes here, both when the legislation was created, and in its passage through parliament.

As part of that process in December 2014, we also launched Who Controls It – an open-source, proof-of-concept, very much “alpha-version” Beneficial Ownership Register which helps visualise and submit chains of control for companies. This was initially created in just the space of a couple of days to experiment with how such beneficial ownership data could be captured with minimal burden on the companies submitting the data (we have since iterated it to add real-life beneficial ownership data published under the Extractive Industries Transparency Initiative). We are now talking with partners in the transparency space about how to improve this further.

This project will not solve all the problems created by opaque and anonymous companies, but Who Controls It is an important first step and with the help of the wider open data and transparency community, we can show governments and others that collecting this data in a granular and useful way is really not that difficult.

Research & Investigations

One of the things that we are very committed to is providing data and support to academic researchers and investigative journalists.

In 2014, we provided data under our CC BY-SA license to a number of academics from universities such as Brunel, MIT Sloan Business School, and The University of Southampton. We’re still in talks with The University of Warwick, The University of Amsterdam, The Indian Institute of Technology, Delhi, and a couple of others to provide data. Once the studies are published, you will be able to read them on the blog. Similarly, we assisted a number of journalists, including from The Financial Times and The Economist, on stories they were writing.

What’s been hard in this area? Two key things: knowledge gaps and data holes. The knowledge gaps are multiple and varied, from journalists not being comfortable with the coding or data science necessary to make sense of the data, to not understanding the nuances of corporate data. The holes are where key datasets needed to join the existing data together are missing. We would like to address both of these over time, but think there’s no silver bullet.

The business world

In another respect, 2014 was the year OpenCorporates stopped being just an open data project, and started to become part of the business information landscape – albeit a very different one to every other business information provider out there.

This is important for two reasons.

First, it’s critical not just that OpenCorporates is sustainable, but that it can grow into the powerful resource that civil society, journalists, anti-corruption investigators and everyone else needs.

Second, it’s critical that the open corporate data world is not some second-class backwater, compared with the proprietary world, with those users who lack the budgets, power or access forced to feed from scraps. Such a world would fail to create incentives for good corporate behaviour, as well as giving a free pass to criminals, money-launderers, corrupt officials and those who would seek to avoid oversight and scrutiny from the wider society in which they operate.

That’s why we take it as a really important measure of success when the business world starts using OpenCorporates routinely, whether through the website, or by buying data without the normal share-alike requirements. They are doing so not because we’re nice people, and not because they want to support the open-data world, but because it offers them something they can’t get elsewhere, whether that is specific data, clear and transparent provenance of the source, data quality, non-proprietary identifiers, or the ability to reuse. And quality and benefit are what we firmly believe will move open data from being a nice idea to being one of the fundamentals of free, open and fair societies.

2014 was a landmark year for us in this respect, as our roster of clients expanded considerably to include such well-known organisations as LinkedIn, the World Bank, Creditsafe, Bureau van Dijk, Avention (One-Source) and many more. We also found out that banks, global law firms and accountants were routinely using OpenCorporates in investigations, due diligence work and onboarding of clients. That’s critical, as getting such groups doing this is a powerful argument not just for OpenCorporates but for public company registers and public beneficial ownership registers too.

What’s next

We have been moved by the stories of the impact OpenCorporates has made on so many individuals, projects and organisations. In 2015, we hope to collect these, share them and learn how we can maximize the impact. From our work with NGOs, researchers and journalists, we know that we need to make OpenCorporates more accessible to those whose full-time job is not working with large amounts of complex data but who could be benefiting from using this data.

We’re also expanding the ways in which people with different skillsets can contribute open data back to OpenCorporates. There is a way for everyone to help, whether you are a developer, lawyer, activist, accountant, journalist or a concerned citizen. We’re constantly iterating this workflow and enjoy building a more vibrant and inclusive community working on this movement.

2015 will also be the year we aim to add a significant quantity of new data – using the Data Pipeline to add not just core company information, but other data too. If you have a dataset that you’d love to see in OpenCorporates, contact us now.

Other things

Our CEO and Co-Founder, Chris Taggart, was appointed to the board of directors of the Global Legal Entity Identifier Foundation, although this does mean he spends far more time than we’d like in LEI-related meetings and conference calls.

Ben (SysAdmin/DevOps) volunteered 10 days of his time in January to provide IT support on the ground for NdiMoyo, an independent charity based in the Salima district of Malawi, that offers palliative care services and training throughout Malawi.

Hera (Community Manager) organised a hackathon in partnership with the UK Foreign Office and Dutch Embassy on developing tech solutions for charities working on the ground in conflict zones such as Syria. This was a part of the End Sexual Violence in Conflict conference chaired by William Hague & Angelina Jolie. Hera also somehow finds time to run a charity called Chayn, that uses simple technology to empower women experiencing violence and oppression.

Peter (Data Curator) worked on Reveal – a database-driven behaviour tracking system which has been designed to improve the quality of life of people whose behaviour can be challenging, and to support those who work with them.

Peter (Software Engineer) visited Vanuatu in the South Pacific, where he had spent a year working with VSO in 2007-8, revisiting some old projects and catching up with old friends. He has also been involved in running PyCon UK, and has been a coach at various events to teach coding to new programmers.

Seb (CTO) spent much of his spare time farming (!) and also squeezed in some time to be on the board of (and treasurer of) the Frome Society, a charity which runs workshops for the local community in Herefordshire. In 2014, Seb helped organise weddings, singing workshops, pilates classes, and folk music sessions. He also donated swing dancing lessons, played in a ceilidh band and ran the first 10K race of his life.

Shyam spent a lot of his spare time helping doctoral candidates in extracting useful information from hard-to-get data – including in a study identifying ‘Leakage in Fuel Subsidies’.
