24 months + $3000 + 1 cake: How Virginia’s company data was opened up

This is a guest post by Waldo Jaquith, who runs US Open Data and who this month pretty much single-handedly persuaded the U.S. state of Virginia to open up all of its company register data.

OpenCorporates’ work is aggregating openly published corporate registries, but less well known is the work that the organization spurs others to do. Moving corporate registers from closed records to open data is a laborious process that requires work from a bunch of stakeholders. Here’s the story of how one new registry was opened up, effective today.

The U.S. state of Virginia is adjacent to Washington DC, is roughly the size of North Korea, and has 8 million residents and over half a million businesses. In the United States, corporations are registered with the state, not with the federal government, so instead of a national registry, we have 55 registries, each using different standards and practices.

Like many U.S. states, Virginia has long been a “dark state” for corporate data. The State Corporation Commission, the independent state agency that regulates corporations, does not provide open data about Virginia’s businesses. There’s no API and no bulk downloads. There are two web-based interfaces for looking up records one at a time: one with a limited subset of the data, and another that is complete but has a horrifically bad interface.

However, Virginia has long sold bulk data. They require that a contract be signed and a payment of $450 be made every three months, in exchange for FTP-based access to the data. This data is a mess, to put it gently.

Two years ago, I started buying Virginia’s corporate data to give it away for free. I didn’t have any great plans for it; I was mostly just embarrassed to see my home state as a blank spot in OpenCorporates’ collection. Corporate data needs to exist within OpenCorporates. Period.

Making the file itself freely available was as easy as writing a cron job to copy it into an S3 bucket once a week. But getting the data into usable shape was a much greater challenge. The data published by Virginia’s State Corporation Commission was just a mess, on every level. Here’s a sample:

The data was divided into nine different files, which, for some reason, the agency concatenated into a single file. So using the data first required breaking it back up into its individual files. Then the fixed-width data had to be mapped into structured data. Then the character encodings had to be normalized, because the agency has periodically switched character encodings (some of them truly mysterious) without converting the older records. (Sometimes the encoding even switches mid-record.)
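The three cleanup steps described above can be sketched in Python. This is a minimal illustration under loudly stated assumptions, not the pipeline that was actually used: the post does not describe the real record-type codes or column layouts, so the two-byte type prefix, the `LAYOUTS` table, and its offsets are hypothetical placeholders. Because the encoding can change mid-record, the sketch slices raw bytes (assuming the fixed-width columns are byte-based) and decodes each field separately, trying UTF-8, then Windows-1252, then Latin-1, which accepts any byte and so always succeeds.

```python
from collections import defaultdict

# HYPOTHETICAL layouts: the post does not give the real record-type codes or
# column offsets. Each entry is (field name, start byte, end byte).
LAYOUTS = {
    b"01": [("entity_id", 2, 9), ("name", 9, 49), ("status", 49, 51)],
    b"02": [("entity_id", 2, 9), ("officer_name", 9, 49)],
}

# Tried in order; latin-1 maps every byte to a character, so it never fails.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_field(raw: bytes) -> str:
    """Decode a single field with the first encoding that accepts it."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding).strip()
        except UnicodeDecodeError:
            continue
    raise ValueError("no fallback encoding accepted the field")


def split_by_record_type(lines):
    """Step 1: regroup the concatenated file by its (assumed) 2-byte type prefix."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip(b"\r\n")
        if line:
            groups[line[:2]].append(line)
    return groups


def parse_record(line: bytes, layout):
    """Steps 2 and 3: cut fixed-width byte columns, then decode each field."""
    return {name: decode_field(line[start:end]) for name, start, end in layout}


def parse_file(lines):
    """Return {record type: [parsed records]}, skipping unknown types."""
    groups = split_by_record_type(lines)
    return {
        rtype.decode("ascii"): [parse_record(rec, LAYOUTS[rtype]) for rec in recs]
        for rtype, recs in groups.items()
        if rtype in LAYOUTS
    }


if __name__ == "__main__":
    # Synthetic sample: one cp1252-encoded line and one UTF-8 line.
    sample = [
        b"01" + b"1234567" + b"Caf\xe9 Holdings LLC".ljust(40) + b"00\n",
        b"02" + b"1234567" + "Jos\u00e9 P\u00e9rez".encode("utf-8").ljust(40) + b"\n",
    ]
    for rtype, records in parse_file(sample).items():
        print(rtype, records)
```

On a real file you would open it in binary mode (`open(path, "rb")`) and pass the file object to `parse_file`, since iterating a binary file yields byte lines. The fallback order is only a heuristic: a field in a legacy encoding that also happens to be valid UTF-8 will be decoded as UTF-8, so the output would still need spot-checking.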

At this point, OpenCorporates was able to harvest the records and incorporate them into the site.

Mission accomplished? Not quite.

I didn’t want to settle for buying public data, indefinitely, for $1,800/year. So I set about trying to convince the Virginia State Corporation Commission to stop selling the data and to give it away instead. I publicized the fact that they had only six paying customers. I called them out via Twitter with every check that I wrote:

I conducted a study that found that this data had US$100 million in untapped value to localities in Virginia, and made sure that it got lots of press coverage. And then, finally, in April, I had a passive-aggressive sheet cake delivered to the head of the agency:

It’s hard to say which of these things worked (maybe all of them, maybe none of them), but on July 1 the agency announced a 180° change: it would start giving away the data for free, as CSV, effective today, August 1.

This was beyond what I’d hoped for: not only are they giving the data away for free, but they’re providing it as structured data in an open format, available via HTTP. I no longer have to write checks every three months, OpenCorporates can harvest the data directly from Virginia, the state can stop wasting money administering what is surely a money-losing data-sales system, and this data is now available to anybody. Everybody wins. It took a couple of years, but the effort that OpenCorporates set in motion is now complete.
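With the data now published as CSV over HTTP, the multi-step cleanup described earlier collapses to a download and a standard CSV parse. Here is a minimal sketch using only the Python standard library. The post does not give the download URL, the column names, or the file’s character encoding, so `CSV_URL` is a hypothetical placeholder and UTF-8 is an assumption.

```python
import csv
import io
import urllib.request

# HYPOTHETICAL placeholder: the post does not give the SCC's actual download URL.
CSV_URL = "https://example.com/virginia-scc/corporations.csv"


def read_records(text_stream):
    """Yield each CSV row as a dict keyed by the file's header row."""
    yield from csv.DictReader(text_stream)


def fetch_records(url=CSV_URL, encoding="utf-8"):
    """Download the CSV over HTTP and parse it into a list of dicts.

    The encoding is an assumption; adjust it if the published files differ.
    """
    with urllib.request.urlopen(url) as resp:
        # Wrap the binary HTTP response so the csv module receives text.
        text = io.TextIOWrapper(resp, encoding=encoding, newline="")
        return list(read_records(text))
```

Keeping parsing (`read_records`) separate from fetching (`fetch_records`) lets the parser be exercised against a local string or file without touching the network.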

OpenCorporates is more than a repository of corporate registries. Its existence is a wedge that’s opening up corporate registries throughout the world. This isn’t happening quickly: it took two years of non-trivial work just to get one medium-sized U.S. state to publish its data openly. But OpenCorporates is leading the charge towards openness in corporate registries, as evidenced by Virginia’s change today.
