10 million data points crowdscraped in 10 days? When OpenCorporates launched this target just 2 weeks ago, we thought it was challenging but doable. Doable, that is, provided we could reach out to bot writers online. Doable provided that the culmination of the campaign at OKFest, probably the world’s largest gathering of the open data movement, worked out. Doable provided that there were no problems, such as bugs, patchy venue wifi, and unbearable heat.
Well, we had all those problems, and more, but we are overjoyed and proud to say that not only did we break the 10 million mark yesterday, we smashed it by over 2 million datapoints.
10 days ago, we kicked off #FlashHacks with three goals: (a) put a face to the passionate bot writers who are the backbone of the open data movement, (b) launch our crowdscraping platform, and (c) show the world the power of the open data community while pressuring governments to fulfil their commitments on opening up data.
When we reached OKFest, we still had over 6 million datapoints to be scraped in two days to meet our goal. With the help of pioneering organisations such as Open Knowledge Foundation Germany, Sunlight Foundation and Code For Africa, we put on three #FlashHacks.
During OKFest, we had the pleasure of welcoming over 60 developers who wanted to contribute to the open data space. Many of these developers had never written a bot before and told us they loved knowing that a few hours of coding could liberate a dataset for the benefit of all.
Some people asked us why we were doing this and whether freeing these datasets actually makes a difference. Information about the public and private sectors is of critical importance to understanding and changing the world we live in. Corporations play an increasingly dominant role in our lives and wield unprecedented influence on politics and the economy.
Yet company information is often not available, and when it is, it is buried in hard-to-use websites and PDFs. Fortunately, the work of the open data and transparency community has brought a tide of change. With the introduction of the Open Government Partnership and the G8 Open Data Charter, governments are committing to make this information easily and publicly available. However, real action remains slow. And that’s why scraping is at the heart of the open data movement! Where would the open data community be if it had not been for bot writers spending time deciphering formats and writing code to release data? To borrow a nugget of wisdom from our friends over at CivicPatterns: Don’t wait; scrape.
The kinds of datasets liberated included business licences in infamous Delaware, licensed corporate service providers in the even more infamous Cayman Islands, tenders in Western Cape, South Africa mining licenses, the Swiss aircraft registry, banks in Poland, and UK importers. This data will soon be imported into OpenCorporates, where it will be published as open data.
It was an inspiring ten days. More than 60 coders helped us break the target during OKFest, and many more joined online. The campaign was even covered in techPresident and netzpolitik. If you missed the #FlashHacks campaign online or on Twitter, or simply want to relive the moment, have a look at this Storify.
If you would like to hear about future #FlashHacks or contribute to the Missions platform (non-code missions are available too), sign up to our Google group now.
Want to host your own #FlashHacks? Email me at firstname.lastname@example.org and we can make it happen.
This campaign would not have been possible without the help of the Alfred P. Sloan Foundation, which generously provided the funding.