Covid relief funds: How OpenCorporates’ data helped expose exploitation of the US Paycheck Protection Program

The US government’s Paycheck Protection Program (PPP) was intended to be a lifeline for small businesses hit hard by the impact of Covid-19.

In recent months, $659 billion in low-interest loans has been offered to businesses as an incentive to keep employees on their payroll despite the havoc caused by the pandemic.

The US Small Business Administration (SBA) set eligibility criteria to prevent the money from being claimed fraudulently or used as seed money for new businesses – for example, companies must have been in operation on 15 February 2020 to claim a loan.

Yet an investigation by the Anti-Corruption Data Collective and the Miami Herald found that millions of dollars have gone to companies that did not meet the criteria, and that companies have been newly set up or otherwise obfuscated by those looking to fraudulently claim relief funds.

Read on to find out how OpenCorporates’ data played a key role in the investigation.

The PPP: exploitation exposed

The Anti-Corruption Data Collective brings together journalists, academics, programmers and policy advocates interested in exposing corruption.

Whilst the collective is normally focused on corruption in the real estate and private equity sectors, its co-founder David Szakonyi wanted to investigate the PPP because he saw a societal need to understand how the US government was spending the billions of dollars allocated to the program.

He says: “We knew anecdotally that a lot of people are misusing these funds, but we thought: what can we do with data science to catch some of those nibbling away at the system?”

The investigation, he says, was “surprisingly straightforward”. It involved combining two datasets:

  1. The US government’s publicly available dataset of more than 600,000 businesses that received loans of more than $150,000 – which David says omitted key information like company tax identifiers

  2. OpenCorporates’ dataset, which contains the official data on these companies as filed in their respective US state registries

The investigators merged the two datasets to identify companies in receipt of PPP loans at the legal entity level, and enriched the data they had about them. In addition, they used the API to filter down to a subset of companies based on their incorporation date. This helped them to identify companies that had actually been incorporated after 15 February 2020 – and should therefore have been ineligible for the loans.
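
The matching step described above can be sketched roughly as follows. This is a minimal Python illustration, not the collective's own code (which is written in R). The field names, the name-cleaning rule and the sample records are all hypothetical, and the sketch assumes the loan data and the registry data can be joined on a cleaned company name plus state.

```python
import re
from datetime import date

# Eligibility cutoff from the SBA criteria: in operation on 15 February 2020.
CUTOFF = date(2020, 2, 15)


def normalise(name: str) -> str:
    """Lower-case a company name and strip punctuation and common legal
    suffixes. A crude, illustrative rule; real matching needs more care."""
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    name = re.sub(r"\b(llc|inc|corp|co|ltd)\b", "", name)
    return " ".join(name.split())


def flag_late_incorporations(loans, registry):
    """Join loan records to registry records on (cleaned name, state) and
    return the loans whose matched company was incorporated after CUTOFF."""
    index = {(normalise(r["name"]), r["state"]): r for r in registry}
    flagged = []
    for loan in loans:
        match = index.get((normalise(loan["business_name"]), loan["state"]))
        if match is not None and match["incorporation_date"] > CUTOFF:
            # Enrich the loan record with the registry's incorporation date.
            flagged.append(
                {**loan, "incorporation_date": match["incorporation_date"]}
            )
    return flagged


# Hypothetical sample records, invented purely for illustration.
LOANS = [
    {"business_name": "Sunshine Widgets, LLC", "state": "FL"},
    {"business_name": "Old Harbor Cafe Inc.", "state": "FL"},
]
REGISTRY = [
    {"name": "SUNSHINE WIDGETS LLC", "state": "FL",
     "incorporation_date": date(2020, 4, 2)},
    {"name": "Old Harbor Cafe, Inc", "state": "FL",
     "incorporation_date": date(2015, 6, 30)},
]

flagged = flag_late_incorporations(LOANS, REGISTRY)
for row in flagged:
    print(row["business_name"], row["incorporation_date"].isoformat())
```

A name-based join like this will produce both false matches and misses, so any flagged company would still need checking against the official registry record before drawing conclusions.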

“We found that millions of dollars from the PPP went to fraudsters of every mould,” says David. “Many started companies, filed fake tax returns and illegitimately took money from the SBA.”

The investigators’ findings were striking:

  • At least 75 companies that failed to meet the eligibility criteria managed to claim PPP loans worth between $20 million and $50 million in total
  • At least $3.5 million was loaned to five businesses seemingly connected to the same man
  • Some businesses receiving loans had owners with questionable records, including bankruptcies and fraud convictions

In reality, the extent of the fraud is likely to be far greater. The collective and the Miami Herald identified up to 200 other companies that they suspected of claiming loans fraudulently. They also noted that the names of PPP recipients of loans under $150,000 have not been released.

An investigation powered by OpenCorporates’ data

David said the collective “simply couldn’t have done the investigation without OpenCorporates’ data”. “It would have been an unimaginable amount of work to visit 50 websites of company registries in different states, extract the data and then manipulate it, clean it and apply our filters,” he says. 

“By accessing OpenCorporates’ API, we achieved in less than a day what would have taken two people between four and six months to do.” 

In addition to speed and efficiency, David said OpenCorporates’ company data had a number of advantages:

  • Provenance
    The collective were able to verify the identity of companies and enrich the data they had about them with ease – as all OpenCorporates company datapoints contain the official source they were collected from, and where available, a URL to the respective record.

  • Freshness & accuracy
    The data is regularly updated and benefits from the ‘many eyes’ effect – where the millions of users that visit each month help identify issues that need to be corrected in the respective registry’s data.

  • US coverage
    Company data for the 51 US jurisdictions, with the exception of Illinois, is made available in one place, and the information is taken directly and only from official state sources.

  • Standardised schema
    OpenCorporates’ data is structured in a consistent way, allowing it to be easily combined with other datasets. The API also contains many different useful filters to narrow down searches.

  • Address data
    OpenCorporates has “an abundance of information on addresses” which makes geo-locating companies easier.
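
To give a feel for the kind of filtering mentioned above, here is a minimal Python sketch that builds a company search URL restricted by jurisdiction and incorporation date. It only constructs the URL and makes no network request. The endpoint path, the parameter names (`q`, `jurisdiction_code`, `incorporation_date`, `api_token`) and the colon-separated date range are assumptions that should be checked against the current API reference, and `build_search_url` is a hypothetical helper, not part of any official client.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify the version and path against the API reference.
BASE_URL = "https://api.opencorporates.com/v0.4/companies/search"


def build_search_url(name, jurisdiction_code, incorporated_from,
                     incorporated_to, api_token=None):
    """Build a search URL for companies matching `name` in one jurisdiction,
    incorporated between two ISO dates (YYYY-MM-DD). Parameter names are
    assumptions, as noted above."""
    params = {
        "q": name,
        "jurisdiction_code": jurisdiction_code,
        # Assumed range syntax: "start:end".
        "incorporation_date": f"{incorporated_from}:{incorporated_to}",
    }
    if api_token:
        params["api_token"] = api_token
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical query: Florida companies incorporated after the PPP cutoff.
url = build_search_url("sunshine widgets", "us_fl", "2020-02-16", "2020-12-31")
print(url)
```

Keeping URL construction separate from the request itself makes it easy to inspect a query before spending any of an API key's request allowance.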

What comes next?

The collective hopes its investigation will shine a light on those who are using companies to exploit the system. “The US Justice Department has charged 58 people with fraud connected to PPP over the last three months, but our investigation uncovered 75 different companies,” says David. “We hope the government can investigate and act on our data.”

More broadly, David hopes this investigation will encourage others to use legal entity data to expose corruption. “On paper legal entities appear official, but in actuality they can be completely abused and misappropriated for malicious ends,” he says. 

“But OpenCorporates collects and standardizes data on these legal entities – which when combined with the right data can allow data science to tell a simple story about fraud.” 

Speaking about what other investigators or journalists can do to get started in using company data to investigate corruption, David added: “There’s lots of low hanging fruit in publicly available data that doesn’t require a tonne of creativity and coding skills to find patterns in. The information is out there, and you’d be surprised at how the simplest of data science, when applied to OpenCorporates’ data, can create insight.”

Interested in using our API in a similar way?

OpenCorporates is committed to ensuring that public benefit users such as journalists, NGOs and academics have full access to our data. The Anti-Corruption Data Collective applied for a public benefit API key and accessed our data programmatically through our API using the R programming language. We thank them for making their code publicly available here. Other useful case studies can be found in our Investigator’s Handbook.

You may also be interested in…

  • Anti-Corruption Data Collective
    Find out more about the collective and their work
  • Case studies
    Over 400 organisations around the world rely on OpenCorporates’ data at scale
    Read our latest case studies >

  • Working on a public benefit project?
    OpenCorporates provides access to our data for free to journalists and others working for the public benefit
    Find out more >
