Company data at scale: when to use an API or a whole dataset in bulk? (Or both)

The ability to search for company data record-by-record is invaluable, but what about when you need the data at scale – such as to automate workflows, underpin a tech platform or connect it with other datasets?

Whilst some company registries are still in the business of providing company data on CDs (anybody remember those?), two main methods of data delivery at scale prevail today among registries and data vendors: API integration and bulk data deliveries.

It’s important to understand the benefits and characteristics of each before deciding how best to power your tech product, data platform or internal systems with company data. 

Over 400 organisations rely on our API or bulk data, from government agencies to disruptive tech companies. Drawing on our extensive experience, this blog post will outline the different kinds of business needs which could drive you to choose the bulk or API route (or both) when you require company data at scale.

Company data in bulk

Whilst bulk downloads are the most basic form of data delivery at scale, OpenCorporates is one of the few organisations to provide this for company data. Our bulk data clients receive regular deliveries of company data in an easy-to-use CSV format, which helps them power solutions to some of the most interesting business challenges.

If you’re interested in receiving a sample of the bulk data we provide, you can always get in touch.

Market needs where bulk company data adds value typically include:

Spine data

Bulk data is essential for organisations who are looking for a robust spine of company data to sit at the foundation of their data, tech or internal systems. 

Our transparent legal entity reference data offers platforms, ranging from data lakes to SaaS products, a basic reference point to which other datasets can be compared and appended. For example, Quantifind uses our global bulk dataset to underpin their machine learning-powered anti-financial crime investigations tool. 

Data analysis

A company dataset delivered in bulk can be used for a wide range of analysis, including identifying trends or anomalies – and creating models.

For example, FNA’s network analytics platform uses OpenCorporates’ bulk data to help financial crime investigators take a proactive approach to identifying risk. By combining our company data with a range of other sources such as financial transactions and news reporting, and applying machine learning, the technology identifies anomalous activity patterns that could be early warning signs of risk.

Combining datasets

Bulk data can be combined with other datasets, such as business licenses, procurement data, sanctions lists and more. Combining datasets in bulk like this is often how insights are uncovered.

For example: it was only by combining OpenCorporates’ data with a list of firms provided with Covid-relief funds under the Paycheck Protection Program that the Anti-Corruption Data Collective were able to investigate alleged fraud. Similarly, investigations platforms (such as those used in anti-financial crime) need to know what “dots” (read: companies and their officers) are out there in order to join them and find potential linkages to investigate.
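To make the idea of joining "dots" concrete, here is a minimal sketch of combining a bulk company file with a second dataset. The column names, the sample rows and the choice of matching on jurisdiction code plus company number are all assumptions made for illustration; they are not a description of the actual bulk file layout.

```python
import csv
import io

# Hypothetical bulk extract: these column names and rows are invented.
COMPANIES_CSV = """company_number,jurisdiction_code,name
00000001,gb,Example Trading Ltd
00000002,gb,Sample Holdings Ltd
00000003,us_de,Demo Ventures LLC
"""

# Hypothetical second dataset, e.g. a list of funding recipients.
RECIPIENTS_CSV = """recipient_name,jurisdiction_code,company_number,amount
Example Trading Limited,gb,00000001,15000
Unknown Co,gb,99999999,7000
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_company_id(companies, others):
    """Match rows from another dataset to companies by (jurisdiction, number)."""
    index = {
        (c["jurisdiction_code"], c["company_number"]): c for c in companies
    }
    matched, unmatched = [], []
    for row in others:
        company = index.get((row["jurisdiction_code"], row["company_number"]))
        if company is None:
            unmatched.append(row)
        else:
            # Keep the other dataset's fields and append the registered name.
            matched.append({**row, "registered_name": company["name"]})
    return matched, unmatched


matched, unmatched = join_on_company_id(
    load_rows(COMPANIES_CSV), load_rows(RECIPIENTS_CSV)
)
```

Rows that fail to match are kept rather than discarded, since in an investigation the unmatched records are often as interesting as the matched ones.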

Where simplicity is needed to start with

Bulk data can be advantageous where low technical barriers to entry are important to an organisation starting to use company data at scale. Many NGOs, journalists or law enforcement officials have the ability to deal with a single dataset, but they may not have the skills or resources to write the code to interface with an API, particularly when they need to do so on an ongoing basis.

Security

Many government and law enforcement agencies, as well as some large financial institutions, are restricted in their use of external APIs. They can be bound by rules that, for example, prohibit them from making queries about entities they are investigating. So acquiring the data in bulk allows them to utilise it at scale internally.

Company data via an API

API integrations are increasingly used to enable two systems to exchange data and talk to each other. 

In the world of company data, API integrations are often beneficial for the following business needs:

Single queries

API calls are valuable where single queries need to be made. They are often used when you already have a name and company number for a legal entity and need to verify them against official company data time and time again in a repeatable way. One example is the identification and verification (ID&V) step of Know Your Customer (KYC) due diligence.
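A repeatable check of this kind might look something like the sketch below. It is illustrative only: `verify_entity`, `fake_fetch` and the record shape are hypothetical names invented here, and `fetch_record` stands in for whatever API client you use to look up a company by its number. No real endpoint is called.

```python
def normalise(name):
    """Lower-case a name and collapse whitespace before comparing."""
    return " ".join(name.lower().split())


def verify_entity(claimed_name, company_number, fetch_record):
    """Compare a claimed name against the record returned for a number.

    fetch_record is any callable that takes a company number and returns
    a dict with a "name" key, or None if no record exists (an assumption
    made for this sketch).
    """
    record = fetch_record(company_number)
    if record is None:
        return {"status": "not_found"}
    if normalise(record["name"]) == normalise(claimed_name):
        return {"status": "verified", "registered_name": record["name"]}
    return {"status": "name_mismatch", "registered_name": record["name"]}


# Stand-in for an API lookup so the example runs offline (invented data).
_FAKE_REGISTER = {"00000001": {"name": "Example Trading Ltd"}}


def fake_fetch(company_number):
    return _FAKE_REGISTER.get(company_number)


result = verify_entity("example  trading ltd", "00000001", fake_fetch)
```

Passing the lookup in as a function keeps the verification logic separate from the transport, so the same check can be tested offline and then pointed at a live API client.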

Workflow automation

API integrations also enable automated workflows to run. Instead of the whole dataset en masse, APIs can pull in a few records or attributes at a time, such as names, addresses or officers. This saves you (or your product’s users) the time and effort of having to search for these records manually.

Many regulatory technology (RegTech) tools that focus on automated due diligence or business verification already use the OpenCorporates API in this way – such as Exiger’s DDIQ. Their tools typically call on OpenCorporates data to help their users verify the identity of a company or company officer they need to conduct due diligence on, as one of the initial steps in a longer workflow.

Low-latency queries

In some cases, it is critical to have access to very up-to-date data. This is particularly true where anti-money laundering or KYC regulations require entities to be screened against current data, or where out-of-date data will give the ‘wrong’ result.

Where smaller record numbers are needed

An API is the best option for an organisation that finds it problematic, or not cost-effective, to import an entire dataset. If a user needs only a small number of records, for example in an onboarding process, then downloading and importing millions of records might not be as efficient.

Using data without storing high volumes

Similarly, API calls are useful where an organisation does not want to store much data itself, and prefers to query a large dataset for one-off retrievals as and when it needs them.

Bulk data & APIs: when to use them together

There are also times when it is beneficial to use both API and bulk delivery mechanisms for your company data. Incorporating bulk data first can provide the foundational layer of company data we mentioned earlier. API calls can then be used to refresh records between bulk deliveries to keep individual records up to date. 
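As a rough sketch of that combined pattern, the snippet below keeps a local store loaded from a bulk delivery and refreshes only the records that have gone stale. The `retrieved_at` field, the seven-day threshold and the `fetch_record` callable are assumptions made for illustration, with `fetch_record` again standing in for a real API lookup.

```python
from datetime import datetime, timedelta, timezone


def refresh_stale(records, fetch_record, max_age, now=None):
    """Refresh records older than max_age and return the keys refreshed.

    records maps a company number to a dict holding a timezone-aware
    "retrieved_at" datetime (a hypothetical layout for this sketch).
    """
    now = now or datetime.now(timezone.utc)
    refreshed = []
    for key, record in list(records.items()):
        if now - record["retrieved_at"] > max_age:
            latest = fetch_record(key)
            if latest is not None:
                # Replace the stale copy and stamp it with the refresh time.
                records[key] = {**latest, "retrieved_at": now}
                refreshed.append(key)
    return refreshed


# Invented sample: one record from an older bulk load, one recent.
sample_now = datetime(2024, 1, 10, tzinfo=timezone.utc)
store = {
    "00000001": {
        "name": "Old Name Ltd",
        "retrieved_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
    },
    "00000002": {
        "name": "Fresh Ltd",
        "retrieved_at": datetime(2024, 1, 9, tzinfo=timezone.utc),
    },
}

updated = refresh_stale(
    store,
    lambda number: {"name": "New Name Ltd"},  # stand-in for an API call
    max_age=timedelta(days=7),
    now=sample_now,
)
```

Only the record older than the threshold triggers a lookup, which keeps API usage proportional to how much of the foundational dataset has actually aged.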

Ready to get started? Explore our data here.

