Want to collect the world’s company data by yourself? Read this first!

The data, tech and product professionals we speak to often tell us a familiar tale. They started off trying to collect the world’s company data themselves to power their new solution or platform.

They had limited success at first, realised how difficult it is, and then thought: “there must be a better way – and somebody must have done this already”.

In this post, we’ll explore some of the main obstacles you need to manage when you’re trying to collect the world’s company data, and explain why it may be wiser to rely on the experts who have already done the difficult work for you.

Company data at a global scale: why it helps

First, why would you go to the trouble of collecting data on the whole universe of companies that exist in the world?

Many of the challenges product, tech or data managers are trying to solve require a global view of companies and the people they are connected to – from verification and investigations to master data management.

Depending on the problem you’re trying to solve, the data can only add value when it is aggregated, connected and accessed at this global scale – not just record-by-record. This is true if you’re building a graph database or a network analytics tool, to give just two examples.

Challenges of getting company data in the way it’s needed

It is obvious, then, why so many have tried to collect, aggregate and standardise global company data. But actually managing to do it – and making it usable on an ongoing basis – is fiendishly difficult. 

Below are just some of the many reasons for that:

  • Siloed data landscape
    Company data is held in hundreds of individual country and local state registries. This means the data is siloed and cannot be compared without a lot of pre-work. Even if you are prepared to go to each one in turn, registries have different capabilities and provide data in varying formats. 
  • Inconsistent data schemas
    Company datasets are provided by official registries according to different schemas in different jurisdictions. These inconsistencies make it confusing to understand what each data attribute means and difficult to create a harmonised view of the universe of companies.
  • Frequent changes
    Registries understandably make changes to their data schemas. This means you’d need to proactively stay on top of these changes and remediate your data accordingly.

  • It’s an ongoing task – not a one-off job
    Every month, thousands of companies are incorporated or dissolved, and many thousands more officers are added to or removed from them. As a result, if you rely on a single historic company dataset, it will soon become out of date. You’d therefore need to establish processes to collect, standardise and work with the data on a continuous basis.
  • Resource intensive
    It takes a lot of time and resources to continuously collect and harmonise data from hundreds of disparate and dynamic registries. Do you really want to hire, onboard and train the team of data and tech specialists it would take just to acquire this dataset?

  • Domain knowledge is needed
    A whole world of niche knowledge is required to understand company data. Legal entity structures and their firmographic attributes differ by jurisdiction. Working with the data means knowing how to access company registries, understanding the differences between them, decoding different company identifier schemes and more.
  • Operational challenges
    Each registry has its quirks. Some still provide their datasets only via DVD, for example. To navigate these, you’d need to build relationships with registry officials, which takes precious time and effort.
  • Matching isn’t as simple as you’d think
    To unlock the value of global company data, many users are looking to match official company records to their existing data. This process is more difficult than you might expect. Registries identify companies in different ways, and some do not even have a unique identifier for their companies. Without a common identifier system that allows you to identify a company which is mentioned in another dataset, the matching process becomes exceptionally difficult.

  • Lack of data provenance
    When you collect company data from around the world, it is difficult to keep track of where each record came from and when it was collected. But the users of the tool you feed the data into can only rely on it with confidence if this provenance is accurately reflected – especially if they work in risk or compliance. This means setting up your own processes for recording and displaying the provenance of each record.
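
To make a few of these challenges concrete, here is a minimal Python sketch of what harmonising two differently shaped registry records into one common schema might involve, while keeping provenance attached to every record. Everything in it is an assumption made for illustration: the registry codes, the field names, the example URLs and the helper functions harmonise and match_key are hypothetical and do not describe any real registry or our own pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical per-registry field names mapped onto one common schema.
# Neither the registry codes nor the field names describe a real registry.
FIELD_MAPS = {
    "registry_a": {"CompanyName": "name", "CompanyNumber": "company_number", "Status": "status"},
    "registry_b": {"entity_title": "name", "file_no": "company_number", "standing": "status"},
}


@dataclass(frozen=True)
class CompanyRecord:
    registry: str
    company_number: str
    name: str
    status: str
    source_url: str         # provenance: where the record came from
    retrieved_at: datetime  # provenance: when it was collected


def harmonise(raw, registry, source_url, retrieved_at):
    """Rename one registry's fields to the common schema and attach provenance."""
    mapping = FIELD_MAPS[registry]
    common = {target: str(raw[native]).strip() for native, target in mapping.items()}
    return CompanyRecord(registry=registry, source_url=source_url,
                         retrieved_at=retrieved_at, **common)


def match_key(record):
    """Join key: a company number is only meaningful within its own registry."""
    return (record.registry, record.company_number.upper())


# Example: two raw records with different shapes end up in the same structure.
collected = datetime(2024, 1, 1, tzinfo=timezone.utc)
record_a = harmonise(
    {"CompanyName": " Acme Ltd ", "CompanyNumber": "ab123", "Status": "Active"},
    "registry_a", "https://registry-a.example/ab123", collected,
)
record_b = harmonise(
    {"entity_title": "Acme Inc", "file_no": "AB123", "standing": "Good"},
    "registry_b", "https://registry-b.example/AB123", collected,
)
```

Even this toy version hints at the ongoing cost. Every schema change at a registry means updating a mapping, and two records with the same number in different registries must never be treated as the same company, which is why the join key pairs the number with the registry it came from.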

Let us do this difficult job – so you can focus on what’s important to your users

Amassing all of these resources and knowledge is unlikely to be an economical use of time or energy – surely you’d rather focus on improving your product. 

Let us take care of this difficult job.

For more than a decade, we have been collecting, standardising and making available company data, with the provenance, freshness and usability that are critical to hundreds of enterprise clients and millions of monthly website users.

We have overcome many of the pitfalls and know what best practice looks like, allowing you and your product’s users to put company data to work quickly and with confidence.

So save yourself the pain of trying to do this yourself and rely on OpenCorporates to power your platform.

You may also be interested in…

Case study: Exiger’s DDIQ

Having tried to build systems to collate information directly from many different company registries, Exiger instead entrusted OpenCorporates as an authoritative source of global company information to power their automated due diligence process.
