Data provenance: why it helps you make more defensible decisions

Data about companies is used every day to inform all kinds of critical decisions: from analysing risk to vetting potential suppliers.

But how do you know what data to trust? And how can you justify important decisions you, or your users, take with the data?

The answer is simple: data provenance.

Efforts to utilise data, for whatever purpose, only become truly defensible when you have clear information about its origins.

In this blog post, we’ll explore some key examples where data provenance helps create defensibility – whether you’re a product manager building a cutting-edge platform or product, a compliance manager trying to identify red flags or one of the many others who rely on company data to better inform their decision-making every day.

What is data provenance?

In the context of company data, data provenance means that you have a clear understanding of:

  • Where?
    The specific source the data was collected from.
  • When?
    The point in time at which the data was collected.

Having a transparent line of sight into both of these together greatly increases the confidence with which you can use the data. That’s why OpenCorporates’ data, unlike data from legacy Black Box providers, gives you a clear path back to the official public source from which it was collected, along with information about when each company record was collected.
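For readers who work with the data directly, here is a minimal sketch of what carrying the ‘where?’ and the ‘when?’ alongside a record could look like. The class, field names, company and URL are all hypothetical and chosen for illustration; they are not OpenCorporates’ actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenancedRecord:
    """A company record plus the two provenance facts: where and when."""
    company_name: str
    source_url: str         # where: the source the data was collected from
    retrieved_at: datetime  # when: the moment the data was collected

    def provenance_note(self) -> str:
        """Build a one-line citation that could be stored in an audit trail."""
        collected_on = self.retrieved_at.date().isoformat()
        return f"{self.company_name}: collected from {self.source_url} on {collected_on}"


# Made-up company and placeholder URL, for illustration only.
record = ProvenancedRecord(
    company_name="Example Widgets Ltd",
    source_url="https://registry.example/12345",
    retrieved_at=datetime(2021, 3, 1, 9, 30, tzinfo=timezone.utc),
)
print(record.provenance_note())
```

Keeping both facts on the record itself means any decision based on it can cite its source and collection date without a separate lookup.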

The list of benefits is long, so we have narrowed it down to four. Well-provenanced data allows you to:

1) Risk management: justify decisions to your stakeholders

Onboarding, Know Your Customer or other risk management professionals have to justify their assessment of a potential customer’s risk level. Stakeholders such as the company’s board, regulators, auditors or investors can all ultimately review these decisions – and getting it wrong can have damaging consequences.  

If the data they are using is opaque, then it is impossible to be completely confident when explaining a decision.

By contrast, well-provenanced data supports an audit trail that risk management professionals can use to justify their decisions. In the case of OpenCorporates’ data, this stretches directly back to official public sources.

2) Product development: build smarter solutions

Product managers are increasingly integrating company data at scale to power technology products that create efficiencies, help their users to understand their supply chains and much more. 

Building these solutions relies on having the right data, and knowing the ‘when?’ and the ‘where?’ lets users identify any weaknesses in the data. This can inform how the platform is built and used. 

For example, if you know a certain record was collected from an official source very recently, you can use it with confidence to make up-to-date decisions. Equally importantly, if you know a slice of the data is less fresh, you may choose to rely on it for some purposes but not for others. Knowing this lets you build a better, more defensible platform.
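As a rough sketch of how a platform might act on the ‘when?’, the snippet below checks a record’s collection time against a maximum age that depends on the purpose. The function name and both thresholds are assumptions made for illustration, not recommended values.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(retrieved_at: datetime, max_age: timedelta, now: datetime) -> bool:
    """Return True if the record was collected within max_age of now."""
    return now - retrieved_at <= max_age


# Hypothetical thresholds: strict for onboarding, looser for background research.
ONBOARDING_MAX_AGE = timedelta(days=30)
RESEARCH_MAX_AGE = timedelta(days=365)

now = datetime(2021, 6, 1, tzinfo=timezone.utc)
recent = datetime(2021, 5, 25, tzinfo=timezone.utc)  # collected 7 days before `now`
older = datetime(2020, 12, 1, tzinfo=timezone.utc)   # collected 182 days before `now`

print(is_fresh(recent, ONBOARDING_MAX_AGE, now))  # True: fresh enough for onboarding
print(is_fresh(older, ONBOARDING_MAX_AGE, now))   # False: too old for onboarding
print(is_fresh(older, RESEARCH_MAX_AGE, now))     # True: acceptable for research
```

The same record can be acceptable for one purpose and not for another, which is only possible to decide when the collection time is known.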

With Black Box data providers, however, you may be incorporating stale or untrustworthy data without knowing it. As a result, your platform will become less useful and may give inaccurate results.

3) Data analytics: connect the dots more firmly

Data analytics solutions, such as those supporting credit risk, real estate or anti-money laundering decisions, increasingly leverage AI and other technologies to join the dots between multiple datasets and uncover new insights. 

The technology driving them will only give meaningful results if it uses high-quality, authoritative and well-provenanced data.

Caryn McEwen, Head of Global Licensing & Content Operations at LexisNexis, explained this well: “For us as an aggregator of many different datasets, it is critical to have transparency of data because it enhances our ability to link datasets together, rather than having to use data whose provenance is unknown, self-reported, or whose entity identifiers have traditionally been siloed”.

4) Avoid making potentially costly assumptions

If you are relying on Black Box data, you will inevitably end up making assumptions about its quality and freshness, because you lack adequate information about its provenance.

These assumptions will sometimes be right and sometimes be wrong. Either way, nobody wants to operate in the dark like this, and assumptions about data provenance can be costly.

The problem can also compound: if the data used in your product or process is poor, the impact can be outsized, with multiple decisions resting on a faulty foundation.

So what next?

Demand more of your company data providers.

If you ask your data vendors about where the data comes from and hear it’s a ‘trade secret’ or it comes from ‘proprietary sources’ – then that isn’t good enough.

This is just one reason why we’re rapidly building a reference dataset for the universe of legal entities in the transparent, well-provenanced way that’s needed, so everybody benefits.

You may also be interested in…

The value of provenance
Read more about the benefits of well-provenanced, transparent data in our white paper.
Download White Paper >

Learn about OpenCorporates’ data
Find out more about our data.
Explore our data >
