Open identifiers: enabling insight & lowering barriers to innovation

Is the era of proprietary identifiers coming to an end?

From April 2022, the US General Services Administration plans to stop using the DUNS number to uniquely identify companies registered in their System for Award Management (SAM). This number, which is the classic example of a closed, proprietary identifier for companies, will be replaced by an open identifier. 

This is just the latest evidence of a global move away from the old model of identifying companies through proprietary identifiers and towards open identifiers.

This blog post explores the reasons for this trend and the benefits that open identifiers bring. But first, let’s unpick the jargon.

What are open and proprietary identifiers?

An identifier is a unique reference for an entity which is used in a dataset. 

Proprietary identifiers are those which have been assigned by an authority, usually a data vendor, which then owns and controls the identifiers. Use of these identifiers is subject to the terms and conditions set by a particular vendor. 

By contrast, an open identifier uses a non-proprietary identifying schema which is available for anyone to use and is transparent about how the identifier was derived and exactly what it refers to. 

At OpenCorporates, we use open identifiers for our legal entity data. 

Our open identifier is simple but effective – it is derived from the combination of two standards:

  • The International Organisation for Standardisation (ISO) codes indicating which jurisdiction (and sub-national jurisdiction) the company was incorporated in – a non-proprietary standard which is used internationally.
  • A legal entity’s official number assigned in their relevant official national or local company registry.
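As a rough illustration of how these two parts can be combined, here is a minimal Python sketch. The function name, the lowercasing, the "/" separator and the example company numbers are assumptions made for illustration only, and are not a statement of the exact format OpenCorporates uses.

```python
def open_identifier(jurisdiction_code: str, company_number: str) -> str:
    """Combine a jurisdiction code with a registry-assigned company number.

    The output format (lowercase code, "/" separator) is an illustrative
    assumption, not a documented standard.
    """
    # Normalise the jurisdiction code, e.g. "US-DE" becomes "us_de".
    jurisdiction = jurisdiction_code.strip().lower().replace("-", "_")
    # Keep the company number exactly as the registry issued it,
    # including any leading zeros.
    number = company_number.strip()
    if not jurisdiction or not number:
        raise ValueError("both a jurisdiction code and a company number are required")
    return f"{jurisdiction}/{number}"


# Hypothetical company numbers, used only to show the shape of the result.
print(open_identifier("GB", "01234567"))
print(open_identifier("US-DE", "7654321"))
```

Because both parts come from public sources, anyone holding the same registry record can rebuild the same identifier without asking a vendor for it.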

Why proprietary identifiers hold us all back

So why are proprietary identifiers becoming a thing of the past? 

  • Vendor lock-in
    If you acquire legal entity data from a legacy provider that only provides proprietary identifiers, you have to reference that identifier every time you use the data. This makes it difficult to change your data supplier later, potentially leaving you locked into their data. If you have built a technology and data platform that references these proprietary identifiers, then changing identifier may require a complete rethink and rewiring of the platform.
  • Difficulty of combining it with other datasets
    Combining datasets is often what reveals new insights by cross-pollinating different clusters of data. But when one dataset uses proprietary identifiers, it becomes more difficult to compare and combine it with others.

    The terms and conditions of using the identifier can include usage restrictions and can change, and you might need to ask the vendor for permission to use their identifier schema in this way. Combining the dataset with others then becomes technically more complicated, often adding a hidden cost.

    This is not just theoretical. An inability to uniquely identify legal entities across numerous datasets can expose a company to significant risks. For example, it may have contributed to the difficulty financial services firms had in responding to the 2008 financial crisis.

    According to the Data Foundation, Joseph Tracy, former Vice President of the Federal Reserve Bank of New York, was asked by supervisors “a very simple question: What is your aggregate exposure to Lehman Brothers?”.

    In response, he said that major financial institutions: “… don’t know and it will probably take weeks if not months to get an answer. The problem was that Lehman Brothers consisted of several thousand legal entities, and there was no design in their data systems that made it easy for them to aggregate all those entities”.
  • Ambiguity and lack of precision
    We recently wrote about why poorly-modelled data is ambiguous and imprecise, making it difficult for you to be confident in how you use that data. Proprietary identifiers compound this problem, because how an identifier is formed is usually not explained. In the case of legal entity data, legacy black box data providers not only use proprietary identifiers but also have poorly defined models, in which a record may refer to something other than a duly-registered legal entity, such as a storefront or simply an address.

The value of open identifiers

By contrast, open identifiers benefit any organisation that uses company data at scale to power their platforms, products or business operations. They bring two key advantages:

  • Lowering barriers to entry
    Open identifiers make it easier for innovators such as start-up companies to build with data quickly. This is because they do not need to wrestle with complicated terms and conditions over how they can use the data, or overcome extra technical challenges in using it.
  • Garnering insight
    Open identifiers make it easier to generate insights. This is because datasets based on open identifiers are easier to combine, allowing interoperability through data analytics tools and technology platforms. We know from speaking to clients in all sectors that our open identifiers offer a bridge for them between canonical legal entity data and other datasets, which accelerates their innovation.
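To make the point about combining datasets concrete, here is a small Python sketch that joins two datasets on a shared open identifier. Everything in it is hypothetical: the company names, the contract values, the field names and the identifier strings (including their format) are invented for illustration.

```python
# Dataset 1: legal entity records keyed by a shared open identifier (made-up data).
registry = {
    "gb/01234567": {"name": "Example Widgets Ltd"},
    "us_de/7654321": {"name": "Example Holdings Inc"},
}

# Dataset 2: contract awards from a separate source, using the same identifiers.
contracts = [
    {"supplier_id": "gb/01234567", "value": 120000},
    {"supplier_id": "us_de/7654321", "value": 45000},
    {"supplier_id": "gb/01234567", "value": 30000},
]


def total_by_entity(registry, contracts):
    """Sum contract values per legal entity by joining on the shared identifier."""
    totals = {}
    for row in contracts:
        entity = registry.get(row["supplier_id"])
        if entity is None:
            # Skip contracts whose supplier is not in the registry dataset.
            continue
        totals[entity["name"]] = totals.get(entity["name"], 0) + row["value"]
    return totals


print(total_by_entity(registry, contracts))
```

The join is a plain dictionary lookup because both datasets refer to the same entity with the same key. With a proprietary identifier in one of them, a mapping table and the vendor's permission to use it would be needed first.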

You may also be interested in…

The transparent data revolution
Read more about the drawbacks of proprietary identifiers, and the benefits of transparent legal entity data, in our white paper.
