Uncovering the truth using comprehensive data analysis – foreign investment in the London property market

This is a guest blog post by Daniella Tsar, a Data Scientist at Thomson Reuters Labs.

At Thomson Reuters Labs, we drive innovation through data science and visualisation, using novel approaches to create solutions that help our customers while harnessing the power of linked and shareable data. We regularly deliver projects that tackle corruption and risk.

Last year, Transparency International UK approached us to analyse the London property market and to highlight the power of data in exposing money laundering risks. Using data from Thomson Reuters, the ICIJ and OpenCorporates, together with the expertise of Transparency International, we were able to demonstrate that London land and property are a particular target for those looking to launder the proceeds of corruption. We were also able to show that data remains one of the most effective weapons in the ongoing fight against corruption.

The objective of the report was to identify land and property in London owned by Politically Exposed Persons (PEPs).

From a data science perspective, the main finding is that, across four different datasets, we found no information on almost half of the companies we investigated. This means that even the 4% of land titles we found to be linked to PEPs is likely to be only a small proportion of the real number.

The intelligent use of data that is available is a highly effective tool in the fight against global corruption. Our research has shown the great power that exists in bringing different sources of data together to deliver a holistic picture of risk.

Data and Methodology

The starting point of our analysis is the Land Registry data of land and property in England and Wales owned by foreign companies.

We found 44k land titles (covering both land and property) in London owned by 24k overseas companies. Our task was to find as many further connections to these companies as possible, to get a holistic view of their corporate structures.

Routes to Uncovering Further Connections to Companies

About 10% of overseas companies owning land or property in London had been Mossack Fonseca clients, and hence appeared in the Panama Papers. This is a very high share for a single offshore firm. However, it still leaves 90% of overseas companies owning land or property in London on which we had no information without additional datasets.

To match company names from the Land Registry to other records, we used the following datasets:

  • The ICIJ’s database: Contains over 500,000 offshore companies, foundations and trusts that were found in the Panama Papers, the Offshore Leaks and the Bahamas Leaks investigations.
  • Thomson Reuters PermID: Open, permanent, and universal identifiers. Used for matching and cross-referencing with Thomson Reuters datasets for additional links.
  • OpenCorporates: The largest open database of companies in the world, containing over 120 million corporate entities from more than 115 jurisdictions.


Each of the three datasets required a slightly different matching technique. Company matching against ICIJ's data was done using exact string matching. Before matching, we removed some commonly occurring suffixes (e.g. Limited, Ltd), because exact matching would otherwise miss companies that are likely the same but are spelled differently (for example, "X Limited" and "X Ltd"). We also incorporated country information into the matching to increase the precision of our results.
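
The exact-matching step described above can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the suffix list beyond "Limited" and "Ltd", the function names and the sample companies are all hypothetical. Names are upper-cased, punctuation is replaced with spaces, one trailing legal suffix is dropped, and a match additionally requires the country to agree.

```python
import re

# Illustrative suffix list (an assumption); the post only names "Limited" and "Ltd".
LEGAL_SUFFIXES = {"LIMITED", "LTD", "INC", "CORP", "CORPORATION"}


def normalise_name(name: str) -> str:
    """Upper-case, replace punctuation with spaces and drop one trailing legal suffix."""
    tokens = re.sub(r"[^\w\s]", " ", name.upper()).split()
    if tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens = tokens[:-1]
    return " ".join(tokens)


def exact_match(land_registry, offshore):
    """Exact-match (company_name, country) pairs after name normalisation.

    Both arguments are lists of (company_name, country) tuples. A pair only
    matches when the normalised name AND the upper-cased country agree.
    Returns {land_registry_name: [matching offshore names]}.
    """
    index = {}
    for name, country in offshore:
        key = (normalise_name(name), country.strip().upper())
        index.setdefault(key, []).append(name)

    matches = {}
    for name, country in land_registry:
        key = (normalise_name(name), country.strip().upper())
        if key in index:
            matches[name] = index[key]
    return matches


# Hypothetical sample records, not data from the report.
land_registry = [("Acme Holdings Ltd.", "British Virgin Islands"),
                 ("Blue Harbour Limited", "Jersey")]
offshore = [("ACME HOLDINGS LIMITED", "British Virgin Islands"),
            ("Blue Harbour Ltd", "Panama")]
print(exact_match(land_registry, offshore))
# {'Acme Holdings Ltd.': ['ACME HOLDINGS LIMITED']}
```

"Blue Harbour" is not matched even though its normalised name is identical, because the countries differ; this is how the country filter trades some recall for precision.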

PermID matching and OpenCorporates matching (the latter through OpenRefine, a tool recommended for matching large numbers of companies) both work through APIs whose algorithms return an accuracy/confidence score based on how similar the input string is to the candidate match. The two systems differ in their details, but both essentially use similarity metrics to match a company name to an actual entity in their databases. For both, just as for ICIJ's Offshore Leaks, we used country information in the matching to increase confidence in the match.
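
The general idea of score-based matching with a country filter can be sketched as below. This does not reproduce the PermID or OpenCorporates/OpenRefine algorithms, which are accessed through their APIs; as a stand-in it uses Python's standard-library `difflib.SequenceMatcher` ratio, and the 0.85 acceptance threshold and sample names are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two company names, ignoring case."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def best_match(name, country, candidates, threshold=0.85):
    """Return (candidate_name, score) for the best same-country candidate, or None.

    `candidates` is a list of (candidate_name, candidate_country) tuples.
    The 0.85 threshold is an illustrative assumption, not a value from the analysis.
    """
    best = None
    for cand_name, cand_country in candidates:
        # Only compare against entities registered in the same country.
        if cand_country.strip().upper() != country.strip().upper():
            continue
        score = similarity(name, cand_name)
        if best is None or score > best[1]:
            best = (cand_name, score)
    # Reject the best candidate if it is still not similar enough.
    if best is not None and best[1] >= threshold:
        return best
    return None


candidates = [("Acme Holding Ltd", "Jersey"), ("Acme Holdings Ltd", "Panama")]
# Matches "Acme Holding Ltd"; the identically named Panama entity is skipped on country.
print(best_match("Acme Holdings Ltd", "Jersey", candidates))
```

Unlike exact matching, this tolerates small spelling differences, at the cost of needing a threshold: set it too low and unrelated companies are linked, too high and genuine variants are missed.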

We ran every company through all three databases to make sure that no connection in any of them was missed. For example, had we stopped after finding two officers of company 'X' in ICIJ's Offshore Leaks, we would have missed the ten other connections that OpenCorporates matching might subsequently have uncovered, greatly weakening our analysis.
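
The "query every source, don't stop at the first hit" approach can be sketched like this. The lookup functions are hypothetical stand-ins rather than real ICIJ, PermID or OpenCorporates clients; the point is that results from all sources are merged and tagged with where they came from, and a company only counts as unknown when every source comes back empty.

```python
from typing import Callable, Dict, List

# A "source" is any function returning the connections (e.g. officer or
# shareholder names) it finds for a company name. The lookups below are
# hypothetical stand-ins, not real ICIJ, PermID or OpenCorporates clients.
Source = Callable[[str], List[str]]


def collect_connections(company: str, sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Query every source for `company` and keep every hit, tagged by source.

    No source is skipped because an earlier one already returned results.
    """
    found: Dict[str, List[str]] = {}
    for source_name, lookup in sources.items():
        for connection in lookup(company):
            found.setdefault(connection, []).append(source_name)
    return found


def unknown_companies(companies: List[str], sources: Dict[str, Source]) -> List[str]:
    """Companies for which no source returned any connection."""
    return [c for c in companies if not collect_connections(c, sources)]


toy_sources: Dict[str, Source] = {
    "icij": lambda c: ["Officer A", "Officer B"] if c == "X" else [],
    "permid": lambda c: [],
    "opencorporates": lambda c: ["Officer B", "Officer C"] if c == "X" else [],
}
print(collect_connections("X", toy_sources))
# {'Officer A': ['icij'], 'Officer B': ['icij', 'opencorporates'], 'Officer C': ['opencorporates']}
print(unknown_companies(["X", "Y"], toy_sources))
# ['Y']
```

A first-hit strategy would have returned only Officers A and B from the first source; querying all sources also surfaces Officer C and shows that Officer B is corroborated by two datasets.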

The map below shows, by local authority, the unknown companies that we were unable to match to any record using any of the three methodologies. To locate records within Greater London, we geo-referenced the Land Registry postcodes against the Office for National Statistics (ONS) postcode directory.
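
The aggregation behind such a map can be sketched as a postcode lookup followed by a count per local authority. This is a simplified illustration: it assumes a plain {postcode: local authority} mapping has already been extracted from the ONS postcode directory (its real column layout is not assumed here), and the function names, title records and lookup entries are hypothetical toy data.

```python
from collections import Counter


def normalise_postcode(postcode: str) -> str:
    """Upper-case and strip all whitespace, so 'sw1a 1aa' equals 'SW1A1AA'."""
    return "".join(postcode.split()).upper()


def unknown_titles_by_authority(titles, postcode_to_authority):
    """Count land titles owned by unmatched companies, per local authority.

    `titles`: list of (title_id, postcode, owner_matched) tuples.
    `postcode_to_authority`: {postcode: local authority name}, assumed to have
    been built beforehand from the ONS postcode directory.
    Titles whose postcode is not in the lookup are counted as 'unlocated'.
    """
    lookup = {normalise_postcode(pc): la for pc, la in postcode_to_authority.items()}
    counts = Counter()
    for _title_id, postcode, owner_matched in titles:
        if owner_matched:
            continue  # only companies with no match in any dataset are mapped
        counts[lookup.get(normalise_postcode(postcode), "unlocated")] += 1
    return counts


# Toy data: postcodes, authorities and titles are illustrative only.
lookup_table = {"SW1A 1AA": "Westminster", "EC2V 7HH": "City of London"}
titles = [("T1", "sw1a1aa", False), ("T2", "SW1A 1AA", True),
          ("T3", "EC2V 7HH", False), ("T4", "ZZ1 1ZZ", False)]
print(unknown_titles_by_authority(titles, lookup_table))
```

Normalising postcodes on both sides matters because Land Registry and lookup files may format the same postcode with different spacing or case; counting unlocated titles separately keeps failed lookups visible instead of silently dropping them.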

In the report, we show that about 4% of the land titles whose owning companies we managed to identify are connected to PEPs (Politically Exposed Persons). This still leaves almost half of the companies on which we were unable to find any information. It is also worth noting that approximately 1,000 of the land titles owned by these companies are located in high-value areas of London, notably the City of Westminster, the City of London, and Kensington and Chelsea, and fewer than 6% of these had a monetary value associated with them. This missing data makes it difficult to trace and identify suspected illicit wealth, which is likely to be significant.
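
The denominators behind these percentages can be made explicit with a small sketch. The field names and records below are hypothetical toy data, not figures from the report. The PEP share is computed only over titles whose owner was identified, so titles owned by unidentified companies say nothing about PEP links either way; this is why the 4% is best read as a lower bound on the true number.

```python
def pep_share_of_identified(titles):
    """Share of land titles linked to PEPs among titles whose owner was identified.

    `titles` is a list of dicts with keys 'owner_identified' (bool) and
    'pep_linked' (bool, or None when unknown); these field names are hypothetical.
    Titles with unidentified owners are excluded from the denominator.
    """
    identified = [t for t in titles if t["owner_identified"]]
    if not identified:
        return 0.0
    return sum(bool(t["pep_linked"]) for t in identified) / len(identified)


def priced_share(titles):
    """Share of titles with a recorded price (a 'price' key that is not None)."""
    if not titles:
        return 0.0
    return sum(t.get("price") is not None for t in titles) / len(titles)


# Toy records, not figures from the report.
toy = [
    {"owner_identified": True, "pep_linked": True, "price": 1_500_000},
    {"owner_identified": True, "pep_linked": False, "price": None},
    {"owner_identified": True, "pep_linked": False, "price": 900_000},
    {"owner_identified": True, "pep_linked": False, "price": None},
    {"owner_identified": False, "pep_linked": None, "price": None},
    {"owner_identified": False, "pep_linked": None, "price": 2_000_000},
]
unidentified = [t for t in toy if not t["owner_identified"]]
print(pep_share_of_identified(toy))  # 0.25
print(priced_share(unidentified))    # 0.5
```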

The UK has committed to introducing transparency measures in the form of a beneficial ownership register for overseas companies owning land and property in the UK. This will not reach its full potential in preventing money laundering if the data is not complete and standardised. In any analysis, a lack of unique and consistent identifiers can be just as limiting as a lack of data. Only good-quality, complete data can deliver a holistic picture of risk.

Read the full report here: http://www.transparency.org.uk/publications/london-property-tr-ti-uk/