Enfin! Back in January 2017, OpenCorporates quietly started publishing French company data in “beta” mode. We’re now formally announcing it, having completed further work that allows us to process daily updates.
It is thanks to the persistence, effort and hard work of open data advocates both inside and outside government that French companies are now available as open data. Since January, France’s Institut National de la Statistique et des Études Économiques (INSEE) has published the official “SIRENE” company registration database as open data, and after extensive analysis and processing, we added this data to OpenCorporates. This added over 10 million entities, making France our 120th jurisdiction and our second largest, just behind the UK.
The INSEE dataset is very extensive, covering companies, associations, sole traders/individuals and state bodies, plus all their trading branches, from both mainland France and its overseas departments, territories and dependencies (such as French Guiana, Martinique, Réunion, etc.). Because OpenCorporates is primarily a database of legal entities, we have excluded the local trading branches for now as out of scope (we may revisit this in the future, though on a global rather than a local level). This first cut also temporarily excludes the overseas jurisdictions while we carry out further investigation and internal mapping, but we will add them in due course.
The INSEE dataset from January initially covered only active companies, so we supplemented it with just over 500,000 companies dissolved between 2012 and 2017, taken from the open data published by Infogreffe (the grouping of the commercial court registries, which actually register companies). Combining two such datasets can cause significant data-quality issues, so we researched both extensively, including how best to combine them. Among other things, we found that where a company appears in both datasets, the INSEE data tends to be more accurate regarding company status, so for those companies it takes precedence over the Infogreffe data.
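The precedence rule described above can be sketched as a merge keyed by SIREN. This is a minimal illustration, not OpenCorporates’ actual pipeline: the `CompanyRecord` type, its fields and the `merge_sources` function are all hypothetical, and each source is assumed to yield at most one record per SIREN.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class CompanyRecord:
    # Hypothetical record shape, for illustration only.
    siren: str                       # 9-digit company identifier
    name: str
    status: str                      # e.g. "active" or "dissolved" (assumed values)
    dissolution_date: Optional[str]  # ISO date string, if dissolved
    source: str                      # "INSEE" or "Infogreffe"


def merge_sources(
    insee: Iterable[CompanyRecord],
    infogreffe: Iterable[CompanyRecord],
) -> Dict[str, CompanyRecord]:
    """Combine both sources, keyed by SIREN.

    Infogreffe records are loaded first; any INSEE record with the same
    SIREN then replaces it, so INSEE takes precedence for that company.
    """
    merged: Dict[str, CompanyRecord] = {r.siren: r for r in infogreffe}
    for record in insee:
        merged[record.siren] = record
    return merged
```

Companies found only in the Infogreffe data (in practice, the dissolved companies missing from INSEE’s active-only January file) are kept as they are. Each record also keeps its `source`, which matters because the two sources report different kinds of dissolution date.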
INSEE has since started publishing daily update files, which also cover newly dissolved entities, so going forward we no longer need the Infogreffe data for dissolved companies. However, the Infogreffe data does include other information of interest, including financials, and we will be working on adding it in the future.
The main company identifier used in France is the “SIREN”, a 9-digit number (8 digits plus a check digit). It uniquely identifies registered companies, sole traders and state bodies. INSEE issues the number when the entity is registered, and it is never reused; sole traders/individuals keep the same number for life.
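The check digit means a SIREN can be validated without looking it up. The scheme commonly documented for SIREN is the Luhn (mod 10) algorithm; the sketch below assumes that scheme, and the function names are our own rather than part of any INSEE API.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as summing the two digits of the product
        total += value
    return total % 10 == 0


def is_valid_siren(siren: str) -> bool:
    """Check the format (9 digits, spaces allowed) and the Luhn check digit."""
    compact = siren.replace(" ", "")
    return re.fullmatch(r"[0-9]{9}", compact) is not None and luhn_valid(compact)


def siren_check_digit(first_eight: str) -> int:
    """Compute the 9th (check) digit for an 8-digit SIREN prefix."""
    if re.fullmatch(r"[0-9]{8}", first_eight) is None:
        raise ValueError("expected exactly 8 digits")
    for candidate in range(10):
        if luhn_valid(first_eight + str(candidate)):
            return candidate
    raise AssertionError("unreachable: exactly one Luhn digit always exists")


print(is_valid_siren("732 829 320"))  # True
print(is_valid_siren("732829321"))    # False
print(siren_check_digit("73282932"))  # 0
```

Because the check digit sits in the undoubled rightmost position, exactly one value from 0 to 9 satisfies the check, which is why the brute-force loop in `siren_check_digit` always returns.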
These are the fields made available in OpenCorporates from this dataset:
- Company Number
- Company Name
- Registered Address – available as Street Address, Locality, Region & Post Code
- Alternative Names (trading, abbreviation)
- Branch Flag – set to “F” for Foreign branches of non-French companies
- Jurisdiction of Origin – set for Foreign branches only
- Company Type
- Incorporation Date
- Dissolution Date – see below
- Industry Codes – codes are in NAF rév. 2 (2008) format, which matches the EU NACE Rev. 2 classification down to the four-digit class level
- Restricted for Marketing – set for sole traders who have withdrawn permission for their data to be used for commercial prospecting/marketing purposes
- Number of Employees (in a given range)
- 3rd Party Identifiers – French National Register of Associations (RNA) identifier, set where supplied
- Date Retrieved
- Registry URL
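Because NAF rév. 2 extends NACE Rev. 2 by appending a letter to the four-digit class (for example `62.01Z`), the corresponding NACE class can be derived by dropping that letter. A minimal sketch, assuming codes arrive with or without the dot (`62.01Z` or `6201Z`); `naf_to_nace` is a hypothetical helper, not part of the OpenCorporates API.

```python
import re

# NAF rev. 2 subclass: 2-digit division, 2 more digits, then a letter.
NAF_CODE = re.compile(r"(\d{2})\.?(\d{2})([A-Z])")


def naf_to_nace(naf: str) -> str:
    """Map a NAF rev. 2 subclass code to its NACE Rev. 2 class code."""
    match = NAF_CODE.fullmatch(naf.strip().upper())
    if match is None:
        raise ValueError(f"not a NAF rev. 2 subclass code: {naf!r}")
    division, rest, _subclass_letter = match.groups()
    return f"{division}.{rest}"


print(naf_to_nace("62.01Z"))  # 62.01
print(naf_to_nace("4711F"))   # 47.11
```

The mapping only goes one way: several NAF subclasses can share a NACE class, so the letter cannot be recovered from a NACE code.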
Some additional notes regarding the dataset:
- Dissolution Date – in France, a company legally ceases to exist when it decides to do so via a company resolution, accompanied by a dissolution filing made to the company register. Dissolution also occurs on the sale of the business, by court order, or on the death or retirement of a sole trader. After dissolution, the company is “struck off” (radiée) the company register, which can happen several months after legal dissolution. Because the two data files differ slightly, INSEE-sourced companies show the legal dissolution date, whereas Infogreffe-sourced companies show the striking-off date. The latter can be clearly identified via the Company Source information and the Registry URL.
- Jurisdiction of Origin – unfortunately the SIRENE data incorrectly places Jersey, Guernsey and the Isle of Man in the United Kingdom, and it does not easily distinguish between Great Britain and these Crown Dependencies except via the company address, which is not provided in a consistent format. We have correctly identified the vast majority of these, but for a very small number it is not possible to derive the jurisdiction from the address, and these companies will still be tagged as “GB”.
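One way to recover the Crown Dependencies from an address is to look for their postcode areas (JE for Jersey, GY for Guernsey, IM for the Isle of Man) and fall back to place names. This is a hypothetical heuristic for illustration, not the method OpenCorporates used. Note that Guernsey’s ISO 3166-1 code is GG even though its postcodes start with GY.

```python
import re

# Postcode areas of the Crown Dependencies, mapped to ISO 3166-1 alpha-2 codes.
# Guernsey's postcodes start with "GY" but its ISO code is "GG".
POSTCODE_AREA_TO_ISO = {"JE": "JE", "GY": "GG", "IM": "IM"}
POSTCODE = re.compile(r"\b(JE|GY|IM)\d{1,2}\s*\d[A-Z]{2}\b")

# Fallback place-name matches, checked in this order.
PLACE_NAME_TO_ISO = [("ISLE OF MAN", "IM"), ("GUERNSEY", "GG"), ("JERSEY", "JE")]


def refine_gb_jurisdiction(address: str) -> str:
    """Re-tag a SIRENE 'GB' origin as JE, GG or IM when the address allows it."""
    text = address.upper()
    match = POSTCODE.search(text)
    if match:
        return POSTCODE_AREA_TO_ISO[match.group(1)]
    for place, iso in PLACE_NAME_TO_ISO:
        if re.search(rf"\b{re.escape(place)}\b", text):
            return iso
    # No usable signal in the address: keep the original tag.
    return "GB"
```

The postcode is checked first because a bare place-name match is crude (it would also fire on “New Jersey”, for instance). Addresses with neither signal fall through to “GB”, which matches the small residue of unresolved companies described above.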
This blog post marks the end of the second phase of our plans for this dataset, and we have a few more steps planned: we’ll also import legal entity data for the following French overseas departments and territories:
- French Guiana
- New Caledonia
- French Polynesia
- Saint Pierre and Miquelon
- Wallis and Futuna
Many thanks to our friends at Etalab for their support and assistance, especially for inviting us to the hackathon they organised in Paris in November 2016, which gave us very useful early insight into the open dataset, and of course to everyone in the French open data community who fought for so long to make this happen.
Photo: Félix Potin delivery carriage, circa 1900. Félix Potin is one of France’s oldest grocers, trading since 1844, and still in existence today. Public domain image.