Our latest jurisdiction to be added to OpenCorporates is Japan. Over 4.4 million companies have been added (making it our 5th largest company dataset), using the Open Data download files published by the Japanese National Tax Agency. This is our 119th jurisdiction, and it brings the total number of companies in OpenCorporates to well over 115 million!
Prior to 2015, there were no unique identifiers for companies registered in Japan, making it hard for companies operating there to validate who their suppliers or clients were. The National Tax Agency has since assigned a unique Corporate Number to each corporation falling into one of the following three categories:
- Corporations which have been registered for incorporation under provisions of the Companies Act or other laws and regulations
- National government bodies
- Local government entities, other corporations or associations that are in scope for corporation, consumption or income taxes
That last category includes around 70,000 religious shrine associations in the dataset (based on the number of records with “神社”, meaning “shrine”, in their name).
The following fields from this dataset are available in OpenCorporates:
- Company Number
- Company Name
- Registered Address – available as Street Address, Locality, Region & Post Code
- Incorporation Date – see below for notes
- Dissolution Date – where available
- Company Type – the translated type plus the original Japanese, e.g. “Stock Company (株式会社)”
- Current Status – see below for notes
- Branch flag – “F” for foreign branches of non-Japanese companies
- Home Company – set for foreign branches only
- Headquarters Address – set for foreign branches only
- Registry URL – link back to the original record
- Retrieved Date
Some additional notes regarding the dataset:
- The laws regarding company formation permit different companies to use the same name (as long as their registered addresses differ), so the data contains a significant number of apparent duplicates, the most common being “八幡神社” (Hachiman Shrine). The registered address can be used to disambiguate between similarly named companies.
- Three writing scripts are used in Japan: Kanji, Hiragana and Katakana. A Company Name can be a mixture of these, as well as the Latin alphabet and numbers. We currently show names in their original script(s) where possible. Latin characters are shown in their original “wide” (full-width) format, matching the source (e.g. “株式会社ＯＳＨＩ・ＦＵＮＤ・ＴＲＵＳＴ”), but this doesn’t affect searches that use normal-width Latin characters. There are a few techniques for transliterating Japanese into Latin characters (called “Romanization”), which we will explore for a future iteration of this dataset.
- One technical note regarding company names and our Unicode support: we use MySQL to store our data, with the “utf8” character set, which supports only Basic Multilingual Plane (BMP) Unicode characters (up to 3 bytes long in UTF-8). Some names provided by the register contain characters that are 4 bytes long, from the Supplementary Ideographic Plane, and these are not supported by MySQL’s “utf8” character set. We are therefore planning to migrate to the “utf8mb4” character set, which will let us fully support a much wider set of Unicode characters (including emoji 😃). As an interim measure, we are using the official “JIS degeneracy map (Ver.1.0.0)” (zipped .xlsx) and guidance published by the National Tax Agency to map 4-byte characters to known 3-byte variants wherever possible. A few company names contain characters with no available variant, and for these we are using “_” as a temporary replacement character.
- The incorporation dates used in OpenCorporates are the dates the Corporate Numbers were assigned by the National Tax Agency. Because the majority of numbers were assigned to existing companies between 5th and 9th October 2015, it’s reasonable to assume that numbers assigned after that window belong to new companies, so we only keep the assignment date as the incorporation date for numbers assigned on or after 10th October 2015.
- The vast majority of companies in the dataset are Active, because Corporate Numbers weren’t retroactively assigned to inactive or dissolved companies.
- The current status provided in the source data reflects the last change made to the company record (e.g. “Name has changed”), not its overall status. We derive the overall status of “Active” or “Inactive” from these values.
- As the National Tax Agency publishes daily update files, we will keep this dataset fully up to date with a daily refresh.
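The note above on duplicate names suggests treating the name plus the registered address, rather than the name alone, as the practical way to tell companies apart when browsing. The sketch below is a minimal illustration, assuming records are plain `(company_number, name, address)` tuples; the function names are hypothetical, and the Corporate Number itself remains the actual unique identifier.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: (company_number, name, registered_address).
Record = Tuple[str, str, str]


def names_with_multiple_addresses(records: Iterable[Record]) -> Dict[str, Set[str]]:
    """Return each name that is shared by companies at more than one address."""
    addresses_by_name: Dict[str, Set[str]] = defaultdict(set)
    for _number, name, address in records:
        addresses_by_name[name].add(address)
    return {name: addrs for name, addrs in addresses_by_name.items() if len(addrs) > 1}


def find_company(records: Iterable[Record], name: str, address: str) -> List[str]:
    """Return the company numbers whose name AND registered address both match."""
    return [number for number, n, a in records if n == name and a == address]
```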
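The note above on “wide” Latin characters can be illustrated with Unicode NFKC normalization, which folds full-width Latin letters and digits to their ordinary forms (and also maps half-width Katakana to full-width). This is a minimal sketch, assuming a search that normalizes both stored names and queries this way; it is not a description of how OpenCorporates’ search actually works, and `normalize_for_search` and `name_matches` are hypothetical helpers.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Hypothetical helper: fold full-width Latin letters/digits to their
    normal-width forms and lowercase, so "ＯＳＨＩ" and "oshi" compare equal."""
    # NFKC applies compatibility mappings, e.g.
    # U+FF2F FULLWIDTH LATIN CAPITAL LETTER O -> "O".
    return unicodedata.normalize("NFKC", text).casefold()


def name_matches(stored_name: str, query: str) -> bool:
    """True if the normalized query occurs in the normalized stored name."""
    return normalize_for_search(query) in normalize_for_search(stored_name)


if __name__ == "__main__":
    stored = "株式会社ＯＳＨＩ・ＦＵＮＤ・ＴＲＵＳＴ"
    print(normalize_for_search(stored))
    print(name_matches(stored, "oshi"))  # True
```

Storing the original full-width name and normalizing only for comparison keeps the displayed data identical to the source.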
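The interim workaround for 4-byte characters described in the notes above can be sketched as follows. A character takes 4 bytes in UTF-8 exactly when its code point is above U+FFFF, i.e. outside the BMP. `VARIANT_MAP` is a hypothetical stand-in for the mappings derived from the JIS degeneracy map; its single entry (𠮷 to 吉) is a well-known variant pair used purely for illustration and is not copied from that file.

```python
from typing import Dict

# Hypothetical variant table standing in for mappings derived from the
# JIS degeneracy map; the single entry is illustrative only.
VARIANT_MAP: Dict[str, str] = {
    "\U00020BB7": "\u5409",  # 𠮷 (U+20BB7) -> 吉 (U+5409)
}

REPLACEMENT = "_"  # temporary placeholder for characters with no 3-byte variant


def needs_four_bytes(ch: str) -> bool:
    """A code point above U+FFFF lies outside the BMP: 4 bytes in UTF-8."""
    return ord(ch) > 0xFFFF


def to_mysql_utf8(name: str) -> str:
    """Map each non-BMP character to a known BMP variant, else to REPLACEMENT."""
    out = []
    for ch in name:
        if needs_four_bytes(ch):
            out.append(VARIANT_MAP.get(ch, REPLACEMENT))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for name in ["株式会社\U00020BB7田", "\U00029E3D商店"]:
        fixed = to_mysql_utf8(name)
        # Every remaining character now encodes to at most 3 UTF-8 bytes.
        assert all(len(c.encode("utf-8")) <= 3 for c in fixed)
        print(fixed)
```

Once the migration to “utf8mb4” is complete, a step like this is no longer needed and the original characters can be stored as-is.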
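The incorporation-date rule in the notes above amounts to a simple cut-off. A minimal sketch, assuming the assignment date has already been parsed into a `datetime.date`; the constant and function names are illustrative rather than OpenCorporates’ code. Numbers assigned during the initial bulk assignment get no incorporation date at all, rather than a misleading one from October 2015.

```python
from datetime import date
from typing import Optional

# Numbers assigned before this date (the bulk assignment of 5th-9th October
# 2015) are treated as belonging to already-existing companies.
FIRST_NEW_COMPANY_DATE = date(2015, 10, 10)


def incorporation_date(assignment_date: date) -> Optional[date]:
    """Use the Corporate Number assignment date as the incorporation date only
    when it is on or after 10th October 2015; otherwise leave it unset."""
    if assignment_date >= FIRST_NEW_COMPANY_DATE:
        return assignment_date
    return None
```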
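The status note above describes collapsing each record’s most recent change into an overall “Active” or “Inactive” status. The sketch below shows the shape of that mapping only. The labels in `INACTIVE_CHANGES` are hypothetical English stand-ins, since the source data uses the National Tax Agency’s own codes and wording, and the real list of status-ending changes may differ.

```python
# Hypothetical English labels for "last change" values that end a company's
# active life; the real source field uses the National Tax Agency's own codes.
INACTIVE_CHANGES = frozenset({
    "Registration record closed",
    "Merged into another company",
    "Deleted",
})


def overall_status(last_change: str) -> str:
    """Collapse the record's most recent change into an overall status."""
    return "Inactive" if last_change in INACTIVE_CHANGES else "Active"


if __name__ == "__main__":
    print(overall_status("Name has changed"))            # Active
    print(overall_status("Registration record closed"))  # Inactive
```

Defaulting to “Active” for any change not in the set matches the observation that nearly all records in the dataset are Active.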
Finally, we’d like to send our friends at Open Knowledge Japan a massive “どうもありがとうございます” (“thank you very much”): they provided very useful help with translation and helped us get to grips with the corporate registration landscape.
Illustration: “Nihonbashi bridge in Edo” (1830) by Katsushika Hokusai, in the public domain.