BigGorilla
BigGorilla is a set of open-source components for data integration and preparation, started in 2016 as a joint project between Recruit and the University of Wisconsin-Madison. It documents both existing technologies and our own original technologies for solving data integration and preparation problems. I built a couple of BigGorilla's components and evangelized them. I also applied these technologies at 8 companies within Recruit and showed that BigGorilla is effective across the company's diverse range of businesses, for tasks such as extracting store names (or person names and location information) from unstructured data and merging lists from multiple data sources. For example, with BigGorilla we obtained 98.9% accuracy on the task of de-duplicating approximately 10,000 store names (see the press release from that time).