This poster presents a technical report and a method for corpus expansion in the humanities, applied to early modern philosophy, together with a case study in handling heavy data redundancy across several Latin, English, and French title corpora. It elaborates on the steps taken during the initial stages of a data-intensive research project that aims to look beyond established writers and views in natural philosophy between 1600 and 1800. It also reflects on the collaboration between a humanist and a data scientist in web scraping and in taming redundant multilingual data in Python.