My current postdoc project consists of building a large database with (meta)data for all History PhD and MA dissertations defended in Brazil from 1942 to 2000. It has been a while since the last country-wide catalog was published, and the data from the Ministry of Education are inconsistent (to say the least) for entries from before the 2000s. As a historian of historiography, I have wasted a lot of time cross-checking information about specific dissertations because of this inconsistency, and I know many colleagues who have had to do the same for their own work. So I decided to bring everything together and build this tool. It should gather the important library metadata for all items (following Dublin Core specifications, except for the abstract, because of time and budget constraints) and also implement some features specific to the target audience: other historians of historiography.
There are two country-wide catalogs in print. One was published in 1985 and covers 1973 to 1984; the other was published in 1995 and covers 1984 to 1994. The information they collected is inconsistent, though, and many entries are missing. Many Graduate Programs have published their own catalogs too, some in print, some on their websites. But since those are filled in by different people, and people have this habit of (1) not following input standards and (2) not caring about incomplete data, these catalogs vary greatly in quality. I have managed to acquire most of the print catalogs and have saved local copies of all the online ones.
After getting all the available catalogs, I started entering the information into an Excel spreadsheet, just to check for inconsistencies (different spellings, dates, etc.) and missing data. Right now I have over 3000 entries. Since then I have tried to normalize the names of individuals by developing an authority control (some individuals are harder to find than others), and I am now visiting the university libraries that hold the texts and checking all the information against the works themselves. A major difficulty, however, is that recording the thesis committee and the defense date was not mandatory until recently. So I have some detective work ahead of me for the older works.
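To give an idea of what checking for different spellings can look like in practice, here is a minimal Python sketch. It is not the workflow I actually used in the spreadsheet, just one possible approach: it reduces each name to a comparison key (no accents, no capitals, no punctuation) and groups the spellings that share a key. The function names `name_key` and `group_variants` and the sample names are made up for illustration.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Reduce a personal name to a comparison key (hypothetical helper)."""
    # Split accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, lowercase, and collapse repeated whitespace.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(cleaned.lower().split())


def group_variants(names):
    """Return only the keys that appear under more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Invented sample entries, not taken from the real catalogs.
sample_names = [
    "João Pereira dos Santos",
    "Joao Pereira dos Santos",
    "JOÃO PEREIRA DOS SANTOS.",
    "Ana Lima",
]

for key, variants in group_variants(sample_names).items():
    print(key, "->", variants)
```

A key like this only catches mechanical differences (accents, capitals, stray punctuation). Abbreviated first names, married names, or two different people with the same name still need a human decision, which is where the authority control comes in.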
My next post will be about the technological aspect of the project: the website, the database, and the analytics.