A Catalog of History Dissertations, part 2

Part of the detail view for dissertations.

Data on history dissertations from the old days is tricky to deal with. The existing analog catalogs are not always complete and often contain incorrect information. And while university libraries have incorporated these dissertations into their online catalogs, each library has done so with a different degree of precision, depending on many variables. After considering some alternatives, I decided to check names and titles manually whenever possible (I will write about this in a later post). This brings us to the problem of how to store all this data and make it available.

My first option was to use some existing software for digital cataloging or content management. After browsing around, I found some applications that could do the job reasonably well, even if some customizing would always be necessary. I thought Omeka was a bit of overkill, since I was not planning on digitizing the full content of the dissertations; and even if I were, I would then have to deal with copyright issues (as far as I know, Brazil has no “fair use” doctrine, and even if it had one, I don’t believe it would apply to my database). Other library catalog and institutional repository applications, such as DSpace, would be cumbersome to customize to include the extra information I wanted in the database.

Then I considered building a system myself, containing just what I needed for the project and no extra bells and whistles. Since I had some knowledge of Python, I followed a friend’s suggestion and started learning Django, a robust web framework that makes developing and deploying web applications quick and relatively easy. The basics were pretty straightforward: the learning curve is gentle and the community is very responsive, so most questions can be answered with a quick Google or Stack Overflow search. For the database itself, I went with MySQL, which integrates well with Django. PostgreSQL remains an option for deployment, since it offers some extra features that might be useful in the future.
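To make the storage question a bit more concrete, here is a minimal sketch of what a catalog schema with a tagging system might look like. It is an illustration under assumptions, not the project’s actual design: the table and column names (person, dissertation, tag, dissertation_tag) are hypothetical, and although the post uses Django with MySQL, Python’s built-in sqlite3 module stands in here only so the sketch runs without a database server.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for the MySQL database used in the post.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE dissertation (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES person(id),
    university TEXT NOT NULL,
    year INTEGER
);
CREATE TABLE tag (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
-- Link table: a dissertation can carry many tags, and a tag many dissertations.
CREATE TABLE dissertation_tag (
    dissertation_id INTEGER NOT NULL REFERENCES dissertation(id),
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (dissertation_id, tag_id)
);
"""


def make_catalog() -> sqlite3.Connection:
    """Create an empty in-memory catalog with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def _get_or_create(conn, table, column, value):
    # Insert the value only if it is new, then look up its id either way.
    # table/column are fixed names from this module, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    (row_id,) = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    return row_id


def add_dissertation(conn, title, author, university, year, tags=()):
    """Store one dissertation, reusing existing person and tag rows."""
    author_id = _get_or_create(conn, "person", "name", author)
    cur = conn.execute(
        "INSERT INTO dissertation (title, author_id, university, year) "
        "VALUES (?, ?, ?, ?)",
        (title, author_id, university, year),
    )
    diss_id = cur.lastrowid
    for label in tags:
        tag_id = _get_or_create(conn, "tag", "label", label)
        conn.execute(
            "INSERT OR IGNORE INTO dissertation_tag (dissertation_id, tag_id) "
            "VALUES (?, ?)",
            (diss_id, tag_id),
        )
    conn.commit()
    return diss_id


def titles_with_tag(conn, label):
    """Return the titles of all dissertations carrying the given tag, sorted."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM dissertation AS d
        JOIN dissertation_tag AS dt ON dt.dissertation_id = d.id
        JOIN tag AS t ON t.id = dt.tag_id
        WHERE t.label = ?
        ORDER BY d.title
        """,
        (label,),
    ).fetchall()
    return [title for (title,) in rows]
```

In Django the link table would normally not be written by hand: declaring a ManyToManyField on the model makes Django create an equivalent intermediate table. Keeping people in their own table means each author’s name is stored once, which starts to matter as soon as names have to be normalized.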

There is a third option that I have not entirely discarded, although it will stay in the “might do if I have time” pile: using Node.js, specifically to separate the back end from the front end of the digital catalog. Over the last few months I have learned the basics of RESTful APIs using JavaScript, but I had no previous experience with JS, so that extra learning has to be taken into account.
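Separating back end and front end usually means the back end only answers HTTP requests with data (typically JSON), and the front end fetches and renders it. The post considers doing this with Node.js; to keep a single language in these examples, the sketch below uses a bare Python WSGI callable instead. The route /api/dissertations/&lt;id&gt;, the field names, and the in-memory record are all hypothetical placeholders.

```python
import json
import re

# Hypothetical in-memory data standing in for the catalog database.
DISSERTATIONS = {
    1: {"id": 1, "title": "Title A", "author": "Author One", "year": 1975},
}

# Hypothetical route: /api/dissertations/<id>, optional trailing slash.
ROUTE = re.compile(r"^/api/dissertations/(\d+)/?$")


def app(environ, start_response):
    """Minimal WSGI app: GET /api/dissertations/<id> returns one record as JSON."""
    headers = [("Content-Type", "application/json")]
    match = ROUTE.match(environ.get("PATH_INFO", ""))
    if match is None:
        status, body = "404 Not Found", {"error": "not found"}
    elif environ.get("REQUEST_METHOD") != "GET":
        # Read-only API: anything but GET is rejected, advertising what is allowed.
        status, body = "405 Method Not Allowed", {"error": "method not allowed"}
        headers.append(("Allow", "GET"))
    else:
        record = DISSERTATIONS.get(int(match.group(1)))
        if record is None:
            status, body = "404 Not Found", {"error": "not found"}
        else:
            status, body = "200 OK", record
    payload = json.dumps(body).encode("utf-8")
    headers.append(("Content-Length", str(len(payload))))
    start_response(status, headers)
    return [payload]
```

For local testing, wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() would serve it. Within the existing Django project, the same endpoint could be a view returning django.http.JsonResponse, so this kind of separation does not strictly require switching to Node.js.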

All in all, programming is definitely not the hardest part of the project. That may be partly because I am a historian who had some programming experience in his teens. Either way, decisions about which metadata to include, how to normalize names and titles, and how to structure the tagging system weigh far more heavily on the catalog’s chances of success.
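As an illustration of what name normalization can involve, here is a sketch of a comparison key that treats “Silva, José da” and “JOSÉ DA SILVA” as the same person. The rules (reordering a single “Surname, Given names” comma, dropping accents, ignoring case and punctuation) are assumptions for this example, not the catalog’s actual policy, and they would not resolve every case that checking names by hand would catch.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    # NFKD splits accented letters into base letter + combining mark
    # (e.g. "é" -> "e" + U+0301); the combining marks are then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_key(name: str) -> str:
    """Build a comparison key for a personal name (hypothetical rules).

    - "Surname, Given names" is reordered to "Given names Surname"
    - accents and case differences are ignored
    - punctuation and repeated whitespace are collapsed
    """
    if name.count(",") == 1:
        surname, given = (part.strip() for part in name.split(","))
        name = f"{given} {surname}"
    name = strip_accents(name).casefold()
    name = re.sub(r"[^\w\s]", " ", name)  # turn punctuation into spaces
    return " ".join(name.split())
```

A key like this is meant for spotting likely duplicates, not for display: the name as it appears in the source would still be stored alongside it.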