Digital humanities sits at the interface between humanistic inquiry and computational methods. When building tools in this space, we must therefore consider, first, the humanities scholar, who wants an accessible and flexible data analysis tool, and second, the developer, who wants to build that tool efficiently. Broadly, this project redefines "flexibility" and "efficiency" for digital humanities tools: "flexibility" as the ease of adapting code through a direct code-to-feature correspondence, and "efficiency" as the ease of deployment through an intentional choice of development and deployment platform. Concretely, we (1) create a toolkit that uses statistical methods (word density, lexical dispersion plots, etc.) and machine learning (Word2Vec, Latent Dirichlet Allocation) to distill historical trends and anomalies from Chronicling America, a dataset of American newspapers ranging from the 1700s to the 1900s, and (2) identify Google Colab as an "enabler platform" that supports both the flexibility and the efficiency defined above during tool development. By redefining "flexibility" and "efficiency" and producing our own data analysis tool for Chronicling America, our work offers a new lens for evaluating digital humanities tools and lets historians surface historical anomalies and trends that teach us about ourselves and our possible futures.
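To make the abstract's machine learning components concrete, the sketch below fits a small Latent Dirichlet Allocation topic model and a Word2Vec embedding. It is a minimal illustration, not the paper's implementation: gensim as the backend, the toy token lists, and hyperparameters such as `num_topics` and `vector_size` are all assumptions, since the abstract does not specify the toolkit's stack.

```python
# A minimal sketch of the abstract's ML methods (LDA and Word2Vec),
# assuming gensim as the backend; the paper does not specify its stack.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

# Toy token lists standing in for tokenized OCR text from newspaper pages.
docs = [
    "gold rush california miners claim territory".split(),
    "railroad locomotive steam engine travel west".split(),
    "election president congress senate vote ballot".split(),
]

# --- Latent Dirichlet Allocation: surface recurring topics ---
dictionary = Dictionary(docs)                       # token -> integer id
corpus = [dictionary.doc2bow(doc) for doc in docs]  # bag-of-words per document
lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=2, passes=10, random_state=0)  # hypothetical settings
for topic_id, terms in lda.print_topics(num_words=4):
    print(f"topic {topic_id}: {terms}")

# --- Word2Vec: embed words to compare usage across the corpus ---
w2v = Word2Vec(sentences=docs, vector_size=50, window=3,
               min_count=1, epochs=50, seed=0)      # hypothetical settings
print(w2v.wv.most_similar("railroad", topn=3))
```

In the toolkit itself, `docs` would presumably come from tokenizing Chronicling America's OCR text, with the fitted models feeding the trend and anomaly analyses described above.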