Finding Trends and Deforming Texts

In digital humanities, one of the foundational texts is Franco Moretti’s Distant Reading, which was among the first works to demonstrate the possibilities of text mining large corpora.

While the concept of distant reading is increasingly complicated and debated, Moretti’s initial argument that transforming texts can yield new insights remains fundamental to digital humanities. Part of this debate is whether humanists should be able to write their own statistical models or whether we can simply rely on methods developed for other disciplines. In the most recent Debates in the Digital Humanities, Ben Schmidt argues that while most humanists will not be able to write algorithms or statistical models, they should understand how these applications transform their texts. This assertion sounds straightforward, but testing the parameters of a statistical model can be incredibly laborious, involving multiple Excel sheets and datasets.

Before jumping into these more complex models, a great way to start finding patterns and transforming texts is Voyant Tools, a free, web-based text mining and visualization tool. I used its scatterplot tool to examine how terms cluster within two boxes of embassy records, and to explore the contextual meanings of words such as Congo and Bandung.
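To make Schmidt's point concrete, it helps to see roughly what a scatterplot of terms does to a text. Voyant's scatterplot offers dimension-reduction methods such as principal component analysis (PCA) among its options; the sketch below is a minimal, hypothetical illustration of that general technique using NumPy, not Voyant's actual implementation. It assumes NumPy is installed, the function names are my own, and the three toy "documents" are invented placeholders rather than text from the archive. Each term becomes a row of relative frequencies across documents, and PCA projects those rows onto two axes so that terms with similar distributions land near each other.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs, vocab):
    """Rows are terms, columns are documents; cells are relative frequencies."""
    columns = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        total = sum(counts.values()) or 1  # avoid dividing by zero on empty docs
        columns.append([counts[term] / total for term in vocab])
    return np.array(columns).T  # transpose so each row is one term


def pca_2d(matrix):
    """Project each row of `matrix` onto its first two principal components."""
    centered = matrix - matrix.mean(axis=0)
    # The rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Hypothetical toy "documents" standing in for segments of an archival box.
docs = [
    "cable on the congo crisis and the united nations",
    "memo on the bandung conference and nonaligned states",
    "report on congo troops and united nations forces",
]
vocab = ["congo", "bandung", "nations", "conference"]

coords = pca_2d(term_document_matrix(docs, vocab))
for term, (x, y) in zip(vocab, coords):
    print(f"{term:>12}: ({x:+.3f}, {y:+.3f})")
```

In this toy case "congo" and "nations" appear in exactly the same proportions in the same documents, so they receive identical coordinates, while "bandung" and "conference" cluster on the opposite side. Changing the vocabulary, the segmentation into documents, or the use of relative rather than raw counts would move the points, which is exactly the kind of transformation worth understanding before interpreting a cluster.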

Below are the scatterplots; you can also try out some of Voyant's other features with my datasets.

Record Group 84, Egypt US Embassy Records 1959-1961 Unclassified Box 9

Record Group 84, Egypt US Embassy Records 1959-1961 Unclassified Box 10