Big Data and the Cold War

Big Data and the Cold War are two things that seem like they should go together. Both are big, unwieldy entities. And both seem to drive scholars a bit batty. But with the exception of a few digital history projects, I haven't really ever come across the two together.[^1]

This all changed this past month, when big data and Cold War history did sort of come together for me, as I was able to attend two separate conferences, one on each topic. Even though the two conferences were quite different in format and subject, I wanted to write and think about them together - kind of a forced engagement - even if it's just in my head, and now on my website. I've written down some of the major questions/problems/shifts that were discussed at these conferences, but what follows is in no way a faithful representation of them (sorry, you'll have to try and attend them next year). Rather, think of this post as my conference doodles come to life - incoherent and shoddily sketched, but hopefully the beginning of something much more.

I still wanted to write a bit about each conference, so I've created little subheadings. Feel free to jump ahead if you just want my ideas on the intersections between big data and Cold War history.

Jump to Making Big Data Human | European Summer School on Cold War History | Big Data and Cold War history


Making Big Data Human

The first conference was one I found on Twitter, called Making Big Data Human.

[Image: view from the conference]

Luckily, someone canceled last minute, so I was able to attend as a participant and head to Cambridge for the first time. The conference was hosted by a great group called Doing History in Public. It lasted a day and covered a whole host of topics - from web archiving to database and keyword searches to human geography and visualization. Marta Musso storified the Twitter exchange here, which gives a good live feed of the conference, as well as some of my thoughts that I won't rehash in this post.

So here are the (my) big thoughts from the conference:

1. How do we deal with inconsistency and uncertainty in big data?

In the opening keynote, Jane Winters talked about the problems inherent in the patchiness of data collection, especially with respect to web archives. How do we know what's missing in a massive data set, or the degree of accuracy in the data collection? Compounding this difficulty is the reliance of big data on keyword searching, algorithms, and curated web archives, which can be problematic when you consider how these tools may fail to show the gaps in data and flatten complexities through the presentation of materials (think a list of Google search hits versus searching a physical archive). I don't think this problem is actually an obstacle, but rather an opportunity to reconsider how we think and talk about our data in the humanities, allowing for some space between object and interpretation.

We also had a great discussion about the limitations of databases and queries. I think these discussions are signs of a growing scholarly critique of how data has been and continues to be collected and organized. Yet we still have a long way to go in incorporating the reality of this patchiness into our scholarly research and interpretations. For example, we all know archives are abstractions, but more often than not we hide many of the limitations of our studies in the footnotes. This approach is less productive in the context of big data, which requires fairly explicit discussion of a dataset's specific parameters. Hopefully, the broader scope of evidence available through "big data" can help historians become a bit more honest about their data and analyses.

2. How do we identify and work with institutional structures and overcome institutional obstacles to big data in the humanities?

A secondary and largely connected theme was that big data projects require collaborations across disciplines, career stages, and institutions. The conference did a great job of actually bringing together scholars from across the sciences and the humanities. But it can be difficult to find collaborators in other departments. Furthermore, collaborating raises questions about the intellectual ownership of these digital projects, particularly for humanists who cannot program. In the UK and the EU, more so than in the US, there's a tendency to fund large-scale big data initiatives, which is great for the scope of the projects but poses challenges regarding the division of labor between a single Principal Investigator and many postdocs and students. How this will be resolved remains to be seen, but clearly some of the best collaborations are happening on campuses where some type of infrastructure exists to encourage and facilitate cross-disciplinary work.

3. How do we define big data?

Lastly, we often returned to the theme of definitions and nomenclature in big data. How do we define big data? Is it merely a large data set? Or is it a set of tools? Or, even more profoundly, a new methodological and intellectual approach for the humanities? At the end of the conference, we still hadn't come to one definitive answer, and like most things in the humanities, I doubt we ever will (which I actually think is a good thing).

Many of the speakers and participants almost immediately started slipping between big data and digital history, which for me raised questions about the overlap between the two. There was also a fairly strong critique over whether big data is anything new or simply the latest iteration of the computational approaches first pioneered in the 1960s and 70s. Personally, I believe that changes in web technology and in the ability to store and manipulate data have fundamentally altered the potential of big data and digital history, but placing this shift within the context of the early computing revolution is important so that we don't exaggerate our current potential.

Connected to this question of definition is a clear need for some way to evaluate the quality of big data and digital history projects. With the AHA releasing its guidelines this summer, the profession is clearly moving in that direction. However, at the conference, many of the critiques revolved around the failure to apply the same scholarly scrutiny to big data that is applied to more traditional scholarship. Bridging this gap is going to be tough, but departments are increasingly encouraging students and faculty to experiment with digital projects, which will hopefully lead to a more widespread engagement with big data across the humanities.


European Summer School on Cold War History

With these questions in mind, I jetted off to Rome for the European Summer School on Cold War History. Organized by a consortium of European universities, this year's edition was hosted by Università Roma Tre and Università Roma Tor Vergata. At the conference, I presented a paper on the impact of the Congo Crisis on Cairo, which I'll hopefully blog about at a later date. I unfortunately didn't tweet at all during the conference (I'm not sure anyone did, actually), so I don't have an overview of all the papers or discussions.

The conference itself was held at the Società Geografica Italiana, which occupies a gorgeous old villa in Rome.

[Images: views from the conference]

Given that the Cold War spanned a huge swath of time, the papers at the conference covered a wide variety of topics. I won’t cover each one here, but just touch on some broader questions/topics.

1. How do we define the Cold War?

Quite a bit of ink has been spilled on this question, but I was struck by how little we actually discussed the Cold War as an entity, given that the conference was devoted to the topic (which I actually think is a positive development - better than getting stuck in circular arguments). I think what the Cold War means really depends on your particular research question. For my work, the Cold War is both a temporal marker and a force that limits the space for Third World solidarities. For others, the Cold War is a historical object that was constructed, or a historical actor that shaped cultural identities. I think this multiplicity is a sign of the richness of the scholarship, but the conference did make me think more deeply about what the Cold War means for my work, and how a Cold War lens can illuminate different dynamics.

2. Rise of spatial and intellectual history lenses

I was also quite pleased to see a number of scholars using spatial and intellectual history lenses, though often implicitly. I think that with the plethora of sources available to modern historians, we sometimes get a little lazy with respect to the theoretical and analytical framing of our research. Yet at the conference many of the papers engaged with these analytics, with a focus on spaces like ports and islands, as well as an emphasis on tracing intellectual histories of expertise and discourses. I'm always struck by the "emergent-ness" of scholarship around a particular lens. Did everyone just hear about these topics in their graduate seminars, or are we all truly products of our times? I suppose the answer is probably a bit of both. I hope this trend continues, partly for selfish reasons, because it's the type of work I find most interesting, but also because I think it will open up fruitful new subfields of scholarship.

3. Pushback on transnational as a framework

Although the term "transnational" came up frequently at the conference, on the whole most of the papers actually focused on local histories within internationalized dynamics rather than on transnational movements. While some stories are inherently transnational (those of international organizations, for example), I think qualifying and limiting the use of "transnational" is important. Personally, I often feel that transnational histories tend to skew towards a neoliberal imagining of movement, one that ignores power dynamics and states, as well as the instrumentality of international solidarities/events for local spaces. In simpler terms, very few topics are truly transnational; I think most historians are actually working on local spaces that become internationalized, even if they're studying social movements or the movement of goods. So I was quite pleased that even at a conference on something as profoundly global as the Cold War, the term was not thrown around as loosely as I've seen in the past.


Big Data and Cold War history

So now, back to the question of how these two incredibly broad topics relate to one another.

The most obvious overlap is the corpus of materials from the Cold War, and how big data projects might help us understand these archives. This idea is increasingly becoming a reality as researchers use digital cameras to amass large personal collections of photographed documents. However, unlike historians of the 19th century, Cold War historians still have to contend with copyright laws. In my research, this reality produces a counterintuitive emphasis on declassified government documents over press materials, which remain under copyright.

Big data methods also open up the opportunity for truly global histories of the Cold War, drawing on a scale of materials beyond the capacity of any one researcher to synthesize. Many of the papers at the ESSCWH used multinational and multilingual sources. However, many OCR and text mining applications privilege Romance languages, which at least for now limits, and could potentially skew, these types of projects. I'm still struggling with OCR on Arabic script, which makes it difficult to see large-scale patterns in the Arabic daily newspapers I work with.
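To make that struggle concrete, here's a minimal sketch of the kind of OCR step I mean, using the open-source Tesseract engine via pytesseract. This is just an illustration, not my actual pipeline: it assumes the tesseract binary and its Arabic language pack ("ara") are installed, along with the pytesseract and Pillow Python packages, and the file name below is hypothetical.

```python
# Minimal sketch: OCR a scanned Arabic newspaper page with Tesseract.
# Assumes tesseract plus its Arabic traineddata ("ara") are installed,
# along with the pytesseract and Pillow packages.

from PIL import Image
import pytesseract

# Hypothetical scan of a daily newspaper page.
page = Image.open("arabic_daily_1961_p1.png")

# Tesseract handles the right-to-left script, but dense multi-column
# newspaper layouts and older typefaces often produce noisy output -
# exactly the limitation discussed above.
text = pytesseract.image_to_string(page, lang="ara")

# Inspect the first few hundred characters to gauge the quality.
print(text[:500])
```

Even when a language pack exists, the error rates on material like this are high enough that any large-scale pattern you extract has to be read with the same skepticism as a patchy archive.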

These limits aside, big data provides incredible opportunities to visualize narratives in new ways, as well as to trace patterns over time and space. Ultimately, I think big data will help historians of the Cold War be more honest about the gaps in our data and our archives - an important and necessary shift for the profession.

I also think that thinking about the Cold War can make us more critical about big data. First and foremost, much of higher education was shaped during the Cold War, especially its disciplinary and departmental divisions. Historicizing these divisions is critical for opening up spaces for collaboration. At the moment, most historians using big data build their corpora from previously digitized or born-digital materials. However, as OCR and cloud services improve, big data drawn from text sources will increasingly become a reality. Most humanities scholars, and especially historians, are not equipped with the skills to work with these resources, so finding collaborators in the social and computer sciences is going to become critical.

Thinking about these questions together has, for me at least, started the wheels turning about the obstacles of copyright, collaboration, and data collection that I think have kept Cold War historians from adopting more of the digital history/big data methods employed by historians of other time periods. I hope that in the future, digital historians will more clearly illustrate the overlap between these two topics. Until then, I guess I'll have to keep thinking about them, and I hope you do too.

-Z


Jump back to top

[^1]: When I say a few, I'm really thinking of two in particular: Micki Kaufman's Quantifying Kissinger and Matthew Connelly's History Lab. If you know of any others, please let me know!



