Data Curation and Collection


  • In this section, I would describe how I collected and curated the data for this project, from web scraping the listserv archives to cleaning and transforming the data. In particular, I would detail how I am meeting this requirement of the project: "some form of hand curated or computationally derived data". In the case of the Humanist listserv, I would likely detail how I used named entity recognition and manual data cleaning to extract technologies and topics from the listserv archives.
  • I would try to include the following figures and details:
  • Show a sample of the website structure to contextualize and explain how I scraped the website, as well as detailing some of the code for scraping.
  • Show a sample of the initial scraped dataset and detail choices on how to structure the data, including any choices I eventually revised.
  • Introduce named entity recognition and how I used it to extract technologies from the listserv archives. Show some of the pipeline and also any validation steps or revisions I had to make.
  • Show a sample of the final dataset and how I structured it for analysis, and detail the final documentation choices and discuss how the envisioned audiences who might use these datasets informed those choices.
  • In this section, I would also include, if relevant, a brief discussion of how others have collected and curated data for similar projects. For example, I might include the following projects/examples:
  • Scholars who have explored Usenet archives, which are early internet discussion forums, and how they have used computational methods to understand these archives.
  • Projects that have used named entity recognition to extract topics from large text corpora, and how they have structured and cleaned these datasets for analysis.
  • I would also include any relevant citations to, or quotations from, readings from the course:
  • For example, I might include some references to Data Feminism, because it influenced my approach to organizing this data.
  • Finally, I would also include any reflections on how my thinking about data collection and curation changed over the course of the project.
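
To make the scraping and extraction steps described above more concrete, here is a minimal sketch of what such a pipeline might look like. It is only an illustration under stated assumptions: the sample HTML, the `msgNNNN.html` link pattern, the term list, and the names `MessageLinkParser`, `parse_index`, and `extract_technologies` are all hypothetical and not taken from the actual archive or project code. It also uses simple dictionary (gazetteer) matching as a stand-in for a statistical named entity recognition model, so it shows the shape of the step rather than the method itself.

```python
import re
from html.parser import HTMLParser

# Hypothetical archive index page; the real archive markup may differ.
SAMPLE_INDEX = """
<ul>
  <li><a href="msg0001.html">Example subject A</a></li>
  <li><a href="msg0002.html">Example subject B</a></li>
  <li><a href="about.html">About this archive</a></li>
</ul>
"""


class MessageLinkParser(HTMLParser):
    """Collect (href, link text) pairs for links that look like messages."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumed naming pattern for message pages (hypothetical).
            if re.match(r"msg\d+\.html$", self._href):
                self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_index(html):
    """Return the message links found in one index page."""
    parser = MessageLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical hand-curated term list; a real one would come from manual review.
TECH_TERMS = ["TEI", "SGML", "Python"]
TECH_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, TECH_TERMS)) + r")\b")


def extract_technologies(text):
    """Return the sorted, de-duplicated listed terms found in text (case-sensitive)."""
    return sorted(set(TECH_PATTERN.findall(text)))
```

Calling `parse_index(SAMPLE_INDEX)` returns only the two message links and skips the "about" page, and `extract_technologies` can then be applied to each message body to build one row per message. Keeping the term list as plain data would make it easy to document and revise the hand-curated part of the dataset separately from the code.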