NyeJones/henslow-topic-modelling-network-analysis

Tracing Ideas Across Networks: Topic Modelling and Social Network Analysis with the Letters of John Stevens Henslow

This series of Jupyter Notebooks contains the code for a project, funded by Cambridge Digital Humanities, that explores the use of digital humanities (DH) methods on the letters transcribed by the John Stevens Henslow Correspondence Project. So far 1,163 transcriptions have been completed, providing new insights into the life and ideas of this key figure in the development of botany, geology and the natural sciences in general. In addition to his own contributions to nineteenth-century science, including the creation of the Cambridge University Botanic Garden at its present site, Henslow was pivotal in the development of Charles Darwin, and it was on his recommendation that Darwin was invited to join the voyage of HMS Beagle.

The project was conceived to use the transcribed letters of Henslow as a dataset for experimenting with different types of topic modelling, and then to use the resulting topics as the basis for Social Network Analysis. A key aim was to work collaboratively with a non-technical academic researcher and to make all outputs available to them in an accessible format. To this end, we drew on the expertise of Professor John Parker, the head of the Henslow Correspondence Project, in analysing results and deciding on the direction of the work as it unfolded.

For topic modelling, we explored how well established methods such as Latent Dirichlet Allocation (LDA) and Term Frequency-Inverse Document Frequency (TF-IDF), together with newer packages such as BERTopic, reveal the underlying topics within the letters. It quickly became apparent that BERTopic was producing the most interesting results, so we carried its outputs forward into the mapping of social networks around the topics. All Social Network Analysis was performed using Palladio, an online tool for this purpose developed at Stanford University.
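To give a feel for the simplest of the methods named above, here is a minimal sketch of TF-IDF term weighting using only the Python standard library. It is not the code from the notebooks: the function names (`tokenize`, `tfidf`, `top_terms`) are hypothetical, the three sample sentences are invented for illustration and do not come from the Henslow corpus, and the weighting uses the plain unsmoothed form (term frequency times the log of inverse document frequency), which may differ from what a library such as scikit-learn computes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return one {term: score} dict per document (unsmoothed TF-IDF)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        doc_scores = {}
        for term, count in counts.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[term])
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores


def top_terms(doc_scores, k=3):
    """Highest-scoring terms, ties broken alphabetically."""
    return sorted(doc_scores, key=lambda t: (-doc_scores[t], t))[:k]


# Invented sample sentences (NOT taken from the Henslow letters).
letters = [
    "The plants arrived safely and the specimens are dried",
    "The lecture on geology and the strata was well attended",
    "The plants for the garden and the seeds were sent",
]

scores = tfidf(letters)
for index, doc_scores in enumerate(scores):
    print(index, top_terms(doc_scores))
```

Words that occur in every document, such as "the", receive a score of zero, while words confined to a single letter are weighted most heavily. That is why TF-IDF is useful for surfacing distinctive vocabulary, even though on its own it does not group documents into topics.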

The first notebook contains the code for the data cleaning process and some preliminary data analysis to get a general feel for the Henslow correspondence corpus. This is followed by three notebooks on the different types of topic modelling we used and a final notebook on Social Network Analysis.
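As a rough illustration of how topic assignments might be handed over to a network tool, the sketch below writes a small edge list as CSV text. Everything here is an assumption rather than the notebooks' actual approach: the column names (`Source`, `Target`, `Topic`), the record fields (`sender`, `recipient`, `topic`), the function names, and the placeholder correspondents and topic labels are all hypothetical. Palladio works with delimited tabular data, where columns can be chosen as the source and target of a graph, so a table of this shape is one plausible input.

```python
import csv
import io


def build_edge_rows(letters):
    """Turn letter records into sender -> recipient rows labelled by topic."""
    return [
        {
            "Source": letter["sender"],
            "Target": letter["recipient"],
            "Topic": letter["topic"],
        }
        for letter in letters
    ]


def to_csv(rows, fieldnames=("Source", "Target", "Topic")):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder records (hypothetical names and topic labels, not real data).
sample_letters = [
    {"sender": "Correspondent A", "recipient": "J. S. Henslow", "topic": "topic_0"},
    {"sender": "Correspondent B", "recipient": "J. S. Henslow", "topic": "topic_1"},
    {"sender": "J. S. Henslow", "recipient": "Correspondent A", "topic": "topic_0"},
]

csv_text = to_csv(build_edge_rows(sample_letters))
print(csv_text)
```

Keeping the topic as its own column means the same file can be filtered by topic when exploring the network, rather than producing a separate file for each topic.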

See Pipfile for requirements.
