https://github.com/pseudomanifold/shakespeare
Code and data for extracting co-occurrence networks from Shakespeare's plays
https://github.com/pseudomanifold/shakespeare
co-occurence complex-networks networks shakespeare shakespeare-plays social-network social-network-analysis
Last synced: 4 months ago
JSON representation
Code and data for extracting co-occurrence networks from Shakespeare's plays
- Host: GitHub
- URL: https://github.com/pseudomanifold/shakespeare
- Owner: Pseudomanifold
- License: mit
- Created: 2016-08-03T11:17:19.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2025-01-16T21:40:15.000Z (over 1 year ago)
- Last Synced: 2025-03-24T15:47:23.038Z (over 1 year ago)
- Topics: co-occurence, complex-networks, networks, shakespeare, shakespeare-plays, social-network, social-network-analysis
- Language: Python
- Homepage: https://pseudomanifold.github.io/Shakespeare/
- Size: 4.81 MB
- Stars: 14
- Watchers: 2
- Forks: 6
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
This repository contains the code used to extract co-occurrence networks
from a tagged corpus of Shakespeare's plays.
The networks have been analysed using *persistent homology*, a technique
from computational topology. Please refer to our paper
[*Shall I compare thee to a network?* – Visualizing the Topological Structure of Shakespeare's Plays](http://bastian.rieck.me/research/Vis2016.pdf)
for more details.
# Data
* The folder `Corpus` contains the original corpus that was used to
calculate co-occurrence networks. Additional information about the
amount of speech between certain characters has been added. Please
refer to [lexically.net](http://lexically.net/wordsmith/support/shakespeare.html) for the original data.
* The folder `Networks` contains the co-occurrence networks for all the
plays that we used in the paper. Networks are categorized into
*speech-based* and *time-based* filtrations. Please refer to the paper
for more details.
* The folder `Plays` contains the corrected variants of the plays,
sorted into three broad categories.
# Usage
The main script is called `co-occurrence.py`. Given the filename of
a tagged play, it automatically produces a co-occurrence network using
the *speech-based filtration* we described in the paper. The network
will be stored in the current directory. To batch-process all networks
automatically, you could for example use:
find ./Plays/ -name "*.txt" -exec ./co-occurrence.py {} \;
This traverses the folder `Plays` and executes the extraction script for
every file. If you want the *time-based filtration* instead, use the
parameter `-t`, i.e.:
find ./Plays/ -name "*.txt" -exec ./co-occurrence.py {} -t \;
Again, this will result in a set of networks. Note that all existing
networks will be overwritten in the current folder.
# Demo
A [demo](https://pseudomanifold.github.io/Shakespeare) of all the extracted
networks is available. The demo uses a simple force-directed graph
layout to visualize the network.
# Licence
The data and the code is are released under an MIT licence. Please refer
to the file `LICENSE` for more information.