https://github.com/pseudomanifold/shakespeare

Code and data for extracting co-occurrence networks from Shakespeare's plays

Host: GitHub
URL: https://github.com/pseudomanifold/shakespeare
Owner: Pseudomanifold
License: mit
Created: 2016-08-03T11:17:19.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2025-01-16T21:40:15.000Z (over 1 year ago)
Last Synced: 2025-03-24T15:47:23.038Z (over 1 year ago)
Topics: co-occurence, complex-networks, networks, shakespeare, shakespeare-plays, social-network, social-network-analysis
Language: Python
Homepage: https://pseudomanifold.github.io/Shakespeare/
Size: 4.81 MB
Stars: 14
Watchers: 2
Forks: 6
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

README

This repository contains the code used to extract co-occurrence networks
from a tagged corpus of Shakespeare's plays.

The networks have been analysed using *persistent homology*, a technique
from computational topology. Please refer to our paper

[*Shall I compare thee to a network?* – Visualizing the Topological Structure of Shakespeare's Plays](http://bastian.rieck.me/research/Vis2016.pdf)

for more details.

# Data

* The folder `Corpus` contains the original corpus that was used to
calculate co-occurrence networks. Additional information about the
amount of speech between certain characters has been added. Please
refer to [lexically.net](http://lexically.net/wordsmith/support/shakespeare.html) for the original data.
* The folder `Networks` contains the co-occurrence networks for all the
plays that we used in the paper. Networks are categorized into
*speech-based* and *time-based* filtrations. Please refer to the paper
for more details.
* The folder `Plays` contains the corrected variants of the plays,
sorted into three broad categories.
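
As a rough illustration of what a co-occurrence network is, the sketch below counts how often pairs of characters share a scene and uses that count as an edge weight. It is a generic toy example and not the extraction code in this repository: the input format (one list of character names per scene) and the function name `co_occurrence_weights` are assumptions made for illustration, and the sketch implements neither the speech-based nor the time-based filtration described in the paper.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_weights(scenes):
    """Count, for every unordered pair of characters, the scenes they share."""
    weights = Counter()
    for characters in scenes:
        # sorted(set(...)) drops repeated names and gives each pair a fixed order.
        for a, b in combinations(sorted(set(characters)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical toy input: one list of characters per scene.
scenes = [
    ["HAMLET", "HORATIO"],
    ["HAMLET", "GERTRUDE", "CLAUDIUS"],
    ["HAMLET", "HORATIO", "CLAUDIUS"],
]

for (a, b), weight in sorted(co_occurrence_weights(scenes).items()):
    print(f"{a} -- {b}: {weight}")
```

Each key of the returned counter is an edge of the network and each value is its weight. The files in `Networks` carry the weights defined in the paper, which may differ from this simple scene count.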

# Usage

The main script is called `co-occurrence.py`. Given the filename of
a tagged play, it automatically produces a co-occurrence network using
the *speech-based filtration* we described in the paper. The network
will be stored in the current directory. To batch-process all plays
automatically, you could, for example, use:

    find ./Plays/ -name "*.txt" -exec ./co-occurrence.py {} \;

This traverses the folder `Plays` and executes the extraction script for
every file. If you want the *time-based filtration* instead, use the
parameter `-t`, i.e.:

    find ./Plays/ -name "*.txt" -exec ./co-occurrence.py {} -t \;

Again, this will result in a set of networks. Note that any existing
networks in the current folder will be overwritten.
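
If you would rather keep the two sets of networks apart, one option is to run the script from a separate working directory for each filtration. The following is a minimal sketch of that idea in Python. The helper names `build_commands` and `run_batch` are hypothetical and not part of this repository. The sketch also assumes that `co-occurrence.py` can be started with the same Python interpreter that runs the helper, that it accepts an absolute path to a play, and that it writes its output to the current working directory as described above.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(plays_dir, script, time_based=False):
    """Return one command line per tagged play found below plays_dir."""
    script = Path(script).resolve()
    commands = []
    for play in sorted(Path(plays_dir).rglob("*.txt")):
        # Absolute paths keep the command valid when the working directory changes.
        command = [sys.executable, str(script), str(play.resolve())]
        if time_based:
            command.append("-t")  # time-based filtration, as documented above
        commands.append(command)
    return commands


def run_batch(plays_dir, script, output_dir, time_based=False):
    """Run the script for every play, with output_dir as the working directory."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for command in build_commands(plays_dir, script, time_based):
        # Assumption: the script stores its network in the current directory.
        subprocess.run(command, cwd=output_dir, check=True)
```

For example, `run_batch("Plays", "co-occurrence.py", "out/speech")` followed by `run_batch("Plays", "co-occurrence.py", "out/time", time_based=True)` would place the two variants in different folders, so neither run overwrites the other. The `out/...` folder names are placeholders.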

# Demo

A [demo](https://pseudomanifold.github.io/Shakespeare) of all the extracted
networks is available. The demo uses a simple force-directed graph
layout to visualize the network.

# Licence

The data and the code are released under an MIT licence. Please refer
to the file `LICENSE` for more information.
