https://github.com/centre-for-humanities-computing/memo-canonical-novels
- Host: GitHub
- URL: https://github.com/centre-for-humanities-computing/memo-canonical-novels
- Owner: centre-for-humanities-computing
- License: MIT
- Created: 2024-08-07T13:32:11.000Z
- Default Branch: main
- Last Pushed: 2024-11-14T19:05:29.000Z
- Last Synced: 2025-03-20T09:49:53.677Z
- Language: Jupyter Notebook
- Size: 15.1 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# memo-canonical-novels 📚
[Cookiecutter Data Science](https://cookiecutter-data-science.drivendata.org/)
[Paper](https://aclanthology.org/2024.nlp4dh-1.14.pdf)
This repository contains the code for the embeddings, plots, and results of our paper:
"Canonical Status and Literary Influence: A Comparative Study of Danish Novels from the Modern Breakthrough (1870–1900)", presented at NLP4DH at EMNLP 2024.
## Useful directions 📌
- `memo_canonical_novels/`: the main folder. It contains the source code for the project, including the Makefile used to create the embeddings.
- `notebooks/`: the notebooks used for the analysis. `analysis.py` is the main notebook, and `tfidf_comparison.py` compares the embeddings with tf-idf. The other notebooks contain sanity checks.
- `figures/`: the figures generated by the notebooks.
- `data/`: the saved embeddings (.json) used for the analysis. Embeddings you generate yourself are also written here.
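As a starting point for working with the saved embeddings in `data/`, here is a minimal sketch of reading one of the `.json` files. The file layout is an assumption: the sketch expects a flat JSON object that maps an identifier to a list of numbers, and the function names `load_embeddings` and `embedding_dimensions` are hypothetical helpers, not part of this repository. Inspect a real file first and adapt the code if the structure differs.

```python
import json
from pathlib import Path


def load_embeddings(path):
    """Load embeddings from a JSON file into a dict of float vectors.

    ASSUMPTION: the file holds a flat object of the form
    {"<novel id>": [0.1, 0.2, ...], ...}. The real files in data/ may be
    laid out differently, so inspect one before relying on this.
    """
    with Path(path).open(encoding="utf-8") as handle:
        raw = json.load(handle)
    if not isinstance(raw, dict):
        raise ValueError(
            f"expected a JSON object at the top level, got {type(raw).__name__}"
        )
    return {str(key): [float(x) for x in vector] for key, vector in raw.items()}


def embedding_dimensions(embeddings):
    """Return the set of vector lengths, useful as a quick sanity check."""
    return {len(vector) for vector in embeddings.values()}
```

If `embedding_dimensions` returns more than one length, the vectors are not directly comparable and the assumed layout probably does not match the file.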
## Data & paper 📝
The dataset used is available on [Hugging Face](https://huggingface.co/datasets/MiMe-MeMo/Corpus-v1.1).
Please cite our [paper](https://aclanthology.org/2024.nlp4dh-1.14.pdf) if you use the code or the embeddings:
```
@inproceedings{feldkamp-etal-2024-canonical,
title = "Canonical Status and Literary Influence: A Comparative Study of {D}anish Novels from the Modern Breakthrough (1870{--}1900)",
author = "Feldkamp, Pascale and
Lassche, Alie and
Kostkan, Jan and
Kardos, M{\'a}rton and
Enevoldsen, Kenneth and
Baunvig, Katrine and
Nielbo, Kristoffer",
editor = {H{\"a}m{\"a}l{\"a}inen, Mika and
{\"O}hman, Emily and
Miyagawa, So and
Alnajjar, Khalid and
Bizzoni, Yuri},
booktitle = "Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities",
month = nov,
year = "2024",
address = "Miami, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.nlp4dh-1.14",
pages = "140--155"
}
```
## Project Organization 🏗️
```
├── LICENSE            <- Open-source license if one is chosen
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks.
│
├── pyproject.toml     <- Project configuration file with package metadata for
│                         memo_canonical_novels and configuration for tools like black
│
├── figures            <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
└── src                <- Source code for use in this project, making embeddings.
    │
    ├── __init__.py             <- Makes memo_canonical_novels a Python module
    │
    ├── config.py               <- Store useful variables and configuration
    │
    ├── dataset.py              <- Scripts to download or generate data
    │
    ├── features.py             <- Code to create features for modeling
    │
    ├── modeling
    │   ├── __init__.py
    │   ├── predict.py          <- Code to run model inference with trained models
    │   └── train.py            <- Code to train models
    │
    ├── pooling.py              <- Code to create average embeddings from raw embeddings
    │
    └── plots.py                <- Code to create visualizations
```
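The tree describes `pooling.py` as creating average embeddings from raw embeddings. The sketch below illustrates that general idea (element-wise mean pooling) in plain Python. It is not the repository's implementation: the function name `mean_pool` is hypothetical, and it assumes the raw embeddings for one novel are several vectors of equal length, for example one per text chunk.

```python
def mean_pool(chunk_embeddings):
    """Average equally sized vectors element-wise into one vector."""
    if not chunk_embeddings:
        raise ValueError("need at least one embedding to pool")
    dim = len(chunk_embeddings[0])
    if any(len(vector) != dim for vector in chunk_embeddings):
        raise ValueError("all embeddings must have the same length")
    count = len(chunk_embeddings)
    # For each dimension, sum across all vectors and divide by their number.
    return [sum(vector[i] for vector in chunk_embeddings) / count for i in range(dim)]


# Three made-up 2-dimensional embeddings standing in for one novel's chunks.
chunks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
novel_embedding = mean_pool(chunks)
print(novel_embedding)  # [3.0, 4.0]
```

Mean pooling gives every input vector equal weight, so a novel split into many chunks still ends up as a single vector with the same dimensionality as each chunk.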
--------