https://github.com/kingsdigitallab/eb-pre
Computing Britannica - Exploratory work
encyclopedia proof-of-concept semantic-search topic-classification
- Host: GitHub
- URL: https://github.com/kingsdigitallab/eb-pre
- Owner: kingsdigitallab
- Created: 2023-04-25T20:10:00.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2026-04-02T12:30:08.000Z (2 months ago)
- Last Synced: 2026-04-03T01:57:47.474Z (2 months ago)
- Topics: encyclopedia, proof-of-concept, semantic-search, topic-classification
- Language: Jupyter Notebook
- Homepage: https://kingsdigitallab.github.io/eb-pre/
- Size: 130 MB
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
[Experimental prototypes](https://kingsdigitallab.github.io/eb-pre/) based on the dataset produced by the [Nineteenth-Century Knowledge Project](https://tu-plogan.github.io/source/c_about.html) led by Peter M. Logan.
[Introduction to this prototype on the KDL website](https://kdl.kcl.ac.uk/projects/encyclopedia-britannica-exploratory-prototypes/)
[Documentation](https://github.com/kingsdigitallab/eb-pre/wiki)
## How to reproduce this proof of concept?
To reproduce the proof of concept (POC) from this repository and the corpus, follow the numbered steps below in order.
### get the code & data
1. create a new folder `poc`
2. clone this repository into `poc/eb-pre`
3. clone [the Encyclopedia repository](https://github.com/TU-plogan/kp-editions) into a separate folder `poc/kp-editions`
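A quick check of the resulting folder layout can catch a misplaced clone before the later steps fail. The sketch below is not part of the repository: the helper name `missing_paths` is hypothetical, and the list of expected paths is inferred only from the steps in this README.

```python
from pathlib import Path

# Paths this README refers to, relative to the poc folder.
# Inferred from the steps in this README, not from the repository itself.
EXPECTED = [
    "eb-pre/data",
    "eb-pre/tools",
    "eb-pre/docs",
    "eb-pre/build/requirements.txt",
    "kp-editions",
]


def missing_paths(poc_root):
    """Return the expected paths that do not exist under poc_root."""
    root = Path(poc_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the parent folder of poc.
    for rel in missing_paths("poc"):
        print(f"missing: poc/{rel}")
```

An empty result means both clones are where the remaining steps expect them.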
### link the data into the code base
4. `cd poc/eb-pre/data`
5. `ln -s ../../kp-editions`
And remove superseded copies of the encyclopedia entries:
6. `rm -rf kp-editions/eb07/TXT_*/ kp-editions/eb07/XML_*/`
Note that as of 2025 Q2, `eb07/TXT` and `eb07/XML` always contain the latest version of the 7th edition; the other `TXT_*` and `XML_*` folders under `eb07` should be ignored.
For `eb09`, however, the latest (and only) version is currently in `TXT_v1` and `XML_v1`, so do not remove those.
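Because step 6 deletes folders irreversibly, it can help to preview what the `TXT_*` and `XML_*` patterns match first. The following is a minimal sketch, not a script from this repository; the function name `superseded_dirs` is hypothetical.

```python
from pathlib import Path


def superseded_dirs(edition_dir):
    """List the TXT_* and XML_* subfolders of edition_dir, sorted by path.

    The unsuffixed TXT and XML folders never match these patterns,
    so the latest eb07 version is left out of the result.
    """
    root = Path(edition_dir)
    found = []
    for pattern in ("TXT_*", "XML_*"):
        found.extend(p for p in root.glob(pattern) if p.is_dir())
    return sorted(found)
```

For example, running `superseded_dirs("kp-editions/eb07")` from `poc/eb-pre/data` lists the folders that step 6 would delete. Pointing it at `kp-editions/eb09` would list `TXT_v1` and `XML_v1`, which is why the removal is limited to `eb07`.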
### create & activate the Python environment
7. `cd poc/eb-pre`
8. `python3 -m venv venv`
9. `source venv/bin/activate`
10. `pip install -U pip`
11. `pip install -r build/requirements.txt`
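To confirm that the environment from step 9 is really the one in use before installing or running anything, the interpreter itself can be asked. This is a small sketch using only the standard library; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the first line prints `False`, repeat step 9 in the current shell before continuing.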
### (re-)index the entries with linguistic properties
12. `cd poc/eb-pre/tools`
13. `rm ../data/DOMAINS_SET/index.json` (replace `DOMAINS_SET` in the path with the value of `DOMAINS_SET` in `settings.py`)
14. `python prep.py`
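Step 13 fails with an error when the index does not exist yet, for instance on a first run. A tolerant variant could look like the sketch below. It assumes, as step 13 suggests, that the index lives at `data/<DOMAINS_SET value>/index.json`; the function name `reset_index` is hypothetical and not part of the repository.

```python
from pathlib import Path


def reset_index(data_dir, domains_set):
    """Delete <data_dir>/<domains_set>/index.json if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    index_path = Path(data_dir) / domains_set / "index.json"
    if index_path.is_file():
        index_path.unlink()
        return True
    return False
```

Called from `poc/eb-pre/tools`, this would be `reset_index("../data", name)`, where `name` is the `DOMAINS_SET` value read from `settings.py`.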
### (re-)create the embeddings and classify entries into domains
15. `cd poc/eb-pre/tools`
16. `rm ../data/semantic_search/*`
17. `python classify.py`
18. `python compress.py ../data/semantic_search/semantic_search-edition_7-doc2vec-learn-mc_40-ng_1-tm_0.5-ch_sentence.tv2.json 2`
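The file name in step 18 is long and easy to mistype. Assuming `classify.py` writes its output into `data/semantic_search` (which step 18 implies but this README does not state outright), the generated files can be listed and the path copied from there. The helper name `list_json_outputs` is hypothetical.

```python
from pathlib import Path


def list_json_outputs(output_dir):
    """Return the .json files directly inside output_dir, newest first."""
    files = [p for p in Path(output_dir).glob("*.json") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Run from `poc/eb-pre/tools`, `list_json_outputs("../data/semantic_search")` returns the candidate paths to pass to `compress.py`.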
### launch & visit the web application
19. `cd poc/eb-pre/docs`
20. `npm ci`
21. `cd ..`
22. `python3 -m http.server 8000`
23. visit the following URL with your browser: http://localhost:8000/docs/
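Steps 21 and 22 start the server from the repository root rather than from `docs`, which is why the application is reached under the `/docs/` path. The sketch below reproduces that arrangement with the standard library so the URL can be checked from a script. The function names `serve` and `status_of` are hypothetical; unlike `python3 -m http.server`, this sketch binds to the loopback interface only.

```python
import threading
import urllib.request
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve(root, port=8000):
    """Serve the files under root on 127.0.0.1 in a background thread."""
    handler = partial(SimpleHTTPRequestHandler, directory=str(root))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def status_of(url):
    """Return the HTTP status code of a GET request to url."""
    # An empty ProxyHandler makes the request ignore any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return response.status
```

With `server = serve(".")` started from `poc/eb-pre`, `status_of("http://127.0.0.1:8000/docs/")` should succeed once the application files are in place; a missing page raises `urllib.error.HTTPError` instead of returning a status. Call `server.shutdown()` when done.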