https://github.com/poseidon-framework/paper-directory
- Host: GitHub
- URL: https://github.com/poseidon-framework/paper-directory
- Owner: poseidon-framework
- Created: 2024-11-06T21:37:22.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-04-02T06:16:42.000Z (4 days ago)
- Last Synced: 2026-04-02T19:57:16.150Z (3 days ago)
- Language: Python
- Homepage: https://www.poseidon-adna.org/paper-directory
- Size: 347 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# README
## 💡 What does this script do?
This script reads a list of DOIs from `list.txt`, fetches metadata from **CrossRef API**, and checks if those papers exist in **Poseidon Archives** (`community-archive`, `aadr-archive`, `minotaur-archive`). It then generates an **HTML table (`index.html`)** displaying:
- ✅ Paper title
- ✅ Publication year & exact date
- ✅ First author's name
- ✅ Journal name
- ✅ Availability in Poseidon archives (✅ or ❌)
- ✅ A **search bar** for filtering by title
- ✅ **Dropdown filters** for the archives
Every time `list.txt` is updated and a commit is pushed, this script runs and updates `index.html` on GitHub Pages.
---
## ⚙️ Technical Requirements
- **Python Version:** Python 3.x
- **Required Libraries:**
```bash
pip install requests Jinja2
```
- **Files:**
- `list.txt` → List of DOIs (one per line)
- `base_script.py` → The main script
- `index.html` → The generated output file
---
## How the Functions Work
### 1️⃣ `get_crossref_metadata(doi, index, total)`
Fetches metadata from CrossRef API.
Extracts **title, year, journal, date, first author's name**.
Formats publication date into **YYYY-MM-DD**.
Prints **progress updates** like:
```
(1 / 100) Querying metadata for 10.1002/ajpa.23312
```
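A minimal sketch of what this function might look like, using the public CrossRef REST API (`https://api.crossref.org/works/{doi}`). This is an illustration of the described behavior, not the repository's exact code:

```python
import requests

def format_date(date_parts):
    """Pad CrossRef 'date-parts' ([year, month, day]) to YYYY-MM-DD.
    A missing month or day defaults to 01."""
    parts = list(date_parts) + [1] * (3 - len(date_parts))
    return f"{parts[0]:04d}-{parts[1]:02d}-{parts[2]:02d}"

def get_crossref_metadata(doi, index, total):
    """Fetch one paper's metadata from CrossRef and normalize it."""
    print(f"({index} / {total}) Querying metadata for {doi}")
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    msg = resp.json()["message"]
    author = (msg.get("author") or [{}])[0]
    return {
        "doi": doi,
        "title": (msg.get("title") or ["(no title)"])[0],
        "journal": (msg.get("container-title") or [""])[0],
        "date": format_date(msg["issued"]["date-parts"][0]),
        "first_author": " ".join(filter(None, [author.get("given"), author.get("family")])),
    }
```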
### 2️⃣ `fetch_poseidon_bibliography(archive_name)`
Calls Poseidon API to check available DOIs for a given archive.
Extracts **DOI list** from `community-archive`, `aadr-archive`, and `minotaur-archive`.
Prints **status messages** while fetching:
```
Fetching DOI data from community-archive...
```
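The fetching step might look roughly like the sketch below. The endpoint URL and response shape are assumptions for illustration only; consult the Poseidon server API for the real interface. The pure helper `extract_dois` shows the extraction step:

```python
import requests

# ASSUMPTION: this endpoint path and the "bibEntries" response key are
# illustrative only; the real Poseidon server API may differ.
POSEIDON_API = "https://server.poseidon-adna.org/bibliography"

def extract_dois(entries):
    """Collect unique, non-empty DOI strings from bibliography entries."""
    return sorted({entry["doi"] for entry in entries if entry.get("doi")})

def fetch_poseidon_bibliography(archive_name):
    """Return the DOIs available in one Poseidon archive."""
    print(f"Fetching DOI data from {archive_name}...")
    resp = requests.get(POSEIDON_API, params={"archive": archive_name}, timeout=30)
    resp.raise_for_status()
    return extract_dois(resp.json().get("bibEntries", []))
```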
### 3️⃣ `load_poseidon_doi_map()`
Collects all available DOIs from **all Poseidon archives**.
Stores data in a dictionary mapping **DOIs → available archives**.
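The merging step can be sketched as a pure function over per-archive DOI lists (the real function would first gather those lists, e.g. by calling `fetch_poseidon_bibliography` for each archive):

```python
ARCHIVES = ["community-archive", "aadr-archive", "minotaur-archive"]

def build_doi_map(dois_per_archive):
    """Map each DOI to the list of archives that contain it.

    `dois_per_archive` is {archive_name: iterable of DOIs}."""
    doi_map = {}
    for archive, dois in dois_per_archive.items():
        for doi in dois:
            doi_map.setdefault(doi, []).append(archive)
    return doi_map
```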
### 4️⃣ `preprocess_doi(doi)`
Cleans up the DOI format by removing extra whitespace and a leading `https://doi.org/` prefix.
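This cleanup could look like the following sketch (the exact set of URL prefixes handled is an assumption):

```python
def preprocess_doi(doi):
    """Normalize a DOI string: trim whitespace and strip a URL prefix."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi
```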
### 5️⃣ `check_for_duplicates(dois)`
Checks `list.txt` for duplicate DOIs.
If duplicates are found, it **prints a warning**:
```
WARNING: Duplicate DOIs found:
- 10.1002/ajpa.23312
```
### 6️⃣ `generate_html(papers, output_file)`
Creates **index.html** using a **Jinja2 template**.
Adds search bar to filter by **title**.
Adds dropdown filters to show/hide papers based on Poseidon archive availability.
Formats clickable DOI links like this:
```html
<a href="https://doi.org/10.1002/ajpa.23312">10.1002/ajpa.23312</a>
```
Prints progress while updating:
```
Updating index.html...
index.html successfully updated!
```
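The rendering step might be sketched like this; the inline template is a minimal stand-in (the repository's real template also carries the search bar and dropdown filters):

```python
from jinja2 import Template

# Minimal stand-in for the real index.html template, which also
# includes the search bar and archive dropdown filters.
TEMPLATE = Template("""<table>
{% for p in papers %}<tr>
  <td><a href="https://doi.org/{{ p.doi }}">{{ p.doi }}</a></td>
  <td>{{ p.title }}</td>
  <td>{{ p.date }}</td>
</tr>
{% endfor %}</table>
""")

def generate_html(papers, output_file):
    """Render the paper table and write it to `output_file`."""
    print(f"Updating {output_file}...")
    with open(output_file, "w", encoding="utf-8") as fh:
        fh.write(TEMPLATE.render(papers=papers))
    print(f"{output_file} successfully updated!")
```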
---
## How to Run the Script
1. Add **DOIs** to `list.txt` (one per line).
2. Run the script:
```bash
python base_script.py
```
3. Open `index.html` to see the results!
---
This is a **fully automated workflow** that updates the table and deploys it to **GitHub Pages** whenever `list.txt` changes.
**GitHub Actions Workflow** runs everything behind the scenes. No manual updates needed!
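A hypothetical sketch of such a workflow file (e.g. `.github/workflows/update.yml`); the repository's actual workflow may differ in triggers, file paths, and deployment steps:

```yaml
# ASSUMPTION: illustrative only; check the repo's .github/workflows/ for the real file.
name: Update paper directory
on:
  push:
    paths:
      - list.txt
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install requests Jinja2
      - run: python base_script.py
      - name: Commit updated index.html
        run: |
          git config user.name "github-actions"
          git config user.email "actions@github.com"
          git add index.html
          git commit -m "Update index.html" || echo "No changes"
          git push
```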
## Testing locally
To run the script locally, use `python3 base_script.py`. You will likely need to install the `requests` and `jinja2` libraries first; one option is to create a virtual environment, for example:
```bash
python3 -m venv ~/venvs/paper-directory
source ~/venvs/paper-directory/bin/activate
python3 -m pip install requests jinja2
```
Then `python3 base_script.py` should generate the page.
You can then run a test server:
`python3 -m http.server --directory docs 8000`
and open `http://localhost:8000` in your browser.